Oracle FullText - classifying PDF/DOCX file

Question and Answer

Connor McDonald

Thanks for the question, Ionut.

Asked: August 15, 2017 - 9:58 am UTC

Last updated: August 17, 2017 - 1:19 am UTC

Version: 11g, 12c

You Asked

I want to use Oracle Text document classification on PDF/DOCX files. Is that possible when the PDF/DOCX files are stored in the database as BFILEs or BLOBs?

I did that for CLOBs but I couldn't do it for BLOBs or BFILEs.

Can anyone give me a hint, please?

and Chris said...

Tim Hall has a nice example using Oracle Text on BLOBs storing PDFs etc. over at:

https://oracle-base.com/articles/9i/full-text-indexing-using-oracle-text-9i

Comments

Not exactly what I expected

Ionut Preda, August 16, 2017 - 4:31 am UTC

Hello Chris,

I noticed that Tim's article uses CLOB as the data type for the column that holds the documents.

My question was regarding BLOB or BFILE data type.
Is it possible to classify this kind of data type?

Thank you,
Ionut Preda.

Chris Saxon
August 16, 2017 - 1:42 pm UTC

Are we reading the same article?

Certainly looks like it uses blobs to me:

In this example we will store the data in a BLOB column, which allows us to store binary documents like Word and PDF as well as plain text

Ionut Preda, August 16, 2017 - 3:07 pm UTC

Hello Chris,

The phrase you quoted refers to the CONTEXT index, which is used for document collection applications.

For a classification application, as described in the same article, you should use a CTXRULE index, which works with the MATCHES operator to classify documents.

... and the table definition used in that example is:

CREATE TABLE my_docs (
id NUMBER(10) NOT NULL,
name VARCHAR2(200) NOT NULL,
doc CLOB NOT NULL
);

The doc column is a CLOB, not a BLOB or BFILE.

That's why I am saying the article is not relevant.

Am I wrong somehow?

Chris Saxon
August 16, 2017 - 3:48 pm UTC

Gotcha, I missed the part about classifying.

When you say:

but I couldn't do it for BLOBs or BFILEs.

What exactly have you tried, and why didn't it work?

I think they are talking about the limitation of MATCHES

paul, August 16, 2017 - 5:49 pm UTC

From the Docs:

https://docs.oracle.com/cd/B28359_01/text.111/b28304/csql.htm#CCREF0106

MATCHES
Use the MATCHES operator to find all rows in a query table that match a given document. The document must be a plain text, HTML, or XML document.
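
Given that limitation, one possible workaround is to filter each binary document to plain text first and pass the result to MATCHES. The following is a minimal sketch, not a tested solution. It assumes the schema has the CTXAPP role, that the AUTO_FILTER can read the PDF/DOCX files in question, and that passing a NULL locator as restab makes Oracle Text allocate a temporary CLOB. The names my_categories, my_blob_docs and my_filter_policy are made up for illustration.

```sql
-- Hypothetical rules table: one Oracle Text query per category.
CREATE TABLE my_categories (
  id       NUMBER(10)     NOT NULL,
  category VARCHAR2(30)   NOT NULL,
  query    VARCHAR2(2000) NOT NULL
);

INSERT INTO my_categories VALUES (1, 'Finance', 'invoice OR payment');
INSERT INTO my_categories VALUES (2, 'Database', 'oracle AND index');
COMMIT;

-- A CTXRULE index is built on the query column, not on the documents.
CREATE INDEX my_categories_query_idx ON my_categories (query)
  INDEXTYPE IS CTXSYS.CTXRULE;

-- Hypothetical table holding the binary documents (PDF, DOCX, ...).
CREATE TABLE my_blob_docs (
  id   NUMBER(10)    NOT NULL,
  name VARCHAR2(200) NOT NULL,
  doc  BLOB          NOT NULL
);

-- Policy that uses the predefined AUTO_FILTER preference to extract text.
BEGIN
  ctx_ddl.create_policy(
    policy_name => 'my_filter_policy',
    filter      => 'CTXSYS.AUTO_FILTER');
END;
/

-- Filter each BLOB to plain text, then classify that text with MATCHES.
DECLARE
  l_text CLOB;
BEGIN
  FOR d IN (SELECT id, name, doc FROM my_blob_docs) LOOP
    -- Assumption: with a NULL locator, the filtered text is returned
    -- in a temporary CLOB that the caller must free.
    ctx_doc.policy_filter(
      policy_name => 'my_filter_policy',
      document    => d.doc,
      restab      => l_text,
      plaintext   => TRUE);

    FOR c IN (SELECT category
              FROM   my_categories
              WHERE  MATCHES(query, l_text) > 0) LOOP
      dbms_output.put_line(d.name || ' -> ' || c.category);
    END LOOP;

    dbms_lob.freetemporary(l_text);
    l_text := NULL;
  END LOOP;
END;
/
```

With SERVEROUTPUT enabled, each matching document/category pair is printed. For BFILEs, the same ctx_doc.policy_filter call should accept a BFILE locator in place of the BLOB, though that path is not shown here.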
