Oracle TEXT is slow when using wildcard in CONTAINS()

Tom Kyte

Thanks for the question, Minh.

Asked: October 02, 2006 - 11:05 am UTC

Last updated: November 11, 2009 - 1:44 pm UTC

Version: 9.2.0


You Asked

Hello Tom,
I am using Oracle TEXT in my application. Here is how I created the table and index:
CREATE TABLE T_DOCUMENT
(
  DOC_ID        NUMBER(22, 0) NOT NULL,
  CREATE_TIME   DATE DEFAULT SYSDATE NULL,
  CREATE_BY     NUMBER(22, 0),
  CHANGE_TIME   DATE,
  CHANGE_BY     NUMBER(22, 0),
  DOC_TITLE     VARCHAR2(1024),
  FILE_NAME     VARCHAR2(256),
  FILE_TYPE_ID  NUMBER(22, 0) NOT NULL,
  FILE_CONTENT  BLOB,
  DOC_NOTE      VARCHAR2(4000),
  FILE_SIZE     NUMBER(22, 0),
  IS_ACTIVE     CHAR(1) DEFAULT 'Y' NOT NULL
);

And the index:
create index ctx_filecontent_idx on T_DOCUMENT(FILE_CONTENT)
  indextype is ctxsys.context
  parameters ('filter CTXSYS.INSO_FILTER');

And a stored procedure to sync the index:

CREATE OR REPLACE PROCEDURE refresh_doc_index
IS
BEGIN
  ctxsys.ctx_ddl.sync_index('ctx_filecontent_idx', '2M');
END;

The table T_DOCUMENT has around 90 documents, including Word, Excel, PDF, and text files. When I ran a query without a wildcard to search for the word 'released', it returned quickly (6 records returned), but when I ran it with the wildcard '%rele%' it was very slow (it took more than 3 minutes and returned 12 records):

SELECT t_doc.DOC_ID AS DOC_ID,
t_doc.CREATE_TIME AS DOC_CREATE_TIME,
t_doc.CREATE_BY AS DOC_CREATE_BY,
t_doc.CHANGE_TIME AS DOC_CHANGE_TIME,
t_doc.CHANGE_BY AS DOC_CHANGE_BY,
t_doc.DOC_TITLE AS DOC_TITLE,
t_doc.FILE_NAME AS DOC_FILE_NAME,
t_doc.FILE_TYPE_ID AS DOC_FILE_TYPE_ID,
t_doc.DOC_NOTE AS DOC_NOTE,
t_doc.FILE_SIZE AS DOC_FILE_SIZE,
t_doc.IS_ACTIVE AS DOC_IS_ACTIVE
FROM PF_MIS.T_DOCUMENT t_doc
WHERE CONTAINS(t_doc.FILE_CONTENT, '%relea%', 1) > 0
  AND t_doc.IS_ACTIVE = 'Y'

Do you have any ideas to improve the speed of this query?
Thanks very much,
Minh




and Tom said...

Definitely,

http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:37336026927381#37360199308431


ctx_ddl.set_attribute('SUBSTRING_PREF', 'SUBSTRING_INDEX', 'TRUE');

was designed to help facilitate "leading edge wildcard queries" just like this.
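
For completeness, a minimal sketch of how that attribute fits with the index above (the preference name SUBSTRING_PREF is illustrative here; see the linked thread for the full walkthrough):

begin
  -- create a wordlist preference with substring indexing enabled
  ctx_ddl.create_preference('SUBSTRING_PREF', 'BASIC_WORDLIST');
  ctx_ddl.set_attribute('SUBSTRING_PREF', 'SUBSTRING_INDEX', 'TRUE');
end;
/

-- recreate the context index so it uses the new wordlist
drop index ctx_filecontent_idx;

create index ctx_filecontent_idx on T_DOCUMENT(FILE_CONTENT)
  indextype is ctxsys.context
  parameters ('filter CTXSYS.INSO_FILTER wordlist SUBSTRING_PREF');

A substring index speeds up left- and double-truncated wildcard queries such as '%relea%', at the cost of a larger index and longer index creation and sync times.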


Comments

Oracle TEXT is slow when using wildcard in CONTAINS()

Minh Vo, October 02, 2006 - 1:15 pm UTC

Your response is as fast as the speed of light.
Thanks much.
Minh

multi_column_datastore inverts column names?

Richard Limanowski, December 25, 2006 - 7:51 am UTC

Hello Tom,
a context index inverts column names.
See:

drop table t;

create table t (
id number(7) not null primary key,
author varchar2(20),
title varchar2(20)
)
tablespace data
/

insert into t (id, author, title) values (1, 'Iglesias', 'Espana');
insert into t (id, author, title) values (2, 'Rebok', 'Spanien');
insert into t (id, author, title) values (3, 'Doe', 'Spain');

exec ctx_ddl.drop_preference('t_datastore');
exec ctx_ddl.create_preference('t_datastore', 'multi_column_datastore');
exec ctx_ddl.set_attribute('t_datastore', 'columns', 'author,title');


drop index t_ctx_idx
/

create index t_ctx_idx on t(author)
indextype is ctxsys.context
parameters('datastore t_datastore')
/

col author format a10
col title format a10
select * from t
where contains(author, 'title', 1) > 0
/
/*
        ID AUTHOR     TITLE
---------- ---------- ----------
         1 Iglesias   Espana
         2 Rebok      Spanien
         3 Doe        Spain
*/

select * from (select token_text, token_count from "DR$T_CTX_IDX$I" order by token_count desc)
where rownum < 4;
/*
TOKEN_TEXT   TOKEN_COUNT
------------ -----------
AUTHOR                 3
TITLE                  3
DOE                    1
*/

Using 10.2.0.1.0 under RedHat.
Yours,
Richard

Tom Kyte
December 25, 2006 - 1:24 pm UTC

not sure what you mean by "inverts column names"?

multi_column_datastore inverts column names?

Richard Limanowski, December 26, 2006 - 6:22 pm UTC

Hello Tom,

I mean: the column names themselves appear in the token list,
with a count equal to the cardinality of the table being indexed.
The "contains" query returns ALL records if the search term is one of the column names.
Using user_datastore with an appropriate glue function yields
correct results.
Yours,
Richard
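
For reference, a minimal sketch of the user_datastore approach mentioned above, assuming a glue procedure that simply concatenates the two columns (the procedure, preference, and index names are illustrative, not from the original posts):

create or replace procedure t_glue (rid in rowid, tlob in out nocopy clob)
is
begin
  -- build the indexed document for one row from its column values only
  for r in (select author, title from t where rowid = rid)
  loop
    tlob := r.author || ' ' || r.title;
  end loop;
end;
/

exec ctx_ddl.create_preference('t_user_ds', 'user_datastore');
exec ctx_ddl.set_attribute('t_user_ds', 'procedure', 't_glue');

create index t_ctx_idx2 on t(author)
indextype is ctxsys.context
parameters('datastore t_user_ds')
/

With a datastore like this only the column values are indexed, so contains(author, 'title', 1) no longer matches every row.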

INSERT OR UPDATE INTO CLOB COLUMN VERY SLOW

Mariah, April 06, 2009 - 2:06 pm UTC

Hello Tom,

I am using a table that has a CLOB column. For over a year I did the inserts without empty_clob() instructions, but now I am having very big performance trouble.

I am doing the following:

Inside a package I have dynamic information in a variable called cuerpo, and its type is CLOB.

The CLOB variable (cuerpo) contains HTML code, and then I run this insert:

INSERT INTO cto_confirmacion
(fol_confirmacion,
cuerpo,
id_email,
folio_factura)
VALUES
(1,
cuerpo,
1,
20);

For over a year this did not cause any problem, but now it is very, very slow: almost 7 seconds per operation, where before it was maybe less than a second.

I tried changing it to:

INSERT INTO Xtable
(fol,
cuerpo,
idX,
folio_X)
VALUES
(1,
empty_clob(),
1,
20);

followed immediately by:

UPDATE Xtable
SET cuerpo = cuerpo
WHERE fol = 1

and the primary key is fol.

Even then it is very, very slow.

I don't know what to do; I am very worried.

Thanks 4 anything.

Regards.







Tom Kyte
April 13, 2009 - 10:52 am UTC

4???

anyway, can you enable sqltrace with wait events enabled and get us a tkprof to look at please?
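
A minimal sketch of what that looks like (the exact trace level and file names will depend on your environment):

alter session set timed_statistics = true;

-- 10046 at level 12 is sql_trace plus wait events and bind values
alter session set events '10046 trace name context forever, level 12';

-- run the slow INSERT here, then turn tracing off
alter session set events '10046 trace name context off';

-- then format the resulting trace file on the server with tkprof, for example:
-- $ tkprof <your_trace_file>.trc trace_report.txt sys=no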



Insert clob problem

Mariah, November 06, 2009 - 4:07 pm UTC

Hello again, Tom.

I have a stored procedure with some logic to insert confirmations.

This procedure builds HTML text and puts it into a CLOB variable. The way I write the variable is to first create a temporary CLOB in memory, like this:


Definition:
vlBody tableX.cuerpo%type := empty_clob();
vlTemporal varchar2(3500);

I create it in memory:
DBMS_LOB.CREATETEMPORARY (vlBody, TRUE, DBMS_LOB.CALL);
DBMS_LOB.OPEN(vlBody, DBMS_LOB.LOB_READWRITE);

and then I begin to write:
vlTemporal := 'XXXXXXXXXXXXXXXXXXXXXX';
amount := LENGTH(vlTemporal); -- number of characters to write
offset := 1; -- begin writing at the first character of the CLOB

-- First write
DBMS_LOB.WRITE(vlBody, amount, offset, vlTemporal);

-- Write again (really it is HTML code)
vlTemporal := 'WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW';
DBMS_LOB.WRITEAPPEND(vlBody, LENGTH(vlTemporal), vlTemporal);

Then I write several more times, until I close the variable and do my insert:

vlTemporal := 'WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWYYYYYYY';
DBMS_LOB.WRITEAPPEND(vlBody, LENGTH(vlTemporal), vlTemporal);

(after that, I write like this many more times)

-- Close the variable
DBMS_LOB.CLOSE(vlBody);

insert into tableX(entity, cuerpo, sit_cuerpo) values ('ENTITY1', vlBody, 'AC');

-- Destroy variable.
DBMS_LOB.FREETEMPORARY(vlBody);

Even if something bad happens (an exception), I destroy the variable in memory.

And I commit (I am not sure that this is necessary, but the procedure is executed by a job).

In the beginning this was OK, but now it is very, very slow: almost 5 to 8 seconds per transaction in the production database.

I have the same code in a development database and there it is very fast, even 0.203 seconds.

The size of the file is approximately 8 KB and the size to store the CLOB in the table is 6 MB.

I ran a profiler against my code and the trouble is in the INSERT; that is where the problem is.

What more can I do?

Many thanks.

Mariah.



Tom Kyte
November 11, 2009 - 1:44 pm UTC

sql_trace and tkprof are your friends, turn them on and see what it says.
