You Asked

Hi Tom,

I have a table which will have millions of records, for example.

  
CREATE TABLE employee
       (ID NUMBER NOT NULL ENABLE, 
 EMPLOYEE_NUMBER VARCHAR2(100 CHAR), 
 FULLNAME VARCHAR2(100 CHAR) NOT NULL ENABLE, 
 GENDER NUMBER(1,0), 
 CITY_NAME VARCHAR2(100 CHAR),
 FULL_TEXT_SEARCH CLOB);

FULL_TEXT_SEARCH is a column in which I have concatenated EMPLOYEE_NUMBER, FIRSTNAME, and CITY_NAME.

I have set the subtring_index attribute as,

BEGIN
    ctx_ddl.set_attribute('CTXSYS.BASIC_WORDLIST', 'SUBSTRING_INDEX', 'TRUE');
END;

and created a context index on it, defined as follows

CREATE INDEX idx_employee_txt ON employee (full_text_search)
INDEXTYPE IS ctxsys.context PARAMETERS ( 'wordlist CTXSYS.BASIC_WORDLIST MEMORY 50M SYNC(ON COMMIT)' );

Now, I am testing a query with just 19000 registries and it took 6.739 seconds to get the result (10458 registries match the filter), here the query

SELECT
    COUNT(1) OVER() AS totalRegistries,
    emp.ID,
    emp.EMPLOYEE_NUMBER,
    emp.FULLNAME,
    emp.GENDER,
    emp.CITY_NAME
FROM
    employee emp
WHERE
    CONTAINS(emp.full_text_search, '%001%') > 0
ORDER BY emp.FULLNAME ASC NULLS LAST
OFFSET 0 ROWS FETCH FIRST 50 ROWS ONLY;

If I use LIKE '%001%' it took 0.085 seconds.

Am I doing something wrong? I think that is taking much time just for 19000 registries, how can I get improve my implementation to get a better performance?

The explain plan for CONTAINS query

OPERATION
OBJECT_NAME OPTIONS BYTES CARDINALITY COST CPU_COST IO_COST
SELECT STATEMENT
1625 1 2 36093185 1
VIEW
SYS.null 1625 1 2 36093185 1
Filter Predicates
AND
from$_subquery$_002.rowlimit_$$_rownumber<=0+50
from$_subquery$_002.rowlimit_$$_rownumber>0
WINDOW
SORT 297 1 2 36093185 1
TABLE ACCESS
CITIZEN BY INDEX ROWID 297 1 1 1 1
DOMAIN INDEX
IDX_CITIZEN_TXT 1 1 1
Access Predicates
CTXSYS.CONTAINS(CTZ.FULL_TEXT_SEARCH,'%001%')>0

and Connor said...

The response time for indexes (not just context indexes) is generally proportional the amount of data that is *returned* to the client.

eg, If your employee table had 1 billion records and still returned 10458 rows, the context index will still be 6 seconds, the standard LIKE will a *lot* longer.

One thing to keep in mind is that a substring index is precisely that, ie, *all* of the substrings possible.

A couple of things you might want to experiment with.

1) Choose a min/max length of things that people can search for, eg

begin 
ctx_ddl.create_preference('mywordlist', 'BASIC_WORDLIST'); 
ctx_ddl.set_attribute('mywordlist','PREFIX_INDEX','TRUE');
ctx_ddl.set_attribute('mywordlist','PREFIX_MIN_LENGTH', '4');
ctx_ddl.set_attribute('mywordlist','PREFIX_MAX_LENGTH', '10');
ctx_ddl.set_attribute('mywordlist','SUBSTRING_INDEX', 'YES');
end;

so that index entries less than 4 char are not included, and your app would not let them search on

2) See if the wildcard index helps

begin
     ctx_ddl.create_preference('mywordlist','BASIC_WORDLIST');
     ctx_ddl.set_attribute('mywordlist','WILDCARD_INDEX','TRUE');
end;

Availability of these is version dependent.

Is this answer out of date? If it is, please let us know via a Comment

More to Explore

Design

New to good database design? Check out Chris Saxon's full fundamentals class.

Breadcrumb

Question and Answer

You Asked

and Connor said...

More to Explore

Design

Performance issue using index type Context

Breadcrumb

Question and Answer

You Asked

and Connor said...

More to Explore

Design