Skip to Main Content
  • Questions
  • Performance issue using index type Context

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, Giovanni.

Asked: May 21, 2019 - 1:20 pm UTC

Last updated: May 29, 2019 - 6:37 am UTC

Version: 12c SE

Viewed 1000+ times

You Asked

Hi Tom,

I have a table which will have millions of records, for example.

  
CREATE TABLE employee
       (ID NUMBER NOT NULL ENABLE, 
 EMPLOYEE_NUMBER VARCHAR2(100 CHAR), 
 FULLNAME VARCHAR2(100 CHAR) NOT NULL ENABLE, 
 GENDER NUMBER(1,0), 
 CITY_NAME VARCHAR2(100 CHAR),
 FULL_TEXT_SEARCH CLOB);


FULL_TEXT_SEARCH is a column in which I have concatenated EMPLOYEE_NUMBER, FIRSTNAME, and CITY_NAME.

I have set the subtring_index attribute as,

BEGIN
    ctx_ddl.set_attribute('CTXSYS.BASIC_WORDLIST', 'SUBSTRING_INDEX', 'TRUE');
END;


and created a context index on it, defined as follows

CREATE INDEX idx_employee_txt ON employee (full_text_search)
INDEXTYPE IS ctxsys.context PARAMETERS ( 'wordlist CTXSYS.BASIC_WORDLIST MEMORY 50M SYNC(ON COMMIT)' );


Now, I am testing a query with just 19000 registries and it took 6.739 seconds to get the result (10458 registries match the filter), here the query

SELECT
    COUNT(1) OVER() AS totalRegistries,
    emp.ID,
    emp.EMPLOYEE_NUMBER,
    emp.FULLNAME,
    emp.GENDER,
    emp.CITY_NAME
FROM
    employee emp
WHERE
    CONTAINS(emp.full_text_search, '%001%') > 0
ORDER BY emp.FULLNAME ASC NULLS LAST
OFFSET 0 ROWS FETCH FIRST 50 ROWS ONLY;


If I use LIKE '%001%' it took 0.085 seconds.

Am I doing something wrong? I think that is taking much time just for 19000 registries, how can I get improve my implementation to get a better performance?

The explain plan for CONTAINS query

OPERATION
OBJECT_NAME OPTIONS BYTES CARDINALITY COST CPU_COST IO_COST
SELECT STATEMENT
1625 1 2 36093185 1
VIEW
SYS.null 1625 1 2 36093185 1
Filter Predicates
AND
from$_subquery$_002.rowlimit_$$_rownumber<=0+50
from$_subquery$_002.rowlimit_$$_rownumber>0
WINDOW
SORT 297 1 2 36093185 1
TABLE ACCESS
CITIZEN BY INDEX ROWID 297 1 1 1 1
DOMAIN INDEX
IDX_CITIZEN_TXT 1 1 1
Access Predicates
CTXSYS.CONTAINS(CTZ.FULL_TEXT_SEARCH,'%001%')>0

and Connor said...

The response time for indexes (not just context indexes) is generally proportional the amount of data that is *returned* to the client.

eg, If your employee table had 1 billion records and still returned 10458 rows, the context index will still be 6 seconds, the standard LIKE will a *lot* longer.

One thing to keep in mind is that a substring index is precisely that, ie, *all* of the substrings possible.

A couple of things you might want to experiment with.

1) Choose a min/max length of things that people can search for, eg

begin 
ctx_ddl.create_preference('mywordlist', 'BASIC_WORDLIST'); 
ctx_ddl.set_attribute('mywordlist','PREFIX_INDEX','TRUE');
ctx_ddl.set_attribute('mywordlist','PREFIX_MIN_LENGTH', '4');
ctx_ddl.set_attribute('mywordlist','PREFIX_MAX_LENGTH', '10');
ctx_ddl.set_attribute('mywordlist','SUBSTRING_INDEX', 'YES');
end;


so that index entries less than 4 char are not included, and your app would not let them search on

2) See if the wildcard index helps

begin
     ctx_ddl.create_preference('mywordlist','BASIC_WORDLIST');
     ctx_ddl.set_attribute('mywordlist','WILDCARD_INDEX','TRUE');
end;


Availability of these is version dependent.

Is this answer out of date? If it is, please let us know via a Comment

More to Explore

Design

New to good database design? Check out Chris Saxon's full fundamentals class.