Hi Tom,
I have a table which will have millions of records, for example.
CREATE TABLE employee
(ID NUMBER NOT NULL ENABLE,
EMPLOYEE_NUMBER VARCHAR2(100 CHAR),
FULLNAME VARCHAR2(100 CHAR) NOT NULL ENABLE,
GENDER NUMBER(1,0),
CITY_NAME VARCHAR2(100 CHAR),
FULL_TEXT_SEARCH CLOB);
FULL_TEXT_SEARCH is a column in which I have concatenated EMPLOYEE_NUMBER, FIRSTNAME, and CITY_NAME.
I have set the subtring_index attribute as,
BEGIN
ctx_ddl.set_attribute('CTXSYS.BASIC_WORDLIST', 'SUBSTRING_INDEX', 'TRUE');
END;
and created a context index on it, defined as follows
CREATE INDEX idx_employee_txt ON employee (full_text_search)
INDEXTYPE IS ctxsys.context PARAMETERS ( 'wordlist CTXSYS.BASIC_WORDLIST MEMORY 50M SYNC(ON COMMIT)' );
Now, I am testing a query with just 19000 registries and it took 6.739 seconds to get the result (10458 registries match the filter), here the query
SELECT
COUNT(1) OVER() AS totalRegistries,
emp.ID,
emp.EMPLOYEE_NUMBER,
emp.FULLNAME,
emp.GENDER,
emp.CITY_NAME
FROM
employee emp
WHERE
CONTAINS(emp.full_text_search, '%001%') > 0
ORDER BY emp.FULLNAME ASC NULLS LAST
OFFSET 0 ROWS FETCH FIRST 50 ROWS ONLY;
If I use LIKE '%001%' it took 0.085 seconds.
Am I doing something wrong? I think that is taking much time just for 19000 registries, how can I get improve my implementation to get a better performance?
The explain plan for CONTAINS query
OPERATION
OBJECT_NAME OPTIONS BYTES CARDINALITY COST CPU_COST IO_COST
SELECT STATEMENT
1625 1 2 36093185 1
VIEW
SYS.null 1625 1 2 36093185 1
Filter Predicates
AND
from$_subquery$_002.rowlimit_$$_rownumber<=0+50
from$_subquery$_002.rowlimit_$$_rownumber>0
WINDOW
SORT 297 1 2 36093185 1
TABLE ACCESS
CITIZEN BY INDEX ROWID 297 1 1 1 1
DOMAIN INDEX
IDX_CITIZEN_TXT 1 1 1
Access Predicates
CTXSYS.CONTAINS(CTZ.FULL_TEXT_SEARCH,'%001%')>0
The response time for indexes (not just context indexes) is generally proportional the amount of data that is *returned* to the client.
eg, If your employee table had 1 billion records and still returned 10458 rows, the context index will still be 6 seconds, the standard LIKE will a *lot* longer.
One thing to keep in mind is that a substring index is precisely that, ie, *all* of the substrings possible.
A couple of things you might want to experiment with.
1) Choose a min/max length of things that people can search for, eg
begin
ctx_ddl.create_preference('mywordlist', 'BASIC_WORDLIST');
ctx_ddl.set_attribute('mywordlist','PREFIX_INDEX','TRUE');
ctx_ddl.set_attribute('mywordlist','PREFIX_MIN_LENGTH', '4');
ctx_ddl.set_attribute('mywordlist','PREFIX_MAX_LENGTH', '10');
ctx_ddl.set_attribute('mywordlist','SUBSTRING_INDEX', 'YES');
end;
so that index entries less than 4 char are not included, and your app would not let them search on
2) See if the wildcard index helps
begin
ctx_ddl.create_preference('mywordlist','BASIC_WORDLIST');
ctx_ddl.set_attribute('mywordlist','WILDCARD_INDEX','TRUE');
end;
Availability of these is version dependent.