Skip to Main Content
  • Questions
  • Oracle - Optimising the pattern matching (LIKE '%ABC')

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Vikas.

Asked: October 14, 2002 - 10:49 am UTC

Last updated: January 04, 2010 - 9:40 am UTC

Version: 8.1.7

Viewed 10K+ times! This question is

You Asked

We store all the vehicle details in a table. In the display screen we allow the users to search for a vehicle by providing some part of chassis number. Presently we allow only for initial part i.e. ABC% or A% queries. However users sometimes know the last digits/alphabets & would like to query as %12 or %64. If we provide for such queries the performance is low as the index on the chassis number is not used. Is there any new feature or some workaround to improve the performance.

We will not allow for
select * from vehicle_table where chassis_number like '%56%' queries.

So the query for optimization will be:
select * from vehicle_table where chassis_number like '%56'.



and Tom said...

function based indexes could work very well here (see
</code> http://asktom.oracle.com/~tkyte/article1/index.html <code>
or I have a chapter on them in my book "Expert one on one Oracle")...

I used the undocumented reverse function (which might not work in PLSQL, you might have to use DYNAMIC SQL in plsql to use it)

You could write your OWN reverse function in PLSQL instead to avoid using the undocumented (and hence, not really "supported") function....



ops$tkyte@ORA817DEV.US.ORACLE.COM> create table t as select * from all_users;
Table created.

ops$tkyte@ORA817DEV.US.ORACLE.COM> create index t_idx1 on t(reverse(username));
Index created.

ops$tkyte@ORA817DEV.US.ORACLE.COM> create index t_idx2 on t(username);
Index created.

ops$tkyte@ORA817DEV.US.ORACLE.COM> exec dbms_stats.set_table_stats( user, 'T', numrows => 100000, numblks => 100000 );

PL/SQL procedure successfully completed.

ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> alter session set query_rewrite_enabled=true;

Session altered.

ops$tkyte@ORA817DEV.US.ORACLE.COM> alter session set query_rewrite_integrity=trusted;

Session altered.

ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> set autotrace on explain
ops$tkyte@ORA817DEV.US.ORACLE.COM> select * from t where username like 'OPS$%';

USERNAME USER_ID CREATED
------------------------------ ---------- ---------
OPS$CLBECK 30 09-SEP-01
OPS$ORA817 27 28-AUG-01
OPS$TKYTE 578 14-OCT-02


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=10 Card=5000 Bytes=195000)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=10 Card=5000 Bytes=195000)
2 1 INDEX (RANGE SCAN) OF 'T_IDX2' (NON-UNIQUE) (Cost=2 Card=5000)



ops$tkyte@ORA817DEV.US.ORACLE.COM> select * from t where reverse(username) like reverse('%KYTE');

USERNAME USER_ID CREATED
------------------------------ ---------- ---------
OPS$TKYTE 578 14-OCT-02


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=10 Card=5000 Bytes=195000)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=10 Card=5000 Bytes=195000)
2 1 INDEX (RANGE SCAN) OF 'T_IDX1' (NON-UNIQUE) (Cost=2 Card=5000)



ops$tkyte@ORA817DEV.US.ORACLE.COM> set autotrace off



see, they both use indexes!


Rating

  (11 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Brilliant

Jim, October 14, 2002 - 6:57 pm UTC

Right... that one is going straight into the
"toolkit"


Gisel, December 15, 2002 - 9:03 am UTC

Your approach solves problems like '%ABC' and 'ABC%' but if you'd want to solve '%ABC%' using only one index how would you do?

Regards
Gisel

Tom Kyte
December 15, 2002 - 10:40 am UTC

You could use the index as a "skinny table" is all. Something to generate rowids from to access the base table.

Think about it -- every single row in the entire table must be inspected in order to do %abc%. An index cannot be used to inspect just "some" of the rows in this case obviously.


big_table@ORA920.US.ORACLE.COM> select object_name from big_table where object_name like '%ABC%';

1088 rows selected.


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=978 Card=93821 Bytes=1594957)
1 0 INDEX (FAST FULL SCAN) OF 'BIG_TABLE_IDX1' (NON-UNIQUE) (Cost=978 Card=93821 Bytes=1594957)




N-grams

Anders, December 15, 2002 - 2:20 pm UTC

If you need to be able to search for '%ABC%', you could have a look at n-grams. If you have the word "oracle" with the id 34, you would in an index-table insert (34, "ora"), (34, "rac"), (34, "acl") and (34, "cle"). For a search on "acl", you can do a direct lookup on the index-table and get the ids of the rows containing the substring.
Can a function based index be used here? (i.e. can the function "return more than one row"?)
This might be a good time to roll out intermedia.

What about Oracle text????

Dan, December 15, 2002 - 11:23 pm UTC

I would use a context index on this problem.

excellent

A reader, January 16, 2003 - 7:43 pm UTC


Intermedia performance not so great

Xavier, August 05, 2003 - 12:14 am UTC

I wish Intermedia indexing worked better. I have a table with about 600,000 rows, having both conventional and Intermedia indexing on some important text. I find a query this:

foo like '%str%' or foo like '%bar%' actually runs quite a bit faster than a CONTAINS query on str or bar. The % query runs in <10 secs, the CONTAINS query can take over a minute. This is quite absurd and I've been looking for something I'm doing wrong. A contains query on a phrase works very fast, e.g. CONTAINS with 'foo bar' runs like greased lightning. It's the OR conditions that kill it.





Tom Kyte
August 05, 2003 - 7:16 am UTC

give me a concrete example to work with. you aren't calling contains over and over are you - you are using contains correctly right?

ask_tom@ASKUS> select count(*) from wwc_ask_indexed_questions$ where contains( text, 'hello' ) > 0;

COUNT(*)
----------
709
Elapsed: 00:00:00.38
ask_tom@ASKUS> select count(*) from wwc_ask_indexed_questions$ where contains( text, 'world' ) > 0
2 /

COUNT(*)
----------
918
Elapsed: 00:00:00.50
ask_tom@ASKUS> select count(*) from wwc_ask_indexed_questions$ where contains( text, 'hello OR world' ) > 0
2 /

COUNT(*)
----------
1436
Elapsed: 00:00:00.74




LIKE vs. Intermedia

Andy, December 17, 2003 - 5:25 am UTC

I need to compare a column to a set of values, and include this comparison in a FGAC policy. The "set of values" are read from another table at runtime. The comparisons are all against short, distinct words, or against trailing wildcards (i.e. Z% and not %Z or %Z%). The easiest way to do this for me would be to use intermedia i.e.

select count(*) from fulltext where contains(oe,'test OR Z%', 1) > 0

...but an alternative would be to examine the target values, and contruct individual LIKE clauses for each one - which would be a pain:

select count(*) from fulltext where oe like 'test' or oe like 'Z%'

(in this example I've got two terms to compare against - 'test' and Z%' - but I might have just 1 or as many as 10).

Because this comparison will be used by FGAC (and therefore applied to just about every single query), I'm concerned about the performance. For example, a simple test in SQL*Plus shows the following in TKPROF:-

TKPROF: Release 9.2.0.1.0 - Production on Wed Dec 17 10:51:30 2003
********************************************************************************

select count(*) from fulltext where contains(oe,'test OR Z%', 1) > 0


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 2 0.09 0.27 0 30 0 0
Execute 2 0.03 0.03 0 0 0 0
Fetch 4 0.04 0.11 0 26 0 2
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 8 0.16 0.42 0 56 0 2

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 62

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE
499 DOMAIN INDEX FTI_OE

********************************************************************************

select count(*) from fulltext where oe like 'test' or oe like 'Z%'


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.04 0.06 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.01 0.02 0 17 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.05 0.08 0 17 0 1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 62

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE
499 INDEX FAST FULL SCAN IX_COMBO1 (object id 33071)

********************************************************************************


i.e. the CONTEXT search seems to do more work. What I'd really like to be able to do would be:

select .... where oe in('test','Z%'...)

but that won't work because of the wildcards! Would you go with the LIKE option?

(I'm on Oracle 9.2.0.1.0).

Tom Kyte
December 17, 2003 - 6:58 am UTC

you have only 499 rows?

how big was your test.....

and if you scale it up what does it do...

have you looked at the ctxcat index type instead
</code> http://docs.oracle.com/docs/cd/B10501_01/text.920/a96517/cdefault.htm#1006111 <code>




(and this approach is not going to be "the most performant in the world", you have to expect a measurable impact -- very measurable impact -- with this particular requirement)

why is reverse undocumented/unsupported?

pete, December 17, 2003 - 8:44 am UTC

Tom,

Why is an obviously useful function like REVERSE not supported by Oracle? Am I right in thinking that the only reason it exists is that it is used internally by Oracle? If that is the case then why not include it in the supported features and budle it into PL/SQL?

Thanks,
Pete

Tom Kyte
December 18, 2003 - 8:15 am UTC

don't know - there are lots of little internal functions (bitand used to be one until recently). the way to get them out there is to file an enhancement request via support (thats how bitand got "published" and alter session set current_schema and so on)

How about utl_raw.REVERSE Function

pasko, December 18, 2003 - 8:45 am UTC

Hi Tom,
i just came across another Reverse Function in UTL_RAW Package.

may be this could be used but the problem is that it works only on RAWs.

For Example:
SQL> SELECT object_name , utl_raw.cast_to_varchar2( 
  2       utl_raw.REVERSE( 
  3       utl_raw.cast_to_raw( object_name ))) rev_name 
  4  from all_objects where object_name like 'USER_%' and rownum < 5 ;

OBJECT_NAME                    REV_NAME
------------------------------ ------------------------------
USER$                          $RESU
USER_ALL_TABLES                SELBAT_LLA_RESU
USER_APPLICATION_ROLES         SELOR_NOITACILPPA_RESU
USER_AQ_AGENT_PRIVS            SVIRP_TNEGA_QA_RESU

Also, if the Undocumented REVERSE works fine, is there any harm if we use it ?

Thanks.

 

Tom Kyte
December 18, 2003 - 11:17 am UTC

the harm in using undocumented features/functions is that

a) they can disappear
b) change behaviour
c) break

and there is nothing you can do about it.

CTXCAT vs. CONTEXT vs. LIKE

Andy, December 19, 2003 - 9:13 am UTC

Thanks for your feedback to my comment above, Tom. I scaled up and it didn't take long before LIKE was clearly slower. I looked into CTXCAT, but the nature of the queries in the FGAC policy would make that hard to implement, so I'll go with a CONTAINS clause.

CONTEXT vs. CTXCAT

amitabh, December 30, 2009 - 12:12 am UTC

Hi Tom
I am trying to decide between CONTEXT and CTXCAT indexes for our app


in my current situation ctxcat type index seem to be a good fit because of :

- small text fragments to be searched
- simple seraches without things like thesaurus etc etc
- mixed searches where text seraches have to be combined with other columns

HOWEVER

i do need to be able to execute the text serach on multiple columns. in the case of context type indexes that is doable by creating multi_column_datastore and section groups and then using WITHIN in the CONTAINS
as described by an example at the bottom of my email.


MY Question is - if using ctxcat - is there a way to search on multiple coulumns with a WITHIN type functionality ?

Will appreciate your reply.


Example Showing Multi-Column-Search using CONTEXT Index
========================================================
example that includes a section group, one section per column, and examples of searching individual columns using "within".

-- table and data for demo:

SCOTT@10gXE> CREATE TABLE noticias
2 (id NUMBER,
3 ouid NUMBER,
4 dateinsert DATE,
5 resumen CLOB,
6 inserted VARCHAR2 (10),
7 tema VARCHAR2 (10),
8 search_cols VARCHAR2 (1))
9 /

Table created.

SCOTT@10gXE> INSERT ALL
2 INTO noticias VALUES (1, 10, SYSDATE, 'testing', 'something', 'whatever', null)
3 INTO noticias VALUES (2, 10, SYSDATE, 'something', 'testing', 'whatever', null)
4 INTO noticias VALUES (3, 10, SYSDATE, 'something', 'whatever', 'testing', null)
5 INTO noticias VALUES (4, 10, SYSDATE, 'testing', 'testing', 'testing', null)
6 INTO noticias VALUES (5, 10, SYSDATE, 'whatever', 'nothing', 'something', null)
7 SELECT * FROM DUAL
8 /

5 rows created.


-- create multi_column_datastore and specify columns:

SCOTT@10gXE> BEGIN
2 CTX_DDL.CREATE_PREFERENCE ('noticias_mcds', 'MULTI_COLUMN_DATASTORE');
3 CTX_DDL.SET_ATTRIBUTE ('noticias_mcds', 'COLUMNS', 'resumen, inserted, tema');
4 END;
5 /

PL/SQL procedure successfully completed.


-- create section group and add one section per column:

SCOTT@10gXE> BEGIN
2 CTX_DDL.CREATE_SECTION_GROUP ('noticias_sg', 'BASIC_SECTION_GROUP');
3 CTX_DDL.ADD_FIELD_SECTION ('noticias_sg', 'resumen', 'resumen', TRUE);
4 CTX_DDL.ADD_FIELD_SECTION ('noticias_sg', 'inserted', 'inserted', TRUE);
5 CTX_DDL.ADD_FIELD_SECTION ('noticias_sg', 'tema', 'tema', TRUE);
6 END;
7 /

PL/SQL procedure successfully completed.


-- create index using multicolumn_datastore and section group:

SCOTT@10gXE> CREATE INDEX noticias_index
2 ON noticias (search_cols)
3 INDEXTYPE IS CTXSYS.CONTEXT
4 PARAMETERS
5 ('DATASTORE noticias_mcds
6 SECTION GROUP noticias_sg')
7 /

Index created.


-- variables for searching:

SCOTT@10gXE> VARIABLE v_palabra VARCHAR2 (30)
SCOTT@10gXE> VARIABLE v_ouid NUMBER
SCOTT@10gXE> EXEC :v_palabra := 'testing'

PL/SQL procedure successfully completed.

SCOTT@10gXE> EXEC :v_ouid := 10

PL/SQL procedure successfully completed.


-- search of all columns:

SCOTT@10gXE> COLUMN resumen FORMAT A10 word_wrapped
SCOTT@10gXE> SELECT /*+ FIRST_ROWS(50) */
2 score (1) AS RANK,
3 ID, TO_CHAR (dateinsert, 'dd/mm/yyyy') "DATE",
4 resumen, inserted, tema
5 FROM NOTICIAS
6 WHERE contains (search_cols, :v_palabra, 1) > 0
7 AND ouid = :v_ouid
8 ORDER BY dateinsert DESC
9 /

RANK ID DATE RESUMEN INSERTED TEMA
---------- ---------- ---------- ---------- ---------- ----------
3 1 04/09/2007 testing something whatever
3 2 04/09/2007 something testing whatever
3 3 04/09/2007 something whatever testing
10 4 04/09/2007 testing testing testing


-- searches of individual columns using "within":

SCOTT@10gXE> SELECT score (1) AS RANK,
2 ID, TO_CHAR (dateinsert, 'dd/mm/yyyy') "DATE",
3 resumen, inserted, tema
4 FROM NOTICIAS
5 WHERE contains (search_cols, :v_palabra || ' WITHIN resumen', 1) > 0
6 /

RANK ID DATE RESUMEN INSERTED TEMA
---------- ---------- ---------- ---------- ---------- ----------
4 1 04/09/2007 testing something whatever
4 4 04/09/2007 testing testing testing

SCOTT@10gXE> SELECT score (1) AS RANK,
2 ID, TO_CHAR (dateinsert, 'dd/mm/yyyy') "DATE",
3 resumen, inserted, tema
4 FROM NOTICIAS
5 WHERE contains (search_cols, :v_palabra || ' WITHIN inserted', 1) > 0
6 /

RANK ID DATE RESUMEN INSERTED TEMA
---------- ---------- ---------- ---------- ---------- ----------
4 2 04/09/2007 something testing whatever
4 4 04/09/2007 testing testing testing

SCOTT@10gXE> SELECT score (1) AS RANK,
2 ID, TO_CHAR (dateinsert, 'dd/mm/yyyy') "DATE",
3 resumen, inserted, tema
4 FROM NOTICIAS
5 WHERE contains (search_cols, :v_palabra || ' WITHIN tema', 1) > 0
6 /

RANK ID DATE RESUMEN INSERTED TEMA
---------- ---------- ---------- ---------- ---------- ----------
4 3 04/09/2007 something whatever testing
4 4 04/09/2007 testing testing testing

===========================================================

Tom Kyte
January 04, 2010 - 9:40 am UTC

non-starter here.

ctxcat does not do clob types.

More to Explore

PL/SQL demos

Check out more PL/SQL tutorials on our LiveSQL tool.

PL/SQL docs

PL/SQL reference manual from the Oracle documentation library