Skip to Main Content

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, Antonio.

Asked: September 21, 2011 - 1:46 pm UTC

Last updated: June 26, 2018 - 6:25 am UTC

Version: 11.1.0

Viewed 10K+ times! This question is

You Asked

Hi Tom,

First off all I am newbie to this Oracle Text thing so please bear with me.

I am currently evaluating the possibility and interest in changing the use of context indexes to ctxcat indexes in a big database that I use (the text columns are generally less than 300 bytes...).

I have the following doubts:

1) When using CTXCAT indexes the structured clauses should go on the catsearch clause according to the docs. Using them out of the catsearch instruction implies a relevant penalty in terms of performance? Placing simple indexes in the other fields used in the structured part will not lead to a similar performance? This is important because if there is significant difference I will need to rewrite a lot of (static and dynamic) queries to accommodate the catsearch syntax.

2) To make an exact match query using the contains clause I need to use the following trick so that the query may use an index (the text index):

select textcolumn from t
where contains(textcolumn, 'word')>0
and textcolumn='word'

Is there any other way to get this done using contains? Or do I need to create also a simple index on the textcolumn and use only the equals clause so that it runs faster/have a better execution plan (this way I will need to maintain two indexes over the same column to satisfy queries which need contains and queries with exact match)?

Also, when experimenting with catsearch I'm getting an execution plan with a cost 25x bigger if I use the equals clause comparing to not using it. I am not comfortable either with understanding an execution plan in Oracle but is this difference expected to happen? In this case I only have a ctxcat index over the textcolumn.

Thanks in advance for your help and comments!

Regards,
Antonio

and Tom said...

1) using them outside of catsearch might

run faster
run slower
run the same


It depends.

Typically, in general, however - you would use them in the catsearch so that the relational (structured) and text (unstructured) searches can run "at the same time" - we can evaluate both filters on the same pass instead of filtering text and then filtering relational.


As always - you should benchmark and evaluate this, it would not take much effort to test this out and see what you could expect.




2) if I was looking for an exact match - a conventional b*tree index on textcolumn would be the most efficient.

but if you only want a single index - your approach above is sound - using equals after the contains.


I would not worry about the cost estimate too much, I would evaluate the actual runtime performance. You cannot really compare the cost of two different queries - there are too many factors involved. A query with a cost of 100 might out perform a query with a cost of 5. The cost is used to select the 'best' query plan from a set of plans generated for a single SQL statement. We don't really tend to look at them across different queries so much.

Rating

  (1 rating)

Is this answer out of date? If it is, please let us know via a Comment

Comments

CTXCAT index and plans

AndrE, June 25, 2018 - 11:36 am UTC

Hi, Tom

I have table
create table CVC
(
config_field_id NUMBER not null,
key_value NUMBER not null,
value_char VARCHAR2(2000),
program_id NUMBER not null,
ts DATE,
user_id NUMBER
)

and add index to it
begin
ctx_ddl.create_index_set('CVC_iset');
ctx_ddl.add_index('CVC_iset','config_field_id, key_value, program_id'); /* sub-index */
end;

CREATE INDEX TEXTi_CVC on CVC(value_char) INDEXTYPE IS ctxsys.ctxcat
PARAMETERS ('index set CVC_iset');

and run queries

select count(*)
from CVC
where catsearch(value_char, 'Lucas | pound', 'config_field_id = 100169430 and key_value = 1001152962 and program_id = 1') > 0

and have plan
select
sort aggregate
domain index

but
select count(value_char) -- or config_field_id or key_value or program_id
from CVC
where catsearch(value_char, 'Lucas | pound', 'config_field_id = 100169430 and key_value = 1001152962 and program_id = 1') > 0

and
select
sort aggregate
table access by index rowid
domain index

why? Column value_char (or any from config_field_id or key_value or program_id or all together) is in index and table access should not be here.

Thanks



Connor McDonald
June 26, 2018 - 6:25 am UTC

The challenge here is the "null-ness" of value_char. If the definition of the *column* is nullable, then in order to skip the table lookup, we'd need to parse the entire content of the catsearch parameters to prove that value_char (or the expressions being used to search it) guarantee not null. Whilst that is certainly doable, it is not something that I think is currently implemented.

Making the column non-null is an obvious solution that (might) be applicable here.

SQL> create table CVC
  2  (
  3  config_field_id NUMBER not null,
  4  key_value NUMBER not null,
  5  value_char VARCHAR2(2000),
  6  program_id NUMBER not null,
  7  ts DATE,
  8  user_id NUMBER
  9  ) ;

Table created.

SQL>
SQL> insert into cvc
  2  select 100000000+rownum, 100000000+rownum, 'vc', mod(rownum,3), sysdate, rownum
  3  from dual
  4  connect by level <= 20000;

20000 rows created.

SQL> commit;

Commit complete.

SQL>
SQL>
SQL> begin
  2  ctx_ddl.create_index_set('CVC_iset');
  3  ctx_ddl.add_index('CVC_iset','config_field_id, key_value, program_id'); /* sub-index */
  4  end;
  5  /

PL/SQL procedure successfully completed.

SQL>
SQL> CREATE INDEX TEXTi_CVC on CVC(value_char) INDEXTYPE IS ctxsys.ctxcat
  2  PARAMETERS ('index set CVC_iset');

Index created.

SQL>
SQL> exec dbms_stats.gather_table_stats('','CVC');

PL/SQL procedure successfully completed.

SQL>
SQL>
SQL> set autotrace traceonly explain
SQL> select count(*)
  2  from CVC
  3  where catsearch(value_char, 'vc | pound', 'config_field_id = 100000409 and key_value = 100000409 and program_id = 1') > 0 ;

Execution Plan
----------------------------------------------------------
Plan hash value: 1442005353

------------------------------------------------------------------------------
| Id  | Operation        | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |           |     1 |    15 |     2   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE  |           |     1 |    15 |            |          |
|*  2 |   DOMAIN INDEX   | TEXTI_CVC |  1000 | 15000 |            |          |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CTXSYS"."CATSEARCH"("VALUE_CHAR",'vc |
              pound','config_field_id = 100000409 and key_value = 100000409 and
              program_id = 1')>0)

SQL>
SQL> select count(value_char) -- or config_field_id or key_value or program_id
  2  from CVC
  3  where catsearch(value_char, 'vc | pound', 'config_field_id = 100000409 and key_value = 100000409 and program_id = 1') > 0 ;

Execution Plan
----------------------------------------------------------
Plan hash value: 2776781297

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |     1 |    15 |    30   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE              |           |     1 |    15 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| CVC       |  1000 | 15000 |    30   (0)| 00:00:01 |
|*  3 |    DOMAIN INDEX              | TEXTI_CVC |       |       |            |          |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("CTXSYS"."CATSEARCH"("VALUE_CHAR",'vc | pound','config_field_id =
              100000409 and key_value = 100000409 and program_id = 1')>0)

SQL>
SQL> select count(value_char) -- or config_field_id or key_value or program_id
  2  from CVC
  3  where catsearch(value_char, 'vc', 'config_field_id = 100000409 and key_value = 100000409 and program_id = 1') > 0 ;

Execution Plan
----------------------------------------------------------
Plan hash value: 2776781297

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |     1 |    15 |    30   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE              |           |     1 |    15 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| CVC       |  1000 | 15000 |    30   (0)| 00:00:01 |
|*  3 |    DOMAIN INDEX              | TEXTI_CVC |       |       |            |          |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("CTXSYS"."CATSEARCH"("VALUE_CHAR",'vc','config_field_id = 100000409
              and key_value = 100000409 and program_id = 1')>0)

SQL>
SQL> alter table cvc modify value_char not null;

Table altered.

SQL>
SQL> select count(*)
  2  from CVC
  3  where catsearch(value_char, 'vc | pound', 'config_field_id = 100000409 and key_value = 100000409 and program_id = 1') > 0 ;

Execution Plan
----------------------------------------------------------
Plan hash value: 1442005353

------------------------------------------------------------------------------
| Id  | Operation        | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |           |     1 |    15 |     2   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE  |           |     1 |    15 |            |          |
|*  2 |   DOMAIN INDEX   | TEXTI_CVC |  1000 | 15000 |            |          |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CTXSYS"."CATSEARCH"("VALUE_CHAR",'vc |
              pound','config_field_id = 100000409 and key_value = 100000409 and
              program_id = 1')>0)

SQL>
SQL> select count(value_char) -- or config_field_id or key_value or program_id
  2  from CVC
  3  where catsearch(value_char, 'vc | pound', 'config_field_id = 100000409 and key_value = 100000409 and program_id = 1') > 0 ;

Execution Plan
----------------------------------------------------------
Plan hash value: 1442005353

------------------------------------------------------------------------------
| Id  | Operation        | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |           |     1 |    15 |     2   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE  |           |     1 |    15 |            |          |
|*  2 |   DOMAIN INDEX   | TEXTI_CVC |  1000 | 15000 |            |          |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CTXSYS"."CATSEARCH"("VALUE_CHAR",'vc |
              pound','config_field_id = 100000409 and key_value = 100000409 and
              program_id = 1')>0)

SQL>
SQL>