Brilliant
Jim, October 14, 2002 - 6:57 pm UTC
Right... that one is going straight into the
"toolkit"
Gisel, December 15, 2002 - 9:03 am UTC
Your approach solves problems like '%ABC' and 'ABC%' but if you'd want to solve '%ABC%' using only one index how would you do?
Regards
Gisel
December 15, 2002 - 10:40 am UTC
You could use the index as a "skinny table" is all. Something to generate rowids from to access the base table.
Think about it -- every single row in the entire table must be inspected in order to do %abc%. An index cannot be used to inspect just "some" of the rows in this case obviously.
big_table@ORA920.US.ORACLE.COM> select object_name from big_table where object_name like '%ABC%';
1088 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=978 Card=93821 Bytes=1594957)
1 0 INDEX (FAST FULL SCAN) OF 'BIG_TABLE_IDX1' (NON-UNIQUE) (Cost=978 Card=93821 Bytes=1594957)
N-grams
Anders, December 15, 2002 - 2:20 pm UTC
If you need to be able to search for '%ABC%', you could have a look at n-grams. If you have the word "oracle" with the id 34, you would in an index-table insert (34, "ora"), (34, "rac"), (34, "acl") and (34, "cle"). For a search on "acl", you can do a direct lookup on the index-table and get the ids of the rows containing the substring.
Can a function based index be used here? (i.e. can the function "return more than one row"?)
This might be a good time to roll out intermedia.
What about Oracle text????
Dan, December 15, 2002 - 11:23 pm UTC
I would use a context index on this problem.
excellent
A reader, January 16, 2003 - 7:43 pm UTC
Intermedia performance not so great
Xavier, August 05, 2003 - 12:14 am UTC
I wish Intermedia indexing worked better. I have a table with about 600,000 rows, having both conventional and Intermedia indexing on some important text. I find a query this:
foo like '%str%' or foo like '%bar%' actually runs quite a bit faster than a CONTAINS query on str or bar. The % query runs in <10 secs, the CONTAINS query can take over a minute. This is quite absurd and I've been looking for something I'm doing wrong. A contains query on a phrase works very fast, e.g. CONTAINS with 'foo bar' runs like greased lightning. It's the OR conditions that kill it.
August 05, 2003 - 7:16 am UTC
give me a concrete example to work with. you aren't calling contains over and over are you - you are using contains correctly right?
ask_tom@ASKUS> select count(*) from wwc_ask_indexed_questions$ where contains( text, 'hello' ) > 0;
COUNT(*)
----------
709
Elapsed: 00:00:00.38
ask_tom@ASKUS> select count(*) from wwc_ask_indexed_questions$ where contains( text, 'world' ) > 0
2 /
COUNT(*)
----------
918
Elapsed: 00:00:00.50
ask_tom@ASKUS> select count(*) from wwc_ask_indexed_questions$ where contains( text, 'hello OR world' ) > 0
2 /
COUNT(*)
----------
1436
Elapsed: 00:00:00.74
LIKE vs. Intermedia
Andy, December 17, 2003 - 5:25 am UTC
I need to compare a column to a set of values, and include this comparison in a FGAC policy. The "set of values" are read from another table at runtime. The comparisons are all against short, distinct words, or against trailing wildcards (i.e. Z% and not %Z or %Z%). The easiest way to do this for me would be to use intermedia i.e.
select count(*) from fulltext where contains(oe,'test OR Z%', 1) > 0
...but an alternative would be to examine the target values, and contruct individual LIKE clauses for each one - which would be a pain:
select count(*) from fulltext where oe like 'test' or oe like 'Z%'
(in this example I've got two terms to compare against - 'test' and Z%' - but I might have just 1 or as many as 10).
Because this comparison will be used by FGAC (and therefore applied to just about every single query), I'm concerned about the performance. For example, a simple test in SQL*Plus shows the following in TKPROF:-
TKPROF: Release 9.2.0.1.0 - Production on Wed Dec 17 10:51:30 2003
********************************************************************************
select count(*) from fulltext where contains(oe,'test OR Z%', 1) > 0
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 2 0.09 0.27 0 30 0 0
Execute 2 0.03 0.03 0 0 0 0
Fetch 4 0.04 0.11 0 26 0 2
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 8 0.16 0.42 0 56 0 2
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 62
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE
499 DOMAIN INDEX FTI_OE
********************************************************************************
select count(*) from fulltext where oe like 'test' or oe like 'Z%'
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.04 0.06 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.01 0.02 0 17 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.05 0.08 0 17 0 1
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 62
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE
499 INDEX FAST FULL SCAN IX_COMBO1 (object id 33071)
********************************************************************************
i.e. the CONTEXT search seems to do more work. What I'd really like to be able to do would be:
select .... where oe in('test','Z%'...)
but that won't work because of the wildcards! Would you go with the LIKE option?
(I'm on Oracle 9.2.0.1.0).
December 17, 2003 - 6:58 am UTC
you have only 499 rows?
how big was your test.....
and if you scale it up what does it do...
have you looked at the ctxcat index type instead
</code>
http://docs.oracle.com/docs/cd/B10501_01/text.920/a96517/cdefault.htm#1006111 <code>
(and this approach is not going to be "the most performant in the world", you have to expect a measurable impact -- very measurable impact -- with this particular requirement)
why is reverse undocumented/unsupported?
pete, December 17, 2003 - 8:44 am UTC
Tom,
Why is an obviously useful function like REVERSE not supported by Oracle? Am I right in thinking that the only reason it exists is that it is used internally by Oracle? If that is the case then why not include it in the supported features and budle it into PL/SQL?
Thanks,
Pete
December 18, 2003 - 8:15 am UTC
don't know - there are lots of little internal functions (bitand used to be one until recently). the way to get them out there is to file an enhancement request via support (thats how bitand got "published" and alter session set current_schema and so on)
How about utl_raw.REVERSE Function
pasko, December 18, 2003 - 8:45 am UTC
Hi Tom,
i just came across another Reverse Function in UTL_RAW Package.
may be this could be used but the problem is that it works only on RAWs.
For Example:
SQL> SELECT object_name , utl_raw.cast_to_varchar2(
2 utl_raw.REVERSE(
3 utl_raw.cast_to_raw( object_name ))) rev_name
4 from all_objects where object_name like 'USER_%' and rownum < 5 ;
OBJECT_NAME REV_NAME
------------------------------ ------------------------------
USER$ $RESU
USER_ALL_TABLES SELBAT_LLA_RESU
USER_APPLICATION_ROLES SELOR_NOITACILPPA_RESU
USER_AQ_AGENT_PRIVS SVIRP_TNEGA_QA_RESU
Also, if the Undocumented REVERSE works fine, is there any harm if we use it ?
Thanks.
December 18, 2003 - 11:17 am UTC
the harm in using undocumented features/functions is that
a) they can disappear
b) change behaviour
c) break
and there is nothing you can do about it.
CTXCAT vs. CONTEXT vs. LIKE
Andy, December 19, 2003 - 9:13 am UTC
Thanks for your feedback to my comment above, Tom. I scaled up and it didn't take long before LIKE was clearly slower. I looked into CTXCAT, but the nature of the queries in the FGAC policy would make that hard to implement, so I'll go with a CONTAINS clause.
CONTEXT vs. CTXCAT
amitabh, December 30, 2009 - 12:12 am UTC
Hi Tom
I am trying to decide between CONTEXT and CTXCAT indexes for our app
in my current situation ctxcat type index seem to be a good fit because of :
- small text fragments to be searched
- simple seraches without things like thesaurus etc etc
- mixed searches where text seraches have to be combined with other columns
HOWEVER
i do need to be able to execute the text serach on multiple columns. in the case of context type indexes that is doable by creating multi_column_datastore and section groups and then using WITHIN in the CONTAINS
as described by an example at the bottom of my email.
MY Question is - if using ctxcat - is there a way to search on multiple coulumns with a WITHIN type functionality ?
Will appreciate your reply.
Example Showing Multi-Column-Search using CONTEXT Index
========================================================
example that includes a section group, one section per column, and examples of searching individual columns using "within".
-- table and data for demo:
SCOTT@10gXE> CREATE TABLE noticias
2 (id NUMBER,
3 ouid NUMBER,
4 dateinsert DATE,
5 resumen CLOB,
6 inserted VARCHAR2 (10),
7 tema VARCHAR2 (10),
8 search_cols VARCHAR2 (1))
9 /
Table created.
SCOTT@10gXE> INSERT ALL
2 INTO noticias VALUES (1, 10, SYSDATE, 'testing', 'something', 'whatever', null)
3 INTO noticias VALUES (2, 10, SYSDATE, 'something', 'testing', 'whatever', null)
4 INTO noticias VALUES (3, 10, SYSDATE, 'something', 'whatever', 'testing', null)
5 INTO noticias VALUES (4, 10, SYSDATE, 'testing', 'testing', 'testing', null)
6 INTO noticias VALUES (5, 10, SYSDATE, 'whatever', 'nothing', 'something', null)
7 SELECT * FROM DUAL
8 /
5 rows created.
-- create multi_column_datastore and specify columns:
SCOTT@10gXE> BEGIN
2 CTX_DDL.CREATE_PREFERENCE ('noticias_mcds', 'MULTI_COLUMN_DATASTORE');
3 CTX_DDL.SET_ATTRIBUTE ('noticias_mcds', 'COLUMNS', 'resumen, inserted, tema');
4 END;
5 /
PL/SQL procedure successfully completed.
-- create section group and add one section per column:
SCOTT@10gXE> BEGIN
2 CTX_DDL.CREATE_SECTION_GROUP ('noticias_sg', 'BASIC_SECTION_GROUP');
3 CTX_DDL.ADD_FIELD_SECTION ('noticias_sg', 'resumen', 'resumen', TRUE);
4 CTX_DDL.ADD_FIELD_SECTION ('noticias_sg', 'inserted', 'inserted', TRUE);
5 CTX_DDL.ADD_FIELD_SECTION ('noticias_sg', 'tema', 'tema', TRUE);
6 END;
7 /
PL/SQL procedure successfully completed.
-- create index using multicolumn_datastore and section group:
SCOTT@10gXE> CREATE INDEX noticias_index
2 ON noticias (search_cols)
3 INDEXTYPE IS CTXSYS.CONTEXT
4 PARAMETERS
5 ('DATASTORE noticias_mcds
6 SECTION GROUP noticias_sg')
7 /
Index created.
-- variables for searching:
SCOTT@10gXE> VARIABLE v_palabra VARCHAR2 (30)
SCOTT@10gXE> VARIABLE v_ouid NUMBER
SCOTT@10gXE> EXEC :v_palabra := 'testing'
PL/SQL procedure successfully completed.
SCOTT@10gXE> EXEC :v_ouid := 10
PL/SQL procedure successfully completed.
-- search of all columns:
SCOTT@10gXE> COLUMN resumen FORMAT A10 word_wrapped
SCOTT@10gXE> SELECT /*+ FIRST_ROWS(50) */
2 score (1) AS RANK,
3 ID, TO_CHAR (dateinsert, 'dd/mm/yyyy') "DATE",
4 resumen, inserted, tema
5 FROM NOTICIAS
6 WHERE contains (search_cols, :v_palabra, 1) > 0
7 AND ouid = :v_ouid
8 ORDER BY dateinsert DESC
9 /
RANK ID DATE RESUMEN INSERTED TEMA
---------- ---------- ---------- ---------- ---------- ----------
3 1 04/09/2007 testing something whatever
3 2 04/09/2007 something testing whatever
3 3 04/09/2007 something whatever testing
10 4 04/09/2007 testing testing testing
-- searches of individual columns using "within":
SCOTT@10gXE> SELECT score (1) AS RANK,
2 ID, TO_CHAR (dateinsert, 'dd/mm/yyyy') "DATE",
3 resumen, inserted, tema
4 FROM NOTICIAS
5 WHERE contains (search_cols, :v_palabra || ' WITHIN resumen', 1) > 0
6 /
RANK ID DATE RESUMEN INSERTED TEMA
---------- ---------- ---------- ---------- ---------- ----------
4 1 04/09/2007 testing something whatever
4 4 04/09/2007 testing testing testing
SCOTT@10gXE> SELECT score (1) AS RANK,
2 ID, TO_CHAR (dateinsert, 'dd/mm/yyyy') "DATE",
3 resumen, inserted, tema
4 FROM NOTICIAS
5 WHERE contains (search_cols, :v_palabra || ' WITHIN inserted', 1) > 0
6 /
RANK ID DATE RESUMEN INSERTED TEMA
---------- ---------- ---------- ---------- ---------- ----------
4 2 04/09/2007 something testing whatever
4 4 04/09/2007 testing testing testing
SCOTT@10gXE> SELECT score (1) AS RANK,
2 ID, TO_CHAR (dateinsert, 'dd/mm/yyyy') "DATE",
3 resumen, inserted, tema
4 FROM NOTICIAS
5 WHERE contains (search_cols, :v_palabra || ' WITHIN tema', 1) > 0
6 /
RANK ID DATE RESUMEN INSERTED TEMA
---------- ---------- ---------- ---------- ---------- ----------
4 3 04/09/2007 something whatever testing
4 4 04/09/2007 testing testing testing
===========================================================
January 04, 2010 - 9:40 am UTC
non-starter here.
ctxcat does not do clob types.