Skip to Main Content

Breadcrumb

May 4th

Question and Answer

Tom Kyte

Thanks for the question, sanjay.

Asked: June 24, 2000 - 12:28 pm UTC

Last updated: October 17, 2011 - 6:37 pm UTC

Version: 8.1.6

Viewed 10K+ times! This question is

You Asked

Tom,
Can we use the Context Option against any table containing
text ? We would like to store text, received as part of
a user request (it will be more than 4000 chars, is a CLOB
a better way of storage?) and then allow keyword searching,
with keyword higlighting.
In order to use Context, is it enough to build Context
Indexes on these columns and then write a front-end
which issues Context Queries.
For. e.g We pose questions to you and later we can
search using keywords. The search returns a result set
with ranking and keyword higlighting.


and Tom said...

The answer is simply "yes".

Here are some more details. I use context myself on a very similar appliation "Ask Tom". People contribute questions -- I follow up and answer them and if I feel they have broad impact, I "publish" them. In order to publish them, I format the question and answer into 1 field with some HTML markup and such and put it into a single CLOB column. The table I put this into looks like this:

create table submitted_indexed_questions
( dispid number primary key,
text clob
);

Of course, your table may have many other attributes, I leave many attributes in another table. That other table has all of the questions and answers. The above table only has questions available for everyone to look at.

I have an index on this other table:


create index search_idx
on submitted_indexed_questions(text)
indextype is ctxsys.context
/

That allows me to do text based searches against this table. I have a search routine (you can interact with it at
</code> http://asktom.oracle.com/wa/ask/owa/ask_tom_pkg.search <code>
).

Basically, I can now run queries like:


ask_tom@OSI1.WORLD> l
1 select count(*)
2 from submitted_indexed_questions
3* where contains( text, 'index', 1 ) > 0
ask_tom@OSI1.WORLD> /

COUNT(*)
----------
130

ask_tom@OSI1.WORLD> select count(*) from submitted_indexed_questions;

COUNT(*)
----------
573

that shows me 130 out of 573 questions have the word index in them... I can get more complex:

ask_tom@OSI1.WORLD> select count(*)
2 from submitted_indexed_questions
3 where contains( text, 'index and desc', 1 ) > 0;

COUNT(*)
----------
4

ask_tom@OSI1.WORLD>

Now I see only 4 of them have both index and desc in them...

Now, if I want to rank them and get the "top 5" I can do something like:


1 select *
2 from ( select dispid, score(1),
dbms_lob.substr(text,25,1) text
3 from submitted_indexed_questions
4 where contains( text, 'index', 1 ) > 0
5 order by 2 DESC )
6* where rownum < 6
ask_tom@OSI1.WORLD> /

DISPID SCORE(1) TEXT
--------------- --------------- -------------------------
239614547000 100 Igor -- Thanks for the qu
14812348049 100 Ratnaprabha
-- Thanks f

9212348049 100 Robert
-- Thanks for th

153212348067 72 Paul
-- Thanks for the

157412348067 67 Lalita
-- Thanks for t



Now, once I've searched, I can display a document marked up (do a search at the url above and you'll see your hits with the search terms marked up for you -- intermedia text does this as well). It would look like this:


CTX_DOC.MARKUP( index_name => 'search_idx',
textkey => p_dispid,
text_query => p_search_str,
restab => 'markup',
query_id => userenv('sessionid'),
plaintext => true,
starttag => '<font color="red">',
endtag => '
</font>' );


Here we simply call the MARKUP procedure and specify the name of our index (so it knows what table we are accessing, what the text column in that table is and what the primary key of that table is). We pass in the primary key to identify the instance we wish to mark up. We give it our search string, what we used to find this document. We tell it what table to place the marked up document in ("markup" in this case). This table looks like:

ask_tom@OSI1.WORLD> desc markup
Name Null? Type
----------------------------- -------- --------------------
QUERY_ID NOT NULL NUMBER
DOCUMENT CLOB

We pass it a unique key that we can use to identify our markedup document (I find userenv('sessionid') great for this purpose). We tell it the tags to use to markup our text, I use bolded red as my markup.






Rating

  (85 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Excellent

Kinnari, May 12, 2001 - 3:12 am UTC

Excellent example for the beginners !!!!!!

Very good explanation !

A reader, July 21, 2001 - 10:03 am UTC


A reader, September 27, 2001 - 11:30 am UTC

Hi Tom,
I am searching a table which a millian records in it ..I have to display all the rows which has that particular text in any of the columns..
say i have
c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 columns in test table
and i want to list the rows which has zzzzzz in any column.
What is the best solution....
pl help

Tom Kyte
September 27, 2001 - 2:24 pm UTC

Consider indexing a function with interMedia text that concatenates all of the fields together

</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:440419921146 <code>

or if the data is relatively static (not updated frequently) and the values in c1, c2, c3, ... repeat alot (the cardinality of each column is low) a bitmap index on EACH OF c1, c2, c3 (eg: N indexes -- not an index with N columns) might work well.

link doesn't work

rajiv, September 13, 2002 - 5:38 pm UTC

</code> http://asktom.oracle.com/wa/ask/owa/ask_tom_pkg.search <code>

are the words intermedia search option and context are same or different?

is there anything new in 9i with context option.

where do i find more info on that.

does your book talk about that

Tom Kyte
September 14, 2002 - 3:06 pm UTC

sorry -- just click on the search link.

context = intermedia = text


Yes there is new stuff all of the time. On otn.oracle.com, you'll find links to the documentation. There is a new features guide and each document also lists "whats new in this document"

Rebuild after new records inserted?

A Reader, January 30, 2003 - 9:37 am UTC

Hi Tom,
Is it true that the rows inserted into a table after the context index has been built will not appear in the search results and after we insert new records we need to rebuild the index to include the new records? My search results do not include the new records.
Thank you very much.

Tom Kyte
January 30, 2003 - 11:11 am UTC

It depends. See
</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:238814521769 <code>

for a discussion of the ctxsrv process. A procedure I use on my internal site -- instead of running the ctxsrv process is:

create or replace procedure sync_all_ctx_indexes
as
l_err long;
begin
for x in ( select u.name idx_owner,
idx_name idx_name
from sys.user$ u, dr$index
where idx_owner# = u.user#
and u.name <> 'CTXSYS' )
loop
begin
ctx_ddl.sync_index( x.idx_owner || '.' || x.idx_name );
exception
when others then
l_err := l_err || sqlerrm;
end;
end loop;

if ( l_err is not null )
then
raise_application_error( -20001, 'Errors ' || l_err );
end if;
end;
/


then we have dbms_job run that every couple of minutes.

markup over multiple rows

Scott, February 21, 2003 - 2:47 am UTC

G'day

Is the ctx_doc.markup applicable/useful/efficient when selecting multiple rows.
After looking at the definition, and having a small play, it seems that it is only useful AFTER you've found your record to get the primary key.

For example, when you Google a few words, the result list marks-up the words searched for.
As ctx_doc.markup is a procedure, this doesn't seem possible, unless you build a function that called for each row returned would go away, markup the text, find and remove the record from restab and return the marked-up text.

Or is ctx_doc.markup intended for marking larger documents as requested?

And what's the latest URL for
</code> http://asktom.oracle.com/wa/ask/owa/ask_tom_pkg.search <code>
as this seems to be out-of-date (doesn't work)...
You said 'just click the search link', but I didn't know what you meant.

Thanks.

Tom Kyte
February 21, 2003 - 9:34 am UTC

you use it to markup A document for viewing.

What you would do to do a google like thing would be to format the hitlist, insert it into a table and mark it up - probably not the most efficient thing in the world...

just goto asktom.oracle.com -- the search is on the home page now.

Link is dead

Phil, March 24, 2003 - 2:45 pm UTC

Hello Tom,
I'm really interested in this and would be keen to see
the way you've implemented the search package, however the
link (</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:440419921146 <code> is dead.
Please can you re-post it?
Thanks!

Tom Kyte
March 24, 2003 - 3:09 pm UTC

works for me.

Index not supported for RAW PKs ? - 9iR2

pasko, March 25, 2003 - 3:37 am UTC

Hi Tom,

thank for a Great answer above ..

i wanted to test CTX index on some XML data which is stored in the database , but i keep hitting the error shown below.

My table has a parimary key column of RAW(16) Type and i am always using sys_guid() to auto-fill this column.


ORA-29855: error occurred in the execution of ODCIINDEXCREATE routine
ORA-20000: Oracle Text error:
DRG-10528: primary keys of type RAW are not allowed
ORA-06512: at "CTXSYS.DRUE", line 157
ORA-06512: at "CTXSYS.TEXTINDEXMETHODS", line 186

Are RAW types not supported in this case as the error says ?

Thanks .




Tom Kyte
March 25, 2003 - 8:12 am UTC

It is not supported until the next release of Oracle after 9iR2 when this implementation restriction is removed.

too good

john, May 07, 2003 - 4:40 pm UTC

thanks for the simple examples given.

Tom, can you please give examples how do we do the same in Oracle 9i as well.
Like how do we create any index and use the contains function on the text field.

thanks

Tom Kyte
May 08, 2003 - 9:28 am UTC

cut and paste -- there is no difference.

Tables underlying Oracle Text

Gurunandan, May 08, 2003 - 11:05 pm UTC

Thanks for the clear explanation. How do I go about getting at the tables underlying Oracle Text? For example, I want to build a thesaurus tree with SYN, NT and BT I use the "contains(SYN/NT/BT)" construct. but I would much rather generate the tree using selects on some ctxsys tables. Is there a way to do this?

Tom Kyte
May 09, 2003 - 1:12 pm UTC

not recommended at all. use the documented views -- sure, access the base tables at your own risk and be prepared to rewrite your code that does this with every release.


perhaps what you need is in ctx_user_thes* views

Create the fnd_lobs_ctx intermedia index

Sean, October 25, 2003 - 1:26 pm UTC

I use the script as shown following to create the index and got the error, could you point out what the probelms?
Many thanks.
-----------------------------------------------
whenever sqlerror exit failure rollback;
connect &1/&2

declare
stmt varchar2(1000);
mycount number;
cursor oldIndex is select owner from all_indexes
where index_name = 'FND_LOBS_CTX';
begin
for dropit in oldIndex loop
execute immediate 'drop index '||dropit.owner||'.fnd_lobs_ctx force';
end loop;
delete from fnd_lobs where file_id = -1;
select count(*) into mycount from fnd_lobs;
if (mycount = 0) then
insert into fnd_lobs (file_id,file_content_type,oracle_charset,file_format)
values (-1,'text/html','UTF8','TEXT');
end if;
commit;
stmt := ' create index fnd_lobs_ctx on fnd_lobs(file_data) '||
' indextype is ctxsys.context '||
' parameters(''lexer fnd_lobs_global_lexer '||
' language column language '||
' filter fnd_lobs_charset_filter '||
' format column file_format '||
' charset column oracle_charset '||
' storage fnd_lobs_basic_storage '||
' wordlist fnd_lobs_basic_wordlist'') ';
execute immediate stmt;
exception
when others then
execute immediate 'drop index fnd_lobs_ctx force';
raise;
end;
/
commit;
exit;
------------------------------------------
declare
*
ERROR at line 1:
ORA-29855: error occurred in the execution of ODCIINDEXCREATE routine
ORA-20000: interMedia Text error:
ORA-06508: PL/SQL: could not find program unit being called
ORA-06512: at line 30


Tom Kyte
October 25, 2003 - 1:54 pm UTC

try it just as a script -- get a handle on what statement is causing it. given that I don't have an entire test case (eg: whats fnd_lobs_global_lexer, etc etc etc)....

Also, just try creating a simple text index -- is interMedia installed correctly on your system. are any of the packages invalid and not compileable?

(and oh, if this is an 11i apps upgrade or something -- please, contact support, that would be the best way to handle that.)

Why I need to rebuild index

Yong, October 28, 2003 - 4:45 pm UTC

Tom,

create table notes(note_id number primary key,title varchar2(255),text clob)

create index search_body_idx
on notes(text)
indextype is ctxsys.context;

create index search_title_idx
on notes(title)
indextype is ctxsys.context;

insert into notes values(note_seq.nextval,'this is oracle title','asdf asd some sdf');

commit;

select * from notes
where contains( text, 'some', 1) > 0
or contains(title,'oracle') > 0

return no rows.

After I rebuild the indexes, it works. I try this twice.
Is there setup need to make index happend automatically?




Yong, October 29, 2003 - 1:17 pm UTC

Tom,

Thanks, ctxsvr works.
I have another question

whey the following query doesn't work

  1  select * from notes
  2  where  contains(title,'database',1) > 0
  3* or contains(text,'dadtabase',1)>0
SQL> /
or contains(text,'dadtabase',1)>0
   *
ERROR at line 3:
ORA-29907: found duplicate labels in primary invocations

the following query works

select * from notes
where  contains(title,'database',1) > 0
union all
select * from notes
where contains(text,'dadtabase',1)>0

I wonder whether there is a better way than the union all
 

Tom Kyte
October 29, 2003 - 2:27 pm UTC

change the 1 to a 2 in the second one. you "labeled" the contains clause with the same label

Yong, October 29, 2003 - 5:52 pm UTC

Tom,

I got an error when I query and display html.

for c in (select * from notes where parent_id=0 order by created_date desc) loop
i:=i+1;
htp.p(''||i||'. '||c.title||'<p>'||c.text);
for d in (select * from notes
start with parent_id=c.note_id
connect by prior note_id=parent_id) loop
htp.p('<hr>'||d.text);
end loop;
end loop;

ORA-06502: PL/SQL: numeric or value error

The problem may be caused by text which is clob type.
Should I retrieve the clob piece by piece and display?
or any better way to handle this?

thanks,

Tom Kyte
October 30, 2003 - 6:49 am UTC

well, you don't give the line numbers or anything - so it is hard to say.

if the clob exceeds 32k, you'll need to print it a piece at a time since htp.p takes a varchar2 and in plsql varchar2's are limited to 32k

Yong, October 30, 2003 - 12:57 pm UTC

Tom,

I fixed using dbms_lob.read as follows

procedure get_content(id number,string varchar2) is
Lob_loc CLOB;
Buffer VARCHAR2(32767);
Amount BINARY_INTEGER := 32767;
Position INTEGER := 1;
i INTEGER;
last number;
BEGIN
SELECT text INTO Lob_loc FROM notes where note_id=id;

DBMS_LOB.OPEN (Lob_loc, DBMS_LOB.LOB_READONLY);
last:=round(dbms_lob.GETLENGTH(Lob_loc)/32767)+1;
FOR i IN 1..last LOOP
DBMS_LOB.READ (Lob_loc, Amount, Position, Buffer);
Position := Position + Amount;
dbms_output.put_line(substr(buffer,1,255));
END LOOP;
DBMS_LOB.CLOSE(Lob_loc);
EXCEPTION WHEN OTHERS THEN
dbms_output.put_line(sqlerrm);
DBMS_LOB.CLOSE(Lob_loc);

Let me know if you see some potential problem.

Thanks,

Tom Kyte
October 30, 2003 - 9:43 pm UTC

other then wondering how we got from htp to dbms_output, i see no "issues" (unless the lob is evenly divisible by 32767 of course ;)

Do I have to replace '_' with a space in the search?

Yong, October 31, 2003 - 2:25 pm UTC

Tom,

Thank you for pointing to the division

round(dbms_lob.GETLENGTH(Lob_loc)/32767)+1;

I changed to ceil(dbms_lob.GETLENGTH(Lob_loc)/32767)

I found another problem with search

insert into notes(note_id,text) values(12345,'this_word_with_underline');

SQL> select note_id from notes where contains(text,'this_word_with_underline',1) > 0

no rows selected

SQL> select note_id from notes  where contains(text,'this word with underline',1) > 0;

   NOTE_ID
----------
     12345

I have to replace all '_' with a space to search the word.

What I did wrong?

 

Tom Kyte
November 01, 2003 - 12:25 pm UTC

well, by default, Oracle text is going to turn the _ into ' ' in the orginal input.  If you look at the actual tokens generated from that:

insert into notes(note_id,text) values(12345,'this_word_with_underline');

  1* select token_text from  DR$T_IDX$I
ops$tkyte@ORA920> /
 
TOKEN_TEXT
----------------------------------------------------------------
UNDERLINE
WORD
 

this, with -- stopwords, not indexed.  You can either cleanup the input string, or do what I do here:

ops$tkyte@ORA920> select * from notes where contains( text, '{this_word_with_underline}' ) > 0;
 
   NOTE_ID TEXT
---------- --------------------------------------------------------------------------------
     12345 this_word_with_underline


"quote" the search string with braces.
 

Does search sequnce matter?

Yong, November 06, 2003 - 1:37 pm UTC

Tom,

I assume that sequnce of searching words should not be matter. But how to explain this?


SQL> insert into notes(note_id,text) values(12345,'pl/sql email ascii excel');

1 row created.

SQL> commit;

Commit complete.

SQL> select note_id from notes where contains(text,'ascii excel') > 0;

   NOTE_ID
----------
     12345

SQL> select note_id from notes where contains(text,'excel ascii') > 0;

no rows selected
 

Tom Kyte
November 06, 2003 - 5:33 pm UTC

the STRING "ascii excel" exists

the STRING "excel ascii" does not

if you want to find all documents that contain excel and ascii -- use {}...




ops$tkyte@ORA920LAP> select note_id from notes where contains(text,'ascii excel') > 0;

   NOTE_ID
----------
      1234

ops$tkyte@ORA920LAP>
ops$tkyte@ORA920LAP> select note_id from notes where contains(text,'excel ascii') > 0;

no rows selected

ops$tkyte@ORA920LAP> select note_id from notes where contains(text,'{excel}& {ascii}') > 0;

   NOTE_ID
----------
      1234

ops$tkyte@ORA920LAP>



 

Yong, November 06, 2003 - 7:52 pm UTC

Tom,

My assumption was wrong. I need to tokenize the search string and replace all unwanted character and have '&'. It works.

Thanks a lot!

Intermedia search on blob

Yong, April 22, 2004 - 1:29 pm UTC

Tom,

Here is what I am tring to do and get error

create table test(docid number,content blob);
Create index test_idx on test(content) indextype is ctxsys.context; 

load a rtf document into the test(content)

SQL> select docid from test where contains('content','test')>0;
select docid from test where contains('content','test')>0
*
ERROR at line 1:
ORA-20000: Oracle Text error:
DRG-10599: column is not indexed

To make sure the index work
SQL>  select token_text from DR$TEST_IDX$I;

TOKEN_TEXT
----------------------------------------------------------------
DOCUMENT
GET
INTERMEDIA
MY
NOTHING
OUR
SEARCH
SHOW
TEST
TEXT
USER

11 rows selected.

 it seems index is ok, but why Oracle complain column is not indexed?

thanks 

Tom Kyte
April 23, 2004 - 8:33 am UTC

ops$tkyte@ORA9IR2> create table test(docid number,content blob);
 
Table created.
 
ops$tkyte@ORA9IR2> Create index test_idx on test(content) indextype is ctxsys.context;
 
Index created.
 
ops$tkyte@ORA9IR2> select docid from test where contains('content','test')>0;
select docid from test where contains('content','test')>0
*
ERROR at line 1:
ORA-20000: Oracle Text error:
DRG-10599: column is not indexed
 
 
ops$tkyte@ORA9IR2> select docid from test where contains(content,'test')>0;
 
no rows selected
 


no quotes around that column....
 

it works thanks!

A reader, April 23, 2004 - 12:02 pm UTC


Google PageRank vs. Oracle Text

A reader, June 01, 2004 - 10:21 pm UTC

Are there objective industry studies or at least discussions about the effectiveness of Google's legendary PageRank technology vs. Oracle Context? Google is now marketing its technology as a 'Search Appliance' to index any enterprise Intranet and bring the "power of Google".

Given the same search domain, same search string, which one is more likely to bring up first hits which "read the users mind"?!

Thanks

Tom Kyte
June 02, 2004 - 7:48 am UTC

well, they are very different technologies.

Oracle Text is predominantly a convention text search. "find this word near this other word". "find all documents that contain this word or that word but not this other word". "Do a stem search for this word" (stemming in the linguistic sense).

Oracle Text also has theming -- "tell me the 10 themes about this page"

Text also has gisting -- reduce this 100 page document into a few paragraphs.

It now also has a rule driven document classification system.

Google is similar in some respects, different in others. You would have to look at what you want to search, how you want to search it, whether the search needs to be integrated with other things (eg: join 5 tables and have predicates on structured and unstructured data) and so on.

At the end of the day - you develop your requirements and see which product best meets the needs.

A little proble with stem search

Alessandro Nazzani, July 02, 2004 - 7:53 am UTC

Hi Tom.

Thanks as usual for your time and efforts.

I'm trying to reproduce the basic examples from your book and I have a minor (I hope!) problem:

SQL*Plus: Release 10.1.0.2.0 - Production on Ven Lug 2 12:57:05 2004
Copyright (c) 1982, 2004, Oracle. All rights reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.1.0.2.0 - Production
With the Partitioning, OLAP and Data Mining options

_scott_@ARCHIMEDE_10g> select * from mytext;

ID THETEXT
---------------------------------------------------------------------------------
1 The headquarters of Oracle Corporation is in Redwood Shores, California
2 Oracle has many training centers around the world


_scott_@ARCHIMEDE_10g> select score(1), id from mytext where
2 contains(thetext, 'oracle or california',1) > 0 order by score(1) desc
3 /

SCORE(1) ID
---------- ----------
4 1
3 2

So far, so good. But:

_scott_@ARCHIMEDE_10g> select id from mytext where contains(thetext,'$train') > 0;

no rows selected

What am I missing?

Alessandro

Tom Kyte
July 02, 2004 - 10:19 am UTC

probably a language issue -- I see "Ven Lug" in your banner -- not english, perhaps in your language train is not stemmed to training?



ops$tkyte@ORA10G> create table t ( id number, thetext varchar2(4000) );
 
Table created.
 
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> insert into t values
  2  ( 1, 'The headquarters of Oracle Corporation is in Redwood Shores, California' );
 
1 row created.
 
ops$tkyte@ORA10G> insert into t values
  2  ( 2, 'Oracle has many training centers around the world' );
 
1 row created.
 
ops$tkyte@ORA10G> commit;
 
Commit complete.
 
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> create index t_idx on t(thetext) indextype is ctxsys.context;
 
Index created.
 
ops$tkyte@ORA10G>
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> select score(1), id from t where contains(theText,'$train',1) > 0;
 
  SCORE(1)         ID
---------- ----------
         4          2
 
 

Alessandro Nazzani, July 02, 2004 - 10:38 am UTC

<quote>probably a language issue</quote>

I see. I thought English was supported by default (from Oracle Text Reference: "Oracle Text ships with a system stemming dictionary") but I was wrong (there's no such thing as $ORACLE_HOME/ctx/data/enlx/dren.dct on my system).

So am I right assuming that without a Language-Specific Knowledge Base (that Oracle doesn't provide) advanced functions such as stemming, theming, etc. cannot be used?

Alessandro

Tom Kyte
July 02, 2004 - 10:57 am UTC

what I mean is -- your NLS_LANG is obviously not english, so Text is saying "use the rules for stemming in language 'German'" or whatever language you are in.

We support multiple languages.

The Oracle Text stemmer, licensed from Xerox Corporation s XSoft Division, supports the following languages: English, French, Spanish, Italian, German, and Dutch.

Alessandro, July 05, 2004 - 5:12 am UTC

You were right - of course ;)

I was misleaded because inserting a row in my language didn't work: it turned out it was just a synchronization issue.

Thanks again and sorry for wasting your time.

Alessandro

Link is not found

Rafique Awan, July 08, 2004 - 11:09 am UTC

Tom,

I don't find link below.Is it not valid anymore.

</code> http://asktom.oracle.com/wa/ask/owa/ask_tom_pkg.search <code>

Tom Kyte
July 08, 2004 - 11:56 am UTC

change it to

</code> http://asktom.oracle.com/ <code>



A reader, August 21, 2004 - 10:49 am UTC

Tom,

Can you please help me on this ...

PRDSYRSQL> start ?/ctx/admin/defaults/drdefus.sql;
Creating lexer preference...
Creating wordlist preference...
Creating stoplist...
begin
*
ERROR at line 1:
ORA-20000: Oracle Text error:
DRG-12505: error while adding or removing a stop word
ORA-06508: PL/SQL: could not find program unit being called
ORA-06512: at "CTXSYS.DRUE", line 157
ORA-06512: at "CTXSYS.CTX_DDL", line 956
ORA-06512: at line 4
 
 
Creating default policy...
begin
*
ERROR at line 1:
ORA-20000: Oracle Text error:
DRG-50857: oracle error in ctx_ddl.create_policy
ORA-06508: PL/SQL: could not find program unit being called
ORA-06512: at "CTXSYS.DRUE", line 157
ORA-06512: at "CTXSYS.CTX_DDL", line 1655
ORA-06512: at line 2

Why am i getting this .. am i missing anything, ...

Thanks,
Joe Anand
 

Tom Kyte
August 21, 2004 - 12:19 pm UTC

looks like Oracle text was not installed or only partially installed. Time to talk with the admin for your database.

Rebuilding TEXT INDEX

MSU, September 10, 2004 - 8:42 am UTC

Hi Tom,

Thanks for suggesting the "Text Index" option for my Problem which I had posted in "Dynamic Allocation of Value for Function based Index". But this puts me in another strange situation

In the following link.

</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:350017644883, <code>

You have written that when you create an Text Index it needs to be rebuild when ever there is an update or insert. It is true and I myself have checked it in my own test_tab table. When I update the VAL column or insert a new row and run the query I dont get the correct answer. You have suggested that we have to rebuild the Index at regular intervals.

Now comes my problem ... My production site has a decent hit rate and orders will be placed and updated at a resonable frequency. May be there will be 100s of orders that will be placed at the same time.

Option 1: Assume that I rebuild the index after a regular interval of time. Its a problem for the user - Say, he update his own order and tries to search for the one which he had updated just now, it will not be reflected until the index is rebuild - Which is likely to happen after a fixed time interval.

The other option is I rebuild the Index after every insert or update ... Which I feel is not at all a good way of doing things. That will slow down the performance of the site. So what is the solution out ?

I also wish to know when I rebuild the Index what is the approximate time ORACLE will take to rebuild it assuming that there are a million records in my table.

Thanking you in anticipation.

Tom Kyte
September 10, 2004 - 9:35 am UTC

not rebuilt - sync'ed.

it is not rebuilt.


*sync* it.

Phrase and wild card search using intermedia

Ranjan, September 13, 2004 - 7:50 pm UTC

Hi Tom.
I am having a requirement where variety kind of search is going to be performed on a piece of text stored in database. For example consider the below piece of text is stored in a column.

"The headquarters of Oracle Corporation is in Redwood Shores, California. Have you ever been to California. It is a nice place to live. But sometimes you can have earthquake, Thats fine if you ignore that,Oracle has large number of application '. But here sales tax rate 8.25 . But you can earn 100000 here."

1 Do a phrase search using wild card . Like search for Ora%
So the Output will be
Oracle
Oracle
Similalry if you want to search for ni% pl% , then it should return
nice place
2. Apart from % , is should be able to return pattern matching datatype .e.g you can search for 'Oracle Corpo??????' where ? denotes for a character and search should return Oracle Corporation . Similarly you can search for 'rate #.##' and it should return 8.25 in the result.

3. All the above condition should work in a mixed and match manner.

I am exploring the idea of how we can make use of intermedia to acheive this. Your opinion will be appreciated. Thanks a Lot.

Tom Kyte
September 13, 2004 - 9:17 pm UTC

you can do most of this with intermedia -- #.## won't be, but the wildcards are.

How useful are themes and gists

Tom, September 14, 2004 - 2:44 am UTC

Tom

You point to intermedia being able to generate themes and gists but I've yet to see a good example of these in use. Every time I've seen gists they seem to be a pointless waste of time.

Can you point to any examples where the gist of a long document is a good reflection of the full document??

Tom Kyte
September 14, 2004 - 8:40 am UTC

a gist is a gist, it tries to find the most meaningful paragraph(s) and use them.

can you point to an example of a "truly meaningless" gist.  I just ran my chapter 1 from "Expert One on One Oracle" thru it.  One gist (the generic one) was:

  QUERY_ID POV
---------- --------------------
GIST
------------------------------------------------------------------------
         1 GENERIC
In other databases, you would use a temporary table and procedural code
in a stored procedure to achieve the same results, perhaps. You paid for
 these features so you might as well use them.
 
These are the same techniques developers, who implement multi-platform c
ode, utilize. Oracle Corporation for example, uses this technique in the
 development of its own database. There is a large amount of code (a sma
ll percentage of the database code overall) called OSD (Operating System
 Dependent) code that is implemented specifically for each platform. Usi
ng this layer of abstraction, Oracle is able to make use of many native
OS features for performance and integration, without having to rewrite t
he large majority of the database itself. The fact that Oracle can run a
s a multi-threaded application on Windows and a multi-process applicatio
n on UNIX attests to this feature. The mechanisms for inter-process comm
unication are abstracted to such a level that they can be re-implemented
 on an OS-by-OS basis, allowing for radically different implementations
that perform as well as an application written directly, and specificall
y, for that platform.
 
In addition to SQL syntactic differences, implementation differences, an
d differences in performance of the same query in different databases ou
tlined above, there are the issues of concurrency controls, isolation le
vels, query consistency, and so on. We cover these items in some d
 
We've seen the case where a business rules appears to get enforced on on
e database, due to side effects of the databases locking model, and does
 not get enforced in another database. You'll find that, given the same
exact transaction mix, reports come out with different answers in differ
ent databases - all because of fundamental implementation differences. Y
ou will find that it is a very rare application that can simply be picke
d up and moved from one database to another. Differences in the way the
SQL is interpreted (for example, the NULL=NULL example) and processed wi
ll always be there.
 
On a recent project, the developers were building a web-based product us
ing Visual Basic, ActiveX Controls, IIS Server and the Oracle 8i Databas
e. I was told that the development folks had expressed concern that sinc
e the business logic had been written in PL/SQL, the product had become
database dependent and was asked: "How can we correct this?''
 
I was a little taken aback by this question. In looking at the list of c
hosen technologies I could not figure out how being database dependent w
as a 'bad' thing:
 
? They had chosen a language that locked them into a single operating sy
stem and is supplied by a single vendor (they could have opted for Java)
 
? They had chosen a component technology that locked them into a single
operating system and vendor (they could have opted for EJB or CORBA)
? They had chosen a web server that locked them in to a single vendor an
d single platform (why not Apache?)
 
Every other technology choice they had made locked them into a very spec
ific configuration 


<b>which in my opinion captures the spirit of chapter 1 perfectly.  Looking at a different point of view, "databases", they nailed it</b>

ops$tkyte@ORA9IR2> select * from gisttab where pov = 'databases' and query_id = 1;
 
  QUERY_ID POV
---------- --------------------
GIST
------------------------------------------------------------------------
         1 databases
So, if you are used to the way other databases work with respect to quer
y consistency and concurrency, or you have never had to grapple with suc
h concepts (no real database experience), you can now see how understand
ing how this works will be important to you. In order to maximize Oracle
's potential, you need to understand these issues as they pertain to Ora
cle - not how they are implemented in other databases.
Database Independence?
By now, you might be able to see where I'm going in this section. I have
 made references above to other databases and how features are implement
ed differently in each. With the exception of some read-only application
s, it is my contention that building a wholly database independent appli
cation that is highly scalable is extremely hard - and is in fact quite
impossible unless you know exactly how each database works in great deta
il.
 
 
 
I have encountered issues such as this many times when an application is
 being moved from database A to database B. When an application that wor
ked flawlessly in database A does not work, or works in an apparently bi
zarre fashion, on database B, the first thought is that database B is a
'bad database'. The simple truth is that database B just does it differe
ntly - neither database is wrong or 'bad', they are just different. Know
ing and understanding how they work will help you immensely in dealing w
ith these issues.
 
<b>when I flipped to sentence level gisting:</b>

  1* select * from gisttab where pov = 'databases' and query_id = 2
ops$tkyte@ORA9IR2> /
 
  QUERY_ID POV
---------- --------------------
GIST
------------------------------------------------------------------------
         2 databases
If I can do it in the database, I will.
 
Most had no database experience whatsoever.


<b>"if i can do it in the database, I will." -- that's my mantra, out of the entire chapter, to pick that sentence -- well, I was laughing out loud -- perfection, the generic, 2 sentence gist of chapter 1:</b>


  1* select * from gisttab where pov = 'GENERIC' and query_id = 2
ops$tkyte@ORA9IR2> /
 
  QUERY_ID POV
---------- --------------------
GIST
------------------------------------------------------------------------
         2 GENERIC
If I can do it in the database, I will.
 
The simple truth is that database B just does it differently - neither d
atabase is wrong or 'bad', they are just different.
 


<b>again, perfection.  Is it "speed reading" -- no, it is a gist, a summary, an overview of whats to come, the executive summary, something you read to see if you want to thread the entire thing.
</b>

from those gists -- you definitely get an idea of what this guy is going to say more verbosely in chapter 1 don't you? 

Sentence gist!

A reader, September 14, 2004 - 10:26 am UTC

"If I can do it in the database, I will"

This is really astounding!

How do you think Text was able to boil down the chapter to this one sentence? Is it because you repeat it a lot in the chapter, or did it really "understand" the chapter? (nah, computers are dumb, arent they?!)

implementing `search in search results' functionality

Sergei, October 28, 2004 - 3:02 am UTC

Hi Tom,

does the way to _effectively_ perform search in the results of previous search exist? i assume utilization of oracle text of any other product which uses domain indexes.


Tom Kyte
October 28, 2004 - 7:51 am UTC

effectively -- sure


select * from ( OTHER_SEARCH ) where .....


if you have text, domain indexes absolutely come into play, but not sure what thaty has to do with "previous search" which really just means (to me) "A QUERY"

sorry for inexact question

sergei, October 28, 2004 - 9:47 am UTC

but i was asking about the underlying technology

let's pretend we have some web application (like asktom) but with the facility of subsearching

just assume that previous search is stored into temporary table:

create global temporary table search_results
(
search_id
doc_id number
)
/
create sequence seq_search_results
/

we perform a search:

insert into search_results
select seq_search_results.nextval, doc_id
from domain_indexed_table
where operator(...) = ...;

we get too many hits and want to make a refining query:
select doc_id
from domain_indexed_table
where operator(...) = ...
and doc_id in (
select doc_id from search_results
where search_id = ...
);

CBO tries to generate query plan based on user (domain index developer) supplied statistics

as i can understand, it basically has two choices:
a) perform a domain index search first and then filtering results of this search by subquery
b) perform subquery filtering first and then applying operator implementation (pl/sql function) for each row in the returned set.

some of these choices can be effective, but sometimes both choices are bad and the best solution is to `merge' both methods: perform a domain index search in the subsearch results as if it were case (a) - in a batch (instead of calling operator implementation for each row)...

what solution might be?

that is the question...

Tom Kyte
October 28, 2004 - 12:36 pm UTC

this "doesn't compute", whats up with putting the results in a table?


you would have one of three plans generated -- period. Unless you did not use bind variables which only works in a DW type environment (where you query infrequently -- but they run for a long time).

if developer has some "knowledge" of which of the three possible plans should be used -- they can use that knowledge in the generation of the query as well -- giving the optimizer a chance to see one of three different "style" queries, each with a unique plan.

bad question

sergei, October 29, 2004 - 2:44 am UTC

thanks for the answer and sorry for the question -- is was put in such a way that i cannot imagine what the good answer would be... more technical description of situation does'n fit into interMedia text branch so i wait until the status become "free" or invent my own solution.

-sergei

why cannot reproduce your gist result?

Winston, November 08, 2004 - 1:12 am UTC

I cannot reproduce your gist result on 
SQL> select * from v$version;
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
PL/SQL Release 9.2.0.4.0 - Production
CORE    9.2.0.3.0       Production
TNS for Linux: Version 9.2.0.4.0 - Production
NLSRTL Version 9.2.0.4.0 - Production

I got the pdf from www.orafaq.com/books/samples.htm
its size is  290,539 4826_chap01.pdf

I loaded it into my blob field:
select id, dbms_lob.getlength(theblob) from demo;

        ID DBMS_LOB.GETLENGTH(THEBLOB)
---------- ---------------------------
         1                       15328
         3                       15328
         4                        1100
        27                      290539

then create index demo_idx on demo(theblob) indextype is ctxsys.context;

create table ctx_gist (query_id  number,
                       pov       varchar2(80), 
                      gist      CLOB);  

tried a couple of options to call gist procedure as below:

('demo_idx','27','CTX_GIST',1,'PARAGRAPH', pov =>'GENERIC');

('demo_idx','27','CTX_GIST',2,'PARAGRAPH');

('demo_idx','27','CTX_GIST',3,'PARAGRAPH',numParagraphs => 10);

('demo_idx','27','CTX_GIST', 4, glevel=>'S',num_themes=>20);

('demo_idx','27','CTX_GIST',6,glevel=>'S',num_themes=>20,  maxPercent =>10);

SQL> select gist from ctx_gist where pov='<b>databases</b>' and query_id=6;
nt. I tend to take a database-centric a

? is the exact same feature that ma

atabase is my Virtual Machine. It is my `virtual ope

such as Java). In many cases, they had had

tra when it comes to developing database software: ? You should do it in a singl
e SQL statement if at al

but take much longer t

urs of Oracle experience. Most had no database experience whatsoever. ? They had
 massive performance problems, data integrity

using the database. With the databa

ven faster). The single most comm

encapsulate my experiences into a few broad

statements, they would be: ? An

 don?t need to know about. Maybe they have a SQL generator that will save them f
rom the ha

vironment, the Java Virtual Machine (JVM), and so are highly portable ? is the e
xact

un ? including the operating system). In fact, compared to UNIX,

Windows 3.x was not really a multi-tasking OS

. When I have to implement a feature outside of the database, I find it extrem


for the generic one, I cannot get it either,

  1  select query_id||'**size is :'||dbms_lob.getlength(gist) from ctx_gist
  2* where  pov='GENERIC'
SQL> /
1**size is :10151
3**size is :10151
4**size is :1388
5**size is :10151
6**size is :1388

select gist from ctx_gist where pov='<b>GENERIC</b>' and query_id=4
SQL> /
nt. I tend to take a database-centric a

atabase is my Virtual Machine. It is my `virtual ope

such as Java). In many cases, they had had

tra when it comes to developing database software: ? You should do it in a singl
e SQL statement if at al

ve compilation of Java ? the ability to convert your Java bytecode into operatin
g system specific object code on your platform. This lets Java run just as fast
as C. The Black Box Approach I have an idea, borne out by first-hand experience,
 as to why databas

but take much longer t

o perform a major `re-write?, `re-architecture?, or `tuning? effort. Personally,
 I call t

urs of Oracle experience. Most had no database experience whatsoever. ? They had
 massive performance problems, data integrity

roblems, hanging issues and the like (but very pretty screens). As a result of t
he inevitable performance problems, I would be called in to help solve the diffi
culties. I can recall one p

using the database. With the databa

f applications that will run on a database: understanding that database is cruci
al to your success. If you do not understand what your particular database does
or how it d

ven faster). The single most comm

encapsulate my experiences into a few broad

statements, they would be: ? An

Windows 3.x was not really a multi-tasking OS

. When I have to implement a feature outside of the database, I find it extrem 

Tom Kyte
November 08, 2004 - 10:15 am UTC

with all of the chopped words -- looks like something went wrong in the filtering.

any errors in the CTX dictionary tables reported?

if you just run the filter program (ctxhx in $OH/ctx/bin), does it seem to filter from the command line OK?

(and I used my text document, which was a tad different from the final copy - not much, but different -- before the editor moved stuff around, so we are not expecting to get the same exact thing)

chopped words

Winston, November 08, 2004 - 1:50 pm UTC

SQL> select count(*) from CTX_USER_INDEX_ERRORS;

  COUNT(*)
----------
         0
         

localhost:/u01/app/oracle/product/oracle9i/ctx/bin >./ctxhx /tmp/4826_chap01.pdf /tmp/out_chap01 ASCII8 ISO8859-1 T 
localhost:/u01/app/oracle/product/oracle9i/ctx/bin >
         
then I loaded out_chap01 into BLOB field and ran through sentence-level gist on it. Here is what I got:

SQL> select dbms_lob.getlength(gist) from ctx_gist where query_id=11 and pov='GENERIC';
                    1429



SQL> select stra.word_wrap(gist, 50) from ctx_gist where query_id=11 and pov='<b>GENERIC</b>';
<b>If I can do it in the database, I will.
My approach is to do everything I can in the
database.
This is an approach to database development that
I've never been able to understand.</b>

Most had no database experience whatsoever.
statement if at all possible. * If you cannot do
it in a single SQL Statement, then do it in
PL/SQL. * If
e on your platform. This lets Java run just as
fast as C. The Black Box Approach I have an idea,
borne out by first-hand experience, as to why
database-backed software development efforts so
frequently fail. Let me be clear that I'm
including here those projec
rite', 're-architectur
failures': more often than not they could have
been completed on schedule (or even faster). The
singl
that SQL, transactions and data integrity are
'hard'. The solution * don't make anyone do
anything 'hard'. They treat the database as a
black box and have some software tool generate a
ut very pretty screens). As a result of the
inevitable performance problems, I would be
called in to help solve the difficulties
 I can recall one particular occasion when I
could not fully remember the syntax of a new
command that we needed to use. I asked for the
SQL Reference manual, and I was handed an Oracle 6
a, you do need to know everything i
standing of the fundamental tool
apsulate my experiences into a few broad st
tements, they would be: * An appl
evelopers had to understand exactly how the W


SQL> select dbms_lob.getlength(gist) from ctx_gist where query_id=11 and pov='databases';
                    1019

SQL> select stra.word_wrap(gist, 50) from ctx_gist where query_id=11 and pov='<b>databases</b>';
<b>If I can do it in the database, I will.
My approach is to do everything I can in the
database.
This is an approach to database development that
I've never been able to understand.</b>
Most had no database experience whatsoever.
statement if at all possible. * If you cannot do
it in a single SQL Statement, then do it in
PL/SQL. * If
rite', 're-architectur
ut very pretty screens). As a result of the
inevitable performance problems, I would be
called in to help solve the difficulties
a, you do need to know everything i
standing of the fundamental tool
apsulate my experiences into a few broad st
tements, they would be: * An appl
bout. Maybe they have a SQL generator that will
save them from the hardship of having to lea
able * is the exact same feature that make the
database appealing to me. The database
ed to UNIX, Windows 3.x was not really a
multi-tasking OS at all.
evelopers had to understand exactly how the W
 find it extremely hard to deploy that feature
anywhere I want. One of the mai


My obersavation is that the 1st two sentences covered well the main idea of the chapter but the 3rd sentence shouldn't be there at all.

Questions: Why CTX gave me the chopped words without reporting any errors in CTX dictionary  tables? do we always have to convert pdf to text first via ctxhx before calling gist?  

Tom Kyte
November 08, 2004 - 5:38 pm UTC

you should not have to filter first -- not sure why it munged up the text, the text stuff simply calls ctxhx

what you can do is setup a script called ctxhx (move ctxhx to ctxhx.bin for example) and echo the $* (command line args) out to /tmp/foo and then call ctxhx.bin from the script -- to see what command line text was passing to the filter, could be a characterset issue.

parameters used

Winston, November 08, 2004 - 8:02 pm UTC

I tried to capture the parameters as you suggested. and here they are:
/tmp/drgibbaESvb /tmp/drgitowGCxE ASCII8 unicode H NOMETA 120 HEURISTIC

What shall I change them to?

Many thanks!

Tom Kyte
November 08, 2004 - 9:32 pm UTC

nothing -- did you compare the text output of each -- the one you did and the one it does? (don't post it, just "gist" the differences for us)

difference

Winston, November 09, 2004 - 12:29 am UTC

localhost:/tmp >ls -alFt out_chap01
-rw-rw-r-- 1 oracle oracle 107817 Nov 8 10:33 out_chap01 <--mine
localhost:/tmp >ls -alFt native.html
-rw-r-xr-- 1 oracle oracle 217262 Nov 8 20:52 native.html* <--Oracle's


diff native.html out_chap01
Binary files native.html and out_chap01 differ

native.html does contain broken words and strange characters.

cat native.html |more
<HTML><BODY>



Developing Successful Oracle
Applications
I spend the bulk of my time working with Oracle database software and, more to t
he point, with people
who use this software. Over the last twelve years, I ve worked on many projects
successful ones as
well as failures, and if I were to encapsulate my experiences into a few broad s
tatements, they would be:
Q'
An application built around the database dependent on the database will succ
eed or fail
based on how it uses the database.
Q'
A development team needs at its heart a core of &#9618; database savvy coders who are
responsible
for ensuring the database logic is sound and the system is tuned.
These may seem like surprisingly obvious statements, but in my experience, I hav
e found that too many
--More--



My question now, what can we do to hint the ctxsys to generate meaningful gist like you got? do I have to have your file? or we just cannot generate a good gist for a PDF file?


Tom Kyte
November 09, 2004 - 8:42 am UTC

asking again -- when you compare the "clean text file you got the gist from that was good, with the text file produced by ctxhx using the inputs you captured -- what are the fundemental differences between them"

difference in character set

Winston, November 09, 2004 - 3:52 pm UTC

Oracle passed in 
'/tmp/drgibGmJOVa /tmp/drgitIG28AS ASCII8 <b>unicode</b> H NOMETA 120 HEURISTIC'. 
In order to force filter to use the correct character set I changed the ctxhx wrapper to 
'$ORACLE_HOME/ctx/bin/ctxhx.bin $1 $2 $3 <b>ISO8859-1</b> T'. 

After this change, I am able to create index, but I got errors when I tried to run gist:

SQL> create index demo_idx on demo(theblob) indextype is ctxsys.context ;

Index created.

SQL> exec ctx_doc.gist('demo_idx','27','CTX_GIST',71,'PARAGRAPH');
BEGIN ctx_doc.gist('demo_idx','27','CTX_GIST',71,'PARAGRAPH'); END;

*
ERROR at line 1:
ORA-20000: Oracle Text error:
ORA-06512: at "CTXSYS.DRUE", line 157
ORA-06512: at "CTXSYS.CTX_DOC", line 739
ORA-06512: at line 1


I also tried out note      Note:113330.1. I was able to do the above two operations however I got a new issue: all row's POV were NULL in the result table except 'GENERIC'.
SQL> select pov, count(*) from ctx_gist where query_id=72 group by pov;

POV
--------------------------------------------------------------------------------
  COUNT(*)
----------
GENERIC
         1


        50


My question now, what can we do to hint the ctxsys to generate meaningful gist 
like you got? do I have to have your file? or we just cannot generate a good 
gist for a PDF file? 

Tom Kyte
November 09, 2004 - 4:05 pm UTC

it should have worked "as is, out of the box" and did for me, with pdf's, with word docs, with whatever.

please file a tar with support for this one.

"Contains"

Ganesh, January 04, 2005 - 6:55 am UTC

I am getting the following error. What could be the reason.
I just copy-pasted the scripts that you provided.

SQL> select count(*)
  2       from submitted_indexed_questions
  3     where contains( text, 'index', 1 ) > 0
  4  /
   where contains( text, 'index', 1 ) > 0
         *
ERROR at line 3:
ORA-00904: "CONTAINS": invalid identifier
 

Tom Kyte
January 04, 2005 - 8:37 am UTC

sounds like your dba did not install the TEXT functionality.

Performance of ctxsys.context

Ryan, April 04, 2005 - 10:09 am UTC

I am indexing 5.16 GB of data. Every word of this data must be indexed so we can search on it. Problem is that creating the index(s) takes 18 hours. I'm basically calling "create index indexName on tableName(columnNames) INDEXTYPE IS CTXSYS.CONTEXT ..."

Does this sound right? or is there a way to speed up creation of indexes?

thanks

Tom Kyte
April 04, 2005 - 11:51 am UTC

memory parameter and parallel can significantly change the response time.

ctx_ddl

Mike, April 04, 2005 - 12:19 pm UTC

Is this utility recommended for use in indexing name and address type columns, or is there some other method that would be better. We currently have a number of online users who are trying to look up the accounts of customers, but they consistently query the names and addresses with methods like this:
Select account_number
from customer_table
where customer_name like '%KYTE%'
and customer_addr like '%ORACLE CO%';

This, of course, does a full table scan. The customer table is getting rather large, and the scans are taking longer and longer. I noticed this thread and thought that this type of index might be beneficial for columns that are being queried in this manner.

I'm trying to get the developers to modify the queries to use bind variables, but that's not going to help this at all. I thought that while they were re-writing for the binds, that we could change the SQL to use the contains clause and get rid of the wildcarding, but only if we've indexed the columns as if they were text documents.

Is this an appropriate use of this utility?

Lily, April 14, 2005 - 9:58 am UTC

Tom,
If database is upgraded from 8.1.7 to 9.2.0.4 but don't have
CTXSYS user and all its objects/packages, what scripts should run to create CTXSYS and its objects which allow to create context index?

Thanks in advance

Tom Kyte
April 14, 2005 - 10:02 am UTC

use dbca to add it.

Lily, April 14, 2005 - 10:10 am UTC

Thanks.



Knowledge Base/Thesaurus

Rumburak, October 13, 2005 - 6:26 am UTC

Hi Tom,

I have some questions about Knowledge Base and Thesaurus.

1. What is the difference between the Knowledge Base and the Thesaurus?

2. In the docs I read that Oracle Text has a english/french knowlegde base installed. Where is it stored? I can't find anything in the ctxsys-tables.

3. For what are the files under /ctx/data?

Bye,

Rumburak

problem running ctxsrv

ian gallacher, November 19, 2005 - 10:06 am UTC

Hi Tom,

Have just started using context indexes and searching for text strings without any problems. Have just found why new data was not being picked up so now run

C:\prs\cre>sqlplus prs/prs@m

SQL*Plus: Release 8.1.7.0.0 - Production on Sat Nov 19 14:46:43 2005

(c) Copyright 2000 Oracle Corporation.  All rights reserved.


Connected to:
Oracle8i Release 8.1.7.0.0 - Production
JServer Release 8.1.7.0.0 - Production

SQL> start rebuild_imt

Index altered.

Searched your site for more info and found that you can set a server process via ctxsrv

However when I run 
Ctxsrv –user ctxsys/ctxsys@m –log ctx.log , the service fails with
C:\prs\cre>ctxsrv -user ctxsys/ctxsys@m -log ctx.log

C:\prs\cre>type ctx.log

Oracle interMedia Text: Release 8.1.7.0.0 - Production on Sat Nov 19 14:52:31 20
05

(c) Copyright 2000 Oracle Corporation.  All rights reserved.


14-52-31 11/19/05 === OCO server startup ===
14-52-31 11/19/05 Initialized CORE
14-52-35 11/19/05 Connected to database.
14-52-39 11/19/05 === Server failure ===
14-52-39 11/19/05 DRG-50857: oracle error in drasmadd
14-52-39 11/19/05 ORA-06550: line 1, column 7:
PLS-00201: identifier 'DRIADM.ADD' must be declared
ORA-06550: line 1, column 7:
PL/SQL: Statement ignored

14-52-39 11/19/05 === Server shutting down ===

C:\prs\cre>

Have been unable to get any decent info on error messages

Any ideas what I am missing – an install script to be run ?

Op sys is XP Pro

Any help would be appreciated !

Thanks

Ian
 

Tom Kyte
November 19, 2005 - 1:39 pm UTC

was Oracle text added "in the normal fashion" to the database? driadm should be installed by default.

problem running ctxsrv

ian gallacher, November 19, 2005 - 3:21 pm UTC

Hi Tom

Thanks for replying so promptly

8i was installed "out of the box" taking normal defaults without any customisation!

8i rel 3 8.1.7 cd used

Ian

Tom Kyte
November 19, 2005 - 8:14 pm UTC

I'll ask you to contact support, on my 8i installs, driadm is installed - do you see it in your ctxsys schema?

ctxsrv wont run

ian gallacher, November 19, 2005 - 8:50 pm UTC

Hi

When I look at tables in ctxsys schema this is what is returned

C:\>sqlplus ctxsys/ctxsys

SQL*Plus: Release 8.1.7.0.0 - Production on Sun Nov

(c) Copyright 2000 Oracle Corporation.  All rights


Connected to:
Oracle8i Release 8.1.7.0.0 - Production
JServer Release 8.1.7.0.0 - Production

SQL> select table_name,tablespace_name from user_ta

TABLE_NAME                     TABLESPACE_NAME
------------------------------ --------------------
DR$CLASS                       DRSYS
DR$DELETE
DR$INDEX                       DRSYS
DR$INDEX_ERROR                 DRSYS
DR$INDEX_OBJECT
DR$INDEX_SET                   DRSYS
DR$INDEX_SET_INDEX             DRSYS
DR$INDEX_VALUE                 DRSYS
DR$OBJECT                      DRSYS
DR$OBJECT_ATTRIBUTE            DRSYS
DR$OBJECT_ATTRIBUTE_LOV        DRSYS

TABLE_NAME                     TABLESPACE_NAME
------------------------------ --------------------
DR$PARALLEL
DR$PARAMETER
DR$PENDING
DR$PREFERENCE                  DRSYS
DR$PREFERENCE_VALUE            DRSYS
DR$SECTION                     DRSYS
DR$SECTION_GROUP               DRSYS
DR$SERVER                      DRSYS
DR$SQE
DR$STATS                       DRSYS
DR$STOPLIST                    DRSYS

TABLE_NAME                     TABLESPACE_NAME
------------------------------ --------------------
DR$STOPWORD
DR$SUB_LEXER                   DRSYS
DR$THS                         DRSYS
DR$THS_BT                      DRSYS
DR$THS_FPHRASE
DR$THS_PHRASE                  DRSYS
DR$WAITING                     DRSYS
SYS_IOT_OVER_24403             DRSYS
SYS_IOT_OVER_24494             DRSYS

31 rows selected.

but when I try and desc any of the tables 

SQL> desc DR$THS
ERROR:
ORA-04043: object DR$THS does not exist

Any other suggestions ?

Ian 

Tom Kyte
November 19, 2005 - 9:03 pm UTC

there is something wrong with the installation, please utilize support for this one.

ctxsrv wont run

ian gallacher, November 20, 2005 - 5:57 am UTC

Thanks for you input

Will contact support as you suggest

Ian

10g sync

Phil W, February 02, 2006 - 5:51 am UTC

Hi Tom

How is this altered to make it sync every 10 minutes for example and to just do this - ie leave the partition and memory values out.

Thanks!

Phil

CREATE INDEX tdrmauto02x ON tdrmauto02(text)
INDEXTYPE IS CTXSYS.CONTEXT local
(PARTITION tdrm02x_i1 PARAMETERS('
MEMORY 20m SYNC(ON COMMIT)'),
PARTITION tdrm02x_i2,
PARTITION tdrm02x_i3) PARAMETERS('
SYNC (EVERY "NEXT_DAY(TRUNC(SYSDATE), ''MONDAY'') + 15/24")
');


Tom Kyte
February 02, 2006 - 11:58 am UTC

you have

"next monday at 3pm"

you want

"10 minutes from now":

sysdate + 10/26/60

A reader, February 03, 2006 - 10:20 am UTC


How the score is calculated?

Sami, March 21, 2006 - 1:26 pm UTC

Dear Tom,

Thanks for providing this excellent service.

Can you please tell me how the score is calculated in Text Index search. I did go through oracle documents but I am not able to understand completely. Can you please explain in your style with examples.

Thanks in advance

Tom Kyte
March 22, 2006 - 2:40 pm UTC

</code> http://docs.oracle.com/docs/cd/B19306_01/text.102/b14218/ascore.htm#g635054 <code>

they do it pretty well right there.

uninstall

juancarlosreyesp, April 24, 2006 - 2:20 pm UTC

Hi Tom, I ran the @C:\oracle\ora92\ctx\admin\dr0dsys.sql;
to unsintall, but when I do an

select comp_name,version,status from dba_registry;

Oracle interMedia
9.2.0.1.0 LOADED

Intermedia is still there, how can I uninstall it, or I'm missing something.

Thank you

Tom Kyte
April 24, 2006 - 2:23 pm UTC

given that the contents of that script is:

set feedback off

prompt ...dropping user CTXSYS
drop user CTXSYS cascade;

prompt ...dropping roles
drop role CTXAPP;


it is likely "uninstalled" but not unregistered...

A reader, April 24, 2006 - 2:28 pm UTC

Thank you Tom, very fast.

This means the upgrade process will not upgrade intermedia?

And I'm curious is the a way to unregister?

Thank you :)


Tom Kyte
April 24, 2006 - 2:41 pm UTC

that I don't know - you'd probably want to use support for that one. there won't be anything to upgrade, that is for sure.

A reader, April 24, 2006 - 2:42 pm UTC

Thank you Tom

A reader, April 25, 2006 - 9:12 am UTC

fairlie helped me

execute dbms_registry.removed('ORDIM');

Question on using Oracle Text Vs Verity

Manjunath Kottur, August 01, 2006 - 12:53 pm UTC

Hi Tom,
We have a running application wherein we use Verity for the Search functionality. There are just 3 highly denormalized tables which are indexed by Verity and the indexed collections are used during user's search operations. Apart from these 3 tables, Verity is also used for items like faq, help, documentation etc that are maintained outside of the database and are all from legacy data(maintaining which is not in our purview).
There were some performance concerns and so we were trying to see if we can use Oracle database's native capabilities for search and also if Oracle Text is a better choice.

Biggest of these 3 tables is around 5 million records and average row length is less than 700 bytes. This is expected to grow upto 10m rows. Other 2 tables are less than a million rows each.

I understand there will be lots of testing before we decide on which way to go. But with almost no first hand experience on using Oracle Text, I had a few doubts and was hoping to get a quick answer.

1. The 3 tables are static and are never updated/deleted. We have home grown EPL solution that is used for populating these tables almost every week. If I put on either b-tree or bitmap indexes on all the columns, how does that compare with Verity? THe search is using wild card, LIKE, IN, case sensitive/insensitive, single/batch etc.

2. If #1 is not meaningful at all for these types of searches, can Oracle Text give me what we are looking for?

On the business requirement for these searches: This is a e-com site, but the clientele is restricted to a small segment(research institutions, pharma cos, students etc). The search results will help them to decide what to order. The search related data is in one instance and the e-com is taken care of in a separate instance. Both are 9i.

Idea is to use Oracle for the 3 tables and for the rest like faq/documentation etc continue with Verity.

Thanks a lot in advance.

Manjunath

Tom Kyte
August 02, 2006 - 9:19 am UTC

1) EPL?

cannot really answer about the b*tree or bitmap indexes - however, if it is 700 bytes of "text" (words), then a text index (context or ctxcat) would be more appropriate as it does the wildcards (and can even optimize leading '%' on the search terms), words, case insensitive and so on.

2) yes.

but you don't need to pay verity for the faq/documentation either - you can use bfiles and have Text index and find them as well.

Thanks!

manjunath, August 03, 2006 - 8:03 am UTC

Hi Tom,

Thanks a lot for your reply. We will be doing tests to ascertain Oracle Text can do the same thing as Verity does in our search operations. EPL should have been ETL, forgive the typo.

I wanted to include one more thing...In less than 30 minutes after I posted my first request, I had searched your site with string 'verity' and it returned this thread too. How did that work? Was it that I just got lucky and your incremental indexing happened in that 30 minutes or is there something else I am missing?

Thanks and regards
Manjunath

Tom Kyte
August 03, 2006 - 9:43 am UTC

you got lucky, yes, I sync it up every hour.

so, you could find something with a new string 1 second after you insert it here, or maybe in 60 minutes and any point in between.

Statistics on Context indexes

Andrew Markiewicz, August 03, 2006 - 6:13 pm UTC

Tom,

I've found differing posts in newsgroups about statistics on the dr$* tables and indexes that support the context indexes.
Should statistics be generated on these tables for the CBO or not?

Thanks


Tom Kyte
August 03, 2006 - 6:52 pm UTC

if you are in production - don't change production. test it.


but in general, the text tables in 9i should and in 10g - do.

ctxsys.dr$ tables and indexes

Andrew Markiewicz, August 04, 2006 - 12:30 pm UTC

Should stats also be generated for the CTXSYS owned objects as well or are those more like data dictionary tables that should not have stats?

CTXSYS DR$PARAMETER
CTXSYS DR$CLASS
CTXSYS DR$OBJECT
CTXSYS DR$OBJECT_ATTRIBUTE
CTXSYS DR$OBJECT_ATTRIBUTE_LOV
CTXSYS DR$PREFERENCE
CTXSYS DR$PREFERENCE_VALUE
CTXSYS DR$INDEX
CTXSYS DR$INDEX_PARTITION
CTXSYS DR$INDEX_VALUE
CTXSYS DR$POLICY_TAB
CTXSYS DR$INDEX_OBJECT
CTXSYS DR$SQE
CTXSYS DR$THS
CTXSYS DR$THS_PHRASE
...

We have nothing in production right now. I am just trying to set up the environment and learning where all the underlying objects are stored.

Thanks.

Tom Kyte
August 04, 2006 - 12:52 pm UTC

how about a version - 10g, it'll "happen" (yes, it should be)

in 9i - your call.

Text sample in the search results

A reader, August 16, 2006 - 1:42 pm UTC

Tom,
How about the sample text that comes under the titles when you search in google or metalink? Does Oracle Text have this functionality?
It generally contains some paragraph with the search words, but some times they mix two sentences in the way to show you your search words.

Thinking on my implementation with pl/sql, once I have a cursor with the search results, I could call some function with some logic that could extract these fragments for me, based on the search words. Is this the way to do this?
Regards

Tom Kyte
August 16, 2006 - 3:52 pm UTC

yes, text has a "summary" ability.

is it just like google? no.

see ctx_doc for the functionality available to you

Showing the words around a text hit

kevin, August 16, 2006 - 4:43 pm UTC

10g has a new ctx_doc.snippet() function that does exactly -that - provides a basic keyword in context capability. if you want more of a summary, ctx_doc.theme() and ctx_doc.gist() will give you some interesting results.

Appana Anil Kumar, September 12, 2006 - 9:13 am UTC

Hi,

create table quick
(
quick_id number primary key,
text varchar(80)
);
insert into quick ( quick_id, text )
values ( 11, 'C___2917241_20' );
commit;
begin
Ctx_Ddl.Create_Preference('wildcard_pref', 'BASIC_WORDLIST');
ctx_ddl.set_attribute('wildcard_pref', 'wildcard_maxterms', 100) ;
end;
create index quick_text on quick ( text )
indextype is ctxsys.ConText
parameters ('lexer basic_lexer Wordlist wildcard_pref');

I have created a table and inserted one record into the table and indexed the table as given above. When i ran the below query i am unable to get the records. Can any body help to solve this problem...

select * from quick where contains(text, 'C___2917241_20')>0 order by quick_id;

Advance thanks...
A.Anil Kumar


Tom Kyte
September 12, 2006 - 9:20 am UTC

search this site for printjoins

</code> http://asktom.oracle.com/pls/ask/search?p_string=printjoins <code>

A reader, September 12, 2006 - 9:53 am UTC

Thank you...

Html entities

Salaam Yitbarek, October 23, 2006 - 4:06 pm UTC

Hi,

I'd like to be able to search as if the html entities have been resolved. 

For example, I'd like to get a hit if I search for 'description' or 'déscription' in a column that contains 'd&eacute;scription'. 

I tried the following, which didn't work:

SQL> set escape on;
SQL> 
SQL> drop table foo;

Table dropped.

SQL> 
SQL> create table foo (
  2  id NUMBER,
  3  mystring VARCHAR2(20));

Table created.

SQL> 
SQL> insert into foo (id,mystring) values (1,'d\&eacute;scription');

1 row created.

SQL> insert into foo (id,mystring) values (2,'description');

1 row created.

SQL> commit;

Commit complete.

SQL> 
SQL> CREATE INDEX foo_index ON foo(mystring)
  2    INDEXTYPE IS CTXSYS.CONTEXT
  3    PARAMETERS('filter ctxsys.null_filter section group ctxsys.html_section_group');

Index created.

SQL> 
SQL> select *
  2  from foo
  3  where contains(mystring,'description') > 0;

        ID MYSTRING
---------- --------------------
         2 description

SQL> 

Thanks. 

Retrieving PDF, HTML and XML documents with CONTAINS operator.

Stephan Vinette, April 30, 2007 - 2:22 pm UTC

Hi Tom,

Using Oracle 9iR2, so Oracle Text.

Any issues retrieving PDF (HTML and XML documents for that matter) that contains a specific text? I'm able to retrieve Word, Excel and WordPerfect documents, but I'm not able to retrieve PDF, HTML and XML documents.

CREATE TABLE ITF_DOCUMENTS 
(ID NUMBER, 
 CONTEXT_ID VARCHAR2(40 BYTE) NOT NULL ENABLE, 
 CONTEXT_VALUE" VARCHAR2(500 BYTE) NOT NULL ENABLE, 
 DOX_BLOB BLOB
)
/

CREATE INDEX IDX_ITF_DOC ON ITF_DOCUMENTS (DOX_BLOB) 
INDEXTYPE IS CTXSYS.CONTEXT
/

SELECT ID,CONTEXT_ID,CONTEXT_VALUE 
FROM   ITF_DOCUMENTS 
where  contains(dox_blob, 'ANT KILLER') > 0
/



In the example above, I have several Word and PDF documents that contains "ANT KILLER", but only the Word documents are retrieved. Am I missing something?

Thx!

Cheers,
Tom Kyte
April 30, 2007 - 3:39 pm UTC

did you look for errors during the indexing process - pdf works fine - there are filters for that. Might be a setup issue.

intermedia Thesaurus terms with '('

MSR, April 30, 2007 - 6:38 pm UTC

How do we build a interMedia index using thesaurus file with data like below?
NASDAQ: HFWA
NASDAQ/ NM:XXX
NASDAQ: XXX
(NASDAQ: HFWA
(NASDAQ/ NM:XXX
(NASDAQ: XXX

CTX_load does not allow "(" as leading character.
How do I escape "(" character to build intermedia index using thesaurus?

I tried using backslash character to escape "(" , but no luck (CTX_load failed).

Thanks


Oracle text

Raghav, May 14, 2007 - 7:58 am UTC

Hi tom,

While searching through oracle text I am surprised to see that the search is able to identify the plural form of the words like mouse - mice, woman-women, man-men, mango, mangoes etc.

If I want to add new words or some abbreviations in the dictionary so as to increase the search capability of my queries where can I add these new words or groups?

what is theasurus? can you please give some brief on that.

Thanks in advance
Tom Kyte
May 14, 2007 - 2:11 pm UTC

have you read the documentation? it is on otn.oracle.com.....

Regarding oracle text

Hemal, May 24, 2007 - 10:05 am UTC

Hello Tom,
I am having a zip file containing 10 microsoft word documents (.doc documents).The zip file is present on the application server.

The database table is having the path to the zip file.
Database is present on different server.

Using Oracle Text can I perform the search of one particular character string in these 10 documents present in the zip file.

Please advice
Thanks and Best Regards
-Hemal


Tom Kyte
May 26, 2007 - 11:26 am UTC

not if the file is not accessible to the database - which it isn't, it is on a file system on another machine.

PATH FOR FILES ON DIFFERENT SERVER

Laxman Kondal, August 24, 2007 - 2:12 pm UTC

Hi Tom,

How do i give path for files on different server - Windows-2000 and using Oracle 9i.

C:\>PING EEI-027

Pinging EEI-027.eei1.com [172.16.100.67] with 32 bytes of data:

Reply from 172.16.100.67: bytes=32 time<1ms TTL=128
...
Ping statistics for 172.16.100.67:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
...
C:\>

SQL> desc mydocs
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ID                                                 NUMBER
 THEFILE                                            VARCHAR2(100)

SQL> insert into mydocs values(1, '\\EEI-027\dpar_doc\DP10039171_012705.pdf');

1 row created.

SQL> commit;

Commit complete.

SQL> create index mydocs_indx on mydocs( thefile)
  2  indextype is ctxsys.context
  3  parameters( 'datastore ctxsys.file_datastore');

Index created.

SQL> select *  from ctx_user_index_errors;

ERR_INDEX_NAME                 ERR_TIMES ERR_TEXTKEY
------------------------------ --------- ------------------
ERR_TEXT
--------------------------------------------------------------------------------
MYDOCS_INDX                    24-AUG-07 AAAGjwAAGAAACpYAAD
DRG-11513: unable to open or write to file \\EEI-027\dpar_doc\DP10039171_012705.
pdf

SQL>

Even if change hostname to IP or '\\' to '//' still error.

File folder is shared and with Window Explore \\EEI-027\dpar_doc I can see the file.

Am I missing any thing which spits this error?

What will the the path if Oracle is on Linux and files on Windows or Oracle on Windows and files on Linux box.

Thanks and regards.

Tom Kyte
August 27, 2007 - 3:56 pm UTC

the account that oracle is running as must be able to see this, windows makes this so very impossibly hard to do (something that on unix is trivial is so hard on windows)

You'll definitely want to hit up metalink for the latest greatest info but
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:241814624807

You have to understand that just because YOUR logon on YOUR machine (which might be the database machine -but so what) can see something does not mean others on that machine can.

Oracle Text questions (indexing, mail_filter)

belu, May 05, 2009 - 5:45 am UTC

Hi, Tom

I am new in Oracle Text, and now I'm using Oracle Text 10.2. First, I have a table

 create table doc (id_doc number primary key, email_txt clob not null, sender varchar(50), email_tkn varchar2(4000));

 Table created.

 SQL> create index myindex on doc(email_txt)
 2  indextype is ctxsys.context
 3  ;

 Index created.
 
email_txt column filled with email in RFC-822 standard (one full email per record).
I need to fill the sender column with the email sender from email in email_txt,
and email_tkn must be generated from email_txt too. 

if the email in email_txt is like this :
"we provide you with sales leads and a list of names, addresses and phone numbers visiting your Website. 
This can significantly increase the value of your future B2B marketing strategies as it allows you to conduct very direct, customised marketing"

the desirable result in email_tkn is:
"provide sale lead list name address phone number visit website 
significant increase value future b2b market strategy allow conduct direct custom market"
 
(just the body part of email, without stopword, and every word is in the root form) 

should i use MAIL_FILTER?but from reference i've read, mail_filter have a limitation, input's lines must not be longer than 1024 byte,
besides most of the document longer than that. and often, this is an HTML message.

can you please help me to solve this problem Tom?

Thanks

Belu

oracle text

Arvind, October 25, 2009 - 5:58 pm UTC

Hello tom

Sorry for stupid quetsion but I can not find code http://asktom.oracle.com/wa/ask/owa/ask_tom_pkg.search

I tried to search it using search on main page as well as on your blog.

Please help me to find it.


Thanks,

Arvind


Tom Kyte
October 26, 2009 - 2:12 pm UTC

just go to the home page and do a search

or use advanced search.

Oracle Text

Arvind Mishra, October 28, 2009 - 4:19 am UTC

Sorry....missed the word interact...Actually what I was thinking that I will find code of the search...

:-)

Thanks,

Arvind
Tom Kyte
October 28, 2009 - 8:04 am UTC

see

https://www.oracle.com/technetwork/issue-archive/2009/09-jul/o49asktom-090487.html

it'll look very much like that example - but using contains for the text part.

Blob Tuning

Manoj, January 12, 2010 - 6:24 am UTC

Hi Tom,

Can you please suggest how can we tune the retrieval or insertion ob BLOB or CLOB? It takes a lot of time when we are doing the operation.

Thanks a lot.
Tom Kyte
January 18, 2010 - 12:47 pm UTC

you give me nothing, again, just like your other question.

no details, no languages, no "this is what we do", no "this is what we need to accomplish"


I'm sorry, silver magic bullets do not exist, one needs a context, a set of conditions - to explain how to streamline some process.

Intermedia performance issue with CTXCAT

A reader, August 24, 2010 - 8:30 am UTC

Tom,

Have you noticed if you query DBA_INDEXES for DOMAIN indexes that no matter how you try to gather stats (eg. DBMS_STATS,ANALYZE TABLE,ANALYZE INDEX,etc), the stats column LAST_ANALYZED is never updated for CTXCAT type of indexes but they are for CONTEXT ?

We seem to be getting different execution plans on 2 different systems where in one it drives of the Intermedia CTXCAT index and thinks the cardinality is a few hundred, whereas on another system it thinks there's many thousands thus causing a full table scan to be performed on a table it joins on.

We thought our stats may be stale, but no matter which stats gathering method is used, LAST_ANALYZED is never updated for CTXCAT, so wonder how Oracle determines the cardinality if the index isn't analyzed?

OCCURENCE BASED TEXT SEARCH

Arvind Mishra, December 07, 2010 - 3:46 am UTC

Hello Tom,

I want to get rank of search based on occurance and not the relevence. Script for test case is as follows:

CREATE TABLE docs (id NUMBER PRIMARY KEY, text VARCHAR2(200));

INSERT INTO docs VALUES(1, '<HTML>California is a state in the US.</HTML>');
INSERT INTO docs VALUES(2, '<HTML>Paris is a city in France.</HTML>');
INSERT INTO docs VALUES(3, '<HTML>France is in Europe.</HTML>');
INSERT INTO docs VALUES(4, '<HTML>France France France is in Europe.</HTML>');
INSERT INTO docs VALUES(5, '<HTML>France France France is in Europe.</HTML>');
INSERT INTO docs VALUES(6, '<HTML>France France France France is in Europe.</HTML>');


--EXEC CTX_DDL.SYNC_INDEX('idx_docs', '2M');

commit;

DROP INDES IDX_DOCS;

CREATE INDEX idx_docs ON docs(text)
INDEXTYPE IS CTXSYS.CONTEXT;

SELECT SCORE(1), id, text FROM docs WHERE CONTAINS(text, 'France', 1) > 0;

SELECT SCORE(1), id, text FROM docs WHERE CONTAINS(text, 'DEFINESCORE ( France, OCCURENCE)', 1) > 0;

My problem is last query where i am using DEFINESCORE is giving following error:

ORA-29902: error in executing ODCIIndexStart() routine
ORA-20000: Oracle Text error:
DRG-11124: invalid value OCCURENCE
29902. 00000 - "error in executing ODCIIndexStart() routine"
*Cause: The execution of ODCIIndexStart routine caused an error.
*Action: Examine the error messages produced by the indextype code and
take appropriate action.

Please help me.

Tom Kyte
December 07, 2010 - 10:53 am UTC

OCCURRENCE has two R's

OCCURENCE BASED TEXT SEARCH

Arvind Mishra, December 07, 2010 - 10:10 pm UTC

:-) . Thanks a lot

Oracle text reference guide also needs correction at 3-14.

Regards,

Arvind Mishra


Tom Kyte
December 08, 2010 - 9:33 am UTC

OCCURENCE BASED TEXT SEARCH

Arvind Mishra, December 08, 2010 - 1:05 am UTC

Hi Tom,

I am trying to execute following code:

begin
ctx_doc.markup(index_name => 'idx_docs',
textkey => '2',
text_query => 'France',
restab => 'markuptab',
query_id => 1,
tagset => 'HTML_DEFAULT');

end;
/

But I am getting following error:

Error report:
ORA-20000: Oracle Text error:
DRG-50857: oracle error in ctx_doc.markup
ORA-00904: "DOCUMENT": invalid identifier
ORA-06512: at "CTXSYS.DRUE", line 160
ORA-06512: at "CTXSYS.CTX_DOC", line 1304

Please help me on this.

Also can we pass following query to ctx_doc.markup and get all resultant rows in markuptab table:

V_SQL := 'SELECT id, text FROM docs WHERE CONTAINS(text,' || CHR(39) || 'DEFINESCORE ( France, OCCURRENCE)' ||CHR(39) ||' , 1) > 0';

I am new to Oracle text. Please help me.

Thanks,

Arvind Mishra
Tom Kyte
December 08, 2010 - 9:36 am UTC

give full example so we can reproduce. create tables, create index, everything.

OCCURRENCE BASED TEXT SEARCH

Arvind Mishra, December 08, 2010 - 9:57 pm UTC

I have pdf file of Oracle® Text Reference 11g Release 2 (11.2) where on page 3-14 it was mentioned as:
’DEFINESCORE (dog, OCCURENCE)’

I also checked at on line pdf from oracle site where it is mentioned as:
’DEFINESCORE (dog, OCCURRENCE)’

It seems like Oracle has already done the correction.

Thanks,

Arvind

OCCURRENCE BASED SEARCH

Arvind Mishra, December 08, 2010 - 10:04 pm UTC

Hello Tom,

Here is the full code:

CREATE TABLE docs (id NUMBER PRIMARY KEY, text VARCHAR2(200));

INSERT INTO docs VALUES(1, '<HTML>California is a state in the US.</HTML>');
INSERT INTO docs VALUES(2, '<HTML>Paris is a city in France.</HTML>');
INSERT INTO docs VALUES(3, '<HTML>France is in Europe.</HTML>');
INSERT INTO docs VALUES(4, '<HTML>France France France is in Europe.</HTML>');
INSERT INTO docs VALUES(5, '<HTML>France France France is in Europe.</HTML>');
INSERT INTO docs VALUES(6, '<HTML>France France France France is in Europe.</HTML>');

commit;

DROP INDES IDX_DOCS;

CREATE INDEX idx_docs ON docs(text)
INDEXTYPE IS CTXSYS.CONTEXT;

CREATE TABLE MARKUPTAB ( QUERY_ID NUMBER, DOC clob);


BEGIN

ctx_doc.markup(index_name => 'idx_docs',
textkey => '2',
text_query => 'France',
restab => 'markuptab',
query_id => 1,
tagset => 'HTML_DEFAULT');


END;
/

I am getting following error:

Error report:
ORA-20000: Oracle Text error:
DRG-50857: oracle error in ctx_doc.markup
ORA-00904: "DOCUMENT": invalid identifier
ORA-06512: at "CTXSYS.DRUE", line 160
ORA-06512: at "CTXSYS.CTX_DOC", line 1304
ORA-06512: at line 3
20000. 00000 - "%s"
*Cause: The stored procedure 'raise_application_error'
was called which causes this error to be generated.
*Action: Correct the problem as described in the error message or contact
the application administrator or DBA for more information.


Thanks,

Arvind

Tom Kyte
December 09, 2010 - 10:28 am UTC

ops$tkyte%ORA11GR2> CREATE TABLE MARKUPTAB ( QUERY_ID NUMBER, <b>DOCument</b> clob);

Table created.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> BEGIN
  2  
  3        ctx_doc.markup(index_name => 'idx_docs',
  4                          textkey => '2',
  5                          text_query => 'France',
  6                          restab => 'markuptab',
  7                          query_id => 1,
  8                          tagset => 'HTML_DEFAULT');
  9  end;
 10  /

PL/SQL procedure successfully completed.


document, not doc


http://docs.oracle.com/docs/cd/E11882_01/text.112/e16593/arestab.htm#i635134

OCCURRENCE BASED SEARCH

Arvind Mishra, December 14, 2010 - 12:18 am UTC

Hello Tom,

I am using same example given in previous question. My query

SELECT SCORE(1), id, text FROM docs WHERE CONTAINS(text, 'DEFINESCORE ( France, OCCURRENCE)', 1) > 0;

is returning following results:

"SCORE(1)" "ID" "TEXT"
1 2 "<HTML>Paris is a city in France.</HTML>"
1 3 "<HTML>France is in Europe.</HTML>"
3 4 "<HTML>France France France is in Europe.</HTML>"
3 5 "<HTML>France France France is in Europe.</HTML>"
4 6 "<HTML>France France France France is in Europe.</HTML>"



I want to format the word Paris in bold and red font. I am able to do this with following code:

DECLARE

cursor c_dis is
SELECT SCORE(1), id, text FROM docs WHERE CONTAINS(text, 'DEFINESCORE ( France, OCCURRENCE)', 1) > 0;

v_dis c_dis%rowtype;

BEGIN
open c_dis;

fetch c_dis into v_dis;

while c_dis%found loop

CTX_DOC.MARKUP( index_name => 'idx_docs',
textkey => v_dis.id,
text_query => 'France',
restab => 'markuptab',
query_id => v_dis.id,
--plaintext => true,
starttag => '<font color="red">',
endtag => '
</font>' );

fetch c_dis into v_dis;

end loop;

END;
/

However if I try following code I am getting error:

DECLARE

V_SQL VARCHAR2(2000);

BEGIN

V_SQL := 'SELECT id, text FROM docs WHERE CONTAINS(text,' || CHR(39) || 'DEFINESCORE ( France, OCCURRENCE)' ||CHR(39) ||' , 1) > 0';

DBMS_OUTPUT.PUT_LINE(V_SQL);

CTX_DOC.MARKUP( index_name => 'idx_docs',
textkey => 'id',
text_query => v_sql,
restab => 'markuptab',
query_id => userenv('sessionid'),
plaintext => true,
starttag => '<font color="red">',
endtag => '
</font>' );

END;
/

Error:

Error report:
ORA-20000: Oracle Text error:
DRG-50857: oracle error in drvdoc.get_rowid
ORA-01722: invalid number
ORA-06512: at "CTXSYS.DRUE", line 160
ORA-06512: at "CTXSYS.CTX_DOC", line 1301
ORA-06512: at line 11
20000. 00000 - "%s"
*Cause: The stored procedure 'raise_application_error'
was called which causes this error to be generated.
*Action: Correct the problem as described in the error message or contact
the application administrator or DBA for more information.
SELECT id, text FROM docs WHERE CONTAINS(text,'DEFINESCORE ( France, OCCURRENCE)' , 1) > 0

Do you think something like above is possible?

Thanks,

Arvind


Tom Kyte
December 14, 2010 - 7:19 am UTC

I don't understand what you are trying to do in the last one. Posting a bit of code that doesn't work without explaining what you were attempting to do and not explaining what you were trying to do doesn't work.

tell us what your goal is, not how you tried to do it ;)

Occurrance based search

Arvind, December 15, 2010 - 12:03 am UTC

Hello Tom,

My objective is to format the word Paris in bold and red font in the resultset of following query

SELECT SCORE(1), id, text FROM docs WHERE CONTAINS(text, 'DEFINESCORE ( France, OCCURRENCE)', 1) >
0;

is returning following results:

"SCORE(1)" "ID" "TEXT"
1 2 "<HTML>Paris is a city in France.</HTML>"
1 3 "<HTML>France is in Europe.</HTML>"
3 4 "<HTML>France France France is in Europe.</HTML>"
3 5 "<HTML>France France France is in Europe.</HTML>"
4 6 "<HTML>France France France France is in Europe.</HTML>"

I have done this with while loop as posted in my previous question. But I am wondering if I can pass following sql query to directly to CTX_DOC.MARKUP procedure's text_query parameter and avoid the loop. That was I tried in following code:

DECLARE

V_SQL VARCHAR2(2000);

BEGIN

V_SQL := 'SELECT id, text FROM docs WHERE CONTAINS(text,' || CHR(39) || 'DEFINESCORE ( France,
OCCURRENCE)' ||CHR(39) ||' , 1) > 0';

DBMS_OUTPUT.PUT_LINE(V_SQL);

CTX_DOC.MARKUP( index_name => 'idx_docs',
textkey => 'id',
text_query => v_sql,
restab => 'markuptab',
query_id => userenv('sessionid'),
plaintext => true,
starttag => '<font color="red">',
endtag => '
</font>' );

END;
/

Can we do this?

Thanks for your patience.

Regards,

Arvind



Tom Kyte
December 15, 2010 - 11:18 am UTC

no you cannot avoid the loop, each document must be marked up. markup works on a document at a time.

solution for Arvind Mishra's problem

Barbara Boehmer, December 15, 2010 - 1:33 pm UTC

Arvind keeps saying that he wants to mark the word Paris, but his code indicates he wants to mark the work France that he has searched for, so I will assume that he means France, not Paris. It appears that his ultimate goal is to get the score, id, and marked-up text in one result set. Although ctx_doc.markup only processes one row at a time, it is possible to create a user-defined wrapper function that executes the ctx_doc.markup procedure, using in-memory result storage instead of result table storage, returning marked-up text, and use that function in the select statement, so that you do not have to join the original query to the result table. Please see the demonstration below that passes the search string to the contains clause and function as a bind variable.

SCOTT@orcl_11gR2> CREATE TABLE docs (id NUMBER PRIMARY KEY, text VARCHAR2(200))
2 /

Table created.

SCOTT@orcl_11gR2> BEGIN
2 INSERT INTO docs VALUES(1, '<HTML>California is a state in the US.</HTML>');
3 INSERT INTO docs VALUES(2, '<HTML>Paris is a city in France.</HTML>');
4 INSERT INTO docs VALUES(3, '<HTML>France is in Europe.</HTML>');
5 INSERT INTO docs VALUES(4, '<HTML>France France France is in Europe.</HTML>');
6 INSERT INTO docs VALUES(5, '<HTML>France France France is in Europe.</HTML>');
7 INSERT INTO docs VALUES(6, '<HTML>France France France France is in Europe.</HTML>');
8 END;
9 /

PL/SQL procedure successfully completed.

SCOTT@orcl_11gR2> CREATE INDEX idx_docs ON docs(text) INDEXTYPE IS CTXSYS.CONTEXT
2 /

Index created.

SCOTT@orcl_11gR2> CREATE OR REPLACE FUNCTION your_markup
2 (p_index_name IN VARCHAR2,
3 p_textkey IN VARCHAR2,
4 p_text_query IN VARCHAR2,
5 p_plaintext IN BOOLEAN DEFAULT TRUE,
6 p_tagset IN VARCHAR2 DEFAULT 'HTML_DEFAULT',
7 p_starttag IN VARCHAR2 DEFAULT '<font color="red">',
8 p_endtag IN VARCHAR2 DEFAULT '</font>',
9 p_key_type IN VARCHAR2 DEFAULT 'PRIMARY_KEY')
10 RETURN CLOB
11 AS
12 v_mklob CLOB;
13 BEGIN
14 CTX_DOC.SET_KEY_TYPE (p_key_type);
15 CTX_DOC.MARKUP
16 (index_name => p_index_name,
17 textkey => p_textkey,
18 text_query => p_text_query,
19 restab => v_mklob,
20 plaintext => p_plaintext,
21 tagset => p_tagset,
22 starttag => p_starttag,
23 endtag => p_endtag);
24 RETURN v_mklob;
25 END your_markup;
26 /

Function created.

SCOTT@orcl_11gR2> SHOW ERRORS
No errors.
SCOTT@orcl_11gR2> VARIABLE g_text_query VARCHAR2(200)
SCOTT@orcl_11gR2> EXEC :g_text_query := 'DEFINESCORE ( France, OCCURRENCE)'

PL/SQL procedure successfully completed.

SCOTT@orcl_11gR2> COLUMN markedup_text FORMAT A45 WORD_WRAPPED
SCOTT@orcl_11gR2> SELECT SCORE(1), id,
2 your_markup ('idx_docs', id, :g_text_query) as markedup_text
3 FROM docs
4 WHERE CONTAINS (text, :g_text_query, 1) > 0
5 ORDER BY SCORE(1) DESC
6 /

SCORE(1) ID MARKEDUP_TEXT
---------- ---------- ---------------------------------------------
4 6 <HTML><font color="red">France</font> <font
color="red">France</font> <font
color="red">France</font> <font
color="red">France</font> is in
Europe.</HTML>

3 4 <HTML><font color="red">France</font> <font
color="red">France</font> <font
color="red">France</font> is in
Europe.</HTML>

3 5 <HTML><font color="red">France</font> <font
color="red">France</font> <font
color="red">France</font> is in
Europe.</HTML>

1 2 <HTML>Paris is a city in <font
color="red">France</font>.</HTML>

1 3 <HTML><font color="red">France</font> is in
Europe.</HTML>


5 rows selected.

SCOTT@orcl_11gR2>

Tom Kyte
December 16, 2010 - 2:06 am UTC

don't forget to close those lobs in the client application that fetches them - that is a potential "cursor leak" if you do not.

but thanks!

solution for Arvind Mishra's problem

Arvind Mihshra, December 16, 2010 - 10:28 pm UTC

Thanks a lot Barbara.

Paris and France ...Ahhh...my bad. But deep inside me Paris = France ;-).

Regards,

Arvind

Markup Starttag contents misalignment

Carl, October 17, 2011 - 6:08 pm UTC

Using 10.2.0.3, we have been using a similar approach as to the original example in this topic for several years now without issue. Now, for some unexplained reason in one of the html-marked up CLOBS, the ctx_doc.markup "starttag" contents is placed in the portion of the html code prior to the actual value thereby providing a somewhat ridiculous presentation.

I've read the cautionary note about "Oracle text does not guarantee well-formatted output...." but, again, have never experienced a problem before. The source of the package used to generate the html-formatted CLOB contents has not changed so why this happens now escapes me. In another environment, using 10.2.0.4, the problem does not occur.

Can you shed any light as to if this package does indeed suffer some problem with markup, why this would happen, and, if you believe we might want to consider writing our own "home brew" markup function?

By the way, we are using the "markup table" implementation in lieu of "in memory" approach although I would think the actual html markup methodology would be the same regardless.
Tom Kyte
October 17, 2011 - 6:37 pm UTC

this sounds like something you would need to bring up with support - if they've changed the behavior on purpose - that would be one thing and they'll tell you why. If they have not - then it would be a 'product issue'.

you'll need to have a small reproducible test case - one that reproduces on your system at least - as small as possible (sounds like it should be a one row, one column table for this test case).

Searching for Synonyms

Oracle User, August 02, 2015 - 6:06 am UTC

Hi

We are creating a Oracle Text Search capability for a field containing company names. In this domain certain synonyms are very common, such as

Inc -> Incorporated
Ltd -> Limited
PLC -> Public Limited Company

So if one searches for
'Tata Steel Limited' - they should also find 'Tata Steel Ltd'
'Sigma Berl Public Limited Company' - they should also find 'Sigma Berl PLC'

What is the best way to achieve this?

I saw some simple examples of the oracle's Thesaurus feature - where i can set up the above as synonyms.

However how would i use that in my contains query. For example if user is entering 'Tata Steel Limited' - i want only the Limited word to be expanded to its equivalent words (not the words Tata and Steel)

What I have tried ...

BEGIN
ctx_thes.create_thesaurus ('DEFAULT');
ctx_thes.create_relation
('DEFAULT', 'limited', 'SYN', 'ltd');
ctx_thes.create_relation
('DEFAULT', 'public limited company', 'SYN', 'plc');
END;

-- This query does not find TATA STEEL LIMITED
select match_party_name from my_party_table
WHERE CONTAINS(party_name, 'SYN(TATA STEEL LTD)', 1) > 0;



Thanks in advance for any pointers

More to Explore

CLOB/BLOB/etc

Complete documentation on Securefiles and Large Objects here