Skip to Main Content

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Thomas.

Asked: December 02, 2002 - 9:21 am UTC

Last updated: July 06, 2008 - 6:59 pm UTC

Version: 9.2.0

Viewed 1000+ times

You Asked

Hi Tom,

I've created a new Oracle Text index with a user defined datastore. The index works, but I'm having trouble understanding how Oracle Text handles plural vs. singular when searching. For example:

SQL> insert into foo (col1) values ('elephants');

SQL> select count(*) from foo
where contains(col1, 'elephants', 1) > 0;

COUNT(*)
--------
1

SQL> select count(*) from foo
where contains(col1, 'elephant', 1) > 0;

COUNT(*)
--------
0

Is using the 'fuzzy' operator the best (or only) way to address this:

SQL> select count(*) from foo
where contains(col1, 'fuzzy(elephant, 70, 6, weight)', 1) > 0

COUNT(*)
--------
1

or is there something else that can be done when creating the index? I've tried using a wordlist preference with the STEM_FUZZY_PREF attribute set, which I thought would obviate the need to use the 'fuzzy' operator in the contains clause:

begin
ctx_ddl.drop_preference('STEM_FUZZY_PREF');
ctx_ddl.create_preference('STEM_FUZZY_PREF', 'BASIC_WORDLIST');
ctx_ddl.set_attribute('STEM_FUZZY_PREF','FUZZY_MATCH','ENGLISH');
ctx_ddl.set_attribute('STEM_FUZZY_PREF','FUZZY_SCORE','60');
ctx_ddl.set_attribute('STEM_FUZZY_PREF','FUZZY_NUMRESULTS','5000');
ctx_ddl.set_attribute('STEM_FUZZY_PREF','SUBSTRING_INDEX','TRUE');
ctx_ddl.set_attribute('STEM_FUZZY_PREF','STEMMER','ENGLISH');
end;

but that doesn't have any effect. BTW, what does the 'contains' clause in the search functionality of your asktom application look like?

Thanks much!

and Tom said...

I think you just want to use a stem search

</code> http://docs.oracle.com/cd/B10501_01/text.920/a96518/cqoper.htm#14499 <code>

select count(*) from foo
where contains(col1, '$elephant', 1) > 0;

will do that out of the box.


If you do a search on asktom -- I print out the contains clause right on the "hit" page, just do a search and it'll be there.

Rating

  (6 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Using stems...

Thomas Barry, December 02, 2002 - 4:15 pm UTC

Tom- thanks for the response. What would be your philosophy for providing 'stem' capable searches for web users who won't necessarily know about the different search constructs that Oracle Text offers? Here's an example of the functionality I'd like to provide without requiring the user to enter '$word':

Requirement: Allow the user to search our site for 'tablespace'. The search results should be basically the same regardless of whether the user enters 'tablespace' or 'tablespaces'

We could certainly massage the search string provided by the user to prefix search keys with '$', and maybe do nothing if Oracle Text constructs are supplied. I'd be curious to know if this is how you might handle such a requirement...

Thanks for this terrific resource!

Tom Kyte
December 02, 2002 - 4:28 pm UTC

That is how I would do it -- in fact that is exactly how i do do it.

You type in a string - i removed unwanted characters, look for " " quoted strings and preserve them, wrap each term in a {} and then put it back together.

I would anticipate you would have a checkbox


[] use plurality <or something to that effect>


and upon seeing that checked, you would put {$word} in there instead of just {word}

Indexing of new data

Martin, October 19, 2003 - 6:28 am UTC

Hi Tom, 

I've learned a lot from this thread about oracle text, but i've got a small problem which is probably easy to remedy, but I can't find anything in the documentation regarding this (but it's problem just me!).

Basically, new data does not get returned from the query unless I recreate the index. Is there some mechanism I'm missing for refreshing the index before I issue the query?

This is 9.2.0.4.0.

SQL> SELECT * FROM t;

A
----------------
XYZ
W X YZ
X Y Z

SQL> create index t_ctx on t(a) indextype is ctxsys.context;

Index created.

SQL> SELECT * FROM T WHERE CONTAINS(a, 'X') > 0;

A
----------------
X Y Z
W X YZ

Ok, now insert a new row which should get returned.

SQL> INSERT INTO t VALUES ( 'V W X Y' );

1 row created.

SQL> SELECT * FROM T WHERE CONTAINS(a, 'X') > 0;

A
----------------
X Y Z
W X YZ

Recreate the index....

SQL> DROP INDEX t_ctx;

Index dropped.

SQL> create index t_ctx on t(a) indextype is ctxsys.context;

Index created.

SQL> SELECT * FROM T WHERE CONTAINS(a, 'X') > 0;

A
--------------------------------------------------------------------------------
V W X Y
X Y Z
W X YZ

As I said before, it's probably something I'm missing, but any advice would be much appreciated.

Regards
 

Stemming : Is it default for English

Ranjan, September 20, 2004 - 7:56 pm UTC

Hi Tom,
Regarding stemming i want to extend this discussion. When I do a theme on a piece of text it returns me "noticeableness" for "noticeble" . Is it a result of stemming ? If yes how can i remove stemming from the index ? Also when we create a context index , does the stemming parameter set to 'True' always ? Thanks a lot.

Tom Kyte
September 20, 2004 - 8:53 pm UTC

themes are not "stemmed" like that, they are themes.

stemming is a function of searching -- contains expands the stem terms out. It really isn't an "index" thing. Although, you can speed up a stem search by using the index_stems option in the BASIC_LEXER.

Splitting the Input Word and then Searching

Venkatesh Jayaraman, May 21, 2006 - 3:16 pm UTC

I am planning to use Oracle Text to improve the Search Capabilities of my program.

Suppose I have a table with a record having one of its column value as "FirstName LastName". This column is Indexed with CTXSYS.CONTEXT.

Now in my search UI, the user inputs "FirstNameLastName" without the space inbetween. Even if I use FUZZY operator supplied by Oracle Text, I am not able to get the record with the value having space inbetween the words.

How can I achieve the above functionality. Omitting spaces is a comman mistake made by users and I want the Fuzzy Operator to pick records even then.

Tom Kyte
May 21, 2006 - 8:16 pm UTC

omitting spaces is a comman mistake? how so?

Text does things "by words"

foo bar
foobar

three totally different words.

If you want spaces to not be considered, I doubt you'd use text - unless you used text to index a function a and the function returns a string without any spaces and probably does substring prefix indexing to speed leading wildcard searches...

fuzzy - deals with words, you have a word the user input (I'm still dumbfounded by the "they leave spaces out" - I've never encountered that myself) - and it'll not fuzzy match EITHER of the WORDS (stress plural) that you have indexed

Splitting the input word and searching

Venkatesh Jayaraman, May 22, 2006 - 5:34 am UTC

Just like the way users omit some letters in their word, it is equally possible for users to accidentally omit spaces.

Suppose the column has the value "Oracle Text"

When the user keys in "Oracl Text", FUZZY search is able to retrieve the record containing "Oracle Text".

In this case, the user has omitted the letter "e".

Its equally possible for the user to omit space between "Oracle Text". Why shouldnt it be possible.

Lets try to google "oracletext". Its able to clearly give "Did you mean: oracle text".

Similarly I would like such a functionality in Oracle Text.



Tom Kyte
May 22, 2006 - 7:58 am UTC

You would have to write it, no such functionality exists (as stated above)

Setting the input workd and searching

ravi, June 30, 2008 - 1:37 pm UTC

Hi Tom,

What is your opinion about Lucene ? Between Lucene and Oracle Text which one do you think is powerful for fuzzy matching ?
(I know I am asking VP, Oracle ...but still I trust your logical explanation).

Thanks,
Ravi
Tom Kyte
July 06, 2008 - 6:59 pm UTC

Well, I've never used Lucene.

Basically this is what I can say however....

Oracle Text started life outside the database - which is about the wrongest place on the planet for.... a search.

We spent a lot of time putting it into the database.

So you could ask the question 'show me the documents edited within the last 5 days that contain this' - that is RELATIONAL and TEXT at the same time

as well as have the backup, recoverability, securability and so on - of database data.

To me, it comes down to a lot more than "which has the best fuzzy search" - their search would have to be so far beyond - for me to even consider giving up the database aspect and all that comes with it.