Using stems...
Thomas Barry, December 02, 2002 - 4:15 pm UTC
Tom- thanks for the response. What would be your philosophy for providing 'stem' capable searches for web users who won't necessarily know about the different search constructs that Oracle Text offers? Here's an example of the functionality I'd like to provide without requiring the user to enter '$word':
Requirement: Allow the user to search our site for 'tablespace'. The search results should be basically the same regardless of whether the user enters 'tablespace' or 'tablespaces'
We could certainly massage the search string provided by the user to prefix search keys with '$', and maybe do nothing if Oracle Text constructs are supplied. I'd be curious to know if this is how you might handle such a requirement...
Thanks for this terrific resource!
December 02, 2002 - 4:28 pm UTC
That is how I would do it -- in fact that is exactly how i do do it.
You type in a string - i removed unwanted characters, look for " " quoted strings and preserve them, wrap each term in a {} and then put it back together.
I would anticipate you would have a checkbox
[] use plurality <or something to that effect>
and upon seeing that checked, you would put {$word} in there instead of just {word}
Indexing of new data
Martin, October 19, 2003 - 6:28 am UTC
Hi Tom,
I've learned a lot from this thread about oracle text, but i've got a small problem which is probably easy to remedy, but I can't find anything in the documentation regarding this (but it's problem just me!).
Basically, new data does not get returned from the query unless I recreate the index. Is there some mechanism I'm missing for refreshing the index before I issue the query?
This is 9.2.0.4.0.
SQL> SELECT * FROM t;
A
----------------
XYZ
W X YZ
X Y Z
SQL> create index t_ctx on t(a) indextype is ctxsys.context;
Index created.
SQL> SELECT * FROM T WHERE CONTAINS(a, 'X') > 0;
A
----------------
X Y Z
W X YZ
Ok, now insert a new row which should get returned.
SQL> INSERT INTO t VALUES ( 'V W X Y' );
1 row created.
SQL> SELECT * FROM T WHERE CONTAINS(a, 'X') > 0;
A
----------------
X Y Z
W X YZ
Recreate the index....
SQL> DROP INDEX t_ctx;
Index dropped.
SQL> create index t_ctx on t(a) indextype is ctxsys.context;
Index created.
SQL> SELECT * FROM T WHERE CONTAINS(a, 'X') > 0;
A
--------------------------------------------------------------------------------
V W X Y
X Y Z
W X YZ
As I said before, it's probably something I'm missing, but any advice would be much appreciated.
Regards
October 19, 2003 - 6:19 pm UTC
Stemming : Is it default for English
Ranjan, September 20, 2004 - 7:56 pm UTC
Hi Tom,
Regarding stemming i want to extend this discussion. When I do a theme on a piece of text it returns me "noticeableness" for "noticeble" . Is it a result of stemming ? If yes how can i remove stemming from the index ? Also when we create a context index , does the stemming parameter set to 'True' always ? Thanks a lot.
September 20, 2004 - 8:53 pm UTC
themes are not "stemmed" like that, they are themes.
stemming is a function of searching -- contains expands the stem terms out. It really isn't an "index" thing. Although, you can speed up a stem search by using the index_stems option in the BASIC_LEXER.
Splitting the Input Word and then Searching
Venkatesh Jayaraman, May 21, 2006 - 3:16 pm UTC
I am planning to use Oracle Text to improve the Search Capabilities of my program.
Suppose I have a table with a record having one of its column value as "FirstName LastName". This column is Indexed with CTXSYS.CONTEXT.
Now in my search UI, the user inputs "FirstNameLastName" without the space inbetween. Even if I use FUZZY operator supplied by Oracle Text, I am not able to get the record with the value having space inbetween the words.
How can I achieve the above functionality. Omitting spaces is a comman mistake made by users and I want the Fuzzy Operator to pick records even then.
May 21, 2006 - 8:16 pm UTC
omitting spaces is a comman mistake? how so?
Text does things "by words"
foo bar
foobar
three totally different words.
If you want spaces to not be considered, I doubt you'd use text - unless you used text to index a function a and the function returns a string without any spaces and probably does substring prefix indexing to speed leading wildcard searches...
fuzzy - deals with words, you have a word the user input (I'm still dumbfounded by the "they leave spaces out" - I've never encountered that myself) - and it'll not fuzzy match EITHER of the WORDS (stress plural) that you have indexed
Splitting the input word and searching
Venkatesh Jayaraman, May 22, 2006 - 5:34 am UTC
Just like the way users omit some letters in their word, it is equally possible for users to accidentally omit spaces.
Suppose the column has the value "Oracle Text"
When the user keys in "Oracl Text", FUZZY search is able to retrieve the record containing "Oracle Text".
In this case, the user has omitted the letter "e".
Its equally possible for the user to omit space between "Oracle Text". Why shouldnt it be possible.
Lets try to google "oracletext". Its able to clearly give "Did you mean: oracle text".
Similarly I would like such a functionality in Oracle Text.
May 22, 2006 - 7:58 am UTC
You would have to write it, no such functionality exists (as stated above)
Setting the input workd and searching
ravi, June 30, 2008 - 1:37 pm UTC
Hi Tom,
What is your opinion about Lucene ? Between Lucene and Oracle Text which one do you think is powerful for fuzzy matching ?
(I know I am asking VP, Oracle ...but still I trust your logical explanation).
Thanks,
Ravi
July 06, 2008 - 6:59 pm UTC
Well, I've never used Lucene.
Basically this is what I can say however....
Oracle Text started life outside the database - which is about the wrongest place on the planet for.... a search.
We spent a lot of time putting it into the database.
So you could ask the question 'show me the documents edited within the last 5 days that contain this' - that is RELATIONAL and TEXT at the same time
as well as have the backup, recoverability, securability and so on - of database data.
To me, it comes down to a lot more than "which has the best fuzzy search" - their search would have to be so far beyond - for me to even consider giving up the database aspect and all that comes with it.