Skip to Main Content

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question.

Asked: January 22, 2020 - 8:32 pm UTC

Last updated: January 31, 2022 - 2:13 am UTC

Version: 12.1

Viewed 1000+ times

You Asked

In one of the databases , we have list of courses , descriptions , outlines , ...
We would like to implement "google" like search

( ie)

a) If I type the keyword "data" , it look for databases , data mining , big data ...

b) If I type the keyword such as "Project Mgmt" , it should auto correct the criteria as "Project Management" ...
Kinda of knowledge search

It looks like , Oracle Text can't do this .
Please let me know what technology/algorithm I can use to achieve this , if possible .

Thanks

and Connor said...

For (a), text search *can* do that. That's pretty much what we use on AskTOM here to provide search.

We want an index across "all" the text columns that make up a question, an answer and the reviews. So we have a procedure to do that:

CREATE OR REPLACE PROCEDURE  asktom.ate_question_idx
(rid in rowid, tlob in out nocopy clob)
is
begin
for c1 in (select q.subject, q.question, q.answer, q.id
             from ate_submitted_questions q,
                  ate_admins a
            where q.rowid = rid
            and   q.admin_id = a.id(+)
            )
loop
    tlob := tlob || c1.subject || ' ';

    if c1.email is not null then
        tlob := tlob || c1.email || ' ';
    end if;

    tlob := tlob || c1.question || ' ';

    if c1.answer is not null and length(c1.answer) > 0
    then
        tlob := tlob || c1.answer || ' ';
    end if;

    for c2 in (select review_comments, followup_comments
               from ate_question_reviews
               where question_id = c1.id
               order by created_on)
    loop
        if c2.review_comments is not null
        then
            tlob := tlob || c2.review_comments ||' ';
        end if;

        if c2.followup_comments is not null
        then
            tlob := tlob || c2.followup_comments ||' ';
        end if;
    end loop;
  end if;

end loop;
end;


and then we can define a text index that uses that output at the "column" for the text index to be based on

begin
  ctxsys.ctx_ddl.create_preference('ATE_SEARCH_IDX_DST','USER_DATASTORE');
  ctxsys.ctx_ddl.set_attribute('ATE_SEARCH_IDX_DST','PROCEDURE','ASKTOM.ATE_QUESTION_IDX');
end;
/

create index ATE_SEARCH_IDX 
  on ATE_SUBMITTED_QUESTIONS(ANSWER)
  indextype is ctxsys.context
  parameters('
    datastore       ATE_SEARCH_IDX_DST
    sync (every SYSDATE + 4/24 memory 67108864)
  ')
/



So even though the index is defined for on the "ANSWER" column, the procedure above means that we'll go collect all the other information as well to put in the index.

For (b) ... no we don't do the "auto correct" but we do provide fuzzy search capability

https://docs.oracle.com/database/121/CCAPP/GUID-098F287A-DD02-42E1-B685-B1E4ED108FBA.htm


Rating

  (2 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

What is the Fuzzy Search criteria that Ask Tom uses

A reader, March 01, 2021 - 10:41 pm UTC

Suppose I enter "Oracle Text" for the search. what is the parsed string? Is it just

...
and contains(answer, 'fuzzy({oracle text}, 30, 2000)') > 0
...



Connor McDonald
March 02, 2021 - 2:03 am UTC

We do a little bit of pre-processing on the inputs etc, but basically:

contains(info_col, :your_criteria, 99) > 0}


"info_col" being a consolidation of keywords in the question, answer, comments and other things

Marcel van Lare, January 29, 2022 - 11:45 am UTC

We have build a solution like this where a materialized view (on-commit or fast) puts the data in rows with a searchfield string and text index. It is an oltp system. Works quite nice for a not too big system. Only it seems that it doesnt scale super, on increasing search mviews, and database overhead logging. Also important how realtime you need it (is it really necessary?) and to have a good mechanism to handle bulk changes , and keeping the oltp system performing for other users.

Kind regards, Marcel
Connor McDonald
January 31, 2022 - 2:13 am UTC

Agreed.

We update the asktom index not on commit but on a set frequuency.

More to Explore

Design

New to good database design? Check out Chris Saxon's full fundamentals class.