You Asked

Hi Tom,

I want intermedia query result sorted by date desc (last in first out)
but can not afford sorting (20 million rows web application).

table created as:

create table menus
( id number primary key,
recipe_xml_text clob,
date_created date
)

intermedia index created as:

create index xml_idx on menus (recipe_xml_text)
indextype is ctxsys.context
parameters ('section group ctxsys.auto_section_group')

regular index created as:
create index date_created_idx on menus (date_created desc)

query A:
select id,date_created from menus where
contains(recipe_xml_text,'keyword within tags')>0
and date_created <= SYSDATE
and rownum < 101 order by date_created desc;

The query result set did not satisfy the last in first out by date.

query B: added first_rows hint.
select /*+ FIRST_ROWS */ id,date_created from menus where
contains(recipe_xml_text,'keyword within tags')>0
and date_created <= SYSDATE
and rownum < 101 order by date_created desc;

the matched rows came out with latest date first,but if there is no match on the search text,
the query did not come back right away.

Both queries ended up with the same Execution Plan:
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=HINT: FIRST_ROWS (Cost=59 Card=1
Bytes=210)

1 0 COUNT (STOPKEY)
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'MENUS
' (Cost=55 Card=1 Bytes=210)

3 2 INDEX (RANGE SCAN) OF 'DATE_CREATED_INDEX' (NO
N-UNIQUE) (Cost=6 Card=1)

Is there a solution for query B to come back right away if no match is found ?

I know it's difficult to combine domain index with regular index (to use xml_idx
and date_created_idx simultaneously),If only the score can be used here,is there any
way I can manipulate the score (like put more weight on the date_created) so that
order by score desc will generate the desired (last in first out) result ?

Thank You!

and Tom said...

Well, your last query there says:

o goto the index and row by row check each row if the row contains that data (20,000,000 times!!!)

If the "latest" rows don't have matches, this will take a long time.

o the query that uses the text index first AND THEN sorts will be very very fast if there are a couple of hundred rows.

It is my humble opinion that if a search returns more then say 500 hits, it is a totally useless search. It NEEDS to be refined. No one is EVER going to look at more then that (goto google or anywhere, you generally get 10 hits/page, 10 pages to click thru... Have you ever gone further then that?)

In fact, goto google and search for Oracle. They show 10 hits/page. They showed me 83 pages -- I am 100% sure there are more then 830 pages out there with "oracle". In fact on the very last page they say:

"In order to show you the most relevant results, we have omitted some entries very similar to the 822 already displayed."

basically saying "refine your SEARCH!!!!"

What I do is this:

o limit the text search to 500 hits.
o order by WHATEVER after that.

sorting 500 rows will take fractions of a second, finding 500 rows will too. I then also put up "your search returned WAY too much data, here are 500 good hits. Please refine your criteria to get more meaningful results".

You cannot manipulate the score, the score is based on the words in the text and the search terms.

Rating

(3 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Strange behavior on inline join

Jimmy Lee, March 01, 2004 - 5:31 am UTC

Hi Tom,

Here is similiar case, yet temporary table comes some interesting result. Is there any reason why inline join could not run as fast as PK of temporary table? Thanks.

Create table Objects (GUID varchar2(20) not null, CreateDt date, primary key(GUID));
Create table ObjectContent (GUID varchar2(20) not null, Content clob, primary key(GUID), FOREIGN KEY(GUID) REFERENCES Objects(GUID));
CREATE INDEX OBJGLUEX ON ObjectContent(Content) indextype IS ctxsys.context parameters ('lexer my_lexer');

Case 1: Straightforward query on table join, it takes more than 30 seconds...

select /*+ FIRST_ROWS */ * from (
select /*+ FIRST_ROWS */ a.*, rownum rnum from (
select /*+ FIRST_ROWS */ distinct o.GUID, o.CreateDT
from Objects o, ObjectContent oc
where oc.guid=o.guid
and contains (oc.Content, 'Power') > 0 order by o.CreateDT desc
) a
where rownum <= 20
) where rnum > 0;

20 rows selected.

Execution plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=HINT: FIRST_ROWS (Cost=6 Card=1 Bytes=60)
1 0 VIEW (Cost=6 Card=1 Bytes=60)
2 1 COUNT (STOPKEY)
3 2 VIEW (Cost=6 Card=1 Bytes=47)
4 3 SORT (UNIQUE STOPKEY) (Cost=4 Card=1 Bytes=94)
5 4 NESTED LOOPS (Cost=3 Card=1 Bytes=94)
6 5 TABLE ACCESS (BY INDEX ROWID) OF 'OBJECTCONTENT' (Cost=2 Card=1 Bytes=47)
7 6 DOMAIN INDEX OF 'OBJGLUEX' (Cost=0)
8 5 TABLE ACCESS (BY INDEX ROWID) OF 'OBJECTS' (Cost=1 Card=1 Bytes=47)
9 8 INDEX (UNIQUE SCAN) OF 'SYS_C003964' (UNIQUE)

Statistic
----------------------------------------------------------
4907 recursive calls
0 db block gets
68956 consistent gets
9934 physical reads
0 redo size
1702 bytes sent via SQL*Net to client
514 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
20 rows processed

Case 2: Straightforward query on table join, it takes totally less than 10 seconds...

create global temporary table Temp (rid rowid not null, primary key(rid) ) on commit delete rows ;
insert into Temp select rowid from ObjectContent oc where contains (oc.Content, 'Power') > 0;
select /*+ FIRST_ROWS */ * from (
select /*+ FIRST_ROWS */ a.*, rownum rnum from (
select /*+ FIRST_ROWS */ distinct o.GUID, o.CreateDT
from Objects o, ObjectContent oc, Temp t
where oc.guid=o.guid and oc.rowid = t.rid
order by o.CreateDT desc
) a
where rownum <= 20
) where rnum > 0;

15317 row created.

Execution plan
----------------------------------------------------------
0 INSERT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=9)
1 0 DOMAIN INDEX OF 'OBJGLUEX' (Cost=0 Card=1 Bytes=9)

Statistic
----------------------------------------------------------
4963 recursive calls
47689 db block gets
5571 consistent gets
47 physical reads
4623532 redo size
630 bytes sent via SQL*Net to client
597 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
15317 rows processed

20 rows selected.

Execution plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=HINT: FIRST_ROWS (Cost=302 Card=20 Bytes=1200)
1 0 VIEW (Cost=302 Card=20 Bytes=1200)
2 1 COUNT (STOPKEY)
3 2 VIEW (Cost=302 Card=8168 Bytes=383896)
4 3 SORT (UNIQUE STOPKEY) (Cost=153 Card=8168 Bytes=808632)
5 4 NESTED LOOPS (Cost=4 Card=8168 Bytes=808632)
6 5 NESTED LOOPS (Cost=4 Card=991351 Bytes=91204292)
7 6 INDEX (FAST FULL SCAN) OF 'OBX' (NON-UNIQUE) (Cost=4 Card=76453 Bytes=3593291)
8 6 INDEX (UNIQUE SCAN) OF 'SYS_C003977' (UNIQUE)
9 5 INDEX (UNIQUE SCAN) OF 'SYS_C004117' (UNIQUE)

Statistic
----------------------------------------------------------
0 recursive calls
0 db block gets
79417 consistent gets
539 physical reads
0 redo size
1702 bytes sent via SQL*Net to client
514 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
20 rows processed

March 01, 2004 - 8:26 am UTC

select /*+ first_rows */ guid, createdt
from objects
where guid in (select guid
from objectContent
where contains( content, 'Power' ) > 0)
order by createDT desc

first_rows on that inner most query in your case (with a distinct and a join) could be "the reason". assuming that guid is the primary key of objects (seems reasonable), the distinct isn't necessary and it looks like you just want "in"

Another way to do it would be:

select /*+ FIRST_ROWS */ * from (
select /*+ FIRST_ROWS */ a.*, rownum rnum from (
select /*+ FIRST_ROWS */ o.GUID, o.CreateDT
from Objects o,
(select distinct guid from ObjectContent where contains(content,'Power') > 0 ) oc
where oc.guid=o.guid
order by o.CreateDT desc
) a
where rownum <= 20
) where rnum > 0;

which pretty much mimicks your approach -- in fact, yours could probably go faster even if you inserted the distincted GUID's not rowids.

Jimmy Lee, March 02, 2004 - 11:13 pm UTC

Hi Tom,

You precious advice just happens to be the weird point about GUID (PK of table) and rowid (physical row id). I tried lots combinations on 'first_rows' hint, 'distinct' keyword, 'in' join or subquery. None of them could be even close to the speed of using rowid as PK of temporary table.

In other words, using rowid, instead of GUID, as primary key of temporary table has HUGE different on the inserting temporary table. Once temporary table was generated, no matter using GUID or rowid as primary key, the speed on table join was not much different. It matches the slow-speed observation on "select guid from objectContent where contains( content, 'Power' ) > 0".

The performance bottleneck seems to be that ctxsys.context stores rowid instead of PK inside DR$ tables. Then it takes tons of time to looking up PK by rowid.

I tried to use IOT table but ctxsys.context could not be secondary index of IOT table.

Oracle version is 9.2.0.4 (result is as the same as 9.2.0.1).

Any idea? Thanks so much on your help

March 03, 2004 - 9:39 am UTC

try

select /*+ FIRST_ROWS */ * from (
select /*+ FIRST_ROWS */ a.*, rownum rnum from (
select /*+ FIRST_ROWS */ o.GUID, o.CreateDT
from Objects o,
(select distinct rowid rid from ObjectContent where
contains(content,'Power') > 0 and rownum >= 0 ) oc
where oc.rowid=o.rid
order by o.CreateDT desc
) a
where rownum <= 20
) where rnum > 0;

Jimmy Lee, March 03, 2004 - 11:32 pm UTC

Hi Tom,

Retrive rowid in subquery to join with original table, just like your sample, is just slow as others. Using rowid as primary key of tamporary table still run much faster (about 1:4 ratio or more). Reasonable it is to retrieve all rows if we need to sort query result, yet doubtful it is to understand the performance difference.

Thanks.

More to Explore

CLOB/BLOB/etc

Complete documentation on Securefiles and Large Objects here

Breadcrumb

Question and Answer

You Asked

and Tom said...

Rating

Comments

Strange behavior on inline join

More to Explore

CLOB/BLOB/etc

Intermedia query result order by date desc

Breadcrumb

Question and Answer

You Asked

and Tom said...

Rating

Comments

Strange behavior on inline join

More to Explore

CLOB/BLOB/etc