how to order by score(1)
Ahmad Sawalha, July 04, 2016 - 11:24 am UTC
Dir Sir,
thank you for you response, I read the solution, and followed the link but I need to know:
1- can I order by score(1), in creating my index?
....CREATE INDEX foox ON foo(D) INDEXTYPE IS CTXSYS.CONTEXT
FILTER BY B, C
ORDER BY score(1);
2- If not, what columns to include in the filter by clause? and how they will enhance my performance?
thanks.
July 04, 2016 - 12:26 pm UTC
1. No, you have to reference columns in the table.
2. Just include the column you're indexing and "order by score" won't need a table access:
create index i on t(x) indextype is ctxsys.context order by x;
See the code in my example above.
about cluase && score(0)> threshold
Ahmad Sawalha, July 13, 2016 - 6:57 pm UTC
Thank you very much
but I have another question:
1- which is better for performance using " about " clause or the syntax above?
2- what if I use in the "contains" cluase, instead of >0 , I use > 60 like this
<score datatype="INTEGER" algorithm="COUNT"/> </query>',1)>60
is it better for performance
thank you very very very much.
July 14, 2016 - 9:13 am UTC
1. ABOUT and your syntax are functionally different! The "," matches for exact matches of the strings in your input sentence. The more different terms there are, the higher the score.
ABOUT searches for text related to your term:
For example, an ABOUT query on heat might return documents related to temperature, even though the term temperature is not part of the query.So if anything ABOUT will be slower, because it has more matches. But it depends on your data and search terms.
But decide what functionality you need first!
2. The contains() clause returns the score of each row. So comparing this to a higher value will filter more rows in the index. This means you access fewer table blocks. So yes, having:
contains(...) > 60
Should be more efficient than
contains(...) > 0
Other enhancements
Ahmad Sawalha, July 23, 2016 - 9:39 pm UTC
Dear Sir, Thank you for help, but I need more help please,
what other settengs can I do to get the best performance?
I read in Oracle documentation, I found some other settengs like , parallel, partitioning... etc.
what is the best hardware and software settings in my case?
what is the best parallel degree? I have 8 CPUs
how can I partition my table? there is no column to partition by( like price , order_date ) I have just plain text with ids. can I partition my table logically?
any suggestions?
thanks a lot.
July 25, 2016 - 8:26 am UTC
The first question to ask is "is my query fast enough"? To determine this, speak with the users to find out what "fast enough" is!
If the answer is yes, then you can stop!
I don't have enough information about your situation to answer your other questions. If you want to read more about these, check out the performance tuning guide:
http://docs.oracle.com/database/121/TGDBA/toc.htm
Processed_text column is greater than 249
Ahmad Sawalha, July 24, 2016 - 9:53 pm UTC
Dir Sir
when I executed the create index with order by col.
it gave me an error
when I checked the documentation, it was that my column is greater than 249 chars
what to do please ? I am stuck with this project
thanks
July 25, 2016 - 8:27 am UTC
What exactly is your error?
Can you post a test-case demonstrating it?
error in col. size
Ahmad Sawalha, July 26, 2016 - 9:33 am UTC
Dir sir
here is the error ( my col. PROCESSED_TEXT size is 200 char, but contains Arabic text) :
SQL Error: ORA-29855: error occurred in the execution of ODCIINDEXCREATE routine
ORA-20000: Oracle Text error:
DRG-11310: invalid order by column: PROCESSED_TEXT
ORA-06512: at "CTXSYS.DRUE", line 160
ORA-06512: at "CTXSYS.TEXTINDEXMETHODS", line 366
29855. 00000 - "error occurred in the execution of ODCIINDEXCREATE routine"
*Cause: Failed to successfully execute the ODCIIndexCreate routine.
*Action: Check to see if the routine has been coded correctly.
July 26, 2016 - 8:31 pm UTC
And what is your index create statement? And the create table while you're at it! ;)
explain plan for index
Ahmad Sawalha, July 26, 2016 - 11:23 pm UTC
Dir Sir: thanks for you effort, I solved the error but I have another issue ( no performance enhancements), here is my steps
1- create index PLAG_DOCSEN3_IDX on plag_docsentences3(PROCESSED_TEXT) indextype is ctxsys.context order by PROCESSED_TEXT;
processed_text is varchar2(200).
2- DBMS_STATS.GATHER_TABLE_STATS('plagdb','PLAG_DOCSENTENCES3') ;
3- my query is :
select sentence_id ,sc ,isn ,sentence_length ,processed_text from
(select /*+ First_Rows(5) */ sentence_id,score(1) as sc, isn ,sentence_length ,processed_text from plag_docsentences3
where contains(PROCESSED_TEXT,'<query>
<textquery>
هدف درس حدد درج نمي نظم اعضاء هيأ درس درأ جمع نجح وطن قدس ومد أثر غور جنس أهل علم
<progression>
<seq><rewrite>transform((TOKENS, "{", " }", ","))</rewrite></seq>
</progression>
</textquery>
<score datatype="INTEGER" algorithm="COUNT"/>
</query>',1)>0 order by score(1) desc )
where rownum <=5;
4- the explain plan :
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 1167 | 1807 (1)| 00:00:22 |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | VIEW | | 1 | 1167 | 1807 (1)| 00:00:22 |
| 3 | TABLE ACCESS BY INDEX ROWID| PLAG_DOCSENTENCES3 | 1 | 215 | 1807 (1)| 00:00:22 |
|* 4 | DOMAIN INDEX | PLAG_DOCSEN3_IDX | | | 1806 (0)| 00:00:22 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=5)
4 - access("CTXSYS"."CONTAINS"("PROCESSED_TEXT",'<query> <textquery> هدف درس حدد
درج نمي نظم اعضاء هيأ درس درأ جمع نجح وطن قدس ومد أثر غور جنس أهل علم <progression>
<seq><rewrite>transform((TOKENS, "{", " }", ","))</rewrite></seq>
</progression> </textquery> <score datatype="INTEGER" algorithm="COUNT"/>
</query>',1)>0)
I created the index another time without order by
and repeated the whole steps
and this is the explain plan
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 1167 | 1456 (1)| 00:00:18 |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | VIEW | | 1 | 1167 | 1456 (1)| 00:00:18 |
| 3 | TABLE ACCESS BY INDEX ROWID| PLAG_DOCSENTENCES3 | 1 | 215 | 1456 (1)| 00:00:18 |
|* 4 | DOMAIN INDEX | PLAG_DOCSEN3_IDX | | | 1455 (0)| 00:00:18 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=5)
4 - access("CTXSYS"."CONTAINS"("PROCESSED_TEXT",'<query> <textquery> هدف درس حدد
درج نمي نظم اعضاء هيأ درس درأ جمع نجح وطن قدس ومد أثر غور جنس أهل علم <progression>
<seq><rewrite>transform((TOKENS, "{", " }", ","))</rewrite></seq>
</progression> </textquery> <score datatype="INTEGER" algorithm="COUNT"/>
</query>',1)>60)
it takes the same time.
what I am doing wrong
thanks
July 29, 2016 - 1:46 pm UTC
..continue
Ahmad Sawalha, July 26, 2016 - 11:34 pm UTC
my table is 3 million rows now, and is growing toward 100 million, and the query takes 0.2 seconds.
my case was that when the table has 100M records, and I sent 300 query, it takes 45 minutes.
July 29, 2016 - 1:46 pm UTC
Please show us the execution plan.
execution plan ( without order by, with order by)
Ahmad Sawalha, August 01, 2016 - 5:38 pm UTC
without order by:
1- create index PLAG_DOCSEN3_IDX on plag_docsentences3(processed_text) indextype is ctxsys.context;
2-DBMS_STATS.GATHER_TABLE_STATS('plagdb','plag_docsentences3') ;
3-
--------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
--------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 3 |00:00:01.99 | 309K| 19926 |
|* 1 | COUNT STOPKEY | | 1 | | 3 |00:00:01.99 | 309K| 19926 |
| 2 | VIEW | | 1 | 2 | 3 |00:00:01.99 | 309K| 19926 |
|* 3 | COUNT STOPKEY | | 1 | | 3 |00:00:01.99 | 309K| 19926 |
| 4 | VIEW | | 1 | 2 | 3 |00:00:01.99 | 309K| 19926 |
|* 5 | SORT ORDER BY STOPKEY | | 1 | 2 | 3 |00:00:01.99 | 309K| 19926 |
| 6 | TABLE ACCESS BY INDEX ROWID| PLAG_DOCSENTENCES3 | 1 | 2 | 1642K|00:00:01.72 | 309K| 19926 |
|* 7 | DOMAIN INDEX | PLAG_DOCSEN3_IDX | 1 | | 1642K|00:00:00.75 | 28177 | 11 |
--------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=3)
3 - filter(ROWNUM<=5)
5 - filter(ROWNUM<=5)
7 - access("CTXSYS"."CONTAINS"("PROCESSED_TEXT",'<query> <textquery> أثر خدم ظمأ شكل حسن حصل طلب كلأ
علم ربأ تبع وكل غوثالأردن ميد روض خلد حلم خشن يمن حلم <progression>
<seq><rewrite>transform((TOKENS, "{", " }", ","))</rewrite></seq> </progression> </textquery> <score
datatype="INTEGER" algorithm="COUNT"/> </query>',1)>0)
with order by
Ahmad Sawalha, August 01, 2016 - 5:42 pm UTC
1-create index PLAG_DOCSEN3_IDX on plag_docsentences3(processed_text) indextype is ctxsys.context order by processed_text;
2- DBMS_STATS.GATHER_TABLE_STATS('plagdb','plag_docsentences3') ;
3-
--------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
--------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 3 |00:00:01.89 | 310K| 6832 |
|* 1 | COUNT STOPKEY | | 1 | | 3 |00:00:01.89 | 310K| 6832 |
| 2 | VIEW | | 1 | 2 | 3 |00:00:01.89 | 310K| 6832 |
|* 3 | COUNT STOPKEY | | 1 | | 3 |00:00:01.89 | 310K| 6832 |
| 4 | VIEW | | 1 | 2 | 3 |00:00:01.89 | 310K| 6832 |
|* 5 | SORT ORDER BY STOPKEY | | 1 | 2 | 3 |00:00:01.89 | 310K| 6832 |
| 6 | TABLE ACCESS BY INDEX ROWID| PLAG_DOCSENTENCES3 | 1 | 2 | 1642K|00:00:01.63 | 310K| 6832 |
|* 7 | DOMAIN INDEX | PLAG_DOCSEN3_IDX | 1 | | 1642K|00:00:00.77 | 28861 | 0 |
--------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=3)
3 - filter(ROWNUM<=5)
5 - filter(ROWNUM<=5)
7 - access("CTXSYS"."CONTAINS"("PROCESSED_TEXT",'<query> <textquery> أثر خدم ظمأ شكل حسن حصل طلب كلأ
علم ربأ تبع وكل غوثالأردن ميد روض خلد حلم خشن يمن حلم <progression>
<seq><rewrite>transform((TOKENS, "{", " }", ","))</rewrite></seq> </progression> </textquery> <score
datatype="INTEGER" algorithm="COUNT"/> </query>',1)>0)
sql statement
Ahmad Sawalha, August 01, 2016 - 5:46 pm UTC
sorry, for the 3 replies, just to format.
and thank you very much for help.
select /*+ gather_plan_statistics */*
from (
select sentence_id ,sc ,isn ,sentence_length ,processed_text from
(select /*+ First_Rows(5) */ sentence_id,score(1) as sc, isn ,sentence_length ,processed_text from plag_docsentences3
where contains(PROCESSED_TEXT,'<query>
<textquery>
أثر خدم ظمأ شكل حسن حصل طلب كلأ علم ربأ تبع وكل غوثالأردن ميد روض خلد حلم خشن يمن حلم
<progression>
<seq><rewrite>transform((TOKENS, "{", " }", ","))</rewrite></seq>
</progression>
</textquery>
<score datatype="INTEGER" algorithm="COUNT"/>
</query>',1)>0 order by score(1) desc )
where rownum <=5)
where rownum <=3;