Hi!
There is test environment: analytic table with 15,000,000 of documents. Average size of every document is from 20 to 100 pages.
We built Oracle Text Index on this table - type is CONTEXT:
CREATE INDEX rco_anal ON files(txt) INDEXTYPE IS ctxsys.context;
We want to search people in this documents. We do search on surname, name and patronymic: contains(txt,'NEAR (((BIRYUKOV),(ALEKSEY),(IGOREVICH)),5)').
When surname is rare (Biryukov) it takes 9 seconds, but when family is frequent (IVANOV) it takes 310 seconds. When we search with 4 word (surname, name, patronymic and year of birth) - search takes ours!
Documents in Russian, family in this text I change to Roman alphabet.
We got Oracle Text Trace information, but I don't understand what trouble is and what we can do for acceleration of searching.
Our SQL-case with statistics:
==============================
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
SQL> select count(*) from files;
COUNT(*)
----------
15256791
Executed in 0,052 seconds
SQL> select count(*) from dr$rco_anal$i;
COUNT(*)
----------
402923895
Executed in 21,543 seconds
SQL> select count(*) from dr$rco_anal$k;
COUNT(*)
----------
15256791
Executed in 0,413 seconds
SQL> select count(*) from dr$rco_anal$n;
COUNT(*)
----------
0
Executed in 0,000 seconds
SQL> select count(*) from dr$rco_anal$r;
COUNT(*)
----------
1
Executed in 0,001 seconds
SQL> select count(*) from files where contains(txt,'(BIRYUKOV AND ALEKSEY AND IGOREVICH)') > 0;
COUNT(*)
----------
5527
Executed in 1,373 seconds
SQL> select count(*) from files where contains(txt,'NEAR (((BIRYUKOV),(ALEKSEY),(IGOREVICH)),5)') > 0;
COUNT(*)
----------
3
Executed in 9,126 seconds
SQL> select count(*) from files where contains(txt,'(IVANOV AND ALEKSEY AND IGOREVICH)') > 0;
COUNT(*)
----------
37255
Executed in 15,429 seconds
--------------- Oracle Text Trace -----------------------------
SQL> select id from anal.files where contains(txt,' NEAR(((BIRYUKOV),(ALEKSEY),(IGOREVICH)),5)') > 0;
ID
----------
23727822
30537373
26275335
Executed in 9,126 seconds
SQL> select id from anal.files where contains(txt,'NEAR (((IVANOV),(ALEKSEY),(IGOREVICH)),5)') > 0;
ID
----------
22088462
...
53649006
53866426
223 rows selected
Executed in 309,959 seconds
Oracle Text Trace BIRYUKOV IVANOV
(9 s) (309s)
-------------------------------------------------------------- -------- ------
Time spent executing user datastore: 0 0
Time spent invoking the AUTO_FILTER filter: 0 0
Time spent executing the $X cursor: 186 177
Time spent fetching from $X: 4033 4256
Total number of rows whose token metadata was fetched from $X: 6381 6827
Time spent fetching the LOB locator from $I: 23563 52426
Time spent reading $I LOB information: 24542 62483
Number of rows whose $I token_info was actually read: 4977 6609
Number of bytes read from $I LOBs: 71897 14651
Time spent fetching and reading $R information: 111 4756
Time spent in CONTAINS processing: 91205 30920
Time spent fetching and reading $S information: 0 0
Time spent reading $O information: 0 0
Time spent reading $D information: 0 0
Time spent extracting a snippet from a document: 0 0
--------- Get stats with Thomas Kyte script -------------------
SQL> exec runstats_pkg.rs_start;
PL/SQL procedure successfully completed
Executed in 0,015 seconds
SQL> select count(*) from files where contains(txt,'NEAR (((BIRYUKOV),(ALEKSEY),(IGOREVICH)),5)') > 0;
COUNT(*)
----------
3
Executed in 9,212 seconds
SQL> exec runstats_pkg.rs_middle;
PL/SQL procedure successfully completed
Executed in 0,032 seconds
SQL> select count(*) from files where contains(txt,'NEAR (((IVANOV),(ALEKSEY),(IGOREVICH)),5)') > 0;
COUNT(*)
----------
223
Executed in 308,103 seconds
SQL> exec runstats_pkg.rs_stop(1);
Run1 ran in 921 cpu hsecs
Run2 ran in 30810 cpu hsecs
run 1 ran in 2.99% of the time
Name Run1 Run2 Diff
STAT...db block changes 115 113 -2
STAT...redo entries 23 25 2
STAT...HSC Heap Segment Block 23 25 2
STAT...Heap Segment Array Inse 23 25 2
STAT...parse time elapsed 6 4 -2
LATCH.object queue memory 1 3 2
LATCH.KTF sga latch 0 2 2
LATCH.temp lob duration state 0 2 2
LATCH.resmgr:resource group CP 5 3 -2
LATCH.job workq parent latch 1 3 2
LATCH.Result Cache: SO Latch 0 2 2
STAT...user I/O wait time 1 4 3
STAT...non-idle wait time 1 4 3
STAT...active txn count during 0 3 3
STAT...cleanout - number of kt 0 3 3
STAT...consistent changes 91 87 -4
STAT...calls to kcmgcs 66 70 4
STAT...parse time cpu 6 2 -4
STAT...execute count 42 38 -4
LATCH.list of block allocation 1 5 4
LATCH.dispatcher info 0 4 4
STAT...pinned cursors current 6 0 -6
LATCH.kjoedcso state object fr 0 6 6
LATCH.dml lock allocation 0 6 6
LATCH.lob segment hash table l 1 7 6
LATCH.deferred cleanup latch 0 6 6
LATCH.cp sga latch 0 6 6
LATCH.ncodef allocation latch 0 6 6
LATCH.ASM network state latch 0 6 6
LATCH.ASM remote client latch 0 6 6
LATCH.Sched IM Job latch 1 7 6
STAT...opened cursors current 7 0 -7
LATCH.Result Cache: RC Latch 0 7 7
STAT...db block gets 85 77 -8
STAT...db block gets from cach 85 77 -8
STAT...parse count (total) 41 33 -8
LATCH.shared pool simulator 9 1 -8
LATCH.job_queue_processes free 0 8 8
LATCH.JS slv state obj latch 1 9 8
LATCH.SQL memory manager latch 1 10 9
LATCH.Real-time descriptor lat 1 10 9
STAT...opened cursors cumulati 44 34 -10
STAT...consistent gets pin (fa 174 184 10
LATCH.FOB s.o list latch 5 15 10
LATCH.resmgr:actses change sta 4 14 10
LATCH.kwslbmdl: metadata latch 1 11 10
LATCH.AQ Coord jobx_kwsbgsgn l 1 11 10
LATCH.AQ Sharded master pool l 1 11 10
LATCH.krso process latch 0 11 11
LATCH.managed standby latch 0 11 11
LATCH.Shared B-Tree 1 12 11
LATCH.kwqbsn:qsga 0 11 11
STAT...session cursor cache hi 15 27 12
STAT...parse count (hard) 13 1 -12
LATCH.object queue header free 1 13 12
LATCH.file cache latch 0 12 12
LATCH.threshold alerts latch 1 13 12
STAT...enqueue releases 19 6 -13
STAT...session cursor cache co 16 3 -13
LATCH.Consistent RBA 2 15 13
STAT...enqueue requests 20 6 -14
LATCH.dtp latch 3 17 14
LATCH.sequence cache 0 15 15
LATCH.event stats latch 3 19 16
LATCH.Report Request stats lat 4 20 16
LATCH.compile environment latc 3 20 17
LATCH.cp srv type state latch 1 19 18
LATCH.JS mem alloc latch 0 18 18
LATCH.JS queue access latch 1 19 18
LATCH.session switching 4 24 20
LATCH.kss move lock 7 27 20
LATCH.PL/SQL warning settings 37 60 23
STAT...workarea memory allocat -10 -35 -25
LATCH.Request holder compeltio 7 35 28
LATCH.process group creation 6 35 29
LATCH.channel handle pool latc 7 36 29
LATCH.OS process: request allo 6 35 29
LATCH.ksz_so allocation latch 6 35 29
LATCH.simulator lru latch 1 30 29
LATCH.kwsptjobAdPtLt: AdPt lis 0 31 31
LATCH.Event Group Locks 7 39 32
LATCH.session statistics 7 40 33
LATCH.dummy allocation 7 40 33
LATCH.ILM Stats main anchor la 7 40 33
LATCH.resmgr:schema config 3 36 33
LATCH.cache buffers lru chain 1 38 37
STAT...recursive calls 357 396 39
LATCH.ksv class latch 13 53 40
LATCH.process allocation 9 52 43
LATCH.resmgr group change latc 12 55 43
LATCH.resmgr:actses change gro 13 56 43
LATCH.session allocation 9 53 44
LATCH.qmn task queue latch 0 44 44
LATCH.ksv allocation latch 12 58 46
STAT...cleanouts only - consis 70 120 50
STAT...immediate (CR) block cl 70 120 50
LATCH.resmgr:free threads list 12 65 53
STAT...shared io pool buffer g 78 136 58
LATCH.cache buffer handles 10 72 62
LATCH.session state list latch 10 73 63
STAT...non-idle wait count 95 158 63
STAT...physical read IO reques 84 149 65
STAT...physical read total IO 91 156 65
STAT...securefile direct read 84 149 65
LATCH.DML lock allocation 7 73 66
STAT...physical reads direct 84 152 68
STAT...physical reads direct ( 84 152 68
STAT...physical reads 84 152 68
STAT...Cached Commit SCN refer 95 166 71
LATCH.channel operations paren 16 88 72
STAT...undo change vector size 5,360 5,444 84
LATCH.parameter table manageme 6 91 85
LATCH.undo global data 6 96 90
LATCH.client/application info 22 113 91
LATCH.object queue header oper 18 120 102
LATCH.resmgr:active threads 16 119 103
LATCH.Change Notification Hash 4 111 107
LATCH.KMG MMAN ready and start 4 112 108
LATCH.session timer 4 112 108
LATCH.lgwr LWN SCN 4 115 111
LATCH.sort extent pool 4 129 125
LATCH.OS process 28 155 127
LATCH.ktfbn latch 1 129 128
LATCH.io pool granule list 150 279 129
LATCH.active checkpoint queue 4 135 131
LATCH.ksuosstats global area 5 150 145
LATCH.session idle bit 62 227 165
LATCH.transaction allocation 12 204 192
LATCH.redo allocation 11 246 235
LATCH.query server process 8 251 243
STAT...redo size 7,312 7,556 244
LATCH.ASM db client latch 8 254 246
LATCH.io pool granule metadata 319 590 271
LATCH.post/wait queue 9 286 277
LATCH.call allocation 32 336 304
STAT...index fetch by key 425 53 -372
LATCH.redo writing 19 415 396
STAT...consistent gets pin 177 605 428
LATCH.ktm global data 16 456 440
STAT...no work - consistent re 162 607 445
LATCH.simulator hash latch 426 897 471
LATCH.space background task la 20 497 477
STAT...consistent gets examina 6,263 6,924 661
STAT...consistent gets examina 6,263 6,924 661
LATCH.OS process allocation 42 747 705
STAT...file io wait time 31,000 32,000 1,000
STAT...session logical reads 6,525 7,606 1,081
STAT...consistent gets from ca 6,440 7,529 1,089
STAT...consistent gets 6,440 7,529 1,089
LATCH.shared pool 1,289 2,391 1,102
STAT...table fetch by rowid 5,407 6,807 1,400
STAT...rows fetched via callba 5,399 6,799 1,400
LATCH.SGA Logging Bkt Latch 53 1,457 1,404
STAT...calls to get snapshot s 1,964 3,399 1,435
LATCH.active service list 64 1,696 1,632
LATCH.SGA Logging Log Latch 65 1,807 1,742
STAT...in call idle wait time 300 2,722 2,422
STAT...buffer is not pinned co 10,923 13,726 2,803
LATCH.row cache objects 1,733 5,228 3,495
LATCH.JS queue state obj latch 126 3,872 3,746
LATCH.messages 162 4,429 4,267
LATCH.checkpoint queue latch 193 5,922 5,729
LATCH.SQL memory manager worka 303 7,701 7,398
LATCH.enqueue hash chains 394 10,010 9,616
LATCH.cache buffers chains 7,256 20,322 13,066
STAT...Effective IO time 19,376 46,413 27,037
STAT...recursive cpu usage 922 30,811 29,889
STAT...CPU used by this sessio 922 30,811 29,889
STAT...Elapsed Time 1,223 33,559 32,336
STAT...securefile direct read 688,128 1,245,184 557,056
STAT...cell physical IO interc 802,816 1,359,872 557,056
STAT...physical read total byt 802,816 1,359,872 557,056
STAT...physical read bytes 688,128 1,245,184 557,056
STAT...session uga memory max 1,732,672 2,807,008 1,074,336
STAT...session uga memory 1,522,304 -65,488 -1,587,792
STAT...session pga memory max 4,194,304 1,048,576 -3,145,728
STAT...session pga memory 4,194,304 -2,162,688 -6,356,992
STAT...logical read bytes from 53,452,800 62,308,352 8,855,552
Run1 latches total versus runs -- difference and pct
Run1 Run2 Diff Pct
13,407 74,188 60,781 18.07%
I got some feedback from the Text product manager on this.
Doing a NEAR search on words requires is done as
a- fetch document location and position for each word,
b- a merge to reduce it down to those documents that contain all words
c- examine the word position information to see where they occur near each other
So if (a) and (b) yield lots of hits, then (c) becomes very expensive
Some things to try:
- Optimize the index with CTX_DDL.OPTIMIZE_INDEX
- Look at ensuring the I/O rates are as efficient as possible, by tracing the queries and seeing how much time is being lost on I/O (as opposed to cpu)