Skip to Main Content
  • Questions
  • Oracle Text Contains search with Near is very slow

Breadcrumb

May 4th

Question and Answer

Connor McDonald

Thanks for the question, Dmitry.

Asked: January 21, 2017 - 4:19 pm UTC

Last updated: February 03, 2017 - 7:45 pm UTC

Version: Oracle Database 12.1.0.2.0

Viewed 1000+ times

You Asked

Hi!
There is test environment: analytic table with 15,000,000 of documents. Average size of every document is from 20 to 100 pages.
We built Oracle Text Index on this table - type is CONTEXT:

CREATE INDEX rco_anal ON files(txt) INDEXTYPE IS ctxsys.context;

We want to search people in this documents. We do search on surname, name and patronymic: contains(txt,'NEAR (((BIRYUKOV),(ALEKSEY),(IGOREVICH)),5)').
When surname is rare (Biryukov) it takes 9 seconds, but when family is frequent (IVANOV) it takes 310 seconds. When we search with 4 word (surname, name, patronymic and year of birth) - search takes ours!

Documents in Russian, family in this text I change to Roman alphabet.

We got Oracle Text Trace information, but I don't understand what trouble is and what we can do for acceleration of searching.


Our SQL-case with statistics:

==============================

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> select count(*) from files;

  COUNT(*)
----------
  15256791

Executed in 0,052 seconds

SQL> select count(*) from dr$rco_anal$i;

  COUNT(*)
----------
 402923895

Executed in 21,543 seconds

SQL> select count(*) from dr$rco_anal$k;

  COUNT(*)
----------
  15256791

Executed in 0,413 seconds

SQL> select count(*) from dr$rco_anal$n;

  COUNT(*)
----------
         0

Executed in 0,000 seconds

SQL> select count(*) from dr$rco_anal$r;

  COUNT(*)
----------
         1

Executed in 0,001 seconds

SQL> select count(*) from files where contains(txt,'(BIRYUKOV AND ALEKSEY AND IGOREVICH)') > 0;
  COUNT(*)
----------
      5527
Executed in 1,373 seconds

SQL> select count(*) from files where contains(txt,'NEAR (((BIRYUKOV),(ALEKSEY),(IGOREVICH)),5)') > 0;
  COUNT(*)
----------
         3
Executed in 9,126 seconds

SQL> select count(*) from files where contains(txt,'(IVANOV AND ALEKSEY AND IGOREVICH)') > 0;
  COUNT(*)
----------
     37255
Executed in 15,429 seconds


--------------- Oracle Text Trace -----------------------------

SQL> select id from anal.files where contains(txt,' NEAR(((BIRYUKOV),(ALEKSEY),(IGOREVICH)),5)') > 0;
        ID
----------
  23727822
  30537373
  26275335
Executed in 9,126 seconds

SQL> select id from anal.files where contains(txt,'NEAR (((IVANOV),(ALEKSEY),(IGOREVICH)),5)') > 0;
        ID
----------
  22088462
...
  53649006
  53866426
223 rows selected

Executed in 309,959 seconds

Oracle Text Trace                                              BIRYUKOV IVANOV
                                                               (9 s)    (309s)
-------------------------------------------------------------- -------- ------
Time spent executing user datastore:                           0        0    
Time spent invoking the AUTO_FILTER filter:                    0        0    
Time spent executing the $X cursor:                            186      177  
Time spent fetching from $X:                                   4033     4256 
Total number of rows whose token metadata was fetched from $X: 6381     6827 
Time spent fetching the LOB locator from $I:                   23563    52426
Time spent reading $I LOB information:                         24542    62483
Number of rows whose $I token_info was actually read:          4977     6609 
Number of bytes read from $I LOBs:                             71897    14651
Time spent fetching and reading $R information:                111      4756 
Time spent in CONTAINS processing:                             91205    30920
Time spent fetching and reading $S information:                0        0    
Time spent reading $O information:                             0        0    
Time spent reading $D information:                             0        0    
Time spent extracting a snippet from a document:               0        0    


--------- Get stats with Thomas Kyte script -------------------

SQL> exec runstats_pkg.rs_start;
PL/SQL procedure successfully completed
Executed in 0,015 seconds

SQL> select count(*) from files where contains(txt,'NEAR (((BIRYUKOV),(ALEKSEY),(IGOREVICH)),5)') > 0;
  COUNT(*)
----------
         3
Executed in 9,212 seconds

SQL> exec runstats_pkg.rs_middle;
PL/SQL procedure successfully completed
Executed in 0,032 seconds

SQL> select count(*) from files where contains(txt,'NEAR (((IVANOV),(ALEKSEY),(IGOREVICH)),5)') > 0;
  COUNT(*)
----------
       223
Executed in 308,103 seconds

SQL> exec runstats_pkg.rs_stop(1);

Run1 ran in 921 cpu hsecs
Run2 ran in 30810 cpu hsecs
run 1 ran in 2.99% of the time
 
Name                                      Run1            Run2            Diff
STAT...db block changes                    115             113              -2
STAT...redo entries                         23              25               2
STAT...HSC Heap Segment Block               23              25               2
STAT...Heap Segment Array Inse              23              25               2
STAT...parse time elapsed                    6               4              -2
LATCH.object queue memory                    1               3               2
LATCH.KTF sga latch                          0               2               2
LATCH.temp lob duration state                0               2               2
LATCH.resmgr:resource group CP               5               3              -2
LATCH.job workq parent latch                 1               3               2
LATCH.Result Cache: SO Latch                 0               2               2
STAT...user I/O wait time                    1               4               3
STAT...non-idle wait time                    1               4               3
STAT...active txn count during               0               3               3
STAT...cleanout - number of kt               0               3               3
STAT...consistent changes                   91              87              -4
STAT...calls to kcmgcs                      66              70               4
STAT...parse time cpu                        6               2              -4
STAT...execute count                        42              38              -4
LATCH.list of block allocation               1               5               4
LATCH.dispatcher info                        0               4               4
STAT...pinned cursors current                6               0              -6
LATCH.kjoedcso state object fr               0               6               6
LATCH.dml lock allocation                    0               6               6
LATCH.lob segment hash table l               1               7               6
LATCH.deferred cleanup latch                 0               6               6
LATCH.cp sga latch                           0               6               6
LATCH.ncodef allocation latch                0               6               6
LATCH.ASM network state latch                0               6               6
LATCH.ASM remote client latch                0               6               6
LATCH.Sched IM Job latch                     1               7               6
STAT...opened cursors current                7               0              -7
LATCH.Result Cache: RC Latch                 0               7               7
STAT...db block gets                        85              77              -8
STAT...db block gets from cach              85              77              -8
STAT...parse count (total)                  41              33              -8
LATCH.shared pool simulator                  9               1              -8
LATCH.job_queue_processes free               0               8               8
LATCH.JS slv state obj latch                 1               9               8
LATCH.SQL memory manager latch               1              10               9
LATCH.Real-time descriptor lat               1              10               9
STAT...opened cursors cumulati              44              34             -10
STAT...consistent gets pin (fa             174             184              10
LATCH.FOB s.o list latch                     5              15              10
LATCH.resmgr:actses change sta               4              14              10
LATCH.kwslbmdl: metadata latch               1              11              10
LATCH.AQ Coord jobx_kwsbgsgn l               1              11              10
LATCH.AQ Sharded master pool l               1              11              10
LATCH.krso process latch                     0              11              11
LATCH.managed standby latch                  0              11              11
LATCH.Shared B-Tree                          1              12              11
LATCH.kwqbsn:qsga                            0              11              11
STAT...session cursor cache hi              15              27              12
STAT...parse count (hard)                   13               1             -12
LATCH.object queue header free               1              13              12
LATCH.file cache latch                       0              12              12
LATCH.threshold alerts latch                 1              13              12
STAT...enqueue releases                     19               6             -13
STAT...session cursor cache co              16               3             -13
LATCH.Consistent RBA                         2              15              13
STAT...enqueue requests                     20               6             -14
LATCH.dtp latch                              3              17              14
LATCH.sequence cache                         0              15              15
LATCH.event stats latch                      3              19              16
LATCH.Report Request stats lat               4              20              16
LATCH.compile environment latc               3              20              17
LATCH.cp srv type state latch                1              19              18
LATCH.JS mem alloc latch                     0              18              18
LATCH.JS queue access latch                  1              19              18
LATCH.session switching                      4              24              20
LATCH.kss move lock                          7              27              20
LATCH.PL/SQL warning settings               37              60              23
STAT...workarea memory allocat             -10             -35             -25
LATCH.Request holder compeltio               7              35              28
LATCH.process group creation                 6              35              29
LATCH.channel handle pool latc               7              36              29
LATCH.OS process: request allo               6              35              29
LATCH.ksz_so allocation latch                6              35              29
LATCH.simulator lru latch                    1              30              29
LATCH.kwsptjobAdPtLt: AdPt lis               0              31              31
LATCH.Event Group Locks                      7              39              32
LATCH.session statistics                     7              40              33
LATCH.dummy allocation                       7              40              33
LATCH.ILM Stats main anchor la               7              40              33
LATCH.resmgr:schema config                   3              36              33
LATCH.cache buffers lru chain                1              38              37
STAT...recursive calls                     357             396              39
LATCH.ksv class latch                       13              53              40
LATCH.process allocation                     9              52              43
LATCH.resmgr group change latc              12              55              43
LATCH.resmgr:actses change gro              13              56              43
LATCH.session allocation                     9              53              44
LATCH.qmn task queue latch                   0              44              44
LATCH.ksv allocation latch                  12              58              46
STAT...cleanouts only - consis              70             120              50
STAT...immediate (CR) block cl              70             120              50
LATCH.resmgr:free threads list              12              65              53
STAT...shared io pool buffer g              78             136              58
LATCH.cache buffer handles                  10              72              62
LATCH.session state list latch              10              73              63
STAT...non-idle wait count                  95             158              63
STAT...physical read IO reques              84             149              65
STAT...physical read total IO               91             156              65
STAT...securefile direct read               84             149              65
LATCH.DML lock allocation                    7              73              66
STAT...physical reads direct                84             152              68
STAT...physical reads direct (              84             152              68
STAT...physical reads                       84             152              68
STAT...Cached Commit SCN refer              95             166              71
LATCH.channel operations paren              16              88              72
STAT...undo change vector size           5,360           5,444              84
LATCH.parameter table manageme               6              91              85
LATCH.undo global data                       6              96              90
LATCH.client/application info               22             113              91
LATCH.object queue header oper              18             120             102
LATCH.resmgr:active threads                 16             119             103
LATCH.Change Notification Hash               4             111             107
LATCH.KMG MMAN ready and start               4             112             108
LATCH.session timer                          4             112             108
LATCH.lgwr LWN SCN                           4             115             111
LATCH.sort extent pool                       4             129             125
LATCH.OS process                            28             155             127
LATCH.ktfbn latch                            1             129             128
LATCH.io pool granule list                 150             279             129
LATCH.active checkpoint queue                4             135             131
LATCH.ksuosstats global area                 5             150             145
LATCH.session idle bit                      62             227             165
LATCH.transaction allocation                12             204             192
LATCH.redo allocation                       11             246             235
LATCH.query server process                   8             251             243
STAT...redo size                         7,312           7,556             244
LATCH.ASM db client latch                    8             254             246
LATCH.io pool granule metadata             319             590             271
LATCH.post/wait queue                        9             286             277
LATCH.call allocation                       32             336             304
STAT...index fetch by key                  425              53            -372
LATCH.redo writing                          19             415             396
STAT...consistent gets pin                 177             605             428
LATCH.ktm global data                       16             456             440
STAT...no work - consistent re             162             607             445
LATCH.simulator hash latch                 426             897             471
LATCH.space background task la              20             497             477
STAT...consistent gets examina           6,263           6,924             661
STAT...consistent gets examina           6,263           6,924             661
LATCH.OS process allocation                 42             747             705
STAT...file io wait time                31,000          32,000           1,000
STAT...session logical reads             6,525           7,606           1,081
STAT...consistent gets from ca           6,440           7,529           1,089
STAT...consistent gets                   6,440           7,529           1,089
LATCH.shared pool                        1,289           2,391           1,102
STAT...table fetch by rowid              5,407           6,807           1,400
STAT...rows fetched via callba           5,399           6,799           1,400
LATCH.SGA Logging Bkt Latch                 53           1,457           1,404
STAT...calls to get snapshot s           1,964           3,399           1,435
LATCH.active service list                   64           1,696           1,632
LATCH.SGA Logging Log Latch                 65           1,807           1,742
STAT...in call idle wait time              300           2,722           2,422
STAT...buffer is not pinned co          10,923          13,726           2,803
LATCH.row cache objects                  1,733           5,228           3,495
LATCH.JS queue state obj latch             126           3,872           3,746
LATCH.messages                             162           4,429           4,267
LATCH.checkpoint queue latch               193           5,922           5,729
LATCH.SQL memory manager worka             303           7,701           7,398
LATCH.enqueue hash chains                  394          10,010           9,616
LATCH.cache buffers chains               7,256          20,322          13,066
STAT...Effective IO time                19,376          46,413          27,037
STAT...recursive cpu usage                 922          30,811          29,889
STAT...CPU used by this sessio             922          30,811          29,889
STAT...Elapsed Time                      1,223          33,559          32,336
STAT...securefile direct read          688,128       1,245,184         557,056
STAT...cell physical IO interc         802,816       1,359,872         557,056
STAT...physical read total byt         802,816       1,359,872         557,056
STAT...physical read bytes             688,128       1,245,184         557,056
STAT...session uga memory max        1,732,672       2,807,008       1,074,336
STAT...session uga memory            1,522,304         -65,488      -1,587,792
STAT...session pga memory max        4,194,304       1,048,576      -3,145,728
STAT...session pga memory            4,194,304      -2,162,688      -6,356,992
STAT...logical read bytes from      53,452,800      62,308,352       8,855,552
 
Run1 latches total versus runs -- difference and pct
          Run1               Run2              Diff        Pct
            13,407            74,188            60,781     18.07%


and Connor said...

I got some feedback from the Text product manager on this.

Doing a NEAR search on words requires is done as

a- fetch document location and position for each word,
b- a merge to reduce it down to those documents that contain all words
c- examine the word position information to see where they occur near each other

So if (a) and (b) yield lots of hits, then (c) becomes very expensive

Some things to try:

- Optimize the index with CTX_DDL.OPTIMIZE_INDEX
- Look at ensuring the I/O rates are as efficient as possible, by tracing the queries and seeing how much time is being lost on I/O (as opposed to cpu)


Rating

  (1 rating)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Does not work for us

Dmitry, February 03, 2017 - 4:25 pm UTC

Thank you for the answer, but the simple execution of CTX_DDL.OPTIMIZE_INDEX did not help.

1. We have executed CTX_DDL.OPTIMIZE_INDEX and the result remained the same. The index was clean, as was created from scratch for the whole array of documents. Therefore, its optimization has not brought results.

2. IO the output is not significant, as you can see from the statistics themselves:
STAT...user I/O wait time                    1               4               3
STAT...physical reads                       84             152              68
STAT...rows fetched via callba           5,399           6,799           1,400
STAT...file io wait time                31,000          32,000           1,000
STAT...physical read total byt         802,816       1,359,872         557,056
STAT...physical read bytes             688,128       1,245,184         557,056
STAT...securefile direct read          688,128       1,245,184         557,056
STAT...logical read bytes from      53,452,800      62,308,352       8,855,552


All the time spent on the CPU, that is, the math:
STAT...recursive cpu usage                 922          30,811          29,889
STAT...CPU used by this sessio             922          30,811          29,889
STAT...Elapsed Time                      1,223          33,559          32,336


It seems that Oracle spends all his time to perform the permutation of words to find those documents where the required words are close in the desired range of each other. At the same time executing a query on math not parallelized. That is loaded with only one CPU-core of 16. It is possible to parallelize some of these mathematical calculations?
Connor McDonald
February 03, 2017 - 7:45 pm UTC

Text queries *can* be run in parallel, but this is subject to some restrictions

http://docs.oracle.com/database/121/CCAPP/GUID-068C5AA8-D3C9-4885-B49A-8CA8E3BDE018.htm#CCAPP9249


More to Explore

VLDB

If you are new to partitioning, check out Connor McDonald's introduction series here.

VLDB

Documentation set on VLDB and Partitioning.