Skip to Main Content
  • Questions
  • How to paginate properly in Oracle Text?

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, Sarvagy.

Asked: October 31, 2022 - 4:01 am UTC

Last updated: November 22, 2022 - 1:28 pm UTC

Version: 19c

Viewed 100+ times

You Asked

Hi Tom / Experts,

I have a test table with following schema

BLOG_DOCS
----------
doc_id number PRIMARY KEY
doc_title varchar2(25) NOT NULL
doc_author varchar2(25) NOT NULL
doc_data CLOB NOT NULL


There is an oracle text index created on blog_data as following:

create index blog_doc_txt_idx_ft on blog_docs(doc_data)
  indextype is ctxsys.context
  parameters (
  'WORDLIST SubStrWordListPref
   STORAGE DocsStorage
   DATASTORE MultiColumnDataStore
   sync (on commit)'
);


This table has been populated with about 10K records. Oracle recommended query using FIRST_ROWS has good CPU cost, see here: https://i.stack.imgur.com/jetSw.png

However the moment I use pagination using rownum CPU cost jumps much higher:

SELECT *
FROM
  (SELECT tmp.*,
          rownum rn
   FROM
     (SELECT /*+ FIRST_ROWS */  *
      FROM blog_docs
      WHERE contains(doc_data, 'richard', 1) > 0
      ORDER BY score(1) DESC) tmp
   WHERE rownum <= 20 )
WHERE rn > 10;

See https://i.stack.imgur.com/qF5Xb.png

What is the proper way to do pagination for oracle text columns such that CPU cost also comes down? This is needed because multiple high cost CPU queries will be flagged by DBAs.

and Chris said...

You should use the FETCH FIRST syntax for top-N queries since 12c:

SELECT *
FROM blog_docs
WHERE contains(doc_data, 'richard', 1) > 0
ORDER BY score(1) DESC
OFFSET 10 ROWS
FETCH FIRST 10 ROWS ONLY;


It's also better to use the "seek" method rows to get the rows for pages 2 onwards. This is where you save the value(s) from the last row on page N - 1 to filter the data.

Which would look something like:

SELECT d.*, score(1)
FROM blog_docs d
WHERE contains(doc_data, 'richard', 1) > 0
AND   score(1) <= :last_score
AND   doc_id > :last_id
ORDER BY score(1) DESC, doc_id
FETCH FIRST 10 ROWS ONLY;


This helps because the query only processes the N rows you want. With OFFSET it'll also read the M before this.

See more about this at https://asktom.oracle.com/pls/apex/asktom.search?tag=comparing-pagination-methods-offset-vs-seek

When it comes to analyzing your queries, the DBAs should really be checking row estimates are accurate and how much work (buffers/consisten gets/logical IO) the query actually does. Focusing on (CPU) cost alone is kinda meaningless.

https://asktom.oracle.com/pls/apex/asktom.search?tag=explain-plan-cardinality-and-cost

Rating

  (3 ratings)

Comments

Nice

dekike, November 22, 2022 - 9:58 am UTC

Excel OFFSET is an excellent formula for performing dynamic calculations with "moving" ranges. It's slow because it has to find the top offset rows and then scan the next 100 rows. When you're dealing with massive amounts of data, no amount of optimization will change that.
Chris Saxon
November 22, 2022 - 1:28 pm UTC

True

More to Explore

SQL

The Oracle documentation contains a complete SQL reference.