Efficient way to fetch data using ROWNUM from a billion-row table?


Chris Saxon

Thanks for the question, Kalyan.

Asked: February 14, 2019 - 7:08 am UTC

Last updated: February 20, 2019 - 11:14 am UTC

Version: 10.1.2

Viewed 10K+ times

You Asked

I'm trying to fetch the required data from a billion-row table using the row number approach, i.e.:

select "ingest_ts_utc"
from (
  select to_char(sys_extract_utc(systimestamp), 'YYYY-MM-DD HH24:MI:SS.FF') as "ingest_ts_utc",
         rownum as rno
  from   XYZ.ABC
)
where rno between 361754731 and 381852215

But it's taking a long time to get the data, and we sometimes face connection issues.

How can I improve my SQL to fetch the data efficiently?


and Chris said...

As mentioned in the review where you raised this:

Scanning hundreds of millions of rows,
then
returning tens of millions of them,
is going to take a while.

To make this query anything approaching "fast", you need to read and return less data. Much less.

Questions to answer so we can help you make some progress:

Are you really selecting no columns from the table, just systimestamp? If so, why are you running this query at all?
Why are you fetching 20 million+ rows? Do you really need that many?
Why are you using the rownum trick to get rows 361 million - 381 million? Why can't you use a regular where clause against columns in the table?


Comments

Kalyan Deshi, February 15, 2019 - 11:44 am UTC

Are you really selecting no columns from the table, just systimestamp? If so, why are you running this query at all?

No, we are fetching data from all columns.

Why are you fetching 20 million+ rows? Do you really need that many?

Yes, we have a use case where we need to do parallel ingestion. So we are splitting the table using the row number and selecting data accordingly.

Why are you using the row number trick to get rows 361 million - 381 million? Why can't you use a regular where clause against columns in the table?

As above: we have a use case where we need to do parallel ingestion, so we split the table using the row number and select data accordingly.
Chris Saxon
February 15, 2019 - 11:58 am UTC

So... what exactly are you doing?

A reader, February 19, 2019 - 3:55 am UTC

Using the row number, we split the table accordingly.
For example: if a table consists of 1,000 records and we split it into 5 parts, the 1st split consists of records 1-200, the next split consists of records 201-400, and so on.

After splitting, we stream the data in parallel, i.e. we make 5 connections, one per split, and each connection selects and streams the data for its own split.
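
For example, with 1,000 rows and 5 connections, each connection would run a slice along these lines (a sketch of the pattern described above, reusing the XYZ.ABC table from the question):

select *
from (select t.*, rownum as rno from XYZ.ABC t)
where rno between 1 and 200;

select *
from (select t.*, rownum as rno from XYZ.ABC t)
where rno between 201 and 400;

-- ...and so on up to 801-1000, one statement per connection.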
Chris Saxon
February 19, 2019 - 10:42 am UTC

The problem is that to find rows 361M -> 381M, you have to read all 381M rows.

You're better off splitting the table into chunks of XXX rows, then finding the min/max primary key or rowid values for each chunk.

Then have each parallel process find the rows between the min/max values you pass it.
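
For instance, here's a minimal sketch of building those chunks by hand for 5 workers (the ABC_CHUNKS table name is made up for illustration):

-- One scan to compute the rowid boundaries of 5 equal-sized chunks.
create table abc_chunks as
select grp,
       min(rid) as start_rowid,
       max(rid) as end_rowid
from (
  select rowid as rid,
         ntile(5) over (order by rowid) as grp
  from   XYZ.ABC
)
group by grp;

-- Each worker then reads only its own rowid range:
select *
from   XYZ.ABC
where  rowid between :start_rowid and :end_rowid;

Building the chunk table still scans the whole table once, but after that each worker touches only its own slice instead of counting past every row that comes before it.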

DBMS_PARALLEL_EXECUTE can help with this:

https://oracle-base.com/articles/11g/dbms_parallel_execute_11gR2
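
A minimal sketch of that route, assuming the rows are being copied into a (hypothetical) STAGING_ABC table; the task name, chunk size, and parallel level are illustrative:

begin
  -- Create a task and carve XYZ.ABC into rowid chunks of ~10,000 rows each.
  dbms_parallel_execute.create_task('ingest_abc');
  dbms_parallel_execute.create_chunks_by_rowid(
    task_name   => 'ingest_abc',
    table_owner => 'XYZ',
    table_name  => 'ABC',
    by_row      => true,
    chunk_size  => 10000
  );

  -- Run the statement once per chunk, 5 scheduler jobs at a time;
  -- :start_id and :end_id are bound to each chunk's rowid range.
  dbms_parallel_execute.run_task(
    task_name      => 'ingest_abc',
    sql_stmt       => 'insert into staging_abc
                       select * from XYZ.ABC
                       where rowid between :start_id and :end_id',
    language_flag  => dbms_sql.native,
    parallel_level => 5
  );

  dbms_parallel_execute.drop_task('ingest_abc');
end;
/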

PS - your query also doesn't have an order by. Is that the case in the real statement? If so, you may end up reading rows twice or not at all!

A reader, February 19, 2019 - 11:20 am UTC

I have gone through the link and read the info about DBMS_PARALLEL_EXECUTE.

But we don't have access to create any tasks or chunks.

Is there any other approach to select the data in parallel?
Connor McDonald
February 20, 2019 - 3:38 am UTC

But we don't have access to create any tasks or chunks.


Why? That is, why not take advantage of all the features available in the Oracle database?

totally different idea

xx, February 20, 2019 - 10:12 am UTC

If you have the partitioning option, then partition (or subpartition) the table by some suitable key into 32 hash partitions (or some other suitable number).

You can then scan it efficiently with 32 parallel jobs, each referencing its own partition.

Or, if there is a sequence-generated column, you could partition the table with interval partitioning and set the interval to the maximum number of rows you want per partition.

Then just look in the metadata to see which partitions you have to scan.
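
A minimal sketch of the hash-partitioning version (the ID partition key is an assumption; any reasonably well-distributed column would do):

-- Rebuild the table with 32 hash partitions.
create table abc_part
partition by hash (id) partitions 32
as
select * from XYZ.ABC;

-- Each of the 32 jobs then scans exactly one partition, e.g.:
select * from abc_part partition (sys_p1);

-- The system-generated partition names are in USER_TAB_PARTITIONS.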

Chris Saxon
February 20, 2019 - 11:14 am UTC

Yep. Although that requires partitioning the table first! Which could take a while on a billion-row table.
