Kalyan Deshi, February 15, 2019 - 11:44 am UTC
Are you really selecting no columns from the table, just systimestamp? If so, why are you running this query at all?
No, we are fetching data from all columns.
Why are you fetching 20 million+ rows? Do you really need that many?
Yes. We have a use case that needs parallel ingestion, so we split the table using the row number and select data accordingly.
Why are you using the row number trick to get rows 361 million - 381 million? Why can't you use a regular where clause against columns in the table?
As above: we have a use case that needs parallel ingestion, so we split the table using the row number and select data accordingly.
February 15, 2019 - 11:58 am UTC
So... what exactly are you doing?
A reader, February 19, 2019 - 3:55 am UTC
Using row number we split the table accordingly.
For example: if a table consists of 1,000 records, we split it into 5 parts, i.e. the 1st split consists of records 1-200, the next split of records 201-400, and so on.
After splitting we stream the data in parallel, i.e. we open 5 connections, select from the 5 splits concurrently, and stream the data.
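For concreteness, each split's query presumably looks something like the following (the table name, ordering column, and bind values are assumptions, not taken from the thread):

```sql
-- Hypothetical shape of one split's query: worker N fetches rows
-- (N-1)*chunk_size+1 .. N*chunk_size, ranked by ROW_NUMBER.
SELECT *
FROM (
  SELECT t.*,
         ROW_NUMBER() OVER (ORDER BY t.id) AS rn  -- "id" is an assumed column
  FROM   big_table t
)
WHERE  rn BETWEEN :lo AND :hi;   -- e.g. 361000001 and 381000000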
February 19, 2019 - 10:42 am UTC
The problem is that to find rows 361M -> 381M, you have to read all 381M rows.
You're better off splitting the table into chunks of XXX rows, and finding the min/max primary key or rowid values for each chunk.
Then have each parallel process find the rows between the min/max values you pass it.
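A sketch of that idea, assuming a table named big_table and a chunk size of 20M rows (both assumptions). The full scan to compute boundaries happens only once; each worker then does a cheap rowid range scan:

```sql
-- One-off pass: compute rowid boundaries for each ~20M-row chunk.
SELECT chunk,
       MIN(rid) AS start_rowid,
       MAX(rid) AS end_rowid
FROM (
  SELECT rowid AS rid,
         CEIL(ROW_NUMBER() OVER (ORDER BY rowid) / 20000000) AS chunk
  FROM   big_table
)
GROUP BY chunk
ORDER  BY chunk;

-- Each parallel worker then scans only its slice:
SELECT *
FROM   big_table
WHERE  rowid BETWEEN :start_rowid AND :end_rowid;
```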
DBMS_parallel_execute can help with this:
https://oracle-base.com/articles/11g/dbms_parallel_execute_11gR2

PS - your query also doesn't have an order by. Is this the case in the real statement? If so, you may end up reading rows twice, or not at all!
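A minimal DBMS_PARALLEL_EXECUTE sketch of the chunk-by-rowid approach (the task name, table name, target table, and chunk size are made up for illustration; the :start_id/:end_id binds are the ones the package supplies per chunk):

```sql
BEGIN
  -- Register a task, carve the table into ~20M-row chunks by rowid,
  -- then run the statement once per chunk with 5 concurrent jobs.
  DBMS_PARALLEL_EXECUTE.create_task(task_name => 'stream_big_table');

  DBMS_PARALLEL_EXECUTE.create_chunks_by_rowid(
    task_name   => 'stream_big_table',
    table_owner => USER,
    table_name  => 'BIG_TABLE',
    by_row      => TRUE,
    chunk_size  => 20000000);

  DBMS_PARALLEL_EXECUTE.run_task(
    task_name      => 'stream_big_table',
    sql_stmt       => 'INSERT INTO target_table
                       SELECT * FROM big_table
                       WHERE  rowid BETWEEN :start_id AND :end_id',
    language_flag  => DBMS_SQL.native,
    parallel_level => 5);

  DBMS_PARALLEL_EXECUTE.drop_task(task_name => 'stream_big_table');
END;
/
```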
A reader, February 19, 2019 - 11:20 am UTC
I have gone through the link and read the info about dbms_parallel_execute.
But we don't have access to create any tasks or chunks.
Is there any other approach to select data in parallel?
February 20, 2019 - 3:38 am UTC
But we don't have access to create any tasks or chunks.
Why? I.e., why not take advantage of all the features available in the Oracle database?
totally different idea
xx, February 20, 2019 - 10:12 am UTC
If you have the partitioning option, partition (or subpartition) the table by some suitable key into 32 hash partitions (or some other suitable number).
You can then scan efficiently with 32 parallel jobs, each referencing its own partition.
Or, if there is a sequence-generated column, you could partition the table with interval partitioning and set the interval to the maximum number of rows you want per partition.
Then just check the metadata to see which partitions you have to scan.
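For example, assuming the table was hash-partitioned as described, a driver could list the partitions from the data dictionary and hand each worker one partition to scan (the table and partition names here are illustrative):

```sql
-- Driver: list the partitions to hand out to workers.
SELECT partition_name
FROM   user_tab_partitions
WHERE  table_name = 'BIG_TABLE'
ORDER  BY partition_position;

-- Worker: scan exactly one partition, substituting the partition
-- name it was given; rows in other partitions are never touched.
SELECT *
FROM   big_table PARTITION (SYS_P101);
```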
February 20, 2019 - 11:14 am UTC
Yep. Although that requires partitioning the table first! Which could take a while on a billion-row table.