Skip to Main Content
  • Questions
  • Retreive rows of primary key in batches

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, Ramesh.

Asked: March 06, 2017 - 4:37 pm UTC

Last updated: March 06, 2017 - 5:42 pm UTC

Version: ORACLE 11.2

Viewed 1000+ times

You Asked

I want to fetch a 80Million table, due to performance constraints, I want to process the data in batches(5000), i want to get the firstKey and lastKey for each batch.
Each batch has 5000 unique Id records.

Example of output:-
BATCH_ID FIRST_KEY LAST_KEY
-------------------------------------
1 1 5000
2 5001 10000
3 10001 15000


My Query is given below, i get batch details but when I query for a given firstKey & lastKey individually in the database i do not get 5000 records, sometimes its less/more.
Pls help.
------
SELECT
t.batch_nbr AS batchId,
MIN (t.id) AS firstKey,
MAX (t.id) AS lastKey
FROM
(
SELECT
id,
CASE WHEN MOD (RANK () OVER (ORDER BY id),
5000) = 0 THEN ROUND ((RANK () OVER (ORDER BY id)) / 5000)
ELSE ROUND ((((RANK () OVER (ORDER BY id)) / 5000) + 1)) END AS batch_nbr
FROM CUSTOMER
WHERE -- my query conditions
dt_fee <= TO_DATE ('2017-01-31',
'YYYY-MM-DD')
customer_type IN
(
1,
2,
3
)
GROUP BY id
ORDER BY id
)
t
GROUP BY t.batch_nbr
ORDER BY t.batch_nbr;



and Chris said...

Your expression to split the rows into batches is way off. Mixing mod and division doesn't work to produce equal groups!

All you need to do is take the ceil() of the rank/5000 (or however many you want in each group):

with rws as (
  select rank() over (order by level) rk from dual connect by level <= 10
)
  select rk, mod(rk, 5), round(rk/5), round(rk/5) + 1, 
         case
           when mod(rk, 5) = 0 then
             round(rk/5)
           else 
             round(rk/5) + 1
         end bnum,
         ceil(rk/5)
  from   rws;

RK  MOD(RK,5)  ROUND(RK/5)  ROUND(RK/5)+1  BNUM  CEIL(RK/5)  
1   1          0            1              1     1           
2   2          0            1              1     1           
3   3          1            2              2     1           
4   4          1            2              2     1           
5   0          1            2              1     1           
6   1          1            2              2     2           
7   2          1            2              2     2           
8   3          2            3              3     2           
9   4          2            3              3     2           
10  0          2            3              2     2


For a more thorough discussion of efficiently splitting a table into equal sized chunks, read Bryn Llewellyn's "Transforming one table to another" paper:

https://blogs.oracle.com/plsql-and-ebr/entry/transforming_one_table_to_another

and Stew Ashton's series discussing his SQL solution:

https://stewashton.wordpress.com/2016/02/01/chunking-tables-1-genesis/

Rating

  (5 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Ramesh Kasarla, March 06, 2017 - 5:29 pm UTC

Thanks Chris, i am little confused.
I have updated query as below but still do not get batches of 5000, from below output i get batches having more than 5000 records.

Thanks for your patience.

SELECT
t.batch_nbr AS batchId,
MIN (t.id) AS firstKey,
MAX (t.id) AS lastKey
FROM
(
SELECT
id,
CASE WHEN MOD (RANK () OVER (ORDER BY id),
5000) = 0
THEN CEIL((RANK () OVER (ORDER BY id)) / 5000)
ELSE CEIL((((RANK () OVER (ORDER BY id)) / 5000) + 1)) END AS batch_nbr
FROM CUSTOMER
WHERE -- my query conditions
dt_fee <= TO_DATE ('2017-01-31',
'YYYY-MM-DD')
customer_type IN
(
1,
2,
3
)
GROUP BY id
ORDER BY id
)
t
GROUP BY t.batch_nbr
ORDER BY t.batch_nbr;
Chris Saxon
March 06, 2017 - 5:42 pm UTC

Delete the whole case statement. Replace it with ceil(). There's no need for the mod nonsense!

If you're still struggling, please post a complete test case, including:

- Create table DDL
- Insert statements
- Your query
- The output you get and what you want

else NTILE

Rajeshwaran, Jeyabal, March 06, 2017 - 6:25 pm UTC

demo@ORA12C> select level x
  2  from dual
  3  connect by level <=10 ;

         X
----------
         1
         2
         3
         4
         5
         6
         7
         8
         9
        10

10 rows selected.

demo@ORA12C> select x, ntile(2) over(order by x) nt
  2  from ( select level x
  3     from dual
  4     connect by level <= 10 );

         X         NT
---------- ----------
         1          1
         2          1
         3          1
         4          1
         5          1
         6          2
         7          2
         8          2
         9          2
        10          2

10 rows selected.

demo@ORA12C>
demo@ORA12C> create table t as select * from all_objects;

Table created.

demo@ORA12C> column x new_val x1
demo@ORA12C> select ceil(count(*)/5000) x from t ;

         X
----------
        16

demo@ORA12C> select nt,count(*),min(object_id),max(object_id)
  2  from (
  3  select object_id, ntile(&x1) over(order by object_id) nt
  4  from t     )
  5  group by nt
  6  order by nt;
old   3: select object_id, ntile(&x1) over(order by object_id) nt
new   3: select object_id, ntile(        16) over(order by object_id) nt

        NT   COUNT(*) MIN(OBJECT_ID) MAX(OBJECT_ID)
---------- ---------- -------------- --------------
         1       4840            133           6635
         2       4840           6636          14639
         3       4840          14641          22907
         4       4840          22908          27748
         5       4840          27749          32597
         6       4840          32598          37437
         7       4840          37438          42277
         8       4840          42278          47117
         9       4839          47118          51956
        10       4839          51957          56795
        11       4839          56796          61634
        12       4839          61635          66473
        13       4839          66474          71315
        14       4839          71316          76554
        15       4839          76555          82121
        16       4839          82122          95348

16 rows selected.

demo@ORA12C>

12c pattern matching

Rajeshwaran, Jeyabal, March 06, 2017 - 6:30 pm UTC

with 12c pattern matching, I could be like this.

demo@ORA12C> select *
  2  from t
  3  match_recognize(
  4    order by object_id
  5    measures
  6      match_number() mno,
  7      count(*) as cnt,
  8      min(object_id) as min_id,
  9      max(object_id) as max_id
 10    one row per match
 11    pattern( strt down*)
 12    define
 13      down as prev(object_id) <= object_id
 14            and running count(*) <=:batch_size )
 15  /

       MNO        CNT     MIN_ID     MAX_ID
---------- ---------- ---------- ----------
         1       5000        133       6886
         2       5000       6887      15092
         3       5000      15093      23387
         4       5000      23388      28391
         5       5000      28392      33397
         6       5000      33398      38397
         7       5000      38398      43397
         8       5000      43398      48397
         9       5000      48398      53397
        10       5000      53398      58397
        11       5000      58398      63397
        12       5000      63398      68400
        13       5000      68401      73774
        14       5000      73775      79528
        15       5000      79529      84973
        16       2432      84974      95348

16 rows selected.

demo@ORA12C>

Ramesh Kasarla, March 06, 2017 - 8:08 pm UTC

Thanks Jeyabal.

I tried to execute to have 5000 records for each batch and gave NTILE(5000) , also with NTILE( ceil ...count(*)/5000) but the records in the batch were less than 5000, were around 3268.

also i have observed that firstKey & lastKey were not unique all the times.

output(4th row firstKey is same as 3rd Row LastKey)

Batch_nbr Count(*) FirstKey LastKey
---------------------------------------------
1 3648 3514466 137428270
2 3648 137428283 139269688
3 3648 139269689 140757552
4 3648 140757552 141116898

Query:-

SELECT
t.batch_nbr,
COUNT (*),
MIN (t.id) AS firstKey,
MAX (t.) AS lastKey
FROM
(
SELECT
id,
NTILE (5000) OVER (ORDER BY id) batch_nbr
FROM CUSTOMER
WHERE
dt_fee <= TO_DATE ('2017-01-31',
'YYYY-MM-DD') -- current eop passed from java
AND
dt_srce_beg <= TO_TIMESTAMP ('2017-02-01 17:00:00',
'YYYY-MM-DD HH24:MI:SS') -- next BOP passed from java
AND
dt_srce_end >= TO_TIMESTAMP ('2017-02-01 17:00:00',
'YYYY-MM-DD HH24:MI:SS') -- next BOP passed from java
AND
cd_fee_cat_type IN
(
1,
2,
3
)
)
t
GROUP BY t.batch_nbr
ORDER BY t.batch_nbr;

NTILE

Rajeshwaran, Jeyabal, March 07, 2017 - 2:00 pm UTC

....
I tried to execute to have 5000 records for each batch and gave NTILE(5000) , also with NTILE( ceil ...count(*)/5000) but the records in the batch were less than 5000, were around 3268.

also i have observed that firstKey & lastKey were not unique all the times.
....



NTILE is there to split the data into batches, with each batch having more or less the same volume of data.

if the first_key and last_key are not unique, then we have to sort them/batch them using KEY columns.

demo@ORA12C>
demo@ORA12C> create table t as select * from all_objects;

Table created.

demo@ORA12C> alter table t add constraint t_pk
  2  primary key(object_id);

Table altered.

demo@ORA12C>
demo@ORA12C> column x new_val x1
demo@ORA12C> select ceil(count(*)/5000) x from t;

         X
----------
        16

demo@ORA12C>
demo@ORA12C> select nt,count(*),min(object_id) first_key,max(object_id) last_key
  2  from (
  3  select object_id, ntile(&x1) over(order by object_id) nt
  4  from t
  5      )
  6  group by nt
  7  order by nt;
old   3: select object_id, ntile(&x1) over(order by object_id) nt
new   3: select object_id, ntile(        16) over(order by object_id) nt

        NT   COUNT(*)  FIRST_KEY   LAST_KEY
---------- ---------- ---------- ----------
         1       4840        133       6635
         2       4840       6636      14639
         3       4840      14641      22907
         4       4840      22908      27748
         5       4840      27749      32597
         6       4840      32598      37437
         7       4840      37438      42277
         8       4840      42278      47117
         9       4839      47118      51956
        10       4839      51957      56795
        11       4839      56796      61634
        12       4839      61635      66473
        13       4839      66474      71315
        14       4839      71316      76554
        15       4839      76555      82121
        16       4839      82122      95351

16 rows selected.

demo@ORA12C>


so we have sixteen batches, with each batch having more or less the same volume of keys and each batch is close to 5000 keys.

If you are still constrained by having the exact 5000 keys per batch, then we could tweak the code like this.

demo@ORA12C> select x, count(*),min(object_id) first_key,max(object_id) last_key
  2  from (
  3  select object_id, ceil(row_number() over(order by object_id) /5000) x
  4  from t
  5       )
  6  group by x
  7  order by x ;

         X   COUNT(*)  FIRST_KEY   LAST_KEY
---------- ---------- ---------- ----------
         1       5000        133       6886
         2       5000       6887      15092
         3       5000      15093      23387
         4       5000      23388      28391
         5       5000      28392      33397
         6       5000      33398      38397
         7       5000      38398      43397
         8       5000      43398      48397
         9       5000      48398      53397
        10       5000      53398      58397
        11       5000      58398      63397
        12       5000      63398      68400
        13       5000      68401      73774
        14       5000      73775      79528
        15       5000      79529      84973
        16       2432      84974      95351

16 rows selected.

demo@ORA12C>

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.