Skip to Main Content
  • Questions
  • Assinging Batch Numbers to Query Rows

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, Greg.

Asked: November 04, 2015 - 4:22 pm UTC

Last updated: August 24, 2017 - 12:56 pm UTC

Version: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production

Viewed 10K+ times! This question is

You Asked

Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production

create table orders
(
order_number number,
order_line number)

order_number line_number
50000 1
50000 2
50000 3
50010 1
50010 2
50300 3
50301 1
50301 5
50302 1
50302 2
50302 3
50302 4
50302 5
50302 6
50304 1
50304 2
50304 3
50305 6
50305 7
50306 8
50307 1
50308 1
51913 1
51913 2
...

In a SQL SELECT query I want to batch the orders together in groups of 5, to get something like this:

order_number line_number batch_number
50000 1 1
50000 2 1
50000 3 1
50010 1 1
50010 2 1
50300 3 1
50301 1 1
50301 5 1
50302 1 1
50302 2 1
50302 3 1
50302 4 1
50302 5 1
50302 6 1
50304 1 2
50304 2 2
50304 3 2
50305 6 2
50305 7 2
50306 8 2
50307 1 2
50308 1 2
51913 1 3
51913 2 3
...

An order can have any number of lines (except none), and may or may not have a line_number=1. Assume the results
are ordered by order number.

In the past, I've written a package function that when passed the order number, compares it to the previous order
number (stored in a package variable) and increments/returns the batch number accordingly. But in this usage (Web ADI),
it doesn't like me using a function in the query and sometimes gives ORA-04068 errors, and Web ADI caching is causing issues, too.

I feel there must be an analytical solution to this, but I think I've been staring at it so long that it is not apparent.

Any ideas?

and Chris said...

When working on these problems, it's helpful to sketch out the algorithm you need to use first.

- Every fifth order number, increase the batch_number by one.

Hmm. That doesn't help much. We need to break it down further.

You can put numbers in groups of 5 by dividing by this and taking the floor. There are gaps between the order numbers though. We need to assign an order rank, a gap-free sequence that increases for each new order number. Here's an outline of the process:

- Order the rows by order_number
- Start the order rank at zero
- If the order_number for the current row is different from the previous row, increment the order rank by one. Otherwise keep the same order rank
- Divide this result by five
- Take the floor of this division. This will group the rows into batches of five.
- These are your batch numbers. It starts at zero however. Add one if you want these to start at that instead.

You could do this with analytic functions. I'm going to use the model clause instead:

select * from orders
model 
  dimension by ( 
    -- this specifies how we're accessing rows
    row_number() over (order by order_number, order_line) rn 
  )
  measures ( 
    order_number, order_line, 0 order_rank, 0 order_group, 0 batch_number 
  )
  rules (
    order_rank[any] = 
      case 
        when order_number[cv()] != order_number[cv()-1] then 
          -- we need to increment the order rank
          order_rank[cv()-1]+1
        else 
          -- nvl() to set the value of the first row, otherwise all null
          nvl(order_rank[cv()-1], 0)
      end,
    order_group[any] = floor( (order_rank[cv()]) / 5 ),
    batch_number[any] = order_group[cv()] + 1
  );

        RN ORDER_NUMBER ORDER_LINE ORDER_RANK ORDER_GROUP BATCH_NUMBER
---------- ------------ ---------- ---------- ----------- ------------
         1        50000          1          0           0            1
         2        50000          2          0           0            1
         3        50000          3          0           0            1
         4        50010          1          1           0            1
         5        50010          2          1           0            1
         6        50300          3          2           0            1
         7        50301          1          3           0            1
         8        50301          5          3           0            1
         9        50302          1          4           0            1
        10        50302          2          4           0            1
        11        50302          3          4           0            1
        12        50302          4          4           0            1
        13        50302          5          4           0            1
        14        50302          6          4           0            1
        15        50304          1          5           1            2
        16        50304          2          5           1            2
        17        50304          3          5           1            2
        18        50305          6          6           1            2
        19        50305          7          6           1            2
        20        50306          8          7           1            2
        21        50307          1          8           1            2
        22        50308          1          9           1            2
        23        51913          1         10           2            3
        24        51913          2         10           2            3


You could simplify this (+1 in the order group is the batch_number). I've kept it split out to aid understanding.

Rating

  (5 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Excellent!

Greg Johnson, November 05, 2015 - 2:22 pm UTC

Truly, you are a sql GOD. I had some of the bits and pieces, but was nowhere near your elegant solution.

Thanks so much!
Chris Saxon
November 06, 2015 - 2:10 am UTC

Thanks for your kind words. Glad to help..

For anyone with 12c

Stew Ashton, November 06, 2015 - 4:38 am UTC

Sometimes people read these questions years later, so...

For anyone with a similar problem and who has version 12c or later, there is an alternative solution:
select * from orders
match_recognize(
  order by order_number, order_line
  measures match_number() batch_number
  all rows per match
  pattern( (a b*){1,5} )
  define a as order_number > prev(order_number) or prev(order_number) is null,
         b as order_number = prev(order_number)
);

 ORDER_NUMBER ORDER_LINE  BATCH_NUMBER
------------- ---------- -------------
        50000          1             1
        50000          2             1
        50000          3             1
        50010          1             1
        50010          2             1
        50300          3             1
        50301          1             1
        50301          5             1
        50302          1             1
        50302          2             1
        50302          3             1
        50302          4             1
        50302          5             1
        50302          6             1
        50304          1             2
        50304          2             2
        50304          3             2
        50305          6             2
        50305          7             2
        50306          8             2
        50307          1             2
        50308          1             2
        51913          1             3
        51913          2             3

An analytic alternative

Timo Raitalaakso, November 06, 2015 - 6:10 am UTC

"You could do this with analytic functions". Why use model or match regognize, when the solution is much simpler and readable with the analytic way?
select order_number
     , line_number
     , ceil(dense_rank()over(order by order_number)/5) batch_number
  from orders


Everything Should Be Made as Simple as Possible, But Not Simpler
Connor McDonald
November 06, 2015 - 8:56 am UTC

"Everything Should Be Made as Simple as Possible, But Not Simpler "

Indeed.

Timo!

Stew Ashton, November 06, 2015 - 8:56 am UTC

Timo, I agree with you 100%. I often promote MATCH_RECOGNIZE to replace nested analytic functions, but a straightforward analytic function like yours is hard to beat. I just didn't see it.

Follow up on the same scenario

Renjith, August 23, 2017 - 12:26 pm UTC

will the solution work if I have a total of 1 lakh records. 10 order ids, each with 10000 line numbers. If I have to divide the 1 lakh data , using the above solutions, in to 10 batches, I am getting all the records with batch number 1.

My requirement:

What if I have an extra condition to handle? In my requirement, I need to divide the data in to batches, Conditions
a) making sure that the order line number belonging to the order does not get split in to different batches.I guess this is handled in the above solutions.
b) In a single batch, at the max there can only be 10 records.
Chris Saxon
August 24, 2017 - 12:56 pm UTC

I'm not sure I understand what you're trying to do...

Could you post create table + insert into showing a sample of the data you're working with?

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.