How to Group Rows into Buckets Based on Chunk Size

Question and Answer

Chris Saxon

Thanks for the question.

Asked: September 21, 2017 - 8:51 am UTC

Last updated: September 21, 2017 - 1:25 pm UTC

Version: 11g

Viewed 1000+ times

You Asked

Hi Chris/Connor,

LiveSQL: https://livesql.oracle.com/apex/livesql/s/flfiwlt4m3yqb6v2tync6gr4o

I have a table test, and from it I want to get the following:
the number of chunks, where each chunk has a total file size <= 2GB.
E.g. the file sizes of id = 1,2,3 sum to <= 2GB, so those three rows are treated as one chunk;
likewise id = 4,5 form a chunk,
id = 6,7,8,9 form a chunk,
and id = 10 forms a chunk.

So in total, for the given created_date = TO_DATE('09/21/2017','MM/DD/YYYY'), we have 4 chunks, each with total file size <= 2GB.

So I want a SQL query that returns either of the results below. Can you please help with this SQL?

id  created_date     no_of_chunks   Chunk_Id
--  ---------------  -------------  ---------
1    21-SEP-17       4              1 
2    21-SEP-17       4              1
3    21-SEP-17       4              1
4    21-SEP-17       4              2
5    21-SEP-17       4              2
6    21-SEP-17       4              3
7    21-SEP-17       4              3
8    21-SEP-17       4              3
9    21-SEP-17       4              3
10   21-SEP-17       4              4


OR

no_of_chunks  id  Chunk_Id   
-----------   --  ---------  
4             1   1          
              2   1
              3   1
              4   2
              5   2
              6   3
              7   3
              8   3
              9   3
              10  4


With LiveSQL test case (linked above).
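The script itself isn't reproduced here; a minimal setup reconstructed from the result sets below would be (the column types and 'file_N' naming are assumptions inferred from the queries that follow):

create table test (
  id           number,
  created_date date,
  file_name    varchar2(30),
  file_size    number        -- in MB, so 2048 = 2GB
);

insert into test values (  1, date '2017-09-21', 'file_1',  1024 );
insert into test values (  2, date '2017-09-21', 'file_2',   512 );
insert into test values (  3, date '2017-09-21', 'file_3',   512 );
insert into test values (  4, date '2017-09-21', 'file_4',  1024 );
insert into test values (  5, date '2017-09-21', 'file_5',  1024 );
insert into test values (  6, date '2017-09-21', 'file_6',   512 );
insert into test values (  7, date '2017-09-21', 'file_7',   512 );
insert into test values (  8, date '2017-09-21', 'file_8',   512 );
insert into test values (  9, date '2017-09-21', 'file_9',   512 );
insert into test values ( 10, date '2017-09-21', 'file_10', 2048 );
commit;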

and Chris said...

You can find a similar problem discussed at:

https://asktom.oracle.com/pls/apex/asktom.search?tag=how-to-split-rows-into-balanced-sets-based-on-a-running-total-limited-to-2000

IMO the best solution uses match_recognize... which needs 12c.
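Something like this sketch (not part of the original answer; it assumes the test table above, orders by id rather than the file_name suffix used below, and treats 2048 as the 2GB limit). Each match greedily absorbs rows while the running sum of file_size within the match stays at or under 2048, and match_number() numbers the chunks:

select id, created_date, file_size, chunk_total, chunk_id
from   test
match_recognize (
  order by id
  measures
    sum(file_size) as chunk_total,  -- running total within the current chunk
    match_number() as chunk_id      -- each match is one chunk
  all rows per match
  pattern ( fits+ )
  define
    -- a row "fits" if adding it keeps the chunk's total <= 2GB
    fits as sum(file_size) <= 2048
);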

If you're on 11.2, you could generate the chunk windows with a connect by row generator in the with clause:

with data as (
  -- running total of file sizes, in file-name order
  select sum(file_size) over(order by to_number(substr(file_name, 6))) as running_total,
         to_number(substr(file_name, 6)) as ord,
         a.*
  from   test a
), row_generator as (
  -- one row per 2048-unit window: 1-2048 => chunk 1, 2049-4096 => chunk 2, ...
  select level as lvl,
         (level-1)*2048 + 1 as lo_lvl,
         level*2048 as hi_lvl
  from   dual
  connect by level <= (select ceil(max(data.running_total)/2048) from data)
)
select a1.id,
       a1.created_date,
       a1.file_size,
       b1.lvl as chunk_id,
       sum(a1.file_size) over(order by a1.ord) as running_tot
from   data a1
join   row_generator b1
on     a1.running_total >= b1.lo_lvl
and    a1.running_total <= b1.hi_lvl
order  by a1.id;

ID  CREATED_DATE          FILE_SIZE  CHUNK_ID  RUNNING_TOT  
1   21-SEP-2017 00:00:00  1024       1         1024         
2   21-SEP-2017 00:00:00  512        1         1536         
3   21-SEP-2017 00:00:00  512        1         2048         
4   21-SEP-2017 00:00:00  1024       2         3072         
5   21-SEP-2017 00:00:00  1024       2         4096         
6   21-SEP-2017 00:00:00  512        3         4608         
7   21-SEP-2017 00:00:00  512        3         5120         
8   21-SEP-2017 00:00:00  512        3         5632         
9   21-SEP-2017 00:00:00  512        3         6144         
10  21-SEP-2017 00:00:00  2048       4         8192  


Otherwise model. One note: model resets the running total per chunk, so it enforces the 2GB limit exactly; the windowed join above buckets rows by fixed 2048 boundaries of the overall running total, which gives the right answer for this data but can, in general, put more than 2048 into a single bucket:

select id, created_date, file_size, fs_tot, chunk_id
from   test
model
  -- address rows by their position in file-name order
  dimension by (row_number() over(order by to_number(substr(file_name, 6))) rn)
  measures (id, created_date, file_size, 0 fs_tot, 1 chunk_id)
  rules(
    -- fs_tot = running total within the current chunk; it restarts
    -- from the current file_size when adding the row would pass 2048
    fs_tot[any] = file_size[cv()] +
      case when fs_tot[cv()-1] + file_size[cv()] <= 2048
           then fs_tot[cv()-1] else 0 end,
    -- increment the chunk number exactly when the running total restarts
    chunk_id[rn>1] = chunk_id[cv()-1] +
      case when fs_tot[cv()-1] + file_size[cv()] <= 2048
           then 0 else 1 end
  );

ID  CREATED_DATE          FILE_SIZE  FS_TOT  CHUNK_ID  
1   21-SEP-2017 00:00:00  1024       1024    1         
2   21-SEP-2017 00:00:00  512        1536    1         
3   21-SEP-2017 00:00:00  512        2048    1         
4   21-SEP-2017 00:00:00  1024       1024    2         
5   21-SEP-2017 00:00:00  1024       2048    2         
6   21-SEP-2017 00:00:00  512        512     3         
7   21-SEP-2017 00:00:00  512        1024    3         
8   21-SEP-2017 00:00:00  512        1536    3         
9   21-SEP-2017 00:00:00  512        2048    3         
10  21-SEP-2017 00:00:00  2048       2048    4 
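
Neither result set includes the no_of_chunks column the question asked for. If you need it, a sketch is to wrap either query (here, the model one) and take the max chunk_id over all rows:

with chunked as (
  -- model query from above, unchanged
  select id, created_date, file_size, chunk_id
  from   test
  model
    dimension by (row_number() over(order by to_number(substr(file_name, 6))) rn)
    measures (id, created_date, file_size, 0 fs_tot, 1 chunk_id)
    rules(
      fs_tot[any] = file_size[cv()] +
        case when fs_tot[cv()-1] + file_size[cv()] <= 2048
             then fs_tot[cv()-1] else 0 end,
      chunk_id[rn>1] = chunk_id[cv()-1] +
        case when fs_tot[cv()-1] + file_size[cv()] <= 2048
             then 0 else 1 end
    )
)
select id, created_date, max(chunk_id) over () as no_of_chunks, chunk_id
from   chunked
order  by id;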


HT to Duke Ganote and Stew Ashton for these solutions.


Comments

Wow!! That's Awesome

A reader, September 21, 2017 - 1:17 pm UTC

Thanks a lot Chris, Duke Ganote and Stew Ashton for this!!
Looking at these SQLs, I feel like I don't even reach basic Level 1 of SQL... :)

In my 7 years of experience, I never understood how the MODEL clause (a great SQL feature) works.

If Chris/Connor visit India and I get a chance to meet them, I would like to understand this feature :)

Thanks a lot again!!
Chris Saxon
September 21, 2017 - 1:25 pm UTC

Model's neat, but match_recognize is much cooler. Put your effort into understanding that instead ;)
