Hi Chris/Connor,
LiveSQL:
https://livesql.oracle.com/apex/livesql/s/flfiwlt4m3yqb6v2tync6gr4o
I have a table called test, and from it I want to get the details as follows:
Get the number of chunks whose total file size is <= 2GB.
E.g. the file sizes of ids 1, 2 and 3 sum to <= 2GB, so those three rows are treated as one chunk.
Likewise ids 4 and 5 form a chunk,
ids 6, 7, 8 and 9 form a chunk,
and id 10 forms a chunk.
So in total, for created_date = TO_DATE('09/21/2017','MM/DD/YYYY'), we have 4 chunks, each with a sum of file sizes <= 2GB.
I want a SQL statement that returns either of the results below.
Can you please help with this SQL?
id  created_date  no_of_chunks  chunk_id
--  ------------  ------------  --------
 1  21-SEP-17     4             1
 2  21-SEP-17     4             1
 3  21-SEP-17     4             1
 4  21-SEP-17     4             2
 5  21-SEP-17     4             2
 6  21-SEP-17     4             3
 7  21-SEP-17     4             3
 8  21-SEP-17     4             3
 9  21-SEP-17     4             3
10  21-SEP-17     4             4
OR
no_of_chunks  id  chunk_id
------------  --  --------
4              1  1
               2  1
               3  1
               4  2
               5  2
               6  3
               7  3
               8  3
               9  3
              10  4
You can find a similar problem discussed at:
https://asktom.oracle.com/pls/apex/asktom.search?tag=how-to-split-rows-into-balanced-sets-based-on-a-running-total-limited-to-2000
IMO the best solution uses match_recognize, which needs 12c.
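On 12c, a match_recognize version looks something like this (an untested sketch; it assumes id reflects the load order -- if not, order by the number derived from file_name as in the queries below):

```sql
select id, created_date, file_size, chunk_id,
       max(chunk_id) over () as no_of_chunks
from   test
match_recognize (
  order by id
  measures match_number() as chunk_id   -- each match = one chunk
  all rows per match
  pattern (a+)
  -- keep adding rows while the chunk's running total stays within 2GB
  define a as sum(file_size) <= 2048
);
```

Because each match greedily absorbs rows until the next one would push the running total past 2048, the chunks come out exactly as you described.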
If you're on 11.2, you could use a with clause with a connect by row generator:
with data as (
  select sum(file_size) over (order by to_number(substr(file_name, 6))) as running_total,
         to_number(substr(file_name, 6)) as ord,
         a.*
  from   test a
),
row_generator as (
  select level as lvl,
         (level - 1) * 2048 + 1 as lo_lvl,
         level * 2048 as hi_lvl
  from   dual
  connect by level <= (select ceil(max(data.running_total) / 2048) from data)
)
select a1.id,
       a1.created_date,
       a1.file_size,
       b1.lvl as chunk_id,
       sum(a1.file_size) over (order by a1.ord) as running_tot
from   data a1
join   row_generator b1
on     a1.running_total >= b1.lo_lvl
and    a1.running_total <= b1.hi_lvl
order  by 1, 2;
ID  CREATED_DATE          FILE_SIZE  CHUNK_ID  RUNNING_TOT
 1  21-SEP-2017 00:00:00       1024         1         1024
 2  21-SEP-2017 00:00:00        512         1         1536
 3  21-SEP-2017 00:00:00        512         1         2048
 4  21-SEP-2017 00:00:00       1024         2         3072
 5  21-SEP-2017 00:00:00       1024         2         4096
 6  21-SEP-2017 00:00:00        512         3         4608
 7  21-SEP-2017 00:00:00        512         3         5120
 8  21-SEP-2017 00:00:00        512         3         5632
 9  21-SEP-2017 00:00:00        512         3         6144
10  21-SEP-2017 00:00:00       2048         4         8192
Otherwise, you can use the model clause:
select id, created_date, file_size, fs_tot, chunk_id
from   test
model
  dimension by (row_number() over (order by to_number(substr(file_name, 6))) rn)
  measures     (id, created_date, file_size, 0 fs_tot, 1 chunk_id)
  rules (
    fs_tot[any] = file_size[cv()] +
      case when fs_tot[cv()-1] + file_size[cv()] <= 2048
           then fs_tot[cv()-1] else 0 end,
    chunk_id[rn > 1] = chunk_id[cv()-1] +
      case when fs_tot[cv()-1] + file_size[cv()] <= 2048
           then 0 else 1 end
  );
ID  CREATED_DATE          FILE_SIZE  FS_TOT  CHUNK_ID
 1  21-SEP-2017 00:00:00       1024    1024         1
 2  21-SEP-2017 00:00:00        512    1536         1
 3  21-SEP-2017 00:00:00        512    2048         1
 4  21-SEP-2017 00:00:00       1024    1024         2
 5  21-SEP-2017 00:00:00       1024    2048         2
 6  21-SEP-2017 00:00:00        512     512         3
 7  21-SEP-2017 00:00:00        512    1024         3
 8  21-SEP-2017 00:00:00        512    1536         3
 9  21-SEP-2017 00:00:00        512    2048         3
10  21-SEP-2017 00:00:00       2048    2048         4
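Neither query returns the no_of_chunks column from your desired output, but you can get it by wrapping either query and counting the distinct chunk ids with an analytic function. A sketch using the model version:

```sql
select id, created_date, chunk_id,
       count(distinct chunk_id) over () as no_of_chunks  -- total chunks across the result
from (
  select id, created_date, file_size, fs_tot, chunk_id
  from   test
  model
    dimension by (row_number() over (order by to_number(substr(file_name, 6))) rn)
    measures     (id, created_date, file_size, 0 fs_tot, 1 chunk_id)
    rules (
      fs_tot[any] = file_size[cv()] +
        case when fs_tot[cv()-1] + file_size[cv()] <= 2048
             then fs_tot[cv()-1] else 0 end,
      chunk_id[rn > 1] = chunk_id[cv()-1] +
        case when fs_tot[cv()-1] + file_size[cv()] <= 2048
             then 0 else 1 end
    )
)
order by id;
```

Since the chunk ids here are consecutive integers starting at 1, max(chunk_id) over () would give the same answer.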
HT to Duke Ganote and Stew Ashton for these solutions.