Skip to Main Content
  • Questions
  • SQL Window function to skip and add rows dynamically

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, hariharaarumugam.

Asked: December 12, 2018 - 8:44 am UTC

Last updated: June 15, 2023 - 12:45 pm UTC

Version: 11g

Viewed 10K+ times! This question is

You Asked



create table test1 (col1 number,col2 varchar2(5),col3 date,col4 number,col5 number,col6 number);

--Positive scenario sample data

insert into test1 values(1,'A',trunc(sysdate),0,5.5,2);

insert into test1 values(1,'A',trunc(sysdate)+1,0,3.5,3);

insert into test1 values(1,'A',trunc(sysdate)+2,0,2.5,0);

insert into test1 values(1,'A',trunc(sysdate)+3,0,6.5,0);

insert into test1 values(1,'A',trunc(sysdate)+4,0,8,0);

--Negative scenario Sample data (not working)

insert into test1 values(2,'B',trunc(sysdate),0,5.5,2);

insert into test1 values(2,'B',trunc(sysdate),1,3.5,3);

insert into test1 values(2,'B',trunc(sysdate)+1,0,3.5,5);

insert into test1 values(2,'B',trunc(sysdate)+2,0,2.5,2);

insert into test1 values(2,'B',trunc(sysdate)+3,0,6.5,0);

insert into test1 values(2,'B',trunc(sysdate)+4,0,8,0);

insert into test1 values(2,'B',trunc(sysdate)+4,1,7.9,0);

insert into test1 values(2,'B',trunc(sysdate)+5,0,10.5,0);

insert into test1 values(2,'B',trunc(sysdate)+6,0,1.5,0);

/*Working scenario */
--Query fetches for positive scenario where always col4 is 0 for a corresponding combination of col1 and col2. there is no issue
select test1.*,sum(col5) over (partition by col1,col2 order by col3 desc,col4 desc rows col6 preceding ) derived_field from test1
where col1=1 order by col1,col2,col3,col4;


/* Negative Scenario */
--This query needs to be modified so that it should consider only records where col4 is 0. i.e if col6 is 5 and there is a record in preceding 5 rows where col4 is 1
--then that record needs to be ignored and it has to consider the next available 0th record

select test1.*,sum(col5) over (partition by col1,col2 order by col3 desc,col4 desc rows col6 preceding )derived_field from test1
where col1=2 order by col1,col2,col3,col4;


Hi TOM,

I have a typical scenario where we need to ignore n records in between and add n records when performing window functions

In the above example, there are two sets of data differentiated by values of Col1 (values being 1 and 2). For the first scenario where always col4 is 0 for every row where col1 is 1. The SQL window functions works as expected. Whereas the second scenario is the typical one. I have tried my best to illustrate the problem statement below. Please let me know if any more information is needed

COL1 COL2 COL3 COL4 COL5 COL6
2 B 12-Dec-18 0 5.5 2
2 B 12-Dec-18 1 3.5 3
2 B 13-Dec-18 0 3.5 5
2 B 14-Dec-18 0 2.5 2
2 B 15-Dec-18 0 6.5 0
2 B 16-Dec-18 0 8 0
2 B 16-Dec-18 1 7.9 0
2 B 17-Dec-18 0 10.5 0
2 B 18-Dec-18 0 1.5 0


From the above table I need to derive a new field. That field will be sum(col5) for the col6 number of rows ahead with some conditions. like the output shown below

COL1 COL2 COL3 COL4 COL5 COL6 Derived_field
2 B 12-DEC-18 0 5.5 2 11 -- (5.5+3.5(of date 13/dec)+2.5)
2 B 12-DEC-18 1 3.5 3 16 -- (3.5+3.5+2.5+6.5)
2 B 13-DEC-18 0 3.5 5 32.5
2 B 14-DEC-18 0 2.5 2 17
2 B 15-DEC-18 0 6.5 0 6.5
2 B 16-DEC-18 0 8 0 8
2 B 16-DEC-18 1 7.9 0 7.9
2 B 17-DEC-18 0 10.5 0 10.5
2 B 18-DEC-18 0 1.5 0 1.5

I have tried with the following query, but not sure how to ignore rows where col4=1 and dynamically add 1 more row if there is any ignored row in between. The critical part is there may be scenarios where col4 will have values like 2,3,4 where all of them needs to ignored and equivalent number of records where col4=0 needs to be added

select test1.*,sum(col5) over (partition by col1,col2 order by col3 desc,col4 desc rows col6 preceding )derived_field from test1
where col1=2 order by col1,col2,col3,col4;


with LiveSQL Test Case:

and Chris said...

OK, so you want the sum of:

- The current row
- COL6 following rows that have COL4 = 0?

If so, you can find how many of the next COL6 rows have the value COL4 = 1. Then add this to the number of rows you need to look ahead.

Calculate the sum for the following row + COL6 + the count above rows. Exclude the rows you want to skip by only returning COL5 if COL4 = 0.

Stick in some null handling and add this to the current COL5 value.

Giving something like:

with skips as (
  select t.*,
         col6 + count ( case when col4 <> 0 then 1 end ) over ( 
           partition by col1, col2
           order by col3, col4 
           rows between 1 following and col6 following
         ) rws
  from   test1 t
)
  select s.*,
         col5 +
           nvl ( sum ( case when col4 = 0 then col5 end ) over (
           partition by col1, col2
           order by col3, col4 
           rows between 1 following and rws following
         ), 0 ) d2
  from   skips s
  order  by col1, col2, col3;

COL1   COL2   COL3                   COL4   COL5   COL6   RWS   D2     
     1 A      12-DEC-2018 00:00:00        0    5.5      2     2   11.5 
     1 A      13-DEC-2018 00:00:00        0    3.5      3     3   20.5 
     1 A      14-DEC-2018 00:00:00        0    2.5      0     0    2.5 
     1 A      15-DEC-2018 00:00:00        0    6.5      0     0    6.5 
     1 A      16-DEC-2018 00:00:00        0      8      0     0      8 
     2 B      12-DEC-2018 00:00:00        0    5.5      2     3   11.5 
     2 B      12-DEC-2018 00:00:00        1    3.5      3     3     16 
     2 B      13-DEC-2018 00:00:00        0    3.5      5     6   32.5 
     2 B      14-DEC-2018 00:00:00        0    2.5      2     2     17 
     2 B      15-DEC-2018 00:00:00        0    6.5      0     0    6.5 
     2 B      16-DEC-2018 00:00:00        0      8      0     0      8 
     2 B      16-DEC-2018 00:00:00        1    7.9      0     0    7.9 
     2 B      17-DEC-2018 00:00:00        0   10.5      0     0   10.5 
     2 B      18-DEC-2018 00:00:00        0    1.5      0     0    1.5 


Note: this only works if there are no COL4 = 1 rows next to each other. Or at least, the number of consecutive COL4 = 1 rows is <= the number of rows you need to skip.

If you can have consecutive rows with COL4 = 1, you can have a situation where you look COL6 rows ahead to a row with COL4 = 1. But this has N more rows immediately after you need to skip. This solution doesn't consider these extra rows.

Rating

  (12 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Thanks Chris!!

Hari, December 13, 2018 - 3:05 am UTC

It is really helpful and I am trying to apply that. However the volume we are accessing is very huge. I will have to see after performance testing. But still the logic works well, thanks a lot for your help
Chris Saxon
December 13, 2018 - 1:30 pm UTC

Great :)

mind twister

Racer I., December 13, 2018 - 11:36 am UTC

Hi,

how about this :

with 
MaxCol4 AS (
  Select MAX(Col4) + 1 MC from test1),  
FullJoin AS (
  SELECT col1, col2, col3, mv.nc col4n, col4, col5, col6 from test1
    join (select rownum - 1 nc from dual connect by rownum <= (select MC from MaxCol4)) mv ON (mv.nc >= col4)),
Sums AS (        
  select
    fj.*, sum(col5) over
      (partition by col1,col2,col4n order by col3 desc, col4 desc rows col6 preceding) de --between 2 following and
  from FullJoin fj)
SELECT *
FROM Sums  
where col4n = col4
order by col3, col4n, col4


The FullJoin replicates each col4=X to col4=X+1-MaxCol4. If you know maxCol4 you can skip the first full scan.
Would be easier if (as apparently inteded) col4=X didn't sum over col4=0-X. If it was just X :

select
  test1.*, sum(col5) over (partition by col1,col2,col4 order by col3 desc rows col6 preceding )de
from test1
order by col3, col4


Also : On 12c you get match_recognize...

regards
Chris Saxon
December 13, 2018 - 1:36 pm UTC

Not sure match_recognize helps here.

And if I've understood correctly, for each row you need to show the sum of the following COL4=0 values. So partitioning by COL4 won't help.

mind reader

Racer I., December 13, 2018 - 3:03 pm UTC

Hi,

The example didn't include cases with col4 > 1 (mentioned at the end) or a col4 > 0 within col6 of one so wasn't sure if the sum is over 0,X or 0-X or just X+0s.
The upper version would be for 0-X (with col4n). The lower version just for X (col4). But 0,X/X+0s (the likeliest intentions I now realize) doesn't work with this.
The upper version might have been faster with a big table, since it would only be scanned once. On the other hand a high maxCol4 could have slowed it down again.

as for match_recognize I was thinking along this line :
select mr.col1, mr.col2, mr.col3, mr.col4, mr.col5, mr.col6, mr.df, mr.start_d, mr.mno, mr.cf
from test1 MATCH_RECOGNIZE (
         PARTITION BY col1,col2
         ORDER BY col3, col4
         MEASURES  STR.col5 + SUM(MV.col5) df,
                   STR.COL3 AS start_d,
                   MATCH_NUMBER() AS mno,
                   CLASSIFIER() AS cf
         ONE ROW PER MATCH
         AFTER MATCH SKIP TO NEXT ROW
         PATTERN (STRT MV{0,STRT.col6})
         DEFINE
           MV AS MV.col4 BETWEEN 0 AND STR.col4 -- 0-X
           MV AS MV.col4 IN (0, STR.col4) -- 0,X
           MV AS MV.col4 = 0 -- X+0s
           MV AS MV.col4 = STR.col4 -- X
       ) MR
order by col1, col2, col3, col4


But I get ora-62501 so you can't reference your values in the pattern.

regards,
Chris Saxon
December 13, 2018 - 3:33 pm UTC

Not sure what you're getting at. In the example, the row for COL4 = 1 on 12 Dec sums the following rows with COL4 = 0. So you can't partition by this column.

Yeah, you can't have dynamic bounds for the regex quantifiers in MR.

This might be one for the *gasp* the model clause.

new model army

Racer I., December 14, 2018 - 7:36 am UTC

Hi,

It works with match_recognize :

select col1, col2 /*, col3, col4, col5, col6 */, mr.start_d, mr.df --, mr.mno, mr.cf
from test1 MATCH_RECOGNIZE (
         PARTITION BY col1,col2
         ORDER BY col3, col4
         MEASURES  STRT.col5 + NVL(SUM(MV.col5), 0) df,
                   STRT.COL3 AS start_d,
                   MATCH_NUMBER() AS mno,
                   CLASSIFIER() AS cf
         --ALL ROWS PER MATCH
         ONE ROW PER MATCH
         AFTER MATCH SKIP TO NEXT ROW
         PATTERN (STRT (NMV|MV)*)
         DEFINE
           NMV AS NMV.col4 != 0,
--           MV AS MV.col4 BETWEEN 0 AND STR.col4 -- 0-X
--           MV AS MV.col4 IN (0, STR.col4) -- 0,X
           MV AS MV.col4 = 0 AND COUNT(MV.*) <= STRT.Col6 -- X+0s
--           MV AS MV.col4 = STR.col4 -- X
       ) MR
order by col1, col2, start_d --mno, col3, col4

COL1 COL2 START_D DF
2 B 14/12/2018 11,5
2 B 14/12/2018 16
2 B 15/12/2018 32,5
2 B 16/12/2018 17
2 B 17/12/2018 6,5
2 B 18/12/2018 8
2 B 18/12/2018 7,9
2 B 19/12/2018 10,5
2 B 20/12/2018 1,5


Unless the OP gets back saying that X+0 is not the intended pattern the other variants can be ignored.

regards,
Connor McDonald
December 17, 2018 - 2:03 am UTC

nice stuff

far horizons

Racer I., December 14, 2018 - 8:44 am UTC

Hi,

Unfortunately a fly in the ointment :
Currently the pattern will match up to col6 MVs AND all subsequent NMVs to the end of the table. Unless MR optimizes this away seeing that NMV is not referenced in MEASURES this would probably error out with memory overflow.
Stop gap measure would be to limit the pattern size to some reasonable look ahead :

(NMV|MV){0,100}

regards,

catch all

Racer I., December 14, 2018 - 9:41 am UTC

Hi,

This might work :

PATTERN ((STRT (NMV|MV)* MV) | STRT)

regards,

Optimizing MATCH_RECOGNIZE

Stew Ashton, December 14, 2018 - 3:52 pm UTC

I think it takes only a small change to make the match stop once all the "good" rows are found.
select * from test1
match_recognize(
  partition by col1, col2 order by col3
  measures a.col3 col3, a.col4 col4, a.col5 col5, a.col6 col6,
    sum(ab.col5) sum_col5
  after match skip to next row
  pattern (a (b|x)*)
  subset ab=(a,b)
  define b as count(b.*) <= a.col6 and col4 = 0,
    x as count(b.*) < a.col6
);

      COL1 COL2  COL3              COL4       COL5       COL6   SUM_COL5
---------- ----- ----------- ---------- ---------- ---------- ----------
         1 A     14-dec-2018          0        5.5          2       11.5
         1 A     15-dec-2018          0        3.5          3       20.5
         1 A     16-dec-2018          0        2.5          0        2.5
         1 A     17-dec-2018          0        6.5          0        6.5
         1 A     18-dec-2018          0          8          0          8
         2 B     14-dec-2018          0        5.5          2       11.5
         2 B     14-dec-2018          1        3.5          3         16
         2 B     15-dec-2018          0        3.5          5       32.5
         2 B     16-dec-2018          0        2.5          2         17
         2 B     17-dec-2018          0        6.5          0        6.5
         2 B     18-dec-2018          0          8          0          8
         2 B     18-dec-2018          1        7.9          0        7.9
         2 B     19-dec-2018          0       10.5          0       10.5
         2 B     20-dec-2018          0        1.5          0        1.5
Good to see more people trying out MATCH_RECOGNIZE!

Best regards,
Stew
Chris Saxon
December 14, 2018 - 4:59 pm UTC

Nice work Stew.

using lateral

A reader, December 20, 2018 - 4:13 pm UTC

select o.*,  case when sum_col5 is null then o.col5 else o.col5 + sum_col5 end  sum_col5 
from test1 o,
lateral
(
    select sum(case when col4=0 then col5 END) as sum_col5
    from
    (
        select row_number()over(order by col3) as rn, col5, col4
        from test1 
        where col1=o.col1 and col2=o.col2 and col3>o.col3 and col4=0
    )x
    where rn<=o.col6
)




COL1 COL2 COL3 COL4 COL5 COL6 SUM_COL5
1 A 20-DEC-18 0 5.5 2 11.5
1 A 21-DEC-18 0 3.5 3 20.5
1 A 22-DEC-18 0 2.5 0 2.5
1 A 23-DEC-18 0 6.5 0 6.5
1 A 24-DEC-18 0 8 0 8
2 B 20-DEC-18 0 5.5 2 11.5
2 B 20-DEC-18 1 3.5 3 16
2 B 21-DEC-18 0 3.5 5 32.5
2 B 22-DEC-18 0 2.5 2 17
2 B 23-DEC-18 0 6.5 0 6.5
2 B 24-DEC-18 0 8 0 8
2 B 24-DEC-18 1 7.9 0 7.9
2 B 25-DEC-18 0 10.5 0 10.5
2 B 26-DEC-18 0 1.5 0 1.5

Chris Saxon
December 20, 2018 - 5:17 pm UTC

Neat, hadn't thought of that

using Model clause

Rajeshwaran, Jeyabal, January 07, 2019 - 6:38 am UTC

Sorry for being late in this game, here is an approach using Model clause.

demo@ORA12C> select *
  2  from test1
  3  model
  4    partition by (col1)
  5    dimension by ( row_number() over( partition by col1 order by col3,col4 ) r )
  6    measures( col2,col3,col4,col5,col6, 0 col7, 0 new_val )
  7    rules(
  8      col7[any] order by r = col6[cv(r)] +
  9                    count( case when col4 <>0 then 1 end ) [ r between cv(r)+1 and cv(r)+col6[cv(r)] ] ,
 10      new_val[any] order by r = col5[cv(r)] +
 11                    nvl( sum( case when col4 = 0 then col5 end ) [ r between cv(r)+1 and cv(r)+col7[cv(r)] ] ,0 )
 12        )
 13  /

      COL1          R COL2  COL3              COL4       COL5       COL6       COL7    NEW_VAL
---------- ---------- ----- ----------- ---------- ---------- ---------- ---------- ----------
         1          1 A     13-DEC-2018          0        5.5          2          2       11.5
         1          2 A     14-DEC-2018          0        3.5          3          3       20.5
         1          3 A     15-DEC-2018          0        2.5          0          0        2.5
         1          4 A     16-DEC-2018          0        6.5          0          0        6.5
         1          5 A     17-DEC-2018          0          8          0          0          8
         2          1 B     13-DEC-2018          0        5.5          2          3       11.5
         2          2 B     13-DEC-2018          1        3.5          3          3         16
         2          3 B     14-DEC-2018          0        3.5          5          6       32.5
         2          4 B     15-DEC-2018          0        2.5          2          2         17
         2          5 B     16-DEC-2018          0        6.5          0          0        6.5
         2          6 B     17-DEC-2018          0          8          0          0          8
         2          7 B     17-DEC-2018          1        7.9          0          0        7.9
         2          8 B     18-DEC-2018          0       10.5          0          0       10.5
         2          9 B     19-DEC-2018          0        1.5          0          0        1.5

14 rows selected.

demo@ORA12C>

Connor McDonald
January 07, 2019 - 7:07 am UTC

As they say "Better late than never" :-)

minor general

Racer I., January 07, 2019 - 10:10 am UTC

Hi,

The last model version is possibly not fully ok.
create table test1 (col1 number,col2 varchar2(5),col3 date,col4 number,col5 number,col6 number);

insert into test1 values(1,'A',trunc(sysdate),0,3,2);
insert into test1 values(1,'A',trunc(sysdate)+1,1,4,0);
insert into test1 values(1,'A',trunc(sysdate)+2,0,5,0);
insert into test1 values(1,'A',trunc(sysdate)+3,1,6,0);
insert into test1 values(1,'A',trunc(sysdate)+4,0,7,0);


This should give 15 for the first row (3+5+7) but the model as is gives only 8.
I guess its because col7 counts non-0s only up to col6. This is not recursive to skip over additional ones beyond that.
Can model be made recursive?

regards,
Chris Saxon
January 09, 2019 - 2:21 pm UTC

You can use model to generate rows. So it may be possible for each row to generate an extra COL6 rows after it. Then iterate through these, calculating the running total.

I can't figure out how to do this though ;)

Generate rows based on column value

Narendra, June 15, 2023 - 12:10 am UTC

Hello,

I have to declare beforehand that I am asking this question purely for the sake of learning MATCH_RECOGNIZE. I am still a noice so apologies in advance if I have missed something obvious.

If I have to generate extra rows from a result set based on value in a column then LATERAL does the job as below

with src as 
(
select 'A' as col1, 3 as col2 from dual union all
select 'B' as col1, 5 as col2 from dual union all
select 'C' as col1, 7 as col2 from dual
)
select src.*
from src, lateral(select level from dual connect by level <= src.col2) ;


How can I achieve the same result using MATCH_RECOGNIZE? I could only come up with below which I thought makes sense but it gives ORA-62517: Next match starts at the same point last match started. I guess I know why it fails but is there a way to achieve this using MATCH_RECOGNIZE?

with src as 
(
select 'A' as col1, 3 as col2 from dual union all
select 'B' as col1, 5 as col2 from dual union all
select 'C' as col1, 7 as col2 from dual
)
select MR.*
from src
MATCH_RECOGNIZE(
ORDER BY col1
MEASURES col2 as col21,
         COUNT(STRT.*) as cnt,
         MATCH_NUMBER() AS match_num,
         CLASSIFIER() AS var_match
     ALL ROWS PER MATCH
     AFTER MATCH SKIP TO LAST STRT
     PATTERN (STRT+)
     DEFINE STRT as STRT.col2 >= COUNT(STRT.col2) and STRT.col1 = NVL(PREV(STRT.col1), STRT.col1)
) MR;

Chris Saxon
June 15, 2023 - 12:45 pm UTC

You can't really. Pattern matching walks through the data set according to the ORDER BY. While doing so it searches for sequences of rows that match the PATTERN and returns these.

Once you're on the last row, there's no more data to process. Thus MR can't generate any further rows!

After finding a group, MR can go back to the point specified by the AFTER MATCH clause; this has to be after the first row of the previous group though.

Something like the below will return the next N (col2 value) rows for each. There are only three rows in the table, so it can return a maximum of three rows for each input.

If you want more than this you need lateral joins/recursive with/model

with src as 
(
  select 'A' as col1, 3 as col2 from dual union all
  select 'B' as col1, 5 as col2 from dual union all
  select 'C' as col1, 7 as col2 from dual
)
select MR.*
from src
match_recognize(
  order by col1
  measures 
    init.col1 as init_col1,
    init.col2 as init_col2,
    count(*) row#,
    match_number() as match_num,
    classifier() as var_match
  all rows per match
  after match skip to next row
  pattern ( init tot* )
  define 
    tot as col2 > count(*)
) mr;

C I  INIT_COL2       ROW#  MATCH_NUM VAR_MATCH       COL2
- - ---------- ---------- ---------- --------- ----------
A A          3          1          1 INIT               3
B A          3          2          1 TOT                5
C A          3          3          1 TOT                7
B B          5          1          2 INIT               5
C B          5          2          2 TOT                7
C C          7          1          3 INIT               7

Roe generation using MATCH_RECOGNIZE

Narendra, June 15, 2023 - 10:11 pm UTC

Thanks @Chris...appreciate your quick response.

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.