Skip to Main Content
  • Questions
  • SQL request with analytics to fill previous values

Breadcrumb

May 4th

Question and Answer

Chris Saxon

Thanks for the question, Christophe.

Asked: February 01, 2018 - 2:03 pm UTC

Last updated: February 05, 2018 - 10:59 am UTC

Version: 12.1.0

Viewed 1000+ times

You Asked

Hello,

I have a table with this structure


table values:
(code varchar2(10),
date_value date,
value number);


Example data :

'Code1','15/03/2017',25000
'Code1','06/06/2017',26000
'Code1','18/07/2017',29000
'Code1','21/07/2017',3000
'Code2','18/07/2017',89000
'Code1','05/10/2017',35000


I have also another table "calendar" containing the dates (from 2000 to 2999 for instances, 1 row/day)

table calendar
(cal_date date);


In order to create a graph, I'd like to have a sum by month, and all months with null values to bi filled with previous non-null month value. So for instance the result would be :

01/2017 'Code1' (null)
02/2017 'Code1' (null)
03/2017 'Code1' 25000
04/2017 'Code1' 25000
05/2017 'Code1' 25000
06/2017 'Code1' 26000
07/2017 'Code1' 31000
08/2017 'Code1' 31000
09/2017 'Code1' 31000
10/2017 'Code1' 35000
11/2017 'Code1' 35000
12/2017 'Code1' 35000

01/2017 'Code2' (null)
02/2017 'Code2' (null)
03/2017 'Code2' (null)
04/2017 'Code2' (null)
05/2017 'Code2' (null)
06/2017 'Code2' (null)
07/2017 'Code2' 89000
08/2017 'Code2' 89000
09/2017 'Code2' 89000
10/2017 'Code2' 89000
11/2017 'Code2' 89000
12/2017 'Code2' 89000


Using analytics I think that using lead/lag but which "previous line" using, since in some cases the previous value is null but the "2nd" previous isn't.

Any idea ?

and Chris said...

There's a couple of things you need to do here:

1. Generate a row for each month and code
2. Fill down the "missing" values

For step 1, you can use a partitioned outer join. This looks like:

left join t partition by (t.column)


For every row in the inner table, this generates a new set of values for each value in the partition by column. In your example, every row in calendar appears once for each code value.

To fill down the missing values, use last_value with the ignore nulls clause. This finds the last non-null value according to the partition and order by clauses.

Altogether this looks something like:

create table t (
  code varchar2(10),
  date_value date,
  val number
);

alter session set nls_date_format = 'dd/mm/yyyy';

insert into t values ('Code1','15/03/2017',25000);
insert into t values ('Code1','06/06/2017',26000);
insert into t values ('Code1','18/07/2017',29000);
insert into t values ('Code1','21/07/2017',3000);
insert into t values ('Code2','18/07/2017',89000);
insert into t values ('Code1','05/10/2017',35000);

with dates as (
  select add_months(date'2017-01-01', level-1) dt 
  from   dual
  connect by level <= 12
), monthly_tots as (
  select d.dt, t.code,
         sum(t.val) sm
  from   dates d
  left join t partition by (t.code)
  on     d.dt <= t.date_value 
  and    t.date_value < add_months(d.dt, 1)
  group  by d.dt, t.code
)
  select dt, code,  
         last_value(sm) ignore nulls over (
           partition by code order by dt
         ) val
  from   monthly_tots;

DT           CODE    VAL      
01/01/2017   Code1     <null> 
01/02/2017   Code1     <null> 
01/03/2017   Code1      25000 
01/04/2017   Code1      25000 
01/05/2017   Code1      25000 
01/06/2017   Code1      26000 
01/07/2017   Code1      32000 
01/08/2017   Code1      32000 
01/09/2017   Code1      32000 
01/10/2017   Code1      35000 
01/11/2017   Code1      35000 
01/12/2017   Code1      35000 
01/01/2017   Code2     <null> 
01/02/2017   Code2     <null> 
01/03/2017   Code2     <null> 
01/04/2017   Code2     <null> 
01/05/2017   Code2     <null> 
01/06/2017   Code2     <null> 
01/07/2017   Code2      89000 
01/08/2017   Code2      89000 
01/09/2017   Code2      89000 
01/10/2017   Code2      89000 
01/11/2017   Code2      89000 
01/12/2017   Code2      89000

Rating

  (4 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Christophe Marchadier, February 01, 2018 - 3:44 pm UTC

Chris Saxon
February 01, 2018 - 3:58 pm UTC

Thanks, but it's much more valuable to use before we answer the question! ;)

partitioned outer join

Duke Ganote, February 01, 2018 - 6:15 pm UTC

I don't recall hearing about "partitioned outer joins", so I couldn't resist trying it.

Because the sample data only has one row with 'Code2', the results may be more disjointed than hoped for. Specifically, the results of the bottom query show Code2 only has values in July (because the sole record is in July).

CREATE TABLE test_calendar1 as
SELECT DATE'2015-01-01'+LEVEL-1 AS cal_date
  FROM dual
  CONNECT BY LEVEL < ( DATE'2019-12-31' - DATE'2015-01-01')+2;
  
CREATE TABLE test_values as
WITH sample_data ( code, date_value, test_value ) AS (
select 'Code1',to_date('15032017','ddmmyyyy'),25000 from dual union all
select 'Code1',to_date('06062017','ddmmyyyy'),26000 from dual union all
select 'Code1',to_date('18072017','ddmmyyyy'),29000 from dual union all
select 'Code1',to_date('21072017','ddmmyyyy'),3000 from dual union all
select 'Code1',to_date('05102017','ddmmyyyy'),35000 from dual union all
select 'Code2',to_date('18072017','ddmmyyyy'),89000 from dual
) SELECT * FROM sample_data;

SELECT code, count(*), min(cal_date), max(cal_date)
FROM (
select code, cal_date, test_value
  from test_values v
  join test_calendar1 c
       partition by (code)
    ON trunc(date_value,'MM') = trunc(cal_date,'MM')
) WHERE cal_date between date'2017-06-06' and date'2017-07-18'
 group by code
 order by 1,2

CODE    COUNT(*)   MIN(CAL_DATE)   MAX(CAL_DATE)
Code1       61     06-JUN-17       18-JUL-17
Code2       18     01-JUL-17       18-JUL-17

2 rows selected.



Chris Saxon
February 02, 2018 - 11:17 am UTC

Make it an outer join to avoid this issue:

SELECT code, count(*), min(cal_date), max(cal_date)
FROM (
select code, cal_date, test_value
  from test_calendar1 c
  left join test_values v partition by (code)
    ON trunc(date_value,'MM') = trunc(cal_date,'MM')
) WHERE cal_date between date'2017-06-06' and date'2017-07-18'
 group by code
 order by 1,2;

CODE    COUNT(*)   MIN(CAL_DATE)          MAX(CAL_DATE)          
Code1           61 06-JUN-2017 00:00:00   18-JUL-2017 00:00:00   
Code2           43 06-JUN-2017 00:00:00   18-JUL-2017 00:00:00   

Model Dance !!!

Rajeshwaran, Jeyabal, February 02, 2018 - 4:38 am UTC

demo@ORA12C> select *
  2  from (
  3  select code, trunc(date_value,'mm') x ,sum(val) val
  4  from t
  5  group by code, trunc(date_value,'mm')
  6       )
  7  model
  8    partition by (code)
  9    dimension by (x)
 10    measures( val , val new_val)
 11    rules(
 12      new_val[ for x from to_date('01-Jan-2017','dd-mon-yyyy')  to
 13                    to_date('01-dec-2017','dd-mon-yyyy')
 14                    increment numtoyminterval(1,'month') ] =
 15                      nvl( val[cv()] , new_val[ add_months(cv(x),-1) ] )
 16        )
 17  order by code, x
 18  /

CODE       X                  VAL    NEW_VAL
---------- ----------- ---------- ----------
Code1      01-JAN-2017
Code1      01-FEB-2017
Code1      01-MAR-2017      25000      25000
Code1      01-APR-2017                 25000
Code1      01-MAY-2017                 25000
Code1      01-JUN-2017      26000      26000
Code1      01-JUL-2017      32000      32000
Code1      01-AUG-2017                 32000
Code1      01-SEP-2017                 32000
Code1      01-OCT-2017      35000      35000
Code1      01-NOV-2017                 35000
Code1      01-DEC-2017                 35000
Code2      01-JAN-2017
Code2      01-FEB-2017
Code2      01-MAR-2017
Code2      01-APR-2017
Code2      01-MAY-2017
Code2      01-JUN-2017
Code2      01-JUL-2017      89000      89000
Code2      01-AUG-2017                 89000
Code2      01-SEP-2017                 89000
Code2      01-OCT-2017                 89000
Code2      01-NOV-2017                 89000
Code2      01-DEC-2017                 89000

24 rows selected.

demo@ORA12C>

partitioned outer join, part ii

Duke Ganote, February 02, 2018 - 5:39 pm UTC

OK, I think I get the 'partitioning outer join' now; I don't have to cross join to the calendar for each CODE value. Nice.

Also, I sometimes want to fill in all the values, even if the first row has NULL, so I'll 'push up' the first value found, and 'pull down' the rest. For example, the 'Code2' only has one row with a value, but I'll push it up too, like so:

WITH sample_data ( code, date_value, test_value ) AS (
SELECT 'Code1', DATE'2017-03-15', 25000 FROM DUAL UNION ALL
SELECT 'Code1', DATE'2017-06-06', 26000 FROM DUAL UNION ALL
SELECT 'Code1', DATE'2017-07-18', 29000 FROM DUAL UNION ALL
SELECT 'Code1', DATE'2017-07-21',  3000 FROM DUAL UNION ALL
SELECT 'Code1', DATE'2017-10-05', 35000 FROM DUAL UNION ALL
SELECT 'Code2', DATE'2017-07-18', 89000 FROM DUAL
),
startstop AS (
SELECT min(date_value) as starter
     , max(date_value) as finisher
  FROM sample_data
),
dailydating AS (
SELECT starter + (level-1) as dt
  FROM startstop
 CONNECT BY starter + (level-1) <= finisher
), results AS (
SELECT code, coalesce(date_value,dt) as dt
     , test_value
     , LAST_VALUE(test_value IGNORE NULLS) OVER
        (PARTITION BY code ORDER BY dt ) pulldown
     , COALESCE(NULL
       , LAST_VALUE(test_value IGNORE NULLS) OVER
          (PARTITION BY code ORDER BY dt ASC) -- down first
       , LAST_VALUE(test_value IGNORE NULLS) OVER
          (PARTITION BY code ORDER BY dt DESC) -- else up
       ) AS pullmepushyou
     , count(*)over(partition by code) cnt
  FROM dailydating
  LEFT OUTER
  JOIN sample_data
       PARTITION BY (code)
    ON dt = date_value
) 
SELECT code, dt, test_value, pulldown, pullmepushyou, cnt
  FROM results 
 WHERE dt BETWEEN date'2017-07-14' AND date'2017-07-20'
 ORDER BY code, dt;


CODE  DT        TEST_VALUE   PULLDOWN PULLMEPUSHYOU        CNT
----- --------- ---------- ---------- ------------- ----------
Code1 14-JUL-17                 26000         26000        205
Code1 15-JUL-17                 26000         26000        205
Code1 16-JUL-17                 26000         26000        205
Code1 17-JUL-17                 26000         26000        205
Code1 18-JUL-17      29000      29000         29000        205
Code1 19-JUL-17                 29000         29000        205
Code1 20-JUL-17                 29000         29000        205

Code2 14-JUL-17                               89000        205
Code2 15-JUL-17                               89000        205
Code2 16-JUL-17                               89000        205
Code2 17-JUL-17                               89000        205
Code2 18-JUL-17      89000      89000         89000        205
Code2 19-JUL-17                 89000         89000        205
Code2 20-JUL-17                 89000         89000        205

14 rows selected.

Chris Saxon
February 05, 2018 - 10:59 am UTC

Yep, you've got it.

Nice work on the pull me/push you extension.

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.