Perfect as always!
Richard Smith, March 30, 2003 - 3:54 pm UTC
Exactly what I needed.
Fill missing data with values of previous row
Edgar Chupit, August 20, 2003 - 12:08 pm UTC
Hello Tom!
Once again great article, but can we extend a little bit. What if I need to fill "missing data" with values from previous (first not missing) row?
Suppose I have:
SQL> create table t2 (val number, indx number);
Table created
SQL> insert into t2 values ( 1, 1 );
1 row inserted
SQL> insert into t2 values ( 13, 10 );
1 row inserted
SQL> insert into t2 values ( 12, 14 );
1 row inserted
SQL> insert into t2 values ( 11, 20 );
1 row inserted
Now we have a select with "missing data":
SQL> with all_values as (
2 select rownum as num
3 from all_objects
4 where rownum < 30 )
5 select t2.*, av.num
6 from t2, all_values av
7 where ( t2.indx(+) = av.num )
8 order by av.num
9 /
VAL INDX NUM
--- ---- ---
1 1 1
2
3
4
5
6
7
8
9
13 10 10
11
12
13
12 14 14
15
16
17
18
19
11 20 20
21
22
23
24
25
26
27
28
29
29 rows selected
and we want to fill "missing data" with values of last row that had data, in pl/sql this can be done easily like this:
SQL> declare
2 cursor c is
3 with all_values as (
4 select rownum as num
5 from all_objects
6 where rownum < 30 )
7 select val, indx, av.num
8 from t2, all_values av
9 where ( t2.indx(+) = av.num )
10 order by av.num;
11 val number;
12 indx number;
13 begin
14 dbms_output.put_line( 'VAL INDX NUM ' );
15 dbms_output.put_line( '----- ------ -----' );
16 for r in c loop
17 if( r.val is not null ) then
18 val := r.val;
19 indx := r.indx;
20 end if;
21 dbms_output.put_line( to_char(nvl(r.val,val),'00')||' '||to_char(nvl(r.indx,indx),'00')||' '||to_char(r.num,'00') );
22 end loop;
23 end;
24 /
VAL INDX NUM
----- ------ -----
01 01 01
01 01 02
01 01 03
01 01 04
01 01 05
01 01 06
01 01 07
01 01 08
01 01 09
13 10 10
13 10 11
13 10 12
13 10 13
12 14 14
12 14 15
12 14 16
12 14 17
12 14 18
12 14 19
11 20 20
11 20 21
11 20 22
11 20 23
11 20 24
11 20 25
11 20 26
11 20 27
11 20 28
11 20 29
PL/SQL procedure successfully completed
Can this be done in pure SQL (so we can add further data processing) or this kinds of data manipulations is possible only on pl/sql?
Thank you!
August 21, 2003 - 3:36 pm UTC
qed :)
ops$tkyte@ORA920> with all_values as (
2 select rownum as num
3 from all_objects
4 where rownum < 30 )
5 select t2.*, av.num,
6 substr( max( decode( t2.val, null, null,
to_char(av.num,'fm00000000') || t2.val ))
7 over (order by av.num), 9 ) t2_val,
8 substr( max( decode( t2.val, null, null,
to_char(av.num,'fm00000000') || t2.indx))
9 over (order by av.num), 9 ) t2_indx
10 from t2, all_values av
11 where ( t2.indx(+) = av.num )
12 order by av.num
13 /
VAL INDX NUM T2_VAL T2_INDX
---------- ---------- ---------- ---------- ----------
1 1 1 1 1
2 1 1
3 1 1
4 1 1
5 1 1
6 1 1
7 1 1
8 1 1
9 1 1
13 10 10 13 10
11 13 10
12 13 10
13 13 10
12 14 14 12 14
15 12 14
16 12 14
17 12 14
18 12 14
19 12 14
11 20 20 11 20
21 11 20
22 11 20
23 11 20
24 11 20
25 11 20
26 11 20
27 11 20
28 11 20
29 11 20
29 rows selected.
similar approach
Joe, October 02, 2003 - 11:01 am UTC
Tom,
This is a similar requirement, but easier.
I need to ensure that all 24 hours are included in a report, even when there is no data for them. I need to know the number of occurences in an hour for a paritcular value.
create table master (pk number);
create table detail (fk number, value number);
insert into master values (1);
insert into master values (2);
insert into master values (3);
insert into master values (4);
insert into detail value (1, 10);
insert into detail value (1, 10);
insert into detail value (1, 10);
insert into detail value (1, 10);
insert into detail value (2, 10);
from SQL I can do this like:
SELECT master.pk, detail.event_count
FROM master,
(SELECT detail.fk,count(*) event_count from detail
WHERE value=:param
GROUP by key
) detail
where master.key=detail.key(+);
Can this be done without the subquery? Oracle Discoverer has some limitations that make this impossible unless I can re-write it without a subquery. (ie, I think I have a legitimate reason for asking for something that would otherwise not make sense). I read enough of your stuff to understand the frustration with "answer my question, but don't give me this answer" that you often get.
October 02, 2003 - 3:52 pm UTC
ops$tkyte@ORA920> variable param number
ops$tkyte@ORA920> exec :param := 10
PL/SQL procedure successfully completed.
ops$tkyte@ORA920>
ops$tkyte@ORA920> SELECT master.pk, detail.event_count
2 FROM master,
3 (SELECT detail.fk,count(*) event_count from detail
4 WHERE value=:param
5 GROUP by fk
6 ) detail
7 where master.pk=detail.fk(+);
PK EVENT_COUNT
---------- -----------
1 4
2 1
3
4
ops$tkyte@ORA920>
ops$tkyte@ORA920> select master.pk, count(detail.fk)
2 from master, detail
3 where :param = detail.value(+)
4 and master.pk = detail.fk(+)
5 group by master.pk;
PK COUNT(DETAIL.FK)
---------- ----------------
1 4
2 1
3 0
4 0
ops$tkyte@ORA920>
PARTITION OUTER JOIN
Adrian Billington, October 28, 2004 - 4:31 am UTC
Tom
10G has the "partition outer join" that simplifies your first example by removing the need for a second join to customers. Following is a small example from my htmldb.oracle.com account ( i.e. I can't post the sqlplus output as I don't have a 10G )...
create table customers
( name varchar2(30)
, dt date
, amt number(5) );
The following inserts data for SYS, ORDSYS and PUBLIC between April and September 2004 ( April 2004 must be the month when the HTMLDB database my account sits on was created ).
insert into customers
select owner
, trunc(created) + mod(rownum,6)
, trunc( object_id/rownum )
from all_objects
where created > trunc(sysdate,'year')
and rownum <= 1000;
The plan from running a version of your SQL ( changed to use a ROW_GENERATOR function as can't explain ALL_OBJECTS ):-
select real_name
, real_dt
, nvl(amt,0)
, nvl(sum(amt) over
(partition by real_name order by real_dt ),0) cum_amt
from (
select name
, trunc(dt,'mm') dt
, sum(amt) amt
from customers
group by
name
, trunc(dt,'mm')
) A
, (
select real_name
, real_dt
from (
select distinct name as real_name
from customers
)
, (
select /*+ cardinality( t 12 ) */
add_months( to_date('01-jan-2004'), rownum-1 ) as real_dt
from table(row_generator(12)) t
)
) B
where b.real_dt = a.dt (+)
and b.real_name = a.name (+);
REAL_NAME REAL_DT AMT CUM_AMT
ORDSYS 01-Jan-04 0 0
ORDSYS 01-Feb-04 0 0
ORDSYS 01-Mar-04 0 0
ORDSYS 01-Apr-04 1173 1173
ORDSYS 01-May-04 1134 2307
ORDSYS 01-Jun-04 1170 3477
ORDSYS 01-Jul-04 1170 4647
ORDSYS 01-Aug-04 1138 5785
ORDSYS 01-Sep-04 1147 6932
ORDSYS 01-Oct-04 0 6932
ORDSYS 01-Nov-04 0 6932
ORDSYS 01-Dec-04 0 6932
PUBLIC 01-Jan-04 0 0
PUBLIC 01-Feb-04 0 0
PUBLIC 01-Mar-04 0 0
PUBLIC 01-Apr-04 3914 3914
PUBLIC 01-May-04 4016 7930
PUBLIC 01-Jun-04 4556 12486
PUBLIC 01-Jul-04 3319 15805
PUBLIC 01-Aug-04 4213 20018
PUBLIC 01-Sep-04 3467 23485
PUBLIC 01-Oct-04 0 23485
PUBLIC 01-Nov-04 0 23485
PUBLIC 01-Dec-04 0 23485
SYS 01-Jan-04 0 0
SYS 01-Feb-04 0 0
SYS 01-Mar-04 0 0
SYS 01-Apr-04 9060 9060
SYS 01-May-04 11944 21004
SYS 01-Jun-04 9564 30568
SYS 01-Jul-04 10514 41082
SYS 01-Aug-04 9199 50281
SYS 01-Sep-04 9841 60122
SYS 01-Oct-04 0 60122
SYS 01-Nov-04 0 60122
SYS 01-Dec-04 0 60122
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 12000 | 691K| 255 |
| 1 | WINDOW SORT | | 12000 | 691K| 255 |
| 2 | HASH JOIN RIGHT OUTER | | 12000 | 691K| 79 |
| 3 | VIEW | | 1000 | 36000 | 4 |
| 4 | SORT GROUP BY | | 1000 | 39000 | 4 |
| 5 | TABLE ACCESS FULL | CUSTOMERS | 1000 | 39000 | 3 |
| 6 | VIEW | | 12000 | 269K| 74 |
| 7 | MERGE JOIN CARTESIAN | | 12000 | 269K| 74 |
| 8 | VIEW | | 12 | 72 | 24 |
| 9 | COUNT | | | | |
| 10 | COLLECTION ITERATOR PICKLER FETCH| ROW_GENERATOR | | | |
| 11 | BUFFER SORT | | 1000 | 17000 | 74 |
| 12 | VIEW | | 1000 | 17000 | 4 |
| 13 | SORT UNIQUE | | 1000 | 17000 | 4 |
| 14 | TABLE ACCESS FULL | CUSTOMERS | 1000 | 17000 | 3 |
-----------------------------------------------------------------------------------------
Following uses PARTITION OUTER JOIN syntax...
with dates as (
select add_months( date '2004-01-01', rownum-1 ) as dt
from table(row_generator(12))
)
, custs as (
select name
, trunc(dt,'mm') as dt
, sum(amt) as amt
from customers
group by
name
, trunc(dt,'mm')
)
select c.name as real_name
, d.dt as real_dt
, nvl(c.amt,0) as amt
, sum(nvl(c.amt,0)) over
( partition by c.name order by d.dt ) as cum_amt
from custs c
partition by ( c.name )
right outer join dates d
on ( c.dt = d.dt );
REAL_NAME REAL_DT AMT CUM_AMT
ORDSYS 01-Jan-04 0 0
ORDSYS 01-Feb-04 0 0
ORDSYS 01-Mar-04 0 0
ORDSYS 01-Apr-04 1173 1173
ORDSYS 01-May-04 1134 2307
ORDSYS 01-Jun-04 1170 3477
ORDSYS 01-Jul-04 1170 4647
ORDSYS 01-Aug-04 1138 5785
ORDSYS 01-Sep-04 1147 6932
ORDSYS 01-Oct-04 0 6932
ORDSYS 01-Nov-04 0 6932
ORDSYS 01-Dec-04 0 6932
PUBLIC 01-Jan-04 0 0
PUBLIC 01-Feb-04 0 0
PUBLIC 01-Mar-04 0 0
PUBLIC 01-Apr-04 3914 3914
PUBLIC 01-May-04 4016 7930
PUBLIC 01-Jun-04 4556 12486
PUBLIC 01-Jul-04 3319 15805
PUBLIC 01-Aug-04 4213 20018
PUBLIC 01-Sep-04 3467 23485
PUBLIC 01-Oct-04 0 23485
PUBLIC 01-Nov-04 0 23485
PUBLIC 01-Dec-04 0 23485
SYS 01-Jan-04 0 0
SYS 01-Feb-04 0 0
SYS 01-Mar-04 0 0
SYS 01-Apr-04 9060 9060
SYS 01-May-04 11944 21004
SYS 01-Jun-04 9564 30568
SYS 01-Jul-04 10514 41082
SYS 01-Aug-04 9199 50281
SYS 01-Sep-04 9841 60122
SYS 01-Oct-04 0 60122
SYS 01-Nov-04 0 60122
SYS 01-Dec-04 0 60122
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 120 | 8640 | 34 |
| 1 | WINDOW SORT | | 120 | 8640 | 34 |
| 2 | VIEW | | 120 | 8640 | 33 |
| 3 | MERGE JOIN PARTITION OUTER | | 120 | 5040 | 33 |
| 4 | SORT JOIN | | 12 | 72 | 25 |
| 5 | VIEW | | 12 | 72 | 24 |
| 6 | COUNT | | | | |
| 7 | COLLECTION ITERATOR PICKLER FETCH| ROW_GENERATOR | | | |
| 8 | SORT PARTITION JOIN | | 1000 | 36000 | 5 |
| 9 | VIEW | | 1000 | 36000 | 4 |
| 10 | SORT GROUP BY | | 1000 | 39000 | 4 |
| 11 | TABLE ACCESS FULL | CUSTOMERS | 1000 | 39000 | 3 |
-----------------------------------------------------------------------------------------
Looks pretty good though I can't verify the stats ( there is no autotrace on htmldb.oracle.com and I can't compile runstats due to permissions ). Maybe you can test it out...
Regards
Adrian
October 28, 2004 - 9:16 am UTC
Looks very similar....
Adrian Billington, October 28, 2004 - 10:19 am UTC
Thanks for link - I hadn't seen that article, though it was through the oramag that I first saw partition outer join just yesterday( in Jonathan Gennick's MODEL article ). Takes me a while to catch up with the new stuff ! The stuff I just posted looks scarily similar to the SQL in your article - even have the same WITH name ( purely coincidence I assure you ;o) ). Maybe I've been reading your site for too long...
Actually I just applied the MODEL instead of the ROW_GENERATOR for "getless" record generation, as per something a guy called "Ant of NY" posted on one of your pages. This was purely for fun of course and probably overkill for 12 records but it looks cool :o)
Regards
Adrian
Just fiddling with the MODEL clause
Kim Berg Hansen, July 14, 2005 - 8:30 am UTC
This is just an alternative method of solving the original question using the MODEL clause in Oracle 10g. (You don't have to answer anything, Tom, I'm just offering it to other readers...)
SQL> create table c (
2 name varchar2(10),
3 amt number,
4 dt date
5 );
Table created.
SQL> insert into c values('tom',100,to_date('17-01-2002','DD-MM-YYYY'));
1 row created.
SQL> insert into c values('tom',200,to_date('17-02-2002','DD-MM-YYYY'));
1 row created.
SQL> insert into c values('tom',300,to_date('17-04-2002','DD-MM-YYYY'));
1 row created.
SQL> insert into c values('tom',50,to_date('17-06-2002','DD-MM-YYYY'));
1 row created.
SQL> insert into c values('sue',1231,to_date('17-03-2002','DD-MM-YYYY'));
1 row created.
SQL> insert into c values('sue',2321,to_date('17-04-2002','DD-MM-YYYY'));
1 row created.
SQL> insert into c values('sue',3122,to_date('17-09-2002','DD-MM-YYYY'));
1 row created.
SQL> insert into c values('sue',59,to_date('17-10-2002','DD-MM-YYYY'));
1 row created.
SQL> commit;
Commit complete.
SQL> select
2 name, to_char(dt,'DD-MM-YYYY') dt, amt, cum_amt
3 from (
4 select name, trunc(dt,'MM') dt, sum(amt) amt
5 from c
6 group by name, trunc(dt,'MM')
7 )
8 model
9 partition by (name)
10 dimension by (dt)
11 measures(amt, cast(null as number) cum_amt)
12 ignore nav
13 rules sequential order(
14 amt[for dt from to_date('01-01-2002','DD-MM-YYYY')
15 to to_date('01-12-2002','DD-MM-YYYY')
16 increment numtoyminterval(1,'MONTH') ] = amt[cv(dt)],
17 cum_amt[any] = sum(amt)[dt <= cv(dt)]
18 )
19 order by name, dt;
NAME DT AMT CUM_AMT
---------- ---------- ---------- ----------
sue 01-01-2002 0 0
sue 01-02-2002 0 0
sue 01-03-2002 1231 1231
sue 01-04-2002 2321 3552
sue 01-05-2002 0 3552
sue 01-06-2002 0 3552
sue 01-07-2002 0 3552
sue 01-08-2002 0 3552
sue 01-09-2002 3122 6674
sue 01-10-2002 59 6733
sue 01-11-2002 0 6733
sue 01-12-2002 0 6733
tom 01-01-2002 100 100
tom 01-02-2002 200 300
tom 01-03-2002 0 300
tom 01-04-2002 300 600
tom 01-05-2002 0 600
tom 01-06-2002 50 650
tom 01-07-2002 0 650
tom 01-08-2002 0 650
tom 01-09-2002 0 650
tom 01-10-2002 0 650
tom 01-11-2002 0 650
tom 01-12-2002 0 650
24 rows selected.
It might not *really* be a better solution than the original Tom solution in this simple case - I just offer it as an alternative that in some other cases might be applicable... :-)
(Just had my first real-life application use of the MODEL clause and I liked it, so I couldn't resist playing a bit with it and posting the result - I apologize in advance to any who feel I should limit myself to questions :-)
July 14, 2005 - 11:03 am UTC
excellent, I myself have to sit down some day and understand the beast that is the sql model clause. It is really powerful, but I've not yet taken the necessary time to master it.
Model clause: Powerful optimizer
Kim Berg Hansen, July 15, 2005 - 3:28 am UTC
Hi, Tom
I hope you have time to fiddle with the model clause so we all can benefit from your sharp brain analyzing the capabilities of "modelling" :-)
After I've tried to use it a few times it starts to make sense - but I can not give high marks to the documentation (at least not chapter 22 in the Data Warehousing Guide) as the examples practically all use very "hardcoded" references - like for example making a cell that's the sum of sales of Italy in 2001 and Spain in 2002. What I mean is that when you read the samples you could get the general idea, that you most of the time have to create rules specifically for the data you have. Personally I think the docs should pay more attention to more "general" use.
Anyway, that's all personal opinion :-)
What I would like to show, though (can't resist :-) is the intelligent optimizing of the model clause.
My example in the previous post gets this plan:
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=5 Card=8 Bytes=232)
1 0 SORT (ORDER BY) (Cost=5 Card=8 Bytes=232)
2 1 SQL MODEL (ORDERED) (Cost=5 Card=8 Bytes=232)
3 2 SORT (GROUP BY) (Cost=5 Card=8 Bytes=232)
4 3 TABLE ACCESS (FULL) OF 'C' (TABLE) (Cost=3 Card=8 Bytes=232)
5 2 WINDOW (IN SQL MODEL) (SORT)
If I replace line 17:
17 cum_amt[any] = sum(amt)[dt <= cv(dt)]
with this new line 17:
17 cum_amt[any] = sum(amt)[any]
then the plan becomes:
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=5 Card=8 Bytes=232)
1 0 SORT (ORDER BY) (Cost=5 Card=8 Bytes=232)
2 1 SQL MODEL (ORDERED) (Cost=5 Card=8 Bytes=232)
3 2 SORT (GROUP BY) (Cost=5 Card=8 Bytes=232)
4 3 TABLE ACCESS (FULL) OF 'C' (TABLE) (Cost=3 Card=8 Bytes=232)
In other words (and the docs do mention this :-) the optimizer figures out, that a cell reference like "sum(amt)[dt <= cv(dt)]" needs the analytical sum() function with a window sort, while a cell reference like "sum(amt)[any]" use the aggregate sum() function.
Now we're really rocking as you don't have to think about whether you should use analytics or aggregates - just tell the model what you want and it figures out the rest :-)
I have to pay respects to those who programmed the optimizing of the model clause - that's what I'd call intelligent people :-)
How to handle increments for dates
Javier, June 25, 2008 - 12:00 pm UTC
I have this for loop inside a rule, but I'm getting the error: ORA-32626 Illegal bounds or increment in MODEL FOR loop.
dt_elig is date column in the source table.
FOR dt_elig FROM to_date('01-Jul-06','DD-Mon-YY')
TO to_date('01-Jun-08','DD-Mon-YY')
INCREMENT numtoyminterval(1,'MONTH')
Any ideas about what I'm doing wrong?