Skip to Main Content
  • Questions
  • Using LAG to populate a status history

Breadcrumb

May 4th

Question and Answer

Tom Kyte

Thanks for the question, Geoff.

Asked: June 09, 2003 - 1:21 pm UTC

Last updated: April 19, 2012 - 5:20 am UTC

Version: 8.1.7

Viewed 1000+ times

You Asked

Hi Tom,

We have a requirement to generate a sales pipeline history table from two tables, status_details and status. This query will be used to generate a materialized view.

SQL> desc status_details
STD_STA_ID NOT NULL VARCHAR2(50)
STD_SLO_ID NOT NULL VARCHAR2(50)
STD_START_DATE NOT NULL DATE
STD_END_DATE DATE
STD_MILESTONE_DATE DATE
STD_NOTES VARCHAR2(300)

The Milestone_date is the date the status of the sale changed. Start_date and End_dates are audit columns.

SQL> desc status

STA_ID NOT NULL VARCHAR2(50)
STA_START_DATE NOT NULL DATE
STA_NAME NOT NULL VARCHAR2(20)
STA_END_DATE DATE
STA_PRIORITY NOT NULL NUMBER(3)

For any particular sale, we have a history like
SQL> ed
Wrote file afiedt.buf

1 select std_slo_id Sale_id, sta_name status,std_milestone_date Milesone
2 from status_details,status
3 where std_sta_id = sta_id
4 and std_slo_id = 'SLO_079C0A6695CF4449923B8A772FECB001'
5* order by std_milestone_date
SQL> /

SALE_ID STATUS MILESONE
-------------------------------------------------- ---------- ---------
SLO_079C0A6695CF4449923B8A772FECB001 Identified 01-FEB-02
SLO_079C0A6695CF4449923B8A772FECB001 Contacted 22-FEB-02
SLO_079C0A6695CF4449923B8A772FECB001 Outcome 01-JUL-02

What we want to do is expand this out on a month-by month basis, up to the current date. Using your excellent article on showing missing dates as a starting point, we now get

select date_structure.std_slo_id sales_id,status,
DATE_STRUCTURE.MILESTONE MILESTONE
from
(
-- date structure table construct
-- based on Tom Kytes column from askTom "How to show missing date given a range"
select a.std_slo_id std_slo_id,
a.milestone_date MILESTONE
from
(select std_slo_id,start_date+r-1 milestone_date
from t2,
(select std_slo_id,min(to_date(to_char(STD_MILESTONE_DATE,'mon-yyyy'),'mon-yyyy')) start_date,
max(to_date(to_char(SYSDATE,'mon-yyyy'),'mon-yyyy'))
- min(to_date(to_char(STD_MILESTONE_DATE,'mon-yyyy'),'mon-yyyy'))+1 num_days
from status_details
where std_slo_id ='SLO_001E661354B6479BB34382EB7CA8C562'
group by std_slo_id) t1
where t2.r <=t1.num_days) a,
status_details b
where a.std_slo_id =b.std_slo_id(+)
and a.milestone_date = b.std_milestone_date(+)
and to_number(to_char(a.milestone_date,'dd')) =1 -- first day of the month only
) DATE_STRUCTURE,
(
-- this is the relational status history construct
select STD_SLO_ID ,STA_NAME STATUS,sta_priority,
--rownumber StatNum,
to_date(to_char(STD_MILESTONE_DATE,'mon-yyyy'),'mon-yyyy') MILESTONE
from status_details,status
where sta_id = STD_STA_ID
order by std_slo_id,std_milestone_date,sta_priority
) STATUS_STRUCTURE
where DATE_STRUCTURE.std_slo_id = STATUS_STRUCTURE.std_slo_id(+)
and DATE_STRUCTURE.milestone = status_structure.milestone(+)
order by date_structure.std_slo_id,DATE_STRUCTURE.MILESTONE,status_structure.sta_priority
/
SALES_ID STATUS MILESTONE
-------------------------------------------------- ---------- ---------
SLO_001E661354B6479BB34382EB7CA8C562 Identified 01-JUL-02
SLO_001E661354B6479BB34382EB7CA8C562 01-AUG-02
SLO_001E661354B6479BB34382EB7CA8C562 01-SEP-02
SLO_001E661354B6479BB34382EB7CA8C562 01-OCT-02
SLO_001E661354B6479BB34382EB7CA8C562 01-NOV-02
SLO_001E661354B6479BB34382EB7CA8C562 Contacted 01-DEC-02
SLO_001E661354B6479BB34382EB7CA8C562 01-JAN-03
SLO_001E661354B6479BB34382EB7CA8C562 Proposed 01-FEB-03
SLO_001E661354B6479BB34382EB7CA8C562 Outcome 01-FEB-03
SLO_001E661354B6479BB34382EB7CA8C562 01-MAR-03
SLO_001E661354B6479BB34382EB7CA8C562 01-APR-03
SLO_001E661354B6479BB34382EB7CA8C562 01-MAY-03
SLO_001E661354B6479BB34382EB7CA8C562 01-JUN-03

All well and good. Here's the question - I assume we'd use LAG to fill in the status gaps, and use arg2 (the offset into the partition) to point to the correct previous row to populate the history, so we end up with something like

SALES_ID STATUS MILESTONE
-------------------------------------------------- ---------- ---------
SLO_001E661354B6479BB34382EB7CA8C562 Identified 01-JUL-02
SLO_001E661354B6479BB34382EB7CA8C562 Identified 01-AUG-02
SLO_001E661354B6479BB34382EB7CA8C562 Identified 01-SEP-02
SLO_001E661354B6479BB34382EB7CA8C562 Identified 01-OCT-02
SLO_001E661354B6479BB34382EB7CA8C562 Identified 01-NOV-02
SLO_001E661354B6479BB34382EB7CA8C562 Contacted 01-DEC-02
SLO_001E661354B6479BB34382EB7CA8C562 Contacted 01-JAN-03
SLO_001E661354B6479BB34382EB7CA8C562 Proposed 01-FEB-03
SLO_001E661354B6479BB34382EB7CA8C562 Outcome 01-FEB-03
SLO_001E661354B6479BB34382EB7CA8C562 Outcome 01-MAR-03
SLO_001E661354B6479BB34382EB7CA8C562 Outcome 01-APR-03
SLO_001E661354B6479BB34382EB7CA8C562 Outcome 01-MAY-03
SLO_001E661354B6479BB34382EB7CA8C562 Outcome 01-JUN-03

but we haven't quite been able to make it work. We've tried using row_number() in a nested query to generate the correct offset, i.e.

select sales_id,
nvl(status,lag(status,theOffset)
over(partition by opportunity_id
order by opportunity_id,milestone) ),
milestone
from
(
-- first nested query
select sales_id,status,milestone,
row_number() over( partition by date_structure.std_slo_id
order by DATE_STRUCTURE.MILESTONE) theOffset
from
(
-- second nested query generating the expanded sales history, as above
)
)

Any help you could give us would be much appreciated; we're just getting into analytical functions and yes, they do rock.

Thanks in advance,

Geoff.


and Tom said...

Ok, have you ever had a question that people kept asking over and over and over again? I do, it is this one. In general it is "how can I find the last non-null value in a window after sorting and populate its value in this row?" (eg: this question)

I've always said "I don't know, beats me with a stick"....

Then it hit me, it isn't any different then finding the row in a partition that has the "max value" -- only we need to use a sliding window.

MAX can do this for us.

You want to find the MAX milestone
In your partition (sales_id)
That has a non-null status code
And return that.


Great, here we go:


ops$tkyte@ORA920> select t.*,
2 substr( max( decode( status, null, null,
3 to_char(milestone,'yyyymmddhh24miss')||status) )
4 over (partition by sales_id order by milestone ), 15 ) status
5 from t
6 /

SALES_ID STATU MILESTONE STATU
---------- ----- --------- -----
1 aaa 01-JUL-02 aaa
1 01-AUG-02 aaa
1 01-SEP-02 aaa
1 01-OCT-02 aaa
1 01-NOV-02 aaa
1 bbb 01-DEC-02 bbb
1 01-JAN-03 bbb
1 ccc 01-FEB-03 ccc
1 ddd 01-MAR-03 ddd
1 01-APR-03 ddd
1 01-MAY-03 ddd
1 01-JUN-03 ddd
2 AAA 01-JUL-02 AAA
2 01-AUG-02 AAA
2 01-SEP-02 AAA
2 01-OCT-02 AAA
2 01-NOV-02 AAA
2 BBB 01-DEC-02 BBB
2 01-JAN-03 BBB
2 CCC 01-FEB-03 CCC
2 DDD 01-MAR-03 DDD
2 01-APR-03 DDD
2 01-MAY-03 DDD
2 01-JUN-03 DDD

24 rows selected.


If status is NULL, the decode returns NULL
Else the decode returns a STRING that is sortable (yyyymmddhh24miss)

We take the MAX of that string, by partition, after sorting, upto the current row.

We just substring out what we want.

I'm happy -- sorry to the others to whom I said "i dont think so"

Rating

  (13 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Excellent

Reader, June 09, 2003 - 4:04 pm UTC

I really keep learning new stuff everyday. I have most of things that I learnt here compiled into a document with my comments to understand it better.
Thanks for everything.

A reader, June 09, 2003 - 5:26 pm UTC


A reader, June 09, 2003 - 5:50 pm UTC

I have a query like

select cust_no,rank,min(rank) over (partition by cust_no)
from <complex view here>

I see that the plan has a 'window sort' as the last step i.e it seems to sort the entire result set as if I had a 'order by cust_no' at the end of the query. This is killing performance.

Shouldnt a min() over a partition order only the rows in that partition?

Thanks

Tom Kyte
June 09, 2003 - 7:11 pm UTC

tell me, how can it get the MIN(rank) in a cust_no unless all of the rows for a cust_no are there?

and further, how can it know that all of the rows for that cust_no are there, until it gets the last one -- since you have a complex view.

A reader, June 09, 2003 - 9:07 pm UTC

So how can I optimize this query? When I pass in a 'and cust_no=1234' to the view, it is blazing fast. But some apps want to run this query for all customers and it takes forever.

Any suggestions? Thanks

Tom Kyte
June 10, 2003 - 7:50 am UTC

Umm, try to make "complex view faster" I suppose?


This is sort of like a question I got yesterday. Found it funny enough to share:



<quote>
I know it's partial information but due to confidentiality I can't
send you the SQL but will try to send you after making it generic.

In the meantime will it be possible for you to comment on this.


15233028 rows selected.

Elapsed: 18:19:02.52

Statistics
----------------------------------------------------------
26790071 recursive calls
5377 db block gets
54149676 consistent gets
1079284 physical reads
692 redo size
2032284077 bytes sent via SQL*Net to client
112724810 bytes received via SQL*Net from client
1015537 SQL*Net roundtrips to/from client
70 sorts (memory)
2 sorts (disk)
15233028 rows processed
</quote>


Yeah, I can tune that.

NOT that I'm asking you to put all of the relevant information here, don't do that. If I'm taking questions -- feel free to post things like, oh, "the view", "the tables", "an english description of what the query does", "tkprof of query with and without analytics", "autotrace traceonly explains"



Excellent!

Geoff Duniam, June 10, 2003 - 3:37 am UTC

Thanks, Tom, very elegant - solves it for us. Great forum and site, BTW. Keep up the good work.

Just need a link

steve, June 10, 2003 - 11:45 am UTC

Hi Tom,

May I have the link for:
-- based on Tom Kytes column from askTom "How to show missing date given a range"


Thanks!

Steve

Tom Kyte
June 10, 2003 - 12:41 pm UTC

try this, search for

How to show missing date given a range


from asktom.oracle.com.

Your query works good but may be there is something in here...

GM, May 16, 2005 - 2:28 pm UTC

Here is my example table:

CREATE TABLE TEST_EVENT
(
ID NUMBER,
EVENT_KEY VARCHAR2(10),
EVENT_NAME VARCHAR2(10),
SUB_EVENT VARCHAR2(10)
)

and here is what I am trying to do

CREATE OR REPLACE VIEW TEST_EVENT_VIEW AS
SELECT D.ID, D.EVENT_KEY, D.EVENT_NAME, D.SUB_EVENT,
(CASE WHEN D.EVENT_NAME <> D.EVENT_BEFORE THEN 1 ELSE NULL END) IS_FIRST_SUB_EVENT,
(CASE WHEN D.EVENT_NAME <> D.EVENT_AFTER THEN 1 ELSE NULL END) IS_LAST_SUB_EVENT,
MAX (DECODE ((CASE WHEN D.EVENT_NAME <> D.EVENT_BEFORE THEN 1 ELSE NULL END), NULL, NULL, D.ID)) OVER (PARTITION BY D.EVENT_KEY, D.EVENT_NAME ORDER BY D.ID) EVENT_RUN_ID
FROM (SELECT STG.*,
LAG (EVENT_NAME, 1, 'START') OVER (PARTITION BY EVENT_KEY ORDER BY ID, EVENT_NAME) EVENT_BEFORE,
LEAD (EVENT_NAME, 1, 'END') OVER (PARTITION BY EVENT_KEY ORDER BY ID, EVENT_NAME) EVENT_AFTER
FROM TEST_EVENT STG) D;

I am trying to generate event_run_id for related sub-events to be the id value of first sub-event's id value.

Here is my first data-set:

INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 1, 'EK1', 'E1', 'S1');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 2, 'EK1', 'E1', 'S2');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 3, 'EK1', 'E2', 'S1');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 4, 'EK1', 'E2', 'S2');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 5, 'EK1', 'E1', 'S1');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 6, 'EK1', 'E1', 'S2');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 7, 'EK2', 'E1', 'S1');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 8, 'EK2', 'E1', 'S2');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 9, 'EK2', 'E2', 'S2');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 10, 'EK2', 'E3', 'S3');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 11, 'EK3', 'E1', 'S1');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 12, 'EK3', 'E1', 'S2');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 13, 'EK3', 'E2', 'S1');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 14, 'EK3', 'E2', 'S2');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 15, 'EK4', 'E2', 'S2');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 16, 'EK4', 'E2', 'S3');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 17, 'EK4', 'E3', 'S1');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 18, 'EK4', 'E3', 'S2');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 19, 'EK4', 'E2', 'S2');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 20, 'EK4', 'E2', 'S3');
COMMIT;

This works fine. Here is my second data set and it has problem while generating the event_run_id 7 (its picked up twice for some reason) when I select from the view

INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 1, 'EK1', 'E1', 'S1');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 2, 'EK1', 'E2', 'S2');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 3, 'EK1', 'E3', 'S1');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 4, 'EK1', 'E1', 'S2');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 5, 'EK1', 'E2', 'S1');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 6, 'EK1', 'E3', 'S2');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 7, 'EK2', 'E1', 'S1');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 8, 'EK2', 'E2', 'S2');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 9, 'EK2', 'E3', 'S2');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 10, 'EK2', 'E1', 'S3');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 11, 'EK3', 'E1', 'S1');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 12, 'EK3', 'E2', 'S2');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 13, 'EK3', 'E3', 'S1');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 14, 'EK3', 'E1', 'S2');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 15, 'EK4', 'E1', 'S2');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 16, 'EK4', 'E2', 'S3');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 17, 'EK4', 'E3', 'S1');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 18, 'EK4', 'E1', 'S2');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 19, 'EK4', 'E2', 'S2');
INSERT INTO TEST_EVENT (ID, EVENT_KEY, EVENT_NAME, SUB_EVENT) VALUES ( 20, 'EK4', 'E3', 'S3');
COMMIT;


OUT PUT

GM, May 16, 2005 - 2:41 pm UTC

Here is the output from both data sets

---------------------------------------------------------
id event_key event_name step_name first last event_run_id
---------------------------------------------------------
1 EK1 E1 S1 1 1
2 EK1 E1 S2 1 1
3 EK1 E2 S1 1 3
4 EK1 E2 S2 1 3
5 EK1 E1 S1 1 5
6 EK1 E1 S2 1 5
7 EK2 E1 S1 1 7
8 EK2 E1 S2 1 7
9 EK2 E2 S2 1 1 9
10 EK2 E3 S3 1 1 10
11 EK3 E1 S1 1 11
12 EK3 E1 S2 1 11
13 EK3 E2 S1 1 13
14 EK3 E2 S2 1 13
15 EK4 E2 S2 1 15
16 EK4 E2 S3 1 15
17 EK4 E3 S1 1 17
18 EK4 E3 S2 1 17
19 EK4 E2 S2 1 19
20 EK4 E2 S3 1 19

---------------------------

set2
--------------------------
1 EK1 E1 S1 1 1 1
2 EK1 E2 S2 1 1 2
3 EK1 E3 S1 1 1 3
4 EK1 E1 S2 1 1 4
5 EK1 E2 S1 1 1 5
6 EK1 E3 S2 1 1 6
7 EK2 E1 S1 1 1 7
8 EK2 E2 S2 1 1 8
9 EK2 E3 S2 1 1 9
10 EK2 E1 S3 1 1 7 --this row should have a value of 10 for the last column
11 EK3 E1 S1 1 1 11
12 EK3 E2 S2 1 1 12
13 EK3 E3 S1 1 1 13
14 EK3 E1 S2 1 1 14
15 EK4 E1 S2 1 1 15
16 EK4 E2 S3 1 1 16
17 EK4 E3 S1 1 1 17
18 EK4 E1 S2 1 1 18
19 EK4 E2 S2 1 1 19
20 EK4 E3 S3 1 1 20


Tom Kyte
May 16, 2005 - 4:58 pm UTC

sorry -- looks like a new question, didn't quite follow what was supposed to be going on.

Brilliant Solution, but...

Jon, September 02, 2005 - 6:00 pm UTC

Tom,
The problem at the beginning of this tread is something that I have run across many times, including just today. Bless your genius for figuring out a solution, but is there any chance the your good friends at Oracle might (or have since) provide us with a more conventional means of solving this problem?

Tom Kyte
September 03, 2005 - 8:50 am UTC

YES!!! 

ignore nulls. New in 10g:


ops$tkyte@ORA10G> create table t ( dt date, val number );
                                                                                                                     
Table created.
                                                                                                                     
ops$tkyte@ORA10G> insert into t
  2  select sysdate-100+rownum, decode(mod(rownum,4),1,user_id)
  3    from (select * from all_users order by user_id desc )
  4   where rownum <= 10;
 
10 rows created.
 
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> select dt, val
  2    from t
  3   order by dt;
 
DT               VAL
--------- ----------
27-MAY-05        167
28-MAY-05
29-MAY-05
30-MAY-05
31-MAY-05        157
01-JUN-05
02-JUN-05
03-JUN-05
04-JUN-05         75
05-JUN-05
 
10 rows selected.
 
ops$tkyte@ORA10G> select dt, val,
  2         to_number(substr(max(max_val) over (order by dt),5)) max_val
  3    from ( select dt, val,
  4                  case when val is not null
  5                           then to_char(row_number() over (order by dt),'fm0000')||val
  6                       end max_val
  7    from t ) order by dt
  8  /
 
DT               VAL    MAX_VAL
--------- ---------- ----------
27-MAY-05        167        167
28-MAY-05                   167
29-MAY-05                   167
30-MAY-05                   167
31-MAY-05        157        157
01-JUN-05                   157
02-JUN-05                   157
03-JUN-05                   157
04-JUN-05         75         75
05-JUN-05                    75
 
10 rows selected.
 
ops$tkyte@ORA10G> select dt, val,
  2         last_value(val ignore nulls) over (order by dt) val
  3    from t
  4   order by dt
  5  /
 
DT               VAL        VAL
--------- ---------- ----------
27-MAY-05        167        167
28-MAY-05                   167
29-MAY-05                   167
30-MAY-05                   167
31-MAY-05        157        157
01-JUN-05                   157
02-JUN-05                   157
03-JUN-05                   157
04-JUN-05         75         75
05-JUN-05                    75
 
10 rows selected.
 
 

silimar problem

Alla, September 13, 2007 - 4:25 pm UTC

Hi Tom;

I have very similar problem to the one discussed here, but can't wrap my head around it, because I am not too familiar with the analytic functions (just yet)

Here is my problem. I have data in the following format: revision catalog that holds master revision numbers and a table (few of them) that holds revision number of some details, but if the detail has not been changed in the revision, extra rows do not get populated with the same data.

create table revision_catalog (number_ integer);
create table detail_revision
(detail_id integer,
revision_number integer,
revision_data varchar2(50));

insert into revision_catalog values (1);
insert into revision_catalog values (2);
insert into revision_catalog values (3);
insert into revision_catalog values (4);
insert into revision_catalog values (5);
insert into revision_catalog values (6);

insert into detail_revision values (1, 1, 'revision 1 of detail 1');
insert into detail_revision values (1, 4, 'revision 4 of detail 1');
insert into detail_revision values (2, 1, 'revision 1 of detail 2');
insert into detail_revision values (2, 3, 'revision 3 of detail 2');

So, if I outer join my tables, I'll get this data:
select rc.number_, dr.detail_id, dr.revision_data
from revision_catalog rc left outer join detail_revision dr
on (rc.number_ = dr.revision_number)
order by 1

NUMBER_ DETAIL_ID REVISION_DATA
---------- ---------- --------------------------------------------------
1 1 revision 1 of detail 1
1 2 revision 1 of detail 2
2
3 2 revision 3 of detail 2
4 1 revision 4 of detail 1
5
6

what I need is for each detail to show data for each revision in the catalog to look like this:
NUMBER_ DETAIL_ID REVISION_DATA
---------- ---------- --------------------------------------------------
1 1 revision 1 of detail 1
2 1 revision 1 of detail 1
3 1 revision 1 of detail 1
4 1 revision 4 of detail 1
5 1 revision 4 of detail 1
6 1 revision 4 of detail 1
1 2 revision 1 of detail 2
2 2 revision 1 of detail 2
3 2 revision 3 of detail 2
4 2 revision 3 of detail 2
5 2 revision 3 of detail 2
6 2 revision 3 of detail 2

Can you help, please????

Thanks

Alla


Tom Kyte
September 15, 2007 - 8:11 pm UTC

need better specifications.

the output you desire does not compute, I don't see the logic - please spec it out a bit better.

clarifications

A reader, September 17, 2007 - 10:33 am UTC

I needed to fill out gaps. If a detail changed in revision N, then for all revisions between 0 and N I should show data corresponding to revision 0. If it later changed in revision M, then for all rows with revision number between N and M, I'll be showing data corresponding to revision N

I came up with this statement:

with numbers as (select number_ from revision_catalog)
select detail_id, rev_number, last_value(revision_data ignore nulls)
over (partition by detail_id order by rev_number) as rev_data
from (select detail_id, numbers.number_ as rev_number, revision_data
from detail_revision partition by (detail_id) right outer join
numbers on (numbers.number_ = revision_number));

It seems to do the job - does it look good to you?

Thanks


Tom Kyte
September 18, 2007 - 2:56 pm UTC

if it does the job
and runs faster than fast enough

I'm ok with it - last_value is the correct approach

continuation to ALLA's from Boston post.

Biswaranjan, April 17, 2012 - 3:46 am UTC

Hi Tom,

Regarding the 'Alla from Boston's' post in the year 2007 on this page, I found the logic for that.

The logic is on the second column( which is revision_number sequence under a particular detail_numer).
like for detail_id "1" the revision_number is 1 and 4 .so "revision 1 of detail 1" will be till 3rd and
"revision 4 of detail 1" will start from 4th to end of row(6th).

And for the details number "2" the sequences are 1 and 3.
so "revision 1 of detail 2" will be from 1st to 2nd and "revision 3 of detail 2" will be from
3rd to end(which is 6th row).

I tried to write the query as below which gives the exact output (i used your one of last_value concept :)).
create table revision_catalog (number_ integer);
create table detail_revision
(detail_id integer,
revision_number integer,
revision_data varchar2(50));

insert into revision_catalog values (1);
insert into revision_catalog values (2);
insert into revision_catalog values (3);
insert into revision_catalog values (4);
insert into revision_catalog values (5);
insert into revision_catalog values (6);

insert into detail_revision values (1, 1, 'revision 1 of detail 1');
insert into detail_revision values (1, 4, 'revision 4 of detail 1');
insert into detail_revision values (2, 1, 'revision 1 of detail 2');
insert into detail_revision values (2, 3, 'revision 3 of detail 2');

########
select r NUMBER_,x DETAIL_ID, last_value(revision_data ignore nulls) over(order by rownum) REVISION_DATA
from (select second2.r,second2.x,first1.revision_data from (
select * from
revision_catalog,detail_revision where
number_=revision_number
order by detail_id,number_
) first1,(select rownum r,1 x from
dual connect by level<7
union all select rownum r,2 x from
dual connect by level<7) second2 where first1.number_(+)=second2.r and first1.detail_id(+)=second2.x
order by second2.x,second2.r);
######### output
NUMBER_ DETAIL_ID REVISION_DATA

1 1 revision 1 of detail 1
2 1 revision 1 of detail 1
3 1 revision 1 of detail 1
4 1 revision 4 of detail 1
5 1 revision 4 of detail 1
6 1 revision 4 of detail 1
1 2 revision 1 of detail 2
2 2 revision 1 of detail 2
3 2 revision 3 of detail 2
4 2 revision 3 of detail 2
5 2 revision 3 of detail 2
6 2 revision 3 of detail 2


But i know you can write the query for same result better and shorter:).

Thanks,
Biswaranjan.

Continuation to Allas's post

Biswaranjan, April 18, 2012 - 11:24 pm UTC

Hi Tom,

The query I have posted above for those distinct value 1 and 2 which I have mentioned manually by using union.

Can it be written by taking the value from the table.
I mean to say if the query can be written shorter and better for the performance.

thanks for your help as always,
Biswaranjan.

Tom Kyte
April 19, 2012 - 5:20 am UTC

relate to me in the form of a specification what the query is supposed to do exactly.

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.