Michael -- Thanks for the question regarding "Analytics question", version 9.2.0.

Submitted on 6-Oct-2003 15:41 Central time zone
Last updated 29-Jan-2010 15:23

You Asked

I have a table from a 3rd party application that is used to track
an order through the various manufacturing operations.  A subset of
the information looks like this:

   ORDER  OPN  STATION  CLOSE_DATE
   -----  ---  -------  ----------
   12345   10  RECV     07/01/2003
   12345   20  MACH1    07/02/2003
   12345   25  MACH1    07/05/2003
   12345   30  MACH1    07/11/2003
   12345   36  INSP1    07/12/2003
   12345   50  MACH1    07/16/2003
   12345   90  MACH2    07/30/2003
   12345  990  STOCK    08/01/2003

Where each row is a process that the order had to go through,
with OPN being the order of the processes.

What I would like to receive is the output grouped by consecutive
STATION values and include the start and close dates for each
STATION group.  The start date is defined as the date the prior
station closed.  So the output expected from the above data subset
would be:

   ORDER  STATION  START_DATE  CLOSE_DATE
   -----  -------  ----------  ----------
   12345  RECV                 07/01/2003
   12345  MACH1    07/01/2003  07/11/2003
   12345  INSP1    07/11/2003  07/12/2003
   12345  MACH1    07/12/2003  07/16/2003
   12345  MACH2    07/16/2003  07/30/2003
   12345  STOCK    07/30/2003  08/01/2003 

Is this possible?  I've tried using analytics, but I can't seem to
get what I want.  I can use the LAG function to get the start and 
close dates, grouped by STATION, but it will group all the different
STATION values together (i.e. all MACH1 STATIONS will be grouped 
together), not just the consecutive STATION values.  I could use
procedural code to get this answer, but I was wanting to see if
it could be done in 1 statement.

I'm sure it will be something easy, but I've been racking my tiny
brain over this for the last few days and can't come up with a
solution.  Can you help?

Many thanks,

Michael T.



 

and we said...

Analytics rock
Analytics roll

been thinking about writing a book just about analytics (but wait'll you see the SQL 
Model clause in 10g)

ops$tkyte@ORA920> select order#, station, lag_close_date, close_date
  2    from (
  3  select order#,
  4         lag(station) over (partition by order# order by close_date) 
                                                                 lag_station,
  5         lead(station) over (partition by order# order by close_date) 
                                                                 lead_station,
  6             station,
  7             close_date,
  8         lag(close_date) over (partition by order# order by close_date) 
                                                                lag_close_date,
  9         lead(close_date) over (partition by order# order by close_date) 
                                                                lead_close_date
 10    from t
 11         )
 12   where lag_station is null
 13          or lead_station is null
 14          or lead_station <> station
 15  /
 
    ORDER# STATION    LAG_CLOSE_ CLOSE_DATE
---------- ---------- ---------- ----------
     12345 RECV                  07/01/2003
     12345 MACH1      07/05/2003 07/11/2003
     12345 INSP1      07/11/2003 07/12/2003
     12345 MACH1      07/12/2003 07/16/2003
     12345 MACH2      07/16/2003 07/30/2003
     12345 STOCK      07/30/2003 08/01/2003
 
6 rows selected.
 
 

Reviews    
5 stars Excellent!!!   October 7, 2003 - 5am Central time zone
Reviewer: A reader 
Hi Tom,

 'been thinking about writing a book just about analytics ' ... please make this book available 
soon and am sure it will be yet another gift from you to Oracle World :) 


5 stars Wow!!   October 7, 2003 - 7am Central time zone
Reviewer: Michael T from Dallas, Tx
This is exactly what I needed!  Analytics do rock!  I just
need to understand them better.  If you do decide to write a
book on analytics, it would be at the top of my must have
list.  Thanks again!!! 


5 stars Small correction   October 7, 2003 - 7am Central time zone
Reviewer: Michael T from Dallas, Tx
After looking at it a little closer it looks like there is 
one small error.  The start date for the first MACH1 entry
should be the close date of the prior different station.  In
this case 07/01/2003.  However, by making some small changes 
to your query I can get the results I want.  

SELECT order#,
       station,
       lag(close_date) over (partition by order# order by close_date) 
           start_date,
       close_date
  FROM (SELECT order#, 
               station, 
               close_date
          FROM (SELECT order#,
                       lag(station) over (partition by order# order by 
                                          close_date) lag_station,
                       lead(station) over (partition by order# order by 
                                          close_date) lead_station,
                       station,
                       close_date
                  FROM t)
 WHERE lead_station <> station
    OR lead_station is null
    OR lag_station is null)

There might be an easier way to construct this query, but
it works great for me.  Thanks a lot for your help!

 


Followup   October 7, 2003 - 8am Central time zone:

sorry about that -- you are right -- when we have "a pair", we want to use lag/lead again to get 
and keep the right dates.  

So, we want to keep rows that are:

a) the first row in the partition  "where lag_station is null"
b) the last row in the partition "where lead_station is null"
c) the first of a possible pair "where lag_station <> station"
d) the second of a possible pair "where lead_station <> station"

This query does that:

ops$tkyte@ORA920> select order#,
  2         station,
  3         lag_close_date,
  4         close_date,
  5         decode( lead_station, station, 1, 0 ) first_of_pair,
  6         decode( lag_station, station, 1, 0 ) second_of_pair
  7    from (
  8  select order#,
  9         lag(station) over (partition by order# order by close_date)
 10                                                              lag_station,
 11         lead(station) over (partition by order# order by close_date)
 12                                                              lead_station,
 13             station,
 14             close_date,
 15         lag(close_date) over (partition by order# order by close_date)
 16                                                             lag_close_date,
 17         lead(close_date) over (partition by order# order by close_date)
 18                                                             lead_close_date
 19    from t
 20         )
 21   where lag_station is null
 22          or lead_station is null
 23          or lead_station <> station
 24          or lag_station <> station
 25  /
 
ORDER# STATION LAG_CLOSE_ CLOSE_DATE FIRST_OF_PAIR SECOND_OF_PAIR
------ ------- ---------- ---------- ------------- --------------
 12345 RECV               07/01/2003             0              0
 12345 MACH1   07/01/2003 07/02/2003             1              0
 12345 MACH1   07/05/2003 07/11/2003             0              1
 12345 INSP1   07/11/2003 07/12/2003             0              0
 12345 MACH1   07/12/2003 07/16/2003             0              0
 12345 MACH2   07/16/2003 07/30/2003             0              0
 12345 STOCK   07/30/2003 08/01/2003             0              0
 
7 rows selected.
 

we can see with the 1's the first/second of a pair in there.  All we need to do now is "reach 
forward" for the first of a pair and grab the close date from the next record:

ops$tkyte@ORA920> select order#,
  2         station,
  3         lag_close_date,
  4         close_date
  5    from (
  6  select order#,
  7         station,
  8         lag_close_date,
  9         decode( lead_station,
 10                 station,
 11                 lead(close_date) over (partition by order# order by close_date),
 12                 close_date ) close_date,
 13         decode( lead_station, station, 1, 0 ) first_of_pair,
 14         decode( lag_station, station, 1, 0 ) second_of_pair
 15    from (
 16  select order#,
 17         lag(station) over (partition by order# order by close_date)
 18                                                              lag_station,
 19         lead(station) over (partition by order# order by close_date)
 20                                                              lead_station,
 21             station,
 22             close_date,
 23         lag(close_date) over (partition by order# order by close_date)
 24                                                             lag_close_date,
 25         lead(close_date) over (partition by order# order by close_date)
 26                                                             lead_close_date
 27    from t
 28         )
 29   where lag_station is null
 30          or lead_station is null
 31          or lead_station <> station
 32          or lag_station <> station
 33         )
 34   where second_of_pair <> 1
 35  /
 
ORDER# STATION LAG_CLOSE_ CLOSE_DATE
------ ------- ---------- ----------
 12345 RECV               07/01/2003
 12345 MACH1   07/01/2003 07/11/2003
 12345 INSP1   07/11/2003 07/12/2003
 12345 MACH1   07/12/2003 07/16/2003
 12345 MACH2   07/16/2003 07/30/2003
 12345 STOCK   07/30/2003 08/01/2003
 
6 rows selected.


and discard the second of pairs row


That is another way to do it (and an insight into how I develop analytic queries -- adding extra 
columns like that just to see visually what I want to do) 
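For readers who want to check the expected output outside the database, here is a minimal sketch of the same "collapse consecutive stations" logic in plain Python. It is not the SQL above, just a model of what the query has to produce; the names rows and collapse_runs are made up for this illustration, and dates are written as ISO strings so they sort correctly.

```python
from itertools import groupby

# Rows from the question: (order_no, opn, station, close_date as ISO string).
rows = [
    (12345, 10, "RECV", "2003-07-01"),
    (12345, 20, "MACH1", "2003-07-02"),
    (12345, 25, "MACH1", "2003-07-05"),
    (12345, 30, "MACH1", "2003-07-11"),
    (12345, 36, "INSP1", "2003-07-12"),
    (12345, 50, "MACH1", "2003-07-16"),
    (12345, 90, "MACH2", "2003-07-30"),
    (12345, 990, "STOCK", "2003-08-01"),
]


def collapse_runs(rows):
    """Collapse consecutive rows with the same station into one row per run."""
    result = []
    # Same ordering the queries use: per order, by close_date.
    ordered = sorted(rows, key=lambda r: (r[0], r[3]))
    # The outer groupby plays the role of PARTITION BY order#.
    for order_no, order_rows in groupby(ordered, key=lambda r: r[0]):
        prev_close = None  # like LAG(close_date): nothing before the first run
        # groupby only merges adjacent equal keys, so the two MACH1 runs stay apart.
        for station, run in groupby(order_rows, key=lambda r: r[2]):
            close_date = list(run)[-1][3]  # close date of the last row in the run
            result.append((order_no, station, prev_close, close_date))
            prev_close = close_date
    return result


for row in collapse_runs(rows):
    print(row)
```

The second tuple printed is (12345, 'MACH1', '2003-07-01', '2003-07-11'), which is the row the first attempt got wrong.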

4 stars another good book on the list please go ahead on this one too   October 7, 2003 - 8am Central time zone
Reviewer: Vijay Sehgal from India
Best Regards,
Vijay Sehgal 


5 stars Very useful   October 7, 2003 - 12pm Central time zone
Reviewer: Michael T. from Dallas, Tx
Excellent, as always! 


5 stars Can we reach to the end of the group?   December 15, 2003 - 11am Central time zone
Reviewer: Steve from UK
For example, say our analytic query returns the following result:

master_record    sub_record   nxt_record
95845433           25860032     95118740
95118740           25860032     95837497
95837497           25860032     

What I'd like to do is grab the final master_record, 95837497, and have that populated in the 
final column.  There could be 2, 3 or more in each group. 


Followup   December 15, 2003 - 3pm Central time zone:

so the nxt_record of the last record should be the master_record of that row?

then just select


  nvl( lead(master_record) over (....), master_record ) nxt_record


when the lead is NULL, return the master_record of the current row 
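A small sketch of what nvl( lead(...), master_record ) does, modelled in plain Python rather than SQL. The function name lead_or_self is made up for this illustration, and the list is assumed to be already in the analytic ORDER BY order.

```python
def lead_or_self(values):
    """For each position return the next value, or the current one on the last row."""
    out = []
    for i, current in enumerate(values):
        nxt = values[i + 1] if i + 1 < len(values) else None  # LEAD(...): NULL at the end
        out.append(nxt if nxt is not None else current)       # NVL(lead, current)
    return out


master_records = [95845433, 95118740, 95837497]
print(lead_or_self(master_records))  # [95118740, 95837497, 95837497]
```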

5 stars Almost....   December 15, 2003 - 5pm Central time zone
Reviewer: Steve from UK
but I didn't explain it well enough.  What I'd like to see is a result set that looks like:

master_record    sub_record   nxt_record
95845433           25860032     95837497
95118740           25860032     95837497
95837497           25860032     95837497

The data comes from this:

table activity
cllocn        moddate

25860032      18/06/2003
95118740      26/08/2003
95837497      15/12/2003
95845433      19/08/2003

table ext_dedupe

master_cllocn    dupe_cllocn
25860032         95118740
25860032         95837497
25860032         95845433

My query is:

select * from ( select master_record, sub_record, lead(master_record) over (partition by sub_record 
order by lst_activity asc) nxt_activity
from ( select * from (select case when dupelast_ackdate>last_ackdate then dupe_cllocn
when last_ackdate>dupelast_ackdate then master_cllocn
else master_cllocn
end master_record, greatest(last_ackdate,dupelast_ackdate) lst_activity,
case when dupelast_ackdate>last_ackdate then master_cllocn
when last_ackdate>dupelast_ackdate then dupe_cllocn
else dupe_cllocn
end sub_record
from (select master_cllocn, (select max(moddate) from activity a where a.cllocn=ed.master_cllocn) 
last_ackdate,
dupe_cllocn, (select max(moddate) from activity a where a.cllocn=ed.dupe_cllocn) dupelast_ackdate
from ext_dedupe ed))))

Am I on the right track or is there a simpler way to do this?

Thanks 


Followup   December 16, 2003 - 6am Central time zone:

can you explain in "just text" how you got from your inputs to your outputs.

it is not clear (and i didn't feel like parsing that sql to reverse engineer what it does)

 

3 stars Is this what you are looking for ?   December 15, 2003 - 6pm Central time zone
Reviewer: Venkat from Detroit, MI USA
select master, sub, moddate
       , min(master) keep (dense_rank first order by moddate) over (partition by sub) first_in_list
       , max(master) keep (dense_rank last order by moddate) over (partition by sub) last_in_list
  from (select master, sub, moddate from (
        select 95845433 master, 25860032 sub, to_date('19-aug-03','dd/mon/yy') moddate from dual 
union all
        select 95118740, 25860032, to_date('26-aug-03','dd/mon/yy') from dual union all
        select 95837497, 25860032, to_date('15-dec-03','dd/mon/yy') from dual))

MASTER    SUB    MODDATE    FIRST_IN_LIST    LAST_IN_LIST
95845433    25860032    8/19/2003    95845433    95837497
95118740    25860032    8/26/2003    95845433    95837497
95837497    25860032    12/15/2003    95845433    95837497
 


4 stars Tom's Book   December 16, 2003 - 4am Central time zone
Reviewer: umesh from blore india
Tom
Do not announce until you are finished with the book .. when you talk of a book ..can't wait until 
We have it here
Analytics Book That must be real good 


5 stars Is it possible to get the same result in standard edition ?   December 16, 2003 - 4am Central time zone
Reviewer: Ninoslav from croatia
Hi Tom,
yes, analytic functions are great. However, we can use them only in the enterprise edition of the 
database. We have a few small customers that want only the standard edition.
So, is it possible in this question to get the same result without analytic functions? 
It would be nice to have some kind of mapping between analytics and 'standard' queries. But that 
is probably impossible... 


Followup   December 16, 2003 - 7am Central time zone:

Oracle 9iR2 and up -- analytics are a feature of standard edition.

there are things you can do in analytics that are quite simply NOT PRACTICAL in any sense without 
them.
 

5 stars ok   December 16, 2003 - 8am Central time zone
Reviewer: Steve from uk
I have two tables - activity and ext_dedupe.

table activity
cllocn        moddate
25860032      18/06/2003
95118740      26/08/2003
95837497      15/12/2003
95845433      19/08/2003

table ext_dedupe
master_cllocn    dupe_cllocn
25860032         95118740
25860032         95837497
25860032         95845433

Ext_dedupe is a table created by a third party app which has identified duplicate records within 
our database.  The first column is supposed to be the master and the second the duplicate.  The 
idea is to mark as archived all our duplicate records with a pointer to the master.  
Notwithstanding the order of the columns, what we want to do is find out which record has the most 
recent activity (from the activity table) and archive off the others.

So, in this example although the master is listed as 25860032 against the other 3, an examination 
of the activity dates mean I want to keep 95837497 and mark the others as archived and have a 
pointer on each of them to 95837497.  That's why I thought if I could get to the following result 
it would make it simpler.

master_record    sub_record   nxt_record
95845433           25860032     95837497
95118740           25860032     95837497
95837497           25860032     95837497

Hope that makes sense! 


Followup   December 16, 2003 - 11am Central time zone:

oh, then nxt_record is just 

last_value(master_record) over (partition by sub_record order by moddate)


  

5 stars Why...   December 16, 2003 - 1pm Central time zone
Reviewer: Steve from UK
it didn't work for me.  I had to change it to

first_value(master_record) over (partition by sub_record order by moddate desc)

Is there a reason for that? 


Followup   December 16, 2003 - 2pm Central time zone:

doh, the default window clause is "range between unbounded preceding and current row"

i would have needed a window clause that looks forwards rather than backwards (reason #1 why I 
should always set up a test case instead of just answering on the fly)

your solution of reversing the data works just fine. 
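To see why the two expressions differ, here is a rough Python model of the default window (everything from the start of the partition up to the current row). It assumes the moddate values are distinct, so the RANGE window covers the same rows as a ROWS window would; the function names are made up for this sketch.

```python
# (master_record, moddate) for the sub_record 25860032 partition, from the thread.
rows = [
    (95845433, "2003-08-19"),
    (95118740, "2003-08-26"),
    (95837497, "2003-12-15"),
]


def last_value_default_window(rows):
    """last_value(master_record) over (order by moddate) with the default window."""
    ordered = sorted(rows, key=lambda r: r[1])
    # The window ends at the current row, so its last value is the current row itself.
    return [ordered[: i + 1][-1][0] for i in range(len(ordered))]


def first_value_desc(rows):
    """first_value(master_record) over (order by moddate desc) with the default window."""
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    # The window always starts at the latest row, so its first value never changes.
    return [ordered[: i + 1][0][0] for i in range(len(ordered))]


print(last_value_default_window(rows))  # [95845433, 95118740, 95837497]
print(first_value_desc(rows))           # [95837497, 95837497, 95837497]
```

With last_value, each row just gets its own master_record back, which is why reversing the sort and using first_value gives the wanted result.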

3 stars Another solution   December 16, 2003 - 4pm Central time zone
Reviewer: A reader 
The following gives the same result ...

select cllocn master_record, nvl(master_cllocn,cllocn) sub_record
       , max(cllocn) keep (dense_rank last order by moddate)
                     over (partition by nvl(master_cllocn,cllocn)) nxt_record
  from activity, ext_dedupe where cllocn = dupe_cllocn

MASTER_RECORD    SUB_RECORD    NXT_RECORD
95118740        25860032    95837497
95837497        25860032    95837497
95845433        25860032    95837497
 


Followup   December 16, 2003 - 5pm Central time zone:

yes, there are many many ways to do this.

first_value
last_value

substring of max() without keep 

sure. 

3 stars   December 16, 2003 - 4pm Central time zone
Reviewer: A reader 
Actually the nvl(master_cllocn...) is required only if you need all 4 rows in the output as 
follows(there is an outer join involved). If you need only the 3 rows as shown in the above post, 
there is no need for the nvl's....

select cllocn master_record, nvl(master_cllocn,cllocn) sub_record
       , max(cllocn) keep (dense_rank last order by moddate)
                     over (partition by nvl(master_cllocn,cllocn)) nxt_record
       , last_value(cllocn) over (partition by nvl(master_cllocn,cllocn) order by moddate) nxt
  from activity, ext_dedupe where cllocn = dupe_cllocn (+)

MASTER_RECORD    SUB_RECORD    NXT_RECORD
25860032        25860032    95837497
95118740        25860032    95837497
95837497        25860032    95837497
95845433        25860032    95837497 


4 stars still q's on analytics   January 30, 2004 - 10am Central time zone
Reviewer: A reader from Madison, wi
Okay, so my web application logs "web transaction" statistics to a table.  This actually amounts to 
0 to many database transactions... but anyway.. I need to summarize (sum, min, max, count, average) 
each day's transaction times for each class (name2) and action (name3) and ultimately "archive" 
this data to a history table.  I am running 8.1.7 and pretty new to analytics.

My table looks like this:

SQL> desc tran_stats
 Name                    Null?    Type
 ----------------------- -------- ----------------
 ID                      NOT NULL NUMBER(9)
 NAME1                            VARCHAR2(100)
 NAME2                            VARCHAR2(100)
 NAME3                            VARCHAR2(100)
 NAME4                            VARCHAR2(100)
 SEC                     NOT NULL NUMBER(9,3)
 TS_CR                   NOT NULL DATE

        ID NAME1 NAME2                     NAME3         SEC NAME4 TS_CR
---------- ----- ------------------------- ---------- ------ ----- ---------
     35947       /CM01_PersonManagement    CREATE       .484       15-JAN-04
     35987       /CM01_PersonManagement    CREATE       .031       15-JAN-04
     36086       /CM01_PersonManagement    EDIT         .312       16-JAN-04
     36555       /CM01_PersonManagement    CREATE       .297       19-JAN-04
     36623       /CM01_PersonManagement    EDIT         .375       19-JAN-04
     36627       /CM01_PersonManagement    CREATE       .047       19-JAN-04
     36756       /CM01_AddressManagement   CREATE       .375       20-JAN-04
     36766       /CM01_AddressManagement   CREATE       .305       20-JAN-04
     36757       /CM01_AddressManagement   INSERT       .391       20-JAN-04
     37178       /CM01_PersonManagement    EDIT         .203       20-JAN-04

and I need output like this:

TS_CR     NAME2                     NAME3       M_SUM  M_MIN  M_MAX M_COUNT  M_AVG
--------- ------------------------- ---------- ------ ------ ------ ------- ------
20-JAN-04 /CM01_AddressManagement   CREATE       .680   .305   .375       2   .340
20-JAN-04 /CM01_AddressManagement   INSERT       .391   .391   .391       1   .391
20-JAN-04 /CM01_PersonManagement    EDIT         .203   .203   .203       1   .203
19-JAN-04 /CM01_PersonManagement    CREATE       .344   .047   .297       2   .172
19-JAN-04 /CM01_PersonManagement    EDIT         .375   .375   .375       1   .375
16-JAN-04 /CM01_PersonManagement    EDIT         .312   .312   .312       1   .312
15-JAN-04 /CM01_PersonManagement    CREATE       .515   .031   .484       2   .258


This seems to work, but there has to be a better/cleaner/more efficient way to do this:

select distinct ts_cr, name2, name3, m_sum, m_min,m_max,m_count,m_avg
from (
select  trunc(ts_cr) ts_cr,id, name2, name3, sum(sec) m_dummy
 , min(sum(sec)) over(partition by name2,name3,trunc(ts_cr)) as m_min
 , max(sum(sec)) over(partition by name2,name3,trunc(ts_cr)) as m_max
 , round(avg(sum(sec)) over(partition by name2,name3,trunc(ts_cr)),5) as m_avg
 , count(sum(sec)) over(partition by name2,name3,trunc(ts_cr)) as m_count
 , sum(sum(sec)) over(partition by name2,name3,trunc(ts_cr)) as m_sum
        from tran_stats group by name2, name3,trunc(ts_cr),id
)n order by 1 desc, 2, 3;


Any help or pointers would be appreciated.  Thanks in advance.



 


Followup   January 30, 2004 - 10am Central time zone:

why does there "have to be"?

what is "unclean" about this?  I could make it more verbose (and perhaps more readable) but this 
does exactly what you ask for?

It seems pretty "good", very "clean" and probably the most efficient method to get this result? 

2 stars Regarding the previous post ...   January 30, 2004 - 11am Central time zone
Reviewer: A reader 
Am I missing something or will the following do the same ..

select trunc(ts_cr) ts_cr, name2, name3,
       count(*) m_count, min(sec) m_min, max(sec) m_max,
       sum(sec) m_sum, avg(sec) m_avg
from tran_stats
group by trunc(ts_cr), name2, name3
order by 1 desc, 2, 3 


Followup   January 30, 2004 - 7pm Central time zone:

with the supplied data -- since "group by trunc(ts_cr), name2, name3" happened to be unique

yes.

In general -- no.  consider:

ops$tkyte@ORA9IR2> select distinct ts_cr, name2, name3, m_sum, m_min,m_max,m_count,m_avg
  2  from ( select trunc(ts_cr) ts_cr,
  3                id,
  4                            name2,
  5                            name3,
  6                            sum(sec) m_dummy ,
  7                            min(sum(sec)) over(partition by name2,name3,trunc(ts_cr)) as m_min ,
  8                            max(sum(sec)) over(partition by name2,name3,trunc(ts_cr)) as m_max ,
  9                            round(avg(sum(sec)) over(partition by name2,name3,trunc(ts_cr)),5) as m_avg ,
 10                            count(sum(sec)) over(partition by name2,name3,trunc(ts_cr)) as m_count ,
 11                            sum(sum(sec)) over(partition by name2,name3,trunc(ts_cr)) as m_sum
 12               from tran_stats
 13                  group by name2, name3,trunc(ts_cr),id
 14        )n
 15  MINUS
 16  select ts_cr, name2, name3, m_sum, m_min,m_max,m_count,m_avg
 17  from (
 18  select trunc(ts_cr) ts_cr, name2, name3,
 19         count(*) m_count, min(sec) m_min, max(sec) m_max,
 20                    sum(sec) m_sum, avg(sec) m_avg
 21                            from tran_stats
 22                            group by trunc(ts_cr), name2, name3 )
 23  /
 
no rows selected
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> insert into tran_stats
  2  select 35947,'/CM01_PersonManagement','CREATE', .484  ,'15-JAN-04'
  3   from all_users where rownum <= 5;
 
5 rows created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select distinct ts_cr, name2, name3, m_sum, m_min,m_max,m_count,m_avg
  2  from ( select trunc(ts_cr) ts_cr,
  3                id,
  4                            name2,
  5                            name3,
  6                            sum(sec) m_dummy ,
  7                            min(sum(sec)) over(partition by name2,name3,trunc(ts_cr)) as m_min ,
  8                            max(sum(sec)) over(partition by name2,name3,trunc(ts_cr)) as m_max ,
  9                            round(avg(sum(sec)) over(partition by name2,name3,trunc(ts_cr)),5) as m_avg ,
 10                            count(sum(sec)) over(partition by name2,name3,trunc(ts_cr)) as m_count ,
 11                            sum(sum(sec)) over(partition by name2,name3,trunc(ts_cr)) as m_sum
 12               from tran_stats
 13                  group by name2, name3,trunc(ts_cr),id
 14        )n
 15  MINUS
 16  select ts_cr, name2, name3, m_sum, m_min,m_max,m_count,m_avg
 17  from (
 18  select trunc(ts_cr) ts_cr, name2, name3,
 19         count(*) m_count, min(sec) m_min, max(sec) m_max,
 20                    sum(sec) m_sum, avg(sec) m_avg
 21                            from tran_stats
 22                            group by trunc(ts_cr), name2, name3 )
 23  /
 
TS_CR     NAME2                   NAME3         M_SUM      M_MIN      M_MAX    M_COUNT      M_AVG
--------- ----------------------- -------- ---------- ---------- ---------- ---------- ----------
15-JAN-04 /CM01_PersonManagement  CREATE        2.935       .031      2.904          2     1.4675



add more data and it won't be the same. 
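The difference can be reproduced by hand. The sketch below models the two queries in plain Python for the one 15-JAN-04 / CM01_PersonManagement / CREATE group after the five extra rows are inserted; the helper names are made up for this illustration.

```python
from collections import defaultdict
from decimal import Decimal

# (id, sec) rows in the group: the two original rows plus five more for id 35947.
rows = [(35947, Decimal("0.484")), (35987, Decimal("0.031"))]
rows += [(35947, Decimal("0.484"))] * 5


def summarize(values):
    total = sum(values)
    return {"count": len(values), "min": min(values), "max": max(values),
            "sum": total, "avg": total / len(values)}


def plain_group_by(rows):
    """GROUP BY trunc(ts_cr), name2, name3: aggregate the raw rows."""
    return summarize([sec for _, sec in rows])


def per_id_then_group(rows):
    """GROUP BY ..., id first, then analytics over the per-id sums."""
    per_id = defaultdict(Decimal)  # Decimal() is 0
    for id_, sec in rows:
        per_id[id_] += sec
    return summarize(list(per_id.values()))


print(plain_group_by(rows))     # count 7, min 0.031, max 0.484, sum 2.935
print(per_id_then_group(rows))  # count 2, min 0.031, max 2.904, sum 2.935, avg 1.4675
```

Only the sum agrees. The other columns differ as soon as an id appears more than once in a group, which is the row the MINUS returned.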

3 stars OK   January 31, 2004 - 9am Central time zone
Reviewer: Siva from Bangalore,India
Dear Tom,
Can analytics be used for the following formats of the same query
 sql>select ename,nvl(ename,'Name is null') from emp
 sql>select ename,decode(ename,null,'Name is null',ename)
 from emp
If you know other ways,Please let me know
Bye!
 


Followup   January 31, 2004 - 10am Central time zone:

umm, why ?


 

5 stars with analytics   February 18, 2004 - 7am Central time zone
Reviewer: A reader 
with the following data

-- ------
1  val1_1
1  val1_2
1  val1_3
2  val2_1
2  val2_2
 
can i produce 

-- ------  --------------------
1  val1_1  val1_1,val1_2,val1_3
1  val1_2  val1_1,val1_2,val1_3 
1  val1_3  val1_1,val1_2,val1_3
2  val2_1  val2_1,val2_2
2  val2_2  val2_1,val2_2

with an  analytic that rocks
 


Followup   February 18, 2004 - 8pm Central time zone:

if

select max(count(*)) from t group by id

has a reasonable maximum -- yes, but it would be a tricky lag/lead thing.

I would probably join using stragg.  join the details to the aggregate using inline views. 
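The "aggregate, then join back to the details" idea can be sketched in plain Python as follows. This is only a model of the shape of the result, not the stragg function itself; the name with_concatenated_values is made up here, and the data uses the values shown in the desired output.

```python
from collections import defaultdict

# (id, val) detail rows.
details = [(1, "val1_1"), (1, "val1_2"), (1, "val1_3"), (2, "val2_1"), (2, "val2_2")]


def with_concatenated_values(details):
    """Attach to every detail row the comma-separated list of all values for its id."""
    grouped = defaultdict(list)
    for id_, val in details:
        grouped[id_].append(val)
    # The "aggregate" inline view: one concatenated string per id.
    aggregate = {id_: ",".join(sorted(vals)) for id_, vals in grouped.items()}
    # The join: each detail row picks up the string for its own id.
    return [(id_, val, aggregate[id_]) for id_, val in details]


for row in with_concatenated_values(details):
    print(row)
```

The values are sorted before joining so the string is deterministic; whether a SQL aggregate returns them in that order is a separate question.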

3 stars OK   March 1, 2004 - 9am Central time zone
Reviewer: Siddiq from UAE
Hi Tom,
What can be the business use cases of the analytic functions
1)cume_dist
2)percentile_disc
3)percentile_cont
Where can they be of immense use?
Bye! 


Followup   March 1, 2004 - 10am Central time zone:

they are just statistical functions for analysis.

2 and 3 are really variations on each other (disc=discrete, cont=continuous) and would be used to 
compute percentiles (like you might see on an SAT test report from back in high school).  
percentile_* can be used to find a median for example :)

cume_dist is a variation on that.  I'll cheat on an example, from the doc:

Analytic Example 

The following example calculates the salary percentile for each employee in the purchasing area. 
For example, 40% of clerks have salaries less than or equal to Himuro. 

SELECT job_id, last_name, salary, CUME_DIST() OVER (PARTITION BY job_id ORDER BY salary) AS 
cume_dist FROM employees WHERE job_id LIKE 'PU%'; 

JOB_ID     LAST_NAME                 SALARY     CUME_DIST 
---------- ------------------------- ---------- ---------- 
PU_CLERK   Colmenares                2500        .2 
PU_CLERK   Himuro                    2600        .4 
PU_CLERK   Tobias                    2800        .6 
PU_CLERK   Baida                     2900        .8 
PU_CLERK   Khoo                      3100         1 
PU_MAN     Raphaely                  11000        1 
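The CUME_DIST column is simply "rows with a value less than or equal to mine, divided by rows in the partition". A short Python sketch of that definition, using the PU_CLERK salaries from the output above (the function name cume_dist is just chosen to match):

```python
def cume_dist(salaries):
    """Fraction of rows whose salary is <= each row's salary (one partition)."""
    n = len(salaries)
    return [sum(1 for other in salaries if other <= s) / n for s in salaries]


clerks = {"Colmenares": 2500, "Himuro": 2600, "Tobias": 2800, "Baida": 2900, "Khoo": 3100}

for name, dist in zip(clerks, cume_dist(list(clerks.values()))):
    print(name, dist)
# Himuro prints 0.4: two of the five clerks earn 2600 or less.
```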

5 stars Stumped on Analytics   March 4, 2004 - 9am Central time zone
Reviewer: Dave Thompson from West Yorkshire, England.
Hi Tom,

I have the following two tables:

CREATE TABLE PAY_M
(
  PAY_ID  NUMBER,
  PAYMENT  NUMBER
)

--
--

CREATE TABLE PREM
(
  PREM_ID       NUMBER,
  PREM_PAYMENT  NUMBER
)

With the following data:

INSERT INTO PREM ( PREM_ID, PREM_PAYMENT ) VALUES ( 
1, 100); 
INSERT INTO PREM ( PREM_ID, PREM_PAYMENT ) VALUES ( 
2, 50); 
INSERT INTO PREM ( PREM_ID, PREM_PAYMENT ) VALUES ( 
3, 50); 
INSERT INTO PREM ( PREM_ID, PREM_PAYMENT ) VALUES ( 
4, 50); 
COMMIT;
INSERT INTO PAY_M ( PAY_ID, PAYMENT ) VALUES ( 
1, 50); 
INSERT INTO PAY_M ( PAY_ID, PAYMENT ) VALUES ( 
2, 25); 
INSERT INTO PAY_M ( PAY_ID, PAYMENT ) VALUES ( 
3, 50); 
INSERT INTO PAY_M ( PAY_ID, PAYMENT ) VALUES ( 
4, 50); 
COMMIT;

PAY_M contains payments made against the premiums in the table prem.

Payments:

    PAY_ID    PAYMENT
---------- ----------
         1         50
         2         25
         3         50
         4         50

Prem:

   PREM_ID PREM_PAYMENT
---------- ------------
         1          100
         2           50
         3           50
         4           50

We are trying to find which payment Ids paid each premium payment in Prem.  The payments are 
assigned sequentially to the premiums.

For example payments 1,2 & 3 pay off the £100 in premium 1 leaving £25.  Then the remaining payment 
from payment 3 & payment 4 pay off premium 2 leaving a balance of £25, and so on.

We are trying to create a query that will use the analytical functions to find all the payment IDs 
that pay off the associated premium ids.  We want to keep this SQL based as we need to Process 
about 30 million payments!

Thanks.

Great website, hope you enjoyed your recent visit to the UK. 


Followup   March 4, 2004 - 1pm Central time zone:

let me make sure I have this straight -- you want to 

o sum up the first 3 records in payments
o discover they are 125 which exceeds 100
o output the fact that prem_id 1 is paid for by pay_id 1..3
o carry forward 25 from 3, discover that leftover 3+4 = 75 pays for prem_id 2 
  with 25 extra

while I believe (not sure) that the 10g MODEL clause might be able to do this (if you can do it in 
a spreadsheet, we can use the MODEL clause to do it).....

I'm pretty certain that analytics cannot -- we would need to recursively use lag (eg: after finding 
that 1,2,3 pay off 1, we'd need to -- well, it's hard to explain...)

I cannot see analytics doing this -- future rows depend on functions of the analytics from past 
rows and that is just "not allowed".


I can see how to do this in a pipelined PLSQL function -- will that work for you? 
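To make the requirement concrete, here is a rough procedural sketch in plain Python of the walk described in the bullets above: payments are consumed in order and any leftover is carried into the next premium. It is not a pipelined PL/SQL function, only an illustration of the row-by-row logic such a function would need; the name allocate and the output shape are assumptions.

```python
def allocate(payments, premiums):
    """Apply payments to premiums in order; return {prem_id: [pay_ids used]}."""
    result = {prem_id: [] for prem_id, _ in premiums}
    pay_iter = iter(payments)
    pay_id, available = None, 0  # current payment and what is left of it
    for prem_id, due in premiums:
        while due > 0:
            if available == 0:
                nxt = next(pay_iter, None)
                if nxt is None:
                    return result  # payments exhausted, this premium is only partly paid
                pay_id, available = nxt
            used = min(available, due)
            result[prem_id].append(pay_id)
            available -= used  # leftover carries forward to the next premium
            due -= used
    return result


payments = [(1, 50), (2, 25), (3, 50), (4, 50)]     # (pay_id, payment)
premiums = [(1, 100), (2, 50), (3, 50), (4, 50)]    # (prem_id, prem_payment)

print(allocate(payments, premiums))
# {1: [1, 2, 3], 2: [3, 4], 3: [4], 4: []}
```

Premium 1 is covered by payments 1 to 3, the remaining 25 from payment 3 plus part of payment 4 covers premium 2, and the last 25 only partly covers premium 3.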

3 stars Oops - Error in previous post   March 4, 2004 - 10am Central time zone
Reviewer: Dave Thompson from West Yorkshire, England
Tom,

Sorry, ignore the above tables as they are missing the joining column:

CREATE TABLE PAY_M
(
  PREM_ID  NUMBER,
  PAY_ID   NUMBER,
  PAYMENT  NUMBER
)

INSERT INTO PAY_M ( PREM_ID, PAY_ID, PAYMENT ) VALUES ( 
1, 1, 50); 
INSERT INTO PAY_M ( PREM_ID, PAY_ID, PAYMENT ) VALUES ( 
1, 2, 25); 
INSERT INTO PAY_M ( PREM_ID, PAY_ID, PAYMENT ) VALUES ( 
1, 3, 50); 
INSERT INTO PAY_M ( PREM_ID, PAY_ID, PAYMENT ) VALUES ( 
1, 4, 50); 
COMMIT;

CREATE TABLE PREM
(
  PREM_ID       NUMBER,
  PAY_ID        NUMBER,
  PREM_PAYMENT  NUMBER
)

INSERT INTO PREM ( PREM_ID, PAY_ID, PREM_PAYMENT ) VALUES ( 
1, 1, 100); 
INSERT INTO PREM ( PREM_ID, PAY_ID, PREM_PAYMENT ) VALUES ( 
1, 2, 50); 
INSERT INTO PREM ( PREM_ID, PAY_ID, PREM_PAYMENT ) VALUES ( 
1, 3, 50); 
INSERT INTO PREM ( PREM_ID, PAY_ID, PREM_PAYMENT ) VALUES ( 
1, 4, 50); 
COMMIT;


SQL> l
  1  SELECT *
  2* FROM   PAY_M
SQL> /

   PREM_ID     PAY_ID    PAYMENT
---------- ---------- ----------
         1          1         50
         1          2         25
         1          3         50
         1          4         50

SQL> select *
  2  from prem;

   PREM_ID     PAY_ID PREM_PAYMENT
---------- ---------- ------------
         1          1          100
         1          2           50
         1          3           50
         1          4           50

 


4 stars Thanks.....   March 5, 2004 - 4am Central time zone
Reviewer: Dave Thompson from West Yorkshire, England.
Tom,

Thanks for your prompt response.

I am familiar with Pipeline functions.

I was, however, hoping we could do this as a set-based operation because of the volume of data 
involved.

Thanks for your time. 


3 stars analytics book   March 5, 2004 - 5am Central time zone
Reviewer: Ron Chennells from UK
Just another vote and pre order for the analytics book 


3 stars OK   March 19, 2004 - 12am Central time zone
Reviewer: Gerhard from Dusseldorf,Germany
Dear Tom,
 I used the following query to find the difference of salaries between employees.

SQL> select ename,sal,sal-lag(sal) over(order by sal) as diff_sal from emp;

ENAME             SAL   DIFF_SAL                                                
---------- ---------- ----------                                                
SMITH             800                                                           
JAMES             950        150                                                
ADAMS            1100        150                                                
WARD             1250        150                                                
MARTIN           1250          0                                                
MILLER           1300         50                                                
TURNER           1500        200                                                
ALLEN            1600        100                                                
CLARK            2450        850                                                
BLAKE            2850        400                                                
JONES            2975        125
SCOTT            3000         25                                                
FORD             3000          0                                                
KING             5000       2000                                                

14 rows selected.

My Question is:
" What is the difference between King's sal with other 
employees?".Could you please help with the query?
Bye! 


Followup   March 19, 2004 - 8am Central time zone:

scott@ORA9IR2> select ename,sal,sal-lag(sal) over(order by sal) as diff_sal ,
  2         sal-king_sal king_sal_diff
  3    from (select sal king_sal from emp where ename = 'KING'),
  4         emp
  5  /
 
ENAME             SAL   DIFF_SAL KING_SAL_DIFF
---------- ---------- ---------- -------------
SMITH             800                    -4200
JAMES             950        150         -4050
ADAMS            1100        150         -3900
WARD             1250        150         -3750
MARTIN           1250          0         -3750
MILLER           1300         50         -3700
TURNER           1500        200         -3500
ALLEN            1600        100         -3400
CLARK            2450        850         -2550
BLAKE            2850        400         -2150
JONES            2975        125         -2025
SCOTT            3000         25         -2000
FORD             3000          0         -2000
KING             5000       2000             0
 
14 rows selected.
 

3 stars Will this be faster?   March 19, 2004 - 4pm Central time zone
Reviewer: Venkat from Detroit
select ename, sal, 
       sal-lag(sal) over(order by sal) as diff_sal,
       sal - max(case when ename='KING' then sal
                 else null end) over () king_sal_diff
from emp
 


Followup   March 20, 2004 - 9am Central time zone:

when you benchmarked it and tested it to scale, what did you see?  it would be interesting no? 
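
One way to act on that question is to run both forms under SQL*Plus timing and autotrace and compare the elapsed time and logical I/O. This is only a minimal sketch: it assumes the standard SCOTT.EMP demo table and that the session is allowed to use autotrace (typically via the PLUSTRACE role). No results are shown, since they depend on version and data volume, and 14 rows will tell you very little, so the test table would need to be scaled up first.

```sql
set timing on
set autotrace traceonly statistics

-- Form 1: join to a one-row inline view holding KING's salary
select ename, sal,
       sal - lag(sal) over (order by sal) as diff_sal,
       sal - king_sal as king_sal_diff
  from (select sal king_sal from emp where ename = 'KING'),
       emp;

-- Form 2: pick KING's salary with an analytic MAX over the whole result set
select ename, sal,
       sal - lag(sal) over (order by sal) as diff_sal,
       sal - max(case when ename = 'KING' then sal end) over () as king_sal_diff
  from emp;

set autotrace off
set timing off
```

The figures worth comparing are the consistent gets and sorts reported for each statement, not just the wall-clock time.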

4 stars lead/lag on different dataset   May 3, 2004 - 9pm Central time zone
Reviewer: Stalin from CA, USA
Hi Tom, 

I have a similar requirement, but I'm not sure how to use lead or lag to refer to a different dataset.

Eg. logs table has both login and logout information and they are identified by action column. 
There could be different login/logout modes so records that have action in (1,2) and (3,4,5,6,7) 
values are login and logout records respectively. Now i need to find signon and signout times and 
also session duration in mins.

here is some sample data of logs table :

LOG_ID LOG_CREATION_DATE       USER_ID    SERVICE     ACTION 
---------- -------------------  ---------- ---------- ---------- 
         1 04/29/2004 10:48:36           3          5          2 
         3 04/29/2004 10:53:44           3          5          3 
         5 04/29/2004 11:11:35           3          5          1 
      1003 05/03/2004 15:18:53           3          5          5 
      1004 05/03/2004 15:19:50           8          5          1 

here is a query i came up with (not exactly what i want) :

select log_id signon_id,
       lead(log_id, 1) over (partition by account_id, user_id, mac order by log_id) signoff_id,
       user_id,
       log_creation_date signon_date,
       lead(log_creation_date, 1) over (partition by account_id, user_id, mac order by log_creation_date) signoff_date,
       nvl(round(((lead(log_creation_date, 1)
           over (partition by account_id, user_id order by log_creation_date)-log_creation_date)*1440), 2), 0)  Usage_Mins
from   logs
where  account_id = 'Robert'
and    service = 5
order  by user_id

desired output :

 SIGNON_ID SIGNOFF_ID     USER_ID SIGNON_DATE         SIGNOFF_DATE        USAGE_MINS
---------- ----------  ---------- ------------------- ------------------- ----------
         1          3           3 04/29/2004 10:48:36 04/29/2004 10:53:44       5.13
         5       1003          3 04/29/2004 11:11:35 05/03/2004 15:18:53     6007.3
      1004                      8 05/03/2004 15:19:50                              0

Thanks in Advance,
Stalin
 


Followup   May 4, 2004 - 7am Central time zone:

maybe if you supply simple create table and insert ... values ... statements for me.... this stuff 
would go faster.

Your query references columns that are not in the example as well. 

3 stars Create table scripts   May 4, 2004 - 1pm Central time zone
Reviewer: Stalin from CA, USA
Sorry for not giving this info in the first place.

here goes the scripts....

create table logs (log_id number, log_creation_date date, account_id varchar2(25), user_id number,
service number, action number, mac varchar2(50))
/

insert into logs values (1, to_date('04/29/2004 10:48:36', 'MM/DD/YYYY HH24:MI:SS'), 'Robert', 3, 5, 2, '00-00-00-00')
/
insert into logs values (3, to_date('04/29/2004 10:53:44', 'MM/DD/YYYY HH24:MI:SS'), 'Robert', 3, 5, 3, '00-00-00-00')
/
insert into logs values (5, to_date('04/29/2004 11:11:35', 'MM/DD/YYYY HH24:MI:SS'), 'Robert', 3, 5, 1, '00-00-00-00')
/
insert into logs values (1003, to_date('05/03/2004 15:18:53', 'MM/DD/YYYY HH24:MI:SS'), 'Robert', 3, 5, 5, '00-00-00-00')
/
insert into logs values (1004, to_date('05/03/2004 15:19:50', 'MM/DD/YYYY HH24:MI:SS'), 'Robert', 8, 5, 1, '00-00-00-00')
/

The reason for including mac in the partition group is that users can log in via multiple PCs 
without logging out, hence I grouped on account_id, user_id and mac.

Thanks,
Stalin 


Followup   May 4, 2004 - 2pm Central time zone:

ops$tkyte@ORA9IR2> select a.* , round( (signoff_date-signon_date) * 24 * 60, 2 ) minutes
  2    from (
  3  select log_id,
  4         case when action in (1,2) and lead(action) over (partition by account_id,user_id,mac order by log_creation_date) in (3,4,5,6,7)
  5              then lead(log_id) over (partition by account_id, user_id, mac order by log_creation_date)
  6          end signoff_id,
  7         user_id,
  8         log_creation_date signon_date,
  9         case when action in (1,2) and lead(action) over (partition by account_id,user_id,mac order by log_creation_date) in (3,4,5,6,7)
 10              then lead(log_creation_date) over (partition by account_id, user_id, mac order by log_creation_date)
 11          end signoff_date,
 12         action
 13  from   logs
 14  where  account_id = 'Robert'
 15  and    service = 5
 16  order  by user_id
 17         ) a
 18   where action in (1,2)
 19  /
 
    LOG_ID SIGNOFF_ID    USER_ID SIGNON_DATE         SIGNOFF_DATE            ACTION    MINUTES
---------- ---------- ---------- ------------------- ------------------- ---------- ----------
         1          3          3 04/29/2004 10:48:36 04/29/2004 10:53:44          2       5.13
         5       1003          3 04/29/2004 11:11:35 05/03/2004 15:18:53          1     6007.3
      1004                     8 05/03/2004 15:19:50                              1
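
The same idea can be written so that each LEAD is computed once in an inline view and the sign-off test is applied afterwards. This is only a sketch of an equivalent formulation, not the query as posted; the aliases next_action, next_log_id and next_date are made up for illustration. Against the five sample rows it should return the same three sign-on rows.

```sql
select log_id signon_id,
       case when next_action in (3,4,5,6,7) then next_log_id end signoff_id,
       user_id,
       log_creation_date signon_date,
       case when next_action in (3,4,5,6,7) then next_date end signoff_date,
       -- stays NULL when the next row is not a logout (or there is no next row)
       round( (case when next_action in (3,4,5,6,7) then next_date end
               - log_creation_date) * 24 * 60, 2 ) minutes
  from (
select l.*,
       lead(action)            over (partition by account_id, user_id, mac order by log_creation_date) next_action,
       lead(log_id)            over (partition by account_id, user_id, mac order by log_creation_date) next_log_id,
       lead(log_creation_date) over (partition by account_id, user_id, mac order by log_creation_date) next_date
  from logs l
 where account_id = 'Robert'
   and service = 5
       )
 where action in (1,2)
 order by user_id, log_creation_date;
```

Filtering on action in (1,2) has to happen outside the inline view; if it were applied before the LEAD calls, the logout rows would never be visible as "next" rows.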
 

5 stars Excellent   May 4, 2004 - 3pm Central time zone
Reviewer: Stalin from CA, USA
This is exactly what i'm looking for. 

Thanks so much! 


4 stars Help On SQL   May 4, 2004 - 8pm Central time zone
Reviewer: VKOUL from Lacey, WA
I want to substitute the non null value of a column to its null column. e.g.

If I have records like the following

year  month  column_value
----- ------ --------------------
2002   06    55
2002   06    57
2002   07    NULL
2002   08    NULL
2002   09    NULL
2002   10    100
2002   11    101

I want the results as below

year  month  column_value
----- ------ --------------------
2002   06    55
2002   06    57
2002   07    57                      ------> Repeated
2002   08    57                      ------> Repeated
2002   09    57                      ------> Repeated
2002   10    100
2002   11    101
 


Followup   May 4, 2004 - 9pm Central time zone:

create table, 
insert into table

much appreciated......... (so i don't spend days of my life making create tables and insert into 
statements.  I've added this request to all pages where you can input stuff and I'll just be asking 
for it from now on in......  Not picking on you, just reminding everyone that i need a script like 
I provide.....)


but..... asked and answered:

http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:10286792840956
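
For readers who cannot follow the link, here is one common way to carry the last non-null value forward. It is a sketch under several assumptions and not necessarily the approach used in the linked answer: the table name t and the column names yr, mon and val are hypothetical, ties within a month are broken by the value itself purely to match the sample rows, and the IGNORE NULLS option needs 10g or later (it is not available in 9.2).

```sql
-- hypothetical table: t (yr number, mon number, val number)
select yr,
       mon,
       -- last non-null val seen so far in (yr, mon, val) order
       last_value(val ignore nulls)
         over (order by yr, mon, val) filled_val
  from t
 order by yr, mon, val;
```

With an ORDER BY in the analytic clause the default window runs from the start of the data to the current row, so each NULL row picks up the most recent non-null value before it.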



 

4 stars Help On SQL   May 4, 2004 - 11pm Central time zone
Reviewer: VKoul 
Beautiful !!!

I'll keep in mind "create table etc."

Thanks

VKoul 


5 stars analytic q   May 11, 2004 - 6pm Central time zone
Reviewer: A reader 
Tom
Please look at the following schema and data.
---------
spool schema
set echo on
drop table host_instances;
drop table rac_instances;
drop table instance_tablespaces;

create table host_instances
(
  host_name varchar2(50),
  instance_name varchar2(50)
);

create table rac_instances
(
  rac_name varchar2(50),
  instance_name varchar2(50)
);

create table instance_tablespaces
(
  instance_name varchar2(50),
  tablespace_name varchar2(50),
  tablespace_size number
);

-- host to instance mapping data
insert into host_instances values ( 'h1', 'i1' );
insert into host_instances values ( 'h2', 'i2' );
insert into host_instances values ( 'h3', 'i3' );
insert into host_instances values ( 'h4', 'i4' );
insert into host_instances values ( 'h5', 'i5' );

-- rac to instance mapping data

insert into rac_instances values ( 'rac1', 'i1' );
insert into rac_instances values ( 'rac1', 'i2' );
insert into rac_instances values ( 'rac2', 'i3' );
insert into rac_instances values ( 'rac2', 'i4' );

--- instance to tablespace mapping data
insert into instance_tablespaces values( 'i1', 't11', 100 );
insert into instance_tablespaces values( 'i1', 't12', 200 );
insert into instance_tablespaces values( 'i2', 't11', 100 );
insert into instance_tablespaces values( 'i2', 't12', 200 );
insert into instance_tablespaces values( 'i3', 't31', 500 );
insert into instance_tablespaces values( 'i3', 't32', 300 );
insert into instance_tablespaces values( 'i4', 't31', 500 );
insert into instance_tablespaces values( 'i4', 't32', 300 );
insert into instance_tablespaces values( 'i5', 't51', 400 );
commit;
---------

What I need is to sum up all tablespaces of all instances
for a list of hosts. However, if two hosts in the list
belong to a RAC then I should only pick one of the
hosts (I can pick any one of them.)
e.g. in the above data I should only pick i1 or i2 not
both since they both belong to the same RAC 'rac1'.

Following is the select I came up with for the above data.
Let me know if you have any comments on it.
Any other alternative solutions you can think of would
also be educating to me. I have not benchmarked this
select yet. The number of hosts could reach up to 2000
approximately. On an average we can assume each will have
one instance - some of these will be RACs.

Thank you!
-----------
scott@ora10g> set echo on
scott@ora10g> column host_name format a10
scott@ora10g> column instance_name format a10
scott@ora10g> column rac_name format a10
scott@ora10g> column row_number format 999
scott@ora10g> 
scott@ora10g> select a.instance_name, sum( tablespace_size )
  2  from
  3  (
  4    select instance_name
  5    from
  6    (
  7       select host_name, instance_name, rac_name,
  8         row_number() over
  9           (
 10             partition by rac_name
 11             order by rac_name, instance_name
 12           ) row_number
 13       from
 14       (
 15         select hi.host_name, hi.instance_name, ri.rac_name
 16         from host_instances hi, rac_instances ri
 17         where hi.instance_name = ri.instance_name(+)
 18       )
 19    )
 20    where row_number <= 1
 21  ) a, instance_tablespaces e
 22  where a.instance_name = e.instance_name
 23  group by a.instance_name;

i1                          300
i3                          800
i5                          400


---

Also do you prefer the .sql file (as above) or
the spooled output of schema.sql (i.e. schema.lst.) 
The above is more convenient to reproduce - but the spooled output makes for better reading in some 
cases.


 


Followup   May 11, 2004 - 9pm Central time zone:

I like the cut and paste from sqlplus truth be told.

sure, I have to do two vi commands and a couple of deletes to fix it up but.... I'm fairly certain 
that the poster *actually ran the commands successfully!* which is most relevant to me....

Besides, I do it to you ;)




ops$tkyte@ORA9IR2> select *
  2    from (
  3  select h.host_name, h.instance_name, r.rac_name, sum(t.tablespace_size),
  4         row_number() over (partition by r.rac_name order by h.host_name ) rn
  5    from host_instances h,
  6             rac_instances r,
  7             instance_tablespaces t
  8   where h.instance_name = r.instance_name(+)
  9     and h.instance_name = t.instance_name
 10   group by h.host_name, h.instance_name, r.rac_name
 11         )
 12   where rn = 1
 13  /
 
HO IN RAC_N SUM(T.TABLESPACE_SIZE)         RN
-- -- ----- ---------------------- ----------
h1 i1 rac1                     300          1
h3 i3 rac2                     800          1
h5 i5                          400          1


is the first thing that popped into my head.

with just a couple hundred rows -- any of them will perform better than good enough. 
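
One caveat that the sample data does not exercise: every host that is not in a RAC has a NULL rac_name, so all of them land in the same partition and only one would survive the rn = 1 filter. The sample has a single non-RAC host (h5), which is why the output looks right. A possible variation, offered as a sketch, is to fall back to the instance name when there is no RAC; this assumes no rac_name ever equals an instance_name, and the alias total_size is made up for illustration.

```sql
select *
  from (
select h.host_name, h.instance_name, r.rac_name,
       sum(t.tablespace_size) total_size,
       -- a host outside any RAC forms its own group of one
       row_number() over (partition by nvl(r.rac_name, h.instance_name)
                          order by h.host_name) rn
  from host_instances h,
       rac_instances r,
       instance_tablespaces t
 where h.instance_name = r.instance_name(+)
   and h.instance_name = t.instance_name
 group by h.host_name, h.instance_name, r.rac_name
       )
 where rn = 1;
```

On the posted data this should still return h1, h3 and h5; the difference only shows up once a second non-RAC host is added.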

5 stars thanx!   May 11, 2004 - 9pm Central time zone
Reviewer: A reader 
"I like the cut and paste from sqlplus truth be told."
Actually I was going to post that only - but your
 example at the point of posting led me to believe
that you want a straight sql - may be you wanna 
fix that (not that many people seem to care anyways!:))

Thanx for the sql - it looks good and a tad simpler
than the one I wrote...
 


5 stars How to compute this running total (sort of...)   May 18, 2004 - 11am Central time zone
Reviewer: Kishan from USA
create table investment (
 investment_id number,
 asset_id number,
 agreement_id number,
 constraint pk_i primary key (investment_id)
)
/
create table period (
 period_id number,
 business_domain  varchar2(10),
 status_code      varchar2(10),
 constraint pk_p primary key (period_id)
)
/
create table entry (
 entry_id number,
 period_id number,
 investment_id number,
 constraint pk_e primary key(entry_id),
 constraint fk_e_period     foreign key(period_id) references period(period_id),
 constraint fk_e_investment foreign key (investment_id) references investment(investment_id)
)
/
create table entry_detail(
 entry_id number,
 account_type varchar2(10),
 amount      number,
 constraint pk_ed primary key(entry_id, account_type),
 constraint fk_ed_entry foreign key(entry_id) references entry(entry_id)
)
/
insert into period (period_id, business_domain, status_code)
SELECT  rownum     AS period_id,
        'BDG'      AS business_domain,
        '2'        AS status_code
from all_objects where rownum <= 5
/

insert into investment(investment_id, asset_id, agreement_id)
select rownum+10   AS investment_id,
       rownum+100  AS asset_id,
       rownum+1000 AS agreement_id
from all_objects where rownum <=5
/
insert into entry(entry_id, period_id, investment_id) values (1, 1, 11)
/
insert into entry(entry_id, period_id, investment_id) values (2, 2, 11)
/
insert into entry(entry_id, period_id, investment_id) values (3, 3, 11)
/
insert into entry(entry_id, period_id, investment_id) values (4, 3, 13)
/
insert into entry(entry_id, period_id, investment_id) values (5, 4, 13)
/
insert into entry(entry_id, period_id, investment_id) values (6, 4, 14)
/
insert into entry(entry_id, period_id, investment_id) values (7, 5, 14)
/

insert into entry_detail(entry_id, account_type, amount) values(1, 'AC1', 1000 )
/
insert into entry_detail(entry_id, account_type, amount) values(1, 'AC2', -200 )
/
insert into entry_detail(entry_id, account_type, amount) values(1, 'AC3', 300 )
/
insert into entry_detail(entry_id, account_type, amount) values(2, 'AC1', 200 )
/
insert into entry_detail(entry_id, account_type, amount) values(2, 'AC4', -1000 )
/
insert into entry_detail(entry_id, account_type, amount) values(2, 'AC2', -500 )
/
insert into entry_detail(entry_id, account_type, amount) values(3, 'AC2', 2200 )
/
insert into entry_detail(entry_id, account_type, amount) values(3, 'AC1', 200 )
/
insert into entry_detail(entry_id, account_type, amount) values(4, 'AC4', -1000 )
/
insert into entry_detail(entry_id, account_type, amount) values(4, 'AC2', -500 )
/
insert into entry_detail(entry_id, account_type, amount) values(5, 'AC2', 2200 )
/
insert into entry_detail(entry_id, account_type, amount) values(6, 'AC1', 200 )
/
insert into entry_detail(entry_id, account_type, amount) values(6, 'AC4', -1000 )
/
insert into entry_detail(entry_id, account_type, amount) values(6, 'AC2', -500 )
/
insert into entry_detail(entry_id, account_type, amount) values(7, 'AC1', 2200 )
/
insert into entry_detail(entry_id, account_type, amount) values(7, 'AC3', 500 )
/
insert into entry_detail(entry_id, account_type, amount) values(7, 'AC4', 1200 )
/

scott@LDB.US.ORACLE.COM> select * from period;

 PERIOD_ID BUSINESS_D STATUS_COD
---------- ---------- ----------
         1 BDG        2
         2 BDG        2
         3 BDG        2
         4 BDG        2
         5 BDG        2

scott@LDB.US.ORACLE.COM> select * from investment;

INVESTMENT_ID   ASSET_ID AGREEMENT_ID
------------- ---------- ------------
           11        101         1001
           12        102         1002
           13        103         1003
           14        104         1004
           15        105         1005

scott@LDB.US.ORACLE.COM> select * from entry;

  ENTRY_ID  PERIOD_ID INVESTMENT_ID
---------- ---------- -------------
         1          1            11
         2          2            11
         3          3            11
         4          3            13
         5          4            13
         6          4            14
         7          5            14

7 rows selected.

scott@LDB.US.ORACLE.COM> select * from entry_detail;

  ENTRY_ID ACCOUNT_TY     AMOUNT
---------- ---------- ----------
         1 AC1              1000
         1 AC2              -200
         1 AC3               300
         2 AC1               200
         2 AC4             -1000
         2 AC2              -500
         3 AC2              2200
         3 AC1               200
         4 AC4             -1000
         4 AC2              -500
         5 AC2              2200
         6 AC1               200
         6 AC4             -1000
         6 AC2              -500
         7 AC1              2200
         7 AC3               500
         7 AC4              1200

17 rows selected.


The resultant view needed is given below.

To give an example from the result below, the first entry for investment_id 14
is from period 4. The account types entered on period 4 are AC1, AC4, AC2. We
need these three account types in all subsequent periods. Also, on period 5 a
new account type AC3 is added. So, if there is another period, say period_id 6, we need
information for AC1, AC2, AC3, AC4 (that's 4 account types). If there's no entry
for any of these account_types in a subsequent period, the amount_for_period for that
period is considered to be 0.00 and the balance will be sum(amount_for_period)
up to and including that period.


PERIOD_ID INVESTMENT_ID ACCOUNT_TYPE AMOUNT_FOR_PERIOD BALANCE_TILL_PERIOD
--------- ------------- ------------ ----------------- -------------------
1                    11       AC1                 1000              1000
1                    11       AC2                 -200              -200
1                    11       AC3                  300               300

2                    11       AC1                  200              1200
2                    11       AC2                 -500              -700
2                    11       AC3                    0               300
2                    11       AC4                -1000             -1000

3                    11       AC1                  200              1400
3                    11       AC2                  200              -500
3                    11       AC3                    0               300
3                    11       AC4                    0              1000

4                    11       AC1                    0              1400
4                    11       AC2                    0              -500
4                    11       AC3                    0               300
4                    11       AC4                    0              1000

5                    11       AC1                    0              1400
5                    11       AC2                    0              -500
5                    11       AC3                    0               300
5                    11       AC4                    0              1000

3                    13       AC4                -1000             -1000
3                    13       AC2                 -500              -500

4                    13       AC4                    0             -1000
4                    13       AC2                 -500             -1000

5                    13       AC4                    0             -1000
5                    13       AC4                    0             -1000

4                    14       AC1                  200               200
4                    14       AC4                -1000             -1000
4                    14       AC2                 -500              -500

5                    14       AC1                 2200              2400
5                    14       AC3                  500               500
5                    14       AC4                 1200               200
5                    14       AC2                    0              -500

The blank lines in between are just for clarity. As always, grateful for all your efforts.

Regards,
Kishan.
 


Followup   May 18, 2004 - 6pm Central time zone:

so, what does your first try look like :)  at least get the join written up for the details - maybe 
the running total will be obvious from that. 

5 stars This is how far I went...and no further   May 19, 2004 - 10am Central time zone
Reviewer: Kishan from USA
select distinct period_id,
                investment_id,
                account_type,
                amount_for_period,
                balance_till_period
from (  select  period.period_id,
                entry.investment_id,
                entry_detail.account_type,
                (case when entry.period_id = period.period_id then entry_detail.amount else 0 end) amount_for_period,
                sum(amount) over(partition by period.period_id, investment_id, account_type) balance_till_period
        from    period left outer join (entry join entry_detail on (entry.entry_id = entry_detail.entry_id))
                on (entry.period_id <= period.period_id))
order by investment_id

The result looks as below:

 PERIOD_ID INVESTMENT_ID ACCOUNT_TY AMOUNT_FOR_PERIOD BALANCE_TILL_PERIOD
---------- ------------- ---------- ----------------- -------------------
         1            11 AC1                     1000                1000
         1            11 AC2                     -200                -200
         1            11 AC3                      300                 300

         2            11 AC1                        0                1200
         2            11 AC1                      200                1200
         2            11 AC2                     -500                -700
         2            11 AC2                        0                -700
         2            11 AC3                        0                 300
         2            11 AC4                    -1000               -1000

         3            11 AC1                        0                1400
         3            11 AC1                      200                1400
         3            11 AC2                        0                1500
         3            11 AC2                     2200                1500
         3            11 AC3                        0                 300
         3            11 AC4                        0               -1000

         4            11 AC1                        0                1400
         4            11 AC2                        0                1500
         4            11 AC3                        0                 300
         4            11 AC4                        0               -1000

         5            11 AC1                        0                1400
         5            11 AC2                        0                1500
         5            11 AC3                        0                 300
         5            11 AC4                        0               -1000

         3            13 AC2                     -500                -500
         3            13 AC4                    -1000               -1000

         4            13 AC2                        0                1700
         4            13 AC2                     2200                1700
         4            13 AC4                        0               -1000

         5            13 AC2                        0                1700
         5            13 AC4                        0               -1000

         4            14 AC1                      200                 200
         4            14 AC2                     -500                -500
         4            14 AC4                    -1000               -1000

         5            14 AC1                        0                2400
         5            14 AC1                     2200                2400
         5            14 AC2                        0                -500
         5            14 AC3                      500                 500
         5            14 AC4                        0                 200
         5            14 AC4                     1200                 200

First, I am sorry that my originally constructed result (by hand..;) is missing a couple of rows. 
However, other than that, I am unable to remove the redundant rows that show up for a 
particular investment and account_type within a period, as the logic beats me. 

Basically, I need to remove rows where the amount_for_period is 0 for an account_type only if it's a 
redundant row for that set. That is, the first rows of period_id 2 and 3 are redundant but the rows 
for period 4 are not redundant.
 
Could you help me out?

Regards,
Kishan. 


Followup   May 19, 2004 - 11am Central time zone:

are we missing some more order bys?  I mean -- what if:

         3            11 AC1                        0                1400
         3            11 AC1                      200                1400
         3            11 AC2                        0                1500
         3            11 AC2                     2200                1500
         3            11 AC3                        0                 300
         3            11 AC4                        0               -1000

was really:


         3            11 AC1                      200                1400
         3            11 AC2                        0                1500
         3            11 AC2                     2200                1500
         3            11 AC3                        0                 300
         3            11 AC4                        0               -1000
         3            11 AC1                        0                1400

would that still be redundant?  am I missing something here? 

4 stars Yes...they are redundant   May 19, 2004 - 12pm Central time zone
Reviewer: A reader 
Tom:
Yes, for that particular set, those rows are redundant, no matter what the order is. 


Regards,
Kishan. 


Followup   May 19, 2004 - 2pm Central time zone:

ok, so what is the "key" of that result set?  what can we partition the result set by.

my idea will be to use your query in an inline view and analytics on that to weed out what you 
want. 

4 stars   May 19, 2004 - 3pm Central time zone
Reviewer: Kishan from USA
The key would be period_id, investment_id and account_type. Basically, what the result represents is 
the amount and the balance-to-date for a particular account_type of an investment_id for a period. 

Eg: Period 1->Investment 1->Account_Type AC1->Amount=1000->Balance=1000

If there's no activity on that investment and account_type for the next period, say Period 2, the 
amount will be 0 for that period, and the balance will be previous period's balance. 

Period 1->Investment 1->Account_Type AC1->Amount=1000->Balance=1000
Period 2->Investment 1->Account_Type AC1->Amount=0->Balance = 1000

But, if there's an activity on that account_type for that investment, then the amount will be the 
amount for that period and balance will be the sum of previous balance and current amount. Say for 
Period 2, the amount is 500, then

Period 1->Investment 1->Account_Type AC1->Amount=1000-> Balance=1000
Period 2->Investment 1->Account_Type AC1->Amount=500-> Balance=1500

And if there's a new account type entry, say AC2 and amount, say 2000 created for period 2, then 
the result set will be

Period 1->Investment 1->Account_Type AC1->Amount=1000->Balance=1000
Period 2->Investment 1->Account_Type AC1->Amount=500->Balance=1500
Period 2->Investment 1->Account_Type AC2->Amount=2000->Balance=2000

There may be many investments per period and many account_types per investment. Hope I am clear....

Regards,
Kishan.
 


Followup   May 19, 2004 - 5pm Central time zone:

so... if you have:

 PERIOD_ID INVESTMENT_ID ACCOUNT_TY AMOUNT_FOR_PERIOD BALANCE_TILL_PERIOD
---------- ------------- ---------- ----------------- -------------------
         1            11 AC1                     1000                1000
         1            11 AC2                     -200                -200
         1            11 AC3                      300                 300

         2            11 AC1                        0                1200
         2            11 AC1                      200                1200
         2            11 AC2                     -500                -700
         2            11 AC2                        0                -700
         2            11 AC3                        0                 300
         2            11 AC4                    -1000               -1000

you see though, why isn't the 4th line here "redundant" then? 

4 stars But it is redundant..   May 19, 2004 - 11pm Central time zone
Reviewer: Kishan from USA
Tom, I am assuming the 4th line you mention is  2->11->AC2->0->-700. Yes, it is redundant. 

We need amount and balance for every period_id, investment_id and account_type. One line, per 
period_id, investment_id and account_type, anything more, is redundant.  

Issue is, there may not be entries for a specific account_type of an investment for a particular 
period. In such cases, we need to assume amount for such periods are 0 and compute the balances 
accordingly.

Regards,
Kishan 
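
The requirement described above (one row per period, investment and account type, with a zero amount and a carried balance when a period has no activity) can be sketched in Python as follows. This is only an illustration of the logic, not a SQL solution; the function name `running_balances` and the extra period 3 are assumptions added for the example.

```python
from collections import defaultdict

def running_balances(entries, periods):
    """entries: (period, investment, account_type, amount) tuples.
    Returns one (period, investment, account_type, amount, balance)
    row per period for every investment/account_type pair seen so far."""
    # Sum the activity per (period, investment, account_type).
    activity = defaultdict(int)
    for period, investment, account_type, amount in entries:
        activity[(period, investment, account_type)] += amount

    balances = {}   # (investment, account_type) -> balance so far
    rows = []
    for period in periods:
        # A pair starts being reported in the first period it has activity.
        for p, investment, account_type in list(activity):
            if p == period:
                balances.setdefault((investment, account_type), 0)
        for key in sorted(balances):
            amount = activity.get((period,) + key, 0)   # no activity -> 0
            balances[key] += amount
            rows.append((period,) + key + (amount, balances[key]))
    return rows

# Sample data from the post, plus a hypothetical period 3 with no activity.
entries = [
    (1, 1, "AC1", 1000),
    (2, 1, "AC1", 500),
    (2, 1, "AC2", 2000),
]
result = running_balances(entries, periods=[1, 2, 3])
for row in result:
    print(row)
```

For period 3 the sketch reports an amount of 0 for both account types and repeats the period 2 balances, which is the "assume 0 and carry the balance" rule stated above.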


Followup   May 20, 2004 - 10am Central time zone:

so, if you partition by 

 PERIOD_ID INVESTMENT_ID ACCOUNT_TY BALANCE_TILL_PERIOD

order by 
AMOUNT_FOR_PERIOD 

select a.*, lead(amount_for_period) over (partition by .... order by ... ) nxt
  from (YOUR_QUERY)


you can then

select *
  from (that_query)
 where nxt is NULL or (nxt is not null and amount_for_period <> 0)

if nxt is null -- last row in the partition, keep it.
if nxt is not null AND we are zero -- remove it.


 

4 stars Almost there?   May 20, 2004 - 12pm Central time zone
Reviewer: Dave Thompson from UK
Hi Tom,

We have the following table of data:

CREATE TABLE DEDUP_TEST
(
  ID          NUMBER,
  COLUMN_A    VARCHAR2(10 BYTE),
  COLUMN_B    VARCHAR2(10 BYTE),
  COLUMN_C    VARCHAR2(10 BYTE),
  START_DATE  DATE,
  END_DATE    DATE
)

With:

INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES ( 
1, 'A', 'B', 'C',  TO_Date( '10/01/1999 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'),  TO_Date( 
'10/01/2000 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM')); 
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES ( 
1, 'D', 'B', 'C',  TO_Date( '10/01/2001 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'),  TO_Date( 
'10/01/2002 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM')); 
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES ( 
1, 'A', 'B', 'C',  TO_Date( '10/01/2002 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'),  TO_Date( 
'10/01/2003 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM')); 
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES ( 
2, 'a', 'f', 'f',  TO_Date( '02/06/2004 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'),  TO_Date( 
'02/07/2004 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM')); 
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES ( 
2, 'A', 'B', 'B',  TO_Date( '10/01/2000 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'),  TO_Date( 
'10/01/2001 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM')); 
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES ( 
2, 'A', 'B', 'B',  TO_Date( '10/01/2001 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'),  TO_Date( 
'10/01/2003 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM')); 
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES ( 
2, 'A', 'B', 'B',  TO_Date( '10/02/2001 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'),  TO_Date( 
'10/05/2003 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM')); 
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES ( 
2, 'A', 'B', 'B',  TO_Date( '10/02/2005 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'),  TO_Date( 
'10/03/2005 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM')); 
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES ( 
2, 'A', 'B', 'B',  TO_Date( '10/04/2005 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'),  TO_Date( 
'10/06/2005 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM')); 
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES ( 
3, 'A', 'F', 'F',  TO_Date( '02/10/2004 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'),  TO_Date( 
'02/20/2004 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM')); 
COMMIT;

We are trying to sequentially de-duplicate this data.

Basically, from the top of the table we go down and check each row against the previous one.  If 
they are the same, the duplicate row is marked as such, as is the original row.

So far we have this query:

SELECT ID,
       COLUMN_A,
       COLUMN_B,
       COLUMN_C,
       START_DATE,
       END_DATE,
       CASE WHEN ( DUP = 'DUP' OR DUPER = 'DUP' ) THEN 'DUP' ELSE 'NOT' END LETSEE
FROM (
SELECT ID,
       COLUMN_A,
       COLUMN_B,
       COLUMN_C,
       START_DATE,
       END_DATE,
       DUP,
       CASE  WHEN COLUMN_A = NEXT_A 
                AND  COLUMN_B = NEXT_B
             AND  COLUMN_C = NEXT_C THEN 'DUP' ELSE 'NOT' END DUPER
FROM (
SELECT ID,
       COLUMN_A,
       COLUMN_B,
       COLUMN_C,
       START_DATE,
       END_DATE,
       NEXT_A,
       NEXT_B,
       NEXT_C,
       CASE  WHEN COLUMN_A = PREV_A 
                AND  COLUMN_B = PREV_B
             AND  COLUMN_C = PREV_C THEN 'DUP' ELSE 'NOT' END DUP
FROM (  SELECT  ID, 
           COLUMN_A,
        COLUMN_B,
        COLUMN_C,
        START_DATE,
        END_DATE,
        LAG (COLUMN_A, 1, 0) OVER (ORDER BY ID) AS prev_A,
        LAG (COLUMN_B, 1, 0) OVER (ORDER BY ID) AS prev_B,
        LAG (COLUMN_C, 1, 0) OVER (ORDER BY ID) AS prev_C,
        LEAD (COLUMN_A, 1, 0) OVER (ORDER BY ID) AS next_A,
        LEAD (COLUMN_B, 1, 0) OVER (ORDER BY ID) AS next_B,
        LEAD (COLUMN_C, 1, 0) OVER (ORDER BY ID) AS next_C
FROM    DEDUP_TEST
ORDER 
BY      1, 5 ) ) )    

        ID COLUMN_A   COLUMN_B   COLUMN_C   START_DAT END_DATE  LET
---------- ---------- ---------- ---------- --------- --------- ---
         1 A          B          C          01-OCT-99 01-OCT-00 NOT
         1 D          B          C          01-OCT-01 01-OCT-02 NOT
         1 A          B          C          01-OCT-02 01-OCT-03 NOT
         2 A          B          B          01-OCT-00 01-OCT-01 DUP
         2 A          B          B          01-OCT-01 01-OCT-03 DUP
         2 A          B          B          02-OCT-01 05-OCT-03 DUP
         2 a          f          f          06-FEB-04 07-FEB-04 NOT
         2 A          B          B          02-OCT-05 03-OCT-05 DUP
         2 A          B          B          04-OCT-05 06-OCT-05 DUP
         3 A          F          F          10-FEB-04 20-FEB-04 NOT

The resultset from this is almost what I am after.

However where there are groups of duplicate rows I only want to return one row.  I take the 
attributes, the start_date of the first row duplicated and the end_date of the last row duplicated.

I do not want to group all the duplicates together, so for example the rows with the attributes

 ID COLUMN_A   COLUMN_B   COLUMN_C

  2 A          B          B

will result in two output rows:

2 A          B          B          01-OCT-00 01-OCT-03

2 A          B          B          02-OCT-05 06-OCT-05

This is the final piece I cannot work out.

Any help would be appreciated.

Thanks.

 


Followup   May 20, 2004 - 2pm Central time zone:

what happens in your data if you had

1     A1   B1    C1     ....
1     A2   B2    C2     ....
1     A1   B1    C1     ....

that might or might not be "dup" since you just order by ID?  don't we need to order by a,b, and 
c? 

3 stars Follow up   May 21, 2004 - 5am Central time zone
Reviewer: Dave Thompson from UK
Hi Tom, 

In response to your question:

what happens in your data if you had

1     A1   B1    C1     ....
1     A2   B2    C2     ....
1     A1   B1    C1     ....

Then the first row would be classed as unique, as would the second and the third.  We are only 
looking at duplicates that occur sequentially.  

Sequential duplicates are then turned into one row by taking the start date of the first row and 
the end date of the last row in the group.

The test data should have had sequential dates:

INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES ( 
1, 'A', 'B', 'C',  TO_Date( '10/01/1999 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'),  TO_Date( 
'10/01/2000 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM')); 
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES ( 
1, 'D', 'B', 'C',  TO_Date( '10/01/2001 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'),  TO_Date( 
'10/01/2002 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM')); 
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES ( 
1, 'A', 'B', 'C',  TO_Date( '10/01/2002 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'),  TO_Date( 
'10/01/2003 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM')); 
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES ( 
2, 'a', 'f', 'f',  TO_Date( '02/06/2009 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'),  TO_Date( 
'02/07/2010 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM')); 
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES ( 
2, 'A', 'B', 'B',  TO_Date( '10/01/2003 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'),  TO_Date( 
'10/01/2004 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM')); 
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES ( 
2, 'A', 'B', 'B',  TO_Date( '10/01/2005 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'),  TO_Date( 
'10/01/2006 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM')); 
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES ( 
2, 'A', 'B', 'B',  TO_Date( '10/02/2007 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'),  TO_Date( 
'10/05/2008 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM')); 
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES ( 
2, 'A', 'B', 'B',  TO_Date( '10/02/2011 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'),  TO_Date( 
'10/03/2012 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM')); 
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES ( 
2, 'A', 'B', 'B',  TO_Date( '10/04/2013 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'),  TO_Date( 
'10/06/2014 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM')); 
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES ( 
3, 'A', 'F', 'F',  TO_Date( '02/10/2014 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'),  TO_Date( 
'02/20/2015 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM')); 
COMMIT;

CREATE TABLE DEDUP_TEST
(
  ID          NUMBER,
  COLUMN_A    VARCHAR2(10 BYTE),
  COLUMN_B    VARCHAR2(10 BYTE),
  COLUMN_C    VARCHAR2(10 BYTE),
  START_DATE  DATE,
  END_DATE    DATE
)

The query:

SELECT ID,
       COLUMN_A,
       COLUMN_B,
       COLUMN_C,
       START_DATE,
       END_DATE,
       CASE WHEN ( DUP = 'DUP' OR DUPER = 'DUP' ) THEN 'DUP' ELSE 'NOT' END LETSEE
FROM (       
SELECT ID,
       COLUMN_A,
       COLUMN_B,
       COLUMN_C,
       START_DATE,
       END_DATE,
       DUP,
       CASE  WHEN COLUMN_A = NEXT_A 
                AND  COLUMN_B = NEXT_B
             AND  COLUMN_C = NEXT_C THEN 'DUP' ELSE 'NOT' END DUPER
FROM (
SELECT ID,
       COLUMN_A,
       COLUMN_B,
       COLUMN_C,
       START_DATE,
       END_DATE,
       NEXT_A,
       NEXT_B,
       NEXT_C,
       CASE  WHEN COLUMN_A = PREV_A 
                AND  COLUMN_B = PREV_B
             AND  COLUMN_C = PREV_C THEN 'DUP' ELSE 'NOT' END DUP
FROM (  SELECT  ID, 
           COLUMN_A,
        COLUMN_B,
        COLUMN_C,
        START_DATE,
        END_DATE,
        LAG (COLUMN_A, 1, 0) OVER (ORDER BY ID) AS prev_A,
        LAG (COLUMN_B, 1, 0) OVER (ORDER BY ID) AS prev_B,
        LAG (COLUMN_C, 1, 0) OVER (ORDER BY ID) AS prev_C,
        LEAD (COLUMN_A, 1, 0) OVER (ORDER BY ID) AS next_A,
        LEAD (COLUMN_B, 1, 0) OVER (ORDER BY ID) AS next_B,
        LEAD (COLUMN_C, 1, 0) OVER (ORDER BY ID) AS next_C
FROM    DEDUP_TEST
ORDER 
BY      ID, START_DATE ) ) ) 

Gives:

        ID COLUMN_A   COLUMN_B   COLUMN_C   START_DAT END_DATE  LET
---------- ---------- ---------- ---------- --------- --------- ---
         1 A          B          C          01-OCT-99 01-OCT-00 NOT
         1 D          B          C          01-OCT-01 01-OCT-02 NOT
         1 A          B          C          01-OCT-02 01-OCT-03 NOT
         2 A          B          B          01-OCT-03 01-OCT-04 DUP
         2 A          B          B          01-OCT-05 01-OCT-06 DUP
         2 A          B          B          02-OCT-07 05-OCT-08 DUP
         2 a          f          f          06-FEB-09 07-FEB-10 NOT
         2 A          B          B          02-OCT-11 03-OCT-12 DUP
         2 A          B          B          04-OCT-13 06-OCT-14 DUP
         3 A          F          F          10-FEB-14 20-FEB-15 NOT

From this the sequentially duplicated rows with the attributes a, b, c will become:

2  A B C 01-OCT-03 05-OCT-08

2  A B C 02-OCT-11 06-OCT-14

Thanks. 


Followup   May 21, 2004 - 10am Central time zone:

define sequentially.

1     A1   B1    C1     ....
1     A2   B2    C2     ....
1     A1   B1    C1     ....

ordered by ID is the same (exact same) as:

1     A1   B1    C1     ....
1     A1   B1    C1     ....
1     A2   B2    C2     ....

and

1     A2   B2    C2     ....
1     A1   B1    C1     ....
1     A1   B1    C1     ....


and in fact, two runs of your query could return different answers given the SAME exact data.  How 
to handle that, you must have something more to sort by. 

3 stars Typo in previous post   May 21, 2004 - 5am Central time zone
Reviewer: Dave Thompson from England
Tom, 

The final output should be:

From this the sequentially duplicated rows with the attributes a, b, c will 
become:

2  A B B 01-OCT-03 05-OCT-08

2  A B B 02-OCT-11 06-OCT-14

Thanks.
 


3 stars Order   May 21, 2004 - 10am Central time zone
Reviewer: Dave Thompson from England
Hi Tom,

The order of the dataset should be on the ID and Start Date.

        ID COLUMN_A   COLUMN_B   COLUMN_C   START_DAT END_DATE  LET
---------- ---------- ---------- ---------- --------- --------- ---
         1 A          B          C          01-OCT-99 01-OCT-00 NOT
         1 D          B          C          01-OCT-01 01-OCT-02 NOT
         1 A          B          C          01-OCT-02 01-OCT-03 NOT
         2 A          B          B          01-OCT-03 01-OCT-04 DUP
         2 A          B          B          01-OCT-05 01-OCT-06 DUP
         2 A          B          B          02-OCT-07 05-OCT-08 DUP
         2 a          f          f          06-FEB-09 07-FEB-10 NOT
         2 A          B          B          02-OCT-11 03-OCT-12 DUP
         2 A          B          B          04-OCT-13 06-OCT-14 DUP
         3 A          F          F          10-FEB-14 20-FEB-15 NOT

Thanks. 


Followup   May 21, 2004 - 11am Central time zone:

Ok, your example doesn't do that -- it is "non-deterministic", given the same data, it could/would 
return two different answers at different times during the day!


so, i think you want one of these:

ops$tkyte@ORA9IR2> select *
  2    from (
  3  select id, a,b,c, start_date, end_date,
  4         case when (a = lag(a) over (order by id, start_date desc) and
  5                    b = lag(b) over (order by id, start_date desc) and
  6                    c = lag(c) over (order by id, start_date desc) )
  7              then row_number() over (order by id, start_date)
  8          end rn
  9    from v
 10         )
 11   where rn is null
 12  /
 
        ID A          B          C          START_DAT END_DATE          RN
---------- ---------- ---------- ---------- --------- --------- ----------
         1 A          B          C          01-OCT-99 01-OCT-00
         1 D          B          C          01-OCT-01 01-OCT-02
         1 A          B          C          01-OCT-02 01-OCT-03
         2 A          B          B          02-OCT-07 05-OCT-08
         2 a          f          f          06-FEB-09 07-FEB-10
         2 A          B          B          04-OCT-13 06-OCT-14
         3 A          F          F          10-FEB-14 20-FEB-15
 
7 rows selected.
 
ops$tkyte@ORA9IR2> select *
  2    from (
  3  select id, a,b,c, start_date, end_date,
  4         case when (a = lag(a) over (order by id, start_date) and
  5                    b = lag(b) over (order by id, start_date) and
  6                    c = lag(c) over (order by id, start_date) )
  7              then row_number() over (order by id, start_date)
  8          end rn
  9    from v
 10         )
 11   where rn is null
 12  /
 
        ID A          B          C          START_DAT END_DATE          RN
---------- ---------- ---------- ---------- --------- --------- ----------
         1 A          B          C          01-OCT-99 01-OCT-00
         1 D          B          C          01-OCT-01 01-OCT-02
         1 A          B          C          01-OCT-02 01-OCT-03
         2 A          B          B          01-OCT-03 01-OCT-04
         2 a          f          f          06-FEB-09 07-FEB-10
         2 A          B          B          02-OCT-11 03-OCT-12
         3 A          F          F          10-FEB-14 20-FEB-15
 
7 rows selected.

we just need to mark records where the preceding record is the "same" after sorting -- then nuke 
them. 
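
The "mark, then nuke" idea can be sketched in Python for readers who want to step through it. This is an illustration of the logic only, not the SQL above; `drop_repeats` is a hypothetical helper, and the rows are the test data from this thread with dates written as ISO strings.

```python
def drop_repeats(rows, key):
    """Keep a row only when key(row) differs from the row just before it.
    rows must already be in the intended order."""
    kept = []
    previous = object()          # sentinel that compares unequal to any key
    for row in rows:
        current = key(row)
        if current != previous:  # the lag() comparison
            kept.append(row)
        previous = current
    return kept

# (id, a, b, c, start_date, end_date)
rows = [
    (1, "A", "B", "C", "1999-10-01", "2000-10-01"),
    (1, "D", "B", "C", "2001-10-01", "2002-10-01"),
    (1, "A", "B", "C", "2002-10-01", "2003-10-01"),
    (2, "a", "f", "f", "2009-02-06", "2010-02-07"),
    (2, "A", "B", "B", "2003-10-01", "2004-10-01"),
    (2, "A", "B", "B", "2005-10-01", "2006-10-01"),
    (2, "A", "B", "B", "2007-10-02", "2008-10-05"),
    (2, "A", "B", "B", "2011-10-02", "2012-10-03"),
    (2, "A", "B", "B", "2013-10-04", "2014-10-06"),
    (3, "A", "F", "F", "2014-02-10", "2015-02-20"),
]

# order by id, start_date; compare a, b, c with the preceding row
ordered = sorted(rows, key=lambda r: (r[0], r[4]))
kept = drop_repeats(ordered, key=lambda r: (r[1], r[2], r[3]))
for row in kept:
    print(row)
```

Sorting ascending keeps the first row of each run, as in the second query; reversing the date order within each id would keep the last row of each run instead.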

2 stars More Info   May 21, 2004 - 12pm Central time zone
Reviewer: Dave Thompson from England, Sunny spells with cloud today.
Hi Tom,

Thanks for the prompt reply.

I re-wrote the base query:

SELECT ID,
       COLUMN_A,
       COLUMN_B,
       COLUMN_C,
       START_DATE,
       END_DATE,
       CASE WHEN ( DUP = 'DUP' OR DUPER = 'DUP' ) THEN 'DUP' ELSE 'NOT' END LETSEE
FROM (       
SELECT ID,
       COLUMN_A,
       COLUMN_B,
       COLUMN_C,
       START_DATE,
       END_DATE,
       DUP,
       CASE  WHEN COLUMN_A = NEXT_A 
                AND  COLUMN_B = NEXT_B
             AND  COLUMN_C = NEXT_C THEN 'DUP' ELSE 'NOT' END DUPER
FROM (
SELECT ID,
       COLUMN_A,
       COLUMN_B,
       COLUMN_C,
       START_DATE,
       END_DATE,
       NEXT_A,
       NEXT_B,
       NEXT_C,
       CASE  WHEN COLUMN_A = PREV_A 
                AND  COLUMN_B = PREV_B
             AND  COLUMN_C = PREV_C THEN 'DUP' ELSE 'NOT' END DUP
FROM (  SELECT  ID, 
           COLUMN_A,
        COLUMN_B,
        COLUMN_C,
        START_DATE,
        END_DATE,
        ROWID       ROWID_R,
        LAG  (COLUMN_A, 1, 0) OVER (ORDER BY ID, START_DATE) AS prev_A,
        LAG  (COLUMN_B, 1, 0) OVER (ORDER BY ID, START_DATE) AS prev_B,
        LAG  (COLUMN_C, 1, 0) OVER (ORDER BY ID, START_DATE) AS prev_C,
        LEAD (COLUMN_A, 1, 0) OVER (ORDER BY ID, START_DATE) AS next_A,
        LEAD (COLUMN_B, 1, 0) OVER (ORDER BY ID, START_DATE) AS next_B,
        LEAD (COLUMN_C, 1, 0) OVER (ORDER BY ID, START_DATE) AS next_C
FROM    DEDUP_TEST
ORDER 
BY      ID, START_DATE ) ) )

And got:

        ID COLUMN_A   COLUMN_B   COLUMN_C   START_DAT END_DATE  LET
---------- ---------- ---------- ---------- --------- --------- ---
         1 A          B          C          01-OCT-99 01-OCT-00 NOT
         1 D          B          C          01-OCT-01 01-OCT-02 NOT
         1 A          B          C          01-OCT-02 01-OCT-03 NOT
         2 A          B          B          01-OCT-03 01-OCT-04 DUP
         2 A          B          B          01-OCT-05 01-OCT-06 DUP
         2 A          B          B          02-OCT-07 05-OCT-08 DUP
         2 a          f          f          06-FEB-09 07-FEB-10 NOT
         2 A          B          B          02-OCT-11 03-OCT-12 DUP
         2 A          B          B          04-OCT-13 06-OCT-14 DUP
         3 A          F          F          10-FEB-14 20-FEB-15 NOT

Looking at the column LETSEE I want to add a unique identifier to each row, treating duplicated 
rows as 1.

For example:

        ID COLUMN_A   COLUMN_B   COLUMN_C   START_DAT END_DATE  LET DUP_ID
---------- ---------- ---------- ---------- --------- --------- --- ------
         1 A          B          C          01-OCT-99 01-OCT-00 NOT 1
         1 D          B          C          01-OCT-01 01-OCT-02 NOT 2
         1 A          B          C          01-OCT-02 01-OCT-03 NOT 3
         2 A          B          B          01-OCT-03 01-OCT-04 DUP 4
         2 A          B          B          01-OCT-05 01-OCT-06 DUP 4
         2 A          B          B          02-OCT-07 05-OCT-08 DUP 4
         2 a          f          f          06-FEB-09 07-FEB-10 NOT 5
         2 A          B          B          02-OCT-11 03-OCT-12 DUP 6
         2 A          B          B          04-OCT-13 06-OCT-14 DUP 6
         3 A          F          F          10-FEB-14 20-FEB-15 NOT 7

Then I could use the Dup_Id to partition on to do the analysis I need.

Any idea?

Have a nice weekend.

Thanks.
 


Followup   May 21, 2004 - 1pm Central time zone:

the above query doesn't work? 

2 stars Hi Again   May 21, 2004 - 2pm Central time zone
Reviewer: Dave Thompson from England
Hi Tom,

The above didn't work.

From the source query:

        ID COLUMN_A   COLUMN_B   COLUMN_C   START_DAT END_DATE  LET
---------- ---------- ---------- ---------- --------- --------- ---
         1 A          B          C          01-OCT-99 01-OCT-00 NOT
         1 D          B          C          01-OCT-01 01-OCT-02 NOT
         1 A          B          C          01-OCT-02 01-OCT-03 NOT
         2 A          B          B          01-OCT-03 01-OCT-04 DUP
         2 A          B          B          01-OCT-05 01-OCT-06 DUP
         2 A          B          B          02-OCT-07 05-OCT-08 DUP
         2 a          f          f          06-FEB-09 07-FEB-10 NOT
         2 A          B          B          02-OCT-11 03-OCT-12 DUP
         2 A          B          B          04-OCT-13 06-OCT-14 DUP
         3 A          F          F          10-FEB-14 20-FEB-15 NOT

I want to output the following resultset:

        ID COLUMN_A   COLUMN_B   COLUMN_C   START_DAT END_DATE  LET
---------- ---------- ---------- ---------- --------- --------- ---
         1 A          B          C          01-OCT-99 01-OCT-00 NOT
         1 D          B          C          01-OCT-01 01-OCT-02 NOT
         1 A          B          C          01-OCT-02 01-OCT-03 NOT
         2 A          B          B          01-OCT-03 05-OCT-08 DUP
         2 a          f          f          06-FEB-09 07-FEB-10 NOT
         2 A          B          B          02-OCT-11 06-OCT-14 DUP
         3 A          F          F          10-FEB-14 20-FEB-15 NOT

On the resultset from your queries the start and end dates were incorrect.

Where duplicate rows occur one after another, we need to take the start_date of the first row 
and the end_date of the last row in that block.

So for the following:

         2 A          B          B          01-OCT-03 01-OCT-04 DUP
         2 A          B          B          01-OCT-05 01-OCT-06 DUP
         2 A          B          B          02-OCT-07 05-OCT-08 DUP

You would get

         2 A          B          B          01-OCT-03 05-OCT-08 DUP


Does this make sense?

Thanks again for your input on this.

 


Followup   May 21, 2004 - 2pm Central time zone:

ops$tkyte@ORA9IR2> select id, a,b,c, min(start_date) start_date, max(end_date) end_date
  2    from (
  3  select id, a,b,c, start_date, end_date,
  4         max(grp) over (order by id, start_date desc) grp
  5    from (
  6  select id, a,b,c, start_date, end_date,
  7         case when (a <> lag(a) over (order by id, start_date desc) or
  8                    b <> lag(b) over (order by id, start_date desc) or
  9                    c <> lag(c) over (order by id, start_date desc) )
 10              then row_number() over (order by id, start_date desc)
 11          end grp
 12    from v
 13         )
 14         )
 15   group by id, a,b,c,grp
 16   order by 1, 5
 17  /
 
        ID A          B          C          START_DAT END_DATE
---------- ---------- ---------- ---------- --------- ---------
         1 A          B          C          01-OCT-99 01-OCT-00
         1 D          B          C          01-OCT-01 01-OCT-02
         1 A          B          C          01-OCT-02 01-OCT-03
         2 A          B          B          01-OCT-03 05-OCT-08
         2 a          f          f          06-FEB-09 07-FEB-10
         2 A          B          B          02-OCT-11 06-OCT-14
         3 A          F          F          10-FEB-14 20-FEB-15
 
7 rows selected.


One of my (current) favorite analytic tricks -- the old "carry forward".  We mark rows such that 
the preceding row was different -- subsequent dup rows would have NULLS there for grp.  

Then, we use max(grp) to "carry" that number down....

Now we have something to group by -- we've divided the rows up into groups we can deal with.


(note: if a,b,c allow NULLS, we'll need to accommodate for that!) 
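
The three steps of the "carry forward" trick (mark, carry, group) can be sketched in Python. This is an illustration of the idea rather than a translation of the query: it sorts ascending instead of descending and compares id together with a, b, c, and the name `collapse_runs` is hypothetical.

```python
def collapse_runs(rows):
    """rows: (id, a, b, c, start_date, end_date) tuples.
    Returns one row per run of consecutive identical (id, a, b, c)."""
    ordered = sorted(rows, key=lambda r: (r[0], r[4]))

    # Step 1: mark a row with its position when it starts a new run;
    # rows that repeat the preceding attributes get None.
    marks = []
    previous = None
    for position, row in enumerate(ordered, start=1):
        attrs = row[:4]
        marks.append(position if attrs != previous else None)
        previous = attrs

    # Step 2: carry the last mark down over the None entries.
    groups = []
    current = None
    for mark in marks:
        if mark is not None:
            current = mark
        groups.append(current)

    # Step 3: group by (id, a, b, c, grp), taking min(start) and max(end).
    merged = {}
    for row, grp in zip(ordered, groups):
        key = row[:4] + (grp,)
        start, end = row[4], row[5]
        if key in merged:
            old_start, old_end = merged[key]
            merged[key] = (min(old_start, start), max(old_end, end))
        else:
            merged[key] = (start, end)
    return [key[:4] + dates for key, dates in merged.items()]

rows = [
    (1, "A", "B", "C", "1999-10-01", "2000-10-01"),
    (1, "D", "B", "C", "2001-10-01", "2002-10-01"),
    (1, "A", "B", "C", "2002-10-01", "2003-10-01"),
    (2, "a", "f", "f", "2009-02-06", "2010-02-07"),
    (2, "A", "B", "B", "2003-10-01", "2004-10-01"),
    (2, "A", "B", "B", "2005-10-01", "2006-10-01"),
    (2, "A", "B", "B", "2007-10-02", "2008-10-05"),
    (2, "A", "B", "B", "2011-10-02", "2012-10-03"),
    (2, "A", "B", "B", "2013-10-04", "2014-10-06"),
    (3, "A", "F", "F", "2014-02-10", "2015-02-20"),
]
collapsed = collapse_runs(rows)
for row in collapsed:
    print(row)
```

Note that Python treats None as equal to None, whereas in SQL a comparison with NULL is neither true nor false, so the sketch does not reproduce the NULL caveat mentioned above.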

5 stars Great Stuff   May 21, 2004 - 5pm Central time zone
Reviewer: Dave Thompson from England, Overnight frost expected!
Tom,

Thanks very much for that.  

I'll go over it in more detail when I'm in the Office Monday but it looks great from here.

Enjoy the weekend. 


5 stars Excellent   June 2, 2004 - 4am Central time zone
Reviewer: Dave Thompson from Yorkshire
Hi Tom,

This solution was spot on.

Thanks.

Any more thoughts on an Analytics book?  


4 stars   June 9, 2004 - 6pm Central time zone
Reviewer: Stalin from CA, US
hi tom,

wondering what the sql below would look like if the lead and partition analytic functions 
did not exist. is pl/sql the only option?

snippet from "lead/lag on different dataset" thread (it has the create and insert stmts)

ops$tkyte@ORA9IR2> select a.* , round( (signoff_date-signon_date) * 24 * 60, 2 ) minutes
  2    from (
  3  select log_id,
  4         case when action in (1,2) and lead(action) over (partition by account_id,user_id,mac order by log_creation_date) in (3,4,5,6,7)
  5              then lead(log_id) over (partition by account_id, user_id, mac order by log_creation_date)
  6          end signoff_id,
  7         user_id,
  8         log_creation_date signon_date,
  9         case when action in (1,2) and lead(action) over (partition by account_id,user_id,mac order by log_creation_date) in (3,4,5,6,7)
 10              then lead(log_creation_date) over (partition by account_id, user_id, mac order by log_creation_date)
 11          end signoff_date,
 12                  action
 13  from   logs
 14  where  account_id = 'Robert'
 15  and    service = 5
 16  order  by user_id
 17         ) a
 18   where action in (1,2)
 19  /

Thanks,
Stalin 


Followup   June 9, 2004 - 6pm Central time zone:

you could use a non-equi self join to achieve the same.  Many orders of magnitude slower.

scalar subqueries could be used as well -- with the same "slower" caveat. 
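
To see what the self join or scalar subquery has to do, here is a rough Python sketch: for every sign-on row it scans all rows again to find the next one in the same account/user/mac partition, so the work grows with the square of the row count. The sample rows and the name `pair_sessions` are made up for illustration and are not from the original thread.

```python
from datetime import datetime

def pair_sessions(logs):
    """Pair each sign-on (action 1 or 2) with the next row of the same
    account/user/mac when that row is a sign-off (action 3 to 7)."""
    result = []
    for row in logs:
        if row["action"] not in (1, 2):
            continue
        partition = (row["account_id"], row["user_id"], row["mac"])
        # The "scalar subquery": rescan every row for later ones in the partition.
        later = [
            other for other in logs
            if (other["account_id"], other["user_id"], other["mac"]) == partition
            and other["when"] > row["when"]
        ]
        nxt = min(later, key=lambda o: o["when"]) if later else None
        if nxt is not None and nxt["action"] in (3, 4, 5, 6, 7):
            minutes = round((nxt["when"] - row["when"]).total_seconds() / 60, 2)
            result.append((row["log_id"], nxt["log_id"], minutes))
        else:
            result.append((row["log_id"], None, None))
    return result

def log(log_id, action, hour, minute):
    # Hypothetical rows: one account, one user, one mac.
    return {"log_id": log_id, "account_id": "Robert", "user_id": "u1",
            "mac": "m1", "action": action,
            "when": datetime(2004, 6, 9, hour, minute)}

logs = [
    log(1, 1, 9, 0),    # sign on
    log(2, 3, 9, 30),   # sign off
    log(3, 2, 10, 0),   # sign on, followed by another sign on
    log(4, 1, 10, 5),   # sign on
    log(5, 4, 10, 20),  # sign off
]
pairs = pair_sessions(logs)
for pair in pairs:
    print(pair)
```

With lead() the rows of a partition are sorted once and each row simply looks at its neighbour, which is why the analytic version avoids the repeated scan.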

5 stars Is this solvable with ANALYTICS too?   June 10, 2004 - 12am Central time zone
Reviewer: Peter Tran from Houston, TX USA
Hi Tom,

Can the following problem be solved using Analytics?

I have a 10-column table where 9 of the fields are dimensions and one is an attribute.  I would 
like to get a report of the D1/D2 combinations where ATTR1 is 1 for every combination of the other 
dimensions.  Furthermore, the PK consists of all the dimension columns.

The column names in the example below aren't the real ones, but I didn't want to make the example 
table too wide for illustrative purposes.


 D1  D2  D3  D4  D5  D6  D7  D8  D9  ATTR1
-------------------------------------------- 
 AA  AA  AA  AA  AA  AA  AA  AA  AA    1
 AA  AA  BB  AA  AA  AA  AA  AA  AA    1
 AA  AA  AA  CC  AA  AA  AA  AA  AA    1
 AA  AA  AA  AA  DD  AA  AA  AA  AA    1
 AA  AA  AA  AA  EE  AA  AA  AA  AA    1

 AA  BB  AA  AA  AA  AA  AA  GG  AA    1 
 AA  BB  AA  AA  AA  AA  AA  AA  AA    1 
 AA  BB  CC  AA  AA  AA  AA  AA  AA    0 
 AA  BB  AA  DD  AA  AA  AA  AA  AA    1 

 EE  DD  JJ  LL  MM  NN  OO  PP  QQ    1
 EE  DD  TT  LL  MM  NN  OO  PP  QQ    1 
 
I want the query to return:

 D1  D2
-------- 
 AA  AA
 EE  DD
 
It would not return AA/BB, because of the record:

 D1  D2  D3  D4  D5  D6  D7  D8  D9  ATTR1
-------------------------------------------- 
 AA  BB  CC  AA  AA  AA  AA  AA  AA    0 

Thanks,
-Peter 


Followup   June 10, 2004 - 7am Central time zone:

yes they can, but they are not needed.  regular aggregates do the job.  I'd give you the real query 
if I had a create table/inserts to demo against.  this is "pseudo code", might or might not 
actually work:


select d1, d2
  from t
 group by d1, d2
having count(distinct attribute) = 1


 

4 stars   June 10, 2004 - 9am Central time zone
Reviewer: Michael T. from Dallas, Tx
Peter,
I think the following may give you what you want.

SELECT d1, d2 
  FROM t 
 GROUP BY d1, d2 
HAVING SUM(DECODE(attr1, 1, 0, 1)) > 0;

Tom's pseudo code will work except when every row for a D1/D2 combination has the same ATTR1 
value, but that value is not 1.
 


Followup   June 10, 2004 - 9am Central time zone:

ahh, good eye -- i was thinking "all attribute values are the same"

but yours doesn't do it,  this will

having count( decode( attr1, 1, 1 ) ) = count(*)



count(decode(attr1,1,1)) will return a count of non-null occurrences (all of the 1's)

count(*) returns a count of all records

output when count(decode) = count(*)
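
The count(decode) = count(*) test can be checked against the sample data with a small Python sketch. Only D1, D2 and ATTR1 are kept, and the XX/YY and ZZ/ZZ groups are hypothetical additions to show the all-zero and NULL cases; `groups_all_ones` is an illustrative name.

```python
from collections import defaultdict

def groups_all_ones(rows):
    """Return the (d1, d2) groups in which every attr1 equals 1."""
    total = defaultdict(int)   # count(*)
    ones = defaultdict(int)    # count(decode(attr1, 1, 1))
    for d1, d2, attr1 in rows:
        total[(d1, d2)] += 1
        if attr1 == 1:         # decode yields NULL for anything else, so it is not counted
            ones[(d1, d2)] += 1
    return sorted(key for key in total if ones[key] == total[key])

rows = (
    [("AA", "AA", 1)] * 5
    + [("AA", "BB", 1), ("AA", "BB", 1), ("AA", "BB", 0), ("AA", "BB", 1)]
    + [("EE", "DD", 1)] * 2
    + [("XX", "YY", 0)] * 2     # hypothetical: all the same, but not 1
    + [("ZZ", "ZZ", None)]      # hypothetical: NULL attribute
)
matching = groups_all_ones(rows)
print(matching)
```

The XX/YY group is the case where a count(distinct ...) = 1 test would wrongly succeed, since all its values are the same but none of them is 1.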

 

5 stars Thank you!   June 10, 2004 - 10am Central time zone
Reviewer: Peter Tran from Houston, TX USA
Hi Tom/Michael T.,

Thank you.  It's so much clearer now.

-Peter 


5 stars   June 10, 2004 - 10am Central time zone
Reviewer: Michael T. from Dallas, Tx
I did screw up in my previous response.  The query I submitted gives the entirely wrong answer.  It 
should have been

SELECT d1, d2
  FROM t
 GROUP BY d1, d2
HAVING SUM(DECODE(attr1, 1, 0, 1)) = 0

Even though, incorrectly, I wasn't originally considering null values for ATTR1, the above query 
seems to produce the correct answer even if ATTR1 is NULL.  The DECODE will evaluate a null ATTR1 
entry to 1.

Tom, many thanks for this site.  I have learned so much from it.  It is a daily must read for me.
 


5 stars You said a book on analytics?   June 10, 2004 - 12pm Central time zone
Reviewer: Jeff from Atlanta, GA
A book by you on analytics would be a best seller I think. 
Go for it. 


5 stars quick analytic question   June 16, 2004 - 5pm Central time zone
Reviewer: A reader 
schema creation---
---
scott@ora92> drop table t1;

Table dropped.

scott@ora92> create table t1
  2  (
  3    x varchar2(10),
  4    y number
  5  );

Table created.

scott@ora92> 
scott@ora92> insert into t1 values( 'x1', 1 );

1 row created.

scott@ora92> insert into t1 values( 'x1', 2 );

1 row created.

scott@ora92> insert into t1 values( 'x1', 4 );

1 row created.

scott@ora92> insert into t1 values( 'x1', 0 );

1 row created.

scott@ora92> commit;

Commit complete.

scott@ora92> select x, y, min(y) over() min_y
  2  from t1;

X                   Y      MIN_Y
---------- ---------- ----------
x1                  1          0
x1                  2          0
x1                  4          0
x1                  0          0

scott@ora92> spool off
---
how do i get the minimum of y for all values
that are greater than 0 (if one exists). In the above case
I should get the result as

X                   Y      MIN_Y
---------- ---------- ----------
x1                  1          1
x1                  2          1
x1                  4          1
x1                  0          1



Thanx for your excellent site and brilliant work!
 


Followup   June 16, 2004 - 6pm Central time zone:

min( case when y > 0 then y end ) over () 
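
The CASE returns NULL for rows where y is not greater than 0, and MIN ignores NULLs, so only the positive values take part. A minimal Python sketch of the same logic on the rows of T1 (the name `with_min_positive` is hypothetical):

```python
def with_min_positive(rows):
    """Attach to every (x, y) row the smallest y that is greater than 0,
    or None when no such y exists."""
    positives = [y for _, y in rows if y is not None and y > 0]
    min_y = min(positives) if positives else None
    return [(x, y, min_y) for x, y in rows]

rows = [("x1", 1), ("x1", 2), ("x1", 4), ("x1", 0)]
result = with_min_positive(rows)
for row in result:
    print(row)
```

Every row, including the one where y is 0, comes back with 1 in the third position.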

5 stars Great!!!   June 16, 2004 - 6pm Central time zone
Reviewer: A reader 


5 stars Thank you very much   July 2, 2004 - 9am Central time zone
Reviewer: Gj from UK
The Oracle docs are a little light on examples, but thank you for giving us the quick start to 
analytics. I can't say I understand the complex examples yet, but the simple stuff seems so easy to 
understand now. I can't wait until a real problem comes along that I can apply this feature to. 


4 stars How to mimic Ora10g LAST_VALUE(... IGNORE NULLS)?   July 6, 2004 - 8am Central time zone
Reviewer: Sergey from Norway
Hi Tom,

I need to 'fill the gaps' with the values from the last existing row in a table that is outer 
joined to another table. The other table serves as a source of regular [time] intervals. The task 
seems to be conceptually very simple, so I looked into Ora docs (it happens to be Ora10g docs) and 
pretty soon found exactly what I need: LAST_VALUE with IGNORE NULLS. Unfortunately neither Ora8i, 
nor Ora9i accept IGNORE NULLS. Is there any way to mimic this feature with 'older' analytical 
functions?
I tried sort of ORDER BY SIGN(NVL(VALUE, 0)) in analytical ORDER BY clause, but it does not work (I 
do not have a clue why)

Thanks in advance

Here is the test:
DROP TABLE TD;
CREATE TABLE TD AS
        (SELECT TRUNC(SYSDATE, 'DD') + ROWNUM T
            FROM ALL_OBJECTS
            WHERE ROWNUM <= 15
        );

DROP TABLE TV;
CREATE TABLE TV AS
        (SELECT 
                 TRUNC(SYSDATE, 'DD') + ROWNUM * 3 T
                ,ROWNUM V
            FROM ALL_OBJECTS
            WHERE ROWNUM <= 5
        );


SELECT 
         TD.T
        ,SIGN(NVL(TV.V, 0))
        ,NVL
            (TV.V, 
                LAST_VALUE(TV.V IGNORE NULLS) -- IGNORE NULLS does not work on Ora8i, Ora9i
                    OVER 
                        (
                            ORDER BY TD.T
                            ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
                        )
            ) V
    FROM TD, TV
    WHERE TV.T(+) = TD.T
    ORDER BY TD.T
    ;
ERROR at line 6:
ORA-00907: missing right parenthesis

SELECT 
         TD.T
        ,SIGN(NVL(TV.V, 0))
        ,NVL
            (TV.V, 
                LAST_VALUE(TV.V)
                    OVER 
                        (
                            ORDER BY SIGN(NVL(TV.V, 0)), TD.T -- Does not work
                            ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
                        )
            ) V
    FROM TD, TV
    WHERE TV.T(+) = TD.T
    ORDER BY TD.T
    ;
T                    SIGN(NVL(TV.V,0))                  V
------------------- ------------------ ------------------
07.07.2004 00:00:00                  0
08.07.2004 00:00:00                  0
09.07.2004 00:00:00                  1                  1
10.07.2004 00:00:00                  0
11.07.2004 00:00:00                  0
12.07.2004 00:00:00                  1                  2
13.07.2004 00:00:00                  0
14.07.2004 00:00:00                  0
15.07.2004 00:00:00                  1                  3
16.07.2004 00:00:00                  0
17.07.2004 00:00:00                  0
18.07.2004 00:00:00                  1                  4
19.07.2004 00:00:00                  0
20.07.2004 00:00:00                  0
21.07.2004 00:00:00                  1                  5


 


Followup   July 6, 2004 - 8am Central time zone:

This is a trick I call "carry down": we use analytics on analytics to accomplish this.  We output 
"marker rows" with ROW_NUMBER() on the leading edge.  Using MAX() in the outer query, we "carry 
down" these marker rows.  Because the row_number is zero padded to six digits, the MAX() of the 
strings seen so far is always the most recent marker row -- substr gets rid of the row_number for us:



ops$tkyte@ORA10G> select t,
  2         sign_v,
  3             v,
  4             substr( max(data) over (order by t), 7 ) v2
  5    from (
  6  SELECT TD.T,
  7         SIGN(NVL(TV.V, 0)) sign_v,
  8          NVL(TV.V, LAST_VALUE(TV.V IGNORE NULLS) OVER ( ORDER BY TD.T )) V,
  9           case when tv.v is not null
 10                 then to_char( row_number() 
                                  over (order by td.t), 'fm000000' ) || tv.v
 11                    end data
 12      FROM TD, TV
 13      WHERE TV.T(+) = TD.T
 14          )
 15   ORDER BY T
 16      ;
 
T             SIGN_V          V V2
--------- ---------- ---------- -----------------------------------------
07-JUL-04          0
08-JUL-04          0
09-JUL-04          1          1 1
10-JUL-04          0          1 1
11-JUL-04          0          1 1
12-JUL-04          1          2 2
13-JUL-04          0          2 2
14-JUL-04          0          2 2
15-JUL-04          1          3 3
16-JUL-04          0          3 3
17-JUL-04          0          3 3
18-JUL-04          1          4 4
19-JUL-04          0          4 4
20-JUL-04          0          4 4
21-JUL-04          1          5 5
 
15 rows selected.


So, in 9ir2 this would simply be:


ops$tkyte@ORA9IR2> select t,
  2         sign_v,
  3             substr( max(data) over (order by t), 7 ) v2
  4    from (
  5  SELECT TD.T,
  6         SIGN(NVL(TV.V, 0)) sign_v,
  7           case when tv.v is not null
  8                        then to_char( row_number() over (order by td.t), 'fm000000' ) || tv.v
  9                    end data
 10      FROM TD, TV
 11      WHERE TV.T(+) = TD.T
 12          )
 13   ORDER BY T
 14      ;
 
T             SIGN_V V2
--------- ---------- -----------------------------------------
07-JUL-04          0
08-JUL-04          0
09-JUL-04          1 1
10-JUL-04          0 1
11-JUL-04          0 1
12-JUL-04          1 2
13-JUL-04          0 2
14-JUL-04          0 2
15-JUL-04          1 3
16-JUL-04          0 3
17-JUL-04          0 3
18-JUL-04          1 4
19-JUL-04          0 4
20-JUL-04          0 4
21-JUL-04          1 5
 
15 rows selected.
 

5 stars Doesn't work with PL/SQL ????????   July 20, 2004 - 9am Central time zone
Reviewer: A reader 
Dear Tom
Are analytics fully compatible with PL/SQL?
Please see
SQL> ed
Wrote file afiedt.buf

  1  select empno,deptno,
  2         count(empno) over (partition by deptno order by empno
  3                            rows between unbounded preceding and current row) run_count
  4* from emp
SQL> /

     EMPNO     DEPTNO  RUN_COUNT
---------- ---------- ----------
      7782         10          1
      7839         10          2
      7934         10          3
      7369         20          1
      7566         20          2
      7788         20          3
      7876         20          4
      7902         20          5
      7499         30          1
      7521         30          2
      7654         30          3

     EMPNO     DEPTNO  RUN_COUNT
---------- ---------- ----------
      7698         30          4
      7844         30          5
      7900         30          6

14 rows selected.

SQL> 
SQL> ed
Wrote file afiedt.buf

  1  declare
  2  cursor c1 is
  3  select empno,deptno,
  4         count(empno) over (partition by deptno order by empno
  5                            rows between unbounded preceding and current row) run_count
  6  from emp;
  7  begin
  8   for rec in c1 loop
  9    null;
 10   end loop;
 11* end;
SQL> /
end;
   *
ERROR at line 11:
ORA-06550: line 5, column 72:
PL/SQL: ORA-00905: missing keyword
ORA-06550: line 3, column 1:
PL/SQL: SQL Statement ignored


SQL> 
SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
PL/SQL Release 9.2.0.4.0 - Production
CORE    9.2.0.3.0       Production
TNS for 32-bit Windows: Version 9.2.0.4.0 - Production
NLSRTL Version 9.2.0.4.0 - Production

SQL>  


Followup   July 20, 2004 - 8pm Central time zone:

You can contact support and reference <Bug:3083373>, but the workaround would be to use native 
dynamic sql or a view to "hide" this construct.

the problem turns out to be the word "current", which has meaning in plsql. 
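
Both workarounds can be sketched roughly as follows, using the EMP query from the review above. The view name EMP_RUN_COUNT_V and the local variable names are made up for illustration; they are assumptions, not something from the bug report.

```sql
-- Workaround 1: hide the windowing clause in a view (hypothetical name),
-- so the PL/SQL parser never sees the word CURRENT.
create or replace view emp_run_count_v
as
select empno, deptno,
       count(empno) over (partition by deptno order by empno
                          rows between unbounded preceding and current row) run_count
  from emp;

begin
    for rec in ( select empno, deptno, run_count from emp_run_count_v )
    loop
        null;  -- process rec here
    end loop;
end;
/

-- Workaround 2: native dynamic SQL. The statement is just a string to
-- PL/SQL and is only parsed by the SQL engine at run time.
declare
    type rc is ref cursor;
    c         rc;
    l_empno   emp.empno%type;
    l_deptno  emp.deptno%type;
    l_cnt     number;
begin
    open c for
      'select empno, deptno,
              count(empno) over (partition by deptno order by empno
                                 rows between unbounded preceding and current row)
         from emp';
    loop
        fetch c into l_empno, l_deptno, l_cnt;
        exit when c%notfound;
        null;  -- process the fetched row here
    end loop;
    close c;
end;
/
```

The view keeps the cursor loop static, so it is probably the less intrusive of the two when the query text does not change.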

5 stars Effect of distinct on lag   July 29, 2004 - 1pm Central time zone
Reviewer: John Murphy from Vienna, VA
I am trying to use analytics to find accounts with receipts in 3 consecutive years.  The analytic 
code seems to work, however, when I add DISTINCT (to find each account once), I get strange 
results.  This is on 9.2.0.1.0.

create table jcm_test(acct_id number(10), rcpt_date date);

insert into jcm_test
values (1 , to_date('01-JAN-2000', 'dd-mon-yyyy'));

insert into jcm_test
values (1 , to_date('01-JAN-2001', 'dd-mon-yyyy'));

insert into jcm_test
values (1 , to_date('01-JAN-2003', 'dd-mon-yyyy'));

insert into jcm_test
values (1 , to_date('02-JAN-2001', 'dd-mon-yyyy'));

(select j2.*,
  rcpt_year - lag_yr as year_diff,
  rank_year - lag_rank as rank_diff
  from (select acct_id, rcpt_year, rank_year,
       lag(rcpt_year, 2) over (partition by acct_id order by rcpt_year) lag_yr,
       lag(rank_year, 2) over (partition by acct_id order by rcpt_year) lag_rank
       from (select acct_id,
            rcpt_year,
            rank() over (partition by acct_id order by j.rcpt_year) rank_year
            from (select distinct acct_id, to_char(rcpt_date, 'YYYY') rcpt_year
                     from jcm_test) j )
     ) j2);

   ACCT_ID RCPT  RANK_YEAR LAG_   LAG_RANK  YEAR_DIFF  RANK_DIFF
---------- ---- ---------- ---- ---------- ---------- ----------
         1 2000          1
         1 2001          2
         1 2003          3 2000          1          3          2

select * from
  (select j2.*,
  rcpt_year - lag_yr as year_diff,
  rank_year - lag_rank as rank_diff
  from (select acct_id, rcpt_year, rank_year,
       lag(rcpt_year, 2) over (partition by acct_id order by rcpt_year) lag_yr,
       lag(rank_year, 2) over (partition by acct_id order by rcpt_year) lag_rank
       from (select acct_id,
             rcpt_year,
             rank() over (partition by acct_id order by j.rcpt_year) rank_year
             from (select distinct acct_id, to_char(rcpt_date, 'YYYY') rcpt_year
                    from jcm_test) j )
   ) j2)
where  year_diff = rank_diff;

no rows selected

select distinct * from
  (select j2.*,
  rcpt_year - lag_yr as year_diff,
  rank_year - lag_rank as rank_diff
  from (select acct_id, rcpt_year, rank_year,
       lag(rcpt_year, 2) over (partition by acct_id order by rcpt_year) lag_yr,
       lag(rank_year, 2) over (partition by acct_id order by rcpt_year) lag_rank
       from (select acct_id,
             rcpt_year,
             rank() over (partition by acct_id order by j.rcpt_year) rank_year
             from (select distinct acct_id, to_char(rcpt_date, 'YYYY') rcpt_year
                  from jcm_test) j )
     ) j2)
   where  year_diff = rank_diff;

   ACCT_ID RCPT  RANK_YEAR LAG_   LAG_RANK  YEAR_DIFF  RANK_DIFF
---------- ---- ---------- ---- ---------- ---------- ----------
         1 2001          2 2000          1          1          1
         1 2003          4 2001          2          2          2

In your book, you say that because analytics are performed last, you must push them into an inline 
view.  However, that doesn't seem to do the trick here.  Thanks, john 


Followup   July 29, 2004 - 2pm Central time zone:

what release -- i don't see what you see. 

5 stars Distinct effect release   July 29, 2004 - 3pm Central time zone
Reviewer: John Murphy from Vienna, VA
Tom, we are using the following.

Oracle9i Release 9.2.0.1.0 - Production
PL/SQL Release 9.2.0.1.0 - Production
CORE    9.2.0.1.0       Production
TNS for 32-bit Windows: Version 9.2.0.1.0 - Production
NLSRTL Version 9.2.0.1.0 - Production

I tried searching Metalink, but couldn't find any bugs. 


Followup   July 29, 2004 - 4pm Central time zone:

i found one, not published, was solved via 9202 -- at least it did not reproduce, they did not 
pursue it further for that reason.  

 

5 stars Distinct effect release   July 29, 2004 - 4pm Central time zone
Reviewer: John Murphy from Vienna, VA
Actually, I suspect that this may be related to bug 2258035.  Do you agree?  Thanks, john 


Followup   July 29, 2004 - 4pm Central time zone:

yes, i can confirm that in 9205, it is not happening that way. 

5 stars how to write this query   July 30, 2004 - 6am Central time zone
Reviewer: Teddy 
Hi

using the original poster's example:


   ORDER  OPN  STATION  CLOSE_DATE
   -----  ---  -------  ----------
   12345   10  RECV     07/01/2003
   12345   20  MACH1    07/02/2003
   12345   25  MACH1    07/05/2003
   12345   30  MACH1    07/11/2003
   12345   36  INSP1    07/12/2003
   12345   50  MACH1    08/16/2003
   12346   90  MACH2    07/30/2003
   12346  990  STOCK    07/31/2003

How do you write a query to determine that an order has passed through manufacturing operations in 
several months?

In the above example, 12345 has rows in July and August but 12346 has rows in July only. How can we 
write a query to find orders such as 12345?
 


Followup   July 30, 2004 - 4pm Central time zone:

select order, min(close_date), max(close_date)
  from t
 group by order
having months_between( max(close_date), min(close_date) ) > your_threshold; 
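
The months_between test measures elapsed time between the first and last close date. If "several months" is meant literally as "rows in more than one calendar month" (as in the July/August example), a variation that counts distinct months might be closer. This is only a sketch: ORDER_NO is a stand-in name, since ORDER is a reserved word and the real column name is not given.

```sql
-- Orders whose close dates fall in more than one calendar month.
-- ORDER_NO is a hypothetical column name for the order number.
select order_no,
       min(close_date) first_close,
       max(close_date) last_close
  from t
 group by order_no
having count( distinct trunc(close_date, 'mm') ) > 1;
```

With this form an order that closes on 31-Jul and 01-Aug qualifies, even though months_between for those two dates is close to zero.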

5 stars Finding pairs in result set   August 11, 2004 - 10am Central time zone
Reviewer: PJ 
Tom,

CREATE TABLE A
(
  N  NUMBER,
  C  CHAR(1),
  V  VARCHAR2(20)
);

INSERT INTO A ( N, C, V ) VALUES ( 1, 'e', '1st e of 1st N'); 
INSERT INTO A ( N, C, V ) VALUES ( 1, 'e', '2nd e of 1st N'); 
INSERT INTO A ( N, C, V ) VALUES ( 1, 'e', '3rd e of 1st N'); 
INSERT INTO A ( N, C, V ) VALUES ( 1, 'w', '1st w of 1st N'); 
INSERT INTO A ( N, C, V ) VALUES ( 1, 'w', '2nd w of 1st N'); 
INSERT INTO A ( N, C, V ) VALUES ( 2, 'e', '1st e of 2nd N'); 
INSERT INTO A ( N, C, V ) VALUES ( 2, 'w', '1st w of 2nd N'); 
INSERT INTO A ( N, C, V ) VALUES ( 2, 'w', '2nd w of 2nd N'); 
commit;
 
So the data I have is 
select * from a;
-------------------------
N    C    V

1    e    1st e of 1st N
1    e    2nd e of 1st N
1    e    3rd e of 1st N
1    w    1st w of 1st N
1    w    2nd w of 1st N
2    e    1st e of 2nd N
2    w    1st w of 2nd N
2    w    2nd w of 2nd N

---------------------------------------
And the output I'm looking for is

1    e    1st e of 1st N
1    e    2nd e of 1st N
1    w    1st w of 1st N
1    w    2nd w of 1st N
2    e    1st e of 2nd N
2    w    1st w of 2nd N


So basically I need the first pairs of (e-w/w-e) for each N.

I hope I'm clear here.

Thanks as usual in advance, 


Followup   August 11, 2004 - 12pm Central time zone:

do you have a field that can be "sorted on" for finding "1st, 2nd" and so on?

If not, there is no such thing as "first", or "third".



 

4 stars   August 11, 2004 - 12pm Central time zone
Reviewer: PJ 
Tom,

Sorry if I was not clear.
we need to pick pairs for N. Like we have 5 rows with N=1. so we have to pick 4 rows leaving 1 
UNPAIRED "e" out.
We want the data in the same order as it is in table. We can sort it by --> order by N,C 


Followup   August 11, 2004 - 1pm Central time zone:

ops$tkyte@ORA920> select n, c, rn, cnt2
  2    from (
  3  select n, c, rn,
  4             min(cnt) over (partition by n) cnt2
  5    from (
  6  select n, c,
  7             row_number() over (partition by n, c order by c) rn,
  8             count(*) over (partition by n, c) cnt
  9    from a
 10             )
 11             )
 12   where rn <= cnt2
 13  /
 
         N C         RN       CNT2
---------- - ---------- ----------
         1 e          1          2
         1 e          2          2
         1 w          1          2
         1 w          2          2
         2 e          1          1
         2 w          1          1
 
6 rows selected.
 

5 stars Brilliant as usual !!   August 11, 2004 - 2pm Central time zone
Reviewer: A reader 


5 stars PJ's query   August 11, 2004 - 2pm Central time zone
Reviewer: Kevin from St. Louis
PJ - you can drop the column 'v' from your table, and just use this query (which I think will 
answer your question using N and C alone, and generate an appropriate 'v' as it runs).

CREATE TABLE b
(
  N  NUMBER,
  C  CHAR(1)
);


INSERT INTO b ( N, C ) VALUES ( 1, 'e');
 
INSERT INTO b ( N, C ) VALUES ( 1, 'e'); 

INSERT INTO b ( N, C ) VALUES ( 1, 'e'); 

INSERT INTO b ( N, C ) VALUES ( 1, 'w'); 

INSERT INTO b ( N, C ) VALUES ( 1, 'w'); 

INSERT INTO b ( N, C ) VALUES ( 2, 'e'); 

INSERT INTO b ( N, C ) VALUES ( 2, 'w'); 

INSERT INTO b ( N, C ) VALUES ( 2, 'w'); 

COMMIT;


SELECT n,c,v1 
FROM   (       
         SELECT lag (c1) OVER (PARTITION BY n,c1 ORDER BY n,c1) c3, 
                lead (c1) OVER (PARTITION BY n,c1 ORDER BY n,c1)c4,
                c1 ||   
                 CASE WHEN c1 BETWEEN 10 AND 20 
                      THEN 'th' 
                      ELSE DECODE(MOD(c1,10),1,'st',2,'nd',3,'rd','th')
                 END || ' ' || c || ' of ' || c2 ||
                 CASE WHEN c2 BETWEEN 10 AND 20 
                      THEN 'th' 
                      ELSE DECODE(MOD(c2,10),1,'st',2,'nd',3,'rd','th')
                 END || ' N' v1,                
                t1.*
         FROM   (       
                  SELECT b.*,
                         row_number() OVER (PARTITION BY n, c ORDER BY n,c) c1,
                         DENSE_RANK() OVER (PARTITION BY n, c ORDER BY n,c) c2
                  FROM   b
                ) t1     
       ) t2
WHERE c3 IS NOT NULL OR c4 IS NOT NULL
/
Results:
N    C    V1
1    e    1st e of 1st N    
1    w    1st w of 1st N    
1    e    2nd e of 1st N    
1    w    2nd w of 1st N    
2    e    1st e of 1st N    
2    w    1st w of 1st N    

INSERT INTO b ( N, C ) VALUES ( 1, 'w'); 

COMMIT;

Results:
N    C    V1
1    e    1st e of 1st N    
1    w    1st w of 1st N    
1    e    2nd e of 1st N    
1    w    2nd w of 1st N    
1    e    3rd e of 1st N    
1    w    3rd w of 1st N    
2    e    1st e of 1st N    
2    w    1st w of 1st N    
 


5 stars oops   August 11, 2004 - 2pm Central time zone
Reviewer: Kevin from St. Louis
replace
DENSE_RANK() OVER (PARTITION BY n, c ORDER BY n,c) c2
with
DENSE_RANK() OVER (PARTITION BY c ORDER BY c) c2

my bad. 


3 stars   August 11, 2004 - 3pm Central time zone
Reviewer: A reader 
Your bad what?
toe? leg?
 


5 stars Cool....   August 12, 2004 - 7am Central time zone
Reviewer: PJ 


5 stars analytic q   October 22, 2004 - 6pm Central time zone
Reviewer: A reader 
First the schema:

scott@ORA92I> drop table t1;

Table dropped.

scott@ORA92I> create table t1( catg1 varchar2(10), catg2 varchar2(10), total number );

Table created.

scott@ORA92I> 
scott@ORA92I> insert into t1( catg1, catg2, total) values( 'V1', 'T1', 5 );

1 row created.

scott@ORA92I> insert into t1( catg1, catg2, total) values( 'V1', 'T1', 6 );

1 row created.

scott@ORA92I> insert into t1( catg1, catg2, total) values( 'V1', 'T1', 9 );

1 row created.

scott@ORA92I> insert into t1( catg1, catg2, total) values( 'V2', 'T2', 10 );

1 row created.

scott@ORA92I> insert into t1( catg1, catg2, total) values( 'V3', 'T1', 11 );

1 row created.

scott@ORA92I> insert into t1( catg1, catg2, total) values( 'V4', 'T1', 1 );

1 row created.

scott@ORA92I> insert into t1( catg1, catg2, total) values( 'V5', 'T2', 2 );

1 row created.

scott@ORA92I> insert into t1( catg1, catg2, total) values( 'V6', 'T2', 3 );

1 row created.

 The catg2 can only take two values, 'T1', 'T2'.
 I want to sum the total column for catg1, catg2
 and order by their total sum for each catg1 and catg2 values. Then
 I want to list the top 3 catg1, catg2 combinations
 based on their sum values of total column.

 If there are more than 3 such combinations then I 
 club the remaining ones into a catg1 value of 'Others'.

my first cut solution is:

scott@ORA92I> select catg1, catg2, sum( total_sum )
  2  from
  3  (
  4    select case
  5          when dr > 3 then
  6            'Others'
  7          when dr <= 3 then
  8            catg1
  9            end catg1,
 10            catg2,
 11            total_sum
 12    from
 13    (
 14       select catg1, catg2, total_sum,
 15             dense_rank() over( order by total_sum desc) dr
 16       from
 17       (
 18         select catg1, catg2, sum( total ) total_sum
 19         from t1
 20         group by catg1, catg2
 21       )
 22    )
 23  )
 24  group by catg1, catg2;

CATG1      CATG2      SUM(TOTAL_SUM)
---------- ---------- --------------
V1         T1                     20
V2         T2                     10
V3         T1                     11
Others     T1                      1
Others     T2                      5


Does it look ok or do you have any better solution?

Thank you as always.
 


Followup   October 23, 2004 - 9am Central time zone:

you could skip a layer of inline view, but it looks fine as is. 
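
One way to read "skip a layer" is sketched below: an analytic function may be applied to an aggregate in the same query block, so the DENSE_RANK can sit next to the GROUP BY, and the CASE can move into the outermost query. This is an illustration of that idea, not necessarily the exact rewrite that was meant.

```sql
-- Two levels instead of three: rank the grouped sums in the same
-- query block that computes them, then fold ranks > 3 into 'Others'.
select case when dr <= 3 then catg1 else 'Others' end catg1,
       catg2,
       sum( total_sum ) total_sum
  from (
        select catg1, catg2,
               sum( total ) total_sum,
               dense_rank() over ( order by sum( total ) desc ) dr
          from t1
         group by catg1, catg2
       )
 group by case when dr <= 3 then catg1 else 'Others' end, catg2;
```

Against the sample rows this should produce the same five groups as the three-level version (row order is not guaranteed without an ORDER BY).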

5 stars thanx!   October 24, 2004 - 12pm Central time zone
Reviewer: A reader 


4 stars SQL query   November 3, 2004 - 1pm Central time zone
Reviewer: Reader from USA
I have a table which stores receipts against Purchase Orders. The users want the following o/p:
For each of the months of Jan, Feb and March 2004, provide a count of number of receipts which fall 
in each of the following Dollar value range
< $5000
Between $5000 to $9999
> $10,000

(There can be a number of receipts against one Purchase Order, so those need to be grouped 
together first)

I wrote this query using an inline view which is the UNION of 3 SQLs, one for each dollar range.
However, I am sure there is a more elegant and efficient method to do this, maybe using analytical 
functions, CASE, decode .... Appreciate your help.

Thanks 


Followup   November 5, 2004 - 10am Central time zone:

select trunc(date_col,'mm') Month, 
       count( case when amt < 5000 then 1 end ) "lt 5000",
       count( case when amt between 5000 and 9999 then 1 end ) "between 5/9k",
       count( case when amt >= 10000 then 1 end ) "10k or more"
  from t
 where date_col between :a and :b
 group by trunc(date_col,'mm')


single pass.... 
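
If the receipts really do have to be summed per Purchase Order before they are bucketed, as the question mentions, the same CASE counting can be applied on top of an inline view. A sketch, assuming a PO_NO column identifies the purchase order (that name is an assumption, as T, DATE_COL and AMT are placeholders already):

```sql
-- Sum receipts per purchase order and month first, then count the
-- per-PO totals in each dollar range. PO_NO is a hypothetical column.
select mth,
       count( case when po_amt < 5000 then 1 end )                    "lt 5000",
       count( case when po_amt >= 5000 and po_amt < 10000 then 1 end ) "between 5/10k",
       count( case when po_amt >= 10000 then 1 end )                   "10k or more"
  from (
        select po_no,
               trunc(date_col,'mm') mth,
               sum(amt) po_amt
          from t
         where date_col between :a and :b
         group by po_no, trunc(date_col,'mm')
       )
 group by mth;
```

The middle bucket is written as >= 5000 and < 10000 so that fractional amounts such as 9999.50 are not left out of every range.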

5 stars Great -   November 10, 2004 - 7am Central time zone
Reviewer: syed from UK
Tom

I have a table as follows

create table matches
( reference varchar2(9),
  endname varchar2(20),
  beginname varchar2(30),
  DOB date, 
  ni varchar2(9)
)
/


insert into matches values ('A1','SMITH','BOB',to_date('1/1/1976','dd/mm/yyyy'),'AA1234567');
insert into matches values ('A1','SMITH','TOM',to_date('1/1/1970','dd/mm/yyyy'),'AA1234568');
insert into matches values ('A2','JONES','TOM',to_date('1/1/1970','dd/mm/yyyy'),'AA1234568');
insert into matches values ('A3','JONES','TOM',to_date('1/1/1971','dd/mm/yyyy'),'AA1234569');
insert into matches values ('A4','BROWN','BRAD',to_date('1/1/1961','dd/mm/yyyy'),'AA1234570');
insert into matches values ('A4','JONES','BRAD',to_date('1/1/1961','dd/mm/yyyy'),'AA1234571');
insert into matches values ('A1','SMITH','BOB',to_date('1/1/1976','dd/mm/yyyy'),'AA1234567');
insert into matches values ('A3','JACKSON','TOM',to_date('1/1/1971','dd/mm/yyyy'),'AA1234569');
insert into matches values ('A2','JACKSON','BOB',to_date('1/1/1962','dd/mm/yyyy'),'AA1234568');
insert into matches values ('A5','JACKSON','TOM',to_date('1/1/1920','dd/mm/yyyy'),'AA1234569');
commit;

SQL> select rownum,REFERENCE,ENDNAME,BEGINNAME,DOB,NI from matches;

 ROWNUM REFERENCE ENDNAME  BEGINNAME  DOB       NI
------- --------- -------- ---------- --------- ---------
      1 A1        SMITH    BOB        01-JAN-76 AA1234567
      2 A1        SMITH    TOM        01-JAN-70 AA1234568
      3 A2        JONES    TOM        01-JAN-70 AA1234568
      4 A3        JONES    TOM        01-JAN-71 AA1234569
      5 A4        BROWN    BRAD       01-JAN-61 AA1234570
      6 A4        JONES    BRAD       01-JAN-61 AA1234571
      7 A1        SMITH    BOB        01-JAN-76 AA1234567
      8 A3        JACKSON  TOM        01-JAN-71 AA1234569
      9 A2        JACKSON  BOB        01-JAN-62 AA1234568
     10 A5        JACKSON  TOM        01-JAN-20 AA1234569

I need to show duplicates where the following columns values are the same.

a) REFERENCE, ENDNAME,BEGINNAME,DOB,NI
b) ENDNAME,BEGINNAME,NI
c) REFERENCE,NI

So, 
rownum 1 and 7 match criteria a)
rownum 8 and 10 match criteria b) 
rownum 1 and 7, rownum 3 and 9, rownum 4 and 8 match criteria c)

How can I select this data out to show number matching each criteria ?
 


Followup   November 10, 2004 - 7am Central time zone:

"How can I select this data out to show number matching each criteria ?"

is ambiguous.


If you add columns:

count(*) over (partition by reference, endname, beginname, dob, ni ) cnt1,
count(*) over (partition by endname, beginname, ni) cnt2,
count(*) over (partition by reference,ni) cnt3


it'll give you the "dup count" by each partition -- technically showing you the "number matching 
each criteria" 
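
Put together against the MATCHES table, a sketch could look like this. The outer filter, which keeps only rows that have a duplicate under at least one of the three rules, is an added assumption about what is wanted.

```sql
-- cnt1/cnt2/cnt3 = number of rows sharing the same key under
-- criteria a), b) and c); a value > 1 means the row has a duplicate.
select *
  from (
        select m.*,
               count(*) over (partition by reference, endname, beginname, dob, ni) cnt1,
               count(*) over (partition by endname, beginname, ni) cnt2,
               count(*) over (partition by reference, ni) cnt3
          from matches m
       )
 where cnt1 > 1
    or cnt2 > 1
    or cnt3 > 1;
```

Dropping the WHERE clause shows every row with its three counts, which may be easier to read while checking the rules.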

5 stars analytics problem   November 19, 2004 - 9am Central time zone
Reviewer: David from United Kingdom
Am newish to analytic functions and have hit problem as follows:-

create table a
(accno         number(8)     not null,
 total_paid    number(7,2)   not null)
/

create table b
(accno         number(8)     not null,
 due_date      date          not null,
 amount_due    number(7,2)   not null)
/

insert into a values (1, 1000);
insert into a values (2, 1500);
insert into a values (3, 2000);
insert into a values (4, 3000);

insert into b values (1, '01-oct-04', 1000);
insert into b values (1, '01-jan-05', 900);
insert into b values (1, '01-apr-05', 700);

insert into b values (2, '01-oct-04', 1000);
insert into b values (2, '01-jan-05', 900);
insert into b values (2, '01-apr-05', 700);

insert into b values (3, '01-oct-04', 1000);
insert into b values (3, '01-jan-05', 900);
insert into b values (3, '01-apr-05', 700);

insert into b values (4, '01-oct-04', 1000);
insert into b values (4, '01-jan-05', 900);
insert into b values (4, '01-apr-05', 700);

If I then do this query...

SQL> select a.accno,
  2         a.total_paid,
  3         b.due_date,
  4         b.amount_due,
  5         case
  6  when sum(b.amount_due)
  7  over (order by to_date(b.due_date, 'dd-mon-rr')) - a.total_paid <= 0
  8  then 0
  9  when sum(b.amount_due)
 10  over (order by to_date(b.due_date, 'dd-mon-rr')) - a.total_paid < b.amount_due
 11  then sum(b.amount_due)
 12  over (order by to_date(b.due_date, 'dd-mon-rr')) - a.total_paid
 13  when sum(b.amount_due)
 14  over (order by to_date(b.due_date, 'dd-mon-rr')) - a.total_paid >= b.amount_due
 15  and a.total_paid >= 0
 16  then b.amount_due
 17  end to_pay
 18  from a,b
 19  where a.accno = b.accno
 20  order by a.accno,
 21           to_date(b.due_date, 'dd-mon-rr')
 22  /

     ACCNO TOTAL_PAID DUE_DATE  AMOUNT_DUE     TO_PAY
---------- ---------- --------- ---------- ----------
         1       1000 01-OCT-04       1000       1000
         1       1000 01-JAN-05        900        900
         1       1000 01-APR-05        700        700
         2       1500 01-OCT-04       1000       1000
         2       1500 01-JAN-05        900        900
         2       1500 01-APR-05        700        700
         3       2000 01-OCT-04       1000       1000
         3       2000 01-JAN-05        900        900
         3       2000 01-APR-05        700        700
         4       3000 01-OCT-04       1000       1000
         4       3000 01-JAN-05        900        900
         4       3000 01-APR-05        700        700

12 rows selected.

...TO_PAY does not give what I was expecting. But if I do by individual accno I get what I'm 
after:-

SQL> select a.accno,
  2         a.total_paid,
  3         b.due_date,
  4         b.amount_due,
  5         case
  6  when sum(b.amount_due)
  7  over (order by to_date(b.due_date, 'dd-mon-rr')) - a.total_paid <= 0
  8  then 0
  9  when sum(b.amount_due)
 10  over (order by to_date(b.due_date, 'dd-mon-rr')) - a.total_paid < b.amount_due
 11  then sum(b.amount_due)
 12  over (order by to_date(b.due_date, 'dd-mon-rr')) - a.total_paid
 13  when sum(b.amount_due)
 14  over (order by to_date(b.due_date, 'dd-mon-rr')) - a.total_paid >= b.amount_due
 15  and a.total_paid >= 0
 16  then b.amount_due
 17  end to_pay
 18  from a,b
 19  where a.accno = b.accno
 20  and a.accno = &accno
 21  order by a.accno,
 22           to_date(b.due_date, 'dd-mon-rr')
 23  /
Enter value for accno: 1
old  20: and a.accno = &accno
new  20: and a.accno = 1

     ACCNO TOTAL_PAID DUE_DATE  AMOUNT_DUE     TO_PAY
---------- ---------- --------- ---------- ----------
         1       1000 01-OCT-04       1000          0
         1       1000 01-JAN-05        900        900
         1       1000 01-APR-05        700        700

3 rows selected.

SQL> /
Enter value for accno: 2
old  20: and a.accno = &accno
new  20: and a.accno = 2

     ACCNO TOTAL_PAID DUE_DATE  AMOUNT_DUE     TO_PAY
---------- ---------- --------- ---------- ----------
         2       1500 01-OCT-04       1000          0
         2       1500 01-JAN-05        900        400
         2       1500 01-APR-05        700        700

3 rows selected.

SQL> /
Enter value for accno: 3
old  20: and a.accno = &accno
new  20: and a.accno = 3

     ACCNO TOTAL_PAID DUE_DATE  AMOUNT_DUE     TO_PAY
---------- ---------- --------- ---------- ----------
         3       2000 01-OCT-04       1000          0
         3       2000 01-JAN-05        900          0
         3       2000 01-APR-05        700        600

3 rows selected.

SQL> /
Enter value for accno: 4
old  20: and a.accno = &accno
new  20: and a.accno = 4

     ACCNO TOTAL_PAID DUE_DATE  AMOUNT_DUE     TO_PAY
---------- ---------- --------- ---------- ----------
         4       3000 01-OCT-04       1000          0
         4       3000 01-JAN-05        900          0
         4       3000 01-APR-05        700          0

3 rows selected.

What is needed for first query above to work?

cheers,
David 


Followup   November 19, 2004 - 11am Central time zone:

ops$tkyte@ORA9IR2> select a.accno,
  2         a.total_paid,
  3         b.due_date,
  4         b.amount_due,
  5         case
  6  when sum(b.amount_due)
  7  over (partition by a.accno order by to_date(b.due_date, 'dd-mon-rr')) - a.total_paid <= 0
  8  then 0
  9  when sum(b.amount_due)
 10  over (partition by a.accno order by to_date(b.due_date, 'dd-mon-rr')) - a.total_paid < b.amount_due
 11  then sum(b.amount_due)
 12  over (partition by a.accno order by to_date(b.due_date, 'dd-mon-rr')) - a.total_paid
 13  when sum(b.amount_due)
 14  over (partition by a.accno order by to_date(b.due_date, 'dd-mon-rr')) - a.total_paid >= b.amount_due
 15  and a.total_paid >= 0
 16  then b.amount_due
 17  end to_pay
 18  from a,b
 19  where a.accno = b.accno
 20  order by a.accno,
 21           to_date(b.due_date, 'dd-mon-rr')
 22  /
 
     ACCNO TOTAL_PAID DUE_DATE  AMOUNT_DUE     TO_PAY
---------- ---------- --------- ---------- ----------
         1       1000 01-OCT-04       1000          0
         1       1000 01-JAN-05        900        900
         1       1000 01-APR-05        700        700
         2       1500 01-OCT-04       1000          0
         2       1500 01-JAN-05        900        400
         2       1500 01-APR-05        700        700
         3       2000 01-OCT-04       1000          0
         3       2000 01-JAN-05        900          0
         3       2000 01-APR-05        700        600
         4       3000 01-OCT-04       1000          0
         4       3000 01-JAN-05        900          0
         4       3000 01-APR-05        700          0
 
12 rows selected.
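
The only change from the original query is the PARTITION BY a.accno, which restarts the running total for each account instead of accumulating it across all accounts. For what it is worth, the same arithmetic can be sketched more compactly with LEAST and GREATEST. This assumes TOTAL_PAID is never negative or NULL (the CASE version returns NULL for a negative TOTAL_PAID in its last branch, this one does not), and it orders by the DATE column directly rather than through TO_DATE.

```sql
-- to_pay = the part of this instalment not yet covered by total_paid:
--   running total due minus amount paid, floored at 0,
--   capped at this row's amount_due.
select a.accno,
       a.total_paid,
       b.due_date,
       b.amount_due,
       least( b.amount_due,
              greatest( sum(b.amount_due)
                          over (partition by a.accno order by b.due_date)
                        - a.total_paid, 0 ) ) to_pay
  from a, b
 where a.accno = b.accno
 order by a.accno, b.due_date;
```

For account 2, for example, the running totals are 1000, 1900 and 2600 against 1500 paid, giving 0, 400 and 700, the same values as in the output above.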
 
 

5 stars excellent   November 19, 2004 - 12pm Central time zone
Reviewer: David from UK
many thanks 


5 stars Limitation of Analytic Functions   December 16, 2004 - 4am Central time zone
Reviewer: Nilanjan Ray from India
I am using the following view
create or replace view vw_history as
select
txm_dt,s_key,s_hist_slno,cm_key,burst_key,cm_channel_key
,(lag(s_hist_slno,1,0) over(partition by s_key,s_hist_slno order by s_key,s_hist_slno)) prv_hist_slno
from adc_history 

The following SQL statement invariably does a full table scan on 112,861,91 rows of ADC_HISTORY and 
runs for 20-25 mins.

select *
from vw_history
where txm_dt between to_date('01/01/2002','dd/mm/yyyy') and to_date('01/01/2002','dd/mm/yyyy');

The query returns 4200 rows. ADC_HISTORY has 112,861,91 rows. I have the following indexes: 
ADC_HISTORY_IDX8 on txm_dt and ADC_HISTORY_IDX1 on spot_key columns. Both have good selectivities.

But when the required query is run without the view it properly uses the index ADC_HISTORY_IDX8

select
txm_dt,s_key,s_hist_slno,cm_key,burst_key,cm_channel_key
,(lag(s_hist_slno,1,0) over(partition by s_key,s_hist_slno order by s_key,s_hist_slno)) prv_hist_slno
from adc_history 
 
I had raised a TAR and it says: this is the expected behaviour, "PREDICATES ARE NOT PUSHED IN THE 
VIEW IF ANY ANALYTIC FUNCTIONS ARE USED"

Is there any way to work around this limitation? I just cannot think of the painful situation if I 
am unable to use views with analytics!!!!

Your help is absolutely necessary. Thanks in advance  


Followup   December 16, 2004 - 8am Central time zone:

guess what -- your two queries return different answers..


did you consider that?  did you check that?  

they are TOTALLY DIFFERENT.  Analytics are applied after predicates.  The view -- it has no 
predicate.  The query -- it has a predicate.  You'll find that you have DIFFERENT result sets.

don't you see that as a problem?

It is not that you are "unable to use views"

It is that "when I use a view, I get answer 1, when I do not use a view, I get answer 2"

which answer is technically correct here?


Think about it.


consider this example (using RBO just to make it so that "if an index could be used it would" to 
stress the point):


ops$tkyte@ORA9IR2> create table emp as select * from scott.emp;
 
Table created.
 
ops$tkyte@ORA9IR2> create index job_idx on emp(job);
 
Index created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create or replace view v
  2  as
  3  select ename, sal, job,
  4         sum(sal) over (partition by job) sal_by_job,
  5             sum(sal) over (partition by deptno) sal_by_deptno
  6    from emp
  7  /
 
View created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> set autotrace on explain
ops$tkyte@ORA9IR2> select *
  2    from v
  3   where job = 'CLERK'
  4  /
 
ENAME             SAL JOB       SAL_BY_JOB SAL_BY_DEPTNO
---------- ---------- --------- ---------- -------------
MILLER           1300 CLERK           4150          8750
JAMES             950 CLERK           4150          9400
SMITH             800 CLERK           4150         10875
ADAMS            1100 CLERK           4150         10875
 
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=RULE
   1    0   VIEW OF 'V'
   2    1     WINDOW (SORT)
   3    2       WINDOW (SORT)
   4    3         TABLE ACCESS (FULL) OF 'EMP'
 
 
so, one might ask "well - hey, I've got that beautiful index on JOB, I said "where job = 
'CLERK'", what's up with that full scan?"

in fact, when I do it "right" -- without the evil view:

 
ops$tkyte@ORA9IR2> select ename, sal, job,
  2         sum(sal) over (partition by job) sal_by_job,
  3             sum(sal) over (partition by deptno) sal_by_deptno
  4    from emp
  5   where job = 'CLERK'
  6  /
 
ENAME             SAL JOB       SAL_BY_JOB SAL_BY_DEPTNO
---------- ---------- --------- ---------- -------------
MILLER           1300 CLERK           4150          1300
SMITH             800 CLERK           4150          1900
ADAMS            1100 CLERK           4150          1900
JAMES             950 CLERK           4150           950
 
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=RULE
   1    0   WINDOW (SORT)
   2    1     WINDOW (SORT)
   3    2       TABLE ACCESS (BY INDEX ROWID) OF 'EMP'
   4    3         INDEX (RANGE SCAN) OF 'JOB_IDX' (NON-UNIQUE)
 
it very rapidly uses my index !!!   stupid views...

but wait.

what's up with SAL_BY_DEPTNO, that appears to be wrong... hmmm, what happened?

What happened was we computed the sal_by_deptno in the query without the view AFTER doing "where job 
= 'CLERK'"


YOU are doing your LAG() analysis AFTER applying the predicate.  Your lags in your query without 
the view -- they are pretty much "not accurate"


Note that when the predicate CAN be pushed:
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select ename, sal, sal_by_job
  2    from v
  3   where job = 'CLERK'
  4  /
 
ENAME             SAL SAL_BY_JOB
---------- ---------- ----------
SMITH             800       4150
ADAMS            1100       4150
JAMES             950       4150
MILLER           1300       4150
 
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=RULE
   1    0   VIEW OF 'V'
   2    1     WINDOW (BUFFER)
   3    2       TABLE ACCESS (BY INDEX ROWID) OF 'EMP'
   4    3         INDEX (RANGE SCAN) OF 'JOB_IDX' (NON-UNIQUE)
 

it most certainly is.  here the predicate can safely be pushed -- since the analytic is computed 
"by job", a predicate on "job" can be applied FIRST and then the analytic computed.  

When pushing would change the answer -- we cannot do it.

When pushing the predicate would not change the answer -- we do it.


This is not a 'limitation', this is about "getting the right answer"
 
 
ops$tkyte@ORA9IR2> set autotrace off
ops$tkyte@ORA9IR2> alter session set optimizer_mode = choose;
 
Session altered.
 

 

5 stars Great!!!   December 17, 2004 - 12pm Central time zone
Reviewer: Nilanjan Ray from India
Simply amazing explanation. Cleared my doubts still further. One of the best explanations, in simple 
concise terms, I have seen on "Ask Tom". You know what, people should take enough caution and learn 
lessons from you before making misleading statements like "...LIMITATIONS...". In your terms yet 
again "Analytics Rock".

Regards 


4 stars Using analytical function, LEAD, LAG   December 24, 2004 - 9am Central time zone
Reviewer: Praveen from Bangalore
Hi Tom,

    Analytical function, LEAD (or LAG) accepts the offset parameter as an integer which is a count 
of rows to be skipped from the current row before accessing the leading/lagging row. What if I want 
to access leading rows based on the value of column of current row, like a function applied to the 
column value of current row to access the leading row.

As an example: I have a table 

create table t(id integer, dt date);

For each id, start with the first record, after ordering by dt ASC. Get the next record where dt = 
10 min + first_row.dt. Then the next record where dt = 20 min + first_row.dt, and so on. Each time, 
the offset is cumulatively increased by 10 min.

If I don't get an exact match from the next record (i.e. next_row.dt <> first_row.dt + 10 min, say), 
then we select the row closest to the expected record, but lying within +/-10 seconds.

insert into t values (1, to_date('12/20/2004 00:00:00', 'mm/dd/yyyy hh24:mi:ss')); --Selected.

insert into t values (1, to_date('12/20/2004 00:05:00', 'mm/dd/yyyy hh24:mi:ss'));

insert into t values (1, to_date('12/20/2004 00:09:55', 'mm/dd/yyyy hh24:mi:ss'));

insert into t values (1, to_date('12/20/2004 00:10:00', 'mm/dd/yyyy hh24:mi:ss')); --Selected.

insert into t values (1, to_date('12/20/2004 00:15:00', 'mm/dd/yyyy hh24:mi:ss'));

insert into t values (1, to_date('12/20/2004 00:19:54', 'mm/dd/yyyy hh24:mi:ss')); --Not selected.

insert into t values (1, to_date('12/20/2004 00:19:55', 'mm/dd/yyyy hh24:mi:ss')); --Selected.

insert into t values (1, to_date('12/20/2004 00:25:00', 'mm/dd/yyyy hh24:mi:ss'));

insert into t values (1, to_date('12/20/2004 00:30:05', 'mm/dd/yyyy hh24:mi:ss')); --Selected.

insert into t values (1, to_date('12/20/2004 00:30:06', 'mm/dd/yyyy hh24:mi:ss')); --Not Selected.

insert into t values (1, to_date('12/20/2004 00:35:00', 'mm/dd/yyyy hh24:mi:ss'));

insert into t values (1, to_date('12/20/2004 00:39:55', 'mm/dd/yyyy hh24:mi:ss')); --Either this or 
below record is selected. 

insert into t values (1, to_date('12/20/2004 00:40:05', 'mm/dd/yyyy hh24:mi:ss')); --Either this or 
above record is selected.

My output would be:
id    dt
-----------
1    12/20/2004 00:00:00 AM
1    12/20/2004 00:10:00 AM  --Exactly matches first_row.dt + 10min
1    12/20/2004 00:19:55 AM  --Closest to first_row.dt + 20min +/- 10sec
1    12/20/2004 00:30:05 AM  --Closest to first_row.dt + 30min +/- 10sec
1    12/20/2004 00:39:55 AM OR 12/20/2004 00:40:05 AM --Closest to first_row.dt + 40min +/- 10sec

The method I followed, after failing with LEAD, is:

Step#1
------
Get a set of dt values at cumulative 10 min steps, starting from the dt value of the first 
row (after rounding to the nearest minute that is a multiple of 10).
In this example I will get a subset:

12/20/2004 00:00:00 AM
12/20/2004 00:10:00 AM
12/20/2004 00:20:00 AM
12/20/2004 00:30:00 AM
12/20/2004 00:40:00 AM

This query will do it:

SELECT t1.id,
         (  min_dt -   MOD ((ROUND (min_dt, 'mi') - ROUND (min_dt, 'hh')) * 24 * 60, 10) / (24 * 
60)) + (ROWNUM - 1) * 10 / (24 * 60) dt_rounded
  FROM (SELECT   id, MIN (dt) min_dt,
                 ROUND ((MAX (dt) - MIN (dt)) * 24 * 60 / 10) max_rows
            FROM t
           WHERE id = 1
        GROUP BY id) t1, t
 WHERE ROWNUM <= max_rows + 1     

Step#2:
-------
This subquery is joined with table t to get only those records from t which is either equal to the 
dts in the resultset returned by the subquery or fall within the range 10min +/-10sec (not closest 
only, but all).

SELECT t.id, dt_rounded, ABS (t.dt - dt_rounded) * 24 * 60 * 60 dt_diff_in_sec
FROM t, 
    (SELECT t1.id,
         (  min_dt -   MOD ((ROUND (min_dt, 'mi') - ROUND (min_dt, 'hh')) * 24 * 60, 10) / (24 * 
60)) + (ROWNUM - 1) * 10 / (24 * 60) dt_rounded
     FROM (SELECT   id, MIN (dt) min_dt,
                 ROUND ((MAX (dt) - MIN (dt)) * 24 * 60 / 10) max_rows
            FROM t
           WHERE id = 1
        GROUP BY id) t1, t
     WHERE ROWNUM <= max_rows + 1) t2     
WHERE t.id = 1          
AND ABS (t.dt - dt_rounded) * 24 * 60 * 60 <= 10
ORDER BY t.id, dt_rounded, dt_diff_in_sec;

I agree, this resultset will include duplicate records which I need to remove procedurally, while 
looping through the cursor; the order by clause simplifies this.

Now you might have guessed the problem. If table t contains more than 1000 records, the query makes 
me wait at least 2 min! And that when I am planning to load at least 70,000 records! 

I wrote a procedure which handles the situation a little better. But I don't know if an analytical 
query can help me win back the performance. I could do it if LEAD had the functionality I 
mentioned in the first paragraph.  Do you have any hints?

Thanks and regards

Praveen 


Followup   December 24, 2004 - 9am Central time zone:

you'd be looking at first_value with range windows, not lag and lead in this case.

 

3 stars Windowing clause and range function.   December 25, 2004 - 1pm Central time zone
Reviewer: Praveen from Bangalore
Hi Tom,
    Thank you for the suggestion. I am not very familiar with analytical queries. I tried 
to follow your advice but was unable to even get started. I am stuck at the first step itself - 
specifying the range in the windowing clause. In the windowing clause, we specify an integer to get 
the preceding rows based on the current column value (CLARK's example-Page:556, Analytical 
Functions). 

In my above example I wrote a query which contains:
 
FIRST_VALUE(id)
OVER (ORDER BY dt DESC
      RANGE 10 PRECEDING)

10, in the windowing clause, will give me the records that fall within 10 days preceding the current 
row. But I need the records from the preceding 10 minutes. At the same time I also need all those 
records that lie within +/- 10 sec, if a record exactly 10 minutes later is not found (please see 
the description of the problem given in the previous question).

Kindly give me a clearer picture of the windowing clause.
Also, how would you approach the above problem?

Thanks and regards

Praveen 


Followup   December 26, 2004 - 12pm Central time zone:

do you have Expert One on One Oracle?   I have extensive examples in there.


range 10 = 10 days.

range 10/24 = 10 hours

range 10/24/60 = 10 minutes......
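
As a rough sketch of those offsets applied to the t(id, dt) table from the earlier post (the column aliases are made up for illustration): a RANGE window ordered by a DATE takes its offset in days, so a 10 minute look-back could be written like this.

```sql
-- RANGE offsets on a DATE ordering column are expressed in days,
-- so 10/24/60 means 10 minutes.
select id,
       dt,
       -- earliest dt among the rows at most 10 minutes before this row
       first_value(dt) over (partition by id
                             order by dt
                             range 10/24/60 preceding) first_dt_in_window,
       -- number of rows (this one included) inside that 10 minute window
       count(*) over (partition by id
                      order by dt
                      range 10/24/60 preceding) rows_in_window
  from t
 order by id, dt;
```

With no explicit upper bound, the window ends at the current row, so every row sees itself plus whatever came in the 10 minutes before it.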


 

3 stars I do have Expert One on One   December 26, 2004 - 2pm Central time zone
Reviewer: Praveen from Bangalore
Hi Tom,

   I got my first glimpse into analytical queries through your book. Although I had attempted 
to learn them through the Oracle documentation a couple of times earlier, I was never able to write a 
decent query using analytical functions. Now, after spending a few hours with your book, I can see 
that these functions are not as complex as I thought earlier. 

The 'hiredate' example you have given in the book is calculating in terms of days. (Pg:555)

"select ename, sal, hiredate, hiredate-100 window_top,
       first_value(ename)
       over(order by hiredate asc
       range 100 preceding) ename_prec,...."

I got the hint from your follow-up. I should have to think a little myself.

Thankyou Tom,

Praveen. 


5 stars   December 26, 2004 - 5pm Central time zone
Reviewer: A reader 
Tom,

Any dates when you would be releasing your book on Analytic?

Thanks. 


Followup   December 26, 2004 - 6pm Central time zone:

doing a 2nd edition of Expert One on One Oracle now -- not on the list yet. 

5 stars Great answer!   December 27, 2004 - 2am Central time zone
Reviewer: Shimon Tourgeman 
Dear Tom,
Could you please tell us when you  are going to publish the next edition of your books, covering 
9iR2 and maybe 10g, as you stated here?

Merry Christmas and a Happy New Year!
Shimon.
 


Followup   December 27, 2004 - 10am Central time zone:

sometime in 2005, but not the first 1/2 :) 

3 stars Using range windows   January 3, 2005 - 8am Central time zone
Reviewer: Praveen from Bangalore
Hi Tom,

Please allow me to explain the problem again which you had 
followed up earlier (Please refer: "Using analytical 
function, LEAD, LAG"). In the table t(id integer, dt date) 
I have records which only differ by seconds ('dt' column). 
Could you please help me to write a query to create windows 
such that each window groups records based on the 
expression 590 <= dt_1 <= 610 (590 & 610 are date 
difference between first record and current record in 
seconds and dt1 is the 'dt' column value of first record in 
each window after ordering by 'id' and 'dt' ASC).
The idea is to find a record following the first record 
which leads by 10 minutes. If exact match is not found 
apply a tolerance of +/-10 seconds. Once the nearest match 
is found (if multiple matches are found, select any), start 
from the next record and repeat the process. (Please see 
the scripts I had given earlier). 

In your follow up, you had suggested the use of 
first_value() analytical function with range windows. But 
it looks like it is pretty difficult to generate the kind 
of windows I specified above. And in your book, examples of
such complex nature were not given (pardon me for being 
critical).

Your answer will help me to get a deeper and practical 
understanding of analytical functions while at the same 
time may help us to bring down a 12 hour procedure to less 
than 5 hours.

Thanks and regards


Praveen 


Followup   January 3, 2005 - 9am Central time zone:

no idea what 590 is.  days? hours? seconds?

sorry - this doesn't compute to me.

590 <= dt_1 <= 610???

 

5 stars Delete Records Older Than 90 Days While Keeping Max   January 3, 2005 - 10am Central time zone
Reviewer: Mac 
There is a DATE column in a table. I need to delete all records older than 90 days -- except if the 
newest record for a unique key happens to be older than 90 days, I want to keep it and delete the 
prior records for that key value.

How? 


Followup   January 3, 2005 - 10am Central time zone:

if the "newest record for a unique key"

if the key is unique.... then the date column is the only thing to be looked at?

that is, if the key is unique, then the oldest record is the newest record is in fact the only 
record.... 

5 stars Oops, but   January 3, 2005 - 11am Central time zone
Reviewer: A reader 
Sorry, forgot to mention that the DATE column is a part of the unique key. 
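
One way the clarified requirement could be approached is sketched below. The thread never names the table or its columns, so t, key_col and dt are placeholders: key_col stands for the rest of the unique key and dt for the DATE column.

```sql
-- Hypothetical names: t(key_col, dt), with (key_col, dt) unique.
-- Delete rows older than 90 days, but never the newest row of a key.
delete from t
 where rowid in
       ( select rid
           from ( select rowid rid,
                         dt,
                         -- newest date recorded for this key
                         max(dt) over (partition by key_col) max_dt
                    from t )
          where dt < sysdate - 90   -- older than 90 days
            and dt < max_dt );      -- and not the newest row for its key
```

Because dt is part of the unique key, exactly one row per key carries max_dt, so each key keeps at least that row even when it is older than 90 days.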


3 stars Sorry, I went a bit fast...   January 3, 2005 - 2pm Central time zone
Reviewer: Praveen from Bangalore
Hi Tom,
    
Sorry, I didn't explain properly. 
    
590 = (10 minutes * 60) seconds - 10 seconds
610 = (10 minutes * 60) seconds + 10 seconds
    
Here I am looking for a record (say rn) exactly 
600 sec (10 min) later than the first record in 
the range window. If I don't get an exact match 
I try to find the record which is closest to rn, 
but lies within a range which is 10 seconds less 
than or more than rn.

And the condition
         
"590 <= dt_1 <= 610" tries to eliminate all other 
records inside the range window that does not follow 
the above rule.
         
 dt_1 is the dt column value of any row following the 
 first row in a given range window, such that the 
 difference between dt_1 and dt of first row is between
 590 seconds and 610 seconds. I am interested in only
 one record which lies closest to 600 seconds.
     
 I hope, the picture is more clear to you now. As an 
 example, 
     
 id   dt
 -----------------------------
 1    12/20/2004 00:00:00 AM   --Range window #1
 1    12/20/2004 00:09:55 AM
 1    12/20/2004 00:10:00 AM   --Selected (Closest to 12/20/2004 00:10:00 AM)

 ............................
 1    12/20/2004 00:10:10 AM   --Range window #2  
 1    12/20/2004 00:19:55 AM   --Selected (Closest to 12/20/2004 00:20:00 AM)
 1    12/20/2004 00:20:55 AM   
 ............................
 1    12/20/2004 00:20:55 AM   --Range window #3  
 1    12/20/2004 00:25:00 AM   --Nothing to select
 1    12/20/2004 00:29:10 AM   --Nothing to select      
 ...........................
 1    12/20/2004 00:30:05 AM   --Range window #4
 1    12/20/2004 00:39:55 AM   --Either one is selected
 1    12/20/2004 00:40:05 AM   --Either one is selected
 -----------------------------

Thanks and regards

Praveen     
          
          


Followup   January 3, 2005 - 10pm Central time zone:

that is first_value, last_value with a range window and the time range is

N * 1/24/60/60 -- for N seconds. 
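
A hedged sketch of what such a window could look like on the t(id, dt) table from the earlier posts, using the 590 and 610 second bounds discussed above (the aliases are invented for illustration):

```sql
-- For every row, look ahead to the rows that fall between 590 and 610
-- seconds later; N/24/60/60 expresses N seconds as a fraction of a day.
select id,
       dt,
       first_value(dt) over (partition by id
                             order by dt
                             range between 590/24/60/60 following
                                       and 610/24/60/60 following) first_in_tolerance,
       last_value(dt)  over (partition by id
                             order by dt
                             range between 590/24/60/60 following
                                       and 610/24/60/60 following) last_in_tolerance
  from t
 order by id, dt;
```

Both columns come back NULL when no row lies inside the tolerance band. Note that this looks ahead from every row, not only from the row that starts each 10 minute step, so picking the chain of selected rows would still need a further pass.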

4 stars How to mimic Oracle 10g LAST_VALUE(... IGNORE NULLS)?   January 13, 2005 - 3pm Central time zone
Reviewer: jayaramj@quinnox.com 
Hi Tom,

In answer to the question 'How to mimic Ora10g LAST_VALUE(... IGNORE NULLS)?' from reviewer Sergey 
(from Norway) in this post you have proposed the following solution:

ops$tkyte@ORA10G> select t,
  2         sign_v,
  3             v,
  4             substr( max(data) over (order by t), 7 ) v2
  5    from (
  6  SELECT TD.T,
  7         SIGN(NVL(TV.V, 0)) sign_v,
  8          NVL(TV.V, LAST_VALUE(TV.V IGNORE NULLS) OVER ( ORDER BY TD.T )) V,
  9           case when tv.v is not null
 10                 then to_char( row_number() 
                                  over (order by td.t), 'fm000000' ) || tv.v
 11                    end data
 12      FROM TD, TV
 13      WHERE TV.T(+) = TD.T
 14          )
 15   ORDER BY T
 16      ;

The problem is that this solution converts the data type of the column (in this case column TV.V) 
to a string (V2 in the result is a string). The result would then need to be converted back to the 
original data type. 
It is best to avoid such data type conversion. Is there a solution to mimic Oracle 10g 
LAST_VALUE(... IGNORE NULLS) in Oracle 9i without the datatype conversion?
 


Followup   January 13, 2005 - 3pm Central time zone:

encode the date as a string using to_char( v, 'yyyymmddhh24miss' ) and then substr it back out 
-- to_date( substr(...), 'yyyymmddhh24miss' ) 
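
Applied to the TD/TV query quoted above, that suggestion might look like the sketch below. It assumes TV.V is a DATE column, which the thread does not confirm.

```sql
-- Assumes tv.v is a DATE. The 6 digit row number prefix makes max() pick
-- the most recent non-null value; the date travels as a fixed format string
-- and is converted back once the prefix has been cut off.
select t,
       to_date( substr( max(data) over (order by t), 7 ),
                'yyyymmddhh24miss' ) v2
  from (
select td.t,
       case when tv.v is not null
            then to_char( row_number() over (order by td.t), 'fm000000' )
                 || to_char( tv.v, 'yyyymmddhh24miss' )
        end data
  from td, tv
 where tv.t(+) = td.t
       )
 order by t;
```

The row number prefix is 6 characters wide, which is why the substr starts at position 7.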

2 stars How to mimic Oracle 10g LAST_VALUE(... IGNORE NULLS)?   January 14, 2005 - 12am Central time zone
Reviewer: Jay 
In response to your post above - Taking care of dates (for datatype conversion) is not complex 
(though timestamp variants would require a different format string). Object columns are a different 
story altogether. These cannot be easily converted to strings. Is there a better solution that does 
not require datatype conversion (and hence does not require any knowledge of the column datatype in 
this SQL). 


Followup   January 14, 2005 - 8am Central time zone:

upgrade to 10g. 

4 stars find prior collect_date to the max collect_date for each customer   January 25, 2005 - 4pm Central time zone
Reviewer: JANE 
Hello,Tom!
I work in ORACLE 8I
I have table with 2 columns:cstmr_no,collect_date
CREATE TABLE CSTMR_dates
(
  CSTMR_NO             NUMBER(8)                NOT NULL,
  COLLECT_DATE  DATE                     NOT NULL);
insert into cstmr_dates
values(18,to_date('01/02/04','dd/mm/yy');
insert into cstmr_dates
values(18,to_date('01/03/04','dd/mm/yy');
insert into cstmr_dates
values(18,to_date('01/05/04','dd/mm/yy');
insert into cstmr_dates
values(248,to_date('01/11/04','dd/mm/yy');
insert into cstmr_dates
values(248,to_date('01/02/04','dd/mm/yy');
insert into cstmr_dates
values(248,to_date('01/03/04','dd/mm/yy');
How can I write this query using an analytical
function instead:

select cstmr_no,max(collect_date) from 
CSTMR_dates
where  collect_date<(select max(RETURN_COLLECT_DATE)
group by cstmr_no 

In production i have thousands records in the table.                                                
       THANK A LOT                                          
JANE 


Followup   January 25, 2005 - 6pm Central time zone:

no idea what "return_collect_date" is. or where it comes from.

the sql is not sql... 

4 stars Mistake:return_collect_date is a collect_date   January 26, 2005 - 2am Central time zone
Reviewer: JANE 
Thank you for answer
JANE 


Followup   January 26, 2005 - 8am Central time zone:

but this sql:

select cstmr_no,max(collect_date) from 
CSTMR_dates
where  collect_date<(select max(COLLECT_DATE)
group by cstmr_no 

is still not sql and I don't know if you want to 

a) delete all old data BY CSTMR_NO (eg: keep just the record with the max(collect_date) BY CSTMR_NO)

b) delete all data such that the collect_date is not equal to the max(collect_date)


I cannot suggest a way to rewrite an invalid sql query. 

4 stars No,i want to do the next:   January 26, 2005 - 9am Central time zone
Reviewer: A reader 
I just have to present the data without deleting anything
For each cstmr i have to see:
cstmr_no max(collect_date) last prior date to max
======== ================= ====================== 
18       01/05/04          01/03/04     
248      01/11/04          01/03/04     

insert into cstmr_dates
values(18,to_date('01/02/04','dd/mm/yy');
insert into cstmr_dates
values(18,to_date('01/03/04','dd/mm/yy');
insert into cstmr_dates
values(18,to_date('01/05/04','dd/mm/yy');
insert into cstmr_dates
values(248,to_date('01/11/04','dd/mm/yy');
insert into cstmr_dates
values(248,to_date('01/02/04','dd/mm/yy');
insert into cstmr_dates
values(248,to_date('01/03/04','dd/mm/yy');
 


Followup   January 26, 2005 - 9am Central time zone:

wow, how we got from:

select cstmr_no,max(collect_date) from 
CSTMR_dates
where  collect_date<(select max(RETURN_COLLECT_DATE)
group by cstmr_no 

to this, well -- just "wow".  horse of a very different color.


I have to sort of guess -- maybe I'll get it right -- you want 

a) every cstmr_no, 
b) the last two dates recorded for them.


well, after editing your inserts to make them become actual sql that can run.... (you don't really 
use YY in real life do you? please please say "no, that was a mistake...")

ops$tkyte@ORA9IR2> select cstmr_no,
  2         max( decode(rn,1,collect_date) ) d1,
  3         max( decode(rn,2,collect_date) ) d2
  4    from (
  5  select cstmr_no,
  6         collect_date,
  7             row_number() over (partition by cstmr_no order by collect_date desc nulls last) rn
  8    from cstmr_dates
  9         )
 10   where rn <= 2
 11   group by cstmr_no
 12  /
 
  CSTMR_NO D1        D2
---------- --------- ---------
        18 01-MAY-04 01-MAR-04
       248 01-NOV-04 01-MAR-04
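
The same report could also be sketched with LAG instead of pivoting two ranked rows. This is an alternative illustration, not the query that was run above, and the aliases max_date and prior_date are invented here.

```sql
-- For each customer keep only the newest row (rn = 1) and carry along
-- the collect_date of the row just before it in date order.
select cstmr_no,
       collect_date max_date,
       prior_date
  from (
select cstmr_no,
       collect_date,
       lag(collect_date) over (partition by cstmr_no
                               order by collect_date) prior_date,
       row_number() over (partition by cstmr_no
                          order by collect_date desc) rn
  from cstmr_dates
       )
 where rn = 1;
```

A customer with a single row would show a NULL prior_date, the same as the second column in the DECODE version.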
 

 

3 stars Lead/Lag and Indexes   February 22, 2005 - 6pm Central time zone
Reviewer: Rob H from Winnipeg Canada
We are using the Lead and Lag functions and I have run into an issue of Index usage.

let's say I have 2 tables

select customer_num, prod_id, date_sold, total_sales from sales_table_NA

and 

select customer_num, prod_id, date_sold, total_sales from sales_table_EUR

if I do a

create view eur_sales as
select customer_account, prod_id, trunc(sales_date,'mon') month_purch,
sum(total_sales) sales_current, lead(sum(total_sales),1) over(partition by customer_account, 
prod_id order by trunc(sales_date,'mon') desc) sales_last
from sales_table_EUR
group by customer_account, prod_id, trunc(sales_date,'mon')

create view na_sales as
select customer_account, prod_id, trunc(sales_date,'mon') month_purch,
sum(total_sales) sales_current, lead(sum(total_sales),1) over(partition by customer_account, 
prod_id order by trunc(sales_date,'mon') desc) sales_last
from sales_table_NA
group by customer_account, prod_id, trunc(sales_date,'mon')

There are indexes on the tables for customer_account

Now, if I 
select * from na_sales where customer_account=1 
the index is used. Same for eur_sales.  However, if I UNION them together it does not (WINDOW SORT 
on first select and WINDOW BUFFER on second).  If I remove the lead function and UNION them, the 
index is used.

Any help?
 


Followup   February 23, 2005 - 1am Central time zone:

do you really want UNION or UNION ALL.........

(do you know the difference between the two)....

if you had given me simple setup scripts, I would have been happy to see if that makes a 
difference, but oh well. 

3 stars Potential Solution   February 22, 2005 - 6pm Central time zone
Reviewer: Rob H from Winnipeg Canada
Rather than pre-sum the data into 2 views I found that union'ing (actually UNION ALL) the data, 
then sum and Lag works fine.
ie
select
customer_account, prod_id, sales_date month_purch,
sum(total_sales) sales_current, lead(sum(total_sales),1) over(partition by 
customer_account, prod_id order by sales_date desc) sales_last
from(
select customer_account, prod_id, sales_date, total_sales from sales_table_NA
union all 
select customer_account, prod_id, trunc(sales_date,'mon') month_purch, total_sales from 
sales_table_EUR) 
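
Written out in full, that idea might look like the sketch below. It assumes the column names used in the view definitions (customer_account, sales_date), truncates to the month in both branches, and adds the GROUP BY that the outline above leaves out.

```sql
-- Combine the raw rows first, then aggregate and apply LEAD once.
select customer_account,
       prod_id,
       month_purch,
       sum(total_sales) sales_current,
       -- ordered newest first, so LEAD fetches the previous month's total
       lead(sum(total_sales),1) over (partition by customer_account, prod_id
                                      order by month_purch desc) sales_last
  from (
select customer_account, prod_id, trunc(sales_date,'mon') month_purch, total_sales
  from sales_table_NA
 union all
select customer_account, prod_id, trunc(sales_date,'mon') month_purch, total_sales
  from sales_table_EUR
       )
 group by customer_account, prod_id, month_purch;
```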


1 stars Attitude....   February 23, 2005 - 9am Central time zone
Reviewer: Rob H from Winnipeg Canada
What's the deal? Having a bad day?  I'm sorry, but I assumed from the select statements you could 
infer structure.  Yes, I was using UNION ALL, yes, I know the difference (uh, feeling a bit rude 
are we?) but I didn't realize until after I posted that I missed that (a nice feature would be to 
be able to edit a post for a certain time after post).  I generalized the data structure and SQL 
for confidentiality reasons.  For a guy who is so hard on people's IM speak, you forget to 
capitalize your sentences :)

Now, UNION Vs UNION ALL didn't affect index usage (it did however have 'other' performance issues). 
 You can see from my next post that I worked on the issue and resolved it by not pre-summing each 
table.  With the new query, if someone issues a select with no 'where customer_account=' then it's 
slower (but that also wasn't the goal).

Thanks 


Followup   February 24, 2005 - 4am Central time zone:

No?  I was simply asking "do you know the difference between the two" for I find most people

a) don't know union all exists
b) don't know the semantic difference between union and union all
c) don't know the performance penalty involved with union vs union all when they didn't need to use UNION
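
A minimal illustration of that semantic difference, using nothing but DUAL:

```sql
-- UNION removes duplicate rows from the combined result,
-- which means extra work to find them: this returns 1 row.
select 1 x from dual
 union
select 1 x from dual;

-- UNION ALL simply appends the second result to the first: this returns 2 rows.
select 1 x from dual
 union all
select 1 x from dual;
```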

Your example, as posted, did not use UNION ALL.  Look at your text:

<quote>
Now, if I 
select * from na_sales where customer_account=1 
the index is used. Same for eur_sales.  However, if I UNION them together it 
does not (WINDOW SORT on first select and WINDOW BUFFER on second).  If I remove 
the lead function and UNION them, the index is used.
</quote>


I quite simply asked:

does union all change the behaviour? (i did not have an example with table creates and such to work 
with, so I couldn't really 'test it', I don't have your tables, your indexes, your datatypes, etc)

do you need to use union, you said union, you did not say union all.  do you know the difference 
between the two.


Sorry if you took it as an insult, I can only comment based on the data provided.  I had to assume 
you like most of the world was using UNION, not UNION ALL and simply wanted to know if you could 
use union all, if union all made a difference, if you knew the difference between the two.


If I had prescience, I could have read your subsequent post and not asked any questions I guess.


Not having a bad day, just working with information provided.  I was not trying to insult you -- I 
was simply "asking".

 

5 stars Analytics   February 24, 2005 - 5am Central time zone
Reviewer: Neelz from Japan
Dear Sir,

I had gone through the above examples and was wondering whether analytical functions could be used 
when aggregating multiple columns from a table, 
CREATE TABLE T (
    SUPPLIER_CD CHAR(4) NOT NULL, 
    ORDERRPT_NO CHAR(8) NOT NULL, 
        ORDER_DATE CHAR(8) NOT NULL, 
    STORE_CD CHAR(4) NOT NULL, 
    POSITION_NO CHAR(3 ) NOT NULL, 
    CONTORL_FLAG CHAR(2 ), 
    ORDERQUANTITY_EXP NUMBER(3) DEFAULT (0) NOT NULL, 
    ORDERQUANTITY_RES NUMBER(3) DEFAULT (0) NOT NULL, 
        ENT_DATE DATE DEFAULT (SYSDATE) NOT NULL, 
    UPD_DATE DATE DEFAULT (SYSDATE) NOT NULL, 
    CONSTRAINT PK_T PRIMARY KEY(SUPPLIER_CD, ORDERRPT_NO,  ORDER_DATE, STORE_CD));

CREATE INDEX IDX_T ON T (SUPPLIER_CD, ORDERRPT_NO, ORDER_DATE);

insert into t values('5636','62108373','20041129','0007','2','00',1,1, to_date('2004/11/29', 
'yyyy/mm/dd'),to_date('2004/11/30', 'yyyy/mm/dd'));

insert into t values('5636','62108373','20041129','0012','2','00',1,1,to_date('2004/11/29', 
'yyyy/mm/dd'), to_date('2004/11/30', 'yyyy/mm/dd'));

insert into t values('5636','62108384','20041129','0014','2','00',1,1,to_date('2004/11/29', 
'yyyy/mm/dd'),to_date('2004/11/30', 'yyyy/mm/dd'));

insert into t values('5636','62108384','20041129','0015','3','00',1,1,to_date('2004/11/29', 
'yyyy/mm/dd'),to_date('2004/11/30', 'yyyy/mm/dd'));

insert into t values('1000','11169266','20040805','1309','4','00',8,8,to_date('2004/11/29', 
'yyyy/mm/dd'),to_date('2004/11/30', 'yyyy/mm/dd'));

insert into t values('1000','11169266','20040805','1312','12' ,'00',8,8,to_date('2004/04/22', 
'yyyy/mm/dd'),to_date('2004/11/23', 'yyyy/mm/dd'));

insert into t values('1000','11169266','20040805','1313','13' ,'00',12,12,to_date('2004/04/22', 
'yyyy/mm/dd'),to_date('2004/11/23', 'yyyy/mm/dd'));

Currently the following query is used:-

SELECT 
    SUPPLIER_CD,  ORDERRPT_NO,  ORDER_DATE,  
    SUM(DECODE(RTRIM(POSITION_NO),'1',ORDERQUANTITY_RES,0)) Q1, 
    SUM(DECODE(RTRIM(POSITION_NO),'2',ORDERQUANTITY_RES,0)) Q2, 
    SUM(DECODE(RTRIM(POSITION_NO),'3',ORDERQUANTITY_RES,0)) Q3, 
    SUM(ORDERQUANTITY_RES) ORDER_TOTAL 
FROM 
    T 
GROUP BY 
    SUPPLIER_CD, ORDERRPT_NO, ORDER_DATE 

The execution plan when this query is executed on the real table which has 4m records is : - 

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=103002 Card=3571095 Bytes=107132850)

   1    0   SORT (GROUP BY NOSORT) (Cost=103002 Card=3571095 Bytes=107132850)

   2    1     TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=103002 Card=3571095 Bytes=107132850)

   3    2       INDEX (FULL SCAN) OF 'IDX_T' (NON-UNIQUE) (Cost=26942 Card=3571095)

Could you please tell me whether analytical functions could be used over here or a better approach 
for this query.

Thanks for your great help 
 


Followup   February 24, 2005 - 5am Central time zone:

there would be no need of analytics here.  analytics would be useful to get the 'aggregates' while 
preserving the 'details'

eg:

select empno, sal, sum(sal) over (partition by deptno)
  from emp;

shows the empno, their sal and the sum of all salaries in their dept.  that would be instead of 
coding:

select empno, sal, sum_sal
  from emp, (select deptno, sum(sal) sum_sal from emp group by deptno) t
 where emp.deptno = t.deptno
/


 

3 stars I was just wondering   February 24, 2005 - 6am Central time zone
Reviewer: A reader 
how would analytics help in the following example (the data nodes are implemented as rows in a 
table with two columns as pointers: split-from and merge-to, and the third column is "value", some 
number, not shown on diagram):

http://img23.exs.cx/my.php?loc=img23&image=directedgraph11th.png

The task is to use this directed dependency graph and prorate the "value" column in each row/node 
in the following way:

foreach node
-start with a node, for example 16
-visit each hierarchy on which 16 depends, in this case hierarchies for 14 and 15, SUM their values 
and the current value of node 16, and that will be new, prorated value for node 16
-repeat this recursively for each sub-hierarchy
until all nodes are prorated

I was thinking maybe to use combination of sys_connect_by_path and AF but not sure how. Any 
thoughts?
 


Followup   February 24, 2005 - 6am Central time zone:

you won't get very far with that structure in 9i and before.  connect by "loop" will be an error 
you see lots of with a directed graph.

analytics won't be appropriate either, they work on windows - not on hierarchies.

sys_connect_by_path is going to give you a string, not a sum


a scalar subquery in 10g with NOCYCLE on the query might work. 
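
A rough sketch of that 10g idea, assuming a hypothetical table NODES(ID, MERGE_TO, VALUE) in which MERGE_TO points at the node a row feeds into. The real table has two pointer columns, so this is a simplification, and a node that can be reached by more than one path would be summed more than once. Treat it as a starting point only.

```sql
-- Hypothetical table: nodes(id, merge_to, value); not the poster's real structure.
-- For each node, walk down to every node that feeds into it and sum the values.
-- NOCYCLE (10g and later) stops the walk at a loop instead of raising ORA-01436.
select n.id,
       n.value,
       (select sum(d.value)
          from nodes d
         start with d.id = n.id
       connect by nocycle prior d.id = d.merge_to) prorated_value
  from nodes n
 order by n.id;
```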

3 stars What if there is no closure inside the graph?   February 24, 2005 - 9am Central time zone
Reviewer: A reader 
i.e. if the link between node 9 and 5 is removed, and the link between node 6 and 0 is removed.
Would that make difference? It would be a tree in that case. How should we proceed if that is the 
case? I was thinking maybe to use sys_connect_by_path to pack all sub-hierarchies one after 
another, and marker in window to be the depth or level. If the level switch from n to 1 that would 
mean the end of sub-hierarchy. If the level switch from 1 to 2 that is the beginning of the 
hierarchy. And then aggregate over partition inside hierarchy view. Or is there a better approach? 


Followup   February 24, 2005 - 9am Central time zone:

scalar subqueries.

http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:30609389052804
 

3 stars Lead/Lag and 0 Sales   February 24, 2005 - 1pm Central time zone
Reviewer: Rob H from Winnipeg Canada
Thanks for all of the help so far. I have run into an issue where I have Companies and Contacts at 
that company.  Here are the tables.

create table SALES_TRANS
(
  CUSTOMER_ACCOUNT                   VARCHAR2(8)    ,
  STATION_NUMBER                     VARCHAR2(7)    ,
  PRODUCT_CODE                       VARCHAR2(8)    ,
  QUANTITY                           NUMBER         ,
  DATE_ISSUE                         DATE           ,
  PRICE                              NUMBER         ,
  VALUE                              NUMBER         );
/

Create table COMPANY_CUSTOMER
(
  COMPANY_ID                          NUMBER(9),
  CUSTOMER_ACCOUNT                    VARCHAR2(8));
/

Create table PRODUCT_INFO
(
  PRODUCT_CODE                       VARCHAR2(8) ,
  PRODUCT_GROUP                      VARCHAR2(25),
  PRODUCT_DESC                       VARCHAR2(100)
);
/

Running a query by customer (this select is a view called - SUM_CUST_TRANS_PRODUCT_FY_V)
Select
c.COMPANY_ID,
t.CUSTOMER_ACCOUNT,
p.product_group,
FISCAL_YEAR(DATE_ISSUE) fiscal_year,
sum(VALUE) total_VALUE_curr_y,
lead(sum(VALUE),1) over (partition by c.COMPANY_ID, t.CUSTOMER_ACCOUNT, p.product_group order by 
FISCAL_YEAR(DATE_ISSUE) desc)  total_VALUE_pre_y
From SALES_TRANS t
   inner join COMPANY_CUSTOMER c on t.CUSTOMER_ACCOUNT = C.CUSTOMER_ACCOUNT
   inner join PRODUCT_INFO P ON t.PRODUCT_CODE = p.PRODUCT_CODE
group by c.COMPANY_ID, t.CUSTOMER_ACCOUNT, p.product_group, FISCAL_YEAR(DATE_ISSUE)


I get
COMPANY_ID,CUSTOMER_ACCOUNT,PRODUCT_GROUP,FISCAL_YEAR,TOTAL_VALUE_CURR_Y,TOTAL_VALUE_PRE_Y
"F0009631","27294370","Product1",2002,1460.08,0
"F0009631","27294370","Product2",2005,0,27926.31
"F0009631","27294370","Product2",2004,27926.31,18086.17
"F0009631","27294370","Product2",2003,18086.17,47597.05
"F0009631","27294370","Product2",2002,47597.05,0
"F0009631","27294370","Product2",2001,0,0
"F0009631","27294370","Product3",2004,64582.6,51041
"F0009631","27294370","Product3",2003,51041,60225
"F0009631","27294370","Product3",2002,60225,43150
"F0009631","27294370","Product3",2001,43150,50491
"F0009631","27294370","Product3",2000,50491,664
"F0009631","27294370","Product3",1999,664,0
"F0009631","27294370","Product4",2005,2119.1,1708.61
"F0009631","27294370","Product4",2004,1708.61,4050.82
"F0009631","27294370","Product4",2003,4050.82,15662.57
"F0009631","27294370","Product4",2002,15662.57,0
"F0009631","27294370","Product5",2005,0,351.64
"F0009631","27294370","Product5",2004,351.64,5873.61
"F0009631","27294370","Product5",2003,5873.61,2548.83
"F0009631","27294370","Product5",2002,2548.83,0
"F0009631","27294370","Product6",2004,17347.84,16781.33
"F0009631","27294370","Product6",2003,16781.33,10575
"F0009631","27294370","Product6",2002,10575,3659.67
"F0009631","27294370","Product6",2001,3659.67,4901.67
"F0009631","27294370","Product6",2000,4901.67,4073.47
"F0009631","27294370","Product6",1999,4073.47,0
"F0009631","27294370","Product7",2004,5377.5,2588
"F0009631","27294370","Product7",2003,2588,245
"F0009631","27294370","Product7",2000,245,0
"F0009631","27340843","Product2",2003,3013.71,0
"F0009631","27340843","Product3",1999,1411,0
"F0009631","27340843","Product5",2003,3254.9,0


Now if I run the same grouping by only company (this select is a view called - 
SUM_COMPANY_TRANS_PRODUCT_FY_V)
Select
c.COMPANY_ID,
p.product_group,
FISCAL_YEAR(DATE_ISSUE) fiscal_year,
sum(VALUE) total_VALUE_curr_y,
lead(sum(VALUE),1) over (partition by c.COMPANY_ID, p.product_group order by 
FISCAL_YEAR(DATE_ISSUE) desc)  total_VALUE_pre_y
From SALES_TRANS t
   inner join COMPANY_CUSTOMER c on t.CUSTOMER_ACCOUNT = C.CUSTOMER_ACCOUNT
   inner join PRODUCT_INFO P ON t.PRODUCT_CODE = p.PRODUCT_CODE
group by c.COMPANY_ID, p.product_group, FISCAL_YEAR(DATE_ISSUE)

we get
COMPANY_ID,PRODUCT_GROUP,FISCAL_YEAR,TOTAL_VALUE_CURR_Y,TOTAL_VALUE_PRE_Y
"F0009631","Product1",2002,1460.08,0
"F0009631","Product2",2005,0,27926.31
"F0009631","Product2",2004,27926.31,21099.88
"F0009631","Product2",2003,21099.88,47597.05
"F0009631","Product2",2002,47597.05,0
"F0009631","Product2",2001,0,0
"F0009631","Product3",2004,64582.6,51041
"F0009631","Product3",2003,51041,60225
"F0009631","Product3",2002,60225,43150
"F0009631","Product3",2001,43150,50491
"F0009631","Product3",2000,50491,2075
"F0009631","Product3",1999,2075,0
"F0009631","Product4",2005,2119.1,1708.61
"F0009631","Product4",2004,1708.61,4050.82
"F0009631","Product4",2003,4050.82,15662.57
"F0009631","Product4",2002,15662.57,0
"F0009631","Product5",2005,0,351.64
"F0009631","Product5",2004,351.64,9128.51
"F0009631","Product5",2003,9128.51,2548.83
"F0009631","Product5",2002,2548.83,0
"F0009631","Product6",2004,17347.84,16781.33
"F0009631","Product6",2003,16781.33,10575
"F0009631","Product6",2002,10575,3659.67
"F0009631","Product6",2001,3659.67,4901.67
"F0009631","Product6",2000,4901.67,4073.47
"F0009631","Product6",1999,4073.47,0
"F0009631","Product7",2004,5377.5,2588
"F0009631","Product7",2003,2588,245
"F0009631","Product7",2000,245,0


The problem is that if I 
select * from SUM_CUST_TRANS_PRODUCT_FY_V where fiscal_year=2004 

Customer 27340843 will not show up (no 2004 purchases),  but that also means that the 
total_VALUE_pre_y for 2004 will never summarize by customer to the total_VALUE_pre_y for 2004 for 
the company.  Is there a better way to do this.  The goal is that we can show current year sales vs 
previous years sales by company, by customer, and potentially a larger summary higher than company 
(city).  

I guess the idea would be that I could somehow show for all customers in a company, all years, all 
products, that the company has purchases (cartesian) for every year purchasing.  This I think is 
difficult for large customer, sales transaction tables.

ie

"F0009631","27340843","Product2",2004,0,3013.71 <--- ***
"F0009631","27340843","Product2",2003,3013.71,0

*** This row doesn't exist in the customer view.  There are no 2004 sales, so doesn't appear, but 
we would like to see it so that the year previous shows.

I would love to "attach" some of the transactions if it would help.  Is there a better way? 


3 stars hierarchical cubes + MV?   February 25, 2005 - 2pm Central time zone
Reviewer: Rob H from Winnipeg Canada
Would hierarchical cubes and MV be the solution.  It seems like a lot of meta data to create.  We 
would have to create it for all customers, for all years, for all product groups.   


Followup   February 25, 2005 - 6pm Central time zone:

if you have "missing data", the only way i know to "make it up" is an outer join (partitioned outer 
joins in 10g rock, removing the need to create cartesian products of every dimension first) 
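
A sketch of what a 10g partitioned outer join could look like against the tables posted above. It assumes the poster's FISCAL_YEAR function, and that "all years" means every fiscal year present in SALES_TRANS; the aliases V and Y and the column TOTAL_VALUE are made up for the illustration. Each customer/product partition gets one row per year, so the missing 2004 row appears with a current value of 0 and the 2003 value as previous year.

```sql
select company_id, customer_account, product_group, fiscal_year,
       total_value_curr_y,
       nvl(lag(total_value_curr_y)
             over (partition by company_id, customer_account, product_group
                   order by fiscal_year), 0) total_value_pre_y
  from (
        select v.company_id, v.customer_account, v.product_group,
               y.fiscal_year,
               nvl(v.total_value, 0) total_value_curr_y
          from (select c.company_id, t.customer_account, p.product_group,
                       fiscal_year(t.date_issue) fiscal_year,
                       sum(t.value) total_value
                  from sales_trans t
                       inner join company_customer c
                          on t.customer_account = c.customer_account
                       inner join product_info p
                          on t.product_code = p.product_code
                 group by c.company_id, t.customer_account, p.product_group,
                          fiscal_year(t.date_issue)) v
               -- repeat the outer join once per customer/product combination
               partition by (v.company_id, v.customer_account, v.product_group)
               right outer join
               (select distinct fiscal_year(date_issue) fiscal_year
                  from sales_trans) y
               on (v.fiscal_year = y.fiscal_year)
       )
 order by company_id, customer_account, product_group, fiscal_year desc;
```

This fills in every year for every combination, which may be more rows than wanted; a filter on the year list Y would narrow it.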

5 stars   February 27, 2005 - 2am Central time zone
Reviewer: Neelz from Japan
Dear Sir,

This is with regards to my previous post which is 5th above from this.

<quote>
SELECT 
    SUPPLIER_CD,  ORDERRPT_NO,  ORDER_DATE,  
    SUM(DECODE(RTRIM(POSITION_NO),'1',ORDERQUANTITY_RES,0)) Q1, 
    SUM(DECODE(RTRIM(POSITION_NO),'2',ORDERQUANTITY_RES,0)) Q2, 
    SUM(DECODE(RTRIM(POSITION_NO),'3',ORDERQUANTITY_RES,0)) Q3, 
    SUM(ORDERQUANTITY_RES) ORDER_TOTAL 
FROM 
    T 
GROUP BY 
    SUPPLIER_CD, ORDERRPT_NO, ORDER_DATE 
</quote>

As you mentioned analytics could not be used, but could you please advise me on my problem,

The query is in fact big, for brevity I just put few columns. The actual query is
SELECT 
    SUPPLIER_CD,  ORDERRPT_NO,  ORDER_DATE,  
    SUM(DECODE(RTRIM(POSITION_NO),'1',ORDERQUANTITY_RES,0)) Q1, 
    SUM(DECODE(RTRIM(POSITION_NO),'2',ORDERQUANTITY_RES,0)) Q2, 
    SUM(DECODE(RTRIM(POSITION_NO),'3',ORDERQUANTITY_RES,0)) Q3, 
    .....
    .....
    .....
    .....
    .....
    .....
    SUM(DECODE(RTRIM(POSITION_NO),'197',ORDERQUANTITY_RES,0)) Q197, 
    SUM(DECODE(RTRIM(POSITION_NO),'198',ORDERQUANTITY_RES,0)) Q198, 
    SUM(DECODE(RTRIM(POSITION_NO),'199',ORDERQUANTITY_RES,0)) Q199, 
    SUM(DECODE(RTRIM(POSITION_NO),'200',ORDERQUANTITY_RES,0)) Q200, 
    SUM(ORDERQUANTITY_RES) ORDER_TOTAL 
FROM 
    T 
GROUP BY 
    SUPPLIER_CD, ORDERRPT_NO, ORDER_DATE 

As you could see there is a definite pattern on the sum function. Could you please help me in 
tuning this query?
Thanks in advance



 


Followup   February 27, 2005 - 8am Central time zone:

you are doing a pivot -- looks great to me?  It is "classic" 

5 stars   February 27, 2005 - 9am Central time zone
Reviewer: Neelz from Japan
Dear Sir,

I am sorry if you felt like that, It is quite a new world for me here, started visiting this site 
3-4 months back then realized the enormity of it and its become like an addiction. Bought both 
books by you and started working on it. Reading the Oracle concepts guide.  Every day many times 
will try for asking a question but till now no luck, might be because of timezone difference.

Coming back to my question, since it is a huge query and was taking 35 min to execute, after 
reading through many articles here and in the books I was really confused as to what approach 
should I take. Still is. Analytical functions (not useful as you told), Function based indexes(no 
because we have a standard edition), Materialized views(no because its an OLTP), Stored Sql 
functions, Deterministic keyword, user defined aggregates, optimizer hints.. at present it is 
confusing for me.  

I am working on it with different approaches, could reduce the execution time upto 9.08 minutes. 
The query was written with an index hint earlier and by removing it, the execution time decreased 
to 9+ minutes. 

I was thinking whether you could advise on what approach should I take 

Thanks for your valuable time, 



 


Followup   February 27, 2005 - 10am Central time zone:

if that is taking 35 minutes you either

a) have the memory settings like pga_aggregate_target/sort_area_size set way too low

b) you have billions of records that are hundreds of bytes in width

c) really slow disks

d) an overloaded system


I mean -- that query is pretty "simple" full scan, aggregate, nothing to it -- unless it is a gross 
simplification, it should not take 35 minutes.  Can you trace it with the 10046 level 12 trace and 
post the tkprof section that is relevant to just this query with the waits and all? 
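
For reference, a minimal sketch of the tracing steps being asked for. The file names in the tkprof line are placeholders, and the trace file location depends on the instance's user_dump_dest setting.

```sql
-- Collect timing information and wait events for this session only.
alter session set timed_statistics = true;
alter session set events '10046 trace name context forever, level 12';

-- ... run the statement being investigated here ...

alter session set events '10046 trace name context off';

-- Then, at the operating system prompt, format the raw trace file
-- (both file names are placeholders):
--   tkprof <trace_file>.trc report.txt sys=no
```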

5 stars   February 27, 2005 - 10am Central time zone
Reviewer: Neelz from Japan
Dear Sir,

Thank you for your kind reply,

This report is taken for the development system.
I used alter session set events '10046 trace name context forever, level 12'. The query execution 
time was 00:08:15.03


select
    supplier_cd, orderrpt_no, order_date,
    sum(decode(rtrim(position_no),'1',orderquantity_res,0)) q1,
    sum(decode(rtrim(position_no),'2',orderquantity_res,0)) q2,
    sum(decode(rtrim(position_no),'3',orderquantity_res,0)) q3,
    sum(decode(rtrim(position_no),'4',orderquantity_res,0)) q4,
    sum(decode(rtrim(position_no),'5',orderquantity_res,0)) q5,
    .....
    .....
    sum(decode(rtrim(position_no),'197',orderquantity_res,0)) q197,
    sum(decode(rtrim(position_no),'198',orderquantity_res,0)) q198,
    sum(decode(rtrim(position_no),'199',orderquantity_res,0)) q199,
    sum(decode(rtrim(position_no),'200',orderquantity_res,0)) q200,
    sum(orderquantity_res) order_total
from
    t
group by
    supplier_cd, orderrpt_no, order_date

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.03       0.04          0          0          0           0
Execute      2      0.02       0.04          0          0          0           0
Fetch       15    431.55     488.37      37147      36118         74         211
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       18    431.60     488.46      37147      36118         74         211

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 66  

Rows     Row Source Operation
-------  ---------------------------------------------------
    211  SORT GROUP BY 
4205484   TABLE ACCESS FULL T 


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                      16        0.00          0.00
  SQL*Net more data to client                    30        0.00          0.00
  db file sequential read                         3        0.04          0.05
  db file scattered read                       2280        0.78         30.62
  direct path write                               4        0.00          0.00
  direct path read                              147        0.05          1.45
  SQL*Net message from client                    16      140.57        166.58
  SQL*Net break/reset to client                   2        0.01          0.01
********************************************************************************

Thank you 


Followup   February 27, 2005 - 11am Central time zone:

that is 8 minutes?

but I see some writes to temp here -- for 211 aggregated rows, perhaps your sort/pga is set small


Also, why do you need to rtrim() 4,205,484 rows?  (and why is something called position NUMBER in a 
string?) is that rtrim there "just in case" or is it really needed?  why would it have trailing 
blanks and is that not a data integrity issue that needs to be fixed?

(but this is an 8 minute query, not a 35 minute query, if it takes longer on production -- it'll be 
because it is waiting for something -- like IO...) 
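
Two quick checks along those lines, as a sketch. The first needs select access to V$PARAMETER; the second assumes POSITION_NO is a character column, as the RTRIM suggests.

```sql
-- Current work-area settings (a small sort area would explain writes to temp).
select name, value
  from v$parameter
 where name in ('workarea_size_policy', 'pga_aggregate_target', 'sort_area_size');

-- Rows where RTRIM actually changes the value. A count of zero means the
-- RTRIM does nothing for the data as it stands today.
select count(*)
  from t
 where position_no is not null
   and (   rtrim(position_no) is null
        or length(rtrim(position_no)) <> length(position_no));
```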

5 stars   February 27, 2005 - 11am Central time zone
Reviewer: Neelz from Japan
Dear Sir,

This is a 3rd party application and the query was written with an index hint earlier. After 
removing the hint query execution time reduced to 8 min. Regarding the rtrim I have to check with 
the team if it is really needed. I will try the trace on production tomorrow.

And at last I could see the link for "Submit a New Question"!, I think I should try around 1.00 AM

Thanking You a lot 


Followup   February 27, 2005 - 11am Central time zone:

depends on your time zone, rarely am I up at 1am east coast (gmt-5) time doing this stuff! 

4 stars   March 2, 2005 - 8am Central time zone
Reviewer: Miki from Hungary
Tom,

I need to produce a moving average which has an even window size. If I want a 28 sized window, I 
need to look backward 14 but I need the first value of the window to be divided by 2 and I need to 
look forward 14 and the last value of the window to be divided by 2 also.
(a1/2+a2+...+a28+a29/2)/28
How could I accomplish it with the function: 
avg() over(...)?

Thanks in advance 


Followup   March 2, 2005 - 10am Central time zone:

this is the first thought that popped into my head:

a) get the sum(val) over 13 before and 13 after (27 rows possible).
b) get the lag(val,14)/2 and lead(val,14)/2
c) add those three numbers
d) divide by the count of non-null VALS observed (count(val) 13 before/after+ 1 if lag is not null 
+ 1 if lead is not null)


ops$tkyte@ORA9IR2> create table t
  2  as
  3  select rownum id, object_id val
  4    from all_objects
  5   where rownum <= 30;
 
Table created.
 
so, this was my "debug" query, just to see the data:


ops$tkyte@ORA9IR2> select id,
  2         sum(val) over 
                 (order by id rows between 13 preceding and 13 following) sum,
  3         count(val) over 
                 (order by id rows between 13 preceding and 13 following)+
  4             decode(lag(val,14) over (order by id),null,0,1)+
  5             decode(lead(val,14) over (order by id),null,0,1) cnt,
  6             lag(id,14) over (order by id) lagid,
  7             lag(val,14) over (order by id) lagval,
  8             lead(id,14) over (order by id) leadid,
  9             lead(val,14) over (order by id) leadval
 10    from t
 11   order by id;
 
        ID        SUM        CNT      LAGID     LAGVAL     LEADID    LEADVAL
---------- ---------- ---------- ---------- ---------- ---------- ----------
         1     218472         15                               15       6399
         2     224871         16                               16      19361
         3     244232         17                               17      23637
         4     267869         18                               18      14871
         5     282740         19                               19      20668
         6     303408         20                               20      18961
         7     322369         21                               21      15767
         8     338136         22                               22      20654
         9     358790         23                               23       7065
        10     365855         24                               24      17487
        11     383342         25                               25      11077
        12     394419         26                               26      20772
        13     415191         27                               27      15505
        14     430696         28                               28      12849
        15     425648         29          1      17897         29      23195
        16     441314         29          2       7529         30      18523
        17     436505         28          3      23332
        18     422306         27          4      14199
        19     399409         26          5      22897
        20     389266         25          6      10143
        21     365728         24          7      23538
        22     342135         23          8      23593
        23     332316         22          9       9819
        24     320581         21         10      11735
        25     303084         20         11      17497
        26     295369         19         12       7715
        27     276010         18         13      19359
        28     266791         17         14       9219
        29     260392         16         15       6399
        30     241031         15         16      19361
 
30 rows selected.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select id,
  2         (sum(val) over 
                (order by id rows between 13 preceding and 13 following)+
  3              nvl(lag(val,14) over (order by id)/2,0)+
  4              nvl(lead(val,14) over (order by id)/2,0))/
  5             nullif(
  6           count(val) over 
                  (order by id rows between 13 preceding and 13 following)+
  7               decode(lag(val,14) over (order by id),null,0,1)+
  8               decode(lead(val,14) over (order by id),null,0,1)
  9                  ,0) avg
 10    from t
 11   order by id;
 
        ID        AVG
---------- ----------
         1    14778.1
         2 14659.4688
         3 15061.7941
         4 15294.6944
         5 15424.9474
         6  15644.425
         7 15726.3095
         8 15839.2273
         9 15753.1522
        10 15608.2708
        11   15555.22
        12 15569.4231
        13 15664.5741
        14 15611.4464
        15      15386
        16 15666.8966
        17 16006.1071
        18 15903.9074
        19 15802.2115
        20    15773.5
        21 15729.0417
        22 15388.3261
        23 15328.4318
        24 15545.1667
        25  15591.625
        26 15748.7632
        27 15871.6389
        28 15964.7353
        29 16474.4688
        30    16714.1
 
30 rows selected.
 
ops$tkyte@ORA9IR2>

I did not do a detailed check of the results -- but that should get you going (remember -- there 
are 29 rows -- 14+1+14!!! and beware NULLs) 

4 stars   March 2, 2005 - 10am Central time zone
Reviewer: Miki from Hungary
Tom,

Your answer is excellent. That is - almost - what I needed.
If my window size is odd I can use simply avg() over() function. I am looking for a solution where I 
can also use avg() over() instead of sum() over()/count().
Is it possible?

Thank you!  


Followup   March 2, 2005 - 11am Central time zone:

if you want to do things to row 1 and row 29 in the window "special" like this -- this was the only 
thing I thought of. 
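
One alternative that does use avg() over(), offered as a sketch rather than a tested replacement: for a full window, (a1/2+a2+...+a28+a29/2)/28 is the same as the mean of two 28-row averages, one over a1..a28 and one over a2..a29. It assumes VAL is never NULL, and it returns NULL where fewer than 14 rows exist on either side, so the edge rows behave differently from the query above. It also divides by 28, as in the formula in the question, while the query above divides by the number of values counted (29 for a full window).

```sql
select id,
       case when count(*) over
                   (order by id rows between 14 preceding and 14 following) = 29
            then (  avg(val) over
                      (order by id rows between 14 preceding and 13 following)
                  + avg(val) over
                      (order by id rows between 13 preceding and 14 following)
                 ) / 2
       end centered_avg
  from t
 order by id;
```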

4 stars   March 2, 2005 - 11am Central time zone
Reviewer: Miki from Hungary
Thank you! I will use your recommended code. 


4 stars consecutive days... 8.1.7   March 9, 2005 - 1pm Central time zone
Reviewer: Dean from IL
create table day_cd
(dt  date
,cd  varchar2(2))
/


insert into day_cd values ('08-MAR-05', 'BD');
insert into day_cd values ('09-MAR-05', 'AD');
insert into day_cd values ('10-MAR-05', 'AD');
insert into day_cd values ('11-MAR-05', 'AD');
insert into day_cd values ('12-MAR-05', 'AD');
insert into day_cd values ('13-MAR-05', 'AD');
insert into day_cd values ('14-MAR-05', 'CD');
insert into day_cd values ('15-MAR-05', 'CD');
insert into day_cd values ('16-MAR-05', 'AD');
insert into day_cd values ('17-MAR-05', 'AD');
insert into day_cd values ('18-MAR-05', 'AD');
insert into day_cd values ('19-MAR-05', 'CD')
/


SELECT * FROM DAY_CD;

DT        CD
--------- --
08-MAR-05 BD
09-MAR-05 AD
10-MAR-05 AD
11-MAR-05 AD
12-MAR-05 AD
13-MAR-05 AD
14-MAR-05 CD
15-MAR-05 CD
16-MAR-05 AD
17-MAR-05 AD
18-MAR-05 AD
19-MAR-05 CD

I'd like to count the occurrence of each code as it occurs in consecutive days as one occurrence.

So that the output would be:

CD      OCCURRENCES
--    -----------
AD    2
BD    1
CD    2
 


Followup   March 9, 2005 - 1pm Central time zone:

take a look at this thread, try to understand analytics and see if you cannot figure out how you 
might use lag to do this.

see also:

http://www.oracle.com/technology/oramag/oracle/04-mar/o24asktom.html
for some hints (analytics to the rescue) 

4 stars nevermind...   March 9, 2005 - 1pm Central time zone
Reviewer: Dean from IL
select cd, count(*)
from
(
select cd, dt, case when (lead(dt) over (partition by cd order by dt) - dt) = 1 then 1 else 0 end 
day
from day_cd
)
where day = 0
group by cd
 


4 stars we were responding at the same time...   March 9, 2005 - 2pm Central time zone
Reviewer: Dean from IL
:)

select cd, count(*)
from
(
select cd, dt, case when (lead(dt) over (partition by cd order by dt) - dt) = 1 then 1 else 0 end 
day
from day_cd
)
where day = 0
group by cd


CD   COUNT(*)
-- ----------
AD          2
BD          1
CD          2

Thanks for all of your help... 
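
For comparison, a lag-based sketch of the same count, along the lines of the hint in the followup. It counts a row as the start of a new occurrence whenever the code differs from the previous day's code. Unlike the lead(dt) version, it does not look at the dates themselves, so a gap in the calendar with no other code in between would not start a new occurrence.

```sql
select cd, count(*) occurrences
  from (
        select cd,
               lag(cd) over (order by dt) prev_cd   -- code on the previous row
          from day_cd
       )
 where prev_cd is null      -- very first row
    or prev_cd <> cd        -- code changed, so a new run starts here
 group by cd
 order by cd;
```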


4 stars max() over() till not the current row   March 10, 2005 - 4am Central time zone
Reviewer: Miki from Hungary
Tom,

I have the following input

DATUM    T    COL1    COL2    COL3    COL4
2005.02.19 9:29    T    1    0    0    0
2005.02.20 9:29        0    0    0    0
2005.02.21 9:29        0    0    0    0
2005.02.22 9:29    T    1    0    0    0
2005.02.23 9:29        0    0    0    0
2005.02.24 9:29        0    0    0    0
2005.02.25 9:29        0    0    0    0
2005.02.26 9:29        0    0    0    0
2005.02.27 9:29    T    0    1    0    0
2005.02.28 9:29        0    0    0    0
2005.03.01 9:29        0    0    0    0
2005.03.02 9:29    T    1    1    0    0
2005.03.03 9:29        0    0    0    0
2005.03.04 9:29    T    1    1    0    0
2005.03.05 9:29        0    0    0    0
2005.03.06 9:29    T    1    0    0    0
2005.03.07 9:29        0    0    0    0
2005.03.08 9:29        0    0    0    0
2005.03.09 9:29        0    0    0    0

When the value of column T is 'T', a rule determines which columns (col1, …, col4) get 1 or 0.
Unfortunately, with the rule more than one column can get value 1. So, if col1+…+col4 > 1 then I 
would like colx to be the previous colx where t = 'T' and col1+...+col4 = 1

So, the output is the following
DATUM    T    COL1    COL2    COL3    COL4
2005.02.19 9:29    T    1    0    0    0
2005.02.20 9:29        0    0    0    0
2005.02.21 9:29        0    0    0    0
2005.02.22 9:29    T    1    0    0    0
2005.02.23 9:29        0    0    0    0
2005.02.24 9:29        0    0    0    0
2005.02.25 9:29        0    0    0    0
2005.02.26 9:29        0    0    0    0
2005.02.27 9:29    T    0    1    0    0
2005.02.28 9:29        0    0    0    0
2005.03.01 9:29        0    0    0    0
2005.03.02 9:29    T    0    1    0    0
2005.03.03 9:29        0    0    0    0
2005.03.04 9:29    T    0    1    0    0
2005.03.05 9:29        0    0    0    0
2005.03.06 9:29    T    1    0    0    0
2005.03.07 9:29        0    0    0    0
2005.03.08 9:29        0    0    0    0
2005.03.09 9:29        0    0    0    0
I tried to use a max() over() function to replace the 'wrong' value but it doesn't work because I 
can't see the max datum till the previous record where t='T' and col1+...+col4 = 1

...
case when t = 'T' and col1+…+col4>1 and
greatest(nvl(max(decode(col1,1,datum)) over(order by datum), sysdate-10000), 
nvl(max(decode(col2,1,datum)) over(order by datum), sysdate-10000),
nvl(max(decode(col3,1,datum)) over(order by datum), sysdate-10000),
nvl(max(decode(col4,1,datum)) over(order by datum), sysdate-10000)
) = nvl(max(decode(col1,1,datum)) over(order by datum), sysdate-10000)  then 1 else 0 end col1,
…
Case when t = 'T' and col1+…+col4>1 and
Greatest(nvl(max(decode(col1,1,datum)) over(order by datum), sysdate-10000), 
nvl(max(decode(col2,1,datum)) over(order by datum), sysdate-10000),
nvl(max(decode(col3,1,datum)) over(order by datum), sysdate-10000),
nvl(max(decode(col4,1,datum)) over(order by datum), sysdate-10000)
) = nvl(max(decode(col4,1,datum)) over(order by datum), sysdate-10000)  then 1 else 0 end col4…

Could you give me a solution to my problem?

Thanks in advance
miki
 


Followup   March 10, 2005 - 7am Central time zone:

I can, but I'd need a create table and some inserts.


You might look at:

http://www.oracle.com/technology/oramag/oracle/04-mar/o24asktom.html
analytics to the rescue because I'll be using that exact technique. 

4 stars   March 10, 2005 - 8am Central time zone
Reviewer: Miki from Hungary
Here is my table populated with data:
create table T
(
  DATUM DATE,
  T     VARCHAR2(1),
  COL1  NUMBER,
  COL2  NUMBER,
  COL3  NUMBER,
  COL4  NUMBER
);

insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('16-01-2005 13:17:46', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('04-01-2005 17:23:13', 'dd-mm-yyyy hh24:mi:ss'), 'T', 1, 1, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('01-03-2005 02:59:17', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('11-12-2004 21:59:18', 'dd-mm-yyyy hh24:mi:ss'), 'T', 1, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('10-01-2005 12:00:22', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('24-02-2005 02:36:51', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('08-12-2004 11:21:15', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('07-01-2005 20:52:26', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('02-02-2005 23:44:33', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('04-03-2005 16:25:12', 'dd-mm-yyyy hh24:mi:ss'), 'T', 1, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('01-01-2005 19:02:28', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('22-01-2005 11:21:41', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('19-01-2005 15:32:18', 'dd-mm-yyyy hh24:mi:ss'), 'T', 1, 1, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('19-12-2004 03:07:10', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('21-02-2005 16:25:42', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('01-01-2005 01:02:39', 'dd-mm-yyyy hh24:mi:ss'), 'T', 0, 1, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('15-12-2004 05:49:26', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('04-02-2005 14:35:34', 'dd-mm-yyyy hh24:mi:ss'), 'T', 0, 1, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('02-12-2004 15:01:42', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
commit;

select t.* from t t
order by 1;

       DATUM    T    COL1    COL2    COL3    COL4
1    2004.12.02. 15:01:42        0    0    0    0
2    2004.12.08. 11:21:15        0    0    0    0
3    2004.12.11. 21:59:18    T    1    0    0    0
4    2004.12.15. 5:49:26        0    0    0    0
5    2004.12.19. 3:07:10        0    0    0    0
6    2005.01.01. 1:02:39    T    0    1    0    0
7    2005.01.01. 19:02:28        0    0    0    0
8    2005.01.04. 17:23:13    T    1    1    0    0
9    2005.01.07. 20:52:26        0    0    0    0
10    2005.01.10. 12:00:22        0    0    0    0
11    2005.01.16. 13:17:46        0    0    0    0
12    2005.01.19. 15:32:18    T    1    1    0    0
13    2005.01.22. 11:21:41        0    0    0    0
14    2005.02.02. 23:44:33        0    0    0    0
15    2005.02.04. 14:35:34    T    0    1    0    0
16    2005.02.21. 16:25:42        0    0    0    0
17    2005.02.24. 2:36:51        0    0    0    0
18    2005.03.01. 2:59:17        0    0    0    0
19    2005.03.04. 16:25:12    T    1    0    0    0
Lines 8 and 12 have more than one column that contains 1.
So, I need to "copy" every colx from line 6 because it is the first line (ordered by datum), that 
has value 'T' for column T and only one colx has value 1.

Thank you 


Followup   March 10, 2005 - 8am Central time zone:

ops$tkyte@ORA9IR2> select t, col1, col2, col3, col4,
  2         substr(max(data) over (order by datum),11,1) c1,
  3         substr(max(data) over (order by datum),12,1) c2,
  4         substr(max(data) over (order by datum),13,1) c3,
  5         substr(max(data) over (order by datum),14,1) c4,
  6             case when col1+col2+col3+col4 > 1 then '<---' end fix
  7    from (
  8  select t.*,
  9         case when t = 'T' and col1+col2+col3+col4 = 1
 10                  then to_char(row_number() over (order by datum) ,'fm0000000000') || col1 || col2 || col3 || col4
 11                  end data
 12    from t
 13         )
 14   order by datum;

T       COL1       COL2       COL3       COL4 C C C C FIX
- ---------- ---------- ---------- ---------- - - - - ----
           0          0          0          0
           0          0          0          0
T          1          0          0          0 1 0 0 0
           0          0          0          0 1 0 0 0
           0          0          0          0 1 0 0 0
T          0          1          0          0 0 1 0 0
           0          0          0          0 0 1 0 0
T          1          1          0          0 0 1 0 0 <---
           0          0          0          0 0 1 0 0
           0          0          0          0 0 1 0 0
           0          0          0          0 0 1 0 0
T          1          1          0          0 0 1 0 0 <---
           0          0          0          0 0 1 0 0
           0          0          0          0 0 1 0 0
T          0          1          0          0 0 1 0 0
           0          0          0          0 0 1 0 0
           0          0          0          0 0 1 0 0
           0          0          0          0 0 1 0 0
T          1          0          0          0 1 0 0 0

19 rows selected.
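
The query above carries the last qualifying row forward by encoding "row number plus column values" into a sortable string and taking a running max(). As a rough sketch only: on releases that support LAST_VALUE with IGNORE NULLS (10g and later, so not the 9iR2 instance used above), the same idea might be written without the string encoding. The inline view alias OK is a name made up for this illustration.

```sql
-- Sketch, assuming Oracle 10g or later (IGNORE NULLS is not available in 9iR2).
-- OK flags the rows worth copying from: T = 'T' and exactly one colx set to 1.
select t, col1, col2, col3, col4,
       last_value(case when ok = 1 then col1 end ignore nulls) over (order by datum) c1,
       last_value(case when ok = 1 then col2 end ignore nulls) over (order by datum) c2,
       last_value(case when ok = 1 then col3 end ignore nulls) over (order by datum) c3,
       last_value(case when ok = 1 then col4 end ignore nulls) over (order by datum) c4
  from (
select t.*,
       case when t = 'T' and col1+col2+col3+col4 = 1 then 1 end ok
  from t
       )
 order by datum;
```

Rows before the first qualifying row get NULL in C1..C4, just as in the output above.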
 

5 stars Great!   March 10, 2005 - 9am Central time zone
Reviewer: Miki from Hungary
Great solution!
Thank you, it is what I expected. 


4 stars book on Analytics   March 10, 2005 - 11am Central time zone
Reviewer: A reader 
Hi Tom,

It is high time that you publish a book on 'Analytic functions' - there is a lot one can do with 
these, but very few people are fully aware of it.

When is this book due?

thanks 


5 stars A variation of Dean's question ...   March 10, 2005 - 8pm Central time zone
Reviewer: Julius from Fremont, CA
create table tt (
 did     number,
 dd     date,
 status number);
 
alter table tt add constraint tt_pk primary key (did,dd) using index;

insert into tt values (-111,to_date('03/03/2005','mm/dd/yyyy'),11);
insert into tt values (-111,to_date('03/04/2005','mm/dd/yyyy'),22);
insert into tt values (-111,to_date('03/05/2005','mm/dd/yyyy'),22);
insert into tt values (-111,to_date('03/06/2005','mm/dd/yyyy'),11);
insert into tt values (-111,to_date('03/07/2005','mm/dd/yyyy'),33);
insert into tt values (-111,to_date('03/08/2005','mm/dd/yyyy'),22);
insert into tt values (-111,to_date('03/09/2005','mm/dd/yyyy'),22);
insert into tt values (-111,to_date('03/10/2005','mm/dd/yyyy'),22);

insert into tt values (-222,to_date('03/04/2005','mm/dd/yyyy'),33);
insert into tt values (-222,to_date('03/05/2005','mm/dd/yyyy'),33);
insert into tt values (-222,to_date('03/06/2005','mm/dd/yyyy'),77);
insert into tt values (-222,to_date('03/07/2005','mm/dd/yyyy'),33);
insert into tt values (-222,to_date('03/08/2005','mm/dd/yyyy'),55);
insert into tt values (-222,to_date('03/09/2005','mm/dd/yyyy'),11);

I need a query which would return the following result set, where days_in_status is a count of 
consecutive days the did has been in its current status (dd values are days only). I've been trying 
to use analytics but without much success so far. Any idea? Thanks!!

 DID          DD STATUS DAYS_IN_STATUS
----- ---------- ------ --------------
-111  03/10/2005     22              3
-222  03/09/2005     11              1
 


Followup   March 10, 2005 - 9pm Central time zone:

ops$tkyte@ORA9IR2> select did, max(dd), count(*)
  2    from (
  3  select x.*, max(grp) over (partition by did order by dd desc) maxgrp
  4    from (
  5  select tt.*,
  6         case when lag(status) over (partition by did order by dd desc) <> status
  7                  then 1
  8                  end grp
  9    from tt
 10         ) x
 11             )
 12   where maxgrp is null
 13   group by did
 14  /

       DID MAX(DD)     COUNT(*)
---------- --------- ----------
      -222 09-MAR-05          1
      -111 10-MAR-05          3



is one approach... 
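
To see why "maxgrp is null" isolates the current run of statuses, it may help to run the two inner views on their own, without the final filter and GROUP BY. This is the same SQL as above with only an ORDER BY added:

```sql
-- GRP is 1 on every row whose status differs from the next-newer row (walking dd descending).
-- MAXGRP is the running max of GRP, so it stays NULL until the first status change is reached.
select x.*,
       max(grp) over (partition by did order by dd desc) maxgrp
  from (
select tt.*,
       case when lag(status) over (partition by did order by dd desc) <> status
            then 1
        end grp
  from tt
       ) x
 order by did, dd desc;
```

For did -111 the three newest rows (03/10, 03/09 and 03/08, all status 22) have a NULL maxgrp; the 03/07 row is the first one whose status differs from its newer neighbour, so it and every older row get maxgrp = 1 and are filtered out.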

5 stars SQL Query   March 15, 2005 - 6pm Central time zone
Reviewer: a reader 
Hi Tom,

create table a
(accno         number(8)     not null,
 amount_paid    number(7)   not null)
/


insert into a values (1, 1000);
insert into a values (2, 1500);
insert into a values (3, 2000);
insert into a values (4, 3000);
insert into a values (5, 3000);

Could you please help me in writing the following query without using rownum and analytics. 

list the accno corresponding to maximum amount paid. In case of more than one accounts having the 
same max amount paid, list any one.

I am expecting the result to be accno 4 or 5


Thanks for your time.

Regards 


Followup   March 15, 2005 - 9pm Central time zone:

sounds like homework.

I give a similar quiz question in interviews (find the most frequently occurring month)


tkyte@ORA8IW> select substr( max( to_char(amount_paid,'fm0000000') || accno ), 8 ) accno
  2    from a;

ACCNO
-----------------------------------------
5


is one possible approach (assuming that amount_paid is positive) 



tkyte@ORA8IW> select max(accno)
  2    from a
  3   where amount_paid = ( select max(amount_paid) from a );

MAX(ACCNO)
----------
         5


is another (that would work well if amount_paid,accno were indexed....)
 

3 stars negatives to worry about ...   March 15, 2005 - 9pm Central time zone
Reviewer: Gabe 
SQL> select * from a;

     ACCNO AMOUNT_PAID
---------- -----------
         1          -2
         2          -1

SQL> select substr( max( to_char(amount_paid,'fm0000000') || accno ), 8 ) accno from a;

ACCNO
-----------------------------------------
21
 


Followup   March 15, 2005 - 10pm Central time zone:

....
(assuming that amount_paid is positive) 
.......

that was caveated and why I gave two answers ;)
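
For what it is worth, the string-encoding trick breaks on negatives because to_char(-1,'fm0000000') is eight characters long ('-0000001'), so the strings no longer sort like the numbers and substr(...,8) no longer lands on the start of accno. A possible third variant that does not depend on the sign is the FIRST/LAST aggregate. This is only a sketch, and whether KEEP counts as "analytics" for the purposes of the homework is a judgment call:

```sql
-- Among the rows with the highest amount_paid, return the largest accno.
-- Works for negative amounts too, because nothing is encoded into a string.
select max(accno) keep (dense_rank last order by amount_paid) accno
  from a;
```

With the original five rows this returns accno 5; with Gabe's two negative rows it returns accno 2.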
 

4 stars cannot read ...   March 15, 2005 - 10pm Central time zone
Reviewer: Gabe 
Sorry about that ... missed it completely.

 


3 stars following an idea of mikito ...   March 16, 2005 - 8am Central time zone
Reviewer: Matthias Rogel from Kaiserslautern, Germany
  1  select accno
  2  from a
  3  start with amount_paid = (select max(amount_paid) from a)
  4         and accno = (select min(accno) from a where amount_paid = (select max(amount_paid) from a))
  5* connect by prior null is not null
SQL> /

     ACCNO
----------
         4

would be a third solution 


Followup   March 16, 2005 - 8am Central time zone:

there are many solutions -- this one would win a Rube Goldberg award though :) 

5 stars another query using analytics   March 29, 2005 - 11am Central time zone
Reviewer: A reader 
I've got 2 tables, t1 and t2.
t1(1 column):

t1.x(int ,primary key)

1
2
3

and t2(3 columns,index on t2.y):

t2.x(int) t2.y(int)  t2.z(int)

1         7000        1      
1         7000        6
1         8000        8 
2         7000        1
2         7000        5
3         7000        3 
3         8000        1
3         8000        7
3         9000        5

I would like to have a report like this:

t1.x     t2.y   count  min max

1        7000   2       1   8
1        8000   1       1   8
2        7000   2       1   5
3        7000   1       1   7
3        8000   2       1   7
3        9000   1       1   7

What I came up with is:

select distinct t1.x,t2.y,
count(*) over (partition by t1.x,t2.y) as count,
min(t2.z) over (partition by t1.x) as min,
max(t2.z) over (partition by t1.x) as max
from t1,t2
where t1.x=t2.x;

I was wondering if this query is good enough, or if there's a better way(in terms of performance) 
to write this query. I'm new to analytics, and your help would be very much appreciated.
 


Followup   March 29, 2005 - 12pm Central time zone:

we could probably do this in analytics without the distinct, something like


select t1.x, t2.y, t2.cnt, 
       min(t2.z) over (partition by t1.x), 
       max(t2.z) over (partition by t1.x)
  from t1, (select x, y, count(*) cnt from t2 group by x, y ) t2
 where t1.x = t2.x;

and maybe even push the min/max() down into the inline view.
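
A rough sketch of what pushing the min/max() down might look like. Note that the outer query can only see columns the inline view exposes, so z itself has to be aggregated inside it; the aliases MINZ, MAXZ, MIN_Z and MAX_Z are names chosen for this illustration:

```sql
-- Inline view: one row per (x, y) with its count and its own min/max of z.
-- Outer query: analytic min/max over those per-group values, partitioned by x.
select t1.x, t2.y, t2.cnt,
       min(t2.minz) over (partition by t1.x) min_z,
       max(t2.maxz) over (partition by t1.x) max_z
  from t1,
       (select x, y, count(*) cnt, min(z) minz, max(z) maxz
          from t2
         group by x, y) t2
 where t1.x = t2.x;
```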
 

5 stars Analytics problem   April 8, 2005 - 12pm Central time zone
Reviewer: Mark from NY
Hi Tom,

I have a problem whose solution I'm pretty sure involves analytic functions. I've been struggling 
with it for some time, but analytics are new to me. I want to go from this:

/* create and inserts */
create table test.test (ordernum varchar2(10), 
                         tasktype char(3),
                         feetype varchar2(20),
                         amount number(10,2));

insert into test.test(ordernum, tasktype, feetype, amount)
               values('123123', 'DOC', 'Product Fee', 15);
insert into test.test(ordernum, tasktype, feetype, amount)
               values('123123', 'DOC', 'Copy Fee', 1);
insert into test.test(ordernum, tasktype, feetype, amount)
               values('34864', 'COS', 'Setup Fee', 23);
insert into test.test(ordernum, tasktype, feetype, amount)
               values('34864', 'COS', 'File Review Fee', 27);
insert into test.test(ordernum, tasktype, feetype, amount)
               values('34864', 'COS', 'Statutory Fee', 23);               
insert into test.test(ordernum, tasktype, feetype, amount)
               values('56432', 'DOC', 'Product Fee', 80);    
insert into test.test(ordernum, tasktype, feetype, amount)
               values('56432', 'DOC', 'Prepayment', -16);

SQL> select tasktype, ordernum, feetype, amount from test.test;

TAS ORDERNUM   FEETYPE                  AMOUNT
--- ---------- -------------------- ----------
DOC 123123     Product Fee                  15
DOC 123123     Copy Fee                      1
COS 34864      Setup Fee                    23
COS 34864      File Review Fee              27
COS 34864      Statutory Fee                22
DOC 56432      Product Fee                  80
DOC 56432      Prepayment                  -16

...to this:

TAS ORDERNUM FEE1        FEE2            FEE3          FEE4     FEE5
--- -------- ----------- --------        ----------    -------- --------
DOC          Product Fee Copy Fee        Prepayment
DOC 123123   15          1
DOC 56432    80                          -16
COS          Setup Fee   File Review Fee Statutory Fee
COS 34864    23          27              22

Allow me to explain. For each tasktype I would like a heading row, which, going across, contains 
all the feetypes found in test.test for that particular tasktype. There should never be more than 
five feetypes.

For each ordernum under each tasktype, I would like to have the amounts going across, underneath 
the appropriate feetypes. 

I'm pretty sure my solution involves the lag and/or lead functions, partitioning over tasktype. I 
particularly seem to have trouble wrapping my brain around the problem of how to get a distinct 
ordernum while keeping intact the data in other columns (where ordernums duplicate).

I hope my explanation is clear enough.

Hope you can help. Thanks in advance. I will continue working on this. 


Followup   April 8, 2005 - 12pm Central time zone:

ops$tkyte@ORA9IR2> with columns
  2  as
  3  (select tasktype, feetype, row_number() over (partition by tasktype order by feetype) rn
  4     from (select distinct tasktype, feetype from test )
  5  )
  6  select a.tasktype, a.ordernum,
  7         to_char( max( decode( rn, 1, amount ) )) fee1,
  8         to_char( max( decode( rn, 2, amount ) )) fee2,
  9         to_char( max( decode( rn, 3, amount ) )) fee3,
 10         to_char( max( decode( rn, 4, amount ) )) fee4,
 11         to_char( max( decode( rn, 5, amount ) )) fee5
 12    from test a, columns b
 13   where a.tasktype = b.tasktype
 14     and a.feetype = b.feetype
 15   group by a.tasktype, a.ordernum
 16   union all
 17  select tasktype, null,
 18         ( max( decode( rn, 1, feetype ) )) fee1,
 19         ( max( decode( rn, 2, feetype ) )) fee2,
 20         ( max( decode( rn, 3, feetype ) )) fee3,
 21         ( max( decode( rn, 4, feetype ) )) fee4,
 22         ( max( decode( rn, 5, feetype ) )) fee5
 23    from columns
 24   group by tasktype
 25   order by 1 desc, 2 nulls first
 26  /
 
TAS ORDERNUM   FEE1            FEE2            FEE3            FEE4 FEE5
--- ---------- --------------- --------------- --------------- ---- ----
DOC            Copy Fee        Prepayment      Product Fee
DOC 123123     1                               15
DOC 56432                      -16             80
COS            File Review Fee Setup Fee       Statutory Fee
COS 34864      27              23              23


of course. :)


(suggestion, break it out, run each of the bits to see what they do.  basically, columns is a view 
used to "pivot" on -- we needed to assign a column number to each FEETYPE by TASKTYPE.  That is all 
that view does.

Then, we join that to test and "pivot" naturally.

Union all in the pivot of the column names....

and sort) 
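
Following the suggestion to break it out, the pivot helper can be run by itself. This is the body of the "columns" factored subquery from above, with only an ORDER BY added:

```sql
-- Assign a column position (RN) to each distinct FEETYPE within its TASKTYPE.
select tasktype, feetype,
       row_number() over (partition by tasktype order by feetype) rn
  from (select distinct tasktype, feetype from test)
 order by tasktype, rn;
```

With the sample data this numbers File Review Fee, Setup Fee and Statutory Fee as 1, 2, 3 for COS, and Copy Fee, Prepayment and Product Fee as 1, 2, 3 for DOC, which is exactly the left-to-right order of the heading rows in the final output.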

5 stars RE: Analytics problem   April 8, 2005 - 1pm Central time zone
Reviewer: Mark from NY
Excellent! I'll definitely break it down to figure out exactly what you did. Thank you very much.  


4 stars Re: “another query using analytics”   April 8, 2005 - 3pm Central time zone
Reviewer: Gabe 
You weren’t given any resources … so, I understand your solution was in fact merely a [untested] 
suggestion.

create table t1 ( x int primary key );

insert into t1 values (1);
insert into t1 values (2);
insert into t1 values (3);

create table t2 ( x int not null references t1(x), y int not null, z int not null );

insert into t2 values ( 1,7000,1);
insert into t2 values ( 1,7000,6);
insert into t2 values ( 1,8000,8);
insert into t2 values ( 2,7000,1);
insert into t2 values ( 2,7000,5);
insert into t2 values ( 3,7000,3);
insert into t2 values ( 3,8000,1);
insert into t2 values ( 3,8000,7);
insert into t2 values ( 3,9000,5);

My solution (avoiding the distinct) is not necessarily better than the one presented by the “A 
reader”, but here it goes:

flip@FLOP> select x, y, c
  2        ,min(f) over (partition by x) f
  3        ,max(l) over (partition by x) l
  4  from (
  5  select t2.x, t2.y, count(*) c
  6        ,min(t2.z) keep (dense_rank first order by t2.z) f
  7        ,max(t2.z) keep (dense_rank last  order by t2.z) l
  8  from  t1, t2
  9  where t1.x = t2.x
 10  group by t2.x, t2.y
 11  ) t
 12  ;

         X          Y          C          F          L
---------- ---------- ---------- ---------- ----------
         1       7000          2          1          8
         1       8000          1          1          8
         2       7000          2          1          5
         3       7000          1          1          7
         3       8000          2          1          7
         3       9000          1          1          7

Cheers. 


Followup   April 8, 2005 - 3pm Central time zone:

without create tables and inserts, I guess :)

takes too much time to create the setup for every case (wish people would read the page that they 
have to page down through to put something up here...) 

3 stars I'm confused   April 18, 2005 - 9pm Central time zone
Reviewer: Mikito 
Given that

select distinct deptno
from emp

is essentially

select deptno
from emp
group by deptno

how should a distinct query be rewritten when it has analytic columns? Neither

SELECT deptno, count(1),
       min(sal) over (partition by deptno) f
from emp
group by deptno,min(sal) over (partition by deptno);

nor

SELECT deptno, count(1),
       min(sal) over (partition by deptno) f
from emp
group by deptno,f;

seems to be valid syntax. 

(To repeat: "Does analytics scale?")



 


Followup   April 19, 2005 - 7am Central time zone:

why would you use analytics that way?  

Tell us the question, we'll tell you the method.


select deptno, count(*)  /* because count(1) is counter-intuitive */, 
       min(sal) over (partition by deptno) f
  from emp
 group by deptno, min(sal) over (partition by deptno)

would not make sense.  You are saying "get all deptnos, by deptno find the minimum salary and 
associate that number with each one, then aggregate by deptno/min salary to count records"

You should just ask:

find the minimum salary and count of records by deptno.


select deptno, count(*), min(sal) from emp group by deptno;


is what you were looking for.  analytics scale up wonderfully.  Say the question was instead:

you have a table full of records that have a customer_id and a last_sale_date, I would like you to 
retrieve the last record for each customer.


select * 
  from ( select cust.*, max(sale_date) over (partition by cust_id) lsd
           from cust )
 where sale_date = lsd;

versus

select * 
  from cust
 where sale_date = 
    (select max(sale_date) from cust c2 where cust_id = cust.cust_id )
/

or

select * 
  from cust, (select cust_id, max(sale_date) lsd from cust group by cust_id)x
 where cust.cust_id = x.cust_id
   and cust.sale_date = x.lsd
/

for example 

5 stars Tricky SQL?   April 19, 2005 - 10am Central time zone
Reviewer: A reader 
CREATE TABLE master
(
    m_no         INTEGER       PRIMARY KEY,
    m_name        VARCHAR2(255) NOT NULL UNIQUE
);

create table detail
(
  d_pk integer primary key,
  d_no integer not null references master(m_no),
  d_date date,
  d_data varchar2(255)
);

Given a d_pk, how can I get the second-to-last (ordered by d_date) record from M for that M_NAME? 
In other words, for a given m_name, there are multiple records in "detail" with different dates. 
Given one of those records, I want the prior record in "detail" (there might not be any)

I tried to design a simple master detail table, but maybe I over-normalized?

Thanks 


Followup   April 19, 2005 - 12pm Central time zone:


are you saying "i have a detail record, I want the detail record that came 'in front' of this one"? 
 

that is what I sort of hear, but the second to last is confusing me.


select * 
  from (
select ...., lead(d_pk) over (order by d_date) next_pk
  from master, detail
 where master.m_no = (select d_no from detail where d_pk = :x)
   and master.m_no = detail.d_no
       )
 where next_pk = :x;

I think that does that.  You get the master/detail for that d_pk (inline view)

Use lead to assign to each record the "next pk" after sorting by d_date

Keep the record whose 'next' record's primary key was the one you wanted.
 

3 stars a little inconsistency   April 19, 2005 - 1pm Central time zone
Reviewer: mikito 
I meant inconsistency, not scalability. Why "distinct"

SELECT distinct deptno, 
       min(sal) over (partition by deptno) f
from emp

is allowed, whereas "group by" is not? If someone has trouble understanding what analytics with 
"group by" means, the same should apply to analytics with "distinct" as well.
 


Followup   April 19, 2005 - 1pm Central time zone:

because group by is not distinct, they are frankly very different concepts.
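
One way to picture the difference: analytic functions are evaluated after the GROUP BY, so inside a grouped query they can only see the group-by expressions and the aggregates, not the detail column sal. A minimal sketch against the standard EMP table, in which the analytic is applied to an aggregate:

```sql
-- min(sal) is the aggregate computed per deptno group;
-- the outer min(...) over (...) is the analytic, applied to the grouped rows.
select deptno,
       count(*) cnt,
       min(min(sal)) over (partition by deptno) f
  from emp
 group by deptno;
```

Here each partition holds a single grouped row, so F is simply min(sal) for the department, which is why the plain "select deptno, count(*), min(sal) from emp group by deptno" is the natural way to ask the question.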

 

4 stars detail and summary in one sql statement   April 27, 2005 - 3pm Central time zone
Reviewer: A reader 
hi tom,

quick shot. i have to process many detail records (columns a - f) and one summary record 
(containing sum (column c) + count (*) over all recs + some literal placeholders) within one sql 
statement. is there another way than using a classical UNION ALL select? any new way with 
analytical functions? 


Followup   April 27, 2005 - 3pm Central time zone:

need small example, did not follow your example as stated. 

3 stars detail and summary in one sql statement   April 28, 2005 - 10am Central time zone
Reviewer: A reader 
hi tom,

here is the small and simple test case to show what i mean.

SQL> create table t1 (col1 number primary key, col2 number, col3 number);

Table created.

SQL> create table t2 (col0 number primary key, col1 number references t1 (col1), col2 number, col3 number, col4 number);

Table created.

SQL> create index t2_col1 on t2 (col1);

Index created.

SQL> insert into t1 values (1, 1, 1);

1 row created.

SQL> insert into t2 values (1, 1, 1, 1, 1);

1 row created.

SQL> insert into t2 values (2, 1, 2, 2, 2);

1 row created.

SQL> insert into t2 values (3, 1, 3, 3, 3);

1 row created.

SQL> analyze table t1 compute statistics;

Table analyzed.

SQL> analyze table t2 compute statistics;

Table analyzed.

SQL> select 0 rowtype, t1.col1 display1, t1.col2 display2, t2.col3 display3, t2.col4 display4
  2  from   t1 join t2 on (t1.col1 = t2.col1)
  3  where  t1.col1 = 1
  4  UNION ALL
  5  select 1 rowtype, t1.col1, count (*), null, sum (t2.col4)
  6  from   t1 join t2 on (t1.col1 = t2.col1)
  7  where  t1.col1 = 1
  8  group  by t1.col1
  9* order  by rowtype

   ROWTYPE   DISPLAY1   DISPLAY2   DISPLAY3   DISPLAY4
---------- ---------- ---------- ---------- ----------
         0          1          1          1          1
         0          1          1          2          2
         0          1          1          3          3
         1          1          3                     6

that is creating detail + summary record within one sql statement! 


Followup   April 28, 2005 - 10am Central time zone:

ops$tkyte@ORA10G> select grouping_id(t1.col2) rowtype,
  2         t1.col1 d1,
  3             decode( grouping_id(t1.col2), 0, t1.col2, count(*) ) d2,
  4             decode( grouping_id(t1.col2), 0, t2.col3, null ) d3,
  5             decode( grouping_id(t1.col2), 0, t2.col4, sum(t2.col4) ) d4
  6    from t1, t2
  7   where t1.col1 = t2.col1
  8   group by grouping sets((t1.col1),(t1.col1,t1.col2,t2.col3,t2.col4))
  9  /
 
   ROWTYPE         D1         D2         D3         D4
---------- ---------- ---------- ---------- ----------
         0          1          1          1          1
         0          1          1          2          2
         0          1          1          3          3
         1          1          3                     6
 

5 stars detail and summary in one sql statement   April 29, 2005 - 10am Central time zone
Reviewer: A reader 
hi tom,

thanks for your help. that's exactly what i need. analytics rock, analytics roll as you said. :)

unfortunately it is hard to get. :(

i looked in the documentation but cannot understand the grouping_id values in the example. please 
could you explain? what is "2" or "3" in the grouping column?


Examples
The following example shows how to extract grouping IDs from a query of the sample table sh.sales:

SELECT channel_id, promo_id, sum(amount_sold) s_sales,
   GROUPING(channel_id) gc,
   GROUPING(promo_id) gp,
   GROUPING_ID(channel_id, promo_id) gcp,
   GROUPING_ID(promo_id, channel_id) gpc
   FROM sales
   WHERE promo_id > 496
   GROUP BY CUBE(channel_id, promo_id);
 
C   PROMO_ID    S_SALES         GC         GP        GCP        GPC
- ---------- ---------- ---------- ---------- ---------- ----------
C        497   26094.35          0          0          0          0
C        498    22272.4          0          0          0          0
C        499    19616.8          0          0          0          0
C       9999   87781668          0          0          0          0
C            87849651.6          0          1          1          2
I        497    50325.8          0          0          0          0
I        498    52215.4          0          0          0          0
I        499   58445.85          0          0          0          0
I       9999  169497409          0          0          0          0
I             169658396          0          1          1          2
P        497   31141.75          0          0          0          0
P        498    46942.8          0          0          0          0
P        499      24156          0          0          0          0
P       9999   70890248          0          0          0          0
P            70992488.6          0          1          1          2
S        497  110629.75          0          0          0          0
S        498   82937.25          0          0          0          0
S        499   80999.15          0          0          0          0
S       9999  267205791          0          0          0          0
S             267480357          0          1          1          2
T        497     8319.6          0          0          0          0
T        498    5347.65          0          0          0          0
T        499      19781          0          0          0          0
T       9999   28095689          0          0          0          0
T            28129137.3          0          1          1          2
         497  226511.25          1          0          2          1
         498   209715.5          1          0          2          1
         499   202998.8          1          0          2          1
        9999  623470805          1          0          2          1
              624110031          1          1          3          3

 


Followup   April 29, 2005 - 10am Central time zone:

http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:37355353762363
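
In short, GROUPING_ID treats the GROUPING() values of its arguments as the bits of a binary number, leftmost argument first, so GROUPING_ID(a, b) = 2 * GROUPING(a) + GROUPING(b). A small sketch against the same sh.sales query quoted above, with one extra column (GCP_BY_HAND, a name made up here) to show the arithmetic:

```sql
-- GCP and GCP_BY_HAND should agree on every row:
--   0 = detail level, 1 = promo_id rolled up, 2 = channel_id rolled up, 3 = grand total.
select channel_id, promo_id, sum(amount_sold) s_sales,
       grouping(channel_id) gc,
       grouping(promo_id)   gp,
       grouping_id(channel_id, promo_id) gcp,
       2 * grouping(channel_id) + grouping(promo_id) gcp_by_hand
  from sales
 where promo_id > 496
 group by cube(channel_id, promo_id);
```

That is why the rows with a blank channel show GCP = 2 (GC = 1, GP = 0) but GPC = 1: swapping the argument order swaps which GROUPING() value is the high bit. The last row, where both are rolled up, is 3 either way.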
 

4 stars How to do this using Analytics   May 5, 2005 - 5pm Central time zone
Reviewer: A reader 
Hello Sir,
         I have a denormalized table dept_emp, part of which I have reproduced here. It 
has/will have dupes.

I need to find out all emps which belong to more than one dept using Analytics ( Want to avoid self 
join ).

So the required output must be :



DEPTNO DNAME      EMPNO ENAME               
------ ---------- ----- --------------------
    10 D10            1 E1                  
    10 D10            1 E1                  
    10 D10            2 E2                  
    10 D10            2 E2                  
                  
    20 D20            1 E1                  
    20 D20            1 E1                  
    20 D20            2 E2                  
    20 D20            2 E2                  
   
From the total set of :
SELECT * FROM DEPT_EMP ORDER BY DEPTNO ,EMPNO
DEPTNO DNAME      EMPNO ENAME               
------ ---------- ----- --------------------
    10 D10            1 E1                  
    10 D10            1 E1                  
    10 D10            2 E2                  
    10 D10            2 E2                  
    10 D10            3 E3                  
    10 D10            3 E3                  
    20 D20            1 E1                  
    20 D20            1 E1                  
    20 D20            2 E2                  
    20 D20            2 E2                  
    20 D20            4 E4                  
    20 D20            4 E4                  
    20 D20            5 E5                  
    20 D20            5 E5                  
14 rows selected


create table dept_emp (deptno number , dname varchar2(10) ,empno number ,ename varchar2(20) ) ;

INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES ( 
10, 'D10', 1, 'E1'); 
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES ( 
10, 'D10', 2, 'E2'); 
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES ( 
10, 'D10', 3, 'E3'); 
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES ( 
20, 'D20', 4, 'E4'); 
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES ( 
20, 'D20', 5, 'E5'); 
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES ( 
20, 'D20', 1, 'E1'); 
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES ( 
20, 'D20', 2, 'E2'); 
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES ( 
10, 'D10', 1, 'E1'); 
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES ( 
10, 'D10', 2, 'E2'); 
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES ( 
10, 'D10', 3, 'E3'); 
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES ( 
20, 'D20', 4, 'E4'); 
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES ( 
20, 'D20', 5, 'E5'); 
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES ( 
20, 'D20', 1, 'E1'); 
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES ( 
20, 'D20', 2, 'E2'); 
COMMIT ;

Thanx 


Followup   May 5, 2005 - 6pm Central time zone:

no analytics

select empno, count(distinct deptno)
  from t
 group by empno
having count(distinct deptno) > 1; 

5 stars Thanx Sir   May 5, 2005 - 9pm Central time zone
Reviewer: A reader 
Actually I was planning to use analytics to get the whole row info, will do the same trick with 
Analytics, then.

You are a Genius. 


Followup   May 6, 2005 - 7am Central time zone:

select *
  from (
select t.*, count(distinct deptno) over (partition by empno) cnt
  from t
       )
 where cnt > 1;
 

4 stars Analytical solution   May 10, 2005 - 6am Central time zone
Reviewer: Baiju Menon from India
Sir,
I want to list the department and the maximum number of employees working in that department by 
using Analytical function(only the department in which the maximum number of employees are working) 

the query without the Analytical function is
select deptno, count(deptno) from emp group by deptno having count(deptno) in (select 
max(count(deptno)) from emp group by deptno)
Thanks
 


Followup   May 10, 2005 - 9am Central time zone:

  1  select deptno, cnt
  2    from (
  3  select deptno, cnt, max(cnt) over() max_cnt
  4    from (
  5  select deptno, count(*) cnt
  6    from emp
  7   group by deptno
  8         )
  9        )
 10*  where cnt = max_cnt
scott@ORA9IR2> /
 
    DEPTNO        CNT
---------- ----------
        30          6
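
A possible variation, offered only as a sketch: since analytics are evaluated after the GROUP BY, the ranking can be done directly on count(*) and one level of nesting dropped. RNK is an alias made up for this illustration.

```sql
-- Rank the departments by head count; ties share rank 1, so all tied departments are returned.
select deptno, cnt
  from (
select deptno, count(*) cnt,
       rank() over (order by count(*) desc) rnk
  from emp
 group by deptno
       )
 where rnk = 1;
```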
 

5 stars group by   May 11, 2005 - 4am Central time zone
Reviewer: Anoop Gupta from INDIA
Hi Tom,
        
       I have a table whose data looks like this:

   empid levelname
   1001  Level1
   1001  Level2
   1001  Level3
   1001  Level4
   1002  Level1
   1002  Level2
   1002  Level3
   ...
   ...

This table tells which levels an employee is assigned to.
Is there any query possible that will return data like this without writing a function?

empid  emp_assigned on level
1001   level1,level2,level3,level4
1002   level1,level2,level3
...
...

Waiting for your response.....

        


Followup   May 11, 2005 - 7am Central time zone:

only if there is some reasonable maximum number of levelname rows per empid.

is there? 
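
If there is such a maximum, the usual pivot technique applies. A minimal sketch, assuming at most four levels per empid and a table named emp_levels (a hypothetical name, since the question does not give one):

```sql
-- Number the levels within each empid, pivot them into four slots,
-- glue the slots together and trim the commas left over by empty slots.
select empid,
       rtrim( max(decode(rn, 1, levelname)) || ',' ||
              max(decode(rn, 2, levelname)) || ',' ||
              max(decode(rn, 3, levelname)) || ',' ||
              max(decode(rn, 4, levelname)), ',' ) levels
  from (
select empid, levelname,
       row_number() over (partition by empid order by levelname) rn
  from emp_levels
       )
 group by empid;
```

An empid with more than four rows would silently lose the extra levels, which is why the maximum has to be known up front.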

5 stars Analytics Rock - But why are they slower for me   May 13, 2005 - 1am Central time zone
Reviewer: Jeff Plumb from Melbourne, Australia
Hi Tom,

I have followed your example about Analytics from Effective Oracle by Design on page 516 (Find a 
specific row in a partition). When I run the example and tkprof the 3 different queries, the 
analytics actually takes a lot longer to run, but it does do less logical I/O's. It is doing a lot 
more physical I/O's so I am guessing that it is using a temporary segment on disk to perform the 
window sort. To perform the test I created the big_table that you use and populated it with 
1,000,000 rows. I am using Oracle 9i release 2. Here is the output from TKPROF:

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 33
********************************************************************************

select owner, object_name, created
from   big_table t
where  created = (select max(created)
                  from   big_table t2
                  where  t2.owner = t.owner)

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        8      5.32       6.42      13815      14669          0         694
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       10      5.32       6.42      13815      14669          0         694

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 33

Rows     Row Source Operation
-------  ---------------------------------------------------
    694  HASH JOIN
     20   VIEW
     20    SORT GROUP BY
1000000     TABLE ACCESS FULL BIG_TABLE
1000000   TABLE ACCESS FULL BIG_TABLE

********************************************************************************

select t.owner, t.object_name, t.created
from   big_table t
join (select owner, max(created) maxcreated
      from   big_table
      group by owner) t2
  on (t2.owner = t.owner and t2.maxcreated = t.created)

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        8      5.03       5.06      13816      14669          0         694
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       10      5.03       5.06      13816      14669          0         694

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 33

Rows     Row Source Operation
-------  ---------------------------------------------------
    694  HASH JOIN
     20   VIEW
     20    SORT GROUP BY
1000000     TABLE ACCESS FULL BIG_TABLE
1000000   TABLE ACCESS FULL BIG_TABLE

********************************************************************************

select owner, object_name, created
from
(   select owner, object_name, created, max(created) over (partition by owner) as maxcreated
    from   big_table
)
where created = maxcreated

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        8     16.68      40.66      15157       7331         17         694
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       10     16.68      40.66      15157       7331         17         694

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 33

Rows     Row Source Operation
-------  ---------------------------------------------------
    694  VIEW
1000000   WINDOW SORT
1000000    TABLE ACCESS FULL BIG_TABLE

********************************************************************************

And when I run the query with the analytics using autotrace I get the following which shows a sort 
to disk:
SQL*Plus: Release 9.2.0.6.0 - Production on Fri May 13 14:53:08 2005

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to:
Oracle9i Enterprise Edition Release 9.2.0.6.0 - 64bit Production
With the Partitioning option
JServer Release 9.2.0.6.0 - Production

control@DWDEV> set autot traceonly
control@DWDEV> select owner, object_name, created
  2  from
  3  (   select owner, object_name, created, max(created) over (partition by owner) as maxcreated
  4      from   big_table
  5  )
  6  where created = maxcreated;

694 rows selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4399 Card=1000000 Bytes=52000000)
   1    0   VIEW (Cost=4399 Card=1000000 Bytes=52000000)
   2    1     WINDOW (SORT) (Cost=4399 Card=1000000 Bytes=43000000)
   3    2       TABLE ACCESS (FULL) OF 'BIG_TABLE' (Cost=637 Card=1000000 Bytes=43000000)




Statistics
----------------------------------------------------------
          0  recursive calls
         17  db block gets
       7331  consistent gets
      15348  physical reads
        432  redo size
      12784  bytes sent via SQL*Net to client
        717  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          1  sorts (disk)
        694  rows processed

So how can I stop the sorts (disk)? I am guessing that the pga_aggregate_target needs to be higher, 
but it seems to already be set quite high.

control@DWDEV> show parameter pga

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
pga_aggregate_target                 big integer 524288000

I hope you can help clarify how to make the analytic version run quicker.

Thanks.
 


Followup   May 13, 2005 - 9am Central time zone:

it'll be a function of the number of "owners" here

You have 1,000,000 records.

You have but 20 users.

in this extreme case, having 50,000 records per window and swapping out was not as good as 
squashing the data down to 20 records and joining -- the CBO quite smartly rewrote:

select owner, object_name, created
from   big_table t
where  created = (select max(created)
                  from   big_table t2
                  where  t2.owner = t.owner)

as

select ...
  from big_table t, (select owner,max(created) created from big_table t2 ...)
 where ....



So, does the data you analyze to find the "most current record" tend to have 50,000 records/key in 
real life?

In your case, your hash table didn't spill to disk.  In real life though, the numbers would 
probably be much different.  a 1,000,000 row table would have keys with 10 or 100 rows maybe, not 
50,000 (in general).  There you would find the answer to be very different.
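
One way to check how many rows per key you really have is to look at the distribution directly. This is only a sketch: it assumes the same big_table and the owner column used above, so swap in your own table and partitioning key.

```sql
-- how many rows does each key (that is, each analytic window) hold?
select min(cnt)        min_rows,
       max(cnt)        max_rows,
       round(avg(cnt)) avg_rows,
       count(*)        num_keys
  from (select owner, count(*) cnt
          from big_table
         group by owner);
```

A handful of keys with tens of thousands of rows each is the "extreme case" described above; many keys with 10 or 100 rows each is the more typical one.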

And if you let the sort run in memory it would be different as well -- a single sort would get a max of 
25m (5% of the 500m target) given your pga aggregate target setting, and that may have been too small.

but consider what happens when the size of the "aggregate" goes up, diminishing marginal returns set 
in:

select owner, object_name, created
from   big_table t
where  created = (select max(created)
                  from   big_table t2
                  where  t2.owner = t.owner)
                                                                                         
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch      320      2.06       2.01      26970      29283          0        4775
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      322      2.06       2.01      26970      29283          0        4775
********************************************************************************
select owner, object_name, created
from
(   select owner, object_name, created,
           max(created) over (partition by owner) as maxcreated
    from   big_table
)
where created = maxcreated
                                                                                         
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch      320      4.57      10.05      30603      14484         15        4775
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      322      4.57      10.05      30603      14484         15        4775
********************************************************************************
select owner, object_name, created
from   big_table t
where  created = (select max(created)
                  from   big_table t2
                  where  t2.id = t.id)
                                                                                         
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch    66668      7.70      12.04      33787      45393          2     1000000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    66670      7.71      12.05      33787      45393          2     1000000
********************************************************************************
select owner, object_name, created
from
(   select owner, object_name, created,
           max(created) over (partition by id) as maxcreated
    from   big_table
)
where created = maxcreated
                                                                                         
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch    66668      7.00       9.60       9336      14484          2     1000000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    66670      7.00       9.60       9336      14484          2     1000000





and, given sufficient space to work "in memory", these two big queries both benefited:


select owner, object_name, created
from   big_table t
where  created = (select max(created)
                  from   big_table t2
                  where  t2.owner = t.owner)
                                                                                         
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch      320      1.82       1.96       9909      29283          0        4775
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      322      1.83       1.97       9909      29283          0        4775
********************************************************************************
select owner, object_name, created
from
(   select owner, object_name, created,
           max(created) over (partition by owner) as maxcreated
    from   big_table
)
where created = maxcreated
                                                                                         
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch      320      2.15       2.11       2858      14484          0        4775
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      322      2.15       2.11       2858      14484          0        4775
********************************************************************************
select owner, object_name, created
from   big_table t
where  created = (select max(created)
                  from   big_table t2
                  where  t2.id = t.id)
                                                                                         
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch    66668      7.64       7.55      10181      94633          0     1000000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    66670      7.65       7.56      10181      94633          0     1000000
********************************************************************************
select owner, object_name, created
from
(   select owner, object_name, created,
           max(created) over (partition by id) as maxcreated
    from   big_table
)
where created = maxcreated
                                                                                         
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch    66668      5.69       5.49       2699      14484          0     1000000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    66670      5.69       5.49       2699      14484          0     1000000


(this was a dual cpu xeon using 'nonparallel' query in this case, once with a 256mb pga aggregate 
target and again with a 2gig one)
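
If you cannot change pga_aggregate_target on a shared instance, a single session can be given a larger sort area by hand for a test. Note that this is a different technique from the one used for the numbers above (those came from changing the pga aggregate target), and the 100 MB size is purely an illustrative assumption, not a recommendation.

```sql
-- session-level experiment only; the sizes are illustrative assumptions
alter session set workarea_size_policy = manual;
alter session set sort_area_size = 104857600;            -- 100 MB, in bytes
alter session set sort_area_retained_size = 104857600;

set autotrace traceonly statistics

select owner, object_name, created
from
(   select owner, object_name, created,
           max(created) over (partition by owner) as maxcreated
    from   big_table
)
where created = maxcreated;

-- compare "sorts (memory)" and "sorts (disk)" with the earlier run,
-- then put the session back to automatic workarea management
set autotrace off
alter session set workarea_size_policy = auto;
```

Whether the window sort then fits in memory depends on the row width and the row count, so check the autotrace statistics rather than assuming it does.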

3 stars   May 14, 2005 - 3am Central time zone
Reviewer: kuldeep from India
Dear Tom,

I have three tables t1, t2 & t3, where t2 & t3 are joined with t1 on column "key_id".
Now I need the sum of key_val (amount) of t2 and the sum of key_val (amount) of t3 for each key_id
in table t1.

kuldeep@dlfscg> select * from t1;

    KEY_ID    KEY_VAL
---------- ----------
         2       1980
         1       1975

kuldeep@dlfscg> select * from t2;

    KEY_ID    KEY_VAL
---------- ----------
         2        550
         2        575
         1        500

kuldeep@dlfscg> select * from t3;

    KEY_ID    KEY_VAL
---------- ----------
         2        900
         1       1000
         1        750

***** QUERY 1 *****

kuldeep@dlfscg> SELECT t1.key_id, SUM(t2.key_val) sum_t2_key_val, SUM(t3.key_val) sum_t3_key_val
  2  FROM t1, t2, t3
  3  WHERE t1.key_id=t2.key_id
  4  AND t1.key_id=t3.key_id
  5  GROUP BY t1.key_id
  6  /

    KEY_ID SUM_T2_KEY_VAL SUM_T3_KEY_VAL
---------- -------------- --------------
         1           1000           1750
         2           1125           1800

***** QUERY 2 *****

kuldeep@dlfscg> SELECT t1.key_id, t2.sum_t2_key_val, t3.sum_t3_key_val
  2  FROM   t1, 
  3      (SELECT key_id, SUM(key_val) sum_t2_key_val FROM t2 GROUP BY key_id) t2,
  4      (SELECT key_id, SUM(key_val) sum_t3_key_val FROM t3 GROUP BY key_id) t3
  5  WHERE  t1.key_id=t2.key_id
  6  AND    t1.key_id=t3.key_id 
  7  /

    KEY_ID SUM_T2_KEY_VAL SUM_T3_KEY_VAL
---------- -------------- --------------
         1            500           1750
         2           1125            900

Query 1 is giving the wrong result and I cannot use query 2, whose performance is very poor.

Oracle 9i has added a lot of new grouping features and a lot of analytic functions (all going over 
my head).

Is there any "special" sum function or way which picks a value only once per source row (or 
per query key, here "key_id"),
irrespective of how many times it appears in the query result?


    KEY_ID T2_KEY_VAL T3_KEY_VAL
---------- ---------- ----------
         1        500       1000 
         1        500        750 <---- 500 of t2 should not be calculated, it is repeat
         2        550        900
         2        575        900 <---- 900 of t3 should not be calculated, it is repeat

thanks and regards,

 


Followup   May 14, 2005 - 9am Central time zone:

select t1.key_id, t2.sum_val, t3.sum_val
  from t1, 
       (select key_id, sum(key_val) sum_val from t2 group by key_id ) t2,
       (select key_id, sum(key_val) sum_val from t3 group by key_id ) t3
 WHERE  t1.key_id=t2.key_id
 AND    t1.key_id=t3.key_id 
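
Query 1 in the question goes wrong because joining t2 and t3 to t1 on the same key multiplies the rows: every t2 row for a key is paired with every t3 row for that key, so each amount is summed once per match on the other side. Aggregating each table down to one row per key before joining, as above, avoids that. Another way to write the same idea, offered only as a sketch and not as the method from the answer above, is with scalar subqueries. It assumes the t1, t2 and t3 tables from the question, and whether it runs faster depends on indexing of key_id, which has not been tested here.

```sql
-- each sum is computed on its own, so t2 rows are never multiplied by t3 rows
select t1.key_id,
       (select sum(t2.key_val) from t2 where t2.key_id = t1.key_id) sum_t2_key_val,
       (select sum(t3.key_val) from t3 where t3.key_id = t1.key_id) sum_t3_key_val
  from t1;
```

Unlike the join, this also returns t1 rows that have no match in t2 or t3, with a NULL sum for the missing side.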
  

4 stars apply an amount across multiple records   May 15, 2005 - 8pm Central time zone
Reviewer: Dave from Seattle
I have a problem similar to what I call the invoice payment problem.
It would seem to be a common problem, but I have searched to no avail.

The idea is that a customer may have many outstanding invoices, and sends in a check for an 
arbitrary amount.  So we need to apply the money across the invoices oldest first.
Note that in my specific case, if a payment exceeds the total outstanding, the excess is ignored 
(obviously not dealing with real money here!)

create table invoices (
  cust_nbr    integer not null,
  invoice_nbr integer not null,
  invoice_amt number not null,
  payment_amt number not null,
  primary key (cust_nbr, invoice_nbr)
);

begin
  delete from invoices;
  dbms_random.seed(123456789);
  for c in 1 .. 2 loop
    for i in 1 .. 3 loop
      insert into invoices values (c, i, round(dbms_random.value * 10, 2)+1, 0);
    end loop;
  end loop;
  update invoices
    set payment_amt = round(dbms_random.value * invoice_amt, 2)
    where invoice_nbr = 1;
  commit;
end;
/
select cust_nbr, invoice_nbr, invoice_amt, payment_amt,
       invoice_amt - payment_amt outstanding_amt
  from invoices
  where invoice_amt - payment_amt > 0
  order by cust_nbr, invoice_nbr;

  CUST_NBR INVOICE_NBR INVOICE_AMT PAYMENT_AMT OUTSTANDING_AMT
---------- ----------- ----------- ----------- ---------------
         1           1        9.44        5.55            3.89
         1           2        3.21           0            3.21
         1           3        2.78           0            2.78
         2           1        7.57         4.3            3.27
         2           2        9.46           0            9.46
         2           3        5.92           0            5.92

variable cust_nbr number;
variable received_amt number;
begin
  :cust_nbr := 1;
  :received_amt := 7.25;
end;
/

update invoices i1
  set payment_amt = (... some query which applies
                      :received_amt to outstanding_amt ...)
  where cust_nbr = :cust_nbr;


result should be:

  CUST_NBR INVOICE_NBR INVOICE_AMT PAYMENT_AMT OUTSTANDING_AMT
---------- ----------- ----------- ----------- ---------------
         1           1        9.44        9.44               0
         1           2        3.21        3.21               0
         1           3        2.78         .15            2.63
         2           1        7.57         4.3            3.27
         2           2        9.46           0            9.46
         2           3        5.92           0            5.92


This is simple to solve in pl/sql with a cursor, but I thought it would be a good test for a 
set-based solution with analytics.  But after some effort, I'm stumped.
 


Followup   May 16, 2005 - 7am Central time zone:

Using analytics we can see how to apply the inputs:

ops$tkyte@ORA9IR2> select cust_nbr, invoice_nbr, invoice_amt, payment_amt,
  2         least( greatest( :received_amt - rt + outstanding_amt, 0 ), outstanding_amt ) amount_to_apply
  3    from (
  4  select cust_nbr, invoice_nbr, invoice_amt, payment_amt,
  5         invoice_amt - payment_amt outstanding_amt,
  6         sum(invoice_amt - payment_amt) over (partition by cust_nbr order by invoice_nbr) rt
  7    from invoices
  8   where cust_nbr = :cust_nbr
  9         )
 10    order by cust_nbr, invoice_nbr;
 
  CUST_NBR INVOICE_NBR INVOICE_AMT PAYMENT_AMT AMOUNT_TO_APPLY
---------- ----------- ----------- ----------- ---------------
         1           1        9.44        5.55            3.89
         1           2        3.21           0            3.21
         1           3        2.78           0             .15


Just needed a running total of outstanding amounts to take away from the received amount....
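
To see why the least/greatest expression works, the arithmetic can be run on its own. The sketch below hard-codes customer 1's three outstanding amounts and the 7.25 payment from the example; the WITH block and the column name left_before_this_invoice are made up for illustration only.

```sql
-- customer 1's outstanding amounts, hard-coded for illustration
with o as (
  select 1 invoice_nbr, 3.89 outstanding_amt from dual union all
  select 2, 3.21 from dual union all
  select 3, 2.78 from dual
)
select invoice_nbr, outstanding_amt, rt,
       -- money still unapplied when we reach this invoice (may go negative)
       7.25 - rt + outstanding_amt left_before_this_invoice,
       -- never apply less than 0, never more than the invoice still owes
       least( greatest( 7.25 - rt + outstanding_amt, 0 ), outstanding_amt ) amount_to_apply
  from (select invoice_nbr, outstanding_amt,
               sum(outstanding_amt) over (order by invoice_nbr) rt
          from o)
 order by invoice_nbr;
```

The running totals are 3.89, 7.10 and 9.88, so the money left when each invoice is reached is 7.25, 3.36 and .15. Capping those at each invoice's outstanding amount gives 3.89, 3.21 and .15, which matches the AMOUNT_TO_APPLY column above. With a smaller payment the "left" figure goes negative for the later invoices, and greatest(..., 0) turns that into nothing applied.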

Then, merge:

ops$tkyte@ORA9IR2> merge into invoices
  2  using
  3  (
  4  select cust_nbr, invoice_nbr, invoice_amt, payment_amt,
  5         least( greatest( :received_amt - rt + outstanding_amt, 0 ), outstanding_amt ) amount_to_apply
  6    from (
  7  select cust_nbr, invoice_nbr, invoice_amt, payment_amt,
  8         invoice_amt - payment_amt outstanding_amt,
  9         sum(invoice_amt - payment_amt) over (partition by cust_nbr order by invoice_nbr) rt
 10    from invoices
 11   where cust_nbr = :cust_nbr
 12         )
 13  ) x
 14  on ( invoices.cust_nbr = x.cust_nbr and invoices.invoice_nbr = x.invoice_nbr )
 15  when matched then update set payment_amt = nvl(payment_amt,0)+x.amount_to_apply
 16  when not matched /* never happens... */ then insert (cust_nbr) values (null);
 
3 rows merged.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select cust_nbr, invoice_nbr, invoice_amt, payment_amt,
  2         invoice_amt - payment_amt outstanding_amt
  3    from invoices
  4    order by cust_nbr, invoice_nbr;
 
  CUST_NBR INVOICE_NBR INVOICE_AMT PAYMENT_AMT OUTSTANDING_AMT
---------- ----------- ----------- ----------- ---------------
         1           1        9.44        9.44               0
         1           2        3.21        3.21               0
         1           3        2.78         .15            2.63
         2           1        7.57         4.3            3.27
         2           2        9.46           0            9.46
         2           3        5.92           0            5.92
 
6 rows selected.
 

5 stars Group by   May 16, 2005 - 10am Central time zone
Reviewer: Anoop Gupta from INDIA

Hi Tom,

As I asked earlier:

I have a table whose data looks like this:

   empid leavelname
   1001  Level1
   1001  Level2
   1001  Level3
   1001  Level4
   1002  Level1
   1002  Level2
   1002  Level3
   ...
   ...

This table tells which levels each employee is assigned to.
Is there any query possible that will return data like this without writing a 
function?

empid  emp_assigned on leavel
1001   level1,level2,level3,level4
1002   level1,level2,level3
...
...

Please show me how to write such a query, supposing there is a limit of 50 levels per 
employee.

Please reply....


 


Followup   May 16, 2005 - 1pm Central time zone:

select empid,
       rtrim(
       max(decode(rn,1,leavelname)) || ',' ||
       max(decode(rn,2,leavelname)) || ',' ||
       ....
       max(decode(rn,50,leavelname)), ',' )
  from (select empid,   
               row_number() over (partition by empid order by leavelname) rn,
               leavelname
          from t 
       )
 group by empid; 
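
Written out in full for a maximum of four levels instead of fifty, the same pivot looks like the sketch below. The table name t and the column names come from the question; the sample rows and the level_list alias are assumptions for illustration.

```sql
create table t ( empid number, leavelname varchar2(30) );

insert into t values ( 1001, 'Level1' );
insert into t values ( 1001, 'Level2' );
insert into t values ( 1001, 'Level3' );
insert into t values ( 1001, 'Level4' );
insert into t values ( 1002, 'Level1' );
insert into t values ( 1002, 'Level2' );
insert into t values ( 1002, 'Level3' );

select empid,
       rtrim(
       max(decode(rn,1,leavelname)) || ',' ||   -- slot 1
       max(decode(rn,2,leavelname)) || ',' ||   -- slot 2
       max(decode(rn,3,leavelname)) || ',' ||   -- slot 3
       max(decode(rn,4,leavelname)), ',' ) level_list
  from (select empid,
               row_number() over (partition by empid order by leavelname) rn,
               leavelname
          from t
       )
 group by empid
 order by empid;
```

Empty slots concatenate as nothing, so employee 1002 ends up with a trailing comma that the rtrim removes, giving Level1,Level2,Level3. Keep in mind that ordering by leavelname is a string sort, so a name like Level10 would sort before Level2.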

3 stars special sum   May 17, 2005 - 12am Central time zone
Reviewer: kuldeep from India
Dear Tom,

Thanks for your response and for this useful site.

I was looking for a solution which could avoid these inline views, which were making my query run 
slowly. I tried for a solution and came up with this query:

/*  DATA VIEW  */
kuldeep@dlfscg> SELECT t1.key_id,    
  2      t2.ROWID t2_rowid, row_number() over (PARTITION BY t2.ROWID ORDER BY t3.ROWID) t2_rn, 
t2.key_val,
  3      t3.ROWID t3_rowid, row_number() over (PARTITION BY t3.ROWID ORDER BY t2.ROWID) t2_rn, 
t3.key_val
  4  FROM t1, t2, t3
  5  WHERE t1.key_id=t2.key_id
  6  AND t1.key_id=t3.key_id
  7  ORDER BY t1.key_id
  8  /

    KEY_ID T2_ROWID                T2_RN    KEY_VAL T3_ROWID                T2_RN    KEY_VAL
---------- ------------------ ---------- ---------- ------------------ ---------- ----------
         1 AAANZ5AAHAAAD94AAA          1        500 AAANZ4AAHAAAD9wAAA          1       1000
         1 AAANZ5AAHAAAD94AAA          2        500 AAANZ4AAHAAAD9wAAB          1        750
         2 AAANZ5AAHAAAD91AAA          1        550 AAANZ4AAHAAAD9tAAA          1        900
         2 AAANZ5AAHAAAD91AAB          1        575 AAANZ4AAHAAAD9tAAA          2        900

/*  FINAL QUERY  */

kuldeep@dlfscg> SELECT key_id,
  2      SUM(DECODE(t2_rn,1,t2_key_val,0)) t2_key_val,
  3      SUM(DECODE(t3_rn,1,t3_key_val,0)) t3_key_val
  4  FROM   (SELECT t1.key_id,    
  5           t2.ROWID t2_rowid, row_number() over (PARTITION BY t2.ROWID ORDER BY t3.ROWID) t2_rn, 
t2.key_val t2_key_val,
  6           t3.ROWID t3_rowid, row_number() over (PARTITION BY t3.ROWID ORDER BY t2.ROWID) t3_rn, 
t3.key_val t3_key_val
  7          FROM t1, t2, t3
  8    WHERE t1.key_id=t2.key_id
  9    AND t1.key_id=t3.key_id)
 10  GROUP BY key_id
 11  /

    KEY_ID T2_KEY_VAL T3_KEY_VAL
---------- ---------- ----------
         1        500       1750
         2       1125        900

regards,
 


Followup   May 17, 2005 - 8am Central time zone:

one would need more information -- it APPEARS that you are trying to get a "random first hit" from 
T2 and T3 by T1.key_id

That is, for every row in T1 -- find the first match (any match will do) in T2 and in T3

report that value


is that correct?

and how big are t1, t2, t3, and how long is long?

5 stars group by   May 17, 2005 - 9am Central time zone
Reviewer: Anoop Gupta from INDIA
Tom,

Thanks for your prompt response. 


5 stars Analytical Problem   May 18, 2005 - 4am Central time zone
Reviewer: Imran 
Look at the following two queries.

SQL> SELECT phone, MONTH, arrears, this_month, ABS (up_down),
  2         CASE
  3            WHEN up_down < 0
  4               THEN 'DOWN'
  5            WHEN up_down > 0
  6               THEN 'UP'
  7            ELSE 'BALANCE'
  8         END CASE,
  9         prev_month
 10    FROM (SELECT exch || ' - ' || phone phone,
 11                 TO_CHAR (TO_DATE (MONTH, 'YYMM'), 'Mon, YYYY') MONTH, region,
 12                 instdate, paybefdue this_month, arrears,
 13                 LEAD (paybefdue, 1, 0) OVER (ORDER BY MONTH DESC) prev_month,
 14                   paybefdue
 15                 - (LEAD (paybefdue, 1, 0) OVER (ORDER BY MONTH DESC)) up_down
 16            FROM ptc
 17           WHERE phone IN (7629458));

PHONE           MONTH              ARREARS THIS_MONTH ABS(UP_DOWN) CASE    PREV_MONTH
--------------- --------------- ---------- ---------- ------------ ------- ----------
202 - 7629458   Apr, 2005          2562.52       5265         5265 UP               0

SQL> SELECT phone, MONTH, arrears, this_month, ABS (up_down),
  2         CASE
  3            WHEN up_down < 0
  4               THEN 'DOWN'
  5            WHEN up_down > 0
  6               THEN 'UP'
  7            ELSE 'BALANCE'
  8         END CASE,
  9         prev_month
 10    FROM (SELECT exch || ' - ' || phone phone,
 11                 TO_CHAR (TO_DATE (MONTH, 'YYMM'), 'Mon, YYYY') MONTH, region,
 12                 instdate, paybefdue this_month, arrears,
 13                 LEAD (paybefdue, 1, 0) OVER (ORDER BY MONTH DESC) prev_month,
 14                   paybefdue
 15                 - (LEAD (paybefdue, 1, 0) OVER (ORDER BY MONTH DESC)) up_down
 16            FROM ptc
 17           WHERE phone IN (7629459));

PHONE           MONTH              ARREARS THIS_MONTH ABS(UP_DOWN) CASE    PREV_MONTH
--------------- --------------- ---------- ---------- ------------ ------- ----------
202 - 7629459   Apr, 2005          3516.62       7834         7834 UP               0

SQL> 

Now when I combine the two queries results are different. 

  1  SELECT phone, MONTH, arrears, this_month, ABS (up_down),
  2         CASE
  3            WHEN up_down < 0
  4               THEN 'DOWN'
  5            WHEN up_down > 0
  6               THEN 'UP'
  7            ELSE 'BALANCE'
  8         END CASE,
  9         prev_month
 10    FROM (SELECT exch || ' - ' || phone phone,
 11                 TO_CHAR (TO_DATE (MONTH, 'YYMM'), 'Mon, YYYY') MONTH, region,
 12                 instdate, paybefdue this_month, arrears,
 13                 LEAD (paybefdue, 1, 0) OVER (ORDER BY MONTH DESC) prev_month,
 14                   paybefdue
 15                 - (LEAD (paybefdue, 1, 0) OVER (ORDER BY MONTH DESC)) up_down
 16            FROM ptc
 17*          WHERE phone IN (7629458,7629459))
SQL> /

PHONE           MONTH              ARREARS THIS_MONTH ABS(UP_DOWN) CASE    PREV_MONTH
--------------- --------------- ---------- ---------- ------------ ------- ----------
202 - 7629458   Apr, 2005          2562.52       5265         2569 DOWN          7834
202 - 7629459   Apr, 2005          3516.62       7834         7834 UP               0

Note that the prev_month value for the first phone is now wrong: it picks up the other phone's amount.

Please tell me how to do this.


Followup   May 18, 2005 - 8am Central time zone:

need test case.  create table, inserts (like the page used to submit this said....) 

4 stars Use of analytic functions in UPDATE statements   May 18, 2005 - 12pm Central time zone
Reviewer: Bob Lyon from Houston
Tom,


-- Given this sample data

CREATE TABLE GT (
   XP_ID           INTEGER,
   OFFSET          INTEGER,
   PMAX            NUMBER,
   PRIOR_PMAX      NUMBER
);

INSERT INTO GT (XP_ID, OFFSET, PMAX) VALUES( 123, 1, 3);
INSERT INTO GT (XP_ID, OFFSET, PMAX) VALUES( 123, 2, 8);
INSERT INTO GT (XP_ID, OFFSET, PMAX) VALUES( 155, 3, 5);
INSERT INTO GT (XP_ID, OFFSET, PMAX) VALUES( 173, 3, 7.3);

-- I want to update the table and set the PRIOR_PMAX column values to be as follows

SELECT XP_ID, OFFSET,  PMAX,
       LAG(PMAX, 1, NULL) OVER (PARTITION BY XP_ID
                                ORDER BY     XP_ID, OFFSET) PRIOR_PMAX
FROM   GT
ORDER BY  XP_ID, OFFSET;

     XP_ID     OFFSET       PMAX PRIOR_PMAX
---------- ---------- ---------- ----------
       123          1          3
       123          2          8          3
       155          3          5
       173          3        7.3

-- My update to do this tells me "4 rows updated.", but does not do what I want.

UPDATE GT A
SET    PRIOR_PMAX = (
          SELECT LAG(B.PMAX, 1, NULL) OVER (PARTITION BY B.XP_ID
                                            ORDER BY     B.XP_ID, B.OFFSET)  PRIOR_PMAX
          FROM   GT B
          WHERE  A.ROWID = B.ROWID
          );

-- but I get

SELECT  xp_id, offset, pmax, prior_pmax
FROM    GT
ORDER BY  xp_id, offset;

     XP_ID     OFFSET       PMAX PRIOR_PMAX
---------- ---------- ---------- ----------
       123          1          3
       123          2          8
       155          3          5
       173          3        7.3

-- Oracle doc states
--  "Therefore, analytic functions can appear only in the select list or ORDER BY clause."
-- which is perhaps a little ambiguous in this case.
-- Is there a way to do this update in "Straight SQL"?

 


Followup   May 18, 2005 - 12pm Central time zone:

you can merge

merge into gt a
using ( SELECT rowid rid, XP_ID, OFFSET,  PMAX,
       LAG(PMAX, 1, NULL) OVER (PARTITION BY XP_ID
                                ORDER BY     XP_ID, OFFSET) PRIOR_PMAX
FROM   GT )b
on (a.rowid = b.rid) 
when matched then update ...
when not matched (never happens, just do a dummy insert of a single null in 9i or leave off 
entirely in 10g) 
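
The UPDATE in the question leaves PRIOR_PMAX null because the correlated subquery is cut down to a single row by A.ROWID = B.ROWID, and the analytic function is evaluated after that filter, so LAG never has a prior row to look at. The merge computes LAG over the whole table first and only then matches rows back by rowid. Spelled out against the GT table from the question, one possible form is sketched below. The dummy insert branch is the 9i workaround mentioned above, and the alias rid is just an illustrative name.

```sql
merge into gt a
using ( select rowid rid,
               lag(pmax) over (partition by xp_id order by offset) prior_pmax
          from gt ) b
on ( a.rowid = b.rid )
when matched then
  update set a.prior_pmax = b.prior_pmax
when not matched then            -- never happens; required by 9i syntax
  insert (xp_id) values (null);
```

Every row of GT matches itself by rowid, so the insert branch should never fire.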

3 stars special sum   May 19, 2005 - 1am Central time zone
Reviewer: Kuldeep from India
My requirement was like this: I have receivables (bills, debit notes etc.) which I adjust against 
the received payments and credit notes (both are in separate tables). To know the outstanding amount I was 
joining (outer join) my receivables with payments and credit notes. 

Because one receivable can be adjusted against many payments and credit notes so outstanding 
payment was like this:

  outstanding = receivable amount - sum(payment amount) - sum(credit note amount)

this simple query using an outer join was giving the wrong result if a receivable is adjusted against one 
payment and more than one credit note, or vice versa.

in this case where
  receivable : 1000         payment : 400         CN : 400, 200

will appear as
  1000        400         400
  1000        400         200
              ---         ---
              800         600      outstanding = -400 (wrong)

My t1, t2 and t3 have 600,000, 350,000 and 80,000 rows respectively.

This is my actual inline view query
-----------------------------------
SELECT a.bill_type, a.bill_exact_type, a.period_id, 
       a.scheme_id, a.property_number, a.bill_number, 
       a.bill_amount, SUM(NVL(c.adj_amt,0)+NVL(p.adjust_amount,0)) adj_amt,
       NVL(a.bill_amount,0) - SUM(NVL(c.adj_amt,0)+NVL(p.adjust_amount,0)) pending_amt
FROM   ALL_RECEIVABLE a, 
       (SELECT bill_type, scheme_id, property_number, bill_exact_type, period_id, bill_number, 
SUM(adj_amt) adj_amt
        FROM   CREDIT_NOTE_RECEIVABLE
        WHERE  bill_type=p_bill_type
        AND    scheme_id=p_scheme
        AND    property_number=p_prop
        GROUP BY bill_type, scheme_id, property_number, bill_exact_type, period_id, bill_number) c, 

    (SELECT bill_type, scheme_id, property_number, bill_exact_type, period_id, bill_number, 
            SUM(adjust_amount) adjust_amount
     FROM    PAYMENT_RECEIPT_ADJ
     WHERE  bill_type=p_bill_type
     AND    scheme_id=p_scheme
     AND    property_number=p_prop
     GROUP BY bill_type, scheme_id, property_number, bill_exact_type, period_id, bill_number) p
WHERE   a.bill_type=P_BILL_TYPE
AND    a.scheme_id=P_SCHEME
AND    a.property_number=P_PROP
AND    a.bill_type=c.bill_type(+)
AND    a.bill_exact_type=c.bill_exact_type(+)
AND    a.period_id=c.period_id(+)
AND    a.scheme_id=c.scheme_id(+)
AND    a.property_number=c.property_number(+)
AND    a.bill_number=c.bill_number(+)
AND    a.bill_type=p.bill_type(+)
AND    a.bill_exact_type=p.bill_exact_type(+)
AND    a.period_id=p.period_id(+)
AND    a.scheme_id=p.scheme_id(+)
AND    a.property_number=p.property_number(+)
AND    a.bill_number=p.bill_number(+)
GROUP BY a.bill_type, a.bill_exact_type, a.period_id, a.scheme_id, 
             a.property_number, a.bill_number, a.bill_date, a.bill_amount
HAVING (NVL(a.bill_amount,0) - SUM(NVL(c.adj_amt,0)+NVL(p.adjust_amount,0))) > 0
ORDER BY a.bill_date;
-----------------------------------

It is not reporting just the first hit of t1 in t2 and t3. In my last posting, I was just trying 
to exclude any repeat of a t2 or t3 row from the sum calculation. That means each row of t2 and t3 
should be counted only once.

I have tried this query with more rows and applied the same technique to the actual query; it is working fine 
and gives the same result as the previous inline view query.

kuldeep@dlfscg> SELECT t1.key_id,    
  2      t2.ROWID t2_rowid, row_number() over (PARTITION BY t2.ROWID ORDER BY t3.ROWID) t2_rn,
         t2.key_val,
  3      t3.ROWID t3_rowid, row_number() over (PARTITION BY t3.ROWID ORDER BY t2.ROWID) t2_rn,
         t3.key_val
  4  FROM t1, t2, t3
  5  WHERE t1.key_id=t2.key_id(+)
  6  AND t1.key_id=t3.key_id(+)
  7  ORDER BY t1.key_id
  8  /

    KEY_ID T2_ROWID                T2_RN    KEY_VAL T3_ROWID                T2_RN    KEY_VAL
---------- ------------------ ---------- ---------- ------------------ ---------- ----------
         1 AAANZ5AAHAAAD94AAA          1        500 AAANZ4AAHAAAD9wAAA          1       1000
         1 AAANZ5AAHAAAD94AAA          2        500 AAANZ4AAHAAAD9wAAB          1        750
         1 AAANZ5AAHAAAD94AAA          3        500 AAANZ4AAHAAAD9wAAC          1         25
         2 AAANZ5AAHAAAD91AAA          1        550 AAANZ4AAHAAAD9tAAA          1        900
         2 AAANZ5AAHAAAD91AAB          1        575 AAANZ4AAHAAAD9tAAA          2        900
         3 AAANZ5AAHAAAD91AAC          1        222                             1
         3 AAANZ5AAHAAAD91AAD          1        223                             2
         4                             1            AAANZ4AAHAAAD9tAAB          1        333

8 rows selected.

kuldeep@dlfscg> SELECT key_id,
  2      SUM(DECODE(t2_rn,1,t2_key_val,0)) t2_key_val,
  3      SUM(DECODE(t3_rn,1,t3_key_val,0)) t3_key_val
  4  FROM   (SELECT t1.key_id,    
  5           t2.ROWID t2_rowid, row_number() over (PARTITION BY t2.ROWID ORDER BY t3.ROWID) t2_rn,
              t2.key_val t2_key_val,
  6           t3.ROWID t3_rowid, row_number() over (PARTITION BY t3.ROWID ORDER BY t2.ROWID) t3_rn,
              t3.key_val t3_key_val
  7          FROM t1, t2, t3
  8    WHERE t1.key_id=t2.key_id(+)
  9    AND t1.key_id=t3.key_id(+))
 10  GROUP BY key_id
 11  /

    KEY_ID T2_KEY_VAL T3_KEY_VAL
---------- ---------- ----------
         1        500       1775
         2       1125        900
         3        445          0
         4                   333

kuldeep@dlfscg> 

thanks for your responses.

regards, 


Followup   May 19, 2005 - 7am Central time zone:

do not order by rowid to get a last row -- is that what you are trying to do??


which row do you want to get from t2 to join with t1
and which row do you want to get from t3 to join with t1

You must specify that based on attributes you manage (eg: there must be an orderable field that 
helps you determine WHICH record is the right one)


consider rowid to be a random number that does not have any meaning when ordered by, it does not 
imply order of insertion or anything. 
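
One way to avoid both the double counting and any dependence on ROWID order is to aggregate each child table down to one row per key before joining, so the outer join can never multiply rows. This is the same idea as the inline-view query in the post above, reduced to the t1/t2/t3 test tables. It is only a sketch: it assumes key_id is unique in t1, and the aliases x2, x3 and sum_val are made up for the illustration.

```sql
-- Pre-aggregate t2 and t3 to one row per key_id, then outer join.
-- Assumption: t1.key_id is unique; x2, x3 and sum_val are illustrative names.
select t1.key_id,
       nvl(x2.sum_val, 0) t2_key_val,
       nvl(x3.sum_val, 0) t3_key_val
  from t1,
       ( select key_id, sum(key_val) sum_val
           from t2
          group by key_id ) x2,
       ( select key_id, sum(key_val) sum_val
           from t3
          group by key_id ) x3
 where t1.key_id = x2.key_id(+)
   and t1.key_id = x3.key_id(+)
 order by t1.key_id;
```

Because each inline view returns at most one row per key_id, no row of t2 or t3 is counted twice, and no ORDER BY inside an analytic function is needed.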

4 stars null record   May 25, 2005 - 4pm Central time zone
Reviewer: yeshk from St.louis,mo,usa
I need help with this query - This is just a part of the query I am working with. 

I am not able to generate a NULL RECORD in between the result set.
I should be able to pass this information out as a reference cursor.

create table test(state varchar2(2),svc_cat varchar2(3),measd_tkt number,non_measd_tkt number);

insert into test values('CA','NDS',100,200);
insert into test values('IL','DSL',200,300);
insert into test values('CA','DSL',100,300);
insert into test values('MO','NDS',1000,300);
insert into test values('MO','DSL',100,200);

I need a result like this
STATE   SVC_CAT  MEASD_TKT        NON MEASD TKT
CA    DSL          200              300
CA    NDS          100              200
      TOTAL        300              500

IL    DSL          200              300
      TOTAL        200              300

MO    DSL          100              200
MO    NDS         1000              300
      TOTAL       1100              500

I am able to generate the result using a query with analytics, but I don't know how to get an empty 
row after each state total.

Also, which is better: using a cursor 
  1) cursor based on state.
  2) get the data and insert into a temporary table.
  3) insert a null record

or using analytics to get the complete data and put it into a reference cursor?

Thanks
yeshk 


Followup   May 25, 2005 - 7pm Central time zone:

well, that would sort of be the job of the "pretty printing routine" -- eg: the report generator?

what tool is printing this out? 

4 stars null record   May 26, 2005 - 9am Central time zone
Reviewer: yeshk from St.louis,mo,usa
We need to give the result set, with a null record after each state calculation, to a front-end VB 
application. It will be given in a reference cursor. They will just select * from the reference cursor 
and display it on a report.  


Followup   May 26, 2005 - 10am Central time zone:

the VB application should do this, (it should be able to do something shouldn't it...)

ops$tkyte@ORA9IR2> select decode( grp, 0, state ) state,
  2         decode( grp, 0, svc_cat) svc_cat,
  3             decode( grp, 0, sum_mt ) sum_mt,
  4             decode( grp, 0, sum_nmt ) sum_nmt
  5    from (
  6  select grouping(dummy) grp, state, svc_cat, sum(measd_tkt) sum_mt, sum(non_measd_tkt) sum_nmt
  7    from (
  8  select state, svc_cat, 1 dummy, measd_tkt, non_measd_tkt
  9    from test
 10         )
 11   group by rollup( state, dummy, svc_cat )
 12         )
 13  /
 
ST SVC     SUM_MT    SUM_NMT
-- --- ---------- ----------
CA DSL        100        300
CA NDS        100        200
CA            200        500
 
IL DSL        200        300
IL            200        300
 
MO DSL        100        200
MO NDS       1000        300
MO           1100        500
 
 
 
12 rows selected.
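
The query above works because rollup( state, dummy, svc_cat ) produces two levels per state: the (state, dummy) level, which is the real subtotal, and the (state) level, which repeats it. grouping(dummy) = 1 marks the repeated level, and the DECODEs blank it out. Since the query has no ORDER BY, the row order shown is not guaranteed. A sketch of a variation on the same TEST table that orders explicitly, labels the subtotal rows TOTAL as requested, and drops the grand-total level follows. The aliases st, sc and svc_grp are introduced here only for illustration.

```sql
-- Variation on the rollup query: explicit ordering, TOTAL label,
-- and no grand-total level. st, sc and svc_grp are illustrative aliases.
select decode( svc_grp, 0, st ) state,
       decode( grp, 0, decode( svc_grp, 1, 'TOTAL', sc ) ) svc_cat,
       decode( grp, 0, sum_mt ) sum_mt,
       decode( grp, 0, sum_nmt ) sum_nmt
  from (
select grouping(dummy)   grp,      -- 1 only on the row that becomes blank
       grouping(svc_cat) svc_grp,  -- 1 on subtotal and blank rows
       state st, svc_cat sc,
       sum(measd_tkt) sum_mt, sum(non_measd_tkt) sum_nmt
  from ( select state, svc_cat, 1 dummy, measd_tkt, non_measd_tkt
           from test )
 group by rollup( state, dummy, svc_cat )
having grouping(state) = 0          -- discard the grand-total level
       )
 order by st, grp, svc_grp, sc;
```

The inner columns are aliased st and sc so that the ORDER BY sorts on the real state and service category rather than on the blanked-out output columns.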
 
 

5 stars Can rollup do the thing??   May 26, 2005 - 9am Central time zone
Reviewer: Bhavesh Ghodasara from Ahmedabad,Gujarat,India
Hi yeshk,
 
create table test(state varchar2(2),svc_cat varchar2(3),measd_tkt 
number,non_measd_tkt number);

insert into test values('CA','NDS',100,200);........

insert into test values('CA','DSL',100,300);....



STATE   SVC_CAT  MEASD_TKT        NON MEASD TKT
CA    DSL          200              300     <== Where does measd_tkt=200 come from?
CA    NDS          100              200
      TOTAL        300              500

Tom, can we do it like this:

break on state
select STATE,SVC_CAT,sum(measd_tkt),sum(non_measd_tkt)
from test       
group by rollup(STATE,SVC_CAT)
order by state

............
If I have made any mistake, please tell me.
Thanks in advance. 


Followup   May 26, 2005 - 10am Central time zone:

see above 

4 stars Which analytics to use?   May 30, 2005 - 9pm Central time zone
Reviewer: Marc-Andre Larochelle from Montreal, PQ
Hi Tom,

I have this 3rd party table: 

drop table t;
create table t (atype varchar2(4),
                acol# varchar2(3),
                adin varchar2(8),
                ares varchar2(8));

insert into t (atype, acol#, adin) values ('DUPT','001','02246569');
insert into t (atype, acol#, adin) values ('DUPT','002','00021474');
insert into t (atype, acol#, adin) values ('DUPT','003','02246569');
insert into t (atype, acol#, ares) values ('MACT','1','02246569');
insert into t (atype, acol#, ares) values ('MACT','6','02246569');
insert into t (atype, acol#, ares) values ('MACT','7','00021474');

select * from t;

ATYPE ACOL# ADIN     ARES
----- ----- -------- --------
DUPT  001   02246569
DUPT  002   00021474
DUPT  003   02246569
MACT  1              02246569
MACT  6              02246569
MACT  7              00021474

I would like to get the following result :

DUPT 001 02246569 MACT 1 02246569
DUPT 002 00021474 MACT 7 00021474
DUPT 003 02246569 MACT 6 02246569

I need to match DUPT.adin=MACT.ares together while making sure MACT.acol# is different for every 
DUPT.acol#. Basically this table has different values in its columns depending on the type of row 
(atype).

I have tried using lag, lead, rank and nothing seems to work but I am pretty sure it is doable with 
analytics which is why I posted my question here.

Any hint/help would be appreciated.

Thank you,

Marc-Andre 


Followup   May 31, 2005 - 7am Central time zone:

question for you.

How did you know to put:

DUPT 001 02246569 together with MACT 1 02246569  and
DUPT 003 02246569 together with MACT 6 02246569

and not

DUPT 001 02246569 MACT 6 02246569
DUPT 003 02246569 MACT 1 02246569

for example.  some missing logic here. 

5 stars Am I Correct??   May 31, 2005 - 5am Central time zone
Reviewer: Bhavesh Ghodasara from Ahmedabad,Gujarat,India
Hi Tom,
I solved the above problem.
The query is:

select atyp,acol,aadin,batype,bacol,bares
from (
select a.atype atyp,a.acol# acol,a.adin aadin,b.atype batype,b.acol# bacol,b.ares bares,
nvl(lead(b.acol# ) over(order by a.adin),0) lb,
count(*) over(partition by a.acol#) cnt
from t a,t b
where a.adin=b.ares
order by atyp,acol) t
where bacol<>lb

I think there must be a better way.
I know you will do it in a much better way.
Please suggest corrections.
Thanks in Advance..
 


Followup   May 31, 2005 - 8am Central time zone:

 
ATYP ACO AADIN    BATY BAC BARES
---- --- -------- ---- --- --------
DUPT 001 02246569 MACT 6   02246569
DUPT 002 00021474 MACT 7   00021474
DUPT 003 02246569 MACT 1   02246569
 

well, it gives a different result than the one you posted, it gives my hypothetical answer -- where 
001 was combined with 6, not 1. 

5 stars We can do this..   May 31, 2005 - 8am Central time zone
Reviewer: Bhavesh Ghodasara from Ahmedabad,Gujarat,India
Hi Tom,
I have further modified my query;
it now gives the desired result.
(Agreed that the question is ambiguous.)
select atyp,acol,aadin,batype,bacol,bares
from (
select a.atype atyp,a.acol# acol,a.adin aadin,b.atype batype,b.acol# 
bacol,b.ares bares,
nvl(lead(b.acol# ) over(order by a.adin),0) lb,
min(b.acol#) over(partition by a.acol#) cnt
from t a,t b
where a.adin=b.ares
order by atyp,acol) t
where bacol=lb
or cnt>1

OUTPUT:
ATYP ACO AADIN    BATY BAC BARES
---- --- -------- ---- --- --------
DUPT 001 02246569 MACT 1   02246569
DUPT 002 00021474 MACT 7   00021474
DUPT 003 02246569 MACT 6   02246569

So any corrections now??
Thanks in advance
Bhavesh
 


Followup   May 31, 2005 - 8am Central time zone:

I don't know your data well enough, but your query is non-deterministic if you care.  Consider:


ops$tkyte@ORA10G> create table t (atype varchar2(4),
  2                  acol# varchar2(3),
  3                  adin varchar2(8),
  4                  ares varchar2(8));
 
Table created.
 
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> insert into t (atype, acol#, adin) values ('DUPT','001','02246569');
 
1 row created.
 
ops$tkyte@ORA10G> insert into t (atype, acol#, adin) values ('DUPT','002','00021474');
 
1 row created.
 
ops$tkyte@ORA10G> insert into t (atype, acol#, adin) values ('DUPT','003','02246569');
 
1 row created.
 
ops$tkyte@ORA10G> insert into t (atype, acol#, ares) values ('MACT','1','02246569');
 
1 row created.
 
ops$tkyte@ORA10G> insert into t (atype, acol#, ares) values ('MACT','5','02246569');
 
1 row created.
 
ops$tkyte@ORA10G> insert into t (atype, acol#, ares) values ('MACT','6','02246569');
 
1 row created.
 
ops$tkyte@ORA10G> insert into t (atype, acol#, ares) values ('MACT','7','00021474');
 
1 row created.
 
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> select atyp,acol,aadin,batype,bacol,bares
  2  from (
  3  select a.atype atyp,a.acol# acol,a.adin aadin,b.atype batype,b.acol#
  4  bacol,b.ares bares,
  5  nvl(lead(b.acol# ) over(order by a.adin),0) lb,
  6  min(b.acol#) over(partition by a.acol#) cnt
  7  from t a,t b
  8  where a.adin=b.ares
  9  order by atyp,acol) t
 10  where bacol=lb
 11  or cnt>1;
 
ATYP ACO AADIN    BATY BAC BARES
---- --- -------- ---- --- --------
DUPT 002 00021474 MACT 7   00021474
 
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> truncate table t;
 
Table truncated.
 
ops$tkyte@ORA10G> insert into t (atype, acol#, adin) values ('DUPT','001','02246569');
 
1 row created.
 
ops$tkyte@ORA10G> insert into t (atype, acol#, adin) values ('DUPT','002','00021474');
 
1 row created.
 
ops$tkyte@ORA10G> insert into t (atype, acol#, adin) values ('DUPT','003','02246569');
 
1 row created.
 
ops$tkyte@ORA10G> insert into t (atype, acol#, ares) values ('MACT','1','02246569');
 
1 row created.
 
ops$tkyte@ORA10G> insert into t (atype, acol#, ares) values ('MACT','6','02246569');
 
1 row created.
 
ops$tkyte@ORA10G> insert into t (atype, acol#, ares) values ('MACT','7','00021474');
 
1 row created.
 
ops$tkyte@ORA10G> insert into t (atype, acol#, ares) values ('MACT','5','02246569');
 
1 row created.
 
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> select atyp,acol,aadin,batype,bacol,bares
  2  from (
  3  select a.atype atyp,a.acol# acol,a.adin aadin,b.atype batype,b.acol#
  4  bacol,b.ares bares,
  5  nvl(lead(b.acol# ) over(order by a.adin),0) lb,
  6  min(b.acol#) over(partition by a.acol#) cnt
  7  from t a,t b
  8  where a.adin=b.ares
  9  order by atyp,acol) t
 10  where bacol=lb
 11  or cnt>1;
 
ATYP ACO AADIN    BATY BAC BARES
---- --- -------- ---- --- --------
DUPT 001 02246569 MACT 6   02246569
DUPT 002 00021474 MACT 7   00021474


Same data both times, just different order of insertions.  With analytics and order by, you need to 
be concerned about duplicates. 
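
One way to make the pairing repeatable is to number the rows on each side with a fully specified ORDER BY and join on that number. The following is only a sketch of that idea, not the approach used in the queries above. It assumes acol# is unique within each atype and that sorting acol# as a string is acceptable; the aliases d, m, rn, matype and macol# are made up for the illustration.

```sql
-- Pair the n-th DUPT row for a given adin with the n-th MACT row
-- for the same ares. Assumption: acol# is unique within each atype,
-- so row_number() has no ties and the result does not depend on
-- insertion order.
select d.atype, d.acol#, d.adin,
       m.atype matype, m.acol# macol#, m.ares
  from ( select atype, acol#, adin,
                row_number() over (partition by adin order by acol#) rn
           from t
          where atype = 'DUPT' ) d,
       ( select atype, acol#, ares,
                row_number() over (partition by ares order by acol#) rn
           from t
          where atype = 'MACT' ) m
 where d.adin = m.ares
   and d.rn   = m.rn
 order by d.acol#;
```

With the original six rows this pairs 001 with 1, 002 with 7 and 003 with 6. Extra MACT rows with no DUPT counterpart are simply left unmatched.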

5 stars Answers   May 31, 2005 - 11am Central time zone
Reviewer: Marc-Andre Larochelle from Montreal, PQ
Tom, Bhavesh,

The problem resides exactly there: no logic to match the records. I know that DUPT.din1 must have a 
MACT.din1 somewhere. I just don't know which one (1st one, 2nd one?). This is a decision I will 
have to take.

DUPT 001 02246569 MACT 1 02246569
DUPT 003 02246569 MACT 6 02246569
and
DUPT 001 02246569 MACT 6 02246569
DUPT 003 02246569 MACT 1 02246569

are the same to me. But when I run the query, I want to always get the same results.

Anyway, all in all, your queries (Bhavesh - thank you - and yours) seem to answer my question. 
I will watch out for duplicates.

Thank you very much for the quick help.

Marc-Andre 


5 stars What I found   May 31, 2005 - 5pm Central time zone
Reviewer: Marc-Andre Larochelle from Montreal, PQ
Hi Tom,

Testing the SQL statement Bhavesh provided, I quickly discovered what you meant when saying the 
query was non-deterministic. When I added a 4th record :

insert into t (atype,acol#,adin) values ('DUPT','004','02246569');
insert into t (atype,acol#,ares) values ('MACT','5','02246569');

only one row was returned. I played with the query and here is what I came up with :

select atyp,acol,aadin,batype,bacol,bares
from (
select atyp,acol,aadin,batype,bacol,bares,drnk ,
       rank() over (partition by acol order by bacol) rnk
from (
select a.atype atyp,
         a.acol# acol,
         a.adin aadin,
         b.atype batype,
         b.acol# bacol,
         b.ares bares,
         dense_rank() over (partition by a.atype,a.adin order by a.acol#) drnk
from t a,t b
where a.adin=b.ares))
where drnk=rnk;

Feel free to comment.

Again thank you (and Bhavesh).

Marc-Andre 


5 stars Using Analytical Values to find latest info   June 3, 2005 - 10am Central time zone
Reviewer: anirudh from newyork, NY
Hi Tom,

we have a fairly large table with about 100 million rows, among others this table has 
the following columns

CREATE TABLE my_fact_table ( 
  staff_number     VARCHAR2 (10),   -- staff number 
  per_end_dt       DATE,            -- last day of month
  engagement_code  VARCHAR2 (30),   -- engagement code
  client_code      VARCHAR2 (20),   -- client code
  revenue          NUMBER (15,2)    -- revenue
  )

In this table the same engagement code can have different client codes for different periods. This 
was at one point desirable, and that is the reason client code was stored in this fact table instead 
of the engagement dimension. 

Our users now want us to update the client code in these transactions to the latest value of the 
client code (meaning - pick the client from the latest month for which we have got any transactions 
for that engagement)

This situation, where the same engagement has multiple clients across periods, applies to about 5% of 
the rows.

[btw - we do plan to do data-model change to reflect the new relationships - but that may take some 
time - hence the interim need to just update the fact table]

To implement these updates, which may happen for several months, I'm trying to take the approach below, 
which involves multiple queries and the creation of a couple of temp tables. Does it seem reasonable? I 
have a lurking feeling that with a deeper understanding of analytic functions this can be further 
simplified. I will appreciate your thoughts.

============= My Approach =================

-- Find the Engagements that have multiple Clients
CREATE TABLE amtest_mult_cli AS
WITH
v1 AS (SELECT DISTINCT engagement_code,client_code 
       FROM my_fact_table)
SELECT  engagement_code 
FROM v1
GROUP BY engagement_code
HAVING COUNT(*) > 1;

-- Find What should be the correct client for those engagements
CREATE TABLE amtest_use_cli AS
SELECT engagement_code,per_end_dt,client_code
FROM
(
 SELECT a.engagement_code, a.per_end_dt, a.client_code,
        row_number() OVER (PARTITION BY a.engagement_code 
          ORDER BY a.per_end_dt DESC, a.client_code DESC) 
        row_num
 FROM my_fact_table a,
     amtest_mult_cli b
 WHERE a.engagement_code = b.engagement_code
)
WHERE row_num = 1;

-- Update Correct Clients for those engagements
UPDATE my_fact_table a
SET a.client_code = 
    (SELECT b.client_code
     FROM amtest_use_cli b
     WHERE a.engagement_code = b.engagement_code)
WHERE EXISTS 
    (SELECT 1 
     FROM amtest_use_cli c
     WHERE a.engagement_code = c.engagement_code);
     
====================================================== 


Followup   June 3, 2005 - 12pm Central time zone:

why not:

merge into my_fact_table F
using 
( select engagement_code, 
         substr(max(to_char(per_end_dt,'yyyymmddhh24miss')||client_code ),15) cc
    from my_fact_table  
   group by engagement_code
  having count(distinct client_code) > 1 ) X
on ( f.engagement_code = x.engagement_code )
when matched 
     then update set client_code = x.cc
when not matched 
     then insert ( client_code ) values ( null ); <<== never can happen 
                                                  <<== in 10g, not needed!


That select finds the client_code for the max per_end_dt by engagement_code for engagement_code's 
that have more than one distinct client_code....


               first_value(client_code) 
                     over (partition by engagement_code 
                           order by per_end_dt desc, client_code desc ),
               count(distinct client_code)  
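
The max(to_char(date)||value) expression in the merge above encodes "the client_code belonging to the latest per_end_dt" as a string comparison. A sketch of the same merge using the aggregate KEEP (DENSE_RANK LAST ...) clause instead is shown below. It assumes 10g or later (so the WHEN NOT MATCHED branch can be left off) and that per_end_dt is never null; it is an alternative spelling for illustration, not the statement from the followup.

```sql
-- Same merge, with KEEP (DENSE_RANK LAST ...) picking the client_code
-- of the latest per_end_dt (highest client_code on ties).
-- Assumptions: Oracle 10g or later, per_end_dt is NOT NULL.
merge into my_fact_table f
using
( select engagement_code,
         max(client_code)
            keep (dense_rank last order by per_end_dt) cc
    from my_fact_table
   group by engagement_code
  having count(distinct client_code) > 1 ) x
on ( f.engagement_code = x.engagement_code )
when matched
     then update set client_code = x.cc;
```

Both forms resolve ties on per_end_dt by taking the highest client_code, which matches the ORDER BY per_end_dt DESC, client_code DESC used in the original approach.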

5 stars help with lead   June 9, 2005 - 1am Central time zone
Reviewer: Adolph from india
I have a table in the following structure:

create table cs_fpc_pr
(PRGM_C       VARCHAR2(10) not null,
 fpc_date     date  not null,
 TIME_code    VARCHAR2(3) not null,
 SUN_TYPE     varchar2(1));

insert into cs_fpc_pr values ('PRGM000222', to_date('08-may-2005','dd-mon-rrrr'), '33','1');

insert into cs_fpc_pr values ('PRGM000222', to_date('09-may-2005','dd-mon-rrrr'), '05','3');
insert into cs_fpc_pr values ('PRGM000222', to_date('09-may-2005','dd-mon-rrrr'), '25','1');
insert into cs_fpc_pr values ('PRGM000222', to_date('09-may-2005','dd-mon-rrrr'), '45','3');

insert into cs_fpc_pr values ('PRGM000222', to_date('10-may-2005','dd-mon-rrrr'), '05','3');
insert into cs_fpc_pr values ('PRGM000222', to_date('10-may-2005','dd-mon-rrrr'), '25','1');
insert into cs_fpc_pr values ('PRGM000222', to_date('10-may-2005','dd-mon-rrrr'), '45','3');

insert into cs_fpc_pr values ('PRGM000222', to_date('14-may-2005','dd-mon-rrrr'), '05','3');
insert into cs_fpc_pr values ('PRGM000222', to_date('14-may-2005','dd-mon-rrrr'), '24','1');


insert into cs_fpc_pr values ('PRGM000242', to_date('08-may-2005','dd-mon-rrrr'), '07','3');
insert into cs_fpc_pr values ('PRGM000242', to_date('08-may-2005','dd-mon-rrrr'), '23','1');
insert into cs_fpc_pr values ('PRGM000242', to_date('08-may-2005','dd-mon-rrrr'), '47','3');
insert into cs_fpc_pr values ('PRGM000242', to_date('08-may-2005','dd-mon-rrrr'), '48','3');

insert into cs_fpc_pr values ('PRGM000242', to_date('09-may-2005','dd-mon-rrrr'), '07','3');
insert into cs_fpc_pr values ('PRGM000242', to_date('09-may-2005','dd-mon-rrrr'), '33','1');
insert into cs_fpc_pr values ('PRGM000242', to_date('09-may-2005','dd-mon-rrrr'), '46','3');

insert into cs_fpc_pr values ('PRGM000242', to_date('10-may-2005','dd-mon-rrrr'), '07','3');
insert into cs_fpc_pr values ('PRGM000242', to_date('10-may-2005','dd-mon-rrrr'), '33','1');
insert into cs_fpc_pr values ('PRGM000242', to_date('10-may-2005','dd-mon-rrrr'), '46','3');

insert into cs_fpc_pr values ('PRGM000242', to_date('11-may-2005','dd-mon-rrrr'), '07','3');
insert into cs_fpc_pr values ('PRGM000242', to_date('11-may-2005','dd-mon-rrrr'), '33','1');
insert into cs_fpc_pr values ('PRGM000242', to_date('11-may-2005','dd-mon-rrrr'), '46','3');

insert into cs_fpc_pr values ('PRGM000242', to_date('14-may-2005','dd-mon-rrrr'), '07','3');
insert into cs_fpc_pr values ('PRGM000242', to_date('14-may-2005','dd-mon-rrrr'), '23','1');

commit;

select prgm_c,fpc_date,time_code,sun_type,
lead(fpc_date) over(partition by prgm_C order by fpc_date) next_date
from cs_fpc_pr
order by prgm_c,fpc_date,time_code; 


PRGM_C     FPC_DATE  TIM S NEXT_DATE
---------- --------- --- - ---------
PRGM000222 08-MAY-05 33  1 09-MAY-05
PRGM000222 09-MAY-05 05  3 09-MAY-05
PRGM000222 09-MAY-05 25  1 09-MAY-05
PRGM000222 09-MAY-05 45  3 10-MAY-05
PRGM000222 10-MAY-05 05  3 10-MAY-05
PRGM000222 10-MAY-05 25  1 10-MAY-05
PRGM000222 10-MAY-05 45  3 14-MAY-05
PRGM000222 14-MAY-05 05  3 14-MAY-05
PRGM000222 14-MAY-05 24  1
PRGM000242 08-MAY-05 07  3 08-MAY-05
PRGM000242 08-MAY-05 23  1 08-MAY-05
PRGM000242 08-MAY-05 47  3 08-MAY-05
PRGM000242 08-MAY-05 48  3 09-MAY-05
PRGM000242 09-MAY-05 07  3 09-MAY-05
PRGM000242 09-MAY-05 33  1 09-MAY-05
PRGM000242 09-MAY-05 46  3 10-MAY-05
PRGM000242 10-MAY-05 07  3 10-MAY-05
PRGM000242 10-MAY-05 33  1 10-MAY-05
PRGM000242 10-MAY-05 46  3 11-MAY-05
PRGM000242 11-MAY-05 07  3 11-MAY-05
PRGM000242 11-MAY-05 33  1 11-MAY-05
PRGM000242 11-MAY-05 46  3 14-MAY-05
PRGM000242 14-MAY-05 07  3 14-MAY-05
PRGM000242 14-MAY-05 23  1

I need to find, for a particular 'prgm_c', the next date & time code where the 'sun_type' field = 
'1'. 

A sample of the output should look something like this:

PRGM_C     FPC_DATE  TIM S NEXT_DATE  next_time
---------- --------- --- - ---------  -------
PRGM000222 08-MAY-05 33  1 09-MAY-05   25
PRGM000222 09-MAY-05 05  3 09-MAY-05   25
PRGM000222 09-MAY-05 25  1 10-MAY-05   25
PRGM000222 09-MAY-05 45  3 10-MAY-05   25
PRGM000222 10-MAY-05 05  3 10-MAY-05   25
PRGM000222 10-MAY-05 25  1 14-MAY-05   24
PRGM000222 10-MAY-05 45  3 14-MAY-05   24
PRGM000222 14-MAY-05 05  3 14-MAY-05   24
PRGM000222 14-MAY-05 24  1

Tom, Can you please help me with with this?

Regards 

 


Followup   June 9, 2005 - 6am Central time zone:

PRGM000222 10-MAY-05 05  3 10-MAY-05
PRGM000222 10-MAY-05 25  1 10-MAY-05
PRGM000222 10-MAY-05 45  3 14-MAY-05
PRGM000222 14-MAY-05 05  3 14-MAY-05
PRGM000222 14-MAY-05 24  1

you've got a problem with those fpc_dates and ordering by them. you have "dups" so no one of those 
10-may-05 comes "first" same with the 14th.  You need to figure out how to really order this data 
deterministically first.

My first attempt at this is:


tkyte@ORA9IR2W> select prgm_c, fpc_date, time_code, sun_type,
  2         to_date(substr( max(data) 
                 over (partition by prgm_c order by fpc_date desc), 
                     6, 14 ),'yyyymmddhh24miss') ndt,
  3             to_number( substr( max(data) 
                  over (partition by prgm_c order by fpc_date desc), 20) ) ntc
  4    from (
  5  select prgm_c,
  6         fpc_date,
  7             time_code,
  8             sun_type,
  9             case when lag(sun_type) 
                        over (partition by prgm_c order by fpc_date desc)  = '1'
 10                  then to_char( row_number() 
              over (partition by prgm_c order by fpc_date desc) , 'fm00000') ||
 11                               to_char(lag(fpc_date) 
              over (partition by prgm_c order by fpc_date desc),'yyyymmddhh24miss')||
 12             lag(time_code) over (partition by prgm_c order by fpc_date desc)
 13                  end data
 14  from cs_fpc_pr
 15       )
 16  order by prgm_c,fpc_date,time_code
 17  /

PRGM_C     FPC_DATE  TIM S NDT              NTC
---------- --------- --- - --------- ----------
PRGM000222 08-MAY-05 33  1 09-MAY-05         25
PRGM000222 09-MAY-05 05  3 09-MAY-05         25
PRGM000222 09-MAY-05 25  1 09-MAY-05         25
PRGM000222 09-MAY-05 45  3 09-MAY-05         25
PRGM000222 10-MAY-05 05  3 10-MAY-05         25
PRGM000222 10-MAY-05 25  1 10-MAY-05         25
PRGM000222 10-MAY-05 45  3 10-MAY-05         25
PRGM000222 14-MAY-05 05  3
PRGM000222 14-MAY-05 24  1
PRGM000242 08-MAY-05 07  3 08-MAY-05         23
PRGM000242 08-MAY-05 23  1 08-MAY-05         23
PRGM000242 08-MAY-05 47  3 08-MAY-05         23
PRGM000242 08-MAY-05 48  3 08-MAY-05         23
PRGM000242 09-MAY-05 07  3 10-MAY-05         33
PRGM000242 09-MAY-05 33  1 10-MAY-05         33
PRGM000242 09-MAY-05 46  3 10-MAY-05         33
PRGM000242 10-MAY-05 07  3 10-MAY-05         33
PRGM000242 10-MAY-05 33  1 10-MAY-05         33
PRGM000242 10-MAY-05 46  3 10-MAY-05         33
PRGM000242 11-MAY-05 07  3 14-MAY-05         23
PRGM000242 11-MAY-05 33  1 14-MAY-05         23
PRGM000242 11-MAY-05 46  3 14-MAY-05         23
PRGM000242 14-MAY-05 07  3
PRGM000242 14-MAY-05 23  1

24 rows selected.

but the lack of distinctness on the fpc_date means you might get "a different answer" with the same 
set of data. 

4 stars reply   June 9, 2005 - 7am Central time zone
Reviewer: Adolph from India
Sorry for not being clear the first time, so here goes: a program (prgm_C) will have a 
maximum of one entry in the table for any combination of fpc_date and time_code. 

This time_code actually maps to another table where '01' is '01:00:00', '02' is '01:30:00' and so on 
(i.e. times stored in varchar2 format). 

So basically a program will exist for a given fpc_date and time_code only once.

I hope i'm making sense.

Regards
 


Followup   June 9, 2005 - 7am Central time zone:

tkyte@ORA9IR2W> select prgm_c,
  2         fpc_date,
  3         time_code,
  4         sun_type,
  5         to_date(
  6            substr( max(data)
  7              over (partition by prgm_c
  8                    order by fpc_date desc,
  9                             time_code desc),
 10                  6, 14 ),'yyyymmddhh24miss') ndt,
 11             to_number(
 12               substr( max(data)
 13                 over (partition by prgm_c
 14                       order by fpc_date desc,
 15                                time_code desc), 20) ) ntc
 16    from (
 17  select prgm_c,
 18         fpc_date,
 19         time_code,
 20         sun_type,
 21         case when lag(sun_type)
 22                     over (partition by prgm_c
 23                           order by fpc_date desc,
 24                                    time_code desc)  = '1'
 25                  then
 26                  to_char( row_number()
 27                           over (partition by prgm_c
 28                                 order by fpc_date desc,
 29                                    time_code desc) , 'fm00000') ||
 30                  to_char(lag(fpc_date)
 31                            over (partition by prgm_c
 32                                  order by fpc_date desc,
 33                                  time_code desc),'yyyymmddhh24miss')||
 34                  lag(time_code)
 35                     over (partition by prgm_c
 36                           order by fpc_date desc,
 37                                    time_code desc)
 38                  end data
 39  from cs_fpc_pr
 40       )
 41  order by prgm_c,fpc_date,time_code
 42  /

PRGM_C     FPC_DATE  TIM S NDT              NTC
---------- --------- --- - --------- ----------
PRGM000222 08-MAY-05 33  1 09-MAY-05         25
PRGM000222 09-MAY-05 05  3 09-MAY-05         25
PRGM000222 09-MAY-05 25  1 10-MAY-05         25
PRGM000222 09-MAY-05 45  3 10-MAY-05         25
PRGM000222 10-MAY-05 05  3 10-MAY-05         25
PRGM000222 10-MAY-05 25  1 14-MAY-05         24
PRGM000222 10-MAY-05 45  3 14-MAY-05         24
PRGM000222 14-MAY-05 05  3 14-MAY-05         24
PRGM000222 14-MAY-05 24  1
PRGM000242 08-MAY-05 07  3 08-MAY-05         23
PRGM000242 08-MAY-05 23  1 09-MAY-05         33
PRGM000242 08-MAY-05 47  3 09-MAY-05         33
PRGM000242 08-MAY-05 48  3 09-MAY-05         33
PRGM000242 09-MAY-05 07  3 09-MAY-05         33
PRGM000242 09-MAY-05 33  1 10-MAY-05         33
PRGM000242 09-MAY-05 46  3 10-MAY-05         33
PRGM000242 10-MAY-05 07  3 10-MAY-05         33
PRGM000242 10-MAY-05 33  1 11-MAY-05         33
PRGM000242 10-MAY-05 46  3 11-MAY-05         33
PRGM000242 11-MAY-05 07  3 11-MAY-05         33
PRGM000242 11-MAY-05 33  1 14-MAY-05         23
PRGM000242 11-MAY-05 46  3 14-MAY-05         23
PRGM000242 14-MAY-05 07  3 14-MAY-05         23
PRGM000242 14-MAY-05 23  1

24 rows selected.

Just needed to add "time_code DESC"


See

http://www.oracle.com/technology/oramag/oracle/04-mar/o24asktom.html
analytics to the rescue

for the "carry down" technique I used here.  In 10g, we'd simplify using "ignore nulls" in the 
LAST_VALUE function instead of the max() and row_number() trick 
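
As a rough idea of what that 10g simplification could look like, here is a sketch against the same cs_fpc_pr table. It assumes 10g or later and, as stated in the reply above, at most one row per (prgm_c, fpc_date, time_code). Unlike the query above it returns time_code as a string rather than a number.

```sql
-- "Carry down" with LAST_VALUE ... IGNORE NULLS (10g and later).
-- Sorting newest-first and looking only at the rows before the current
-- one, the last non-null value is the nearest later row with sun_type = '1'.
select prgm_c, fpc_date, time_code, sun_type,
       last_value( case when sun_type = '1' then fpc_date end ignore nulls )
          over (partition by prgm_c
                order by fpc_date desc, time_code desc
                rows between unbounded preceding and 1 preceding) ndt,
       last_value( case when sun_type = '1' then time_code end ignore nulls )
          over (partition by prgm_c
                order by fpc_date desc, time_code desc
                rows between unbounded preceding and 1 preceding) ntc
  from cs_fpc_pr
 order by prgm_c, fpc_date, time_code;
```

The window stops at 1 PRECEDING so that a row with sun_type = '1' reports the next such row rather than itself.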

5 stars brilliant   June 9, 2005 - 9am Central time zone
Reviewer: Adolph from India
Thank you very much Tom. The query works like a charm. I will read up on the link. Analytics do rock n 
roll :)

 


3 stars Working on an Analytic Query   June 9, 2005 - 12pm Central time zone
Reviewer: Scott from Long Island, NY USA
Tom,
   From your example for Mark's problem on 4/8, it seems that you need to specify a fixed number of 
columns to output this way.  Is there a way to have a varying number of columns?  For example, I 
need to have a query that takes a date range and makes each date a column heading.  Any help would 
be greatly appreciated.
Thanks,
Scott 


Followup   June 9, 2005 - 6pm Central time zone:

you need dynamic sql.  the number of columns in a query is "well defined, known at parse time" by 
definition.

If you have access to Expert One on One Oracle, I demonstrated how to do this with ref cursors in a 
stored procedure.  but you have to run a query, to get the set of column "headings" and write a 
query based on that. 
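
A minimal sketch of that dynamic SQL idea for the date-range case follows. It is not the code from the book. The table SALES(item, sale_date, amount) and the procedure name pivot_by_date are hypothetical, and because the headings here are consecutive days they are generated in a loop rather than fetched with a query.

```sql
-- Hypothetical example: one output column per day in [p_start, p_end].
-- SALES(item, sale_date, amount) and pivot_by_date are made-up names.
create or replace procedure pivot_by_date
( p_start in  date,
  p_end   in  date,
  p_rc    out sys_refcursor )
as
    l_sql varchar2(32767) := 'select item';
    l_day date;
begin
    -- Build one SUM(DECODE(...)) column per day; the column list must be
    -- fixed before the query is parsed, hence the string building.
    for i in 0 .. trunc(p_end) - trunc(p_start)
    loop
        l_day := trunc(p_start) + i;
        l_sql := l_sql ||
                 ', sum(decode(trunc(sale_date), to_date(''' ||
                 to_char(l_day, 'yyyymmdd') ||
                 ''',''yyyymmdd''), amount)) "' ||
                 to_char(l_day, 'dd-MON-yyyy') || '"';
    end loop;

    l_sql := l_sql ||
             ' from sales' ||
             ' where sale_date >= :b_start and sale_date < :b_end + 1' ||
             ' group by item order by item';

    open p_rc for l_sql using trunc(p_start), trunc(p_end);
end;
/
```

From SQL*Plus it could be called along these lines:

```sql
variable rc refcursor
exec pivot_by_date( to_date('01-06-2005','dd-mm-yyyy'), to_date('03-06-2005','dd-mm-yyyy'), :rc )
print rc
```

A very long date range would overflow the 32767-character string, so a real implementation would need a limit or a different way of assembling the statement.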

5 stars Tom any idea how I can re write this piece of code   June 9, 2005 - 3pm Central time zone
Reviewer: A reader 
 decode ((SELECT ih.in_date
                 FROM major_sales ih
                WHERE ih.container = i.container
                  AND sales > i.container_id
                  AND sales = (SELECT MIN(ihh.container_id)
                                          FROM major_sales ihh
                                         WHERE ihh.container_id > i.container_id
                                           AND ihh.container = i.container)), NULL, 


Followup   June 9, 2005 - 6pm Central time zone:

not out of context, no. 

5 stars I am still having problem with analytical function   July 1, 2005 - 12pm Central time zone
Reviewer: A reader 
select i.container,ssl_user_code,ssl_user_code ssl,cl.code length_code, out_trucker_code, 
i.chassis,
lead(in_date) over (partition by i.container order by in_date) next_in_date, 
out_date, 
lead (out_date) over (partition by i.container order by in_date) o_date 
from  his_containers i, 
       container_masters cm, 
       tml_container_lhts clht, 
       tml_container_lengths cl 
WHERE cm.container = i.container 
and cm.lht_code = clht.code 
and clht.length_code = cl.code 
and ssl_user_code = 'ACL' 
and  i.container like '%408014' 
and voided_date is null 
and ((in_date between  to_date('01-MAR-05 00:00:00', 'DD-MON-RR HH24:MI:SS') 
               and to_date('31-MAR-05 23:59:59', 'DD-MON-RR HH24:MI:SS')) OR 
              (out_date between to_date('01-MAR-05 00:00:00', 'DD-MON-RR HH24:MI:SS') 
               and to_date('31-MAR-05 23:59:59', 'DD-MON-RR HH24:MI:SS')))

results:
----------
CONTAINER   SSL_USER_CODE  SSL  LENGTH_CODE  OUT_TRUCKER_CODE  CHASSIS  NEXT_IN_DATE          OUT_DATE              O_DATE
ACLU408014  ACL            ACL  4            R0480                      3/22/2005 2:52:41 PM  3/21/2005 3:45:48 PM  4/6/2005 2:25:59 PM
ACLU408014  ACL            ACL  4            J1375                                            4/6/2005 2:25:59 PM



1. how can I get rid of the 4/6/2005 2:25:59 PM???




  


Followup   July 1, 2005 - 1pm Central time zone:

can you be more specific about why you don't like April 6th at 2:25:59pm?  what is it about that 
you don't like?

That'll help me tell you how to in general remove it.  What are the criteria for removal?

4 stars analytical query   July 1, 2005 - 2pm Central time zone
Reviewer: A reader 
Tom,

We are trying to bill the client within the month, in this case within April. I also would 
like to know how many days elapsed during 2 days so I can bill them. 
 


Followup   July 1, 2005 - 3pm Central time zone:

"how many days elapsed between 2 days"

the answer is: 2

but are you asking how to do date arithmetic?  Just subtract. 
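
As a small illustration of "just subtract": in Oracle, subtracting one DATE from another gives the difference in days, fractions included. The sketch below does the equivalent arithmetic in Python with one in/out pair from the data posted in this thread; the variable names are made up for the example.

```python
from datetime import datetime

FMT = "%m/%d/%Y %I:%M:%S %p"

# One in/out pair taken from the container data posted in this thread.
in_date = datetime.strptime("3/19/2005 2:10:24 AM", FMT)
out_date = datetime.strptime("3/21/2005 3:45:48 PM", FMT)

elapsed = out_date - in_date                    # a timedelta
elapsed_days = elapsed.total_seconds() / 86400  # days, with the fraction

print(elapsed.days)   # whole days only
print(elapsed_days)   # fractional days, like Oracle's out_date - in_date
```

Whether to bill on whole days, rounded days or the fraction is a business rule that nothing in the thread settles.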

4 stars sorry...within March   July 1, 2005 - 2pm Central time zone
Reviewer: A reader 


5 stars more information   July 1, 2005 - 2pm Central time zone
Reviewer: A reader 
Tom, 

This is how the data looks 

IN_DATE             OUT_DATE                CONTAINER
1/3/2005 2:23:05 PM    1/10/2005 5:05:16 PM    ACLU408014
1/11/2005 1:04:49 PM    1/12/2005 8:49:06 AM    ACLU408014
1/14/2005 12:09:50 PM    1/18/2005 6:39:10 AM    ACLU408014
3/19/2005 2:10:24 AM    3/21/2005 3:45:48 PM    ACLU408014
3/22/2005 2:52:41 PM    4/6/2005 2:25:59 PM    ACLU408014
4/7/2005 1:24:43 PM    4/10/2005 2:21:59 AM    ACLU408014

and I would like to get the pair within  the same month
 
 


Followup   July 1, 2005 - 3pm Central time zone:

the pair of "what"? 

5 stars I would like to get all the dates within the month   July 1, 2005 - 4pm Central time zone
Reviewer: A reader 


Followup   July 1, 2005 - 4pm Central time zone:

please be much much more specific.  pretend you are trying to explain this to a newbie...


not following the requirement at all..
http://tkyte.blogspot.com/2005/06/how-to-ask-questions.html

5 stars one more try   July 1, 2005 - 4pm Central time zone
Reviewer: A reader 
This is how the data looks as of now with the above query.
IN_DATE             OUT_DATE                CONTAINER
1/3/2005 2:23:05 PM    1/10/2005 5:05:16 PM    ACLU408014
1/11/2005 1:04:49 PM    1/12/2005 8:49:06 AM    ACLU408014
1/14/2005 12:09:50 PM    1/18/2005 6:39:10 AM    ACLU408014
3/19/2005 2:10:24 AM    3/21/2005 3:45:48 PM    ACLU408014
3/22/2005 2:52:41 PM    4/6/2005 2:25:59 PM    ACLU408014
4/7/2005 1:24:43 PM    4/10/2005 2:21:59 AM    ACLU408014

I Would like to get it as the following


IN_DATE                  OUT_DATE                CONTAINER
 
3/19/2005 2:10:24 AM    3/21/2005 3:45:48 PM    ACLU408014
3/22/2005 2:52:41 PM    
 
This is what I am looking for.....this way.
 


Followup   July 1, 2005 - 4pm Central time zone:

still not much of a specification (important thing for those of us in this industry - being able to 
describe the problem at hand in detail, so someone else can take the problem definition and code 
it).


Let me try, this is purely a speculative guess on my part:


I would like all records in the table such that the in_date-out_date range covered at least part of 
the month of march in the year 2005.

If the out_date falls AFTER march, I would like it nulled out.

(this part is a total guess) if the in_date falls BEFORE march, i would like it nulled out as well 
(for consistency?)


Ok, stated like that I can give you untested pseudo code since there are no create tables and no 
inserts to play with:


select case when in_date between to_date( :x, 'dd-mon-yyyy' )
                             and to_date( :y, 'dd-mon-yyyy' )-1/24/60/60
            then in_date end,
       case when out_date between to_date( :x, 'dd-mon-yyyy' )
                              and to_date( :y, 'dd-mon-yyyy' )-1/24/60/60
            then out_date end,
       container
  from T
 where in_date <= to_date( :y, 'dd-mon-yyyy' )-1/24/60/60
   and out_date >= to_date( :x, 'dd-mon-yyyy' )


bind in :x = '01-mar-2005' and :y = '01-apr-2005' for your dates.
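
A runnable sketch of the same guess, for anyone who wants to try it without an Oracle instance. It runs on SQLite through Python, stores the dates as ISO text (which compares correctly as strings), and uses a few of the rows posted earlier in the thread. Two things differ from the pseudo code above and are my own substitutions: the column aliases, and a half-open test (>= :x and < :y) in place of "between :x and :y - 1 second", which is equivalent for values with one-second granularity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t (in_date text, out_date text, container text);
    insert into t values
        ('2005-01-14 12:09:50', '2005-01-18 06:39:10', 'ACLU408014'),
        ('2005-03-19 02:10:24', '2005-03-21 15:45:48', 'ACLU408014'),
        ('2005-03-22 14:52:41', '2005-04-06 14:25:59', 'ACLU408014'),
        ('2005-04-07 13:24:43', '2005-04-10 02:21:59', 'ACLU408014');
""")

sql = """
select case when in_date  >= :x and in_date  < :y
            then in_date  end as in_date_or_null,
       case when out_date >= :x and out_date < :y
            then out_date end as out_date_or_null,
       container
  from t
 where in_date  <  :y      -- the stay started before the period ended
   and out_date >= :x      -- and ended on or after the period start
 order by 1
"""
rows = conn.execute(
    sql, {"x": "2005-03-01 00:00:00", "y": "2005-04-01 00:00:00"}
).fetchall()
for row in rows:
    print(row)
```

The WHERE clause keeps every stay that overlaps the period; the CASE expressions then null out whichever end falls outside it.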
      

5 stars As you requested   July 1, 2005 - 5pm Central time zone
Reviewer: A reader 
CREATE TABLE CONTAINER_MASTERS
(
  CONTAINER              VARCHAR2(10 BYTE)      NOT NULL,
  CHECK_DIGIT            VARCHAR2(1 BYTE)       NOT NULL,
  SSL_OWNER_CODE         VARCHAR2(5 BYTE)       NOT NULL,
  LHT_CODE               VARCHAR2(5 BYTE)       NOT NULL
  
)

INSERT INTO CONTAINER_MASTERS ( CONTAINER, CHECK_DIGIT, SSL_OWNER_CODE,
LHT_CODE ) VALUES ( '045404', '1', 'BCL', '5AV'); 
commit;
 

CREATE TABLE TML_CONTAINER_LHTS
(
  CODE                     VARCHAR2(5 BYTE)     NOT NULL,
  SHORT_DESCRIPTION        VARCHAR2(10 BYTE)    NOT NULL,
  LONG_DESCRIPTION         VARCHAR2(30 BYTE)    NOT NULL,
  ISO                      VARCHAR2(4 BYTE)     NOT NULL,
  LENGTH_CODE              VARCHAR2(5 BYTE)     NOT NULL
 
)

INSERT INTO TML_CONTAINER_LHTS ( CODE, SHORT_DESCRIPTION, LONG_DESCRIPTION, ISO, LENGTH_CODE,
HEIGHT_CODE, TYPE_CODE ) VALUES ( '5BR', '5BR', '45'' 9''6" Reefer', '5432', '5', 'B', 'R'); 
commit;
 


CREATE TABLE TML_CONTAINER_LENGTHS
(
  CODE               VARCHAR2(5 BYTE)           NOT NULL,
  SHORT_DESCRIPTION  VARCHAR2(10 BYTE)          NOT NULL,
 
)


INSERT INTO TML_CONTAINER_LENGTHS ( CODE, SHORT_DESCRIPTION,
LONG_DESCRIPTION ) VALUES ( 
'2', '20''', '20 Ft'); 
INSERT INTO TML_CONTAINER_LENGTHS ( CODE, SHORT_DESCRIPTION,
LONG_DESCRIPTION ) VALUES ( 
'4', '40''', '40 Ft'); 
commit;
  


Followup   July 1, 2005 - 6pm Central time zone:

umm, specification?

did I get it right?  if so, did you *try* the query at all??? 

1 stars Here is a SQL puzzle for analytics zealots   July 1, 2005 - 10pm Central time zone
Reviewer: Mikito Harakiri 
OK, if anybody succeeds in writing the following with analytics, I would convert to analytics once and 
forever. Credit it in the book, of course. 

Given:
table Hotels (
   name string,
   price integer,
   distance    
)

Here is a query that sounds very analytical:
Order hotels by price, distance. Compare each record with its neighbour (lag?), and if one of them is 
inferior to the other by both criteria -- more pricey and farther from the beach -- then throw it 
away from the result. 


Followup   July 2, 2005 - 9am Central time zone:

define neighbor.

is neighbor defined by price or by distance?  your specification is lacking many many details 
(seems to be a recurring theme on this page for some reason)

sounds like you want the cheapest closest hotel to the beach.  for each row, if something closer 
and cheaper exists in the original set, do not keep that row.

sounds like a where not exists, not analytics to me.  but then - the specification is lacking.

And let's see, in order to appreciate a tool, you have to be shown that the tool can be the end all, 
be all answer to everything??!??  that is downright silly, don't you think?

Let's see:

"if anyone succeeds in making the Oracle 9i merge command select data, I would convert to merge 
once and forever"

"if anyone succeeds in making my car fly into outer space, I would convert to cars once and 
forever"

Think about your logic here.


There are no zealots here, there are people willing to read the documentation, understand that 
things work the way they work, not the way THEY think they should have been made to work, and have 
jobs to do, pragmatic practical things to accomplish and are willing to use the best tool for the 
job. 

3 stars specs   July 3, 2005 - 11pm Central time zone
Reviewer: Mikito Harakiri 
Yes, find all the hotels that are not dominated by the others by both price and distance. That is a 
"not exists" query, but it is a very inefficient one: 

select * from hotels h
where not exists (select * from hotels hh
   where hh.price < h.price and hh.distance <= h.distance
   or hh.price <= h.price and hh.distance < h.distance
) 

The one that I reformulated is much more efficient, but how do I express it in SQL? 


Followup   July 4, 2005 - 10am Central time zone:

the one that reformulated?  

and why do you have the or in there at all.  to dominate by both price and distance would simply be:

where not exists ( select NULL
                     from hotels hh
                    where hh.price < h.price 
                      AND hh.distance < h.distance )

You said "by BOTH price and distance", nothing but nothing about ties.


ops$tkyte@ORA9IR2> /*
DOC>
DOC>drop table hotels;
DOC>
DOC>create table hotels
DOC>as
DOC>select object_name name, object_id price, object_id distance, all_objects.*
DOC>  from all_objects;
DOC>
DOC>create index hotel_idx on hotels(price,distance);
DOC>
DOC>exec dbms_stats.gather_table_stats( user, 'HOTELS', cascade=>true );
DOC>*/
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select h1.name, h1.price, h1.distance
  2    from hotels h1
  3   where not exists ( select NULL
  4                        from hotels h2
  5                       where h2.price < h1.price
  6                         AND h2.distance < h1.distance )
  7  /
 
NAME                                PRICE   DISTANCE
------------------------------ ---------- ----------
I_OBJ#                                  3          3
 
Elapsed: 00:00:00.22
ops$tkyte@ORA9IR2> select count(*) from hotels;
 
  COUNT(*)
----------
     27837
 
Elapsed: 00:00:00.00

it doesn't seem horribly inefficient. 
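
To make the difference between the two readings of "dominated" concrete, here is a small sketch on SQLite through Python. The four hotels are invented for illustration. strict_sql is the "both strictly less" predicate from the followup above; ties_sql is the reviewer's predicate with the OR, only with parentheses added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical hotels; B ties with A on price but is farther away.
conn.executescript("""
    create table hotels (name text, price integer, distance integer);
    insert into hotels values
        ('A', 100, 5),
        ('B', 100, 9),
        ('C', 150, 2),
        ('D', 200, 8);
""")

# Dominated only if another hotel is strictly cheaper AND strictly closer.
strict_sql = """
select name from hotels h
 where not exists (select null from hotels hh
                    where hh.price < h.price
                      and hh.distance < h.distance)
 order by name
"""

# Dominated if another hotel is no worse on both and better on at least one.
ties_sql = """
select name from hotels h
 where not exists (select null from hotels hh
                    where (hh.price <  h.price and hh.distance <= h.distance)
                       or (hh.price <= h.price and hh.distance <  h.distance))
 order by name
"""

strict = [r[0] for r in conn.execute(strict_sql)]
ties = [r[0] for r in conn.execute(ties_sql)]
print(strict, ties)
```

Hotel B survives the strict version because nothing is strictly cheaper than it, but drops out once ties count, since A costs the same and is closer. Which one is wanted is exactly the kind of detail the specification has to state.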

5 stars Tom Can we give it one more try   July 5, 2005 - 9am Central time zone
Reviewer: A reader 
Tom, When I ran the query it returned nothing. I am sending you the whole test case. This is what I 
would like to see
in the report.

out_date                in_date               container
1/18/2005 6:39:10 AM    3/19/2005 2:10:24 AM  ACLU408014
3/21/2005 3:45:48 PM    3/22/2005 2:52:41 PM  ACLU408014


 

CREATE TABLE BETA
(
  IN_DATE    DATE                               NOT NULL,
  OUT_DATE   DATE,
  CONTAINER  VARCHAR2(10 BYTE)                  NOT NULL
)

INSERT INTO BETA ( IN_DATE, OUT_DATE, CONTAINER ) VALUES ( 
 TO_Date( '01/03/2005 02:23:05 PM', 'MM/DD/YYYY HH:MI:SS AM'),  TO_Date( '01/10/2005 05:05:16 PM', 
'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU408014'); 
INSERT INTO BETA ( IN_DATE, OUT_DATE, CONTAINER ) VALUES ( 
 TO_Date( '01/11/2005 01:04:49 PM', 'MM/DD/YYYY HH:MI:SS AM'),  TO_Date( '01/12/2005 08:49:06 AM', 
'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU408014'); 
INSERT INTO BETA ( IN_DATE, OUT_DATE, CONTAINER ) VALUES ( 
 TO_Date( '01/14/2005 12:09:50 PM', 'MM/DD/YYYY HH:MI:SS AM'),  TO_Date( '01/18/2005 06:39:10 AM', 
'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU408014'); 
INSERT INTO BETA ( IN_DATE, OUT_DATE, CONTAINER ) VALUES ( 
 TO_Date( '03/19/2005 02:10:24 AM', 'MM/DD/YYYY HH:MI:SS AM'),  TO_Date( '03/21/2005 03:45:48 PM', 
'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU408014'); 
INSERT INTO BETA ( IN_DATE, OUT_DATE, CONTAINER ) VALUES ( 
 TO_Date( '03/22/2005 02:52:41 PM', 'MM/DD/YYYY HH:MI:SS AM'),  TO_Date( '04/06/2005 02:25:59 PM', 
'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU408014'); 
INSERT INTO BETA ( IN_DATE, OUT_DATE, CONTAINER ) VALUES ( 
 TO_Date( '04/07/2005 01:24:43 PM', 'MM/DD/YYYY HH:MI:SS AM'),  TO_Date( '04/10/2005 02:21:59 AM', 
'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU408014'); 
commit;

select in_date, out_date,container,
            case when in_date between to_date('01-mar-2005', 'dd-mon-yyyy' )
            and to_date( '31-mar-2005', 'dd-mon-yyyy' )-1/24/60/60
            then in_date end,
            case when out_date between to_date( '01-mar-2005', 'dd-mon-yyyy' )
            and to_date( '31-mar-2005', 'dd-mon-yyyy' )-1/24/60/60
            then out_date end
      container
  from BETA
  WHERE in_date <= to_date( '01-mar-2005', 'dd-mon-yyyy' )-1/24/60/60
  and out_date >= to_date( '31-mar-2005', 'dd-mon-yyyy' ) 


Followup   July 5, 2005 - 9am Central time zone:

you know, this is going beyond....

*s*p*e*c*i*f*i*c*a*t*i*o*n*

pretend you were explaining to your mother (who presumably doesn't work in IT and doesn't know sql 
or databases or whatever) what needed to be done.  

that is what I need to see.  I obviously don't know your logic of getting from "A (inputs) to B 
(outputs)" and you need to explain that.


and when I run my query:

ops$tkyte@ORA10G> variable x varchar2(20)
ops$tkyte@ORA10G> variable y varchar2(20)
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> exec :x := '01-mar-2005'; :y := '01-apr-2005'
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA10G> select case when in_date between to_date( :x, 'dd-mon-yyyy' )
  2                               and to_date( :y, 'dd-mon-yyyy' )-1/24/60/60
  3              then in_date end,
  4         case when out_date between to_date( :x, 'dd-mon-yyyy' )
  5                                and to_date( :y, 'dd-mon-yyyy' )-1/24/60/60
  6              then out_date end,
  7         container
  8    from beta
  9   where in_date <= to_date( :y, 'dd-mon-yyyy' )-1/24/60/60
 10     and out_date >= to_date( :x, 'dd-mon-yyyy' )
 11  /
 
CASEWHENI CASEWHENO CONTAINER
--------- --------- ----------
19-MAR-05 21-MAR-05 ACLU408014
22-MAR-05           ACLU408014


I do get output, not what you say you want, but output.  you need to tell me THE LOGIC here.  (and 
maybe when you write it down, specify it, the answer will just naturally appear)

so yes, we can definitely give it one more try but if and only if you provide the details, the 
specification, the logic, the thoughts behind this.

Not just "i have this and want that", it doesn't work that way. 

5 stars in english   July 5, 2005 - 10am Central time zone
Reviewer: Jean 
We are trying to bill from the time the truck left to the
time it returned. For example in the above query.
I would like to bill him from 1/18/2005 to 3/19/2005. So it must be part of the report. That's 
the whole key here. 


5 stars clarification!!   July 5, 2005 - 10am Central time zone
Reviewer: A reader 
the time he left       1/18/2005 6:39:10 AM    
the time he came back  3/22/2005 2:52:41 PM     
 
hope this helps.... 


Followup   July 5, 2005 - 11am Central time zone:

ops$tkyte@ORA9IR2> select * from beta order by in_date;
 
IN_DATE   OUT_DATE  CONTAINER
--------- --------- ----------
03-JAN-05 10-JAN-05 ACLU408014
11-JAN-05 12-JAN-05 ACLU408014   <<<=== gap, no 13
14-JAN-05 18-JAN-05 ACLU408014   <<=== big gap, no 19.... mar 18
19-MAR-05 21-MAR-05 ACLU408014
22-MAR-05 06-APR-05 ACLU408014
07-APR-05 10-APR-05 ACLU408014
 
6 rows selected.


I don't get it.  I don't get it AT ALL.   does anyone else ?  

nope, not getting it even a teeny tiny bit myself.


give us LOGIC, ALGORITHM, INFORMATION.


like I said, pretend I'm your mother who has never seen a computer -- explain the logic at that 
level (or I just give up) 

5 stars BETTER TABLE   July 5, 2005 - 11am Central time zone
Reviewer: A reader 
INSERT INTO BETA ( IN_DATE, OUT_DATE, CONTAINER ) VALUES ( 
 TO_Date( '01/03/2005 02:23:05 PM', 'MM/DD/YYYY HH:MI:SS AM'),  TO_Date( '01/10/2005 05:05:16 PM', 
'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU408014'); 
INSERT INTO BETA ( IN_DATE, OUT_DATE, CONTAINER ) VALUES ( 
 TO_Date( '01/11/2005 01:04:49 PM', 'MM/DD/YYYY HH:MI:SS AM'),  TO_Date( '01/12/2005 08:49:06 AM', 
'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU408014'); 
INSERT INTO BETA ( IN_DATE, OUT_DATE, CONTAINER ) VALUES ( 
 TO_Date( '01/14/2005 12:09:50 PM', 'MM/DD/YYYY HH:MI:SS AM'),  TO_Date( '01/18/2005 06:39:10 AM', 
'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU408014'); 
INSERT INTO BETA ( IN_DATE, OUT_DATE, CONTAINER ) VALUES ( 
 TO_Date( '03/19/2005 02:10:24 AM', 'MM/DD/YYYY HH:MI:SS AM'),  TO_Date( '03/21/2005 03:45:48 PM', 
'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU408014'); 
INSERT INTO BETA ( IN_DATE, OUT_DATE, CONTAINER ) VALUES ( 
 TO_Date( '04/07/2005 01:24:43 PM', 'MM/DD/YYYY HH:MI:SS AM'),  TO_Date( '04/10/2005 02:21:59 AM', 
'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU408014'); 
INSERT INTO BETA ( IN_DATE, OUT_DATE, CONTAINER ) VALUES ( 
 TO_Date( '03/22/2005 02:52:41 PM', 'MM/DD/YYYY HH:MI:SS AM'),  TO_Date( '04/06/2005 02:25:59 PM', 
'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU408014'); 
commit;
 

OUT_DATE            IN_DATE
1/18/2005 6:39:10 AM           3/19/2005 2:10:24 AM    
3/21/2005 3:45:48 PM           3/22/2005 2:52:41 PM


LEFT 1/18 CAME BACK 3/19
LEFT 3/21 CAME BACK 3/22
 


Followup   July 5, 2005 - 12pm Central time zone:

you have totally and utterly missed my point.


 
IN_DATE   OUT_DATE  CONTAINER
--------- --------- ----------
03-JAN-05 10-JAN-05 ACLU408014
11-JAN-05 12-JAN-05 ACLU408014
14-JAN-05 18-JAN-05 ACLU408014
19-MAR-05 21-MAR-05 ACLU408014
22-MAR-05 06-APR-05 ACLU408014
07-APR-05 10-APR-05 ACLU408014
 
6 rows selected.


sigh.  

what if the records are

IN_DATE   OUT_DATE  CONTAINER
--------- --------- ----------
03-JAN-05 10-JAN-05 ACLU408014
11-JAN-05 12-JAN-05 ACLU408014
14-JAN-05 18-JAN-05 ACLU408014
07-APR-05 10-APR-05 ACLU408014

specification, you know what, without it, I'm not even going to look anymore.  Textual description 
of precisely what you want.  I'm tired of guessing.  I think I can guess, but I don't even want to 
guess about "missing" months like my second example here.
 

5 stars English Explanation   July 5, 2005 - 1pm Central time zone
Reviewer: A reader 
Sorry for going back and forth on this report. All I want is the following: We have trucks that 
come in and out of the yard. All we are looking for is when the truck came in and the "next record", 
nothing in between, because a truck can come in many times during a month. So we want when it first 
came in and the very last time he went out for a particular month, that is to say, the last time he 
left the yard. The date and time should give us this information. Finally, this report should be 
within a month.

example:

IN_DATE   OUT_DATE  CONTAINER
--------- --------- ----------
03-JAN-05 10-JAN-05 ACLU408014
11-JAN-05 12-JAN-05 ACLU408014
14-JAN-05 18-JAN-05 ACLU408014
19-MAR-05 21-MAR-05 ACLU408014
22-MAR-05 06-APR-05 ACLU408014
07-APR-05 10-APR-05 ACLU408014
 
6 rows selected.

in this case we want

in_date                  out_date
--------                --------
3/22/2005 2:52:41PM    1/18/2005 6:39:10 AM



 


 


Followup   July 5, 2005 - 1pm Central time zone:

so what happened to the 21st/22nd of march this time.  the answer keeps changing?

and what if, there are no records for march in the table (nothing in_date/out_date wise)

 

5 stars follow up   July 5, 2005 - 1pm Central time zone
Reviewer: jean 
Tom,

We realized that it may be too much to get the dates in between,
so we opted for just getting the in_date and out_date. By the way, there will always be data so do not 
worry about if....

Thanks!!

 


Followup   July 5, 2005 - 3pm Central time zone:

feb, what about feb?  you said there would always be data?  I want to run this for feb?

do you or do you not need to be concerned about a missing month. 

5 stars do not be concerned!   July 5, 2005 - 3pm Central time zone
Reviewer: A reader 
Please do not be concerned about missing a month. This is a report. 


Followup   July 5, 2005 - 3pm Central time zone:

umm, I want the report for February

it is blank.

now what?  it should not be blank should it?  this is a problem, this is a problem in our industry 
in general.  You get what you ask for (sometimes) and if you ask repeatedly for the wrong thing, 
that's what you'll get.  I am concerned -- by this line of questioning here.

Hey, here you go:

ops$tkyte-ORA9IR2> select *
  2    from (
  3  select
  4         lag(out_date) over (partition by container order by in_date) last_out_date,
  5         in_date,
  6             container
  7    from beta
  8         )
  9   where trunc(in_date,'mm') = to_date('01-mar-2005','dd-mon-yyyy')
 10      or trunc(last_out_date,'mm') = to_date('01-mar-2005','dd-mon-yyyy');
 
LAST_OUT_ IN_DATE   CONTAINER
--------- --------- ----------
18-JAN-05 19-MAR-05 ACLU408014
21-MAR-05 22-MAR-05 ACLU408014

gets the answer given your data, makes a zillion assumptions (50% of which are probably wrong), 
won't work for FEB, probably doesn't answer the question behind the question, but hey, there you 
go.   
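
For anyone following along without Oracle, the lag() query above can be tried on SQLite (3.25 or later) through Python. This is a sketch under a few substitutions of my own: dates are stored as ISO text, strftime('%Y-%m', ...) stands in for trunc(date,'mm'), and the month is passed as a bind named :month. The six rows are the BETA test data posted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table beta (in_date text, out_date text, container text);
    insert into beta values
        ('2005-01-03 14:23:05', '2005-01-10 17:05:16', 'ACLU408014'),
        ('2005-01-11 13:04:49', '2005-01-12 08:49:06', 'ACLU408014'),
        ('2005-01-14 12:09:50', '2005-01-18 06:39:10', 'ACLU408014'),
        ('2005-03-19 02:10:24', '2005-03-21 15:45:48', 'ACLU408014'),
        ('2005-03-22 14:52:41', '2005-04-06 14:25:59', 'ACLU408014'),
        ('2005-04-07 13:24:43', '2005-04-10 02:21:59', 'ACLU408014');
""")

sql = """
select last_out_date, in_date, container
  from (select lag(out_date) over (partition by container
                                   order by in_date) as last_out_date,
               in_date, container
          from beta)
 where strftime('%Y-%m', in_date) = :month
    or strftime('%Y-%m', last_out_date) = :month
 order by in_date
"""
march = conn.execute(sql, {"month": "2005-03"}).fetchall()
february = conn.execute(sql, {"month": "2005-02"}).fetchall()
print(march)
print(february)
```

March returns the two "left / came back" pairs. February returns nothing at all, which is the open question raised above: no row starts or ends in February, even though the container was away for the whole month.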

4 stars Thanks!!!   July 6, 2005 - 9am Central time zone
Reviewer: A reader 
I will try it ...Thanks a zillion for your efforts and your patience. 


4 stars Thanks!   July 6, 2005 - 11am Central time zone
Reviewer: A reader 
CREATE TABLE BETA3
(
  IN_DATE    DATE                               NOT NULL,
  OUT_DATE   DATE,
  CONTAINER  VARCHAR2(10 BYTE)                  NOT NULL
)



INSERT INTO BETA3 ( IN_DATE, OUT_DATE, CONTAINER ) VALUES ( 
 TO_Date( '07/20/2004 03:08:49 PM', 'MM/DD/YYYY HH:MI:SS AM'),  
TO_Date( '08/10/2004 02:45:52 AM', 'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU040312'); 
INSERT INTO BETA3 ( IN_DATE, OUT_DATE, CONTAINER ) VALUES ( 
 TO_Date( '03/19/2005 01:55:06 AM', 'MM/DD/YYYY HH:MI:SS AM'), 
 TO_Date( '03/27/2005 05:05:36 AM', 'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU040312'); 
commit;
 
Tom I was able to get the first pair as shown 

last_out_date          in_date                  container
8/10/2004 2:45:52 AM     3/19/2005 1:55:06 AM     ACLU040312

which is fine...

But can I get the other pair?

last_out_date            in_date                  container
3/27/2005 5:05:36 AM   
 


Followup   July 6, 2005 - 12pm Central time zone:

problem is, you are "missing" a row and 'making up' data is hard.

it might be

ops$tkyte-ORA10G> select decode( r, 1, last_out_date, out_date ),
  2         decode( r, 1, in_date, next_in_date )
  3    from (
  4  select
  5         lag(out_date) over (partition by container order by in_date) last_out_date,
  6         in_date, out_date,
  7         lead(in_date) over (partition by container order by in_date) next_in_date,
  8             container
  9    from beta3
 10         ), ( select 1 r from dual union all select 2 r from dual )
 11   where ((
 12          trunc(in_date,'mm') = to_date('01-mar-2005','dd-mon-yyyy')
 13          or
 14                  trunc(last_out_date,'mm') = to_date('01-mar-2005','dd-mon-yyyy')
 15             ) and r = 1 )
 16             or
 17             ( next_in_date is null and r = 2 )
 18  /
 
DECODE(R,1,LAST_OUT_ DECODE(R,1,IN_DATE,N
-------------------- --------------------
10-aug-2004 02:45:52 19-mar-2005 01:55:06
27-mar-2005 05:05:36

still curious what happens in feb. 

5 stars Please refer some books to learn Oracle Analytic functions   July 7, 2005 - 7am Central time zone
Reviewer: Vijay from India


Followup   July 7, 2005 - 9am Central time zone:

data warehousing guide (freely available on otn.oracle.com)

Expert one on one Oracle (I have a big chapter on them in there) 

5 stars Thank you very much!!   July 8, 2005 - 10am Central time zone
Reviewer: Jean 
I want to thank you for the last query!!! It worked very well, even though I still get dates outside of 
the range. But overall it's fine.


  


4 stars How to get contiguous date ranges from Start_date, end_date pairs?   July 11, 2005 - 3pm Central time zone
Reviewer: Bob Lyon from Houston
-- Tom, Suppose I have a table with data...

-- MKT_CD START_DT_GMT      END_DT_GMT
-- ------ ----------------- -----------------
-- AAA    07/11/05 00:00:00 07/12/05 00:00:00
-- BBB    07/11/05 00:00:00 07/11/05 01:00:00
-- BBB    07/11/05 01:00:00 07/11/05 02:00:00
-- BBB    07/11/05 02:00:00 07/11/05 03:00:00
-- BBB    07/11/05 06:00:00 07/11/05 07:00:00
-- BBB    07/11/05 07:00:00 07/11/05 08:00:00

-- What I would like to get is the "contiguous date ranges"
-- by MKT_CD, i.e., 

-- MKT_CD START_DT_GMT      END_DT_GMT
-- ------ ----------------- -----------------
-- AAA    07/11/05 00:00:00 07/12/05 00:00:00
-- BBB    07/11/05 00:00:00 07/11/05 03:00:00
-- BBB    07/11/05 06:00:00 07/11/05 08:00:00

-- I have played with LAG/LEAD/FIRST_VALUE/LAST_VALUE
-- but seem to just "go in circles" trying to code this.

-- Here is the test data setup (Oracle 9.2.0.6) :

CREATE GLOBAL TEMPORARY TABLE NM_DEMAND_BIDS_API_GT
(
  MKT_CD           VARCHAR2(6) NOT NULL,
  START_DT_GMT     DATE        NOT NULL,
  END_DT_GMT       DATE        NOT NULL
)
ON COMMIT PRESERVE ROWS;

-- This code has 24 hours
INSERT INTO NM_DEMAND_BIDS_API_GT ( MKT_CD, START_DT_GMT, END_DT_GMT )
VALUES ('AAA', TRUNC(SYSDATE), TRUNC(SYSDATE) + 1);
-- A second code goes by hours
INSERT INTO NM_DEMAND_BIDS_API_GT ( MKT_CD, START_DT_GMT, END_DT_GMT )
VALUES ('BBB', TRUNC(SYSDATE)+ 00/24, TRUNC(SYSDATE) + 01/24);
INSERT INTO NM_DEMAND_BIDS_API_GT ( MKT_CD, START_DT_GMT, END_DT_GMT )
VALUES ('BBB', TRUNC(SYSDATE)+ 01/24, TRUNC(SYSDATE) + 02/24);
INSERT INTO NM_DEMAND_BIDS_API_GT ( MKT_CD, START_DT_GMT, END_DT_GMT )
VALUES ('BBB', TRUNC(SYSDATE)+ 02/24, TRUNC(SYSDATE) + 03/24);
-- and has an intentional gap
INSERT INTO NM_DEMAND_BIDS_API_GT ( MKT_CD, START_DT_GMT, END_DT_GMT )
VALUES ('BBB', TRUNC(SYSDATE)+ 06/24, TRUNC(SYSDATE) + 07/24);
INSERT INTO NM_DEMAND_BIDS_API_GT ( MKT_CD, START_DT_GMT, END_DT_GMT )
VALUES ('BBB', TRUNC(SYSDATE)+ 07/24, TRUNC(SYSDATE) + 08/24);

-- Query

SELECT MKT_CD, START_DT_GMT, END_DT_GMT
FROM NM_DEMAND_BIDS_API_GT;

 


Followup   July 11, 2005 - 3pm Central time zone:

based on:
http://www.oracle.com/technology/oramag/oracle/04-mar/o24asktom.html
ops$tkyte@ORA9IR2> select mkt_cd, min(start_dt_gmt), max(end_dt_gmt)
  2    from (
  3  select mkt_cd, start_dt_gmt, end_dt_gmt,
  4         max(grp) over (partition by mkt_cd order by start_dt_gmt) mgrp
  5    from (
  6  SELECT MKT_CD,
  7         START_DT_GMT,
  8         END_DT_GMT,
  9         case when lag(end_dt_gmt) over (partition by mkt_cd order by start_dt_gmt) <> start_dt_gmt
 10                   or
 11                   lag(end_dt_gmt) over (partition by mkt_cd order by start_dt_gmt) is null
 12              then row_number() over (partition by mkt_cd order by start_dt_gmt)
 13          end grp
 14    FROM NM_DEMAND_BIDS_API_GT
 15         )
 16         )
 17   group by mkt_cd, mgrp
 18   order by 1, 2
 19  /
 
MKT_CD MIN(START_DT_GMT)    MAX(END_DT_GMT)
------ -------------------- --------------------
AAA    11-jul-2005 00:00:00 12-jul-2005 00:00:00
BBB    11-jul-2005 00:00:00 11-jul-2005 03:00:00
BBB    11-jul-2005 06:00:00 11-jul-2005 08:00:00
 
 

5 stars Thanks!   July 11, 2005 - 5pm Central time zone
Reviewer: Bob Lyon from Houston
Wow, that was fast.

The trick here is the MAX() analytic function.  I could tag the lines where a break was to occur 
but couldn't figure out how to carry forward the tag/grp.

Thanks Again! 


5 stars Analytical functions book   July 11, 2005 - 11pm Central time zone
Reviewer: Vijay from India
Thanks a lot 


5 stars More Help   July 26, 2005 - 5pm Central time zone
Reviewer: Jean 
Tom,

How can I get "just" the records within the scope? I am getting records outside of March.

select container,decode( r, 1, last_out_date, out_date )out_date, decode( r, 1, in_date, next_in_date) in_date, 
           code length_code,chassis,out_trucker_code,ssl_user_code ssl, ssl_user_code,out_mode 
           from ( 
     select lag(out_date) over (partition by i.container order by in_date) 
           last_out_date, 
           i.ssl_user_code, 
           in_date, 
           cl.code, 
           i.out_trucker_code, 
           i.ssl_user_code ssl, 
           i.container, 
           i.chassis, 
           out_mode, 
           out_date, 
           clht.length_code, 
           lead(in_date) over (partition by i.container order by in_date) 
          next_in_date 
          from his_containers i,container_masters cm,tml_container_lhts clht,tml_container_lengths cl 
          where cm.container = i.container 
          and cm.lht_code = clht.code 
          and cl.code = clht.length_code 
          and ssl_user_code = 'ACL' 
          and i.container = 'ACLU214285' 
          and voided_date is null 
          and chassis is  null 
          and in_mode = 'T' 
          and out_mode = 'T' ), ( select 1 r from dual union all select 2 r from dual ) 
          where (( trunc(in_date,'mm') = to_date('01-mar-2005','dd-mon-yyyy') 
          or  trunc(last_out_date,'mm') = to_date('01-mar-2005','dd-mon-yyyy')) 
          and r = 1 ) or ( next_in_date is null and r = 2 )
          order by out_date
 


Followup   July 26, 2005 - 5pm Central time zone:

select * 
  from (Q)
 where <any other conditions you like>
 order by out_date;


replace Q with your query. 

3 stars that's what I got in my query.....   July 26, 2005 - 6pm Central time zone
Reviewer: A reader 


Followup   July 26, 2005 - 6pm Central time zone:

don't know what you mean 

4 stars I thought I was doing what you suggested already...   July 26, 2005 - 6pm Central time zone
Reviewer: A reader 


Followup   July 26, 2005 - 6pm Central time zone:

I cannot see your output, obviously you are getting more data than you wanted -- add to the 
predicate in order to filter it out.  don't know what else to say. 

4 stars More information..   July 27, 2005 - 9am Central time zone
Reviewer: Jean 
the way it was before

CONTAINER    OUT_DATE            IN_DATE                LENGTH_CODE    CHASSIS    OUT_TRUCKER_CODE  
  
ACLU217150    6/25/2004 2:58:01 PM    3/11/2005 7:36:29 PM    4    E2131    ACL    ACL    T



---with your changes---

CONTAINER    OUT_DATE            IN_DATE                 LENGTH_CODE    CHASSIS    OUT_TRUCKER_CODE 
       ACLU217150    6/25/2004 2:58:01 PM    3/11/2005 7:36:29 PM    4        E2131



my history tables


CONTAINER_ID    OUT_DATE    IN_DATE
31779    6/21/2004 10:03:25 AM    6/16/2004 1:33:50 AM
55317    6/25/2004 2:58:01 PM    6/25/2004 2:19:49 PM
672863    3/2/2005 7:03:31 PM    2/26/2005 6:03:49 PM
708598    4/4/2005 3:31:03 PM    3/11/2005 7:36:29 PM
779305    4/16/2005 1:03:36 PM    4/6/2005 2:04:53 PM

as you can see I am not picking up the records within the month of march...with or without 
the changes to the query.  


Followup   July 27, 2005 - 10am Central time zone:

sorry -- you'll need to work through this, you see the techniques involved right -- lag, lead, 
analytic functions, YOU understand your data much better than I.

(because in part, frankly, the "way it was before" and "with your changes" look, well, I don't know 
-- the same I think to me as displayed here) 

5 stars Thanks for your help!   July 27, 2005 - 1pm Central time zone
Reviewer: A reader 
I know the data, however I thought it was going to be something easy just to get the dates within 
March...I guess not. 


4 stars count number of rows in a number of ranges   July 27, 2005 - 6pm Central time zone
Reviewer: A reader 
Hi

I would like to count the number of rows I have per range of values. For example

SELECT   RANGE, SUM(suma) total_per_deptno
  FROM (SELECT CASE
                  WHEN deptno between 10 and 20 THEN '10-20'
                  ELSE '30'
               END RANGE,
               deptno, 1 SUMA
          FROM scott$emp)
GROUP BY RANGE

RANGE TOTAL_PER_DEPTNO
----- ----------------
10-20                8
30                   6

Can I rewrite that query in some other way so range can be dynamic such as

11-20
21-30
31-40

and counts the number of rows?

Thank you

 


Followup   July 27, 2005 - 6pm Central time zone:

if you can come up with a function f(x) such that f(x) returns what you want, sure.

EG:


for your 11-20, 21-30, 31-40 -- well

f(deptno) = trunc( (deptno-0.1)/10)

(assuming deptno is an integer) -- that'll bin up deptno 0..10, 11..20, 21..30 and so on into 
groups 0, 1, 2, 3, ....
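
A minimal sketch of how that f(x) could be dropped into the original query, assuming the same 
scott$emp table and non-negative integer deptno values. The aliases bin, range_label and 
total_per_range are illustrative names only, not part of the original question:

```sql
-- Sketch only: bins integer deptno values into 0..10, 11..20, 21..30, ...
-- using f(deptno) = trunc((deptno-0.1)/10), then counts the rows per bin.
select bin,
       case when bin = 0 then '0-10'
            else (bin*10+1) || '-' || ((bin+1)*10)
       end range_label,
       count(*) total_per_range
  from (select trunc( (deptno-0.1)/10 ) bin
          from scott$emp)
 group by bin
 order by bin;
```

Changing the bin width only means changing the divisor (and the label arithmetic), so the ranges 
no longer need to be hard coded in a CASE.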

 

5 stars   August 2, 2005 - 1pm Central time zone
Reviewer: A reader 
Tom,

I hope you can provide an insight to this.

table emp1 is shown below. 

EmpId  Week  Year  Day0  Day1  .....  Day14

100    20    2005  8     8            8
200    22    2003  0     0            8
300    25    2004  8     8            0
400    06    2005  0     8            8
500    08    2002  8     0            8

create table emp1(empid varchar2(3), week varchar2(2), year varchar2(4), day0 number(2), day1 
number(2), day2 number(2), day3 number(2), day4 number(2), day5 number(2), day6 number(2), day7 
number(2), day8 number(2), day9 number(2), day10 number(2), day11 number(2), day12 number(2), day13 
number(2), day14 number(2));

insert into emp1 values('100', '20', '2005', 8, 8, 0, 8, 0, 8, 0, 0, 8, 8, 8, 8, 0, 8);
insert into emp1 values('200', '22', '2003', 0, 8, 0, 8, 0, 8, 0, 0, 8, 8, 8, 8, 0, 8);
insert into emp1 values('300', '25', '2004', 8, 8, 0, 8, 0, 8, 0, 0, 8, 8, 8, 8, 0, 0);
insert into emp1 values('400', '06', '2005', 0, 8, 0, 8, 0, 8, 0, 0, 8, 8, 8, 8, 0, 8);
insert into emp1 values('500', '08', '2002', 8, 0, 0, 8, 0, 8, 0, 0, 8, 8, 8, 8, 0, 8);

I am trying to select emp1 records as follows:

EmpId, Date of the day, Hours worked per day

Firstly, I have to calculate date of the day of a record (first day that corresponds to Day0) using 
week of the year and year. Then I have to increment the day by 1, 2 ...14 
to get the hours worked for each particular date

Example: Assuming that week 20 of 2005 is 05/07/2005. It corresponds to Day0 in the same record

Day1 column corresponds to the next day which is 05/08/2005. Day2 becomes 05/09/2005 and so on ... 

Then, I have to print individual rows for each empid as:

100 05/07/2005 8
100 05/08/2005 8 
.....
200 05/22/2003 0
200 05/23/2003 8 
.. and so on for all empid's ...


Thank you. 


Followup   August 2, 2005 - 2pm Central time zone:

oh no, columns where rows should be :(


and basically you are saying "i need ROWS where these rows should be!"


tell me, how do you turn 20 into a date? 

5 stars   August 2, 2005 - 2pm Central time zone
Reviewer: A reader 
Tom,

I should've explained it better. Week 20 of 2005, here should be translated to the first day of 
week 20 of 2005 (Assuming it is 05/07/2005). That corresponds to Day0 of that row. Day1 becomes 
05/08/2005 and so on ...

Is there a function or approach that can convert columns to rows? 


Followup   August 2, 2005 - 3pm Central time zone:

no, i mean -- what function/logic/algorithm are you using to figure out "week 20 is this day" 

5 stars   August 2, 2005 - 9pm Central time zone
Reviewer: A reader 
Tom,

Sorry, firstly, the date is not calculated the way I said above. It's not clear yet how the date is 
obtained. This issue is under review and I think I'll obtain date by joining empid with some table 
(say temp1). However, I am sure I will have to use date (such as 05/07/2005), associate it with 
Day0 column value. Day1 becomes 05/08/2005 and so on ..  However, I am trying to obtain a sql or 
pl/sql that can arrange the rows as described above. Any ideas? Thanks. 


Followup   August 3, 2005 - 10am Central time zone:

I cannot tell you how much I object to this model.  

storing "week" and "year" - UGH.

storing them in STRINGS - UGH UGH UGH.

storing things that should be cross record in record UGH to the power of 10.

I had to fix your inserts, they did not work, added day14 of zero.


ops$tkyte@ORA10G> with dates as
  2  (select to_date( '05/07/2005','mm/dd/yyyy')+level-1 dt, level-1 l from dual connect by level <= 15 )
  3  select empid, dt,
  4         case when l = 0 then day0
  5                  when l = 1 then day1
  6                  when l = 2 then day2
  7                          /* ... */
  8                  when l = 13 then day13
  9                  when l = 14 then day14
 10                  end data
 11    from (select * from emp1 where week = 20), dates
 12  /
 
EMP DT              DATA
--- --------- ----------
100 07-MAY-05          8
100 08-MAY-05          8
100 09-MAY-05          0
100 10-MAY-05
100 11-MAY-05
100 12-MAY-05
100 13-MAY-05
100 14-MAY-05
100 15-MAY-05
100 16-MAY-05
100 17-MAY-05
100 18-MAY-05
100 19-MAY-05
100 20-MAY-05          8
100 21-MAY-05          0
 
15 rows selected.


 

5 stars   August 3, 2005 - 3pm Central time zone
Reviewer: A reader 
Tom,

Thanks for the solution. I need some more help if you don't mind. The sql works excellently and I 
experimented with it. 

However, this question is based on a change of design here ... The emp1 table is joined with trn1 
table (empid ~ trnid) to obtain values x and y. x and y should be passed to a function that returns 
date. 

The emp1 table is like:

EmpId  Day0 Day1  .....  Day14

100    8   8        8
200    0    0        8    
300    8    8        0    
400    0   8        8
500    8   0        8

trn1 table is like:

trnid x y
100   3 18
200   4 19
300   5 20
400   6 21
500   7 22 

etc ...



create table emp1(empid varchar2(3),  day0 number(2), day1 number(2), day2 number(2), day3 
number(2), day4 number(2), day5 number(2), day6 number(2), day7 number(2), day8 number(2), day9 
number(2), day10 number(2), day11 number(2), day12 number(2), day13 number(2), day14 number(2));

insert into emp1 values('100', 8, 8, 0, 8, 0, 8, 0, 0, 8, 8, 8, 8, 0, 8, 8);
insert into emp1 values('200', 0, 8, 0, 8, 0, 8, 0, 0, 8, 8, 8, 8, 0, 8, 0);
insert into emp1 values('300', 8, 8, 0, 8, 0, 8, 0, 0, 8, 8, 8, 8, 0, 0, 0);
insert into emp1 values('400', 0, 8, 0, 8, 0, 8, 0, 0, 8, 8, 8, 8, 0, 8, 8);
insert into emp1 values('500', 8, 0, 0, 8, 0, 8, 0, 0, 8, 8, 8, 8, 0, 8, 8);



create table trn1(empid varchar2(3), x number(2), y number(2));

insert into trn1 values('100', 3, 18);
insert into trn1 values('200', 4, 19);
insert into trn1 values('300', 5, 20);
insert into trn1 values('400', 6, 21);
insert into trn1 values('500', 7, 22);



I used this function on just one row of emp1 (by hard coding x and y values).

I replaced 

with dates as
    (select to_date( '05/07/2005','mm/dd/yyyy')+level-1 dt, level-1 l from dual 
connect by level <= 15 )

with 

with dates as
    (select getXYDate(x,y)+level-1 dt, level-1 l from dual 
connect by level <= 15 )

However, I am trying to implement this on every row of emp1 by obtaining x and y from trn. There is 
no week or year in emp1 table. Any help? Thanks again.

 


Followup   August 3, 2005 - 6pm Central time zone:

I didn't think it was possible, but now I like this even less than before!  didn't think you could 
do that ;(


ops$tkyte@ORA10G> with dates as
  2  (select to_date( '05/07/2005','mm/dd/yyyy')+level-1 dt, level-1 l from dual 
connect by level <= 15 )
  3  select empid, dt,
  4         case when l = 0 then day0
  5                  when l = 1 then day1
  6                  when l = 2 then day2
  7                          /* ... */
  8                  when l = 13 then day13
  9                  when l = 14 then day14
 10                  end data
 11    from ( QUERY ), dates
 12  /

replace query with a join of emp with trn and apply the function in there. 

5 stars   August 3, 2005 - 7pm Central time zone
Reviewer: A reader 
Tom,

Sorry to bother you again. In my case, I think 
(select to_date( '05/07/2005','mm/dd/yyyy') will not help me anymore because I have to basically 
find dates for Day0 .. Day14 of every row in emp1 table. The first date (date that corresponds to 
Day0) for each record should be obtained using a function by passing X and Y values of trn table .. 
Because each record may have different x, y values.
If it's not achievable using this way, can you suggest an alternate approach. I am trying to make a 
function that would use a loop. Also, the data should be written to a text file once complete, in 
that case I think a procedure might help and if so, could you throw some light? Thanks for your 
patience. 


Followup   August 3, 2005 - 8pm Central time zone:

well, you just need to generate a set of 15 numbers (L)

and add them in later, then.  No big change.  You have the "start_date" from the function right -- 
just add L to dt. 

5 stars   August 3, 2005 - 8pm Central time zone
Reviewer: A reader 
Ok, Can you please show that if possible? 


5 stars   August 3, 2005 - 9pm Central time zone
Reviewer: A reader 
Tom,

I tried this and am getting an error: ORA-00904: "DAY13": invalid identifier

WITH DATES AS 
(SELECT  FUNC_XY(17,2003)+level-1 dt, level-1 l FROM DUAL
 connect by level <= 15)
 select empid, day0, day14, x, y, dt,
    case when l = 0 then day0
         when l = 1 then day1
         when l = 2 then day2
         when l = 3 then day3
         when l = 4 then day4
         when l = 5 then day5
         when l = 6 then day6
         when l = 7 then day7
         when l = 8 then day8
         when l = 9 then day9
         when l = 10 then day10
         when l = 11 then day11
         when l = 12 then day12
         when l = 13 then day13
         when l = 14 then day14
         end data
  from (select emp1.empid, day0, day14, x, y from emp1, trn1 where emp1.empid = trn1.empid), dates
/

As said before ... I also have to use x and y instead of 17 and 2003 in order to compute it for 
every row.
 


Followup   August 4, 2005 - 8am Central time zone:

yeah, well -- you didn't select it out in the inline view.  fix that.


look the concept is thus:


with some_rows as ( select level-1 l from dual connect by level <= 15 )
select a.empid, a.dt+l, case when l=0 then a.day0
                           ...
                           when l=14 then a.day14
                        end data
  from some_rows,
      (select emp1.empid, func_xy(trn1.x, trn1.y) dt,
              emp1.day0, emp1.day1, .... <ALL OF THE DAYS>, emp1.day14
         from emp1, trn1
        where emp1.empid = trn1.empid )
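
Spelled out in full, the concept might look like the sketch below. It assumes func_xy(x, y) is 
the reader's own function and returns a DATE (the date for Day0 of that row), and that trn1 is 
joined on empid, the column name used in the CREATE TABLE posted earlier. The aliases work_date 
and hours are illustrative names only:

```sql
-- Sketch only. func_xy is the reader's function (assumed to return a DATE);
-- some_rows supplies the offsets 0..14, one per DayN column.
with some_rows as ( select level-1 l from dual connect by level <= 15 )
select a.empid,
       a.dt + s.l work_date,
       case s.l
            when  0 then a.day0
            when  1 then a.day1
            when  2 then a.day2
            when  3 then a.day3
            when  4 then a.day4
            when  5 then a.day5
            when  6 then a.day6
            when  7 then a.day7
            when  8 then a.day8
            when  9 then a.day9
            when 10 then a.day10
            when 11 then a.day11
            when 12 then a.day12
            when 13 then a.day13
            when 14 then a.day14
       end hours
  from some_rows s,
       (select emp1.empid, func_xy(trn1.x, trn1.y) dt,
               emp1.day0,  emp1.day1,  emp1.day2,  emp1.day3,  emp1.day4,
               emp1.day5,  emp1.day6,  emp1.day7,  emp1.day8,  emp1.day9,
               emp1.day10, emp1.day11, emp1.day12, emp1.day13, emp1.day14
          from emp1, trn1
         where emp1.empid = trn1.empid) a
 order by a.empid, s.l;
```

The join to some_rows has no join condition on purpose: every employee row is repeated 15 times, 
once per offset, and the CASE picks the matching DayN column for each copy.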

 

5 stars   August 4, 2005 - 9am Central time zone
Reviewer: A reader 
Tom,

Here, the sql is using a.empid, a.dt+l ... 

whereas the inner sql is using emp1.day0, trn1.empid , etc ... My real inner sql uses some 
more columns and joins as well. When this gave me an error, I just substituted emp1.day0, emp1.day14 
etc ... with day0, day14 etc .. and it worked. However, when there are several joins with alias 
names, how should it be done? 

To make it a bit clear, this sql looks similar to:

select emp1.empid, emp1.day0 from some_rows, (select emp1.empid, emp1.day0) ...

Any idea how to select from select and still use multiple joins etc ... Hope I am clear 


Followup   August 4, 2005 - 9am Central time zone:

you can join as much as you WANT in the inline views.  

Sorry, I cannot go further with this one, I've shown the technique -- it is just a pivot to turn 
COLUMNS THAT SHOULD HAVE BEEN ROWS into rows -- very common. 

5 stars   August 4, 2005 - 9am Central time zone
Reviewer: A reader 
Please ignore above post. 


5 stars I need some help   August 9, 2005 - 10am Central time zone
Reviewer: Carlos 
Insert into LOU_DATE
   (IN_DATE, OUT_DATE)
 Values
   (TO_DATE('11/15/2004 17:42:56', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('11/18/2004 15:09:19', 
'MM/DD/YYYY HH24:MI:SS'));
Insert into LOU_DATE
   (IN_DATE, OUT_DATE)
 Values
   (TO_DATE('11/24/2004 09:38:15', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('11/30/2004 04:28:09', 
'MM/DD/YYYY HH24:MI:SS'));
Insert into LOU_DATE
   (IN_DATE, OUT_DATE)
 Values
   (TO_DATE('01/03/2005 14:36:24', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('01/05/2005 10:04:15', 
'MM/DD/YYYY HH24:MI:SS'));
Insert into LOU_DATE
   (IN_DATE, OUT_DATE)
 Values
   (TO_DATE('01/07/2005 08:54:59', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('01/10/2005 10:54:07', 
'MM/DD/YYYY HH24:MI:SS'));
Insert into LOU_DATE
   (IN_DATE, OUT_DATE)
 Values
   (TO_DATE('01/12/2005 10:13:13', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('01/18/2005 04:23:41', 
'MM/DD/YYYY HH24:MI:SS'));
Insert into LOU_DATE
   (IN_DATE, OUT_DATE)
 Values
   (TO_DATE('03/03/2005 03:15:05', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('03/09/2005 18:54:11', 
'MM/DD/YYYY HH24:MI:SS'));
Insert into LOU_DATE
   (IN_DATE, OUT_DATE)
 Values
   (TO_DATE('03/11/2005 13:25:40', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('03/15/2005 21:47:41', 
'MM/DD/YYYY HH24:MI:SS'));
Insert into LOU_DATE
   (IN_DATE, OUT_DATE)
 Values
   (TO_DATE('03/22/2005 20:27:03', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('03/29/2005 17:05:04', 
'MM/DD/YYYY HH24:MI:SS'));
Insert into LOU_DATE
   (IN_DATE, OUT_DATE)
 Values
   (TO_DATE('03/22/2005 20:27:15', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('03/30/2005 08:53:13', 
'MM/DD/YYYY HH24:MI:SS'));
Insert into LOU_DATE
   (IN_DATE, OUT_DATE)
 Values
   (TO_DATE('03/30/2005 13:16:00', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('04/16/2005 13:40:44', 
'MM/DD/YYYY HH24:MI:SS'));
Insert into LOU_DATE
   (IN_DATE, OUT_DATE)
 Values
   (TO_DATE('03/30/2005 15:08:39', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('04/16/2005 13:40:44', 
'MM/DD/YYYY HH24:MI:SS'));
COMMIT;


Tom, 

I hope you can help since I have been struggling with this report.  I would like to get something 
like this...


IN OTHER WORDS  I WANT TO GET WHEN IT FIRST WAS LOGED  IN INDATE AND WHEN IT WAS LAST LOGed IN 
OUT_DATE. SORT OF LIKE MIN AND MAX. In this case for example for the month of March, however it can 
be for any given month.  Any ideas how I can accomplish that? 

IN_DATE               OUT_DATE
3/22/2005 8:27:03 PM        3/30/2005 3:08:39 PM 

  ----from the table above for the month of March 


Followup   August 9, 2005 - 10am Central time zone:

insufficient detail here, why won't min/max work for you for example.

but I don't understand the logic behind the two values you say you want, I don't get how you 
arrived at them. 

5 stars This is what I get   August 9, 2005 - 10am Central time zone
Reviewer: A reader 
 select in_date, out_date
from lou_date 
where id = 201048
and ((out_date between to_date('01-MAR-05 00:00:00', 'DD-MON-RR HH24:MI:SS')
and to_date('31-MAR-05 23:59:59', 'DD-MON-RR HH24:MI:SS')) OR
(in_date between to_date('01-MAR-05 00:00:00', 'DD-MON-RR HH24:MI:SS')
and to_date('31-MAR-05 23:59:59', 'DD-MON-RR HH24:MI:SS'))) 

I get the following:

In_date                  out_date

3/22/2005 8:27:03 PM  3/29/2005 5:05:04 PM
3/30/2005 3:08:39 PM  4/16/2005 1:40:44 PM 


Followup   August 9, 2005 - 11am Central time zone:

ok, 

Insert into LOU_DATE
   (IN_DATE, OUT_DATE)
 Values
   (TO_DATE('03/11/2005 13:25:40', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('03/15/2005 
21:47:41', 'MM/DD/YYYY HH24:MI:SS'));

why didn't you get that row.  for example. 

4 stars   August 9, 2005 - 11am Central time zone
Reviewer: A reader 
SQL Statement which produced this data:
  select in_date, out_date 
  from lou_date 
  where ((out_date between to_date('01-MAR-05 00:00:00', 'DD-MON-RR HH24:MI:SS') 
  and to_date('31-MAR-05 23:59:59', 'DD-MON-RR HH24:MI:SS')) OR 
  (in_date between to_date('01-MAR-05 00:00:00', 'DD-MON-RR HH24:MI:SS') 
  and to_date('31-MAR-05 23:59:59', 'DD-MON-RR HH24:MI:SS'))) 
  order by out_date

3/3/2005 3:15:05 AM    3/9/2005 6:54:11 PM
3/11/2005 1:25:40 PM    3/15/2005 9:47:41 PM
3/11/2005 1:25:40 PM    3/15/2005 9:47:41 PM
3/22/2005 8:27:03 PM    3/29/2005 5:05:04 PM
3/22/2005 8:27:15 PM    3/30/2005 8:53:13 AM
3/30/2005 1:16:00 PM    4/16/2005 1:40:44 PM
3/30/2005 3:08:39 PM    4/16/2005 1:40:44 PM

I guess my question is that when I get records
that run beyond March, the date should be replaced
with blank or NULL...since I can't charge him/her
for April...
  


Followup   August 9, 2005 - 12pm Central time zone:

I am so not following you here. 

5 stars   August 9, 2005 - 12pm Central time zone
Reviewer: A reader 
Tom,

Pretend that you are charging someone for a particular month. Let's say the month of March. So you 
would like to do a query that reflects just that..so a group of dates is given to you and in that 
group of dates you have multiple records with the same id. Also some records 
were initiated in March but came back in April. Here are the examples..but it can work with any 
dates...

example 1.

   in_date          out_date
 3/22/2005 8:27:15 PM    3/30/2005 8:53:13 AM
 3/30/2005 1:16:00 PM    4/16/2005 1:40:44 PM
 

would like to see:
   in_date          out_date
 3/22/2005 8:27:15 PM     3/30/2005 1:16:00 PM  
  
 example 2
 
In_date               out_date
 3/3/2005 3:15:05 AM  3/9/2005 6:54:11 PM
 3/11/2005 1:25:40 PM 3/15/2005 9:47:41 PM

would like to see:

In_date               out_date
3/3/2005 3:15:05 AM   3/15/2005 9:47:41 PM 


Followup   August 9, 2005 - 12pm Central time zone:

begs the question


in_date         out_date    
20-feb-2005     15-apr-2005

or

   in_date          out_date
 3/22/2005 8:27:15 PM    3/25/2005 8:53:13 AM
 3/30/2005 1:16:00 PM    4/16/2005 1:40:44 PM

what then.  Being able to clearly specify the "goal" or the "algorithm" usually leads us straight to 
the query itself.  There are so many ambiguities here.  Pretend you were actually documenting this 
for a junior programmer to program.  Give them the specifications.  In gory detail.

please don't just answer these two what thens -- think of all of the cases (cause I'll just keep on 
coming back with "what then" if you don't)

Remember -- I know NOTHING about your data, not a thing.  This progression from 

... I WANT TO GET WHEN IT FIRST WAS LOGED  IN INDATE AND WHEN IT WAS 
LAST LOGed IN OUT_DATE. SORT OF LIKE MIN AND MAX....

to this has been 'strange' to say the least. 

5 stars Full explanation of requirements   August 9, 2005 - 3pm Central time zone
Reviewer: A reader 
Sorry for the misunderstanding Tom. Here are the full requirements. I hope I can explain it this 
time.

The report is a billing report and it goes as follows:
For example, for the month of March we have to bill 
in the following way:

out_date    date_in     Bill

2/23        3/2         3/1 to 3/2
3/1         3/3         3/1 to 3/3
3/1         4/14        3/1 to 3/31
3/1         -           3/1 to 3/31
2/23        -           3/1 to 3/31 


Followup   August 9, 2005 - 3pm Central time zone:

well, i hope you give your programmers more detail.  Here is the best I'll do

ops$tkyte@ORA9IR1> select t.*,
  2         greatest( in_date, to_date('mar-2005','mon-yyyy') ) fixed_in_date,
  3         least( nvl(out_date,to_date('3000','yyyy')),  last_day( to_date( 'mar-2005', 'mon-yyyy' ) ) ) fixed_out_date
  4    from t
  5   where in_date < last_day( to_date( 'mar-2005', 'mon-yyyy' ) )+1
  6     and out_date >= to_date( 'mar-2005', 'mon-yyyy' );

 
IN_DATE   OUT_DATE  FIXED_IN_ FIXED_OUT
--------- --------- --------- ---------
03-MAR-05 09-MAR-05 03-MAR-05 09-MAR-05
11-MAR-05 15-MAR-05 11-MAR-05 15-MAR-05
22-MAR-05 29-MAR-05 22-MAR-05 29-MAR-05
22-MAR-05 30-MAR-05 22-MAR-05 30-MAR-05
30-MAR-05 16-APR-05 30-MAR-05 31-MAR-05
30-MAR-05 16-APR-05 30-MAR-05 31-MAR-05
 
6 rows selected.


predicate finds records that overlap march.

select adjusts the begin/end dates. 
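
A variation on the same idea, offered only as a sketch: it assumes the same table t(in_date, 
out_date), treats a NULL out_date as "still open", and clips to the first instant of the next 
month rather than to last_day(), so that times during the last day of the month are kept. The 
inline view m and its column names are illustrative, not part of the original answer:

```sql
-- Sketch only: clip each [in_date, out_date] interval to one billing month.
-- m holds the month boundaries; next_mon is an exclusive upper bound.
with m as ( select date '2005-03-01'                 mon_start,
                   add_months( date '2005-03-01', 1 ) next_mon
              from dual )
select t.in_date,
       t.out_date,
       greatest( t.in_date, m.mon_start )                 fixed_in_date,
       least( nvl( t.out_date, m.next_mon ), m.next_mon ) fixed_out_date
  from t, m
 where t.in_date < m.next_mon                               -- started before the month ended
   and ( t.out_date is null or t.out_date >= m.mon_start ); -- and not finished before it began
```

The two predicates are the usual interval overlap test; the extra "is null" branch is what lets 
open-ended rows through, which a plain out_date >= comparison would filter out.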

5 stars Thank!!!   August 10, 2005 - 12pm Central time zone
Reviewer: A reader 
Tom,

One more request. I would like to start the report with
the first time it went out. That is to say...

how it looks now with your help...

  fix_in                fix_out
3/22/2005 8:27:03 PM    3/29/2005 5:05:04 PM
3/30/2005 3:08:39 PM    3/31/2005


how the data looks


  fix_in            fix_out
3/22/2005 8:27:03 PM    3/29/2005 5:05:04 PM---first went out
3/30/2005 3:08:39 PM    4/16/2005 1:40:44 PM

How  I would like to see it since we begin billing from 
the first date the truck went out.

  fix_in         fix_out
3/29/2005 5:05:04 PM    3/30/2005 3:08:39 PM
3/30/2005 3:08:39 PM    3/31/2005

Thanks again Tom 


Followup   August 10, 2005 - 1pm Central time zone:

try to work it out yourself -- please.  

why?  because I'll do this little thing and it'll be "oh yeah, one more thing, when the data looks 
like this...."

specifying requirements is like the most important thing in the world -- it is key, it is crucial.  
It is obvious you know what you want (well, maybe -- it seems to change over time) but I don't "get 
it" myself.  Your simple example here with two rows begs so so many questions, I don't even want to 
get started.


You have lag() and lead() at your disposal, they probably come into play here.  check them out. 

5 stars Thanks for help !   August 11, 2005 - 3pm Central time zone
Reviewer: A reader 
The report is kind of tricky. Especially when one of the dates originates in Feb. and the other pair 
falls in march.
 


5 stars Hooked on Analytics worked for me!!   August 22, 2005 - 11am Central time zone
Reviewer: Greg from Toronto
I think I need to find a meeting group to help with my addiction ... I think I'm addicted to 
analytics .. :\

Finally got a chance to read chapter 12 in "Expert Oracle" ... awesome!!  4 big, hairy Thumbs up!! 
heh

But I got a question ... an "odd" behaviour that I don't understand ... was wondering if you could 
help explain:

Test Script:
================
drop table junk2;
drop sequence seq_junk2;

create sequence seq_junk2;

create table junk2
  (inv_num   number,
   cli_num   number,
   user_id   number)
/

insert into junk2
  values ( 123, 456, null );
insert into junk2
  values ( 123, 678, null );
insert into junk2
  values ( 234, 456, null );
insert into junk2
  values ( 234, 678, null );

commit;

break on cli_num skip 1

select * from junk2;

select inv_num, cli_num,
       NVL ( user_id, 999 ) chk1,
       NVL2 ( user_id, 'NOT NULL', 'NULL' ) chk2,
       seq_junk2.nextval seq,
       FIRST_VALUE ( NVL ( user_id, seq_junk2.nextval ) )
                    OVER ( PARTITION BY cli_num ) user_id
  from junk2
/
=====================

The final query shows this:


   INV_NUM    CLI_NUM       CHK1 CHK2            SEQ    USER_ID
---------- ---------- ---------- -------- ---------- ----------
       123        456        999 NULL              1
       234                   999 NULL              2

       123        678        999 NULL              3          2
       234                   999 NULL              4          2


4 rows selected.

and I'm kinda confused .. it appears that the analytic functions are not "processing" that sequence 
... how do sequences and analytics work together?? (if at all??)

(In short, this is a simplified example of a bigger problem I tripped over.  I'm trying to assign 
new user_ids for existing clients, but only want 1 user_id assigned per client.  Trick is, each 
client can be associated with more than 1 investment ... so I have multiple rows with same client, but I 
want the same user_id assigned. kind of: "Has this client got an id yet? if not, give him a new 
one, otherwise display the one he's already been assigned".)

FIRST_VALUE and LAST_VALUE seemed the logical choice ... 
The interesting thing is, when I use DBMS_RANDOM.VALUE (to assign a random PIN to start with) ... 
it works fine, what am I missing/forgetting about sequences that changes their behaviour in this 
regards?)
 


Followup   August 23, 2005 - 8am Central time zone:

that will be a tricky one, lots of assumptions on orders of rows processed and such.

that should throw an ora-2287 in my opinion.

I cannot see a safe way to do that without writing a plsql function and performing a lookup off to 
the side by cli_num 

4 stars Sorry, I don't understand ...   August 23, 2005 - 11am Central time zone
Reviewer: Greg from Toronto
you wrote:
"that will be a tricky one, lots of assumptions on orders of rows processed and such."

I don't understand what assumptions I'm making ... in my example, I just got 4 rows, I don't care 
what order they come back in, just so long as it deals with them in "groups of cli_nums" .. (hence 
the partition by cli_num portion) ... if I "lose" sequence numbers, that's fine, too ... I don't 
care about gaps in the sequence or "missing userids" ... 

The only behaviour I'm seeing, is that the analytic function doesn't seem to be working with the 
sequence properly ... 

I guess I can simplify the question even further:

Why does the following query return "NULL" ?

SQL > select first_value ( seq_junk2.nextval ) over ( )
  2  from dual
  3  /
------more------

FIRST_VALUE(SEQ_JUNK2.NEXTVAL)OVER()
------------------------------------


1 row selected.

(with a "normal" sequence - nothing fancy):

SQL > select seq_junk2.nextval from dual;
------more------

   NEXTVAL
----------
        29

1 row selected.
 


Followup   August 24, 2005 - 8am Central time zone:

as i said, i believe it should be raising an error (I have it on my list of things to file when I 
get back in town).

I cannot make it work, I cannot think of a way to do it in a single statement, short of writing a 
user defined function. 

4 stars Connect by with self referenced parent   August 23, 2005 - 12pm Central time zone
Reviewer: Joe from Reston, VA
CONNECT BY works great but I've run into a problem when the ultimate parent is referenced in the 
parent record.  e.g., data looks like:
SQL> select * from t;

    OBJ_ID  PARENT_ID
---------- ----------
         1          1
         2          1
         3          1
         4          2
         5          4

But... using connect by generates an error..

SQL> select lpad(' ', 2*(level-1)) ||level "LEVEL",t.obj_id, t.parent_id
  2  from t
  3  connect by t.parent_id = prior t.obj_id;
ERROR:
ORA-01436: CONNECT BY loop in user data

If parent_id is null where obj_id = 1, then it's okay.  Any suggestion on how to handle the other 
case?  I'm stumped.
 


5 stars Solution for connect by   August 23, 2005 - 5pm Central time zone
Reviewer: Logan Palanisamy from Sunnyvale, CA USA
SQL> select lpad(' ', 2*(level-1)) ||level "LEVEL",t.obj_id, t.parent_id
  2  from t
  3  connect by t.parent_id = prior t.obj_id and t.parent_id <> t.obj_id;

LEVEL                    OBJ_ID  PARENT_ID
-------------------- ---------- ----------
1                             1          1
  2                           2          1
    3                         4          2
      4                       5          4
  2                           3          1
1                             2          1
  2                           4          2
    3                         5          4
1                             3          1
1                             4          2
  2                           5          4
1                             5          4

12 rows selected.
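
The 12 rows above come from every row being treated as a root, since there is no START WITH. If a 
single tree rooted at the self-referencing row is what is wanted, a START WITH can be added. A 
minimal sketch against the same table t, assuming the ultimate parent is the only row where 
obj_id = parent_id:

```sql
-- Sketch only: begin at the self-referencing row and never revisit it as a child.
select lpad(' ', 2*(level-1)) || level "LEVEL", t.obj_id, t.parent_id
  from t
 start with t.obj_id = t.parent_id
connect by t.parent_id = prior t.obj_id
       and t.parent_id <> t.obj_id;
```

With the five sample rows this should return each row once, as one hierarchy under obj_id 1.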
 


5 stars re:Solution for connect by   August 24, 2005 - 8am Central time zone
Reviewer: Joe from Reston, VA
Thanks Logan.  Often the solution is so simple!  Thanks. 


5 stars Seq problem   August 24, 2005 - 11am Central time zone
Reviewer: Bob B from Albany, NY
SELECT
  A.*,
  seq_junk2.currval CURR_SEQ,
  seq_junk2.nextval - ROWNUM + VAL SEQ
FROM (
SELECT 
  inv_num, 
  cli_num,
  NVL ( user_id, 999 ) chk1,
  NVL2 ( user_id, 'NOT NULL', 'NULL' ) chk2,
  DENSE_RANK() OVER ( ORDER BY CLI_NUM ) VAL
FROM JUNK2
) A

Might be a starting point.  It works on the following ASSUMPTION: ROWNUM corresponds to the number 
of times the sequence has been called.  As Tom stated, this assumption can easily go out the window 
(throw an analytic function or an order by on the outer query for a simple example).

A safer solution might be to run two updates.  Update 1 will give a unique id to each null user id. 
 Update 2 will update the user id to the min or max user id for that cli_num.  A little overhead, 
but safer and simpler than the aforementioned alternative. 
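
The two-update idea could look roughly like this. It is only a sketch against the junk2 table and 
seq_junk2 sequence from the test script, and it assumes min() is an acceptable way to pick the 
surviving id:

```sql
-- Step 1: give every row that has no user_id yet its own sequence value.
update junk2
   set user_id = seq_junk2.nextval
 where user_id is null;

-- Step 2: collapse to a single user_id per client by keeping the smallest one.
update junk2 j
   set user_id = ( select min(j2.user_id)
                     from junk2 j2
                    where j2.cli_num = j.cli_num );

commit;
```

One caveat: if a client already has a user_id on some rows and NULL on others, min() may pick 
either the old or the newly generated value, so step 1 might need to skip clients that already have 
an id. Sequence values discarded in step 2 simply become gaps.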


5 stars Still confused ... but working on it ...   August 24, 2005 - 1pm Central time zone
Reviewer: Greg from Toronto
Thanks, Bob!!  Yeah, that does exactly what I wanted it to do, (but still doesn't really explain 
the "why" part) ... 

problem is, it looks like this is more a question on sequences now than analytics, so I'll see if I 
can find a more appropriate thread to continue this on ..

Thanks!!
 


4 stars A slight twist on lag/lead   September 1, 2005 - 11am Central time zone
Reviewer: Sudha Bhagavatula from Buffalo, NY
That was useful to me. Could do a lot of queries easily. However I'm stuck at this point.

I have data like this:

subr_id    dep_nbr    grp    eff_date     term_date
1001       001        2112   01/01/2000   12/31/2000
1001       001        2112   01/01/2001   06/30/2001
1001       001        2112   07/01/2001   12/31/2001
1001       001        7552   01/01/2003   12/31/2003
1001       001        2112   06/30/2004   12/31/9999

I want my output to look like this:

subr_id    dep_nbr    grp    eff_date     term_date
1001       001        2112   01/01/2000   12/31/2001
1001       001        7552   01/01/2003   12/31/2003
1001       001        2112   06/30/2004   12/31/9999

How do I achieve this ?
 


Followup   September 1, 2005 - 3pm Central time zone:

well, you should start by describing the logic for getting from A to B first.

otherwise it is just text.  what are the rules that got you from inputs to outputs.

tell me the procedural algorithm you would use for example. 

3 stars Rules from A to B   September 2, 2005 - 9am Central time zone
Reviewer: Sudha Bhagavatula from Buffalo, NY
A member is enrolled in a group for a timeframe. For all contiguous time frames for a group I can 
take the min(eff_date) and max(term_date). For each break in group a new row with min(eff_date) and 
max(term_date) again. So say a member was enrolled in a group from 01/01/2001 to 12/31/2001 and 
then again with the same group from 01/01/2005 to 06/30/2005 then I need 2 rows for this member
with the dates as said just now. This is the sql that I'm running, hopefully I'm on the right track 
but am stuck at this point:

SELECT   SUBR_ID, 
         DEP_NBR,
         GRP,
         LAG_EFF_DATE,
         LEAD_EFF_DATE,
         EFF_DATE,
         TERM_DATE,
         LAG_TERM_DATE,
         LEAD_TERM_DATE,
         DECODE( LEAD_GRP, GRP, 1, 0 ) FIRST_OF_SET,
         DECODE( LAG_GRP, GRP, 1, 0 ) LAST_OF_SET
  FROM   (SELECT   M.SUBR_ID,
                   M.DEP_NBR,
                   LAG(GRP_NBR||SUB_GRP) OVER (PARTITION BY M.SUBR_ID, M.DEP_NBR
                                               ORDER BY CJ.EFF_DATE) LAG_GRP,
                   LEAD(GRP_NBR||SUB_GRP) OVER (PARTITION BY M.SUBR_ID, M.DEP_NBR
                                                ORDER BY CJ.EFF_DATE) LEAD_GRP,
                   GRP_NBR||SUB_GRP GRP,
                   CJ.EFF_DATE,
                   CJ.TERM_DATE,
                   LAG(CJ.EFF_DATE) OVER (PARTITION BY M.SUBR_ID, M.DEP_NBR
                                          ORDER BY CJ.EFF_DATE) LAG_EFF_DATE,
                   LEAD(CJ.EFF_DATE) OVER (PARTITION BY M.SUBR_ID, M.DEP_NBR
                                           ORDER BY CJ.EFF_DATE) LEAD_EFF_DATE,
                   LAG(CJ.TERM_DATE) OVER (PARTITION BY M.SUBR_ID, M.DEP_NBR
                                           ORDER BY CJ.EFF_DATE) LAG_TERM_DATE,
                   LEAD(CJ.TERM_DATE) OVER (PARTITION BY M.SUBR_ID, M.DEP_NBR
                                            ORDER BY CJ.EFF_DATE) LEAD_TERM_DATE
            FROM   DW.T_MEMBER_GROUP_JUNCTION CJ,
                   BCBS.T_GROUP_DIMENSION G,
                   BCBS.T_MEMBER_DIMENSION M 
           WHERE   CJ.GRP_DIM_ID = G.GRP_DIM_ID 
             AND   CJ.MBR_DIM_ID = M.MBR_DIM_ID         
             AND   M.DEP_NBR != '000' 
             AND   G.BENE_PKG IS NOT NULL)
 WHERE   LAG_GRP IS NULL 
    OR   LEAD_GRP IS NULL 
    OR   LEAD_GRP <> GRP 
    OR   LAG_GRP <> GRP 

Thanks for your reply. 


Followup   September 3, 2005 - 7am Central time zone:

you know, without a table, rows and something more concrete....  I have no comment. 

2 stars More detail   September 4, 2005 - 10pm Central time zone
Reviewer: Sudha Bhagavatula from Buffalo, NY
I have 3 tables:

Member_dimension
Group_Dimension
Member_Group_Junction

Member_Dimension :- columns are mbr_dim_id, subr_id, dep_nbr

Group dimension :- columns are grp_dim_id, grp_nbr, sub_grp

Member_Group_Junction :- columns are mbr_dim_id, grp_dim_id, eff_date, term_date

I have to create one row for each contiguous period of enrollment, with a new row for each new 
group or break in the dates.

Suppose a member (subr_id = 1001, dep_nbr = 001) is enrolled with a group called 001 from 
01/01/2001 till 06/30/2001, and then changes to group 002 for the period 07/01/2001 till 12/31/2001. 
He re-enrolls with the same group 002 from 01/01/2002 till 06/30/2002 with a change in benefits. He 
then gets transferred to some other city or changes jobs. He rejoins group 001 from 
09/30/2003 till 11/30/2003 and quits again, then rejoins the same group 001 from 01/01/2004 till 
the present. The data in the junction table will look like this:


mbr_dim_id    grp_dim_id   eff_date      term_date

1             1            01/01/2001    06/30/2001
1             2            07/01/2001    12/31/2001
1             2            01/01/2002    06/30/2002
1             1            09/30/2003    11/30/2003
1             1            01/01/2004    12/31/9999  

My output should be like this:

mbr_dim_id    grp_dim_id   eff_date      term_date

1             1            01/01/2001    06/30/2001
1             2            07/01/2001    06/30/2002
1             1            09/30/2003    11/30/2003
1             1            01/01/2004    12/31/9999  

For each change in group or a break in the contiguity of the dates I should get a new row. The 
junction table is joined to the dimension with the respective dim_ids.

Hope I'm clearer this time.

Thanks
Sudha



 


Followup   September 5, 2005 - 10am Central time zone:

tell you what, see
http://www.oracle.com/technology/oramag/oracle/04-mar/o24asktom.html
it shows a technique in the "analytics to the rescue" article that will be useful for grouping 
ranges of records using the LAG() function.



But, you need to read the text that you are supposed to read before putting an example here.

It is something I think I say a lot.


<quote>
If your followup requires a response that might include a query, you had better supply very very 
simple create tables and insert statements. I cannot create a table and populate it for each and 
every question. The SMALLEST create table possible (no tablespaces, no schema names, just like I do 
in my examples for you)
</quote>

that is a direct cut and paste 
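
As a rough sketch of the LAG() technique that article describes, applied to the junction rows posted above: flag each row that starts a new range, then turn the flags into a group number with a running sum. The table name enroll and the aliases new_grp, grp_no, min_eff_date and max_term_date are made up for illustration, and a gap of more than one day between a term_date and the next eff_date is assumed to be a break.

```sql
-- hypothetical stand-in for Member_Group_Junction (name and layout are assumptions)
create table enroll (mbr_dim_id number, grp_dim_id number, eff_date date, term_date date);

insert into enroll values (1, 1, to_date('01/01/2001','mm/dd/yyyy'), to_date('06/30/2001','mm/dd/yyyy'));
insert into enroll values (1, 2, to_date('07/01/2001','mm/dd/yyyy'), to_date('12/31/2001','mm/dd/yyyy'));
insert into enroll values (1, 2, to_date('01/01/2002','mm/dd/yyyy'), to_date('06/30/2002','mm/dd/yyyy'));
insert into enroll values (1, 1, to_date('09/30/2003','mm/dd/yyyy'), to_date('11/30/2003','mm/dd/yyyy'));
insert into enroll values (1, 1, to_date('01/01/2004','mm/dd/yyyy'), to_date('12/31/9999','mm/dd/yyyy'));

select mbr_dim_id, grp_dim_id,
       min(eff_date)  min_eff_date,
       max(term_date) max_term_date
  from (
select mbr_dim_id, grp_dim_id, eff_date, term_date,
       -- running count of range starts = group number for this row
       sum(new_grp) over (partition by mbr_dim_id order by eff_date) grp_no
  from (
select mbr_dim_id, grp_dim_id, eff_date, term_date,
       -- 1 when this row starts a new range: first row, group changed, or dates not contiguous
       case
         when lag(grp_dim_id) over (partition by mbr_dim_id order by eff_date) is null
           or lag(grp_dim_id) over (partition by mbr_dim_id order by eff_date) <> grp_dim_id
           or eff_date - lag(term_date) over (partition by mbr_dim_id order by eff_date) > 1
         then 1
         else 0
       end new_grp
  from enroll
       )
       )
 group by mbr_dim_id, grp_dim_id, grp_no
 order by mbr_dim_id, min_eff_date;
```

With the five sample rows, the two contiguous group 2 rows should collapse into one (07/01/2001 to 06/30/2002) and the other three rows should stay separate, which matches the four rows listed as the desired output.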

4 stars distinct last_value   September 6, 2005 - 4am Central time zone
Reviewer: Putchi from Stockholm, Sweden
When using last_value I am usually only interested in the last value, hence I need a distinct in the 
select to get it. It gives what I want, but it seems that the database has to do the work twice: 
first a window sort and after that a unique sort. Is there any way to avoid the distinct but still 
get only one row per partition key?

create table a (num number(2), var1 varchar2(10), var2 varchar2(10));

insert into a values (1,'a','A');
insert into a values (2,'b','A');
insert into a values (3,'c','A');
insert into a values (1,'a','B');
insert into a values (2,'b','B');
insert into a values (3,'c','B');

commit;

SQL> select distinct
  2         var2
  3        ,last_value(var1) over (partition by var2 order by num
  4                                rows between unbounded preceding and unbounded following) var1
  5  from a;

VAR2       VAR1
---------- ----------
A          c
B          c


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   SORT (UNIQUE)
   2    1     WINDOW (SORT)
   3    2       TABLE ACCESS (FULL) OF 'A'

 


Followup   September 6, 2005 - 8am Central time zone:

nope, analytics are not aggregates, aggregates are not analytics. 


A trick you can use to skip one or the other step is:


ops$tkyte@ORA817DEV> select var2,
  2         substr( max(to_char( num,'fm0000000000') || var1), 11 ) data
  3    from a
  4   group by var2
  5  /

VAR2       DATA
---------- -----------
A          c
B          c
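
Another option, sketched here against the same table a, is the aggregate FIRST/LAST form, KEEP (DENSE_RANK LAST ORDER BY ...), which picks var1 from the row with the highest num per var2 in a single GROUP BY, without encoding num into a string. This is an alternative to the trick above rather than part of the original followup, and it assumes Oracle 9i or later; if several rows tie for the highest num, max() decides among them.

```sql
-- var1 taken from the row(s) with the highest num within each var2
select var2,
       max(var1) keep (dense_rank last order by num) var1
  from a
 group by var2;
```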


 

3 stars Analytics to the rescue   September 6, 2005 - 11am Central time zone
Reviewer: Sudha Bhagavatula from Buffalo, NY
Read that article. Helped me, but now I have another twist.

create table contracts (subr_id varchar2(15), dep_nbr varchar2(3), grp_nbr varchar2(12),
                        eff_date date, term_date date);

insert into contracts values ('1001', '001', '2112', to_date('01/01/2000','mm/dd/yyyy'),
                              to_date('12/31/2000','mm/dd/yyyy'));
insert into contracts values ('1001', '001', '2112', to_date('01/01/2001','mm/dd/yyyy'),
                              to_date('06/30/2001','mm/dd/yyyy'));
insert into contracts values ('1001', '001', '2112', to_date('07/01/2001','mm/dd/yyyy'),
                              to_date('12/31/2001','mm/dd/yyyy'));
insert into contracts values ('1001', '001', '7552', to_date('01/01/2003','mm/dd/yyyy'),
                              to_date('12/31/2003','mm/dd/yyyy'));
insert into contracts values ('1001', '001', '2112', to_date('01/01/2004','mm/dd/yyyy'),
                              to_date('12/31/9999','mm/dd/yyyy'));


I ran this query to identify breaks in groups and dates for the above table:

select subr_id, dep_nbr, grp,
       min_eff_date, 
       max_term_date
  from      
(select subr_id, dep_nbr, grp,
       min(eff_date) min_eff_date, 
       max(term_date) max_term_date
  from 
(select subr_id, dep_nbr, eff_date, term_date, grp,
       max(rn) 
         over(partition by subr_id, dep_nbr order by eff_date) max_rn
  from
(select subr_id, dep_nbr, eff_date, term_date, grp,
       (case
       when eff_date-lag_term_date > 1
            or lag_term_date is null
            or lag_grp_nbr is null
            or lag_grp_nbr <> grp
       then row_num
        end) rn
  from (
select subr_id, dep_nbr, eff_date, term_date, grp_nbr grp,
       lag(term_date) 
          over (partition by subr_id, dep_nbr order by eff_date) lag_term_date,
       lag(grp_nbr) 
          over (partition by subr_id, dep_nbr order by eff_date) lag_grp_nbr,   
       row_number() 
          over (partition by subr_id, dep_nbr order by eff_date) row_num
  from contracts         )))
group by subr_id, dep_nbr, grp, max_rn )      
 order by subr_id, dep_nbr, min_eff_date

This gave me the output as :

subr_id    dep_nbr    grp    eff_date     term_date
1001       001        2112   01/01/2000   12/31/2001
1001       001        7552   01/01/2003   12/31/2003
1001       001        2112   06/30/2004   12/31/9999


I now have another table :

create table contract_pcp_junction (subr_id varchar2(15), dep_nbr varchar2(3), pcp_id varchar2(12),
                                    eff_date date, term_date date);

insert into contract_pcp_junction values ('1001','001','123765', to_date('07/01/2000','mm/dd/yyyy'),
                                          to_date('06/30/2001','mm/dd/yyyy'));
insert into contract_pcp_junction values ('1001','001','155165', to_date('01/01/2003','mm/dd/yyyy'),
                                          to_date('12/31/9999','mm/dd/yyyy'));

This table identifies the provider coverage for each member. I need to identify the breaks in 
coverage with regards to the contracts.

Now as per the data above this member does not have a pcp from 01/01/2000 to 06/30/2000 and again 
from 07/01/2001 to 12/31/2001.

I need to insert the breaks into another table. This table needs to have the subr_id, dep_nbr, grp 
and eff_date, term_date.

create table contract_pcp_breaks (subr_id varchar2(15), dep_nbr varchar2(3), grp_nbr varchar2(12),
                                  eff_date date, term_date date);

This table needs to have the data for the breaks

subr_id    dep_nbr   grp_nbr   eff_date    term_date

1001       001       2112      01/01/2000  06/30/2000
1001       001       2112      07/01/2001  12/31/2001


How do I do that? Hopefully I have given you the necessary scripts to work with.

Thanks a lot for your patience with this.

--Sudha 


Followup   September 6, 2005 - 8pm Central time zone:

yah, I have scripts, but no real idea how these tables relate.  Your query looks overly complex for 
the single table.

cannot you take your data, join it, get some "flat relation" that just simply using lag() on will 
solve the problem?

(please remember, you have been looking at this for hours.  To you this data is natural.  to 
everyone else, it is just bits and bytes on the screen) 

3 stars Combining two tables   September 9, 2005 - 6am Central time zone
Reviewer: Putchi from Stockholm, Sweden
Hi Tom!

I want to combine from/to history values from two tables into one sequence like this:

create table a (a varchar2(2)
               ,from_date  date
               ,to_date    date);

create table b (b varchar2(2)
               ,from_date  date
               ,to_date    date);

insert into a ( a, from_date, to_date ) values ( 
'a1',  to_date( '01/13/2005', 'mm/dd/yyyy'),  to_date('02/10/2005', 'mm/dd/yyyy')); 
insert into a ( a, from_date, to_date ) values ( 
'a2',  to_date( '02/10/2005', 'mm/dd/yyyy'),  to_date( '05/01/2005', 'mm/dd/yyyy')); 
insert into a ( a, from_date, to_date ) values ( 
'a3',  to_date( '05/01/2005', 'mm/dd/yyyy'),  to_date( '08/12/2005', 'mm/dd/yyyy')); 
insert into b ( b, from_date, to_date ) values ( 
'b1',  to_date( '01/13/2005', 'mm/dd/yyyy'),  to_date( '01/22/2005', 'mm/dd/yyyy')); 
insert into b ( b, from_date, to_date ) values ( 
'b2',  to_date( '01/22/2005', 'mm/dd/yyyy'),  to_date( '04/01/2005', 'mm/dd/yyyy')); 
insert into b ( b, from_date, to_date ) values ( 
'b3',  to_date( '04/01/2005', 'mm/dd/yyyy'),  to_date( '09/07/2005', 'mm/dd/yyyy')); 
commit;


select * from ("Magic");

A  B  FROM_DATE  TO_DATE
-- -- ---------- ----------
a1 b1 2005-01-13 2005-01-22
a1 b2 2005-01-22 2005-02-10
a2 b2 2005-02-10 2005-04-01
a2 b3 2005-04-01 2005-05-01
a3 b3 2005-05-01 2005-08-12

Is it possible? 


Followup   September 9, 2005 - 8am Central time zone:

ops$tkyte@ORA10G> select a.* , b.*,
  2         greatest(a.from_date,b.from_date),
  3             least(a.to_date,b.to_date)
  4    from a, b
  5   where a.from_date <=  b.to_date
  6     and a.to_date >= b.from_date;
 
A  FROM_DATE TO_DATE   B  FROM_DATE TO_DATE   GREATEST( LEAST(A.T
-- --------- --------- -- --------- --------- --------- ---------
a1 13-JAN-05 10-FEB-05 b1 13-JAN-05 22-JAN-05 13-JAN-05 22-JAN-05
a1 13-JAN-05 10-FEB-05 b2 22-JAN-05 01-APR-05 22-JAN-05 10-FEB-05
a2 10-FEB-05 01-MAY-05 b2 22-JAN-05 01-APR-05 10-FEB-05 01-APR-05
a2 10-FEB-05 01-MAY-05 b3 01-APR-05 07-SEP-05 01-APR-05 01-MAY-05
a3 01-MAY-05 12-AUG-05 b3 01-APR-05 07-SEP-05 01-MAY-05 12-AUG-05


It won't be blindingly fast on huge things I would guess... 
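
To get just the four columns asked for, the same join can be sketched with column aliases. The strict inequalities are an assumption: they keep two ranges that merely touch at an endpoint (one ends on the date the other starts) from producing a zero-length row. The sample data has no such pair, so the five rows should be the same as above.

```sql
select a.a, b.b,
       greatest(a.from_date, b.from_date) from_date,
       least(a.to_date, b.to_date)        to_date
  from a, b
 where a.from_date < b.to_date      -- a starts before b ends
   and a.to_date   > b.from_date    -- and a ends after b starts
 order by 3, 4;
```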

3 stars   September 9, 2005 - 9am Central time zone
Reviewer: Putchi from Stockholm, Sweden
OK, I will check whether it works; the real tables will have hundreds of thousands of records. I 
tried this myself, but I couldn't come up with something that filled in the "null" values.

SQL> select a,b,from_date,lead(from_date) over (order by from_date)
  2  from (
  3  select a,null b,from_date,to_date from a
  4  union all
  5  select null a,b,from_date,to_date from b
  6  order by from_date
  7  );

A  B  FROM_DATE  LEAD(FROM_
-- -- ---------- ----------
a1    2005-01-13 2005-01-13
   b1 2005-01-13 2005-01-22
   b2 2005-01-22 2005-02-10
a2    2005-02-10 2005-04-01
   b3 2005-04-01 2005-05-01
a3    2005-05-01

 


Followup   September 9, 2005 - 9am Central time zone:

that query won't work -- you need to join. 

5 stars How to get the 1ST row of this distinct value in a single SELECT   September 16, 2005 - 11am Central time zone
Reviewer: Sean Chang from VA,USA
Thank you, Tom.
 
I have been reading about analytic functions for a while, but still
can't figure out a way to select the first row for each distinct
column value in a single SELECT statement. For example, after
running the CREATE and INSERT statements below:


create  table  INV (
   inv# number(7),
   add_time     date ,
   inv_type varchar2(10),
   amount   number(8,2));

insert into inv values(1, sysdate-1, 'CASH', 100);
insert into inv values(1, sysdate, 'VISA', 200);
insert into inv values(1, sysdate+1, 'COD', 100);
insert into inv values(1, sysdate, 'VISA', 200);
insert into inv values(2, sysdate, 'MC', 10);
insert into inv values(3, sysdate-1, 'AMEX', 30);
insert into inv values(3, sysdate, 'CASH', 30);


I can get the first row of distinct INV# this way:

select * from (select a.*,
              rank() over (partition by inv# order by add_time) time_order
               from inv  a)  where time_order=1;

But how can I achieve this with a single SELECT statement?
The reason is that we have lots of tables where we only need
to look at the very first row for each column value, and I
don't want to end up with lots of in-line views in the SELECT
statement.

 


Followup   September 16, 2005 - 1pm Central time zone:

that is a single select.

why not? (on the lots of in-line views).  If you think they are evil - then you wouldn't like my 
code ;)
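
If the goal is only to avoid the extra query level, one possible alternative is the aggregate KEEP (DENSE_RANK FIRST ...) form, sketched below against the INV table above. It is not what the followup recommends, and it assumes Oracle 9i or later. Note one difference from rank(): when several rows tie for the earliest add_time, this returns a single row per inv#, and each column is picked independently among the tied rows, so inv_type and amount could come from different rows.

```sql
-- one row per inv#, columns taken from the row with the earliest add_time
select inv#,
       min(add_time) add_time,
       min(inv_type) keep (dense_rank first order by add_time) inv_type,
       min(amount)   keep (dense_rank first order by add_time) amount
  from inv
 group by inv#;
```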

 

5 stars Is analytical fitting in this situation?   October 3, 2005 - 10am Central time zone
Reviewer: A reader 
select  b.damage_inspection_date,
        b.damage_inspection_by
       ,b.status
       ,NVL(a.cnt,0) CNT

from
     (select aa.damage_inspection_date,
             aa.damage_inspection_by,
             bb.status 
        from (select distinct trunc(gc.damage_inspection_date) damage_inspection_date,
                     gc.damage_inspection_by
                from gate_damages gd, gate_containers gc
               where gd.gate_id = gc.gate_id
             ) aa,
             (select * 
                from (select 'MAJOR' STATUS from dual
                      union all
                      select 'MINOR' STATUS from dual
                      union all
                      select 'TOTAL' STATUS from dual
                     )
             ) bb
     )b,


     ((SELECT  damage_inspection_date,
               damage_inspection_by,
               Status,
               cnt
          FROM (select trunc(c.damage_inspection_date) damage_inspection_date,
                       c.damage_inspection_by,
                       'MAJOR' STATUS,
                       count(distinct c.gate_id) cnt
                  from gate_containers c,
                       gate_damages d
                 where c.gate_id = d.gate_id and
                       d.damage_type_code = 'A'
              group by trunc(c.damage_inspection_date),c.damage_inspection_by
                 
                UNION ALL

                select trunc(g.damage_inspection_date) damage_inspection_date,
                       g.damage_inspection_by,
                       'MINOR' STATUS,
                       count(distinct g.gate_id) cnt
                  from gate_containers g,
                       gate_damages z
                 where g.gate_id = z.gate_id and
                       z.damage_type_code = 'F'
              group by trunc(g.damage_inspection_date),g.damage_inspection_by
             
                UNION ALL

                select  trunc(ab.damage_inspection_date) damage_inspection_date,
                        ab.damage_inspection_by,
                       'TOTAL' STATUS,
                       count(distinct ab.gate_id) cnt
                  from gate_containers ab,
                       gate_damages ac
                 where ab.gate_id = ac.gate_id
              group by trunc(ab.damage_inspection_date),ab.damage_inspection_by
             
               )
               group by damage_inspection_date, damage_inspection_by, status, cnt
               )
     ) a

where b.damage_inspection_by = a.damage_inspection_by(+)
  and b.damage_inspection_date = a.damage_inspection_date(+)
  and b.status = a.status(+); 


Followup   October 3, 2005 - 11am Central time zone:

((SELECT  damage_inspection_date,
               damage_inspection_by,
               Status,
               cnt
          FROM (select trunc(c.damage_inspection_date) damage_inspection_date,
                       c.damage_inspection_by,
                       'MAJOR' STATUS,
                       count(distinct c.gate_id) cnt
                  from gate_containers c,
                       gate_damages d
                 where c.gate_id = d.gate_id and
                       d.damage_type_code = 'A'
              group by trunc(c.damage_inspection_date),c.damage_inspection_by
                 
                UNION ALL

                select trunc(g.damage_inspection_date) damage_inspection_date,
                       g.damage_inspection_by,
                       'MINOR' STATUS,
                       count(distinct g.gate_id) cnt
                  from gate_containers g,
                       gate_damages z
                 where g.gate_id = z.gate_id and
                       z.damage_type_code = 'F'
              group by trunc(g.damage_inspection_date),g.damage_inspection_by
             
                UNION ALL

                select  trunc(ab.damage_inspection_date) damage_inspection_date,
                        ab.damage_inspection_by,
                       'TOTAL' STATUS,
                       count(distinct ab.gate_id) cnt
                  from gate_containers ab,
                       gate_damages ac
                 where ab.gate_id = ac.gate_id
              group by trunc(ab.damage_inspection_date),ab.damage_inspection_by
             
               )


should be a single query without union's - you don't need to make three passes on that data

select ..., count(distinct case when damage_code = 'A' then gate_id),
            count(distinct case when damage_code = 'F' then gate_id end), 
            count(distinct gate_id)
 

5 stars Great!   October 3, 2005 - 4pm Central time zone
Reviewer: A reader 
Tom,

When I put in the changes, it says "missing keyword". What am I doing wrong?

select  b.damage_inspection_date,
        b.damage_inspection_by
       ,b.status
       ,NVL(a.cnt,0) CNT
from
     (select aa.damage_inspection_date,
             aa.damage_inspection_by,
             bb.status 
        from (select distinct trunc(gc.damage_inspection_date) damage_inspection_date,
                     gc.damage_inspection_by
                from gate_damages gd, gate_containers gc
               where gd.gate_id = gc.gate_id
             ) aa,
             (select * 
                from (select 'MAJOR' STATUS from dual
                      union all
                      select 'MINOR' STATUS from dual
                      union all
                      select 'TOTAL' STATUS from dual
                     )
             ) bb
     )b,
     ((SELECT  damage_inspection_date,
               damage_inspection_by,
               Status,
               count(distinct case when damage_code = 'A' then gate_id),
               count(distinct case when damage_code = 'F' then gate_id end), 
               count(distinct gate_id))
             from gate_containers ab,gate_damages ac
                 where ab.gate_id = ac.gate_id
              group by trunc(ab.damage_inspection_date),ab.damage_inspection_by
                         )
    where b.damage_inspection_by = a.damage_inspection_by(+)
  and b.damage_inspection_date = a.damage_inspection_date(+)
  and b.status = a.status(+);  


Followup   October 3, 2005 - 8pm Central time zone:

sorry, I am not a sql compiler, I cannot reproduce since I don't have the tables or anything. 

4 stars Case when ... then ... end   October 4, 2005 - 8am Central time zone
Reviewer: Greg from Toronto
Just lucked out and saw this:

"select ..., count(distinct case when damage_code = 'A' then gate_id),
            count(distinct case when damage_code = 'F' then gate_id end), 
            count(distinct gate_id)"

Should be:

"select ..., count(distinct case when damage_code = 'A' then gate_id end),
            count(distinct case when damage_code = 'F' then gate_id end), 
            count(distinct gate_id)"

Tom just missed the "end" for the case statement ... (I got lucky and spotted it .. heh)

 


Followup   October 4, 2005 - 4pm Central time zone:

(that is why i always ask for create tables and inserts - without them, it is not possible to test) 
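
Put together, the single-pass version might look like the sketch below. It assumes the column is damage_type_code (as in the original query, rather than the damage_code shorthand), that 'A' means major and 'F' means minor as in that query, and it returns the three counts as columns instead of rows. It has not been run against the real tables.

```sql
select trunc(c.damage_inspection_date) damage_inspection_date,
       c.damage_inspection_by,
       -- a CASE with no ELSE yields NULL, which COUNT ignores
       count(distinct case when d.damage_type_code = 'A' then c.gate_id end) major,
       count(distinct case when d.damage_type_code = 'F' then c.gate_id end) minor,
       count(distinct c.gate_id) total
  from gate_containers c,
       gate_damages d
 where c.gate_id = d.gate_id
 group by trunc(c.damage_inspection_date), c.damage_inspection_by;
```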

5 stars thanks!!   October 4, 2005 - 2pm Central time zone
Reviewer: A reader 


5 stars Well Taken   October 5, 2005 - 10am Central time zone
Reviewer: A reader 
Tom,

This is what I would like to see..

damage_inspection_date  damage_inspection_by    counts
xx/xx/xxxx                    Louis             2 minors
xx/xx/xxxx                    juan              1 major

thanks. 


4 stars can analytics help me?   October 5, 2005 - 2pm Central time zone
Reviewer: Susan from Watertown, MA
My result set must be ordered by the sum of multiple columns, with a weight assigned to each 
column. The SQL below works and gives me what I want, but maybe there is an analytic function 
solution? Thanks for all your help.

SELECT ename, job, sal, comm  FROM scott.BONUS
ORDER BY DECODE(job, -2, 0, job)*100000
       + DECODE(sal, -2, 0, sal)*10000
       + DECODE(comm, -2, 0, comm)*100 DESC

 


Followup   October 5, 2005 - 3pm Central time zone:

not in this case - you want to order by a simple function of attributes of a single row.

You don't need to look across rows - analytics look across rows. 

5 stars Thanks Tom   October 5, 2005 - 3pm Central time zone
Reviewer: Susan from Watertown, MA
Thanks for your reply. Do you agree with the DECODE approach or am I missing a more elegant 
solution? 


Followup   October 5, 2005 - 8pm Central time zone:

the decode looks fine here - shorter than case but in this "case" just as easy to read. 

5 stars Tom   October 5, 2005 - 4pm Central time zone
Reviewer: A reader 
 Tom,

Can you please point me in the right direction... 

This is what I am getting with the following query...


 damage_inspection_date damage_inspection_by  status
 6/12/2004           CCCT              MAJOR   
 6/12/2004           CCCT              MINOR
 6/12/2004           CCCT              TOTAL
 6/12/2004           LOU               MAJOR
 6/12/2004           LOU               MINOR
 

and this is what I would like to get....
 
 damage_inspection_date damage_inspection_by status   count
 6/12/2004              CCCT                 MAJOR    2
 6/12/2004              CCCT                 MINOR    2
 6/12/2004              CCCT                 TOTAL    1




 select b.damage_inspection_date,
        b.damage_inspection_by
       ,b.status
from
     (select aa.damage_inspection_date,
             aa.damage_inspection_by,
             bb.status 
        from (select distinct trunc(gc.damage_inspection_date) damage_inspection_date,
                     gc.damage_inspection_by
                from gate_damages gd, gate_containers gc
               where gd.gate_id = gc.gate_id
             ) aa,
             (select * 
                from (select 'MAJOR' STATUS from dual
                      union all
                      select 'MINOR' STATUS from dual
                      union all
                      select 'TOTAL' STATUS from dual
                     )
             ) bb
     )b,
    ((SELECT  ab.damage_inspection_date,
               damage_inspection_by,
               STATUS_CODE,
               count(distinct case when ac.damage_location_code = 'A' then ab.gate_id end),
               count(distinct case when ac.damage_location_code = 'F' then ab.gate_id end), 
               count(distinct ab.gate_id )
                   from gate_containers ab,gate_damages ac
                 where ab.gate_id = ac.gate_id
                group by ab.damage_inspection_date,ab.damage_inspection_by,status_code,
                         ab.gate_id))a
                where b.damage_inspection_by = a.damage_inspection_by(+)
                  and b.damage_inspection_date = a.damage_inspection_date(+)
                group by (b.damage_inspection_date, b.damage_inspection_by,b.status)

 


Followup   October 5, 2005 - 8pm Central time zone:

....
 damage_inspection_date damage_inspection_by  status
 6/12/2004           CCCT              MAJOR   
 6/12/2004           CCCT              MINOR
 6/12/2004           CCCT              TOTAL
 6/12/2004           LOU               MAJOR
 6/12/2004           LOU               MINOR
 

and this is what I would like to get....
 
 damage_inspection_date damage_inspection_by status   count
 6/12/2004              CCCT                 MAJOR    2
 6/12/2004              CCCT                 MINOR    2
 6/12/2004              CCCT                 TOTAL    1
...

by what "logic"?   can you explain how you get from A to B? 

5 stars follow up   October 6, 2005 - 9am Central time zone
Reviewer: A reader 
Tom,

I already have the first part done. All I need now is to somehow show the counts in another 
column: how many minor, major and total I have. Is that possible?
Maybe something like the second example.

 


Followup   October 6, 2005 - 11am Central time zone:

first part of WHAT?   

5 stars more information   October 6, 2005 - 12pm Central time zone
Reviewer: A reader 
Sorry about the lack of information before.

Here I will try to do better. I am trying to write
a query where I need to count the majors and minors
and then get a total. 

requirements:

1. If a container has both majors and minors, total the counts:
   total count = major + minor

2. If a container has minors and no majors, count the minors only:
   total count = minor



inspector                      major  minor  total

1 major, 0 minor, other            1      0      1
2 major, 1 minor, other            2      1      3
0 major, 1 minor, other            0      1      1


Followup   October 6, 2005 - 1pm Central time zone:

sorry -- going back to your original example, I still cannot see the logic behind "what I have" and 
"what I want" there.  

I don't know what you mean by  "i have the first part"


 

5 stars this what I have now   October 6, 2005 - 2pm Central time zone
Reviewer: A reader 
Tom,

This is my query and result...

select  b.damage_inspection_date,
        b.damage_inspection_by
       ,b.status
       ,NVL(a.cnt,0) CNT

from
     (select aa.damage_inspection_date,
             aa.damage_inspection_by,
             bb.status 
        from (select distinct trunc(gc.damage_inspection_date) damage_inspection_date,
                     gc.damage_inspection_by
                from gate_damages gd, gate_containers gc
               where gd.gate_id = gc.gate_id
             ) aa,
             (select * 
                from (select 'MAJOR' STATUS from dual
                      union all
                      select 'MINOR' STATUS from dual
                      union all
                      select 'TOTAL' STATUS from dual
                     )
             ) bb
     )b,


     ((SELECT  damage_inspection_date,
               damage_inspection_by,
               Status,
               cnt
          FROM (select trunc(c.damage_inspection_date) damage_inspection_date,
                       c.damage_inspection_by,
                       'MAJOR' STATUS,
                       count(distinct c.gate_id) cnt
                  from gate_containers c,
                       gate_damages d
                 where c.gate_id = d.gate_id and
                       d.damage_type_code = 'F'
              group by trunc(c.damage_inspection_date),c.damage_inspection_by
                 
                UNION ALL

                select trunc(g.damage_inspection_date) damage_inspection_date,
                       g.damage_inspection_by,
                       'MINOR' STATUS,
                       count(distinct g.gate_id) cnt
                  from gate_containers g,
                       gate_damages z
                 where g.gate_id = z.gate_id and
                       z.damage_type_code = 'A'
              group by trunc(g.damage_inspection_date),g.damage_inspection_by
             
                UNION ALL

                select  trunc(ab.damage_inspection_date) damage_inspection_date,
                        ab.damage_inspection_by,
                       'TOTAL' STATUS,
                       count(distinct ab.gate_id) cnt
                  from gate_containers ab,
                       gate_damages ac
                 where ab.gate_id = ac.gate_id(+) and
                       SUBSTR(ab.action,2,1) != 'C'
              group by trunc(ab.damage_inspection_date),ab.damage_inspection_by
             
               )
               group by damage_inspection_date, damage_inspection_by, status, cnt
               )
     ) a

where b.damage_inspection_by = a.damage_inspection_by(+)
  and b.damage_inspection_date = a.damage_inspection_date(+)
  and b.status = a.status(+);

RESULT:

SQL Statement which produced this data:
  select * from MAJOR_MINOR_COUNT_VIEW 
  where rownum < 10

6/12/2004    CCCT    TOTAL    1
6/12/2004    CRAIG    TOTAL    6
6/13/2004    CCCT    TOTAL    5
6/14/2004    CCCT    TOTAL    46
6/14/2004    FYFE    TOTAL    30
6/14/2004    HALM    TOTAL    38
6/14/2004    MUTH    MAJOR    2
6/14/2004    MUTH    MINOR    14
6/14/2004    MUTH    TOTAL    40

AND I WOULD LIKE TO HAVE IT AS IN 
THE REQUIREMENTS ABOVE... HOPE THIS HELPS. 


Followup   October 6, 2005 - 2pm Central time zone:

take your query - call it Q


select inspector, 
       max(decode(status,'MINOR',cnt)) minor,
       max(decode(status,'MAJOR',cnt)) major,
       max(decode(status,'TOTAL',cnt)) total
  from (Q)
 group by inspector 
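
As a small self-contained sketch of that pivot, the WITH clause below stands in for Q using a few of the rows from the result posted above. The group by uses the view's own columns (damage_inspection_date, damage_inspection_by) on the assumption that "inspector" is shorthand for damage_inspection_by, and the nvl(..., 0) is an added assumption that a missing status should show as zero.

```sql
with q as (
  select to_date('06/12/2004','mm/dd/yyyy') damage_inspection_date,
         'CCCT' damage_inspection_by, 'TOTAL' status, 1 cnt from dual
  union all
  select to_date('06/14/2004','mm/dd/yyyy'), 'MUTH', 'MAJOR', 2  from dual
  union all
  select to_date('06/14/2004','mm/dd/yyyy'), 'MUTH', 'MINOR', 14 from dual
  union all
  select to_date('06/14/2004','mm/dd/yyyy'), 'MUTH', 'TOTAL', 40 from dual
)
select damage_inspection_date,
       damage_inspection_by,
       -- decode returns cnt only for the matching status, NULL otherwise; max picks the one value
       nvl(max(decode(status,'MAJOR',cnt)), 0) major,
       nvl(max(decode(status,'MINOR',cnt)), 0) minor,
       nvl(max(decode(status,'TOTAL',cnt)), 0) total
  from q
 group by damage_inspection_date, damage_inspection_by
 order by damage_inspection_date, damage_inspection_by;
```

For these four rows the result should be one line per date and inspector: CCCT with 0, 0, 1 and MUTH with 2, 14, 40.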

4 stars Year to dt + month to date   October 6, 2005 - 2pm Central time zone
Reviewer: reader from US
CREATE TABLE TEST (ID VARCHAR2(10), sale_dt DATE, amount NUMBER(6,2));

INSERT INTO TEST VALUES ('aa','14-OCT-2005',65.25);
INSERT INTO TEST VALUES ('aa','14-OCT-2005',56.25);
INSERT INTO TEST VALUES ('aa','15-SEP-2005',72.25);
INSERT INTO TEST VALUES ('aa','19-OCT-2005',43.25);
INSERT INTO TEST VALUES ('bb','14-SEP-2005',67.25);
INSERT INTO TEST VALUES ('bb','13-OCT-2005',235.25);
INSERT INTO TEST VALUES ('bb','15-OCT-2005',365.25);
INSERT INTO TEST VALUES ('bb','14-NOV-2005',465.25);
INSERT INTO TEST VALUES ('bb','14-SEP-2005',165.25);
commit;


SELECT DISTINCT id,sale_dt,SUM (amount) 
OVER (PARTITION BY id ORDER BY sale_dt ASC) sale_daily,
SUM (amount) 
OVER (PARTITION BY id, TO_CHAR(invoice_dt, 'MON-YYYY') ORDER BY TO_CHAR(sale_dt, 'MON-YYYY') ASC) 
mon_sal,
SUM (sale_price_usd * qty_sold) 
OVER (PARTITION BY id, TO_CHAR(sale_dt, 'YYYY') ORDER BY TO_CHAR(sale_dt, 'YYYY') ASC) yr_sal,
FROM test

ID         SALE_DT   SALE_DAILY    MON_SAL     YR_SAL
---------- --------- ---------- ---------- ----------
aa         15-SEP-05      72.25      72.25        237
aa         14-OCT-05      121.5     164.75        237
aa         19-OCT-05      43.25     164.75        237
bb         14-SEP-05      232.5      232.5    1298.25
bb         13-OCT-05     235.25      600.5    1298.25
bb         15-OCT-05     365.25      600.5    1298.25
bb         14-NOV-05     465.25     465.25    1298.25

7 rows selected.


Ideally, it should have been ----

ID         SALE_DT   SALE_DAILY    MON_SAL     YR_SAL
---------- --------- ---------- ---------- ----------
aa         15-SEP-05      72.25      72.25      72.25
aa         14-OCT-05      121.5      121.5     193.75
aa         19-OCT-05      43.25     164.75        237
bb         14-SEP-05      232.5      232.5      232.5
bb         13-OCT-05     235.25     235.25      467.5
bb         15-OCT-05     365.25      600.5      833.0
bb         14-NOV-05     465.25     465.25    1298.25

How can I do this ?

Will appreciate your help .

THANKS 


 


Followup   October 6, 2005 - 3pm Central time zone:

ideally - there would be a qty_sold column somewhere :)


ideally you will ONLY use to_char to *format* data, never to process it.

trunc(invoice_dt,'y')  NOT to_char(invoice_dt,'yyyy')
trunc(sale_dt,'mm')    NOT to_char(sale_dt, 'MON-YYYY' )
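
One way to apply that advice, sketched against the TEST table above: partition on the truncated date so that the total restarts each month or year, but order by sale_dt itself so that the sum runs forward within each partition. This assumes sale_dt carries no time component and that mon_sal and yr_sal are meant to be month-to-date and year-to-date running totals. It has not been run here.

```sql
select distinct
       id,
       sale_dt,
       -- total for the day: no ORDER BY, so the window is the whole (id, sale_dt) partition
       sum(amount) over (partition by id, sale_dt) sale_daily,
       -- month to date: restarts each month, accumulates by sale_dt
       sum(amount) over (partition by id, trunc(sale_dt,'mm') order by sale_dt) mon_sal,
       -- year to date: restarts each year, accumulates by sale_dt
       sum(amount) over (partition by id, trunc(sale_dt,'y') order by sale_dt) yr_sal
  from test
 order by id, sale_dt;
```

Rows that share a sale_dt are peers in the default RANGE window that an ORDER BY implies, so they get identical totals and the DISTINCT collapses them into one row per id and date.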


 

4 stars Year to Date and Month to date   October 6, 2005 - 10pm Central time zone
Reviewer: READER from US
As per your suggestion, I made the changes, but I still need your help.


CREATE TABLE TEST (ID VARCHAR2(10), sale_dt DATE, amount NUMBER(6,2));

INSERT INTO TEST VALUES ('aa','14-OCT-2005',65.25);
INSERT INTO TEST VALUES ('aa','14-OCT-2005',56.25);
INSERT INTO TEST VALUES ('aa','15-SEP-2005',72.25);
INSERT INTO TEST VALUES ('aa','19-OCT-2005',43.25);
INSERT INTO TEST VALUES ('bb','14-SEP-2005',67.25);
INSERT INTO TEST VALUES ('bb','13-OCT-2005',235.25);
INSERT INTO TEST VALUES ('bb','15-OCT-2005',365.25);
INSERT INTO TEST VALUES ('bb','14-NOV-2005',465.25);
INSERT INTO TEST VALUES ('bb','14-SEP-2005',165.25);
commit;


SELECT DISTINCT id, sale_dt,
       SUM (amount)
         OVER (PARTITION BY id ORDER BY sale_dt ASC) sale_daily,
       SUM (amount)
         OVER (PARTITION BY id, trunc(sale_dt,'MM') ORDER BY trunc(sale_dt,'MM') ASC) mon_sal,
       SUM (amount)
         OVER (PARTITION BY id, trunc(sale_dt,'Y') ORDER BY trunc(sale_dt,'Y') ASC) yr_sal
  FROM test;

ID         SALE_DT               SALE_DAILY    MON_SAL     YR_SAL
---------- --------------------- ---------- ---------- ----------
aa         9/15/2005                  72.25      72.25        237
aa         10/14/2005                193.75     164.75        237
aa         10/19/2005                   237     164.75        237
bb         9/14/2005                  232.5      232.5    1298.25
bb         10/13/2005                467.75      600.5    1298.25
bb         10/15/2005                   833      600.5    1298.25
bb         11/14/2005               1298.25     465.25    1298.25
7 rows selected


Ideally, it should have been:

ID         SALE_DT   SALE_DAILY    MON_SAL     YR_SAL
---------- --------- ---------- ---------- ----------
aa         15-SEP-05      72.25      72.25      72.25
aa         14-OCT-05      121.5      121.5     193.75
aa         19-OCT-05      43.25     164.75        237
bb         14-SEP-05      232.5      232.5      232.5
bb         13-OCT-05     235.25     235.25      467.5
bb         15-OCT-05     365.25      600.5      833.0
bb         14-NOV-05     465.25     465.25    1298.25


Thanks again . 


Followup   October 7, 2005 - 8am Central time zone:

you shall have to explain how you derived your "optimal" output.

certainly isn't sorted by anything?  I don't get the numbers. 

3 stars Year to date /Month to date   October 7, 2005 - 9am Central time zone
Reviewer: Reader from US
I wish to create a summary table that holds, for every day, the sales for that day, the sales up
to that day within the month, and the sales up to that day within the year,
i.e. a running (cumulative) total for the month and for the year.

Thanks 
 


Followup   October 7, 2005 - 8pm Central time zone:

ok?
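
For readers following along, here is one possible way to get the running totals the reviewer describes. It is a sketch, not part of the original followup, and it assumes the TEST table and column names posted above. The earlier attempt ordered each window by the same expression it partitioned by, so every row in a month (or year) tied in the ORDER BY and the default window returned the whole-period total. Ordering by the day inside the month or year partition gives a running total instead, and aggregating to one row per id and day first removes the need for DISTINCT:

```sql
-- Step 1 (inline view): one row per id and calendar day with that day's sales.
-- Step 2 (outer query): running sums that restart each month and each year.
SELECT id,
       sale_dt,
       sale_daily,
       SUM(sale_daily) OVER (PARTITION BY id, TRUNC(sale_dt, 'MM')
                             ORDER BY sale_dt) AS mon_sal,   -- month to date
       SUM(sale_daily) OVER (PARTITION BY id, TRUNC(sale_dt, 'Y')
                             ORDER BY sale_dt) AS yr_sal     -- year to date
  FROM (SELECT id,
               TRUNC(sale_dt) AS sale_dt,
               SUM(amount)    AS sale_daily
          FROM test
         GROUP BY id, TRUNC(sale_dt))
 ORDER BY id, sale_dt;
```

With the nine rows posted above this should return seven rows matching the reviewer's desired output, except that the year-to-date value for bb on 13-OCT works out to 467.75 (232.5 + 235.25) rather than 467.5.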
5 stars Follow up   October 7, 2005 - 9am Central time zone
Reviewer: A reader 
Tom,

The above pivot worked well, however my counts are off since
I ONLY want to count the minor when there is no major.
Something like this:

                          major  minor  count

1 major, 0 minor, other       1             1
2 major, 1 minor, other       2             2
0 major, 1 minor, other       0      1      1


* count the minor when there is no major



CREATE TABLE GATE_CONTAINERS
(
  GATE_ID                       NUMBER,
  VISIT                         NUMBER,
  REFERENCE_ID                  NUMBER,
  DAMAGE_INSPECTION_BY          VARCHAR2(30),
  DAMAGE_INSPECTION_DATE        DATE
);

Insert into GATE_CONTAINERS (GATE_ID, VISIT) Values (1, 1);
Insert into GATE_CONTAINERS (GATE_ID, VISIT) Values (17, 10);
Insert into GATE_CONTAINERS (GATE_ID, VISIT) Values (21, 12);
Insert into GATE_CONTAINERS (GATE_ID, VISIT) Values (31, 18);
Insert into GATE_CONTAINERS (GATE_ID, VISIT) Values (33, 19);
Insert into GATE_CONTAINERS
   (GATE_ID, VISIT, DAMAGE_INSPECTION_DATE, DAMAGE_INSPECTION_BY)
 Values
   (36, 22, TO_DATE('06/12/2004 11:48:49', 'MM/DD/YYYY HH24:MI:SS'), 'CRAIG');
Insert into GATE_CONTAINERS
   (GATE_ID, VISIT, DAMAGE_INSPECTION_DATE, DAMAGE_INSPECTION_BY)
 Values
   (37, 23, TO_DATE('06/12/2004 11:50:11', 'MM/DD/YYYY HH24:MI:SS'), 'CRAIG');
Insert into GATE_CONTAINERS
   (GATE_ID, VISIT, DAMAGE_INSPECTION_DATE, DAMAGE_INSPECTION_BY)
 Values
   (39, 25, TO_DATE('06/12/2004 11:48:19', 'MM/DD/YYYY HH24:MI:SS'), 'CRAIG');
Insert into GATE_CONTAINERS (GATE_ID, VISIT) Values (45, 30);
COMMIT;




CREATE TABLE GATE_DAMAGES
(
  GATE_ID               NUMBER                  NOT NULL,
  DAMAGE_LOCATION_CODE  VARCHAR2(5 BYTE)        NOT NULL,
  DAMAGE_TYPE_CODE      VARCHAR2(5 BYTE)        NOT NULL
);

Insert into GATE_DAMAGES (GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE) Values (34, '01', '9');
Insert into GATE_DAMAGES (GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE) Values (34, '02', 'C');
Insert into GATE_DAMAGES (GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE) Values (37, '01', 'B');
Insert into GATE_DAMAGES (GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE) Values (62, '05', 'B');
Insert into GATE_DAMAGES (GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE) Values (101, '23', 'C');
Insert into GATE_DAMAGES (GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE) Values (183, '99', '9');
Insert into GATE_DAMAGES (GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE) Values (188, '01', 'D');
Insert into GATE_DAMAGES (GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE) Values (188, '04', 'B');
Insert into GATE_DAMAGES (GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE) Values (188, '07', 'B');
COMMIT;
 


Followup   October 7, 2005 - 8pm Central time zone:

The above pivot worked well, however my counts are off since
I ONLY want to count the minor when there is no major.
Something like this:

                          major  minor  count

1 major, 0 minor, other       1             1
2 major, 1 minor, other       2             2
0 major, 1 minor, other       0      1      1



so tell me why there are minor counts when major > 0??? 

5 stars and this is my query   October 7, 2005 - 9am Central time zone
Reviewer: A reader 
select damage_inspection_date,damage_inspection_by,
       max(decode(status,'MINOR',cnt)) minor,
       max(decode(status,'MAJOR',cnt)) major,
       max(decode(status,'TOTAL',cnt)) total
  from (select  b.damage_inspection_date,
        b.damage_inspection_by
       ,b.status
       ,NVL(a.cnt,0) CNT
from
     (select aa.damage_inspection_date,
             aa.damage_inspection_by,
             bb.status 
        from (select distinct trunc(gc.damage_inspection_date) damage_inspection_date,
                     gc.damage_inspection_by
                from gate_damages gd, gate_containers gc
               where gd.gate_id = gc.gate_id
             ) aa,
             (select * 
                from (select 'MAJOR' STATUS from dual
                      union all
                      select 'MINOR' STATUS from dual
                      union all
                      select 'TOTAL' STATUS from dual
                     )
             ) bb
     )b,
     ((SELECT  damage_inspection_date,
               damage_inspection_by,
               Status,
               cnt
          FROM (select trunc(c.damage_inspection_date) damage_inspection_date,
                       c.damage_inspection_by,
                       'MAJOR' STATUS,
                       count(distinct c.gate_id) cnt
                  from gate_containers c,
                       gate_damages d
                 where c.gate_id = d.gate_id and
                       d.damage_type_code = 'F'
              group by trunc(c.damage_inspection_date),c.damage_inspection_by
               UNION ALL
                select trunc(g.damage_inspection_date) damage_inspection_date,
                       g.damage_inspection_by,
                       'MINOR' STATUS,
                       count(distinct g.gate_id) cnt
                  from gate_containers g,
                       gate_damages z
                 where g.gate_id = z.gate_id and
                       z.damage_type_code = 'A'
              group by trunc(g.damage_inspection_date),g.damage_inspection_by
              UNION ALL
              select  trunc(ab.damage_inspection_date) damage_inspection_date,
                        ab.damage_inspection_by,
                       'TOTAL' STATUS,
                       count(distinct ab.gate_id) cnt
                  from gate_containers ab,
                       gate_damages ac
                 where ab.gate_id = ac.gate_id(+) and
                       SUBSTR(ab.action,2,1) != 'C'
              group by trunc(ab.damage_inspection_date),ab.damage_inspection_by
               )
               group by damage_inspection_date, damage_inspection_by, status, cnt
               )
     ) a
where b.damage_inspection_by = a.damage_inspection_by(+)
  and b.damage_inspection_date = a.damage_inspection_date(+)
  and b.status = a.status(+) 
)
group by damage_inspection_date,damage_inspection_by 
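
For comparison, here is a shorter shape the same report could take. It is only a sketch under stated assumptions, not the reviewer's or Tom's query: it takes 'F' to mean major and 'A' to mean minor (as the query above does), it classifies each gate once so that a gate with any major damage is never counted as minor, and it leaves out the filter on ACTION because that column is not in the posted CREATE TABLE. The has_major and has_minor flags are names invented for the example.

```sql
-- Inner query: one row per inspected gate with flags for major ('F') and minor ('A') damage.
-- Outer query: count each gate once; minor is counted only when the gate has no major damage.
SELECT damage_inspection_date,
       damage_inspection_by,
       COUNT(CASE WHEN has_major = 1 THEN 1 END)                   AS major,
       COUNT(CASE WHEN has_major = 0 AND has_minor = 1 THEN 1 END) AS minor,
       COUNT(*)                                                    AS total
  FROM (SELECT TRUNC(gc.damage_inspection_date) AS damage_inspection_date,
               gc.damage_inspection_by,
               gc.gate_id,
               MAX(CASE WHEN gd.damage_type_code = 'F' THEN 1 ELSE 0 END) AS has_major,
               MAX(CASE WHEN gd.damage_type_code = 'A' THEN 1 ELSE 0 END) AS has_minor
          FROM gate_containers gc
          LEFT JOIN gate_damages gd
            ON gd.gate_id = gc.gate_id   -- keep gates that have no damage rows at all
         GROUP BY TRUNC(gc.damage_inspection_date),
                  gc.damage_inspection_by,
                  gc.gate_id)
 GROUP BY damage_inspection_date, damage_inspection_by;
```

Note that the sample rows posted above contain no 'F' or 'A' damage codes, so both the major and minor columns would come back as 0 against that data; the point of the sketch is the "classify once, then count" structure.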


5 stars I got it.... &