Analytics question

Excellent!!!

A reader, October 07, 2003 - 5:05 am UTC

Hi Tom,

'been thinking about writing a book just about analytics ' ... please make this book available soon and am sure it will be yet another gift from you to Oracle World :)

Wow!!

Michael T, October 07, 2003 - 7:00 am UTC

This is exactly what I needed! Analytics do rock! I just
need to understand them better. If you do decide to write a
book on analytics, it would be at the top of my must have
list. Thanks again!!!

Small correction

Michael T, October 07, 2003 - 7:33 am UTC

After looking at it a little closer it looks like there is
one small error. The start date for the first MACH1 entry
should be the close date of the prior different station. In
this case 07/01/2003. However, by making some small changes
to your query I can get the results I want.

SELECT order#,
station,
lag(close_date) over (partition by order# order by close_date)
start_date,
close_date
FROM (SELECT order#,
station,
close_date
FROM (SELECT order#,
lag(station) over (partition by order# order by
close_date) lag_station,
lead(station) over (partition by order# order by
close_date) lead_station,
station,
close_date
FROM t)
WHERE lead_station <> station
OR lead_station is null
OR lag_station is null)

There might be an easier way to construct this query, but
it works great for me. Thanks a lot for your help!

October 07, 2003 - 8:25 am UTC

sorry about that -- you are right -- when we have "a pair", we want to use lag/lead again to get and keep the right dates.  

So, we want to keep rows that are:

a) the first row in the partition  "where lag_station is null"
b) the last row in the partition "where lead_station is null"
c) the first of a possible pair "where lag_station <> station"
d) the second of a possible pair "where lead_station <> station"

This query does that:

ops$tkyte@ORA920> select order#,
  2         station,
  3         lag_close_date,
  4         close_date,
  5         decode( lead_station, station, 1, 0 ) first_of_pair,
  6         decode( lag_station, station, 1, 0 ) second_of_pair
  7    from (
  8  select order#,
  9         lag(station) over (partition by order# order by close_date)
 10                                                              lag_station,
 11         lead(station) over (partition by order# order by close_date)
 12                                                              lead_station,
 13             station,
 14             close_date,
 15         lag(close_date) over (partition by order# order by close_date)
 16                                                             lag_close_date,
 17         lead(close_date) over (partition by order# order by close_date)
 18                                                             lead_close_date
 19    from t
 20         )
 21   where lag_station is null
 22          or lead_station is null
 23          or lead_station <> station
 24          or lag_station <> station
 25  /
 
ORDER# STATION LAG_CLOSE_ CLOSE_DATE FIRST_OF_PAIR SECOND_OF_PAIR
------ ------- ---------- ---------- ------------- --------------
 12345 RECV               07/01/2003             0              0
 12345 MACH1   07/01/2003 07/02/2003             1              0
 12345 MACH1   07/05/2003 07/11/2003             0              1
 12345 INSP1   07/11/2003 07/12/2003             0              0
 12345 MACH1   07/12/2003 07/16/2003             0              0
 12345 MACH2   07/16/2003 07/30/2003             0              0
 12345 STOCK   07/30/2003 08/01/2003             0              0
 
7 rows selected.
 

<b>we can see with the 1's the first/second of a pair in there.  All we need to do now is "reach forward" for the first of a pair and grab the close date from the next record:</b>

ops$tkyte@ORA920> select order#,
  2         station,
  3         lag_close_date,
  4         close_date
  5    from (
  6  select order#,
  7         station,
  8         lag_close_date,
  9         decode( lead_station,
 10                 station,
 11                 lead(close_date) over (partition by order# order by close_date),
 12                 close_date ) close_date,
 13         decode( lead_station, station, 1, 0 ) first_of_pair,
 14         decode( lag_station, station, 1, 0 ) second_of_pair
 15    from (
 16  select order#,
 17         lag(station) over (partition by order# order by close_date)
 18                                                              lag_station,
 19         lead(station) over (partition by order# order by close_date)
 20                                                              lead_station,
 21             station,
 22             close_date,
 23         lag(close_date) over (partition by order# order by close_date)
 24                                                             lag_close_date,
 25         lead(close_date) over (partition by order# order by close_date)
 26                                                             lead_close_date
 27    from t
 28         )
 29   where lag_station is null
 30          or lead_station is null
 31          or lead_station <> station
 32          or lag_station <> station
 33         )
 34   where second_of_pair <> 1
 35  /
 
ORDER# STATION LAG_CLOSE_ CLOSE_DATE
------ ------- ---------- ----------
 12345 RECV               07/01/2003
 12345 MACH1   07/01/2003 07/11/2003
 12345 INSP1   07/11/2003 07/12/2003
 12345 MACH1   07/12/2003 07/16/2003
 12345 MACH2   07/16/2003 07/30/2003
 12345 STOCK   07/30/2003 08/01/2003
 
6 rows selected.


<b>and discard the second of pairs row</b>


That is another way to do it (and an insight into how I develop analytic queries -- adding extra columns like that just to see visually what I want to do)

another good book on the list please go ahead on this one too

Vijay Sehgal, October 07, 2003 - 8:58 am UTC

Best Regards,
Vijay Sehgal

Very useful

Michael T., October 07, 2003 - 12:05 pm UTC

Excellent, as always!

Can we reach to the end of the group?

Steve, December 15, 2003 - 11:28 am UTC

For example, say our analytic query returns the following result:

master_record sub_record nxt_record
95845433 25860032 95118740
95118740 25860032 95837497
95837497 25860032

What I'd like is to do is grab the final master_record, 95837497, and have that populated in the final column. There could be 2,3 or more in each group.

December 15, 2003 - 3:45 pm UTC

so the nxt_record of the last record should be the master_record of that row?

then just select

nvl( lead(master_record) over (....), master_record ) nxt_record

when the lead is NULL, return the master_record of the current row

Almost....

Steve, December 15, 2003 - 5:52 pm UTC

but I dodn't explain it well enough. What I'd like to see is a result set that looks like:

master_record sub_record nxt_record
95845433 25860032 95837497
95118740 25860032 95837497
95837497 25860032 95837497

The data comes from this:

table activity
cllocn moddate

25860032 18/06/2003
95118740 26/08/2003
95837497 15/12/2003
95845433 19/08/2003

table ext_dedupe

master_cllocn dupe_cllocn
25860032 95118740
25860032 95837497
25860032 95845433

My query is:

select * from ( select master_record, sub_record, lead(master_record) over (partition by sub_record order by lst_activity asc) nxt_activity
from ( select * from (select case when dupelast_ackdate>last_ackdate then dupe_cllocn
when last_ackdate>dupelast_ackdate then master_cllocn
else master_cllocn
end master_record, greatest(last_ackdate,dupelast_ackdate) lst_activity,
case when dupelast_ackdate>last_ackdate then master_cllocn
when last_ackdate>dupelast_ackdate then dupe_cllocn
else dupe_cllocn
end sub_record
from (select master_cllocn, (select max(moddate) from activity a where a.cllocn=ed.master_cllocn) last_ackdate,
dupe_cllocn, (select max(moddate) from activity a where a.cllocn=ed.dupe_cllocn) dupelast_ackdate
from ext_dedupe ed))))

Am I on the right track or is there a simpler way to this?

Thanks

December 16, 2003 - 6:50 am UTC

can you explain in "just text" how you got from your inputs to your outputs.

it is not clear (and i didn't feel like parsing that sql to reverse engineer what it does)

Is this what you are looking for ?

Venkat, December 15, 2003 - 6:44 pm UTC

select master, sub, moddate
, min(master) keep (dense_rank first order by moddate) over (partition by sub) first_in_list
, max(master) keep (dense_rank last order by moddate) over (partition by sub) last_in_list
from (select master, sub, moddate from (
select 95845433 master, 25860032 sub, to_date('19-aug-03','dd/mon/yy') moddate from dual union all
select 95118740, 25860032, to_date('26-aug-03','dd/mon/yy') from dual union all
select 95837497, 25860032, to_date('15-dec-03','dd/mon/yy') from dual))

MASTER SUB MODDATE FIRST_IN_LIST LAST_IN_LIST
95845433 25860032 8/19/2003 95845433 95837497
95118740 25860032 8/26/2003 95845433 95837497
95837497 25860032 12/15/2003 95845433 95837497

Tom's Book

umesh, December 16, 2003 - 4:13 am UTC

Tom
Do not announce until you are finished with the book .. when you talk of a book ..can't wait until We have it here
Analytics Book That must be real good

Is it possible to get the same result in standard edition ?

Ninoslav, December 16, 2003 - 4:21 am UTC

Hi Tom,
yes, analitic functions are great. However, we can use it only in enterprise edition of database. We have a few small customers that want only a standard edition.
So, is it possible in this question to get the same result without analitic functions ?
It would be nice to have some kind of mapping between analitics and 'standard' queries. But, that is probabaly impossible...

December 16, 2003 - 7:27 am UTC

Oracle 9iR2 and up -- analytics are a feature of standard edition.

there are things you can do in analytics that are quite simply NOT PRACTICAL in any sense without them.

ok

Steve, December 16, 2003 - 8:41 am UTC

I have two tables - activity and ext_dedupe.

table activity
cllocn moddate
25860032 18/06/2003
95118740 26/08/2003
95837497 15/12/2003
95845433 19/08/2003

table ext_dedupe
master_cllocn dupe_cllocn
25860032 95118740
25860032 95837497
25860032 95845433

Ext_dedupe is a table created by a third party app which has identified duplicate records within our database. The first column is supposed to be the master and the second the duplicate. The idea is to mark as archived all our duplicate records with a pointer to the master. Notwithstanding the order of the columns, what we want to do is find out which record has the most recent activity (from the activity table) and archive off the others.

So, in this example although the master is listed as 25860032 against the other 3, an examination of the activity dates mean I want to keep 95837497 and mark the others as archived and have a pointer on each of them to 95837497. That's why I thought if I could get to the following result it would make it simpler.

master_record sub_record nxt_record
95845433 25860032 95837497
95118740 25860032 95837497
95837497 25860032 95837497

Hope that makes sense!

December 16, 2003 - 11:33 am UTC

oh, then nxt_record is just

last_value(master_record) over (partition by sub_record order by moddate)

Why...

Steve, December 16, 2003 - 1:31 pm UTC

it didn't work for me. I had to change it to

first_value(master_record) over (partition by sub_record order by moddate desc)

Is there a reason for that?

December 16, 2003 - 2:00 pm UTC

doh, default window clause is current row and unbounded preceding

i would have needed a window clause that looks forwards rather then backwards (reason #1 why I should always set up a test case instead of just answering on the fly)

your solution of reversing the data works just fine.

Another solution

A reader, December 16, 2003 - 4:03 pm UTC

The following gives the same result ...

select cllocn master_record, nvl(master_cllocn,cllocn) sub_record
, max(cllocn) keep (dense_rank last order by moddate)
over (partition by nvl(master_cllocn,cllocn)) nxt_record
from activity, ext_dedupe where cllocn = dupe_cllocn

MASTER_RECORD SUB_RECORD NXT_RECORD
95118740 25860032 95837497
95837497 25860032 95837497
95845433 25860032 95837497

December 16, 2003 - 5:44 pm UTC

yes, there are many many ways to do this.

first_value
last_value

substring of max() without keep

sure.

A reader, December 16, 2003 - 4:15 pm UTC

Actually the nvl(master_cllocn...) is required only if you need all 4 rows in the output as follows(there is an outer join involved). If you need only the 3 rows as shown in the above post, there is no need for the nvl's....

select cllocn master_record, nvl(master_cllocn,cllocn) sub_record
, max(cllocn) keep (dense_rank last order by moddate)
over (partition by nvl(master_cllocn,cllocn)) nxt_record
, last_value(cllocn) over (partition by nvl(master_cllocn,cllocn) order by moddate) nxt
from activity, ext_dedupe where cllocn = dupe_cllocn (+)

MASTER_RECORD SUB_RECORD NXT_RECORD
25860032 25860032 95837497
95118740 25860032 95837497
95837497 25860032 95837497
95845433 25860032 95837497

still q's on analytics

A reader, January 30, 2004 - 10:13 am UTC

Okay, so my web application logs "web transaction" statistics to a table.  This actually amounts to 0 to many database tranactions... but anyway.. I need to summarize (sum, min, max, count, average) each day's transaction times for each class (name2) and action (name3) and ultimately "archive" this data to a hisory table.  I am running 8.1.7 and pretty new to analytics.

My table looks like this:

SQL> desc tran_stats
 Name                    Null?    Type
 ----------------------- -------- ----------------
 ID                      NOT NULL NUMBER(9)
 NAME1                            VARCHAR2(100)
 NAME2                            VARCHAR2(100)
 NAME3                            VARCHAR2(100)
 NAME4                            VARCHAR2(100)
 SEC                     NOT NULL NUMBER(9,3)
 TS_CR                   NOT NULL DATE

        ID NAME1 NAME2                     NAME3         SEC NAME4 TS_CR
---------- ----- ------------------------- ---------- ------ ----- ---------
     35947       /CM01_PersonManagement    CREATE       .484       15-JAN-04
     35987       /CM01_PersonManagement    CREATE       .031       15-JAN-04
     36086       /CM01_PersonManagement    EDIT         .312       16-JAN-04
     36555       /CM01_PersonManagement    CREATE       .297       19-JAN-04
     36623       /CM01_PersonManagement    EDIT         .375       19-JAN-04
     36627       /CM01_PersonManagement    CREATE       .047       19-JAN-04
     36756       /CM01_AddressManagement   CREATE       .375       20-JAN-04
     36766       /CM01_AddressManagement   CREATE       .305       20-JAN-04
     36757       /CM01_AddressManagement   INSERT       .391       20-JAN-04
     37178       /CM01_PersonManagement    EDIT         .203       20-JAN-04

and I need output like this:

TS_CR     NAME2                     NAME3       M_SUM  M_MIN  M_MAX M_COUNT  M_AVG
--------- ------------------------- ---------- ------ ------ ------ ------- ------
20-JAN-04 /CM01_AddressManagement   CREATE       .680   .305   .375       2   .340
20-JAN-04 /CM01_AddressManagement   INSERT       .391   .391   .391       1   .391
20-JAN-04 /CM01_PersonManagement    EDIT         .203   .203   .203       1   .203
19-JAN-04 /CM01_PersonManagement    CREATE       .344   .047   .297       2   .172
19-JAN-04 /CM01_PersonManagement    EDIT         .375   .375   .375       1   .375
16-JAN-04 /CM01_PersonManagement    EDIT         .312   .312   .312       1   .312
15-JAN-04 /CM01_PersonManagement    CREATE       .515   .031   .484       2   .258


This seems to work, but there has to be a better/cleaner/more efficient way to do this:

select distinct ts_cr, name2, name3, m_sum, m_min,m_max,m_count,m_avg
from (
select  trunc(ts_cr) ts_cr,id, name2, name3, sum(sec) m_dummy
 , min(sum(sec)) over(partition by name2,name3,trunc(ts_cr)) as m_min
 , max(sum(sec)) over(partition by name2,name3,trunc(ts_cr)) as m_max
 , round(avg(sum(sec)) over(partition by name2,name3,trunc(ts_cr)),5) as m_avg
 , count(sum(sec)) over(partition by name2,name3,trunc(ts_cr)) as m_count
 , sum(sum(sec)) over(partition by name2,name3,trunc(ts_cr)) as m_sum
        from tran_stats group by name2, name3,trunc(ts_cr),id
)n order by 1 desc, 2, 3;


Any help or pointers would be appreciated.  Thanks in advance.

January 30, 2004 - 10:31 am UTC

why does there "have to be"?

what is "unclean" about this? I could make it more verbose (and perhaps more readable) but this does exactly what you ask for?

It seems pretty "good", very "clean" and probably the most efficient method to get this result?

Regarding the previous post ...

A reader, January 30, 2004 - 11:45 am UTC

Am I missing something or will the following do the same ..

select trunc(ts_cr) ts_cr, name2, name3,
count(*) m_count, min(sec) m_min, max(sec) m_max,
sum(sec) m_sum, avg(sec) m_avg
from tran_stats
group by trunc(ts_cr), name2, name3
order by 1 desc, 2, 3

January 30, 2004 - 7:43 pm UTC

with the supplied data -- since "group by trunc(ts_cr), name2, name3" happened to be unique

yes.

In general -- no.  consider:

ops$tkyte@ORA9IR2> select distinct ts_cr, name2, name3, m_sum, m_min,m_max,m_count,m_avg
  2  from ( select trunc(ts_cr) ts_cr,
  3                id,
  4                            name2,
  5                            name3,
  6                            sum(sec) m_dummy ,
  7                            min(sum(sec)) over(partition by name2,name3,trunc(ts_cr)) as m_min ,
  8                            max(sum(sec)) over(partition by name2,name3,trunc(ts_cr)) as m_max ,
  9                            round(avg(sum(sec)) over(partition by name2,name3,trunc(ts_cr)),5) as m_avg ,
 10                            count(sum(sec)) over(partition by name2,name3,trunc(ts_cr)) as m_count ,
 11                            sum(sum(sec)) over(partition by name2,name3,trunc(ts_cr)) as m_sum
 12               from tran_stats
 13                  group by name2, name3,trunc(ts_cr),id
 14        )n
 15  MINUS
 16  select ts_cr, name2, name3, m_sum, m_min,m_max,m_count,m_avg
 17  from (
 18  select trunc(ts_cr) ts_cr, name2, name3,
 19         count(*) m_count, min(sec) m_min, max(sec) m_max,
 20                    sum(sec) m_sum, avg(sec) m_avg
 21                            from tran_stats
 22                            group by trunc(ts_cr), name2, name3 )
 23  /
 
no rows selected
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> insert into tran_stats
  2  select 35947,'/CM01_PersonManagement','CREATE', .484  ,'15-JAN-04'
  3   from all_users where rownum <= 5;
 
5 rows created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select distinct ts_cr, name2, name3, m_sum, m_min,m_max,m_count,m_avg
  2  from ( select trunc(ts_cr) ts_cr,
  3                id,
  4                            name2,
  5                            name3,
  6                            sum(sec) m_dummy ,
  7                            min(sum(sec)) over(partition by name2,name3,trunc(ts_cr)) as m_min ,
  8                            max(sum(sec)) over(partition by name2,name3,trunc(ts_cr)) as m_max ,
  9                            round(avg(sum(sec)) over(partition by name2,name3,trunc(ts_cr)),5) as m_avg ,
 10                            count(sum(sec)) over(partition by name2,name3,trunc(ts_cr)) as m_count ,
 11                            sum(sum(sec)) over(partition by name2,name3,trunc(ts_cr)) as m_sum
 12               from tran_stats
 13                  group by name2, name3,trunc(ts_cr),id
 14        )n
 15  MINUS
 16  select ts_cr, name2, name3, m_sum, m_min,m_max,m_count,m_avg
 17  from (
 18  select trunc(ts_cr) ts_cr, name2, name3,
 19         count(*) m_count, min(sec) m_min, max(sec) m_max,
 20                    sum(sec) m_sum, avg(sec) m_avg
 21                            from tran_stats
 22                            group by trunc(ts_cr), name2, name3 )
 23  /
 
TS_CR     NAME2                   NAME3         M_SUM      M_MIN      M_MAX    M_COUNT      M_AVG
--------- ----------------------- -------- ---------- ---------- ---------- ---------- ----------
15-JAN-04 /CM01_PersonManagement  CREATE        2.935       .031      2.904          2     1.4675



add more data and it won't be the same.

OK

Siva, January 31, 2004 - 9:05 am UTC

Dear Tom,
Can analytics be used for the following formats of the same query
sql>select ename,nvl(ename,'Name is null') from emp
sql>select ename,decode(ename,null,'Name is null',ename)
from emp
If you know other ways,Please let me know
Bye!

January 31, 2004 - 10:03 am UTC

umm, why ?

with analytics

A reader, February 18, 2004 - 7:30 am UTC

with the following data

-- ------
1 val1_1
1 val1_2
1 val1_3
2 val1_1
2 val2_2

can i produce

-- ------ --------------------
1 val1_1 val1_1,val1_2,val1_3
1 val1_2 val1_1,val1_2,val1_3
1 val1_3 val1_1,val1_2,val1_3
2 val2_1 val2_1,val2_2
2 val2_2 val2_1,val2_2

with an analytic that rocks

February 18, 2004 - 8:47 pm UTC

if

select max(count(*)) from t group by id

has a reasonable maximum -- yes, but it would be a trick lag/lead thing.

I would probably join using stragg. join the details to the aggregate using inline views.

OK

Siddiq, March 01, 2004 - 9:26 am UTC

Hi Tom,
What can be the business use cases of the analytic functions
1)cume_dist
2)percentile_dist
3)percentile_cont
Where can they be of immense use?
Bye!

March 01, 2004 - 10:17 am UTC

they are just statistical functions for analysis.

2 and 3 are really variations on eachother (disc=discrete, cont=continuous) and would be used to compute pctcentiles (like you might see on an SAT test report from back in high school). percentile_* can be used to find a median for example :)

cume_dist is a variation on that. I'll cheat on an example, from the doc:

Analytic Example

The following example calculates the salary percentile for each employee in the purchasing area. For example, 40% of clerks have salaries less than or equal to Himuro.

SELECT job_id, last_name, salary, CUME_DIST() OVER (PARTITION BY job_id ORDER BY salary) AS cume_dist FROM employees WHERE job_id LIKE PU% ;

JOB_ID LAST_NAME SALARY CUME_DIST
---------- ------------------------- ---------- ----------
PU_CLERK Colmenares 2500 .2
PU_CLERK Himuro 2600 .4
PU_CLERK Tobias 2800 .6
PU_CLERK Baida 2900 .8
PU_CLERK Khoo 3100 1
PU_MAN Raphaely 11000 1

Stumped on Analytics

Dave Thompson, March 04, 2004 - 9:56 am UTC

Hi Tom,

I have the following two tables:

CREATE TABLE PAY_M
(
PAY_ID NUMBER,
PAYMENT NUMBER
)

--
--

CREATE TABLE PREM
(
PREM_ID NUMBER,
PREM_PAYMENT NUMBER
)

With the following data:

INSERT INTO PREM ( PREM_ID, PREM_PAYMENT ) VALUES (
1, 100);
INSERT INTO PREM ( PREM_ID, PREM_PAYMENT ) VALUES (
2, 50);
INSERT INTO PREM ( PREM_ID, PREM_PAYMENT ) VALUES (
3, 50);
INSERT INTO PREM ( PREM_ID, PREM_PAYMENT ) VALUES (
4, 50);
COMMIT;
INSERT INTO PAY_M ( PAY_ID, PAYMENT ) VALUES (
1, 50);
INSERT INTO PAY_M ( PAY_ID, PAYMENT ) VALUES (
2, 25);
INSERT INTO PAY_M ( PAY_ID, PAYMENT ) VALUES (
3, 50);
INSERT INTO PAY_M ( PAY_ID, PAYMENT ) VALUES (
4, 50);
COMMIT;

PAY_M contains payments made against the premiums in the table prem.

Payments:

PAY_ID PAYMENT
---------- ----------
1 50
2 25
3 50
4 50

Prem:

PREM_ID PREM_PAYMENT
---------- ------------
1 100
2 50
3 50
4 50

We are trying to find which payment Ids paid each premium payment in Prem. The payments are assigned sequentially to the premiums.

For example payments 1,2 & 3 pay off the £100 in premium 1 leaving £25. Then the remaining payment from payment 3 & payment 4 pay off premium 2 leaving a balance of £25, and so on.

We are trying to create a query that will use the analytical functions to find all the payment IDs that pay off the associated premium ids. We want to keep this SQL based as we need to Process about 30 million payments!

Thanks.

Great website, hope you enjoyed your recent visit to the UK.

March 04, 2004 - 1:52 pm UTC

let me make sure I have this straight -- you want to

o sum up the first 3 records in payments
o discover they are 125 which exceeds 100
o output the fact that prem_id 1 is paid for by pay_id 1..3
o carry forward 25 from 3, discover that leftover 3+4 = 75 pays for prem_id 2
with 25 extra

while I believe (not sure) that the 10g MODEL clause might be able to do this (if you can do it in a spreadsheet, we can use the MODEL clause to do it).....

I'm pretty certain that analytics cannot -- we would need to recursively use lag (eg: after finding that 1,2,3 pay off 1, we'd need to -- well, it's hard to explain...)

I cannot see analytics doing this -- future rows depend on functions of the analytics from past rows and that is just "not allowed".

I can see how to do this in a pipelined PLSQL function -- will that work for you?

Oops - Error in previous post

Dave Thompson, March 04, 2004 - 10:17 am UTC

Tom,

Sorry, ignore the above tables as they are missing the joining column:

CREATE TABLE PAY_M
(
  PREM_ID  NUMBER,
  PAY_ID   NUMBER,
  PAYMENT  NUMBER
)

INSERT INTO PAY_M ( PREM_ID, PAY_ID, PAYMENT ) VALUES ( 
1, 1, 50); 
INSERT INTO PAY_M ( PREM_ID, PAY_ID, PAYMENT ) VALUES ( 
1, 2, 25); 
INSERT INTO PAY_M ( PREM_ID, PAY_ID, PAYMENT ) VALUES ( 
1, 3, 50); 
INSERT INTO PAY_M ( PREM_ID, PAY_ID, PAYMENT ) VALUES ( 
1, 4, 50); 
COMMIT;

CREATE TABLE PREM
(
  PREM_ID       NUMBER,
  PAY_ID        NUMBER,
  PREM_PAYMENT  NUMBER
)

INSERT INTO PREM ( PREM_ID, PAY_ID, PREM_PAYMENT ) VALUES ( 
1, 1, 100); 
INSERT INTO PREM ( PREM_ID, PAY_ID, PREM_PAYMENT ) VALUES ( 
1, 2, 50); 
INSERT INTO PREM ( PREM_ID, PAY_ID, PREM_PAYMENT ) VALUES ( 
1, 3, 50); 
INSERT INTO PREM ( PREM_ID, PAY_ID, PREM_PAYMENT ) VALUES ( 
1, 4, 50); 
COMMIT;


SQL> l
  1  SELECT *
  2* FROM   PAY_M
SQL> /

   PREM_ID     PAY_ID    PAYMENT
---------- ---------- ----------
         1          1         50
         1          2         25
         1          3         50
         1          4         50

SQL> select *
  2  from prem;

   PREM_ID     PAY_ID PREM_PAYMENT
---------- ---------- ------------
         1          1          100
         1          2           50
         1          3           50
         1          4           50

Thanks.....

Dave Thompson, March 05, 2004 - 4:23 am UTC

Tom,

Thanks for your prompt response.

I am familiar with Pipeline functions.

I was however hoping we could do this as a set based opertion because of the volume of data involved.

Thanks for your time.

analytics book

Ron Chennells, March 05, 2004 - 5:52 am UTC

Just another vote and pre order for the analytics book

OK

Gerhard, March 19, 2004 - 12:33 am UTC

Dear Tom,
 I used the following query to find the difference of salaries between employees.

SQL> select ename,sal,sal-lag(sal) over(order by sal) as diff_sal from emp;

ENAME             SAL   DIFF_SAL                                                
---------- ---------- ----------                                                
SMITH             800                                                           
JAMES             950        150                                                
ADAMS            1100        150                                                
WARD             1250        150                                                
MARTIN           1250          0                                                
MILLER           1300         50                                                
TURNER           1500        200                                                
ALLEN            1600        100                                                
CLARK            2450        850                                                
BLAKE            2850        400                                                
JONES            2975        125                                                

ENAME             SAL   DIFF_SAL                                                
---------- ---------- ----------                                                
SCOTT            3000         25                                                
FORD             3000          0                                                
KING             5000       2000                                                

14 rows selected.

My Question is:
" What is the difference between King's sal with other 
employees?".Could you please help with the query?
Bye!

March 19, 2004 - 8:58 am UTC

scott@ORA9IR2> select ename,sal,sal-lag(sal) over(order by sal) as diff_sal ,
2 sal-king_sal king_sal_diff
3 from (select sal king_sal from emp where ename = 'KING'),
4 emp
5 /

ENAME SAL DIFF_SAL KING_SAL_DIFF
---------- ---------- ---------- -------------
SMITH 800 -4200
JAMES 950 150 -4050
ADAMS 1100 150 -3900
WARD 1250 150 -3750
MARTIN 1250 0 -3750
MILLER 1300 50 -3700
TURNER 1500 200 -3500
ALLEN 1600 100 -3400
CLARK 2450 850 -2550
BLAKE 2850 400 -2150
JONES 2975 125 -2025
SCOTT 3000 25 -2000
FORD 3000 0 -2000
KING 5000 2000 0

14 rows selected.

Will this be faster?

Venkat, March 19, 2004 - 4:20 pm UTC

select ename, sal,
sal-lag(sal) over(order by sal) as diff_sal,
sal - max(case when ename='KING' then sal
else null end) over () king_sal_diff
from emp

March 20, 2004 - 9:47 am UTC

when you benchmarked it and tested it to scale, what did you see? it would be interesting no?

lead/lag on different dataset

Stalin, May 03, 2004 - 9:22 pm UTC

Hi Tom,

I've similar requirement but i'm not sure how to use lead or lag to refer from a different dataset.

Eg. logs table has both login and logout information and they are identified by action column. There could be different login/logout modes so records that have action in (1,2) and (3,4,5,6,7) values are login and logout records respectively. Now i need to find signon and signout times and also session duration in mins.

here is some sample data of logs table :

LOG_ID LOG_CREATION_DATE USER_ID SERVICE ACTION
---------- ------------------- ---------- ---------- ----------
1 04/29/2004 10:48:36 3 5 2
3 04/29/2004 10:53:44 3 5 3
5 04/29/2004 11:11:35 3 5 1
1003 05/03/2004 15:18:53 3 5 5
1004 05/03/2004 15:19:50 8 5 1

here is a query i came up with (not exacly what i want) :

select log_id signon_id, lead(log_id, 1) over (partition by account_id, user_id, mac order by log_id) signoff_id,
user_id, log_creation_date signon_date,
lead(log_creation_date, 1) over (partition by account_id, user_id, mac order by log_creation_date) signoff_date,
nvl(round(((lead(log_creation_date, 1)
over (partition by account_id, user_id order by log_creation_date)-log_creation_date)*1440), 2), 0) Usage_Mins
from logs
where account_id = 'Robert'
and service = 5
order by user_id

desired output :

SIGNON_ID SIGNOFF_ID USER_ID SIGNON_DATE SIGNOFF_DATE USAGE_MINS
---------- ---------- ---------- ------------------- ------------------- ----------
1 3 3 04/29/2004 10:48:36 04/29/2004 10:53:44 5.13
5 1003 3 04/29/2004 11:11:35 05/03/2004 15:18:53 6007.3
1004 8 05/03/2004 15:19:50 0

Thanks in Advance,
Stalin

May 04, 2004 - 7:11 am UTC

maybe if you supply simple create table and insert ... values ... statements for me.... this stuff would go faster.

Your query references columns that are not in the example as well.

Create table scripts

Stalin, May 04, 2004 - 1:29 pm UTC

Sorry for not giving this info in the first place.

here goes the scripts....

create table logs (log_id number, log_creation_date date, account_id varchar2(25), user_id number,
service number, action number, mac varchar2(50))
/

insert into logs values (1, to_date('04/29/2004 10:48:36'), 'Robert', 3, 5, 2, '00-00-00-00')
/
insert into logs values (3, to_date('04/29/2004 10:53:44'), 'Robert', 3, 5, 3, '00-00-00-00')
/
insert into logs values (5, to_date('04/29/2004 11:11:35'), 'Robert', 3, 5, 1, '00-00-00-00')
/
insert into logs values (1003, to_date('05/03/2004 15:18:53'), 'Robert', 3, 5, 5, '00-00-00-00')
/
insert into logs values (1004, to_date('05/03/2004 15:19:50'), 'Robert', 8, 5, 1, '00-00-00-00')
/

The reason for including mac in the partition group is cause users can login via multiple pc's without logging out hence i grouped it on account_id, user_id and mac.

Thanks,
Stalin

May 04, 2004 - 2:38 pm UTC

ops$tkyte@ORA9IR2> select a.* , round( (signoff_date-signon_date) * 24 * 60, 2 ) minutes
  2    from (
  3  select log_id,
  4         case when action in (1,2) and lead(action) over (partition by account_id,user_id,mac order by log_creation_date) in (3,4,5,6,7)
  5              then lead(log_id) over (partition by account_id, user_id, mac order by log_creation_date)
  6          end signoff_id,
  7         user_id,
  8         log_creation_date signon_date,
  9         case when action in (1,2) and lead(action) over (partition by account_id,user_id,mac order by log_creation_date) in (3,4,5,6,7)
 10              then lead(log_creation_date) over (partition by account_id, user_id, mac order by log_creation_date)
 11          end signoff_date,
 12                  action
 13  from   logs
 14  where  account_id = 'Robert'
 15  and    service = 5
 16  order  by user_id
 17         ) a
 18   where action in (1,2)
 19  /
 
    LOG_ID SIGNOFF_ID    USER_ID SIGNON_DATE         SIGNOFF_DATE            ACTION    MINUTES
---------- ---------- ---------- ------------------- ------------------- ---------- ----------
         1          3          3 04/29/2004 10:48:36 04/29/2004 10:53:44          2       5.13
         5       1003          3 04/29/2004 11:11:35 05/03/2004 15:18:53          1     6007.3
      1004                     8 05/03/2004 15:19:50                              1

Excellent

Stalin, May 04, 2004 - 3:42 pm UTC

This is exactly what i'm looking for.

Thanks so much!

Help On SQL

VKOUL, May 04, 2004 - 8:05 pm UTC

I want to substitute the non null value of a column to its null column. e.g.

If I have records like the following

year month column_value
----- ------ --------------------
2002 06 55
2002 06 57
2002 07 NULL
2002 08 NULL
2002 09 NULL
2002 10 100
2002 11 101

I want the results as below

year month column_value
----- ------ --------------------
2002 06 55
2002 06 57
2002 07 57 ------> Repeated
2002 08 57 ------> Repeated
2002 09 57 ------> Repeated
2002 10 100
2002 11 101

May 04, 2004 - 9:08 pm UTC

create table,
insert into table

much appreciated......... (so i don't spend days of my life making create tables and insert into statements. I've added this request to all pages where you can input stuff and I'll just be asking for it from now on in...... Not picking on you, just reminding everyone that i need a script like I provide.....)

but..... asked and answered:

</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:10286792840956 <code>

Help On SQL

VKoul, May 04, 2004 - 11:27 pm UTC

Beautiful !!!

I'll keep in mind "create table etc."

Thanks

VKoul

analytic q

A reader, May 11, 2004 - 6:38 pm UTC

Tom
Please look at the following schema and data.
---------
spool schema
set echo on
drop table host_instances;
drop table rac_instances;
drop table instance_tablespaces;

create table host_instances
(
host_name varchar2(50),
instance_name varchar2(50)
);

create table rac_instances
(
rac_name varchar2(50),
instance_name varchar2(50)
);

create table instance_tablespaces
(
instance_name varchar2(50),
tablespace_name varchar2(50),
tablespace_size number
);

-- host to instance mapping data
insert into host_instances values ( 'h1', 'i1' );
insert into host_instances values ( 'h2', 'i2' );
insert into host_instances values ( 'h3', 'i3' );
insert into host_instances values ( 'h4', 'i4' );
insert into host_instances values ( 'h5', 'i5' );

-- rac to instance mapping data

insert into rac_instances values ( 'rac1', 'i1' );
insert into rac_instances values ( 'rac1', 'i2' );
insert into rac_instances values ( 'rac2', 'i3' );
insert into rac_instances values ( 'rac2', 'i4' );

--- instance to tablespace mapping data
insert into instance_tablespaces values( 'i1', 't11', 100 );
insert into instance_tablespaces values( 'i1', 't12', 200 );
insert into instance_tablespaces values( 'i2', 't11', 100 );
insert into instance_tablespaces values( 'i2', 't12', 200 );
insert into instance_tablespaces values( 'i3', 't31', 500 );
insert into instance_tablespaces values( 'i3', 't32', 300 );
insert into instance_tablespaces values( 'i4', 't31', 500 );
insert into instance_tablespaces values( 'i4', 't32', 300 );
insert into instance_tablespaces values( 'i5', 't51', 400 );
commit;
---------

What I need is to sum up all tablespaces of all instances
for a list of hosts. However, if two hosts in the list
belong to a RAC then I should only pick one of the
hosts (I can pick any one of them.)
e.g. in the above data I should only pick i1 or i2 not
both since they both belong to the same RAC 'rac1'.

Following is the select I came up with for the above data.
Let me know if you have any comments on it.
Any other alternative solutions you can think of would
also be educating to me. I have not benchmarked this
select yet. The number of hosts could reach up to 2000
approximately. On an average we can assume each will have
one instance - some of these will be RACs.

Thank you!
-----------
scott@ora10g> set echo on
scott@ora10g> column host_name format a10
scott@ora10g> column instance_name format a10
scott@ora10g> column rac_name format a10
scott@ora10g> column row_number format 999
scott@ora10g>
scott@ora10g> select a.instance_name, sum( tablespace_size )
2 from
3 (
4 select instance_name
5 from
6 (
7 select host_name, instance_name, rac_name,
8 row_number() over
9 (
10 partition by rac_name
11 order by rac_name, instance_name
12 ) row_number
13 from
14 (
15 select hi.host_name, hi.instance_name, ri.rac_name
16 from host_instances hi, rac_instances ri
17 where hi.instance_name = ri.instance_name(+)
18 )
19 )
20 where row_number <= 1
21 ) a, instance_tablespaces e
22 where a.instance_name = e.instance_name
23 group by a.instance_name;

i1 300
i3 800
i5 400

---

Also do you prefer the .sql file (as above) or
the spooled output of schema.sql (i.e. schema.lst.)
The above is more convenient to reproduce - but the spooled output makes for better reading in some cases.

May 11, 2004 - 9:15 pm UTC

I like the cut and paste from sqlplus truth be told.

sure, I have to do two vi commands and a couple of deletes to fix it up but.... I'm fairly certain that the poster *actually ran the commands successfully!* which is most relevant to me....

Besides, I do it to you ;)




ops$tkyte@ORA9IR2> select *
  2    from (
  3  select h.host_name, h.instance_name, r.rac_name, sum(t.tablespace_size),
  4         row_number() over (partition by r.rac_name order by h.host_name ) rn
  5    from host_instances h,
  6             rac_instances r,
  7             instance_tablespaces t
  8   where h.instance_name = r.instance_name(+)
  9     and h.instance_name = t.instance_name
 10   group by h.host_name, h.instance_name, r.rac_name
 11         )
 12   where rn = 1
 13  /
 
HO IN RAC_N SUM(T.TABLESPACE_SIZE)         RN
-- -- ----- ---------------------- ----------
h1 i1 rac1                     300          1
h3 i3 rac2                     800          1
h5 i5                          400          1


is the first thing that popped into my head.

with just a couple hundred rows -- any of them will perform better than good enough.

thanx!

A reader, May 11, 2004 - 9:54 pm UTC

"I like the cut and paste from sqlplus truth be told."
Actually I was going to post that only - but your
example at the point of posting led me to believe
that you want a straight sql - may be you wanna
fix that (not that many people seem to care anyways!:))

Thanx for the sql - it looks good and a tad simpler
than the one I wrote...

How to compute this running total (sort of...)

Kishan, May 18, 2004 - 11:33 am UTC

create table investment (
investment_id number,
asset_id number,
agreement_id number,
constraint pk_i primary key (investment_id)
)
/
create table period (
period_id number,
business_domain varchar2(10),
status_code varchar2(10),
constraint pk_p primary key (period_id)
)
/
create table entry (
entry_id number,
period_id number,
investment_id number,
constraint pk_e primary key(entry_id),
constraint fk_e_period foreign key(period_id) references period(period_id),
constraint fk_e_investment foreign key (investment_id) references investment(investment_id)
)
/
create table entry_detail(
entry_id number,
account_type varchar2(10),
amount number,
constraint pk_ed primary key(entry_id, account_type),
constraint fk_ed_entry foreign key(entry_id) references entry(entry_id)
)
/
insert into period (period_id, business_domain, status_code)
SELECT rownum AS period_id,
'BDG' AS business_domain,
'2' AS status_code
from all_objects where rownum <= 5
/

insert into investment(investment_id, asset_id, agreement_id)
select rownum+10 AS investment_id,
rownum+100 AS asset_id,
rownum+1000 AS agreement_id
from all_objects where rownum <=5
/
insert into entry(entry_id, period_id, investment_id) values (1, 1, 11)
/
insert into entry(entry_id, period_id, investment_id) values (2, 2, 11)
/
insert into entry(entry_id, period_id, investment_id) values (3, 3, 11)
/
insert into entry(entry_id, period_id, investment_id) values (4, 3, 13)
/
insert into entry(entry_id, period_id, investment_id) values (5, 4, 13)
/
insert into entry(entry_id, period_id, investment_id) values (6, 4, 14)
/
insert into entry(entry_id, period_id, investment_id) values (7, 5, 14)
/

insert into entry_detail(entry_id, account_type, amount) values(1, 'AC1', 1000 )
/
insert into entry_detail(entry_id, account_type, amount) values(1, 'AC2', -200 )
/
insert into entry_detail(entry_id, account_type, amount) values(1, 'AC3', 300 )
/
insert into entry_detail(entry_id, account_type, amount) values(2, 'AC1', 200 )
/
insert into entry_detail(entry_id, account_type, amount) values(2, 'AC4', -1000 )
/
insert into entry_detail(entry_id, account_type, amount) values(2, 'AC2', -500 )
/
insert into entry_detail(entry_id, account_type, amount) values(3, 'AC2', 2200 )
/
insert into entry_detail(entry_id, account_type, amount) values(3, 'AC1', 200 )
/
insert into entry_detail(entry_id, account_type, amount) values(4, 'AC4', -1000 )
/
insert into entry_detail(entry_id, account_type, amount) values(4, 'AC2', -500 )
/
insert into entry_detail(entry_id, account_type, amount) values(5, 'AC2', 2200 )
/
insert into entry_detail(entry_id, account_type, amount) values(6, 'AC1', 200 )
/
insert into entry_detail(entry_id, account_type, amount) values(6, 'AC4', -1000 )
/
insert into entry_detail(entry_id, account_type, amount) values(6, 'AC2', -500 )
/
insert into entry_detail(entry_id, account_type, amount) values(7, 'AC1', 2200 )
/
insert into entry_detail(entry_id, account_type, amount) values(7, 'AC3', 500 )
/
insert into entry_detail(entry_id, account_type, amount) values(7, 'AC4', 1200 )
/

scott@LDB.US.ORACLE.COM> select * from period;

PERIOD_ID BUSINESS_D STATUS_COD
---------- ---------- ----------
1 BDG 2
2 BDG 2
3 BDG 2
4 BDG 2
5 BDG 2

scott@LDB.US.ORACLE.COM> select * from investment;

INVESTMENT_ID ASSET_ID AGREEMENT_ID
------------- ---------- ------------
11 101 1001
12 102 1002
13 103 1003
14 104 1004
15 105 1005

scott@LDB.US.ORACLE.COM> select * from entry;

ENTRY_ID PERIOD_ID INVESTMENT_ID
---------- ---------- -------------
1 1 11
2 2 11
3 3 11
4 3 13
5 4 13
6 4 14
7 5 14

7 rows selected.

scott@LDB.US.ORACLE.COM> select * from entry_detail;

ENTRY_ID ACCOUNT_TY AMOUNT
---------- ---------- ----------
1 AC1 1000
1 AC2 -200
1 AC3 300
2 AC1 200
2 AC4 -1000
2 AC2 -500
3 AC2 2200
3 AC1 200
4 AC4 -1000
4 AC2 -500
5 AC2 2200
6 AC1 200
6 AC4 -1000
6 AC2 -500
7 AC1 2200
7 AC3 500
7 AC4 1200

17 rows selected.

The resultant view needed is given below.

To give an example from the result below, the first entry for investment_id 14
is from period 4. The account types entered on period 4 are AC1, AC4, AC2. We
need these three account types in all subsequent periods. Also, on period 5 a
new account type AC3 is added. So, if there is another period, say period_id 6, we need
information for AC1, AC2, AC3, AC4 (that's 4 account types). If there's no entry
for any of these account_types for any subseqent periods, the amount_for_period for such
periods are considered to be 0.00 and the balance will be sum(amount_for_period)
until that period.

PERIOD_ID INVESTMENT_ID ACCOUNT_TYPE AMOUNT_FOR_PERIOD BALANCE_TILL_PERIOD
--------- ------------- ------------ ----------------- -------------------
1 11 AC1 1000 1000
1 11 AC2 -200 -200
1 11 AC3 300 300

2 11 AC1 200 1200
2 11 AC2 -500 -700
2 11 AC3 0 300
2 11 AC4 -1000 -1000

3 11 AC1 200 1400
3 11 AC2 200 -500
3 11 AC3 0 300
3 11 AC4 0 1000

4 11 AC1 0 1400
4 11 AC2 0 -500
4 11 AC3 0 300
4 11 AC4 0 1000

5 11 AC1 0 1400
5 11 AC2 0 -500
5 11 AC3 0 300
5 11 AC4 0 1000

3 13 AC4 -1000 -1000
3 13 AC2 -500 -500

4 13 AC4 0 -1000
4 13 AC2 -500 -1000

5 13 AC4 0 -1000
5 13 AC4 0 -1000

4 14 AC1 200 200
4 14 AC4 -1000 -1000
4 14 AC2 -500 -500

5 14 AC1 2200 2400
5 14 AC3 500 500
5 14 AC4 1200 200
5 14 AC2 0 -500

The blank lines in between are just for clarity. As always, grateful for all your efforts.

Regards,
Kishan.

May 18, 2004 - 6:14 pm UTC

so, what does your first try look like :) at least get the join written up for the details - maybe the running total will be obvious from that.

This is how far I went...and no further

Kishan, May 19, 2004 - 10:18 am UTC

select distinct period_id,
investment_id,
account_type,
amount_for_period,
balance_till_period
from ( select period.period_id,
entry.investment_id,
entry_detail.account_type,
(case when entry.period_id = period.period_id then entry_detail.amount else 0 end) amount_for_period,
sum(amount) over(partition by period.period_id, investment_id, account_type) balance_till_period
from period left outer join (entry join entry_detail on (entry.entry_id = entry_detail.entry_id)) on (entry.period_id <= period.period_id))
order by investment_id

The result looks as below:

PERIOD_ID INVESTMENT_ID ACCOUNT_TY AMOUNT_FOR_PERIOD BALANCE_TILL_PERIOD
---------- ------------- ---------- ----------------- -------------------
1 11 AC1 1000 1000
1 11 AC2 -200 -200
1 11 AC3 300 300

2 11 AC1 0 1200
2 11 AC1 200 1200
2 11 AC2 -500 -700
2 11 AC2 0 -700
2 11 AC3 0 300
2 11 AC4 -1000 -1000

3 11 AC1 0 1400
3 11 AC1 200 1400
3 11 AC2 0 1500
3 11 AC2 2200 1500
3 11 AC3 0 300
3 11 AC4 0 -1000

4 11 AC1 0 1400
4 11 AC2 0 1500
4 11 AC3 0 300
4 11 AC4 0 -1000

5 11 AC1 0 1400
5 11 AC2 0 1500
5 11 AC3 0 300
5 11 AC4 0 -1000

3 13 AC2 -500 -500
3 13 AC4 -1000 -1000

4 13 AC2 0 1700
4 13 AC2 2200 1700
4 13 AC4 0 -1000

5 13 AC2 0 1700
5 13 AC4 0 -1000

4 14 AC1 200 200
4 14 AC2 -500 -500
4 14 AC4 -1000 -1000

5 14 AC1 0 2400
5 14 AC1 2200 2400
5 14 AC2 0 -500
5 14 AC3 500 500
5 14 AC4 0 200
5 14 AC4 1200 200

First, I am sorry my originally constructed result (by hand..;) misses a couple of rows .
However, other than that, I am unable to remove the redundant rows that are shows up for the particular investment and accout_type for a period as the logic beats me.

Basically, I need to remove rows where the amount_for_period is 0 for an account_type only if its a redundant row for that set. That is, the first row of period_id 2 and 3 are redundant but the rows for the period 4 are not redundant.

Could you help me out?

Regards,
Kishan.

May 19, 2004 - 11:06 am UTC

are we missing some more order bys? I mean -- what if:

3 11 AC1 0 1400
3 11 AC1 200 1400
3 11 AC2 0 1500
3 11 AC2 2200 1500
3 11 AC3 0 300
3 11 AC4 0 -1000

was really:

3 11 AC1 200 1400
3 11 AC2 0 1500
3 11 AC2 2200 1500
3 11 AC3 0 300
3 11 AC4 0 -1000
3 11 AC1 0 1400

would that still be redundant? missing something here/

Yes...they are redundant

A reader, May 19, 2004 - 12:16 pm UTC

Tom:
Yes, for that particular set, those rows are redundant, no matter what the order is.

Regards,
Kishan.

May 19, 2004 - 2:24 pm UTC

ok, so what is the "key" of that result set? what can we partition the result set by.

my idea will be to use your query in an inline view and analytics on that to weed out what you want.

Kishan, May 19, 2004 - 3:08 pm UTC

The key would be period_id, investment_id and accout_type. Basically, what the result represents is the amount and the balance-to-date for a particular account_type of an investment_id for a period.

Eg: Period 1->Investment 1->Account_Type AC1->Amount=1000->Balance=1000

If there's no activity on that investment and account_type for the next period, say Period 2, the amount will be 0 for that period, and the balance will be previous period's balance.

Period 1->Investment 1->Account_Type AC1->Amount=1000->Balance=1000
Period 2->Investment 1->Account_Type AC1->Amount=0->Balance = 1000

But, if there's an activity on that account_type for that investment, then the amount will be the amount for that period and balance will be the sum of previous balance and current amount. Say for Period 2, the amount is 500, then

Period 1->Investment 1->Account_Type AC1->Amount=1000-> Balance=1000
Period 2->Investment 1->Account_Type AC1->Amount=500-> Balance=1500

And if there's a new account type entry, say AC2 and amount, say 2000 created for period 2, then the result set will be

Period 1->Investment 1->Account_Type AC1->Amount=1000->Balance=1000
Period 2->Investment 1->Account_Type AC1->Amount=500->Balance=1500
Period 2->Investment 1->Account_Type AC2->Amount=2000->Balance=2000

There may be many investments per period and many account_types per investment. Hope I am clear....

Regards,
Kishan.

May 19, 2004 - 5:34 pm UTC

so... if you have:

PERIOD_ID INVESTMENT_ID ACCOUNT_TY AMOUNT_FOR_PERIOD BALANCE_TILL_PERIOD
---------- ------------- ---------- ----------------- -------------------
1 11 AC1 1000 1000
1 11 AC2 -200 -200
1 11 AC3 300 300

2 11 AC1 0 1200
2 11 AC1 200 1200
2 11 AC2 -500 -700
2 11 AC2 0 -700
2 11 AC3 0 300
2 11 AC4 -1000 -1000

you see though, why isn't the 4th line here "redundant" then?

But it is redundant..

Kishan, May 19, 2004 - 11:51 pm UTC

Tom, I am assuming the 4th line you mention is 2->11->AC2->0->-700. Yes, it is redundant.

We need amount and balance for every period_id, investment_id and account_type. One line, per period_id, investment_id and account_type, anything more, is redundant.

Issue is, there may not be entries for a specific account_type of an investment for a particular period. In such cases, we need to assume amount for such periods are 0 and compute the balances accordingly.

Regards,
Kishan

May 20, 2004 - 10:55 am UTC

so, if you partition by

PERIOD_ID INVESTMENT_ID ACCOUNT_TY BALANCE_TILL_PERIOD

order by
AMOUNT_FOR_PERIOD

select a.*, lead(amount_for_period) over (partition by .... order by ... ) nxt
from (YOUR_QUERY)

you can then

select *
from (that_query)
where nxt is NULL or (nxt is not null and amount_for_period <> 0)

if nxt is null -- last row in the partition, keep it.
if nxt is not null AND we are zero -- remove it.

Almost there?

Dave Thompson, May 20, 2004 - 12:30 pm UTC

Hi Tom,

We have the following table of data:

CREATE TABLE DEDUP_TEST
(
ID NUMBER,
COLUMN_A VARCHAR2(10 BYTE),
COLUMN_B VARCHAR2(10 BYTE),
COLUMN_C VARCHAR2(10 BYTE),
START_DATE DATE,
END_DATE DATE
)

With:

INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
1, 'A', 'B', 'C', TO_Date( '10/01/1999 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '10/01/2000 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
1, 'D', 'B', 'C', TO_Date( '10/01/2001 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '10/01/2002 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
1, 'A', 'B', 'C', TO_Date( '10/01/2002 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '10/01/2003 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
2, 'a', 'f', 'f', TO_Date( '02/06/2004 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '02/07/2004 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
2, 'A', 'B', 'B', TO_Date( '10/01/2000 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '10/01/2001 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
2, 'A', 'B', 'B', TO_Date( '10/01/2001 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '10/01/2003 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
2, 'A', 'B', 'B', TO_Date( '10/02/2001 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '10/05/2003 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
2, 'A', 'B', 'B', TO_Date( '10/02/2005 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '10/03/2005 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
2, 'A', 'B', 'B', TO_Date( '10/04/2005 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '10/06/2005 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
3, 'A', 'F', 'F', TO_Date( '02/10/2004 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '02/20/2004 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
COMMIT;

We are trying to sequentially de-duplicate this data.

Basically from the top of the table we go down and the check each row against the previous. If they are the same the row that is a duplicate is marked as such as is the original row.

So far we have this query:

SELECT ID,
COLUMN_A,
COLUMN_B,
COLUMN_C,
START_DATE,
END_DATE,
CASE WHEN ( DUP = 'DUP' OR DUPER = 'DUP' ) THEN 'DUP' ELSE 'NOT' END LETSEE
FROM (
SELECT ID,
COLUMN_A,
COLUMN_B,
COLUMN_C,
START_DATE,
END_DATE,
DUP,
CASE WHEN COLUMN_A = NEXT_A
AND COLUMN_B = NEXT_B
AND COLUMN_C = NEXT_C THEN 'DUP' ELSE 'NOT' END DUPER
FROM (
SELECT ID,
COLUMN_A,
COLUMN_B,
COLUMN_C,
START_DATE,
END_DATE,
NEXT_A,
NEXT_B,
NEXT_C,
CASE WHEN COLUMN_A = PREV_A
AND COLUMN_B = PREV_B
AND COLUMN_C = PREV_C THEN 'DUP' ELSE 'NOT' END DUP
FROM ( SELECT ID,
COLUMN_A,
COLUMN_B,
COLUMN_C,
START_DATE,
END_DATE,
LAG (COLUMN_A, 1, 0) OVER (ORDER BY ID) AS prev_A,
LAG (COLUMN_B, 1, 0) OVER (ORDER BY ID) AS prev_B,
LAG (COLUMN_C, 1, 0) OVER (ORDER BY ID) AS prev_C,
LEAD (COLUMN_A, 1, 0) OVER (ORDER BY ID) AS next_A,
LEAD (COLUMN_B, 1, 0) OVER (ORDER BY ID) AS next_B,
LEAD (COLUMN_C, 1, 0) OVER (ORDER BY ID) AS next_C
FROM DEDUP_TEST
ORDER
BY 1, 5 ) ) )

ID COLUMN_A COLUMN_B COLUMN_C START_DAT END_DATE LET
---------- ---------- ---------- ---------- --------- --------- ---
1 A B C 01-OCT-99 01-OCT-00 NOT
1 D B C 01-OCT-01 01-OCT-02 NOT
1 A B C 01-OCT-02 01-OCT-03 NOT
2 A B B 01-OCT-00 01-OCT-01 DUP
2 A B B 01-OCT-01 01-OCT-03 DUP
2 A B B 02-OCT-01 05-OCT-03 DUP
2 a f f 06-FEB-04 07-FEB-04 NOT
2 A B B 02-OCT-05 03-OCT-05 DUP
2 A B B 04-OCT-05 06-OCT-05 DUP
3 A F F 10-FEB-04 20-FEB-04 NOT

The resultset from this is almost what I am after.

However where there are groups of duplicate rows I only want to return one row. I take the attributes, the start_date of the first row duplicated and the end_date of the last row duplicated.

I do not want to group all the duplicates together, so for example the rows with the attributes

ID COLUMN_A COLUMN_B COLUMN_C

2 A B B

will result in two output rows:

2 A B B 01-OCT-00 01-OCT-03

2 A B B 02-OCT-05 06-OCT-05

This is the final piece I cannot work out.

Any help would be appreciated.

Thanks.

May 20, 2004 - 2:18 pm UTC

what happens in your data if you had

1 A1 B1 C1 ....
1 A2 B2 C2 ....
1 A1 B1 C1 ....

that might or might not be "dup" since you just order by ID? don't we need to ordedr by a,b, and c?

Follow up

Dave Thompson, May 21, 2004 - 5:02 am UTC

Hi Tom,

In repsonse to your question:

what happens in your data if you had

1 A1 B1 C1 ....
1 A2 B2 C2 ....
1 A1 B1 C1 ....

Then the first row would be classed as unique, as would the second and the third. We are only looking at duplicates that occur sequentially.

Sequential duplicates are then turned into one row by taking the start date of the first row and the end date of the last row in the group.

The test data should have had sequential dates:

INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
1, 'A', 'B', 'C', TO_Date( '10/01/1999 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '10/01/2000 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
1, 'D', 'B', 'C', TO_Date( '10/01/2001 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '10/01/2002 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
1, 'A', 'B', 'C', TO_Date( '10/01/2002 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '10/01/2003 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
2, 'a', 'f', 'f', TO_Date( '02/06/2009 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '02/07/2010 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
2, 'A', 'B', 'B', TO_Date( '10/01/2003 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '10/01/2004 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
2, 'A', 'B', 'B', TO_Date( '10/01/2005 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '10/01/2006 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
2, 'A', 'B', 'B', TO_Date( '10/02/2007 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '10/05/2008 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
2, 'A', 'B', 'B', TO_Date( '10/02/2011 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '10/03/2012 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
2, 'A', 'B', 'B', TO_Date( '10/04/2013 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '10/06/2014 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
3, 'A', 'F', 'F', TO_Date( '02/10/2014 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '02/20/2015 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
COMMIT;

CREATE TABLE DEDUP_TEST
(
ID NUMBER,
COLUMN_A VARCHAR2(10 BYTE),
COLUMN_B VARCHAR2(10 BYTE),
COLUMN_C VARCHAR2(10 BYTE),
START_DATE DATE,
END_DATE DATE
)

The query:

SELECT ID,
COLUMN_A,
COLUMN_B,
COLUMN_C,
START_DATE,
END_DATE,
CASE WHEN ( DUP = 'DUP' OR DUPER = 'DUP' ) THEN 'DUP' ELSE 'NOT' END LETSEE
FROM (
SELECT ID,
COLUMN_A,
COLUMN_B,
COLUMN_C,
START_DATE,
END_DATE,
DUP,
CASE WHEN COLUMN_A = NEXT_A
AND COLUMN_B = NEXT_B
AND COLUMN_C = NEXT_C THEN 'DUP' ELSE 'NOT' END DUPER
FROM (
SELECT ID,
COLUMN_A,
COLUMN_B,
COLUMN_C,
START_DATE,
END_DATE,
NEXT_A,
NEXT_B,
NEXT_C,
CASE WHEN COLUMN_A = PREV_A
AND COLUMN_B = PREV_B
AND COLUMN_C = PREV_C THEN 'DUP' ELSE 'NOT' END DUP
FROM ( SELECT ID,
COLUMN_A,
COLUMN_B,
COLUMN_C,
START_DATE,
END_DATE,
LAG (COLUMN_A, 1, 0) OVER (ORDER BY ID) AS prev_A,
LAG (COLUMN_B, 1, 0) OVER (ORDER BY ID) AS prev_B,
LAG (COLUMN_C, 1, 0) OVER (ORDER BY ID) AS prev_C,
LEAD (COLUMN_A, 1, 0) OVER (ORDER BY ID) AS next_A,
LEAD (COLUMN_B, 1, 0) OVER (ORDER BY ID) AS next_B,
LEAD (COLUMN_C, 1, 0) OVER (ORDER BY ID) AS next_C
FROM DEDUP_TEST
ORDER
BY ID, START_DATE ) ) )

Gives:

ID COLUMN_A COLUMN_B COLUMN_C START_DAT END_DATE LET
---------- ---------- ---------- ---------- --------- --------- ---
1 A B C 01-OCT-99 01-OCT-00 NOT
1 D B C 01-OCT-01 01-OCT-02 NOT
1 A B C 01-OCT-02 01-OCT-03 NOT
2 A B B 01-OCT-03 01-OCT-04 DUP
2 A B B 01-OCT-05 01-OCT-06 DUP
2 A B B 02-OCT-07 05-OCT-08 DUP
2 a f f 06-FEB-09 07-FEB-10 NOT
2 A B B 02-OCT-11 03-OCT-12 DUP
2 A B B 04-OCT-13 06-OCT-14 DUP
3 A F F 10-FEB-14 20-FEB-15 NOT

From this the sequentially duplicated rows with the attributes a, b, c will become:

2 A B C 01-OCT-03 05-OCT-08

2 A B C 02-OCT-11 06-OCT-14

Thanks.

May 21, 2004 - 10:50 am UTC

define sequentially.

1 A1 B1 C1 ....
1 A2 B2 C2 ....
1 A1 B1 C1 ....

ordered by ID is the same (exact same) as:

1 A1 B1 C1 ....
1 A1 B1 C1 ....
1 A2 B2 C2 ....

and

1 A2 B2 C2 ....
1 A1 B1 C1 ....
1 A1 B1 C1 ....

and in fact, two runs of your query could return different answers given the SAME exact data. How to handle that, you must have something more to sort by.

Typo in previous post

Dave Thompson, May 21, 2004 - 5:56 am UTC

Tom,

The final output should be:

From this the sequentially duplicated rows with the attributes a, b, c will
become:

2 A B B 01-OCT-03 05-OCT-08

2 A B B 02-OCT-11 06-OCT-14

Thanks.

Order

Dave Thompson, May 21, 2004 - 10:57 am UTC

Hi Tom,

The order of the dataset should be on the ID and Start Date.

ID COLUMN_A COLUMN_B COLUMN_C START_DAT END_DATE LET
---------- ---------- ---------- ---------- --------- --------- ---
1 A B C 01-OCT-99 01-OCT-00 NOT
1 D B C 01-OCT-01 01-OCT-02 NOT
1 A B C 01-OCT-02 01-OCT-03 NOT
2 A B B 01-OCT-03 01-OCT-04 DUP
2 A B B 01-OCT-05 01-OCT-06 DUP
2 A B B 02-OCT-07 05-OCT-08 DUP
2 a f f 06-FEB-09 07-FEB-10 NOT
2 A B B 02-OCT-11 03-OCT-12 DUP
2 A B B 04-OCT-13 06-OCT-14 DUP
3 A F F 10-FEB-14 20-FEB-15 NOT

Thanks.

May 21, 2004 - 11:42 am UTC

Ok, your example doesn't do that -- it is "non-deterministic", given the same data, it could/would return two different answers at different times during the day!


so, i think you want one of these:

ops$tkyte@ORA9IR2> select *
  2    from (
  3  select id, a,b,c, start_date, end_date,
  4         case when (a = lag(a) over (order by id, start_date desc) and
  5                    b = lag(b) over (order by id, start_date desc) and
  6                    c = lag(c) over (order by id, start_date desc) )
  7              then row_number() over (order by id, start_date)
  8          end rn
  9    from v
 10         )
 11   where rn is null
 12  /
 
        ID A          B          C          START_DAT END_DATE          RN
---------- ---------- ---------- ---------- --------- --------- ----------
         1 A          B          C          01-OCT-99 01-OCT-00
         1 D          B          C          01-OCT-01 01-OCT-02
         1 A          B          C          01-OCT-02 01-OCT-03
         2 A          B          B          02-OCT-07 05-OCT-08
         2 a          f          f          06-FEB-09 07-FEB-10
         2 A          B          B          04-OCT-13 06-OCT-14
         3 A          F          F          10-FEB-14 20-FEB-15
 
7 rows selected.
 
ops$tkyte@ORA9IR2> select *
  2    from (
  3  select id, a,b,c, start_date, end_date,
  4         case when (a = lag(a) over (order by id, start_date) and
  5                    b = lag(b) over (order by id, start_date) and
  6                    c = lag(c) over (order by id, start_date) )
  7              then row_number() over (order by id, start_date)
  8          end rn
  9    from v
 10         )
 11   where rn is null
 12  /
 
        ID A          B          C          START_DAT END_DATE          RN
---------- ---------- ---------- ---------- --------- --------- ----------
         1 A          B          C          01-OCT-99 01-OCT-00
         1 D          B          C          01-OCT-01 01-OCT-02
         1 A          B          C          01-OCT-02 01-OCT-03
         2 A          B          B          01-OCT-03 01-OCT-04
         2 a          f          f          06-FEB-09 07-FEB-10
         2 A          B          B          02-OCT-11 03-OCT-12
         3 A          F          F          10-FEB-14 20-FEB-15
 
7 rows selected.

we just need to mark records that the preceding record is the "same" after sorting -- then nuke them.

More Info

Dave Thompson, May 21, 2004 - 12:25 pm UTC

Hi Tom,

Thanks for the prompt reply.

I re-wrote the base query:

SELECT ID,
COLUMN_A,
COLUMN_B,
COLUMN_C,
START_DATE,
END_DATE,
CASE WHEN ( DUP = 'DUP' OR DUPER = 'DUP' ) THEN 'DUP' ELSE 'NOT' END LETSEE
FROM (
SELECT ID,
COLUMN_A,
COLUMN_B,
COLUMN_C,
START_DATE,
END_DATE,
DUP,
CASE WHEN COLUMN_A = NEXT_A
AND COLUMN_B = NEXT_B
AND COLUMN_C = NEXT_C THEN 'DUP' ELSE 'NOT' END DUPER
FROM (
SELECT ID,
COLUMN_A,
COLUMN_B,
COLUMN_C,
START_DATE,
END_DATE,
NEXT_A,
NEXT_B,
NEXT_C,
CASE WHEN COLUMN_A = PREV_A
AND COLUMN_B = PREV_B
AND COLUMN_C = PREV_C THEN 'DUP' ELSE 'NOT' END DUP
FROM ( SELECT ID,
COLUMN_A,
COLUMN_B,
COLUMN_C,
START_DATE,
END_DATE,
ROWID ROWID_R,
LAG (COLUMN_A, 1, 0) OVER (ORDER BY ID, START_DATE) AS prev_A,
LAG (COLUMN_B, 1, 0) OVER (ORDER BY ID, START_DATE) AS prev_B,
LAG (COLUMN_C, 1, 0) OVER (ORDER BY ID, START_DATE) AS prev_C,
LEAD (COLUMN_A, 1, 0) OVER (ORDER BY ID, START_DATE) AS next_A,
LEAD (COLUMN_B, 1, 0) OVER (ORDER BY ID, START_DATE) AS next_B,
LEAD (COLUMN_C, 1, 0) OVER (ORDER BY ID, START_DATE) AS next_C
FROM DEDUP_TEST
ORDER
BY ID, START_DATE ) ) )

And got:

ID COLUMN_A COLUMN_B COLUMN_C START_DAT END_DATE LET
---------- ---------- ---------- ---------- --------- --------- ---
1 A B C 01-OCT-99 01-OCT-00 NOT
1 D B C 01-OCT-01 01-OCT-02 NOT
1 A B C 01-OCT-02 01-OCT-03 NOT
2 A B B 01-OCT-03 01-OCT-04 DUP
2 A B B 01-OCT-05 01-OCT-06 DUP
2 A B B 02-OCT-07 05-OCT-08 DUP
2 a f f 06-FEB-09 07-FEB-10 NOT
2 A B B 02-OCT-11 03-OCT-12 DUP
2 A B B 04-OCT-13 06-OCT-14 DUP
3 A F F 10-FEB-14 20-FEB-15 NOT

Looking at the column LETSEE I want to add a unique identifier to each row, treating duplicated rows as 1.

For example:

ID COLUMN_A COLUMN_B COLUMN_C START_DAT END_DATE LET DUP_ID
---------- ---------- ---------- ---------- --------- --------- --- ------
1 A B C 01-OCT-99 01-OCT-00 NOT 1
1 D B C 01-OCT-01 01-OCT-02 NOT 2
1 A B C 01-OCT-02 01-OCT-03 NOT 3
2 A B B 01-OCT-03 01-OCT-04 DUP 4
2 A B B 01-OCT-05 01-OCT-06 DUP 4
2 A B B 02-OCT-07 05-OCT-08 DUP 4
2 a f f 06-FEB-09 07-FEB-10 NOT 5
2 A B B 02-OCT-11 03-OCT-12 DUP 6
2 A B B 04-OCT-13 06-OCT-14 DUP 6
3 A F F 10-FEB-14 20-FEB-15 NOT 7

Then I could use the Dup_Id to partition on to do the anaysis I need.

Any idea?

Have a nice weekend.

Thanks.

May 21, 2004 - 1:59 pm UTC

the above query doesn't work?

Hi Again

Dave Thompson, May 21, 2004 - 2:05 pm UTC

Hi Tom,

The above didn't work.

From the source query:

ID COLUMN_A COLUMN_B COLUMN_C START_DAT END_DATE LET
---------- ---------- ---------- ---------- --------- --------- ---
1 A B C 01-OCT-99 01-OCT-00 NOT
1 D B C 01-OCT-01 01-OCT-02 NOT
1 A B C 01-OCT-02 01-OCT-03 NOT
2 A B B 01-OCT-03 01-OCT-04 DUP
2 A B B 01-OCT-05 01-OCT-06 DUP
2 A B B 02-OCT-07 05-OCT-08 DUP
2 a f f 06-FEB-09 07-FEB-10 NOT
2 A B B 02-OCT-11 03-OCT-12 DUP
2 A B B 04-OCT-13 06-OCT-14 DUP
3 A F F 10-FEB-14 20-FEB-15 NOT

I want to output the following resultset:

ID COLUMN_A COLUMN_B COLUMN_C START_DAT END_DATE LET
---------- ---------- ---------- ---------- --------- --------- ---
1 A B C 01-OCT-99 01-OCT-00 NOT
1 D B C 01-OCT-01 01-OCT-02 NOT
1 A B C 01-OCT-02 01-OCT-03 NOT
2 A B B 01-OCT-03 05-OCT-08 DUP
2 a f f 06-FEB-09 07-FEB-10 NOT
2 A B B 02-OCT-11 06-OCT-14 DUP
3 A F F 10-FEB-14 20-FEB-15 NOT

On the resultset from your queries the start and end dates were incorrect.

Where duplicates rows occur one after another then we need to take the start_date of the first row and the end_date of the last row in that block.

So far the following:

2 A B B 01-OCT-03 01-OCT-04 DUP
2 A B B 01-OCT-05 01-OCT-06 DUP
2 A B B 02-OCT-07 05-OCT-08 DUP

You would get

2 A B B 01-OCT-03 05-OCT-08 DUP

Does this make sense?

Thanks again for you input on this.

May 21, 2004 - 2:19 pm UTC

ops$tkyte@ORA9IR2> select id, a,b,c, min(start_date) start_date, max(end_date) end_date
  2    from (
  3  select id, a,b,c, start_date, end_date,
  4         max(grp) over (order by id, start_date desc) grp
  5    from (
  6  select id, a,b,c, start_date, end_date,
  7         case when (a <> lag(a) over (order by id, start_date desc) or
  8                    b <> lag(b) over (order by id, start_date desc) or
  9                    c <> lag(c) over (order by id, start_date desc) )
 10              then row_number() over (order by id, start_date desc)
 11          end grp
 12    from v
 13         )
 14         )
 15   group by id, a,b,c,grp
 16   order by 1, 5
 17  /
 
        ID A          B          C          START_DAT END_DATE
---------- ---------- ---------- ---------- --------- ---------
         1 A          B          C          01-OCT-99 01-OCT-00
         1 D          B          C          01-OCT-01 01-OCT-02
         1 A          B          C          01-OCT-02 01-OCT-03
         2 A          B          B          01-OCT-03 05-OCT-08
         2 a          f          f          06-FEB-09 07-FEB-10
         2 A          B          B          02-OCT-11 06-OCT-14
         3 A          F          F          10-FEB-14 20-FEB-15
 
7 rows selected.


One of my (current) favorite analytic tricks -- the old "carry forward".  We mark rows such that the preceding row was different -- subsequent dup rows would have NULLS there for grp.  

Then, we use max(grp) to "carry" that number down....

Now we have something to group by -- we've divided the rows up into groups we can deal with.


(note: if a,b,c allow NULLS, we'll need to accomidate for that!)

Great Stuff

Dave Thompson, May 21, 2004 - 5:02 pm UTC

Tom,

Thanks very much for that.

I'll go over it in more detail when I'm in the Office Monday but it looks great from here.

Enjoy the weekend.

Excellent

Dave Thompson, June 02, 2004 - 4:53 am UTC

Hi Tom,

This solution was spot on.

Thanks.

Any more thoughts on an Analytics book?

Stalin, June 09, 2004 - 6:03 pm UTC

hi tom,

wondering what would below sql look like if there hadn't been existence of lead or partition analytical funtions. is pl/sql the only option.

snippet from "lead/lag on different dataset" thread (it's has the create and insert stmts)

ops$tkyte@ORA9IR2> select a.* , round( (signoff_date-signon_date) * 24 * 60, 2 )
minutes
2 from (
3 select log_id,
4 case when action in (1,2) and lead(action) over (partition by
account_id,user_id,mac order by log_creation_date) in (3,4,5,6,7)
5 then lead(log_id) over (partition by account_id, user_id, mac
order by log_creation_date)
6 end signoff_id,
7 user_id,
8 log_creation_date signon_date,
9 case when action in (1,2) and lead(action) over (partition by
account_id,user_id,mac order by log_creation_date) in (3,4,5,6,7)
10 then lead(log_creation_date) over (partition by account_id,
user_id, mac order by log_creation_date)
11 end signoff_date,
12 action
13 from logs
14 where account_id = 'Robert'
15 and service = 5
16 order by user_id
17 ) a
18 where action in (1,2)
19 /

Thanks,
Stalin

June 09, 2004 - 6:27 pm UTC

you could use a non-equi self join to achieve the same. Many orders of magnitudes slower.

scalar subqueries could be used as well -- with the same "slower" caveat.

Is this solvable with ANALTICS too?

Peter Tran, June 10, 2004 - 12:14 am UTC

Hi Tom,

Can the following problem be solved using Analytics?

I have a 10 columns table where 9 of the fields are dimensions and one attribute. I would like to get a report of D1/D2 columns where the ATTR1 is 1 for every other dimensions. Furthermore the PK consist of all the dimension columns.

The example below aren't really true column names, but I didn't want to make the example table too wide for illustrative purpose.

D1 D2 D3 D4 D5 D6 D7 D8 D9 ATTR1
--------------------------------------------
AA AA AA AA AA AA AA AA AA 1
AA AA BB AA AA AA AA AA AA 1
AA AA AA CC AA AA AA AA AA 1
AA AA AA AA DD AA AA AA AA 1
AA AA AA AA EE AA AA AA AA 1

AA BB AA AA AA AA AA GG AA 1
AA BB AA AA AA AA AA AA AA 1
AA BB CC AA AA AA AA AA AA 0
AA BB AA DD AA AA AA AA AA 1

EE DD JJ LL MM NN OO PP QQ 1
EE DD TT LL MM NN OO PP QQ 1

I want the query to return:

D1 D2
--------
AA AA
EE DD

It would not return AA/BB, because of the record:

D1 D2 D3 D4 D5 D6 D7 D8 D9 ATTR1
--------------------------------------------
AA BB CC AA AA AA AA AA AA 0

Thanks,
-Peter

June 10, 2004 - 7:43 am UTC

yes they can, but they are not needed. regular aggregates do the job. I'd give you the real query if I had a create table/inserts to demo against. this is "psuedo code", might or might not actually work:

select d1, d2
from t
group by d1, d2
having count(distinct attribute) = 1

Michael T., June 10, 2004 - 9:01 am UTC

Peter,
I think the following may give you what you want.

SELECT d1, d2
FROM t
GROUP BY d1, d2
HAVING SUM(DECODE(attr1, 1, 0, 1)) > 0;

Tom's psuedo code will work except for the case when all D1/D2 combinations have the same ATTR1 value, but that value is not 1.

June 10, 2004 - 9:45 am UTC

ahh, good eye -- i was thinking "all attribute values are the same"

but yours doesn't do it, this will

having count( decode( attr1, 1, 1 ) ) = count(*)

cound(decode(attr1,1,1)) will return a count of non-null occurences (all of the 1's)

count(*) returns a count of all records

output when count(decode) = count(*)

Thank you!

Peter Tran, June 10, 2004 - 10:37 am UTC

Hi Tom/Michael T.,

Thank you. It so much clearer now.

-Peter

Michael T., June 10, 2004 - 10:46 am UTC

I did screw up in my previous response. The query I submitted gives the entirely wrong answer. It should have been

SELECT d1, d2
FROM t
GROUP BY d1, d2
HAVING SUM(DECODE(attr1, 1, 0, 1)) = 0

Even though, incorrectly, I wasn't originally considering null values for ATTR1, the above query seems to produce the correct answer even if ATTR1 is NULL. The DECODE will evaluate a null ATTR1 entry to 1.

Tom, many thanks for this site. I have learned so much from it. It is a daily must read for me.

You said a book on analytics?

Jeff, June 10, 2004 - 12:30 pm UTC

A book by you on analytics would be a best seller I think.
Go for it.

quick analytic question

A reader, June 16, 2004 - 5:03 pm UTC

schema creation---
---
scott@ora92> drop table t1;

Table dropped.

scott@ora92> create table t1
2 (
3 x varchar2(10),
4 y number
5 );

Table created.

scott@ora92>
scott@ora92> insert into t1 values( 'x1', 1 );

1 row created.

scott@ora92> insert into t1 values( 'x1', 2 );

1 row created.

scott@ora92> insert into t1 values( 'x1', 4 );

1 row created.

scott@ora92> insert into t1 values( 'x1', 0 );

1 row created.

scott@ora92> commit;

Commit complete.

scott@ora92> select x, y, min(y) over() min_y
2 from t1;

X Y MIN_Y
---------- ---------- ----------
x1 1 0
x1 2 0
x1 4 0
x1 0 0

scott@ora92> spool off
---
how do i get the minimum of y for all values
that is greater than 0 (if one exists). In the above case
I should get the result as

X Y MIN_Y
---------- ---------- ----------
x1 1 1
x1 2 1
x1 4 1
x1 0 1

Thanx for your excellent site and brilliant work!

June 16, 2004 - 6:09 pm UTC

min( case when y > 0 then y end ) over ()

Great!!!

A reader, June 16, 2004 - 6:46 pm UTC

Thank you very much

Gj, July 02, 2004 - 9:16 am UTC

The Oracle docs are a little light on examples but thank you for giving us the quick start to analytics, can't say I understand the complex examples yet, but the simple stuff seems so easy to understand now, can't wait until a real problem comes along I can apply this feature to.

July 02, 2004 - 10:39 am UTC

</code> http://docs.oracle.com/docs/cd/B10501_01/server.920/a96520/analysis.htm#1020 <code>

How to mimic Ora10g LAST_VALUE(... IGNORE NULLS)?

Sergey, July 06, 2004 - 8:08 am UTC

Hi Tom,

I need to 'fill the gaps' with the values from the last existing row in a table that is outer joined to another table. The other table servers as a source of regular [time] intervals. The task seems to be conceptually very simple, so I looked into Ora docs (it happens to be Ora10g docs) I pretty soon found exactly what I need: LAST_VALUE with IGNORE NULLS. Unfortunately neither Ora8i, nor Ora9i accept IGNORE NULLS. Is there any way to mimic this feature with 'older' analitical functions?
I tried sort of ORDER BY SIGN(NVL(VALUE), 0) in analitical ORDER BY clause, but it does not work (I do not have a clue why)

Thanks in advance

Here is the test:
DROP TABLE TD;
CREATE TABLE TD AS
(SELECT TRUNC(SYSDATE, 'DD') + ROWNUM T
FROM ALL_OBJECTS
WHERE ROWNUM <= 15
);

DROP TABLE TV;
CREATE TABLE TV AS
(SELECT
TRUNC(SYSDATE, 'DD') + ROWNUM * 3 T
,ROWNUM V
FROM ALL_OBJECTS
WHERE ROWNUM <= 5
);

SELECT
TD.T
,SIGN(NVL(TV.V, 0))
,NVL
(TV.V,
LAST_VALUE(TV.V IGNORE NULLS) -- IGNORE NULLS does not work on Ora8i, Ora9i
OVER
(
ORDER BY TD.T
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
) V
FROM TD, TV
WHERE TV.T(+) = TD.T
ORDER BY TD.T
;
ERROR at line 6:
ORA-00907: missing right parenthesis

SELECT
TD.T
,SIGN(NVL(TV.V, 0))
,NVL
(TV.V,
LAST_VALUE(TV.V)
OVER
(
ORDER BY SIGN(NVL(TV.V, 0)), TD.T -- Does not work
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
) V
FROM TD, TV
WHERE TV.T(+) = TD.T
ORDER BY TD.T
;
T SIGN(NVL(TV.V,0)) V
------------------- ------------------ ------------------
07.07.2004 00:00:00 0
08.07.2004 00:00:00 0
09.07.2004 00:00:00 1 1
10.07.2004 00:00:00 0
11.07.2004 00:00:00 0
12.07.2004 00:00:00 1 2
13.07.2004 00:00:00 0
14.07.2004 00:00:00 0
15.07.2004 00:00:00 1 3
16.07.2004 00:00:00 0
17.07.2004 00:00:00 0
18.07.2004 00:00:00 1 4
19.07.2004 00:00:00 0
20.07.2004 00:00:00 0
21.07.2004 00:00:00 1 5

July 06, 2004 - 8:26 am UTC

This is a trick I call "carry down", we use analytics on analytics to accomplish this.  We output "marker rows" with ROW_NUMBER() on the leading edge.  Using MAX() in the outer query, we "carry down" these marker rows -- substr gets rid of the row_number for us:



ops$tkyte@ORA10G> select t,
  2         sign_v,
  3             v,
  4             substr( max(data) over (order by t), 7 ) v2
  5    from (
  6  SELECT TD.T,
  7         SIGN(NVL(TV.V, 0)) sign_v,
  8          NVL(TV.V, LAST_VALUE(TV.V IGNORE NULLS) OVER ( ORDER BY TD.T )) V,
  9           case when tv.v is not null
 10                 then to_char( row_number() 
                                  over (order by td.t), 'fm000000' ) || tv.v
 11                    end data
 12      FROM TD, TV
 13      WHERE TV.T(+) = TD.T
 14          )
 15   ORDER BY T
 16      ;
 
T             SIGN_V          V V2
--------- ---------- ---------- -----------------------------------------
07-JUL-04          0
08-JUL-04          0
09-JUL-04          1          1 1
10-JUL-04          0          1 1
11-JUL-04          0          1 1
12-JUL-04          1          2 2
13-JUL-04          0          2 2
14-JUL-04          0          2 2
15-JUL-04          1          3 3
16-JUL-04          0          3 3
17-JUL-04          0          3 3
18-JUL-04          1          4 4
19-JUL-04          0          4 4
20-JUL-04          0          4 4
21-JUL-04          1          5 5
 
15 rows selected.


So, in 9ir2 this would simply be:


ops$tkyte@ORA9IR2> select t,
  2         sign_v,
  3             substr( max(data) over (order by t), 7 ) v2
  4    from (
  5  SELECT TD.T,
  6         SIGN(NVL(TV.V, 0)) sign_v,
  7           case when tv.v is not null
  8                        then to_char( row_number() over (order by td.t), 'fm000000' ) || tv.v
  9                    end data
 10      FROM TD, TV
 11      WHERE TV.T(+) = TD.T
 12          )
 13   ORDER BY T
 14      ;
 
T             SIGN_V V2
--------- ---------- -----------------------------------------
07-JUL-04          0
08-JUL-04          0
09-JUL-04          1 1
10-JUL-04          0 1
11-JUL-04          0 1
12-JUL-04          1 2
13-JUL-04          0 2
14-JUL-04          0 2
15-JUL-04          1 3
16-JUL-04          0 3
17-JUL-04          0 3
18-JUL-04          1 4
19-JUL-04          0 4
20-JUL-04          0 4
21-JUL-04          1 5
 
15 rows selected.

Doesn't work with PL/SQL ????????

A reader, July 20, 2004 - 9:31 am UTC

Dear Tom
Are analytics fully compatible with PL/SQL?
Please see
SQL> ed
Wrote file afiedt.buf

  1  select empno,deptno,
  2         count(empno) over (partition by deptno order by empno
  3                            rows between unbounded preceding and current row) run_count
  4* from emp
SQL> /

     EMPNO     DEPTNO  RUN_COUNT
---------- ---------- ----------
      7782         10          1
      7839         10          2
      7934         10          3
      7369         20          1
      7566         20          2
      7788         20          3
      7876         20          4
      7902         20          5
      7499         30          1
      7521         30          2
      7654         30          3

     EMPNO     DEPTNO  RUN_COUNT
---------- ---------- ----------
      7698         30          4
      7844         30          5
      7900         30          6

14 rows selected.

SQL> 
SQL> ed
Wrote file afiedt.buf

  1  declare
  2  cursor c1 is
  3  select empno,deptno,
  4         count(empno) over (partition by deptno order by empno
  5                            rows between unbounded preceding and current row) run_count
  6  from emp;
  7  begin
  8   for rec in c1 loop
  9    null;
 10   end loop;
 11* end;
SQL> /
end;
   *
ERROR at line 11:
ORA-06550: line 5, column 72:
PL/SQL: ORA-00905: missing keyword
ORA-06550: line 3, column 1:
PL/SQL: SQL Statement ignored


SQL> 
SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
PL/SQL Release 9.2.0.4.0 - Production
CORE    9.2.0.3.0       Production
TNS for 32-bit Windows: Version 9.2.0.4.0 - Production
NLSRTL Version 9.2.0.4.0 - Production

SQL>

July 20, 2004 - 8:08 pm UTC

You can contact support and reference <Bug:3083373>, but the workaround would be to use native dynamic sql or a view to "hide" this construct.

the problem turns out to be the word "current" which had meaning in plsql.

Effect of distinct on lag

John Murphy, July 29, 2004 - 1:48 pm UTC

I am trying to use analytics to find accounts with receipts in 3 consecutive years. The analytic code seems to work, however, when I add DISTINCT (to find each account once), I get strange results. This is on 9.2.0.1.0.

create table jcm_test(acct_id number(10), rcpt_date date);

insert into jcm_test
values (1 , to_date('01-JAN-2000', 'dd-mon-yyyy'));
insert into jcm_test
values (1 , to_date('01-JAN-2001', 'dd-mon-yyyy'));
insert into jcm_test
values (1 , to_date('01-JAN-2003', 'dd-mon-yyyy'));
insert into jcm_test
values (1 , to_date('02-JAN-2001', 'dd-mon-yyyy'));

(select j2.*,
rcpt_year - lag_yr as year_diff,
rank_year - lag_rank as rank_diff
from (select acct_id, rcpt_year, rank_year,
lag(rcpt_year, 2) over (partition by acct_id order by rcpt_year) lag_yr,
lag(rank_year, 2) over (partition by acct_id order by rcpt_year) lag_rank
from (select acct_id,
rcpt_year,
rank() over (partition by acct_id order by j.rcpt_year) rank_year
from (select distinct acct_id, to_char(rcpt_date, 'YYYY') rcpt_year
from jcm_test) j )
) j2);

ACCT_ID RCPT RANK_YEAR LAG_ LAG_RANK YEAR_DIFF RANK_DIFF
---------- ---- ---------- ---- ---------- ---------- ----------
1 2000 1
1 2001 2
1 2003 3 2000 1 3 2

select * from
(select j2.*,
rcpt_year - lag_yr as year_diff,
rank_year - lag_rank as rank_diff
from (select acct_id, rcpt_year, rank_year,
lag(rcpt_year, 2) over (partition by acct_id order by rcpt_year) lag_yr,
lag(rank_year, 2) over (partition by acct_id order by rcpt_year) lag_rank
from (select acct_id,
rcpt_year,
rank() over (partition by acct_id order by j.rcpt_year) rank_year
from (select distinct acct_id, to_char(rcpt_date, 'YYYY') rcpt_year
from jcm_test) j )
) j2)
where year_diff = rank_diff;

no rows selected

select distinct * from
(select j2.*,
rcpt_year - lag_yr as year_diff,
rank_year - lag_rank as rank_diff
from (select acct_id, rcpt_year, rank_year,
lag(rcpt_year, 2) over (partition by acct_id order by rcpt_year) lag_yr,
lag(rank_year, 2) over (partition by acct_id order by rcpt_year) lag_rank
from (select acct_id,
rcpt_year,
rank() over (partition by acct_id order by j.rcpt_year) rank_year
from (select distinct acct_id, to_char(rcpt_date, 'YYYY') rcpt_year
from jcm_test) j )
) j2)
where year_diff = rank_diff;

ACCT_ID RCPT RANK_YEAR LAG_ LAG_RANK YEAR_DIFF RANK_DIFF
---------- ---- ---------- ---- ---------- ---------- ----------
1 2001 2 2000 1 1 1
1 2003 4 2001 2 2 2

In your book, you say that because analytics are performed last, you must push them into an inline view. However, that doesn't seem to do the trick here. Thanks, john

July 29, 2004 - 2:18 pm UTC

what release -- i don't see what you see.

Distinct effect release

John Murphy, July 29, 2004 - 3:12 pm UTC

Tom, we are using the following.

Oracle9i Release 9.2.0.1.0 - Production
PL/SQL Release 9.2.0.1.0 - Production
CORE 9.2.0.1.0 Production
TNS for 32-bit Windows: Version 9.2.0.1.0 - Production
NLSRTL Version 9.2.0.1.0 - Production

I tried searching Metalink, but couldn't find any bugs.

July 29, 2004 - 4:03 pm UTC

i found one, not published, was solved via 9202 -- at least it did not reproduce, they did not pursue it further for that reason.

Distinct effect release

John Murphy, July 29, 2004 - 4:01 pm UTC

Actually, I suspect that this may be related to bug 2258035. Do you agree? Thanks, john

July 29, 2004 - 4:18 pm UTC

yes, i can confirm that in 9205, it is not happening that way.

how to write this query

Teddy, July 30, 2004 - 6:33 am UTC

Hi

using the original poster´s example:

ORDER OPN STATION CLOSE_DATE
----- --- ------- ----------
12345 10 RECV 07/01/2003
12345 20 MACH1 07/02/2003
12345 25 MACH1 07/05/2003
12345 30 MACH1 07/11/2003
12345 36 INSP1 07/12/2003
12345 50 MACH1 08/16/2003
12346 90 MACH2 07/30/2003
12346 990 STOCK 07/31/2003

How do you write a query to determine that and order has passed maufacturing operation in several months?
In above example

12345 has rows in July and Augist but 12346 has rows in July only. How can we write a query to find orders such as 12345?

July 30, 2004 - 4:40 pm UTC

select order, min(close_date), max(close_date)
from t
having months_between( max(close_date), min(close_date) ) > your_threshold;

Finding pairs in result set

PJ, August 11, 2004 - 10:05 am UTC

Tom,

CREATE TABLE A
(
N NUMBER,
C CHAR(1),
V VARCHAR2(20)
)

INSERT INTO A ( N, C, V ) VALUES ( 1, 'e', '1st e of 1st N');
INSERT INTO A ( N, C, V ) VALUES ( 1, 'e', '2nd e of 1st N');
INSERT INTO A ( N, C, V ) VALUES ( 1, 'e', '3rd e of 1st N');
INSERT INTO A ( N, C, V ) VALUES ( 1, 'w', '1st w of 1st N');
INSERT INTO A ( N, C, V ) VALUES ( 1, 'w', '2nd w of 1st N');
INSERT INTO A ( N, C, V ) VALUES ( 2, 'e', '1st e of 2nd N');
INSERT INTO A ( N, C, V ) VALUES ( 2, 'w', '1st w of 2nd N');
INSERT INTO A ( N, C, V ) VALUES ( 2, 'w', '2nd w of 2nd N');
commit;

SO the data I've is
select * from a;
-------------------------
N C V

1 e 1st e of 1st N
1 e 2nd e of 1st N
1 e 3rd e of 1st N
1 w 1st w of 1st N
1 w 2nd w of 1st N
2 e 1st e of 2nd N
2 w 1st w of 2nd N
2 w 2nd w of 2nd N

---------------------------------------
And the output I'm looking for is

1 e 1st e of 1st N
1 e 2nd e of 1st N
1 w 1st w of 1st N
1 w 2nd w of 1st N
2 e 1st e of 2nd N
2 w 1st w of 2nd N

So basically I need the first pairs of (e-w/w-e) for each N.

I hope I'm clear here.

Thanks as usual in advance,

August 11, 2004 - 12:40 pm UTC

do you have a field that can be "sorted on" for finding "1st, 2cnd" and so on.

If not, there is no such thing as "first", or "third"

PJ, August 11, 2004 - 12:58 pm UTC

Tom,

Sorry if I was not clear.
we need to pick pairs for N. Like we have 5 rows with N=1. so we have to pick 4 rows leaving 1 UNPAIRED "e" out.
We want the data in the same order as it is in table. We can sort it by --> order by N,C

August 11, 2004 - 1:58 pm UTC

ops$tkyte@ORA920> select n, c, rn, cnt2
  2    from (
  3  select n, c, rn,
  4             min(cnt) over (partition by n) cnt2
  5    from (
  6  select n, c,
  7             row_number() over (partition by n, c order by c) rn,
  8             count(*) over (partition by n, c) cnt
  9    from a
 10             )
 11             )
 12   where rn <= cnt2
 13  /
 
         N C         RN       CNT2
---------- - ---------- ----------
         1 e          1          2
         1 e          2          2
         1 w          1          2
         1 w          2          2
         2 e          1          1
         2 w          1          1
 
6 rows selected.

Brilliant as usual !!

A reader, August 11, 2004 - 2:04 pm UTC

PJ's query

Kevin, August 11, 2004 - 2:04 pm UTC

PJ - you can drop the column 'v' from your table, and just use this query (which I think will answer your question using N and C alone, and generate an appropriate 'v' as it runs).

CREATE TABLE b
(
N NUMBER,
C CHAR(1)
)

INSERT INTO b ( N, C ) VALUES ( 1, 'e');

INSERT INTO b ( N, C ) VALUES ( 1, 'e');

INSERT INTO b ( N, C ) VALUES ( 1, 'e');

INSERT INTO b ( N, C ) VALUES ( 1, 'w');

INSERT INTO b ( N, C ) VALUES ( 1, 'w');

INSERT INTO b ( N, C ) VALUES ( 2, 'e');

INSERT INTO b ( N, C ) VALUES ( 2, 'w');

INSERT INTO b ( N, C ) VALUES ( 2, 'w');

COMMIT;

SELECT n,c,v1
FROM (
SELECT lag (c1) OVER (PARTITION BY n,c1 ORDER BY n,c1) c3,
lead (c1) OVER (PARTITION BY n,c1 ORDER BY n,c1)c4,
c1 ||
CASE WHEN c1 BETWEEN 10 AND 20
THEN 'th'
ELSE DECODE(MOD(c1,10),1,'st',2,'nd',3,'rd','th')
END || ' ' || c || ' of ' || c2 ||
CASE WHEN c2 BETWEEN 10 AND 20
THEN 'th'
ELSE DECODE(MOD(c2,10),1,'st',2,'nd',3,'rd','th')
END || ' N' v1,
t1.*
FROM (
SELECT b.*,
row_number() OVER (PARTITION BY n, c ORDER BY n,c) c1,
DENSE_RANK() OVER (PARTITION BY n, c ORDER BY n,c) c2
FROM b
) t1
) t2
WHERE c3 IS NOT NULL OR c4 IS NOT NULL
/
Results:
N C V1
1 e 1st e of 1st N
1 w 1st w of 1st N
1 e 2nd e of 1st N
1 w 2nd w of 1st N
2 e 1st e of 1st N
2 w 1st w of 1st N

INSERT INTO b ( N, C ) VALUES ( 1, 'w');

COMMIT;

Results:
N C V1
1 e 1st e of 1st N
1 w 1st w of 1st N
1 e 2nd e of 1st N
1 w 2nd w of 1st N
1 e 3rd e of 1st N
1 w 3rd w of 1st N
2 e 1st e of 1st N
2 w 1st w of 1st N

oops

Kevin, August 11, 2004 - 2:12 pm UTC

replace
DENSE_RANK() OVER (PARTITION BY n, c ORDER BY n,c) c2
with
DENSE_RANK() OVER (PARTITION BY c ORDER BY c) c2

my bad.

A reader, August 11, 2004 - 3:27 pm UTC

Your bad what?
toe? leg?

Cool....

PJ, August 12, 2004 - 7:25 am UTC

analytic q

A reader, October 22, 2004 - 6:34 pm UTC

First the schema:

scott@ORA92I> drop table t1;

Table dropped.

scott@ORA92I> create table t1( catg1 varchar2(10), catg2 varchar2(10), total number );

Table created.

scott@ORA92I>
scott@ORA92I> insert into t1( catg1, catg2, total) values( 'V1', 'T1', 5 );

1 row created.

scott@ORA92I> insert into t1( catg1, catg2, total) values( 'V1', 'T1', 6 );

1 row created.

scott@ORA92I> insert into t1( catg1, catg2, total) values( 'V1', 'T1', 9 );

1 row created.

scott@ORA92I> insert into t1( catg1, catg2, total) values( 'V2', 'T2', 10 );

1 row created.

scott@ORA92I> insert into t1( catg1, catg2, total) values( 'V3', 'T1', 11 );

1 row created.

scott@ORA92I> insert into t1( catg1, catg2, total) values( 'V4', 'T1', 1 );

1 row created.

scott@ORA92I> insert into t1( catg1, catg2, total) values( 'V5', 'T2', 2 );

1 row created.

scott@ORA92I> insert into t1( catg1, catg2, total) values( 'V6', 'T2', 3 );

1 row created.

The catg2 can only take two values, 'T1', 'T2'.
I want to sum the total column for catg1, catg2
and order by their total sum for each catg1 and catg2 values. Then
I want to list the top 3 catg1, catg2 combinations
based on their sum values of total column.

If there are more than 3 such combinations then I
club the remaining ones into a catg1 value of 'Others'.

my first cut solution is:

scott@ORA92I> select catg1, catg2, sum( total_sum )
2 from
3 (
4 select case
5 when dr > 3 then
6 'Others'
7 when dr <= 3 then
8 catg1
9 end catg1,
10 catg2,
11 total_sum
12 from
13 (
14 select catg1, catg2, total_sum,
15 dense_rank() over( order by total_sum desc) dr
16 from
17 (
18 select catg1, catg2, sum( total ) total_sum
19 from t1
20 group by catg1, catg2
21 )
22 )
23 )
24 group by catg1, catg2;

CATG1 CATG2 SUM(TOTAL_SUM)
---------- ---------- --------------
V1 T1 20
V2 T2 10
V3 T1 11
Others T1 1
Others T2 5

Does it look ok or do you have any better solution?

Thank you as always.

October 23, 2004 - 9:36 am UTC

you could skip a layer of inline view, but it looks fine as is.

thanx!

A reader, October 24, 2004 - 12:37 pm UTC

SQL query

Reader, November 03, 2004 - 1:45 pm UTC

I have a table which stores receipts against Purchase Orders. The users want the following o/p:
For each of the months of Jan, Feb and March 2004, provide a count of number of receipts which fall in each of the following Dollar value range
< $5000
Between $5000 to $9999
> $10,000

(There can be a number of receipts against one Purchase Order, so that's needs to be grouped together first)

I wrote this query using an inline view which is the UNION of 3 SQLs, one for each dollar range.
However, am sure there is a more elegant and efficient method to do this,maybe using analytical functions , CASE, decode .... Appreciate your help.

Thanks

November 05, 2004 - 10:49 am UTC

select trunc(date_col,'mm') Month,
count( case when amt < 5000 then 1 end ) "lt 5000",
count( case when amt between 5000 and 9999 then 1 end ) "between 5/9k",
count( case when amt >= 10000 then 1 end ) "10k or more"
from t
where date_col between :a and :b
group by trunc(date_col,'mm')

single pass....

Great -

syed, November 10, 2004 - 7:09 am UTC

Tom

I have a tables as follows

create table matches
( reference varchar2(9),
  endname varchar2(20),
  beginname varchar2(30),
  DOB date, 
  ni varchar2(9)
)
/


insert into matches values ('A1','SMITH','BOB',to_date('1/1/1976','dd/mm/yyyy'),'AA1234567');
insert into matches values ('A1','SMITH','TOM',to_date('1/1/1970','dd/mm/yyyy'),'AA1234568');
insert into matches values ('A2','JONES','TOM',to_date('1/1/1970','dd/mm/yyyy'),'AA1234568');
insert into matches values ('A3','JONES','TOM',to_date('1/1/1971','dd/mm/yyyy'),'AA1234569');
insert into matches values ('A4','BROWN','BRAD',to_date('1/1/1961','dd/mm/yyyy'),'AA1234570');
insert into matches values ('A4','JONES','BRAD',to_date('1/1/1961','dd/mm/yyyy'),'AA1234571');
insert into matches values ('A1','SMITH','BOB',to_date('1/1/1976','dd/mm/yyyy'),'AA1234567');
insert into matches values ('A3','JACKSON','TOM',to_date('1/1/1971','dd/mm/yyyy'),'AA1234569');
insert into matches values ('A2','JACKSON','BOB',to_date('1/1/1962','dd/mm/yyyy'),'AA1234568');
 insert into matches values ('A5','JACKSON','TOM',to_date('1/1/1920','dd/mm/yyyy'),'AA1234569');
commit;

SQL> select rownum,REFERENCE,ENDNAME,BEGINNAME,DOB,NI from matches;

 ROWNUM REFERENCE ENDNAME  BEGINNAME  DOB       NI
------- --------- -------- ---------- --------- ---------
      1 A1        SMITH    BOB        01-JAN-76 AA1234567
      2 A1        SMITH    TOM        01-JAN-70 AA1234568
      3 A2        JONES    TOM        01-JAN-70 AA1234568
      4 A3        JONES    TOM        01-JAN-71 AA1234569
      5 A4        BROWN    BRAD       01-JAN-61 AA1234570
      6 A4        JONES    BRAD       01-JAN-61 AA1234571
      7 A1        SMITH    BOB        01-JAN-76 AA1234567
      8 A3        JACKSON  TOM        01-JAN-71 AA1234569
      9 A2        JACKSON  BOB        01-JAN-62 AA1234568
     10 A5        JACKSON  TOM        01-JAN-20 AA1234569

I need to show duplicates where the following columns values are the same.

a) REFERENCE, ENDNAME,BEGINNAME,DOB,NI
b) ENDNAME,BEGINNAME,NI
c) REFERENCE,NI

So, 
rownum 1 and 7 match criteria a)
rownum 8 and 10 match criteria b) 
rownum 1 and 7, rownum 3 and 9, rownum 4 and 8 match criteria c)

How can I select this data out to show number matching each criteria ?

November 10, 2004 - 7:23 am UTC

"How can I select this data out to show number matching each criteria ?"

is ambigous.

If you add columns:

count(*) over (partition by reference, endname, beginname, dob, ni ) cnt1,
count(*) over (partition by endname, beginname, ni) cnt2,
count(*) over (partition by reference,ni) cnt3

it'll give you the "dup count" by each partition -- technically showing you the "number matching each criteria"

analytics problem

David, November 19, 2004 - 9:37 am UTC

Am newish to analytic functions and have hit problem as follows:-

create table a
(accno         number(8)     not null,
 total_paid    number(7,2)   not null)
/

create table b
(accno         number(8)     not null,
 due_date      date          not null,
 amount_due    number(7,2)   not null)
/

insert into a values (1, 1000);
insert into a values (2, 1500);
insert into a values (3, 2000);
insert into a values (4, 3000);

insert into b values (1, '01-oct-04', 1000);
insert into b values (1, '01-jan-05', 900);
insert into b values (1, '01-apr-05', 700);

insert into b values (2, '01-oct-04', 1000);
insert into b values (2, '01-jan-05', 900);
insert into b values (2, '01-apr-05', 700);

insert into b values (3, '01-oct-04', 1000);
insert into b values (3, '01-jan-05', 900);
insert into b values (3, '01-apr-05', 700);

insert into b values (4, '01-oct-04', 1000);
insert into b values (4, '01-jan-05', 900);
insert into b values (4, '01-apr-05', 700);

If I then do this query...

SQL> select a.accno,
  2         a.total_paid,
  3         b.due_date,
  4         b.amount_due,
  5         case
  6  when sum(b.amount_due)
  7  over (order by to_date(b.due_date, 'dd-mon-rr')) - a.total_paid <= 0
  8  then 0
  9  when sum(b.amount_due)
 10  over (order by to_date(b.due_date, 'dd-mon-rr')) - a.total_paid < b.amount_due
 11  then sum(b.amount_due)
 12  over (order by to_date(b.due_date, 'dd-mon-rr')) - a.total_paid
 13  when sum(b.amount_due)
 14  over (order by to_date(b.due_date, 'dd-mon-rr')) - a.total_paid >= b.amount_due
 15  and a.total_paid >= 0
 16  then b.amount_due
 17  end to_pay
 18  from a,b
 19  where a.accno = b.accno
 20  order by a.accno,
 21           to_date(b.due_date, 'dd-mon-rr')
 22  /

     ACCNO TOTAL_PAID DUE_DATE  AMOUNT_DUE     TO_PAY
---------- ---------- --------- ---------- ----------
         1       1000 01-OCT-04       1000       1000
         1       1000 01-JAN-05        900        900
         1       1000 01-APR-05        700        700
         2       1500 01-OCT-04       1000       1000
         2       1500 01-JAN-05        900        900
         2       1500 01-APR-05        700        700
         3       2000 01-OCT-04       1000       1000
         3       2000 01-JAN-05        900        900
         3       2000 01-APR-05        700        700
         4       3000 01-OCT-04       1000       1000
         4       3000 01-JAN-05        900        900
         4       3000 01-APR-05        700        700

12 rows selected.

...TO_PAY does not give what I was expecting. But if I do by individual accno I get what I'm after:-

SQL> select a.accno,
  2         a.total_paid,
  3         b.due_date,
  4         b.amount_due,
  5         case
  6  when sum(b.amount_due)
  7  over (order by to_date(b.due_date, 'dd-mon-rr')) - a.total_paid <= 0
  8  then 0
  9  when sum(b.amount_due)
 10  over (order by to_date(b.due_date, 'dd-mon-rr')) - a.total_paid < b.amount_due
 11  then sum(b.amount_due)
 12  over (order by to_date(b.due_date, 'dd-mon-rr')) - a.total_paid
 13  when sum(b.amount_due)
 14  over (order by to_date(b.due_date, 'dd-mon-rr')) - a.total_paid >= b.amount_due
 15  and a.total_paid >= 0
 16  then b.amount_due
 17  end to_pay
 18  from a,b
 19  where a.accno = b.accno
 20  and a.accno = &accno
 21  order by a.accno,
 22           to_date(b.due_date, 'dd-mon-rr')
 23  /
Enter value for accno: 1
old  20: and a.accno = &accno
new  20: and a.accno = 1

     ACCNO TOTAL_PAID DUE_DATE  AMOUNT_DUE     TO_PAY
---------- ---------- --------- ---------- ----------
         1       1000 01-OCT-04       1000          0
         1       1000 01-JAN-05        900        900
         1       1000 01-APR-05        700        700

3 rows selected.

SQL> /
Enter value for accno: 2
old  20: and a.accno = &accno
new  20: and a.accno = 2

     ACCNO TOTAL_PAID DUE_DATE  AMOUNT_DUE     TO_PAY
---------- ---------- --------- ---------- ----------
         2       1500 01-OCT-04       1000          0
         2       1500 01-JAN-05        900        400
         2       1500 01-APR-05        700        700

3 rows selected.

SQL> /
Enter value for accno: 3
old  20: and a.accno = &accno
new  20: and a.accno = 3

     ACCNO TOTAL_PAID DUE_DATE  AMOUNT_DUE     TO_PAY
---------- ---------- --------- ---------- ----------
         3       2000 01-OCT-04       1000          0
         3       2000 01-JAN-05        900          0
         3       2000 01-APR-05        700        600

3 rows selected.

SQL> /
Enter value for accno: 4
old  20: and a.accno = &accno
new  20: and a.accno = 4

     ACCNO TOTAL_PAID DUE_DATE  AMOUNT_DUE     TO_PAY
---------- ---------- --------- ---------- ----------
         4       3000 01-OCT-04       1000          0
         4       3000 01-JAN-05        900          0
         4       3000 01-APR-05        700          0

3 rows selected.

What is needed for first query above to work?

cheers,
David

November 19, 2004 - 11:31 am UTC

ops$tkyte@ORA9IR2> select a.accno,
  2         a.total_paid,
  3         b.due_date,
  4         b.amount_due,
  5         case
  6  when sum(b.amount_due)
  7  over (<b>partition by a.accno</b> order by to_date(b.due_date, 'dd-mon-rr')) - a.total_paid <= 0
  8  then 0
  9  when sum(b.amount_due)
 10  over (partition by a.accno order by to_date(b.due_date, 'dd-mon-rr')) - a.total_paid < b.amount_due
 11  then sum(b.amount_due)
 12  over (partition by a.accno order by to_date(b.due_date, 'dd-mon-rr')) - a.total_paid
 13  when sum(b.amount_due)
 14  over (partition by a.accno order by to_date(b.due_date, 'dd-mon-rr')) - a.total_paid >= b.amount_due
 15  and a.total_paid >= 0
 16  then b.amount_due
 17  end to_pay
 18  from a,b
 19  where a.accno = b.accno
 20  order by a.accno,
 21           to_date(b.due_date, 'dd-mon-rr')
 22  /
 
     ACCNO TOTAL_PAID DUE_DATE  AMOUNT_DUE     TO_PAY
---------- ---------- --------- ---------- ----------
         1       1000 01-OCT-04       1000          0
         1       1000 01-JAN-05        900        900
         1       1000 01-APR-05        700        700
         2       1500 01-OCT-04       1000          0
         2       1500 01-JAN-05        900        400
         2       1500 01-APR-05        700        700
         3       2000 01-OCT-04       1000          0
         3       2000 01-JAN-05        900          0
         3       2000 01-APR-05        700        600
         4       3000 01-OCT-04       1000          0
         4       3000 01-JAN-05        900          0
         4       3000 01-APR-05        700          0
 
12 rows selected.

excellent

David, November 19, 2004 - 12:02 pm UTC

many thanks

Limitation of Analytic Functions

Nilanjan Ray, December 16, 2004 - 4:27 am UTC

I am using the following view
create or replace view vw_history as
select
txm_dt,s_key,s_hist_slno,cm_key,burst_key,cm_channel_key
,(lag(s_hist_slno,1,0) over(partition by s_key,s_hist_slno order by s_key,s_hist_slno)) prv_hist_slno
from adc_history

The following SQL statement invariably does a full table scan on 112,861,91 rows of ADC_HISTORY and runs for 20-25 mins.

select *
from vw_history
where t_dt between to_date('01/01/2002','dd/mm/yyyy') and to_date('01/01/2002','dd/mm/yyyy');

The query return 4200 rows. ADC_HISTORY has 112,861,91 rows. I have the following indexes : ADC_HISTORY_IDX8 on txm_dt and ADC_HISTORY_IDX1 on spot_key columns. Both have good selectivities.

But when the required query is ran without the view it properly uses the index ADC_HISTORY_IDX8

select
txm_dt,s_key,s_hist_slno,cm_key,burst_key,cm_channel_key
,(lag(s_hist_slno,1,0) over(partition by s_key,s_hist_slno order by s_key,s_hist_slno)) prv_hist_slno
from adc_history

I had raised a tar and it says:This is the expected behaviour "PREDICATES ARE NOT PUSHED IN THE VIEW IF ANY ANALYTIC FUNCTIONS ARE USED"

Is there any way to work aroung this limitation. I just cannot think of the painful situation if I am unable to use views with analytics!!!!

Your help is absolutely necessary. Thanks in advance

December 16, 2004 - 8:27 am UTC

guess what -- your two queries <b>return different answers</b>..


did you consider that?  did you check that?  

they are TOTALLY DIFFERENT.  Analytics are applied after predicates.  The view -- it has no predicate.  The query -- it has a predicate.  You'll find that you have DIFFERENT result sets.

don't you see that as a problem?

It is not that you are "unable to use views"

It is that "when I use a view, I get answer 1, when I do not use a view, I get answer 2"

which answer is technically correct here?


Think about it.


consider this example (using RBO just to make it so that "if an index could be used it would" to stress the point):


ops$tkyte@ORA9IR2> create table emp as select * from scott.emp;
 
Table created.
 
ops$tkyte@ORA9IR2> create index job_idx on emp(job);
 
Index created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create or replace view v
  2  as
  3  select ename, sal, job,
  4         sum(sal) over (partition by job) sal_by_job,
  5             sum(sal) over (partition by deptno) sal_by_deptno
  6    from emp
  7  /
 
View created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> set autotrace on explain
ops$tkyte@ORA9IR2> select *
  2    from v
  3   where job = 'CLERK'
  4  /
 
ENAME             SAL JOB       SAL_BY_JOB SAL_BY_DEPTNO
---------- ---------- --------- ---------- -------------
MILLER           1300 CLERK           4150          8750
JAMES             950 CLERK           4150          9400
SMITH             800 CLERK           4150         10875
ADAMS            1100 CLERK           4150         10875
 
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=RULE
   1    0   VIEW OF 'V'
   2    1     WINDOW (SORT)
   3    2       WINDOW (SORT)
   4    3         TABLE ACCESS (FULL) OF 'EMP'
 
 
<b>so, one might ask "well - hey, I've got that beautiful index on JOB, I said "where job = 'CLERK'", whats up with that full scan.

in fact, when I do it "right" -- without the evil view:</b>

 
ops$tkyte@ORA9IR2> select ename, sal, job,
  2         sum(sal) over (partition by job) sal_by_job,
  3             sum(sal) over (partition by deptno) sal_by_deptno
  4    from emp
  5   where job = 'CLERK'
  6  /
 
ENAME             SAL JOB       SAL_BY_JOB SAL_BY_DEPTNO
---------- ---------- --------- ---------- -------------
MILLER           1300 CLERK           4150          1300
SMITH             800 CLERK           4150          1900
ADAMS            1100 CLERK           4150          1900
JAMES             950 CLERK           4150           950
 
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=RULE
   1    0   WINDOW (SORT)
   2    1     WINDOW (SORT)
   3    2       TABLE ACCESS (BY INDEX ROWID) OF 'EMP'
   4    3         INDEX (RANGE SCAN) OF 'JOB_IDX' (NON-UNIQUE)
 
<b>it very rapidly uses my index !!!   stupid views...

but wait.

whats up with SAL_BY_DEPTNO, that appears to be wrong... hmmm, what happened?

What happened was we computed the sal_by_depto in the query without the view AFTER doing "where job = 'CLERK'"


YOU are doing your LAG() analysis AFTER applying the predicate.  Your lags in your query without the view -- they are pretty much "not accurate"


Note that when the predicate CAN be pushed:</b>
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select ename, sal, sal_by_job
  2    from v
  3   where job = 'CLERK'
  4  /
 
ENAME             SAL SAL_BY_JOB
---------- ---------- ----------
SMITH             800       4150
ADAMS            1100       4150
JAMES             950       4150
MILLER           1300       4150
 
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=RULE
   1    0   VIEW OF 'V'
   2    1     WINDOW (BUFFER)
   3    2       TABLE ACCESS (BY INDEX ROWID) OF 'EMP'
   4    3         INDEX (RANGE SCAN) OF 'JOB_IDX' (NON-UNIQUE)
 

<b>it most certainly is.  here the predicate can safely be pushed -- since the analytic is computed "by job", a predicate on "job" can be applied FIRST and then the analytic computed.  

When pushing would change the answer -- we cannot do it.

When pushing the predicate would not change the answer -- we do it.


This is not a 'limitation', this is about "getting the right answer"</b>
 
 
ops$tkyte@ORA9IR2> set autotrace off
ops$tkyte@ORA9IR2> alter session set optimizer_mode = choose;
 
Session altered.

Great!!!

Nilanjan Ray, December 17, 2004 - 12:59 pm UTC

Simply amazing explanation. Cleared my doubts still further. One of the best explanation, in simple concise terms, I have seen on "Ask Tom". You know what, people should take enough caution and learn leasons from you before making misleading statements like "...LIMITATIONS...". In your terms yet again "Analytics Rock".

Regards

Using analytical function, LEAD, LAG

Praveen, December 24, 2004 - 9:26 am UTC

Hi Tom,

Analytical function, LEAD (or LAG) accepts the offset parameter as an integer which is a count of rows to be skipped from the current row before accessing the leading/lagging row. What if I want to access leading rows based on the value of column of current row, like a function applied to the column value of current row to access the leading row.

As an example: I have a table

create table t(id integer, dt date);

For each id, start with the first record, after ordering by dt ASC. Get the next record where dt = 10 min + first_row.dt. Then next record where dt = 20 min + first_row.dt and so on. Each time time is cummulatively increased by 10 min.

Suppose if don't get an exact match from next record (ie next_row.dt <> first_row.dt+10 min(say), then we select a row closest to the expected record, but lying within +/-10 seconds.

insert into t values (1, to_date('12/20/2004 00:00:00', 'mm/dd/yyyy hh24:mi:ss')); --Selected.

insert into t values (1, to_date('12/20/2004 00:05:00', 'mm/dd/yyyy hh24:mi:ss'));

insert into t values (1, to_date('12/20/2004 00:09:55', 'mm/dd/yyyy hh24:mi:ss'));

insert into t values (1, to_date('12/20/2004 00:10:00', 'mm/dd/yyyy hh24:mi:ss')); --Selected.

insert into t values (1, to_date('12/20/2004 00:15:00', 'mm/dd/yyyy hh24:mi:ss'));

insert into t values (1, to_date('12/20/2004 00:19:54', 'mm/dd/yyyy hh24:mi:ss')); --Not selected.

insert into t values (1, to_date('12/20/2004 00:19:55', 'mm/dd/yyyy hh24:mi:ss')); --Selected.

insert into t values (1, to_date('12/20/2004 00:25:00', 'mm/dd/yyyy hh24:mi:ss'));

insert into t values (1, to_date('12/20/2004 00:30:05', 'mm/dd/yyyy hh24:mi:ss')); --Selected.

insert into t values (1, to_date('12/20/2004 00:30:06', 'mm/dd/yyyy hh24:mi:ss')); --Not Selected.

insert into t values (1, to_date('12/20/2004 00:35:00', 'mm/dd/yyyy hh24:mi:ss'));

insert into t values (1, to_date('12/20/2004 00:39:55', 'mm/dd/yyyy hh24:mi:ss')); --Either this or below record is selected.

insert into t values (1, to_date('12/20/2004 00:40:05', 'mm/dd/yyyy hh24:mi:ss')); --Either this or above record is selected.

My output would be:
id dt
-----------
1 12/20/2004 00:00:00 AM
1 12/20/2004 00:10:00 AM --Exactly matches first_row.dt + 10min
1 12/20/2004 00:19:55 AM --Closest to first_row.dt + 20min +/- 10sec
1 12/20/2004 00:30:05 AM --Closest to first_row.dt + 30min +/- 10sec
1 12/20/2004 00:39:55 AM OR 12/20/2004 00:40:05 AM --Closest to first_row.dt + 40min +/- 10sec

The method I followed, after failed using LEAD is:

Step#1
------
Get a subset of dt's column, which is a 10 min cummulatiave dts from the dt value of first row(after rounding to the nearest minute, multiple of 10).
In this example I will get a subset:

12/20/2004 00:00:00 AM
12/20/2004 00:10:00 AM
12/20/2004 00:20:00 AM
12/20/2004 00:30:00 AM
12/20/2004 00:40:00 AM

This query will do it:

SELECT t1.id,
( min_dt - MOD ((ROUND (min_dt, 'mi') - ROUND (min_dt, 'hh')) * 24 * 60, 10) / (24 * 60)) + (ROWNUM - 1) * 10 / (24 * 60) dt_rounded
FROM (SELECT id, MIN (dt) min_dt,
ROUND ((MAX (dt) - MIN (dt)) * 24 * 60 / 10) max_rows
FROM t
WHERE id = 1
GROUP BY id) t1, t
WHERE ROWNUM <= max_rows + 1

Step#2:
-------
This subquery is joined with table t to get only those records from t which is either equal to the dts in the resultset returned by the subquery or fall within the range 10min +/-10sec (not closest only, but all).

SELECT t.id, dt_rounded, ABS (t.dt - dt_rounded) * 24 * 60 * 60 dt_diff_in_sec
FROM t,
(SELECT t1.id,
( min_dt - MOD ((ROUND (min_dt, 'mi') - ROUND (min_dt, 'hh')) * 24 * 60, 10) / (24 * 60)) + (ROWNUM - 1) * 10 / (24 * 60) dt_rounded
FROM (SELECT id, MIN (dt) min_dt,
ROUND ((MAX (dt) - MIN (dt)) * 24 * 60 / 10) max_rows
FROM t
WHERE id = 1
GROUP BY id) t1, t
WHERE ROWNUM <= max_rows + 1) t2
WHERE t.id = 1
AND ABS (t.dt - dt_rounded) * 24 * 60 * 60 <= 10
ORDER BY t.id, dt_rounded, dt_diff_in_sec;

I agree, this resultset will include duplicate records which I need to remove procedurally, while looping through the cursor; the order by clause simplifies this.

Now you might have guessed the problem. If table t contains more than 1000 records, the query asks me to wait atleast 2 min! And that too when I am planning to put at least 70,000 records!

I wrote a procedure which is handling the situation a little better. But I dont know if analytical query can help me out to bring back the performance. I could do it if Lead have the fuctionality I mentioned in the first paragraph. Do you have any hints?

Thanks and regards

Praveen

December 24, 2004 - 9:54 am UTC

you'd be looking at first_value with range windows, not lag and lead in this case.

Windowing clause and range function.

Praveen, December 25, 2004 - 1:29 pm UTC

Hi Tom,
Thankyou for the suggestion. I am not very well used with analytical queries. I have tried based on your advise but unable to even start with. I am struck with the first step itself - in specifying the range in the windowing clause. In the windowing clause, we specify an integer to get the preceding rows based on the current column value (CLARK's example-Page:556, Analytical Funtions).

In my above example I wrote a query which contains:

FIRST_VALUE(id)
OVER (ORDER BY dt DESC
RANGE 10 PRECEDING)

10, in the windowing clause, will give me a record that fall within 10 days preceding the current row. But I need 10 minutes preceding records. Also at the same time all those records that span within +/- 10 sec, if exact 10 minute later records are not found (please see the description of the problem given in the previous question).

Kindly give me a more clear picture about windowing clause.
Also how you will approch the above problem.

Thanks and regards

Praveen

December 26, 2004 - 12:19 pm UTC

do you have Expert One on One Oracle? I have extensive examples in there.

range 10 = 10 days.

range 10/24 = 10 hours

range 10/24/60 = 10 minutes......

I do have Expert One on One

Praveen, December 26, 2004 - 2:24 pm UTC

Hi Tom,

I got the first glimpse into analytical queries through your book only. Although I had attempted to learn them through oracle documentation a couple of times earlier, I never was able to write an decent query using analytical functions. Now, after spending a few hours with your book, I can see that these fuctions are not as complex as I thought earlier.

The 'hiredate' example you have given in the book is calculating in terms of days. (Pg:555)

"select ename, sal, hiredate, hiredate-100 window_top
first_value(ename)
over(order by hiredate asc
range 100 preceding) ename_prec,...."

I got the hint from your follow-up. I should have to think a little myself.

Thankyou Tom,

Praveen.

A reader, December 26, 2004 - 5:49 pm UTC

Tom,

Any dates when you would be releasing your book on Analytic?

Thanks.

December 26, 2004 - 6:00 pm UTC

doing a 2nd edition of Expert One on One Oracle now -- not on the list yet.

Great answer!

Shimon Tourgeman, December 27, 2004 - 2:21 am UTC

Dear Tom,
Could you please tell us when you are going to publish the next edition of your books, covering 9iR2 and maybe 10g, as you stated here?

Merry Christmas and a Happy New Year!
Shimon.

December 27, 2004 - 10:06 am UTC

sometime in 2005, but not the first 1/2 :)

Using range windows

Praveen, January 03, 2005 - 8:09 am UTC

Hi Tom,

Please allow me to explain the problem again which you had
followed up earlier (Please refer: "Using analytical
function, LEAD, LAG"). In the table t(id integer, dt date)
I have records which only differ by seconds ('dt' column).
Could you please help me to write a query to create windows
such that each window groups records based on the
expression 590 <= dt_1 <= 610 (590 & 610 are date
difference between first record and current record in
seconds and dt1 is the 'dt' column value of first record in
each window after ordering by 'id' and 'dt' ASC).
The idea is to find a record following the first record
which leads by 10 minutes. If exact match is not found
apply a tolerance of +/-10 seconds. Once the nearest match
is found (if multiple matches are found, select any), start
from the next record and repeat the process. (Please see
the scripts I had given earlier).

In your follow up, you had suggested the use of
first_value() analytical function with range windows. But
it looks like it is pretty difficult to generate the kind
of windows I specified above. And in your book, examples of
such complex nature where not given (pardon me for being
critical).

Your answer will help me to get a deeper and practical
understanding of analytical functions while at the same
time may help us to bring down a 12 hour procedure to less
than 5 hours.

Thanks and regards

Praveen

January 03, 2005 - 9:11 am UTC

no idea what 590 is. days? hours? seconds?

sorry - this doesn't compute to me.

590 <= dt_1 <= 610???

Delete Records Older Than 90 Days While Keeping Max

Mac, January 03, 2005 - 10:24 am UTC

There is a DATE column in a table. I need to delete all records older than 90 days -- except if the newest record for a unique key happens to be older than 90 days, I want to keep it and delete the prior records for that key value.

How?

January 03, 2005 - 10:26 am UTC

if the "newest record for a unique key"

if the key is unique.... then the date column is the only thing to be looked at?

that is, if the key is unique, then the oldest record is the newest record is in the fact the only record....

Oops, but

A reader, January 03, 2005 - 11:01 am UTC

Sorry, forgot to mention that the DATE column is a part of the unique key.

Sorry, I went a bit fast...

Praveen, January 03, 2005 - 2:00 pm UTC

Hi Tom,

Sorry, I didnt explained properly.

590 = (10 minutes * 60) seconds - 10 seconds
600 = (10 minutes * 60) seconds + 10 seconds

Here I am looking for a record (say rn) exactly
600 sec (10 min) later to the first record in
the range window. If I didn't get an exact match
I try to find a record which is closest to rn,
but lies with in a range which is 10 seconds less
than or more than rn.

And the condition

"590 <= dt_1 <= 610" tries to eliminate all other
records inside the range window that does not follow
the above rule.

dt_1 is the dt column value of any row following the
first row in a given range window, such that the
difference between dt_1 and dt of first row is between
590 seconds and 610 seconds. I am interested in only
one record which lies closest to 600 seconds.

I hope, the picture is more clear to you now. As an
example,

id dt
-----------------------------
1 12/20/2004 00:00:00 AM --Range window #1
1 12/20/2004 00:09:55 AM
1 12/20/2004 00:10:00 AM --Selected (Closest to 12/20/2004 00:10:00 AM)

............................
1 12/20/2004 00:10:10 AM --Range window #2
1 12/20/2004 00:19:55 AM --Selected (Closest to 12/20/2004 00:20:00 AM)
1 12/20/2004 00:20:55 AM
............................
1 12/20/2004 00:20:55 AM --Range window #3
1 12/20/2004 00:25:00 AM --Nothing to select
1 12/20/2004 00:29:10 AM --Nothing to select
...........................
1 12/20/2004 00:30:05 AM --Range window #4
1 12/20/2004 00:39:55 AM --Either one is selected
1 12/20/2004 00:40:05 AM --Either one is selected
-----------------------------

Thanks and regards

Praveen

January 03, 2005 - 10:24 pm UTC

that is first_value, last_value with a range window and the time range is

N * 1/24/60/60 -- for N seconds.

How to mimic Oracle 10g LAST_VALUE(... IGNORE NULLS)?

jayaramj@quinnox.com, January 13, 2005 - 3:11 pm UTC

Hi Tom,

In answer to the question 'How to mimic Ora10g LAST_VALUE(... IGNORE NULLS)?' from reviewer Sergey (from Norway) in this post you have proposed the following solution:

ops$tkyte@ORA10G> select t,
2 sign_v,
3 v,
4 substr( max(data) over (order by t), 7 ) v2
5 from (
6 SELECT TD.T,
7 SIGN(NVL(TV.V, 0)) sign_v,
8 NVL(TV.V, LAST_VALUE(TV.V IGNORE NULLS) OVER ( ORDER BY TD.T )) V,
9 case when tv.v is not null
10 then to_char( row_number()
over (order by td.t), 'fm000000' ) || tv.v
11 end data
12 FROM TD, TV
13 WHERE TV.T(+) = TD.T
14 )
15 ORDER BY T
16 ;

The problem is that this solution converts the data type of the column (in this case column TV.V) to a string (V2 in the result is a string). The result would then need to be converted back to the original data type.
It is best to avoid such data type conversion. Is there a solution to mimic Oracle 10g LAST_VALUE(... IGNORE NULLS) in Oracle 9i without the datatype conversion?

January 13, 2005 - 3:45 pm UTC

encode the date as a string using to_char( v, 'yyyymmddhh24miss' ) and in the substr of it back out -- to_date( substr(...), 'yyyymmddhh24miss' )

How to mimic Oracle 10g LAST_VALUE(... IGNORE NULLS)?

Jay, January 14, 2005 - 12:44 am UTC

In response to your post above - Taking care of dates (for datatype conversion) is not complex (though timestamp variants would require a different format string). Object columns are a different story altogether. These cannot be easily converted to strings. Is there a better solution that does not require datatype conversion (and hence does not require any knowledge of the column datatype in this SQL).

January 14, 2005 - 8:06 am UTC

upgrade to 10g.

find prior collect_date to the max collect_date for each customer

JANE, January 25, 2005 - 4:30 pm UTC

Hello,Tom!
I work in ORACLE 8I
I have table with 2 columns:cstmr_no,collect_date
CREATE TABLE CSTMR_dates
(
CSTMR_NO NUMBER(8) NOT NULL,
COLLECT_DATE DATE NOT NULL);
insert into cstmr_dates
values(18,to_date('01/02/04','dd/mm/yy');
insert into cstmr_dates
values(18,to_date('01/03/04','dd/mm/yy');
insert into cstmr_dates
values(18,to_date('01/05/04','dd/mm/yy');
insert into cstmr_dates
values(248,to_date('01/11/04','dd/mm/yy');
insert into cstmr_dates
values(248,to_date('01/02/04','dd/mm/yy');
insert into cstmr_dates
values(248,to_date('01/03/04','dd/mm/yy');
How can i do instead this query the query using analytical
function:

select cstmr_no,max(collect_date) from
CSTMR_dates
where collect_date<(select max(RETURN_COLLECT_DATE)
group by cstmr_no

In production i have thousands records in the table. THANK A LOT
JANE

January 25, 2005 - 6:59 pm UTC

no idea what "return_collect_date" is. or where it comes from.

the sql is not sql...

Mistake:return_collect_date is a collect_date

JANE, January 26, 2005 - 2:58 am UTC

Thank you for answer
JANE

January 26, 2005 - 8:46 am UTC

but this sql:

select cstmr_no,max(collect_date) from
CSTMR_dates
where collect_date<(select max(COLLECT_DATE)
group by cstmr_no

is still not sql and I don't know if you want to

a) delete all old data BY CSTMR_NO (eg: keep just the record with the max(collect_date) BY CSTMR_NO

b) delete all data such that the collect_date is not equal to the max(collect_date)

I cannot suggest a way to rewrite an invalid sql query.

No,i want to do the next:

A reader, January 26, 2005 - 9:08 am UTC

i have just to presene the data without deleting anything
For each cstmr i have to see:
cstmr_no max(collect_date) last prior date to max
======== ================= ======================
18 01/05/04 01/03/04
248 01/11/04 01/03/04

insert into cstmr_dates
values(18,to_date('01/02/04','dd/mm/yy');
insert into cstmr_dates
values(18,to_date('01/03/04','dd/mm/yy');
insert into cstmr_dates
values(18,to_date('01/05/04','dd/mm/yy');
insert into cstmr_dates
values(248,to_date('01/11/04','dd/mm/yy');
insert into cstmr_dates
values(248,to_date('01/02/04','dd/mm/yy');
insert into cstmr_dates
values(248,to_date('01/03/04','dd/mm/yy');

January 26, 2005 - 9:31 am UTC

wow, how we got from:

select cstmr_no,max(collect_date) from 
CSTMR_dates
where  collect_date<(select max(RETURN_COLLECT_DATE)
group by cstmr_no 

to this, well -- just "wow".  horse of a very different color.


I have to sort of guess -- maybe I'll get it right -- you want 

a) every cstmr_no, 
b) the last two dates recorded for them.


well, after editing your inserts to make them become actual sql that can run.... (you don't really use YY in real life do you? please please say "no, that was a mistake...")

ops$tkyte@ORA9IR2> select cstmr_no,
  2         max( decode(rn,1,collect_date) ) d1,
  3         max( decode(rn,2,collect_date) ) d1
  4    from (
  5  select cstmr_no,
  6         collect_date,
  7             row_number() over (partition by cstmr_no order by collect_date desc nulls last) rn
  8    from cstmr_dates
  9         )
 10   where rn <= 2
 11   group by cstmr_no
 12  /
 
  CSTMR_NO D1        D1
---------- --------- ---------
        18 01-MAY-04 01-MAR-04
       248 01-NOV-04 01-MAR-04

Lead/Lag and Indexes

Rob H, February 22, 2005 - 6:12 pm UTC

We are using the Lead and Lag functions and I have run into an issue of Index usage.

lets say I have 2 tables

select customer_num, prod_id, date_sold, total_sales from sales_table_NA

and

select customer_num, prod_id, date_sold, total_sales from sales_table_EUR

if i do a

create view eur_sales
select customer_account, prod_id, trunc(sales_date,'mon') month_purch,
sum(total_sales) sales_current, lead(sum(total_sales),1) over(partition by customer_account, prod_id order by trunc(sales_date,'mon') desc) sales_last
from sales_table_EUR
group by customer_account, prod_id

create view na_sales as
select customer_account, prod_id, trunc(sales_date,'mon') month_purch,
sum(total_sales) sales_current, lead(sum(total_sales),1) over(partition by customer_account, prod_id order by trunc(sales_date,'mon') desc) sales_last
from sales_table_NA
group by customer_account, prod_id

There are indexes on the tables for customer_acccount

Now, if I
select * from na_sales where customer_account=1
the index is used. Same for eur_sales. However, if I UNION them together it does not (WINDOW SORT on first select and WINDOW BUFFER on second). If I remove the lead function and UNION them, the index is used.

Any help?

February 23, 2005 - 1:56 am UTC

do you really want UNION or UNION ALL.........

(do you know the difference between the two)....

if you had given me simple setup scripts, I would have been happy to see if that makes a difference, but oh well.

Potential Solution

Rob H, February 22, 2005 - 6:54 pm UTC

Rather than pre-sum the data into 2 views I found that union'ing (actually UNION ALL) the data, then sum and Lag works fine.
ie
select
customer_account, prod_id, sales_date month_purch,
sum(total_sales) sales_current, lead(sum(total_sales),1) over(partition by
customer_account, prod_id order by sales_date desc) sales_last
from(
select customer_account, prod_id, sales_date, total_sales from sales_table_NA
union all
select customer_account, prod_id, trunc(sales_date,'mon') month_purch, total_sales from sales_table_EUR)

Attitude....

Rob H, February 23, 2005 - 9:54 am UTC

What's the deal? Having a bad day? I'm sorry, but I assumed from the select statements you could infer structure. Yes, I was using UNION ALL, yes, I know the difference (uh, feeling a bit rude are we?) but I didn't realize until after I posted that I missed that (a nice feature would be to be able to edit a post for a certain time after post). I generalized the data structure and SQL for confidentiality reasons. For a guy who is so hard on people's IM speak, you forget to capitalize your sentences :)

Now, UNION Vs UNION ALL didn't affect index usage (it did however have 'other' performance issues). You can see from my next post that I worked on the issue and resolved it by not presuming each table. With the new query, if someone issues a select with no 'where customer_account=' then it's slower (but that also wasn't the goal).

Thanks

February 24, 2005 - 4:35 am UTC

No? I was simply asking "do you know the difference between the two" for I find most people

a) don't know union all exists
b) the semantic difference between union and union all
c) the performance penalty involved with union vs union all when they didn't need to use UNION

Your example, as posted, did not use UNION ALL. Look at your text:

<quote>
Now, if I
select * from na_sales where customer_account=1
the index is used. Same for eur_sales. However, if I UNION them together it
does not (WINDOW SORT on first select and WINDOW BUFFER on second). If I remove
the lead function and UNION them, the index is used.
</quote>

I quite simply asked:

does union all change the behaviour? (i did not have an example with table creates and such to work with, so I couldn't really 'test it', I don't have your tables, your indexes, your datatypes, etc)

do you need to use union, you said union, you did not say union all. do you know the difference between the two.

Sorry if you took it as an insult, I can only comment based on the data provided. I had to assume you like most of the world was using UNION, not UNION ALL and simply wanted to know if you could use union all, if union all made a difference, if you knew the difference between the two.

If I had precience, I could have read your subsequent post and not ask any questions I guess.

Not having a bad day, just working with information provided. I was not trying to insult you -- I was simply "asking".

Analytics

Neelz, February 24, 2005 - 5:34 am UTC

Dear Sir,

I had gone through the above examples and was wondering whether analytical functions could be used when aggregating multiple columns from a table,
CREATE TABLE T (
SUPPLIER_CD CHAR(4) NOT NULL,
ORDERRPT_NO CHAR(8) NOT NULL,
ORDER_DATE CHAR(8) NOT NULL,
STORE_CD CHAR(4) NOT NULL,
POSITION_NO CHAR(3 ) NOT NULL,
CONTORL_FLAG CHAR(2 ),
ORDERQUANTITY_EXP NUMBER(3) DEFAULT (0) NOT NULL,
ORDERQUANTITY_RES NUMBER(3) DEFAULT (0) NOT NULL,
ENT_DATE DATE DEFAULT (SYSDATE) NOT NULL,
UPD_DATE DATE DEFAULT (SYSDATE) NOT NULL,
CONSTRAINT PK_T PRIMARY KEY(SUPPLIER_CD, ORDERRPT_NO, ORDER_DATE, STORE_CD));

CREATE INDEX IDX_T ON T (SUPPLIER_CD, ORDERRPT_NO, ORDER_DATE);

insert into t values('5636','62108373','20041129','0007','2','00',1,1, to_date('2004/11/29', 'yyyy/mm/dd'),to_date('2004/11/30', 'yyyy/mm/dd'));

insert into t values('5636','62108373','20041129','0012','2','00',1,1,to_date('2004/11/29', 'yyyy/mm/dd'), to_date('2004/11/30', 'yyyy/mm/dd'));

insert into t values('5636','62108384','20041129','0014','2','00',1,1,to_date('2004/11/29', 'yyyy/mm/dd'),to_date('2004/11/30', 'yyyy/mm/dd'));

insert into t values('5636','62108384','20041129','0015','3','00',1,1,to_date('2004/11/29', 'yyyy/mm/dd'),to_date('2004/11/30', 'yyyy/mm/dd'));

insert into t values('1000','11169266','20040805','1309','4','00',8,8,to_date('2004/11/29', 'yyyy/mm/dd'),to_date('2004/11/30', 'yyyy/mm/dd'));

insert into t values('1000','11169266','20040805','1312','12' ,'00',8,8,to_date('2004/04/22', 'yyyy/mm/dd'),to_date('2004/11/23', 'yyyy/mm/dd'));

insert into t values('1000','11169266','20040805','1313','13' ,'00',12,12,to_date('2004/04/22', 'yyyy/mm/dd'),to_date('2004/11/23', 'yyyy/mm/dd'));

Currently the following query is used:-

SELECT
SUPPLIER_CD, ORDERRPT_NO, ORDER_DATE,
SUM(DECODE(RTRIM(POSITION_NO),'1',ORDERQUANTITY_RES,0)) Q1,
SUM(DECODE(RTRIM(POSITION_NO),'2',ORDERQUANTITY_RES,0)) Q2,
SUM(DECODE(RTRIM(POSITION_NO),'3',ORDERQUANTITY_RES,0)) Q3,
SUM(ORDERQUANTITY_RES) ORDER_TOTAL
FROM
T
GROUP BY
SUPPLIER_CD, ORDERRPT_NO, ORDER_DATE

The execution plan when this query is executed on the real table which has 4m records is : -

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=103002 Card=3571095 Bytes=107132850)

1 0 SORT (GROUP BY NOSORT) (Cost=103002 Card=3571095 Bytes=107 132850)

2 1 TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=10 3002 Card=3571095 Bytes=107132850)

3 2 INDEX (FULL SCAN) OF 'IDX_T' (NON-UNIQUE) (Cost=26942 Card=3571095)

Could you please tell me whether analytical functions could be used over here or a better approach for this query.

Thanks for your great help

February 24, 2005 - 5:49 am UTC

there would be no need of analytics here. analytics would be useful to get the 'aggregates' while preserving the 'details'

eg:

select empno, sal, sum(sal) over (partition by deptno)
from emp;

shows the empno, their sal and the sum of all salaries in their dept. that would be instead of coding:

select empno, sal, sum_sal
from emp, (select deptno, sum(sal) sum_sal from emp gropu by deptno) t
where emp.deptno = t.deptno
/

I was just wondering

A reader, February 24, 2005 - 6:25 am UTC

how would analytics help in the following example (the data nodes are implemented as rows in a table with two columns as pointers: split-from and merge-to, and the third column is "value", some number, not shown on diagram):

</code> http://img23.exs.cx/my.php?loc=img23&image=directedgraph11th.png <code>

The task is to use this directed dependency graph and prorate the "value" column in each row/node in the following way:

foreach node
-start with a node, for example 16
-visit each hierarchy on which 16 depends, in this case hierarchies for 14 and 15, SUM their values and the current value of node 16, and that will be new, prorated value for node 16
-repeat this recursively for each sub-hierarchy
until all nodes are prorated

I was thinking maybe to use combination of sys_connect_by_path and AF but not sure how. Any thoughts?

February 24, 2005 - 6:51 am UTC

you won't get very far with that structure in 9i and before. connect by "loop" will be an error you see lots of with a directed graph.

analytics won't be appropriate either, they work on windows - not on hierarchies.

sys_connect_by_path is going to give you a string, not a sum

a scalar subquery in 10g with NOCYCLE on the query might work.

What if there is no closure inside the graph?

A reader, February 24, 2005 - 9:08 am UTC

i.e. if the link between node 9 and 5 is removed, and the link between node 6 and 0 is removed.
Would that make difference? It would be a tree in that case. How should we proceed if that is the case? I was thinking maybe to use sys_connect_by_path to pack all sub-hierarchies one after another, and marker in window to be the depth or level. If the level switch from n to 1 that would mean the end of sub-hierarchy. If the level switch from 1 to 2 that is the begining of the hierarchy. And then aggregate over partition inside hierarchy view. Or is there a better approach?

February 24, 2005 - 9:22 am UTC

scalar subqueries.

</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:30609389052804 <code>

Lead/Lag and 0 Sales

Rob H, February 24, 2005 - 1:00 pm UTC

Thanks for all of the help so far. I have run into an issue where I have Companies and Contacts at that company. Here are the tables.

create table SALES_TRANS
(
CUSTOMER_ACCOUNT VARCHAR2(8) ,
STATION_NUMBER VARCHAR2(7) ,
PRODUCT_CODE VARCHAR2(8) ,
QUANTITY NUMBER ,
DATE_ISSUE DATE ,
PRICE NUMBER ,
VALUE NUMBER );
/

Create table COMPANY_CUSTOMER
(
COMPANY_ID NUMBER(9),
CUSTOMER_ACCOUNT VARCHAR2(8));
/

Create table PRODUCT_INFO
(
PRODUCT_CODE VARCHAR2(8) ,
PRODUCT_GROUP VARCHAR2(25),
PRODUCT_DESC VARCHAR2(100)
);
/

Running a query by customer (this select is a view called - SUM_CUST_TRANS_PRODUCT_FY_V)
Select
c.COMPANY_ID,
t.CUSTOMER_ACCOUNT,
p.product_group,
FISCAL_YEAR(DATE_ISSUE) fiscal_year,
sum(VALUE) total_VALUE_curr_y,
lead(sum(VALUE),1) over (partition by c.COMPANY_ID, t.CUSTOMER_ACCOUNT, p.product_group order by FISCAL_YEAR(DATE_ISSUE) desc) total_VALUE_pre_y
From SALES_TRANS t
inner join COMPANY_CUSTOMER c on t.CUSTOMER_ACCOUNT = C.CUSTOMER_ACCOUNT
inner join PRODUCT_INFO P ON t.PRODUCT_CODE = p.PRODUCT_CODE
group by c.OMPANY_ID, t.CUSTOMER_ACCOUNT, p.product_group, fiscal_year

I get
COMPANY_ID,CUSTOMER_ACCOUNT,PRODUCT_GROUP,FISCAL_YEAR,TOTAL_VALUE_CURR_Y,TOTAL_VALUE_PRE_Y
"F0009631","27294370","Product1",2002,1460.08,0
"F0009631","27294370","Product2",2005,0,27926.31
"F0009631","27294370","Product2",2004,27926.31,18086.17
"F0009631","27294370","Product2",2003,18086.17,47597.05
"F0009631","27294370","Product2",2002,47597.05,0
"F0009631","27294370","Product2",2001,0,0
"F0009631","27294370","Product3",2004,64582.6,51041
"F0009631","27294370","Product3",2003,51041,60225
"F0009631","27294370","Product3",2002,60225,43150
"F0009631","27294370","Product3",2001,43150,50491
"F0009631","27294370","Product3",2000,50491,664
"F0009631","27294370","Product3",1999,664,0
"F0009631","27294370","Product4",2005,2119.1,1708.61
"F0009631","27294370","Product4",2004,1708.61,4050.82
"F0009631","27294370","Product4",2003,4050.82,15662.57
"F0009631","27294370","Product4",2002,15662.57,0
"F0009631","27294370","Product5",2005,0,351.64
"F0009631","27294370","Product5",2004,351.64,5873.61
"F0009631","27294370","Product5",2003,5873.61,2548.83
"F0009631","27294370","Product5",2002,2548.83,0
"F0009631","27294370","Product6",2004,17347.84,16781.33
"F0009631","27294370","Product6",2003,16781.33,10575
"F0009631","27294370","Product6",2002,10575,3659.67
"F0009631","27294370","Product6",2001,3659.67,4901.67
"F0009631","27294370","Product6",2000,4901.67,4073.47
"F0009631","27294370","Product6",1999,4073.47,0
"F0009631","27294370","Product7",2004,5377.5,2588
"F0009631","27294370","Product7",2003,2588,245
"F0009631","27294370","Product7",2000,245,0
"F0009631","27340843","Product2",2003,3013.71,0
"F0009631","27340843","Product3",1999,1411,0
"F0009631","27340843","Product5",2003,3254.9,0

Now if I run the same grouping by only company (this select is a view called - SUM_COMPANY_TRANS_PRODUCT_FY_V)
Select
c.COMPANY_ID,
p.product_group,
FISCAL_YEAR(DATE_ISSUE) fiscal_year,
sum(VALUE) total_VALUE_curr_y,
lead(sum(VALUE),1) over (partition by c.COMPANY_ID, p.product_group order by FISCAL_YEAR(DATE_ISSUE) desc) total_VALUE_pre_y
From SALES_TRANS t
inner join COMPANY_CUSTOMER c on t.CUSTOMER_ACCOUNT = C.CUSTOMER_ACCOUNT
inner join PRODUCT_INFO P ON t.PRODUCT_CODE = p.PRODUCT_CODE
group by c.COMPANY_ID, p.product_group, fiscal_year

we get
COMPANY_ID,PRODUCT_GROUP,FISCAL_YEAR,TOTAL_VALUE_CURR_Y,TOTAL_VALUE_PRE_Y
"F0009631","Product1",2002,1460.08,0
"F0009631","Product2",2005,0,27926.31
"F0009631","Product2",2004,27926.31,21099.88
"F0009631","Product2",2003,21099.88,47597.05
"F0009631","Product2",2002,47597.05,0
"F0009631","Product2",2001,0,0
"F0009631","Product3",2004,64582.6,51041
"F0009631","Product3",2003,51041,60225
"F0009631","Product3",2002,60225,43150
"F0009631","Product3",2001,43150,50491
"F0009631","Product3",2000,50491,2075
"F0009631","Product3",1999,2075,0
"F0009631","Product4",2005,2119.1,1708.61
"F0009631","Product4",2004,1708.61,4050.82
"F0009631","Product4",2003,4050.82,15662.57
"F0009631","Product4",2002,15662.57,0
"F0009631","Product5",2005,0,351.64
"F0009631","Product5",2004,351.64,9128.51
"F0009631","Product5",2003,9128.51,2548.83
"F0009631","Product5",2002,2548.83,0
"F0009631","Product6",2004,17347.84,16781.33
"F0009631","Product6",2003,16781.33,10575
"F0009631","Product6",2002,10575,3659.67
"F0009631","Product6",2001,3659.67,4901.67
"F0009631","Product6",2000,4901.67,4073.47
"F0009631","Product6",1999,4073.47,0
"F0009631","Product7",2004,5377.5,2588
"F0009631","Product7",2003,2588,245
"F0009631","Product7",2000,245,0

The problem is that because if I
select * from SUM_CUST_TRANS_PRODUCT_FY_V where fiscal_year=2004

Customer 27340843 will not show up (no 2004 purchases), but that also means that the total_VALUE_pre_y for 2004 will never summarize by customer to the total_VALUE_pre_y for 2004 for the company. Is there a better way to do this. The goal is that we can show current year sales vs previous years sales by company, by customer, and potentially a larger summary higher than company (city).

I guess the idea would be that I could somehow show for all customers in a company, all years, all products, that the company has purchases (cartesian) for every year purchasing. This I think is difficult for large customer, sales transaction tables.

ie

"F0009631","27340843","Product2",2004,0,3013.71 <--- ***
"F0009631","27340843","Product2",2003,3013.71,0

*** This row doesn't exist in the customer view. There are no 2004 sales, so doesn't appear, but we would like to see it so that the year previous shows.

I would love to "attach" some of the transactions if it would help. Is there a better way?

hierarchical cubes + MV?

Rob H, February 25, 2005 - 2:52 pm UTC

Would hierarchical cubes and MV be the solution. It seems like a lot of meta data to create. We would have to create it for all customers, for all years, for all product groups.

February 25, 2005 - 6:40 pm UTC

if you have "missing data", the only way i know to "make it up" is an outer join (partitioned outer joins in 10g rock, removing the need to create cartesian products of every dimension first)

Neelz, February 27, 2005 - 2:32 am UTC

Dear Sir,

This is with regards to my previous post which is 5th above from this.

<quote>
SELECT
SUPPLIER_CD, ORDERRPT_NO, ORDER_DATE,
SUM(DECODE(RTRIM(POSITION_NO),'1',ORDERQUANTITY_RES,0)) Q1,
SUM(DECODE(RTRIM(POSITION_NO),'2',ORDERQUANTITY_RES,0)) Q2,
SUM(DECODE(RTRIM(POSITION_NO),'3',ORDERQUANTITY_RES,0)) Q3,
SUM(ORDERQUANTITY_RES) ORDER_TOTAL
FROM
T
GROUP BY
SUPPLIER_CD, ORDERRPT_NO, ORDER_DATE
</quote>

As you mentioned analytics could not be used, but could you please advice me on my problem,

The query is infact big, for brevity I just put few columns. The actual query is
SELECT
SUPPLIER_CD, ORDERRPT_NO, ORDER_DATE,
SUM(DECODE(RTRIM(POSITION_NO),'1',ORDERQUANTITY_RES,0)) Q1,
SUM(DECODE(RTRIM(POSITION_NO),'2',ORDERQUANTITY_RES,0)) Q2,
SUM(DECODE(RTRIM(POSITION_NO),'3',ORDERQUANTITY_RES,0)) Q3,
.....
.....
.....
.....
.....
.....
SUM(DECODE(RTRIM(POSITION_NO),'197',ORDERQUANTITY_RES,0)) Q197,
SUM(DECODE(RTRIM(POSITION_NO),'198',ORDERQUANTITY_RES,0)) Q198,
SUM(DECODE(RTRIM(POSITION_NO),'199',ORDERQUANTITY_RES,0)) Q199,
SUM(DECODE(RTRIM(POSITION_NO),'200',ORDERQUANTITY_RES,0)) Q200,
SUM(ORDERQUANTITY_RES) ORDER_TOTAL
FROM
T
GROUP BY
SUPPLIER_CD, ORDERRPT_NO, ORDER_DATE

As you could see there is a definite pattern on the sum function. Could you please help me in tuning this query?
Thanks in advance

February 27, 2005 - 8:32 am UTC

you aer doing a pivot -- looks great to me? It is "classic"

Neelz, February 27, 2005 - 9:51 am UTC

Dear Sir,

I am sorry if you felt like that, It is quite a new world for me here, started visiting this site 3-4 months back then realized the enormity of it and its become like an addiction. Bought both books by you and started working on it. Reading the Oracle concepts guide. Every day many times will try for asking a question but till now no luck, might be because of timezone difference.

Coming back to my question, since it is a huge query and was taking 35 min to execute, after reading through many articles here and in the books I was really confused as to what approach should I take. Still is. Analytical functions (not useful as you told), Function based indexes(no becuase we have a standard edition), Materialized views(no because its an OLTP), Stored Sql functions, Deterministic keyword, user defined aggregates, optimizer hints.. at present it is confusing for me.

I am working on it with different approaches, could reduce the execution time upto 9.08 minutes. The query was written with an index hint earlier and by removing it, the execution time decreased upt 9+ minutes.

I was thinking whether you could advice on what approach should I take

Thanks for your valuable time,

February 27, 2005 - 10:04 am UTC

if that is taking 35 minutes you either

a) have the memory settings like pga_aggreate_target/sort_area_size set way too low

b) you have billions of records that are hundreds of bytes in width

c) really slow disks

d) an overloaded system

I mean -- that query is pretty "simple" full scan, aggregate, nothing to it -- unless it is a gross simplification, it should not take 35 minutes. Can you trace it with the 10046 level 12 trace and post the tkprof section that is relevant to just this query with the waits and all?

Neelz, February 27, 2005 - 10:56 am UTC

Dear Sir,

Thank you for your kind reply,

This report is taken for the development system.
I used alter session set events '10046 trace name context forever, level 12'. The query execution time was 00:08:15.03

select
supplier_cd, orderrpt_no, order_date,
sum(decode(rtrim(position_no),'1',orderquantity_res,0)) q1,
sum(decode(rtrim(position_no),'2',orderquantity_res,0)) q2,
sum(decode(rtrim(position_no),'3',orderquantity_res,0)) q3,
sum(decode(rtrim(position_no),'4',orderquantity_res,0)) q4,
sum(decode(rtrim(position_no),'5',orderquantity_res,0)) q5,
.....
.....
sum(decode(rtrim(position_no),'197',orderquantity_res,0)) q197,
sum(decode(rtrim(position_no),'198',orderquantity_res,0)) q198,
sum(decode(rtrim(position_no),'199',orderquantity_res,0)) q199,
sum(decode(rtrim(position_no),'200',orderquantity_res,0)) q200,
sum(orderquantity_res) order_total
from
t
group by
supplier_cd, orderrpt_no, order_date

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.03 0.04 0 0 0 0
Execute 2 0.02 0.04 0 0 0 0
Fetch 15 431.55 488.37 37147 36118 74 211
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 18 431.60 488.46 37147 36118 74 211

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 66

Rows Row Source Operation
------- ---------------------------------------------------
211 SORT GROUP BY
4205484 TABLE ACCESS FULL T

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 16 0.00 0.00
SQL*Net more data to client 30 0.00 0.00
db file sequential read 3 0.04 0.05
db file scattered read 2280 0.78 30.62
direct path write 4 0.00 0.00
direct path read 147 0.05 1.45
SQL*Net message from client 16 140.57 166.58
SQL*Net break/reset to client 2 0.01 0.01
********************************************************************************

Thank you

February 27, 2005 - 11:13 am UTC

that is 8 minutes?

but I see some writes to temp here -- for 211 aggregated rows, perhaps your sort/pga is set small

Also, why do you need to rtrim() 4,205,484 rows? (and why is something called position NUMBER in a string?) is that rtrim there "just in case" or is it really needed? why would it have trailing blanks and is that not a data integrity issue that needs to be fixed?

(but this is an 8 minute query, not a 35 minute query, if it takes longer on production -- it'll be because it is waiting for something -- like IO...)

Neelz, February 27, 2005 - 11:30 am UTC

Dear Sir,

This is a 3rd party application and the query was written with an index hint earlier. After removing the hint query execution time reduced to 8 min. Regarding the rtrim I have to check with the team if it is really needed. I will try the trace on production tomorrow.

And at last I could see the link for "Submit a New Question"!, I think I should try around 1.00 AM

Thanking You a lot

February 27, 2005 - 11:31 am UTC

depends on your time zone, rarely am I up at 1am east coast (gmt-5) time doing this stuff!

Miki, March 02, 2005 - 8:51 am UTC

Tom,

I need to produce a moving average which has an even window size. If I want a 28 sized window, I need to look backward 14 but I need the first value of the window to be divided by 2 and I need to look forward 14 and the last value of the window to be divided by 2 also.
(a1/2+a2+...+a28+a29/2)/28
How could I accomplish it with the function:
avg() over(...)?

Thanks in advance

March 02, 2005 - 10:03 am UTC

this is the first thought that popped into my head:

a) get the sum(val) over 13 before and 13 after (27 rows possible).
b) get the lag(val,14)/2 and lead(val,14)/2
c) add those three numbers
d) divide by the count of non-null VALS observed (count(val) 13 before/after+ 1 if lag is not null + 1 is lead is not null)


ops$tkyte@ORA9IR2> create table t
  2  as
  3  select rownum id, object_id val
  4    from all_objects
  5   where rownum <= 30;
 
Table created.
 
<b>so, this was my "debug" query, just to see the data:</b>


ops$tkyte@ORA9IR2> select id,
  2         sum(val) over 
                 (order by id rows between 13 preceding and 13 following) sum,
  3         count(val) over 
                 (order by id rows between 13 preceding and 13 following)+
  4             decode(lag(val,14) over (order by id),null,0,1)+
  5             decode(lead(val,14) over (order by id),null,0,1) cnt,
  6             lag(id,14) over (order by id) lagid,
  7             lag(val,14) over (order by id) lagval,
  8             lead(id,14) over (order by id) leadid,
  9             lead(val,14) over (order by id) leadval
 10    from t
 11   order by id;
 
        ID        SUM        CNT      LAGID     LAGVAL     LEADID    LEADVAL
---------- ---------- ---------- ---------- ---------- ---------- ----------
         1     218472         15                               15       6399
         2     224871         16                               16      19361
         3     244232         17                               17      23637
         4     267869         18                               18      14871
         5     282740         19                               19      20668
         6     303408         20                               20      18961
         7     322369         21                               21      15767
         8     338136         22                               22      20654
         9     358790         23                               23       7065
        10     365855         24                               24      17487
        11     383342         25                               25      11077
        12     394419         26                               26      20772
        13     415191         27                               27      15505
        14     430696         28                               28      12849
        15     425648         29          1      17897         29      23195
        16     441314         29          2       7529         30      18523
        17     436505         28          3      23332
        18     422306         27          4      14199
        19     399409         26          5      22897
        20     389266         25          6      10143
        21     365728         24          7      23538
        22     342135         23          8      23593
        23     332316         22          9       9819
        24     320581         21         10      11735
        25     303084         20         11      17497
        26     295369         19         12       7715
        27     276010         18         13      19359
        28     266791         17         14       9219
        29     260392         16         15       6399
        30     241031         15         16      19361
 
30 rows selected.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select id,
  2         (sum(val) over 
                (order by id rows between 13 preceding and 13 following)+
  3              nvl(lag(val,14) over (order by id)/2,0)+
  4              nvl(lead(val,14) over (order by id)/2,0))/
  5             nullif(
  6           count(val) over 
                  (order by id rows between 13 preceding and 13 following)+
  7               decode(lag(val,14) over (order by id),null,0,1)+
  8               decode(lead(val,14) over (order by id),null,0,1)
  9                  ,0) avg
 10    from t
 11   order by id;
 
        ID        AVG
---------- ----------
         1    14778.1
         2 14659.4688
         3 15061.7941
         4 15294.6944
         5 15424.9474
         6  15644.425
         7 15726.3095
         8 15839.2273
         9 15753.1522
        10 15608.2708
        11   15555.22
        12 15569.4231
        13 15664.5741
        14 15611.4464
        15      15386
        16 15666.8966
        17 16006.1071
        18 15903.9074
        19 15802.2115
        20    15773.5
        21 15729.0417
        22 15388.3261
        23 15328.4318
        24 15545.1667
        25  15591.625
        26 15748.7632
        27 15871.6389
        28 15964.7353
        29 16474.4688
        30    16714.1
 
30 rows selected.
 
ops$tkyte@ORA9IR2>

<b>I did not do a detailed check of the results -- but that should get you going (remember -- there are 29 rows -- 14+1+14!!! and beware NULLs)</b>

Miki, March 02, 2005 - 10:54 am UTC

Tom,

Your answer is excellent. That is - almost - what I needed.
If my window size is odd I can use simly avg() over() function. I am looking for a solution where I can also use avg() over() instead of sum() over()/count().
Is it possible?

Thank you!

March 02, 2005 - 11:15 am UTC

if you want to do things to row 1 and row 29 in the window "special" like this -- this was the only thing I thought of.

Miki, March 02, 2005 - 11:18 am UTC

Thank you! I will use your recommended code.

consecutive days... 8.1.7

Dean, March 09, 2005 - 1:07 pm UTC

create table day_cd
(dt date
,cd varchar2(2))
/

insert into day_cd values ('08-MAR-05', 'BD');
insert into day_cd values ('09-MAR-05', 'AD');
insert into day_cd values ('10-MAR-05', 'AD');
insert into day_cd values ('11-MAR-05', 'AD');
insert into day_cd values ('12-MAR-05', 'AD');
insert into day_cd values ('13-MAR-05', 'AD');
insert into day_cd values ('14-MAR-05', 'CD');
insert into day_cd values ('15-MAR-05', 'CD');
insert into day_cd values ('16-MAR-05', 'AD');
insert into day_cd values ('17-MAR-05', 'AD');
insert into day_cd values ('18-MAR-05', 'AD');
insert into day_cd values ('19-MAR-05', 'CD')
/

SELECT * FROM DAY_CD;

DT CD
--------- --
08-MAR-05 BD
09-MAR-05 AD
10-MAR-05 AD
11-MAR-05 AD
12-MAR-05 AD
13-MAR-05 AD
14-MAR-05 CD
15-MAR-05 CD
16-MAR-05 AD
17-MAR-05 AD
18-MAR-05 AD
19-MAR-05 CD

I'd like the count the occurrence of each code as it occurs in consecutive days as one occurrence.

So that the output would be:

CD OCCURRENCES
-- -----------
AD 2
BD 1
CD 2

March 09, 2005 - 1:55 pm UTC

take a look at this thread, try to understand analytics and see if you cannot figure out how you might use lag to do this.

see also:

</code> https://www.oracle.com/technetwork/issue-archive/2014/14-mar/o24asktom-2147206.html <code>

for some hints (analytics to the rescue)

nevermind...

Dean, March 09, 2005 - 1:59 pm UTC

select cd, count(*)
from
(
select cd, dt, case when (lead(dt) over (partition by cd order by dt) - dt) = 1 then 1 else 0 end day
from day_cd
)
where day = 0
group by cd

we were responding at the same time...

Dean, March 09, 2005 - 2:01 pm UTC

:)

select cd, count(*)
from
(
select cd, dt, case when (lead(dt) over (partition by cd order by dt) - dt) = 1 then 1 else 0 end day
from day_cd
)
where day = 0
group by cd

CD COUNT(*)
-- ----------
AD 2
BD 1
CD 2

Thanks for all of your help...

max() over() till not the current row

Miki, March 10, 2005 - 4:12 am UTC

Tom,

I have the following input

DATUM T COL1 COL2 COL3 COL4
2005.02.19 9:29 T 1 0 0 0
2005.02.20 9:29 0 0 0 0
2005.02.21 9:29 0 0 0 0
2005.02.22 9:29 T 1 0 0 0
2005.02.23 9:29 0 0 0 0
2005.02.24 9:29 0 0 0 0
2005.02.25 9:29 0 0 0 0
2005.02.26 9:29 0 0 0 0
2005.02.27 9:29 T 0 1 0 0
2005.02.28 9:29 0 0 0 0
2005.03.01 9:29 0 0 0 0
2005.03.02 9:29 T 1 1 0 0
2005.03.03 9:29 0 0 0 0
2005.03.04 9:29 T 1 1 0 0
2005.03.05 9:29 0 0 0 0
2005.03.06 9:29 T 1 0 0 0
2005.03.07 9:29 0 0 0 0
2005.03.08 9:29 0 0 0 0
2005.03.09 9:29 0 0 0 0

When value of column T is T a rule determines which columns (col1, , col4) get 1 or 0.
Unfortunately, with the rule more then one column can get value 1. So, if col1++col4 > 1 then I would like colx to be the previous colx where t = 'T' and col1+...+col4 = 1

So, the output is the following
DATUM T COL1 COL2 COL3 COL4
2005.02.19 9:29 T 1 0 0 0
2005.02.20 9:29 0 0 0 0
2005.02.21 9:29 0 0 0 0
2005.02.22 9:29 T 1 0 0 0
2005.02.23 9:29 0 0 0 0
2005.02.24 9:29 0 0 0 0
2005.02.25 9:29 0 0 0 0
2005.02.26 9:29 0 0 0 0
2005.02.27 9:29 T 0 1 0 0
2005.02.28 9:29 0 0 0 0
2005.03.01 9:29 0 0 0 0
2005.03.02 9:29 T 0 1 0 0
2005.03.03 9:29 0 0 0 0
2005.03.04 9:29 T 0 1 0 0
2005.03.05 9:29 0 0 0 0
2005.03.06 9:29 T 1 0 0 0
2005.03.07 9:29 0 0 0 0
2005.03.08 9:29 0 0 0 0
2005.03.09 9:29 0 0 0 0
I tried to use a max() over() function to replace the wrong value but it dosnt work because I cant see the max datum till the previous record where t=T and col1+...+col4 = 1

...
case when t = T and col1++col4>1 and
greatest(nvl(max(decode(col1,1,datum)) over(order by datum), sysdate-10000),
nvl(max(decode(col2,1,datum)) over(order by datum), sysdate-10000),
nvl(max(decode(col3,1,datum)) over(order by datum), sysdate-10000),
nvl(max(decode(col4,1,datum)) over(order by datum), sysdate-10000)
) = nvl(max(decode(col1,1,datum)) over(order by datum), sysdate-10000) then 1 else 0 end col1,

Case when t = T and col1++col4>1 and
Greatest(nvl(max(decode(col1,1,datum)) over(order by datum), sysdate-10000),
nvl(max(decode(col2,1,datum)) over(order by datum), sysdate-10000),
nvl(max(decode(col3,1,datum)) over(order by datum), sysdate-10000),
nvl(max(decode(col4,1,datum)) over(order by datum), sysdate-10000)
) = nvl(max(decode(col4,1,datum)) over(order by datum), sysdate-10000) then 1 else 0 end col4

Could you give me a solution to my problem?

Thanks in advance
miki

March 10, 2005 - 7:42 am UTC

I can, but I'd need a create table and some inserts.

You might look at:

</code> https://www.oracle.com/technetwork/issue-archive/2014/14-mar/o24asktom-2147206.html <code>

analytics to the rescue because I'll be using that exact technique.

Miki, March 10, 2005 - 8:09 am UTC

Here is my table populated with data:
create table T
(
DATUM DATE,
T VARCHAR2(1),
COL1 NUMBER,
COL2 NUMBER,
COL3 NUMBER,
COL4 NUMBER
);

insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('16-01-2005 13:17:46', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('04-01-2005 17:23:13', 'dd-mm-yyyy hh24:mi:ss'), 'T', 1, 1, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('01-03-2005 02:59:17', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('11-12-2004 21:59:18', 'dd-mm-yyyy hh24:mi:ss'), 'T', 1, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('10-01-2005 12:00:22', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('24-02-2005 02:36:51', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('08-12-2004 11:21:15', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('07-01-2005 20:52:26', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('02-02-2005 23:44:33', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('04-03-2005 16:25:12', 'dd-mm-yyyy hh24:mi:ss'), 'T', 1, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('01-01-2005 19:02:28', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('22-01-2005 11:21:41', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('19-01-2005 15:32:18', 'dd-mm-yyyy hh24:mi:ss'), 'T', 1, 1, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('19-12-2004 03:07:10', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('21-02-2005 16:25:42', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('01-01-2005 01:02:39', 'dd-mm-yyyy hh24:mi:ss'), 'T', 0, 1, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('15-12-2004 05:49:26', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('04-02-2005 14:35:34', 'dd-mm-yyyy hh24:mi:ss'), 'T', 0, 1, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('02-12-2004 15:01:42', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
commit;

select t.* from t t
order by 1;

DATUM T COL1 COL2 COL3 COL4
1 2004.12.02. 15:01:42 0 0 0 0
2 2004.12.08. 11:21:15 0 0 0 0
3 2004.12.11. 21:59:18 T 1 0 0 0
4 2004.12.15. 5:49:26 0 0 0 0
5 2004.12.19. 3:07:10 0 0 0 0
6 2005.01.01. 1:02:39 T 0 1 0 0
7 2005.01.01. 19:02:28 0 0 0 0
8 2005.01.04. 17:23:13 T 1 1 0 0
9 2005.01.07. 20:52:26 0 0 0 0
10 2005.01.10. 12:00:22 0 0 0 0
11 2005.01.16. 13:17:46 0 0 0 0
12 2005.01.19. 15:32:18 T 1 1 0 0
13 2005.01.22. 11:21:41 0 0 0 0
14 2005.02.02. 23:44:33 0 0 0 0
15 2005.02.04. 14:35:34 T 0 1 0 0
16 2005.02.21. 16:25:42 0 0 0 0
17 2005.02.24. 2:36:51 0 0 0 0
18 2005.03.01. 2:59:17 0 0 0 0
19 2005.03.04. 16:25:12 T 1 0 0 0
Line 8 and 12 have more then one column that contain 1.
So, I need to "copy" every colx from line 6 because it is the first line (ordered by datum), that has value 'T' for column T and only one colx has value 1.

Thank you

March 10, 2005 - 8:28 am UTC

ops$tkyte@ORA9IR2> select t, col1, col2, col3, col4,
  2         substr(max(data) over (order by datum),11,1) c1,
  3         substr(max(data) over (order by datum),12,1) c2,
  4         substr(max(data) over (order by datum),13,1) c3,
  5         substr(max(data) over (order by datum),14,1) c4,
  6             case when col1+col2+col3+col4 > 1 then '<---' end fix
  7    from (
  8  select t.*,
  9         case when t = 'T' and col1+col2+col3+col4 = 1
 10                  then to_char(row_number() over (order by datum) ,'fm0000000000') || col1 || col2 || col3 || col4
 11                  end data
 12    from t
 13         )
 14   order by datum;

T       COL1       COL2       COL3       COL4 C C C C FIX
- ---------- ---------- ---------- ---------- - - - - ----
           0          0          0          0
           0          0          0          0
T          1          0          0          0 1 0 0 0
           0          0          0          0 1 0 0 0
           0          0          0          0 1 0 0 0
T          0          1          0          0 0 1 0 0
           0          0          0          0 0 1 0 0
T          1          1          0          0 0 1 0 0 <---
           0          0          0          0 0 1 0 0
           0          0          0          0 0 1 0 0
           0          0          0          0 0 1 0 0
T          1          1          0          0 0 1 0 0 <---
           0          0          0          0 0 1 0 0
           0          0          0          0 0 1 0 0
T          0          1          0          0 0 1 0 0
           0          0          0          0 0 1 0 0
           0          0          0          0 0 1 0 0
           0          0          0          0 0 1 0 0
T          1          0          0          0 1 0 0 0

19 rows selected.

Great!

Miki, March 10, 2005 - 9:38 am UTC

Great solution!
Thank you, it is that i expected.

book on Analytics

A reader, March 10, 2005 - 11:07 am UTC

Hi Tom,

It is high time that you publish the book on 'Analytic functions' - there is a lot one can do with these , but very few people are fully aware of it

When is this book due ?

thanks

A variation of Dean's question ...

Julius, March 10, 2005 - 8:13 pm UTC

create table tt (
did number,
dd date,
status number);

alter table tt add constraint tt_pk primary key (did,dd) using index;

insert into tt values (-111,to_date('03/03/2005','mm/dd/yyyy'),11);
insert into tt values (-111,to_date('03/04/2005','mm/dd/yyyy'),22);
insert into tt values (-111,to_date('03/05/2005','mm/dd/yyyy'),22);
insert into tt values (-111,to_date('03/06/2005','mm/dd/yyyy'),11);
insert into tt values (-111,to_date('03/07/2005','mm/dd/yyyy'),33);
insert into tt values (-111,to_date('03/08/2005','mm/dd/yyyy'),22);
insert into tt values (-111,to_date('03/09/2005','mm/dd/yyyy'),22);
insert into tt values (-111,to_date('03/10/2005','mm/dd/yyyy'),22);

insert into tt values (-222,to_date('03/04/2005','mm/dd/yyyy'),33);
insert into tt values (-222,to_date('03/05/2005','mm/dd/yyyy'),33);
insert into tt values (-222,to_date('03/06/2005','mm/dd/yyyy'),77);
insert into tt values (-222,to_date('03/07/2005','mm/dd/yyyy'),33);
insert into tt values (-222,to_date('03/08/2005','mm/dd/yyyy'),55);
insert into tt values (-222,to_date('03/09/2005','mm/dd/yyyy'),11);

I need a query which would return following result set where days_in_status is a count of consecutive days the did has been in its current status (dd values are days only). I've been trying to use analytics but without much success so far. Any idea? Thanks!!

DID DD STATUS DAYS_IN_STATUS
----- ---------- ------ --------------
-111 03/10/2005 22 3
-222 03/09/2005 11 1

March 10, 2005 - 9:04 pm UTC

ops$tkyte@ORA9IR2> select did, max(dd), count(*)
  2    from (
  3  select x.*, max(grp) over (partition by did order by dd desc) maxgrp
  4    from (
  5  select tt.*,
  6         case when lag(status) over (partition by did order by dd desc) <> status
  7                  then 1
  8                  end grp
  9    from tt
 10         ) x
 11             )
 12   where maxgrp is null
 13   group by did
 14  /

       DID MAX(DD)     COUNT(*)
---------- --------- ----------
      -222 09-MAR-05          1
      -111 10-MAR-05          3



is one approach...

SQL Query

a reader, March 15, 2005 - 6:50 pm UTC

Hi Tom,

create table a
(accno number(8) not null,
amount_paid number(7) not null)
/

insert into a values (1, 1000);
insert into a values (2, 1500);
insert into a values (3, 2000);
insert into a values (4, 3000);
insert into a values (5, 3000);

Could you please help me in writing the following query without using rownum and analytics.

list the accno corresponding to maximum amount paid. In case of more than one accounts having the same max amount paid, list any one.

I am expecting the result to be accno 4 or 5

Thanks for your time.

Regards

March 15, 2005 - 9:27 pm UTC

sounds like homework.

I give a similar quiz question in interviews (find the more frequently occuring month)

tkyte@ORA8IW> select substr( max( to_char(amount_paid,'fm0000000') || accno ), 8 ) accno
2 from a;

ACCNO
-----------------------------------------
5

is one possible approach (assuming that amount_paid is positive)

tkyte@ORA8IW> select max(accno)
2 from a
3 where amount_paid = ( select max(amount_paid) from a );

MAX(ACCNO)
----------
5

is another (that would work well if amount_paid,accno were indexed....)

negatives to worry about ...

Gabe, March 15, 2005 - 9:56 pm UTC

SQL> select * from a;

     ACCNO AMOUNT_PAID
---------- -----------
         1          -2
         2          -1

SQL> select substr( max( to_char(amount_paid,'fm0000000') || accno ), 8 ) accno from a;

ACCNO
-----------------------------------------
21

March 15, 2005 - 10:04 pm UTC

....
(assuming that amount_paid is positive)
.......

that was caveated and why I gave two answers ;)

cannot read ...

Gabe, March 15, 2005 - 10:57 pm UTC

Sorry about that ... missed it completely.

following an idea of mikito ...

Matthias Rogel, March 16, 2005 - 8:11 am UTC

  1  select accno
  2  from a
  3  start with amount_paid = (select max(amount_paid) from a)
  4         and accno = (select min(accno) from a where amount_paid = (select max(amount_paid) from a))
  5* connect by prior null is not null
SQL> /

     ACCNO
----------
         4

would be a third solution

March 16, 2005 - 8:38 am UTC

there are many solutions -- this one would win a Rube Goldberg award though :)

another query using analytics

A reader, March 29, 2005 - 11:32 am UTC

I've got 2 tables, t1 and t2.
t1(1 column):

t1.x(int ,primary key)

1
2
3

and t2(3 columns,index on t2.y):

t2.x(int) t2.y(int) t2.z(int)

1 7000 1
1 7000 6
1 8000 8
2 7000 1
2 7000 5
3 7000 3
3 8000 1
3 8000 7
3 9000 5

I would like to have a report like this:

t1.x t2.y count min max

1 7000 2 1 8
1 8000 1 1 8
2 7000 2 1 5
3 7000 1 1 7
3 8000 2 1 7
3 9000 1 1 7

What I came up with is:

select distinct t1.x,t2.y,
count(*) over (partition by t1.x,t2.y) as count,
min(t2.z) over (partition by t1.x) as min,
max(t2.z) over (partition by t1.x) as max
from t1,t2 where
where t1.x=t2.x;

I was wondering if this query is good enough, or if there's a better way(in terms of performance) to write this query. I'm new to analytics, and your help would be very much appreciated.

March 29, 2005 - 12:25 pm UTC

we could probably do this in analytics without the distinct, something like

select t1.x, t2.y, t2.cnt,
min(t2.z) over (partition by t1.x),
max(t2.z) over (partition by t1.x)
from t1, (select x, y, count(*) cnt from t2 group by x, y ) t2
where t1.x = t2.x;

and maybe even pusht he min/max() down into the inline view.

Analytics problem

Mark, April 08, 2005 - 12:19 pm UTC

Hi Tom,

I have a problem whose solution I'm pretty sure involves analytic functions. I've been struggling with it for some time, but analytics are new to me. I want to go from this:

/* create and inserts */
create table test.test (ordernum varchar2(10), 
                         tasktype char(3),
                         feetype varchar2(20),
                         amount number(10,2));

insert into test.test(ordernum, tasktype, feetype, amount)
               values('123123', 'DOC', 'Product Fee', 15);
insert into test.test(ordernum, tasktype, feetype, amount)
               values('123123', 'DOC', 'Copy Fee', 1);
insert into test.test(ordernum, tasktype, feetype, amount)
               values('34864', 'COS', 'Setup Fee', 23);
insert into test.test(ordernum, tasktype, feetype, amount)
               values('34864', 'COS', 'File Review Fee', 27);
insert into test.test(ordernum, tasktype, feetype, amount)
               values('34864', 'COS', 'Statutory Fee', 23);               
insert into test.test(ordernum, tasktype, feetype, amount)
               values('56432', 'DOC', 'Product Fee', 80);    
insert into test.test(ordernum, tasktype, feetype, amount)
               values('56432', 'DOC', 'Prepayment', -16);

SQL> select tasktype, ordernum, feetype, amount from test.test;

TAS ORDERNUM   FEETYPE                  AMOUNT
--- ---------- -------------------- ----------
DOC 123123     Product Fee                  15
DOC 123123     Copy Fee                      1
COS 34864      Setup Fee                    23
COS 34864      File Review Fee              27
COS 34864      Statutory Fee                22
DOC 56432      Product Fee                  80
DOC 56432      Prepayment                  -16

...to this:

TAS ORDERNUM FEE1        FEE2            FEE3          FEE4     FEE5
--- -------- ----------- --------        ----------    -------- --------
DOC          Product Fee Copy Fee        Prepayment
DOC 123123   15          1
DOC 56432    80                          -16
COS          Setup Fee   File Review Fee Statutory Fee
COS 34864    23          27              22

Allow me to explain. For each tasktype I would like a heading row, which, going across, contains all the feetypes found in test.test for that particular tasktype. There should never be more than five feetypes.

For each ordernum under each tasktype, I would like to have the amounts going across, underneath the appropriate feetypes. 

I'm pretty sure my solution involves the lag and/or lead functions, partitioning over tasktype. I particularly seem to have trouble wrapping my brain around the problem of how to get a distinct ordernum while keeping intact the data in other columns (where ordernums duplicate).

I hope my explanation is clear enough.

Hope you can help. Thanks in advance. I will continue working on this.

April 08, 2005 - 12:51 pm UTC

ops$tkyte@ORA9IR2> with columns
  2  as
  3  (select tasktype, feetype, row_number() over (partition by tasktype order by feetype) rn
  4     from (select distinct tasktype, feetype from test )
  5  )
  6  select a.tasktype, a.ordernum,
  7         to_char( max( decode( rn, 1, amount ) )) fee1,
  8         to_char( max( decode( rn, 2, amount ) )) fee2,
  9         to_char( max( decode( rn, 3, amount ) )) fee3,
 10         to_char( max( decode( rn, 4, amount ) )) fee4,
 11         to_char( max( decode( rn, 5, amount ) )) fee5
 12    from test a, columns b
 13   where a.tasktype = b.tasktype
 14     and a.feetype = b.feetype
 15   group by a.tasktype, a.ordernum
 16   union all
 17  select tasktype, null,
 18         ( max( decode( rn, 1, feetype ) )) fee1,
 19         ( max( decode( rn, 2, feetype ) )) fee2,
 20         ( max( decode( rn, 3, feetype ) )) fee3,
 21         ( max( decode( rn, 4, feetype ) )) fee4,
 22         ( max( decode( rn, 5, feetype ) )) fee5
 23    from columns
 24   group by tasktype
 25   order by 1 desc, 2 nulls first
 26  /
 
TAS ORDERNUM   FEE1            FEE2            FEE3            FEE4 FEE5
--- ---------- --------------- --------------- --------------- ---- ----
DOC            Copy Fee        Prepayment      Product Fee
DOC 123123     1                               15
DOC 56432                      -16             80
COS            File Review Fee Setup Fee       Statutory Fee
COS 34864      27              23              23


of course. :)


(suggestion, break it out, run each of the bits to see what they do.  basically, columns is a view used to "pivot" on -- we needed to assign a column number to each FEETYPE by TASKTYPE.  That is all that view does.

Then, we join that to test and "pivot" naturally.

Union all in the pivot of the column names....

and sort)

RE: Analytics problem

Mark, April 08, 2005 - 1:27 pm UTC

Excellent! I'll definitely break it down to figure out exactly what you did. Thank you very much.

Re: another query using analytics

Gabe, April 08, 2005 - 3:27 pm UTC

You werent given any resources so, I understand your solution was in fact merely a [untested] suggestion.

create table t1 ( x int primary key );

insert into t1 values (1);
insert into t1 values (2);
insert into t1 values (3);

create table t2 ( x int not null references t1(x), y int not null, z int not null );

insert into t2 values ( 1,7000,1);
insert into t2 values ( 1,7000,6);
insert into t2 values ( 1,8000,8);
insert into t2 values ( 2,7000,1);
insert into t2 values ( 2,7000,5);
insert into t2 values ( 3,7000,3);
insert into t2 values ( 3,8000,1);
insert into t2 values ( 3,8000,7);
insert into t2 values ( 3,9000,5);

My solution (avoiding the distinct) is not necessarily better than the one presented by the A reader, but here it goes:

flip@FLOP> select x, y, c
2 ,min(f) over (partition by x) f
3 ,max(l) over (partition by x) l
4 from (
5 select t2.x, t2.y, count(*) c
6 ,min(t2.z) keep (dense_rank first order by t2.z) f
7 ,max(t2.z) keep (dense_rank last order by t2.z) l
8 from t1, t2
9 where t1.x = t2.x
10 group by t2.x, t2.y
11 ) t
12 ;

X Y C F L
---------- ---------- ---------- ---------- ----------
1 7000 2 1 8
1 8000 1 1 8
2 7000 2 1 5
3 7000 1 1 7
3 8000 2 1 7
3 9000 1 1 7

Cheers.

April 08, 2005 - 3:34 pm UTC

without create tables and inserts, I guess :)

takes too much time to create the setup for every case (wish people would read the page that they have to page down through to put something up here...)

I'm confused

Mikito, April 18, 2005 - 9:55 pm UTC

Given that

select distinct deptno
from emp

is essentially

select deptno
from emp
group by deptno

how is distinct query should be rewritten in case with analytics columns? Neither

SELECT deptno, count(1),
min(sal) over (partition by deptno) f
from emp
group by deptno,min(sal) over (partition by deptno);

nor

SELECT deptno, count(1),
min(sal) over (partition by deptno) f
from emp
group by deptno,f;

seems to be a valid syntax.

(To repeat: "Does analytics scale?")

April 19, 2005 - 7:22 am UTC

why would you use analytics that way?

Tell us the question, we'll tell you the method.

select deptno, count(*) /* because count(1) is counter-intuitive */,
min(sal) over (partition by deptno) f
from emp
group by deptno, min(sal) over (partition by deptno)

would not make sense. You are saying "get all deptnos, by deptno find the minimum salary and associate that number with each one, then aggregate by deptno/min salary to count records"

You should just ask:

find the minimum salary and count of records by deptno.

select deptno, count(*), min(sal) from emp group by deptno;

is what you were looking for. analytics scale up wonderful. Say the question was instead:

you have a table full of records that have a customer_id and a last_sale_date, I would like you to retrieve the last record for each customer.

select *
from ( select cust.*, max(sale_date) over (partition by cust_id) lsd
from cust )
where sale_date = lsd;

versus

select *
from cust
where sale_date =
(select max(sale_date) from cust c2 where cust_id = cust.cust_id )
/

or

select *
from cust, (select cust_id, max(sale_date) lsd from cust group by cust_id)x
where cust.cust_id = x.cust_id
and cust.sale_date = x.lsd
/

for example

Tricky SQL?

A reader, April 19, 2005 - 10:29 am UTC

CREATE TABLE master
(
m_no INTEGER PRIMARY KEY,
m_name VARCHAR2(255) NOT NULL UNIQUE
);

create table detail
(
d_pk integer primary key,
d_no integer not null references m(m_no),
d_date date,
d_data varchar2(255)
);

Given a d_pk, how can I get the second-to-last (ordered by d_date) record from M for that M_NAME? In other words, for a given m_name, there are multiple records in "detail" with different dates. Given one of those records, I want the prior record in "detail" (there might not be any)

I tried to design a simple master detail table, but maybe I over-normalized?

Thanks

April 19, 2005 - 12:00 pm UTC

are you saying "i have a detail record, I want the detail record that came 'in front' of this one"?

that is what I sort of hear, but the second to last is confusing me.

select *
from (
select ...., lead(d_pk) over (order by d_date) next_pk
from master, detail
where master.m_no = (select d_no from detail where d_pk = :x)
and master.m_no = detail.d_no
)
where next_pk = :x;

I think that does that. You get the master/detail for that d_pk (inline view)

Use lead to assign to each record the "next pk" after sorting by d_date

Keep the record whose 'next' records primary key was the one you wanted..

a little inconsistency

mikito, April 19, 2005 - 1:24 pm UTC

I meant inconsistency, not scalability. Why "distinct"

SELECT distinct deptno,
min(sal) over (partition by deptno) f
from emp

is allowed, whereas "group by" doesn't? If someone has trouble understanding what analytics with "group by" means, the same should apply to analytics with "distinct" as well.

April 19, 2005 - 1:26 pm UTC

because group by is not distinct, they are frankly very different concepts.

detail and summery in one sql statement

A reader, April 27, 2005 - 3:02 pm UTC

hi tom,

quick shot. i have to process many detail (column a - f) and one summery record (containing sum (column c) + count (*) over all recs + some literal placeholders) within one sql statement. is there another way then using a classical UNION ALL select? any new way with analytical functions?

April 27, 2005 - 3:22 pm UTC

need small example, did not follow your example as stated.

detail and summery in one sql statement

A reader, April 28, 2005 - 10:08 am UTC

hi tom,

here is the small and simple test case to show what i mean.

SQL> create table t1 (col1 number primary key, col2 number, col3 number);

Tabelle wurde angelegt.

SQL> create table t2 (col0 number primary key, col1 number references t1 (col1), col2 number, col3 number, col4 number);

Tabelle wurde angelegt.

SQL> create index t2_col1 on t2 (col1);

Index wurde angelegt.

SQL> insert into t1 values (1, 1, 1);

1 Zeile wurde erstellt.

SQL> insert into t2 values (1, 1, 1, 1, 1);

1 Zeile wurde erstellt.

SQL> insert into t2 values (2, 1, 2, 2, 2);

1 Zeile wurde erstellt.

SQL> insert into t2 values (3, 1, 3, 3, 3);

1 Zeile wurde erstellt.

SQL> analyze table t1 compute statistics;

Tabelle wurde analysiert.

SQL> analyze table t2 compute statistics;

Tabelle wurde analysiert.

SQL> select 0 rowtype, t1.col1 display1, t1.col2 display2, t2.col3 display3, t2.col4 display4
  2  from   t1 join t2 on (t1.col1 = t2.col1)
  3  where  t1.col1 = 1
  4  UNION ALL
  5  select 1 rowtype, t1.col1, count (*), null, sum (t2.col4)
  6  from   t1 join t2 on (t1.col1 = t2.col1)
  7  where  t1.col1 = 1
  8  group  by t1.col1
  9* order  by rowtype

   ROWTYPE   DISPLAY1   DISPLAY2   DISPLAY3   DISPLAY4
---------- ---------- ---------- ---------- ----------
         0          1          1          1          1
         0          1          1          2          2
         0          1          1          3          3
         1          1          3                     6

that is creating detail + summary record within one sql statement!

April 28, 2005 - 10:18 am UTC

ops$tkyte@ORA10G> select grouping_id(t1.col2) rowtype,
  2         t1.col1 d1,
  3             decode( grouping_id(t1.col2), 0, t1.col2, count(*) ) d2,
  4             decode( grouping_id(t1.col2), 0, t2.col3, null ) d3,
  5             decode( grouping_id(t1.col2), 0, t2.col4, sum(t2.col4) ) d4
  6    from t1, t2
  7   where t1.col1 = t2.col1
  8   group by grouping sets((t1.col1),(t1.col1,t1.col2,t2.col3,t2.col4))
  9  /
 
   ROWTYPE         D1         D2         D3         D4
---------- ---------- ---------- ---------- ----------
         0          1          1          1          1
         0          1          1          2          2
         0          1          1          3          3
         1          1          3                     6

detail and summery in one sql statement

A reader, April 29, 2005 - 10:05 am UTC

hi tom,

thanks for your help. that's exactly what i need. analytics rock, analytics roll as you said. :)

unfortunately it is hard to get. :(

i looked in the documentation but cannot understand the grouping_id values in the example. please could you explain? what is "2" or "3" in the grouping column?

Examples
The following example shows how to extract grouping IDs from a query of the sample table sh.sales:

SELECT channel_id, promo_id, sum(amount_sold) s_sales,
GROUPING(channel_id) gc,
GROUPING(promo_id) gp,
GROUPING_ID(channel_id, promo_id) gcp,
GROUPING_ID(promo_id, channel_id) gpc
FROM sales
WHERE promo_id > 496
GROUP BY CUBE(channel_id, promo_id);

C PROMO_ID S_SALES GC GP GCP GPC
- ---------- ---------- ---------- ---------- ---------- ----------
C 497 26094.35 0 0 0 0
C 498 22272.4 0 0 0 0
C 499 19616.8 0 0 0 0
C 9999 87781668 0 0 0 0
C 87849651.6 0 1 1 2
I 497 50325.8 0 0 0 0
I 498 52215.4 0 0 0 0
I 499 58445.85 0 0 0 0
I 9999 169497409 0 0 0 0
I 169658396 0 1 1 2
P 497 31141.75 0 0 0 0
P 498 46942.8 0 0 0 0
P 499 24156 0 0 0 0
P 9999 70890248 0 0 0 0
P 70992488.6 0 1 1 2
S 497 110629.75 0 0 0 0
S 498 82937.25 0 0 0 0
S 499 80999.15 0 0 0 0
S 9999 267205791 0 0 0 0
S 267480357 0 1 1 2
T 497 8319.6 0 0 0 0
T 498 5347.65 0 0 0 0
T 499 19781 0 0 0 0
T 9999 28095689 0 0 0 0
T 28129137.3 0 1 1 2
497 226511.25 1 0 2 1
498 209715.5 1 0 2 1
499 202998.8 1 0 2 1
9999 623470805 1 0 2 1
624110031 1 1 3 3

April 29, 2005 - 10:21 am UTC

</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:37355353762363 <code>

How to do this using Analytics

A reader, May 05, 2005 - 5:11 pm UTC

Hello Sir,
I have a denormalized table dept_emp of which part of it I have reproduced here.It has/will have dupes .

I need to find out all emps which belong to more than one dept using Analytics ( Want to avoid self join ).

So the required output must be :

DEPTNO DNAME EMPNO ENAME
------ ---------- ----- --------------------
10 D10 1 E1
10 D10 1 E1
10 D10 2 E2
10 D10 2 E2

20 D20 1 E1
20 D20 1 E1
20 D20 2 E2
20 D20 2 E2

From the total set of :
SELECT * FROM DEPT_EMP ORDER BY DEPTNO ,EMPNO
DEPTNO DNAME EMPNO ENAME
------ ---------- ----- --------------------
10 D10 1 E1
10 D10 1 E1
10 D10 2 E2
10 D10 2 E2
10 D10 3 E3
10 D10 3 E3
20 D20 1 E1
20 D20 1 E1
20 D20 2 E2
20 D20 2 E2
20 D20 4 E4
20 D20 4 E4
20 D20 5 E5
20 D20 5 E5
14 rows selected

create table dept_emp (deptno number , dname varchar2(10) ,empno number ,ename varchar2(20) ) ;

INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES (
10, 'D10', 1, 'E1');
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES (
10, 'D10', 2, 'E2');
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES (
10, 'D10', 3, 'E3');
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES (
20, 'D20', 4, 'E4');
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES (
20, 'D20', 5, 'E5');
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES (
20, 'D20', 1, 'E1');
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES (
20, 'D20', 2, 'E2');
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES (
10, 'D10', 1, 'E1');
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES (
10, 'D10', 2, 'E2');
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES (
10, 'D10', 3, 'E3');
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES (
20, 'D20', 4, 'E4');
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES (
20, 'D20', 5, 'E5');
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES (
20, 'D20', 1, 'E1');
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES (
20, 'D20', 2, 'E2');
COMMIT ;

Thanx

May 05, 2005 - 6:10 pm UTC

no analytics

select empno, count(distinct deptno)
from t
group by empno
having count(distinct deptno) > 1;

Thanx Sir

A reader, May 05, 2005 - 9:20 pm UTC

Actually I was planing to use analytics to get the whole row info, will do the same trick with Analytics, then.

You are a Genius.

May 06, 2005 - 7:17 am UTC

select *
from (
select t.*, count(distinct deptno) over (partition by empno) cnt
from t
)
where cnt > 1;

Analytical solution

Baiju Menon, May 10, 2005 - 6:29 am UTC

Sir,
I want to list the department and the maximum number of employees working in that department by using Analytical function(only the department in which the maximum number of employees are working)
the query without the Analytical function is
select deptno, count(deptno) from emp group by deptno having count(deptno) in (select max(count(deptno)) from emp group by deptno)
Thanks

May 10, 2005 - 9:15 am UTC

1 select deptno, cnt
2 from (
3 select deptno, cnt, max(cnt) over() max_cnt
4 from (
5 select deptno, count(*) cnt
6 from emp
7 group by deptno
8 )
9 )
10* where cnt = max_cnt
scott@ORA9IR2> /

DEPTNO CNT
---------- ----------
30 6

group by

Anoop Gupta, May 11, 2005 - 4:15 am UTC

Hi Tom,

I have a table in table data is like this

empid leavelname
1001 Level1
1001 Level2
1001 Level3
1001 Level4
1002 Level1
1002 Level2
1002 Level3
...
...

Means this table tell on which levels employee is assigned.
Is there any query posible that will retrun data llike this without writing a function.

empid emp_assigned on leavel
1001 level1,level2,level3,level4
1002 level1,level2,level3
...
...

Waiting for your response.....

May 11, 2005 - 7:30 am UTC

only if there is some reasonable maximum number of levelname rows per empid.

is there?

Analytics Rock - But why are they slower for me

Jeff Plumb, May 13, 2005 - 1:00 am UTC

Hi Tom,

I have followed you example about Analytics from Effective Oracle by Design on page 516 (Find a specific row in a partition). When I run the example and tkprof the 3 different queries, the analytics actually takes a lot longer to run, but it does do less logical I/O's. It is doing a lot more physical I/O's so I am guessing that it is using a temporary segment on disk to perform the window sort. To perform the test I created the big_table that you use and populated it with 1,000,000 rows. I am using Oracle 9i release 2. Here is the output from TKPROF:

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 33
********************************************************************************

select owner, object_name, created
from big_table t
where created = (select max(created)
from big_table t2
where t2.owner = t.owner)

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 8 5.32 6.42 13815 14669 0 694
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 10 5.32 6.42 13815 14669 0 694

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 33

Rows Row Source Operation
------- ---------------------------------------------------
694 HASH JOIN
20 VIEW
20 SORT GROUP BY
1000000 TABLE ACCESS FULL BIG_TABLE
1000000 TABLE ACCESS FULL BIG_TABLE

********************************************************************************

select t.owner, t.object_name, t.created
from big_table t
join (select owner, max(created) maxcreated
from big_table
group by owner) t2
on (t2.owner = t.owner and t2.maxcreated = t.created)

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 8 5.03 5.06 13816 14669 0 694
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 10 5.03 5.06 13816 14669 0 694

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 33

Rows Row Source Operation
------- ---------------------------------------------------
694 HASH JOIN
20 VIEW
20 SORT GROUP BY
1000000 TABLE ACCESS FULL BIG_TABLE
1000000 TABLE ACCESS FULL BIG_TABLE

********************************************************************************

select owner, object_name, created
from
( select owner, object_name, created, max(created) over (partition by owner) as maxcreated
from big_table
)
where created = maxcreated

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 8 16.68 40.66 15157 7331 17 694
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 10 16.68 40.66 15157 7331 17 694

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 33

Rows Row Source Operation
------- ---------------------------------------------------
694 VIEW
1000000 WINDOW SORT
1000000 TABLE ACCESS FULL BIG_TABLE

********************************************************************************

And when I run the query with the analytics using autotrace I get the following which shows a sort to disk:
SQL*Plus: Release 9.2.0.6.0 - Production on Fri May 13 14:53:08 2005

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.

Connected to:
Oracle9i Enterprise Edition Release 9.2.0.6.0 - 64bit Production
With the Partitioning option
JServer Release 9.2.0.6.0 - Production

control@DWDEV> set autot traceonly
control@DWDEV> select owner, object_name, created
2 from
3 ( select owner, object_name, created, max(created) over (partition by owner) as maxcreated
4 from big_table
5 )
6 where created = maxcreated;

694 rows selected.

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4399 Card=1000000 Bytes=52000000)
1 0 VIEW (Cost=4399 Card=1000000 Bytes=52000000)
2 1 WINDOW (SORT) (Cost=4399 Card=1000000 Bytes=43000000)
3 2 TABLE ACCESS (FULL) OF 'BIG_TABLE' (Cost=637 Card=1000000 Bytes=43000000)

Statistics
----------------------------------------------------------
0 recursive calls
17 db block gets
7331 consistent gets
15348 physical reads
432 redo size
12784 bytes sent via SQL*Net to client
717 bytes received via SQL*Net from client
8 SQL*Net roundtrips to/from client
0 sorts (memory)
1 sorts (disk)
694 rows processed

So how can I stop the sorts (disk)? I am guessing that the pga_aggregate_target needs to be higher, but it seems to already be set quite high.

control@DWDEV> show parameter pga

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
pga_aggregate_target big integer 524288000

I hope you can help clarify how to make the anayltic version run quicker.

Thanks.

May 13, 2005 - 9:50 am UTC

it'll be a function of the number of "owners" here

You have 1,000,000 records.

You have but 20 users.

in this extreme case, having 50,000 records per window and swapping out was not as good as squashing the data down to 20 records and joining -- the CBO quite smartly rewrote:

select owner, object_name, created
from big_table t
where created = (select max(created)
from big_table t2
where t2.owner = t.owner)

as

select ...
from big_table t, (select owner,max(created) created from big_table t2 ...)
where ....

So, does the data you analyze to find the "most current record" tend to have 50,000 records/key in real life?

In your case, your hash table didn't spill to disk. In real life though, the numbers would probably be much different. a 1,000,000 row table would have keys with 10 or 100 rows maybe, not 50,000 (in general). There you would find the answer to be very different.

And if you let the sort run in memory it would be different as well -- you would get a max of 25m given your pga aggregate target setting that may have been too small.

but consider what happens when the size of the "aggregate" goes up, dimishing marginal returns sets in:

select owner, object_name, created
from big_table t
where created = (select max(created)
from big_table t2
where t2.owner = t.owner)

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 320 2.06 2.01 26970 29283 0 4775
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 322 2.06 2.01 26970 29283 0 4775
********************************************************************************
select owner, object_name, created
from
( select owner, object_name, created,
max(created) over (partition by owner) as maxcreated
from big_table
)
where created = maxcreated

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 320 4.57 10.05 30603 14484 15 4775
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 322 4.57 10.05 30603 14484 15 4775
********************************************************************************
select owner, object_name, created
from big_table t
where created = (select max(created)
from big_table t2
where t2.id = t.id)

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.01 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 66668 7.70 12.04 33787 45393 2 1000000
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 66670 7.71 12.05 33787 45393 2 1000000
********************************************************************************
select owner, object_name, created
from
( select owner, object_name, created,
max(created) over (partition by id) as maxcreated
from big_table
)
where created = maxcreated

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 66668 7.00 9.60 9336 14484 2 1000000
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 66670 7.00 9.60 9336 14484 2 1000000

and, given sufficient space to work "in memory", these two big queries both benefited:

select owner, object_name, created
from big_table t
where created = (select max(created)
from big_table t2
where t2.owner = t.owner)

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.01 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 320 1.82 1.96 9909 29283 0 4775
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 322 1.83 1.97 9909 29283 0 4775
********************************************************************************
select owner, object_name, created
from
( select owner, object_name, created,
max(created) over (partition by owner) as maxcreated
from big_table
)
where created = maxcreated

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 320 2.15 2.11 2858 14484 0 4775
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 322 2.15 2.11 2858 14484 0 4775
********************************************************************************
select owner, object_name, created
from big_table t
where created = (select max(created)
from big_table t2
where t2.id = t.id)

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 66668 7.64 7.55 10181 94633 0 1000000
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 66670 7.65 7.56 10181 94633 0 1000000
********************************************************************************
select owner, object_name, created
from
( select owner, object_name, created,
max(created) over (partition by id) as maxcreated
from big_table
)
where created = maxcreated

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 66668 5.69 5.49 2699 14484 0 1000000
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 66670 5.69 5.49 2699 14484 0 1000000

(this was a dual cpu xeon using 'nonparallel' query in this case, once with a 256mb pga aggregate target and again with a 2gig one

kuldeep, May 14, 2005 - 3:27 am UTC

Dear Tom,

I have three tables t1, t2 & t3. where t2 & t3 is joined with t2 with column "key_id".
Now I need sum of key_values(amount) of t2 and sum of key_values(amount) of t3 for key_id
in table t1.

kuldeep@dlfscg> select * from t1;

KEY_ID KEY_VAL
---------- ----------
2 1980
1 1975

kuldeep@dlfscg> select * from t2;

KEY_ID KEY_VAL
---------- ----------
2 550
2 575
1 500

kuldeep@dlfscg> select * from t3;

KEY_ID KEY_VAL
---------- ----------
2 900
1 1000
1 750

***** QUERY 1 *****

kuldeep@dlfscg> SELECT t1.key_id, SUM(t2.key_val) sum_t2_key_val, SUM(t3.key_val) sum_t3_key_val
2 FROM t1, t2, t3
3 WHERE t1.key_id=t2.key_id
4 AND t1.key_id=t3.key_id
5 GROUP BY t1.key_id
6 /

KEY_ID SUM_T2_KEY_VAL SUM_T3_KEY_VAL
---------- -------------- --------------
1 1000 1750
2 1125 1800

***** QUERY 2 *****

kuldeep@dlfscg> SELECT t1.key_id, t2.sum_t2_key_val, t3.sum_t3_key_val
2 FROM t1,
3 (SELECT key_id, SUM(key_val) sum_t2_key_val FROM t2 GROUP BY key_id) t2,
4 (SELECT key_id, SUM(key_val) sum_t3_key_val FROM t3 GROUP BY key_id) t3
5 WHERE t1.key_id=t2.key_id
6 AND t1.key_id=t3.key_id
7 /

KEY_ID SUM_T2_KEY_VAL SUM_T3_KEY_VAL
---------- -------------- --------------
1 500 1750
2 1125 900

Query 1 is giving wrong result and I can not use query 2 whose performance is very poor.

Oracle 9i has added a lot of new grouping features and a lot of analytic functions (all going over the head).

Is there any "special" sum function or way which should pick value only once belonging to a row (or query's key, here "key_id")
irrespective of how many time it is appearing on query result.

KEY_ID T2_KEY_VAL T3_KEY_VAL
---------- ---------- ----------
1 500 1000
1 500 750 <---- 500 of t2 should not be calculated, it is repeat
2 550 900
2 575 900 <---- 900 of t3 should not be calculated, it is repeat

thanks and regards,

May 14, 2005 - 9:36 am UTC

select t1.key_id, t2.sum_val, t3.sum_val
from t1,
(select key_id, sum(val) sum_val from t2 group by key_id ) t2,
(select key_id, sum(val) sum_val from t3 group by key_id ) t3
WHERE t1.key_id=t2.key_id
AND t1.key_id=t3.key_id

apply an amount across multiple records

Dave, May 15, 2005 - 8:17 pm UTC

I have a problem similar to what I call the invoice payment problem.
It would seem to be a common problem, but I have searched to no avail.

The idea is that a customer may have many outstanding invoices, and sends in a check for an arbitrary amount. So we need to apply the money across the invoices oldest first.
Note that in my specific case, if a payment exceeds the total outstanding, the excess is ignored (obviously not dealing with real money here!)

create table invoices (
cust_nbr integer not null,
invoice_nbr integer not null,
invoice_amt number not null,
payment_amt number not null,
primary key (cust_nbr, invoice_nbr)
);

begin
delete from invoices;
dbms_random.seed(123456789);
for c in 1 .. 2 loop
for i in 1 .. 3 loop
insert into invoices values (c, i, round(dbms_random.value * 10, 2)+1, 0);
end loop;
end loop;
update invoices
set payment_amt = round(dbms_random.value * invoice_amt, 2)
where invoice_nbr = 1;
commit;
end;
/
select cust_nbr, invoice_nbr, invoice_amt, payment_amt,
invoice_amt - payment_amt outstanding_amt
from invoices
where invoice_amt - payment_amt > 0
order by cust_nbr, invoice_nbr;

CUST_NBR INVOICE_NBR INVOICE_AMT PAYMENT_AMT OUTSTANDING_AMT
---------- ----------- ----------- ----------- ---------------
1 1 9.44 5.55 3.89
1 2 3.21 0 3.21
1 3 2.78 0 2.78
2 1 7.57 4.3 3.27
2 2 9.46 0 9.46
2 3 5.92 0 5.92

variable cust_nbr number;
variable received_amt number;
begin
:cust_nbr := 1;
:received_amt := 7.25;
end;
/

update invoices i1
set payment_amt = (... some query which applies
:received_amt to outstanding_amt ...)
where cust_nbr = :cust_nbr;

result should be:

CUST_NBR INVOICE_NBR INVOICE_AMT PAYMENT_AMT OUTSTANDING_AMT
---------- ----------- ----------- ----------- ---------------
1 1 9.44 9.44 0
1 2 3.21 3.21 0
1 3 2.78 .15 2.63
2 1 7.57 4.3 3.27
2 2 9.46 0 9.46
2 3 5.92 0 5.92

This is simple to solve in pl/sql with a cursor, but I thought it would be a good test for a set-based solution with analytics. But after some effort, I'm stumped.

May 16, 2005 - 7:37 am UTC

Using analytics we can see how to apply the inputs:

ops$tkyte@ORA9IR2> select cust_nbr, invoice_nbr, invoice_amt, payment_amt,
  2         least( greatest( :received_amt - rt + outstanding_amt, 0 ), outstanding_amt ) amount_to_apply
  3    from (
  4  select cust_nbr, invoice_nbr, invoice_amt, payment_amt,
  5         invoice_amt - payment_amt outstanding_amt,
  6         sum(invoice_amt - payment_amt) over (partition by cust_nbr order by invoice_nbr) rt
  7    from invoices
  8   where cust_nbr = :cust_nbr
  9         )
 10    order by cust_nbr, invoice_nbr;
 
  CUST_NBR INVOICE_NBR INVOICE_AMT PAYMENT_AMT AMOUNT_TO_APPLY
---------- ----------- ----------- ----------- ---------------
         1           1        9.44        5.55            3.89
         1           2        3.21           0            3.21
         1           3        2.78           0             .15


Just needed a running total of outstanding amounts to take away from the received amount....

Then, merge:

ops$tkyte@ORA9IR2> merge into invoices
  2  using
  3  (
  4  select cust_nbr, invoice_nbr, invoice_amt, payment_amt,
  5         least( greatest( :received_amt - rt + outstanding_amt, 0 ), outstanding_amt ) amount_to_apply
  6    from (
  7  select cust_nbr, invoice_nbr, invoice_amt, payment_amt,
  8         invoice_amt - payment_amt outstanding_amt,
  9         sum(invoice_amt - payment_amt) over (partition by cust_nbr order by invoice_nbr) rt
 10    from invoices
 11   where cust_nbr = :cust_nbr
 12         )
 13  ) x
 14  on ( invoices.cust_nbr = x.cust_nbr and invoices.invoice_nbr = x.invoice_nbr )
 15  when matched then update set payment_amt = nvl(payment_amt,0)+x.amount_to_apply
 16  when not matched /* never happens... */ then insert (cust_nbr) values (null);
 
3 rows merged.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select cust_nbr, invoice_nbr, invoice_amt, payment_amt,
  2         invoice_amt - payment_amt outstanding_amt
  3    from invoices
  4    order by cust_nbr, invoice_nbr;
 
  CUST_NBR INVOICE_NBR INVOICE_AMT PAYMENT_AMT OUTSTANDING_AMT
---------- ----------- ----------- ----------- ---------------
         1           1        9.44        9.44               0
         1           2        3.21        3.21               0
         1           3        2.78         .15            2.63
         2           1        7.57         4.3            3.27
         2           2        9.46           0            9.46
         2           3        5.92           0            5.92
 
6 rows selected.

Group by

Anoop Gupta, May 16, 2005 - 10:06 am UTC

Reviewer: Anoop Gupta from INDIA

Hi Tom,

As i asked question that

I have a table in table data is like this

empid leavelname
1001 Level1
1001 Level2
1001 Level3
1001 Level4
1002 Level1
1002 Level2
1002 Level3
...
...

Means this table tell on which levels employee is assigned.
Is there any query posible that will retrun data llike this without writing a
function.

empid emp_assigned on leavel
1001 level1,level2,level3,level4
1002 level1,level2,level3
...
...

Give me the way to write a query if Suppose here we have a limitation of levels for an employee is 50.

Please reply....

May 16, 2005 - 1:09 pm UTC

select empid,
rtrim(
max(decode(rn,1,leavelname)) || ',' ||
max(decode(rn,1,leavelname)) || ',' ||
....
max(decode(rn,50,leavelname)), ',' )
from (select empid,
row_number() over (partition by empid order by leavelname) rn,
leavelname
from t
)
group by empid;

special sum

kuldeep, May 17, 2005 - 12:38 am UTC

Dear Tom,

Thanks for your response and for this useful site.

I was looking for a solution which could avoid these inline views which were making my query to run slow. I tried for the solution and got this query,

/* DATA VIEW */
kuldeep@dlfscg> SELECT t1.key_id,
2 t2.ROWID t2_rowid, row_number() over (PARTITION BY t2.ROWID ORDER BY t3.ROWID) t2_rn, t2.key_val,
3 t3.ROWID t3_rowid, row_number() over (PARTITION BY t3.ROWID ORDER BY t2.ROWID) t2_rn, t3.key_val
4 FROM t1, t2, t3
5 WHERE t1.key_id=t2.key_id
6 AND t1.key_id=t3.key_id
7 ORDER BY t1.key_id
8 /

KEY_ID T2_ROWID T2_RN KEY_VAL T3_ROWID T2_RN KEY_VAL
---------- ------------------ ---------- ---------- ------------------ ---------- ----------
1 AAANZ5AAHAAAD94AAA 1 500 AAANZ4AAHAAAD9wAAA 1 1000
1 AAANZ5AAHAAAD94AAA 2 500 AAANZ4AAHAAAD9wAAB 1 750
2 AAANZ5AAHAAAD91AAA 1 550 AAANZ4AAHAAAD9tAAA 1 900
2 AAANZ5AAHAAAD91AAB 1 575 AAANZ4AAHAAAD9tAAA 2 900

/* FINAL QUERY */

kuldeep@dlfscg> SELECT key_id,
2 SUM(DECODE(t2_rn,1,t2_key_val,0)) t2_key_val,
3 SUM(DECODE(t3_rn,1,t3_key_val,0)) t3_key_val
4 FROM (SELECT t1.key_id,
5 t2.ROWID t2_rowid, row_number() over (PARTITION BY t2.ROWID ORDER BY t3.ROWID) t2_rn, t2.key_val t2_key_val,
6 t3.ROWID t3_rowid, row_number() over (PARTITION BY t3.ROWID ORDER BY t2.ROWID) t3_rn, t3.key_val t3_key_val
7 FROM t1, t2, t3
8 WHERE t1.key_id=t2.key_id
9 AND t1.key_id=t3.key_id)
10 GROUP BY key_id
11 /

KEY_ID T2_KEY_VAL T3_KEY_VAL
---------- ---------- ----------
1 500 1750
2 1125 900

regards,

May 17, 2005 - 8:23 am UTC

one would need more information -- it APPEARS that you are trying to get a "random first hit" from T2 and T3 by T1.key_id

That is, for every row in T1 -- find the first match (any match will do) in T2 and in T3

report that value

is that correct.

and how big are t1,t2,t3, and how long is long.

group by

Anoop Gupta, May 17, 2005 - 9:42 am UTC

Tom,

Thanks for your prompt response.

Analytical Problem

Imran, May 18, 2005 - 4:16 am UTC

Look at the following two queries.

SQL> SELECT phone, MONTH, arrears, this_month, ABS (up_down),
  2         CASE
  3            WHEN up_down < 0
  4               THEN 'DOWN'
  5            WHEN up_down > 0
  6               THEN 'UP'
  7            ELSE 'BALANCE'
  8         END CASE,
  9         prev_month
 10    FROM (SELECT exch || ' - ' || phone phone,
 11                 TO_CHAR (TO_DATE (MONTH, 'YYMM'), 'Mon, YYYY') MONTH, region,
 12                 instdate, paybefdue this_month, arrears,
 13                 LEAD (paybefdue, 1, 0) OVER (ORDER BY MONTH DESC) prev_month,
 14                   paybefdue
 15                 - (LEAD (paybefdue, 1, 0) OVER (ORDER BY MONTH DESC)) up_down
 16            FROM ptc
 17           WHERE phone IN (7629458));

PHONE           MONTH              ARREARS THIS_MONTH ABS(UP_DOWN) CASE    PREV_MONTH
--------------- --------------- ---------- ---------- ------------ ------- ----------
202 - 7629458   Apr, 2005          2562.52       5265         5265 UP               0

SQL> SELECT phone, MONTH, arrears, this_month, ABS (up_down),
  2         CASE
  3            WHEN up_down < 0
  4               THEN 'DOWN'
  5            WHEN up_down > 0
  6               THEN 'UP'
  7            ELSE 'BALANCE'
  8         END CASE,
  9         prev_month
 10    FROM (SELECT exch || ' - ' || phone phone,
 11                 TO_CHAR (TO_DATE (MONTH, 'YYMM'), 'Mon, YYYY') MONTH, region,
 12                 instdate, paybefdue this_month, arrears,
 13                 LEAD (paybefdue, 1, 0) OVER (ORDER BY MONTH DESC) prev_month,
 14                   paybefdue
 15                 - (LEAD (paybefdue, 1, 0) OVER (ORDER BY MONTH DESC)) up_down
 16            FROM ptc
 17           WHERE phone IN (7629459));

PHONE           MONTH              ARREARS THIS_MONTH ABS(UP_DOWN) CASE    PREV_MONTH
--------------- --------------- ---------- ---------- ------------ ------- ----------
202 - 7629459   Apr, 2005          3516.62       7834         7834 UP               0

SQL> 

Now when I combine the two queries results are different. 

  1  SELECT phone, MONTH, arrears, this_month, ABS (up_down),
  2         CASE
  3            WHEN up_down < 0
  4               THEN 'DOWN'
  5            WHEN up_down > 0
  6               THEN 'UP'
  7            ELSE 'BALANCE'
  8         END CASE,
  9         prev_month
 10    FROM (SELECT exch || ' - ' || phone phone,
 11                 TO_CHAR (TO_DATE (MONTH, 'YYMM'), 'Mon, YYYY') MONTH, region,
 12                 instdate, paybefdue this_month, arrears,
 13                 LEAD (paybefdue, 1, 0) OVER (ORDER BY MONTH DESC) prev_month,
 14                   paybefdue
 15                 - (LEAD (paybefdue, 1, 0) OVER (ORDER BY MONTH DESC)) up_down
 16            FROM ptc
 17*          WHERE phone IN (7629458,7629459))
SQL> /

PHONE           MONTH              ARREARS THIS_MONTH ABS(UP_DOWN) CASE    PREV_MONTH
--------------- --------------- ---------- ---------- ------------ ------- ----------
202 - 7629458   Apr, 2005          2562.52       5265         2569 DOWN          7834
202 - 7629459   Apr, 2005          3516.62       7834         7834 UP               0

So you note that prev Month balance now disturbs badly.

Please tell me how to do this

May 18, 2005 - 8:58 am UTC

need test case. create table, insert's (like the page used to submit this said....)

Use of analytic functions in UPDATE statements

Bob Lyon, May 18, 2005 - 12:29 pm UTC

Tom,

-- Given this sample data

CREATE TABLE GT (
XP_ID INTEGER,
OFFSET INTEGER,
PMAX NUMBER,
PRIOR_PMAX NUMBER
);

INSERT INTO GT (XP_ID, OFFSET, PMAX) VALUES( 123, 1, 3);
INSERT INTO GT (XP_ID, OFFSET, PMAX) VALUES( 123, 2, 8);
INSERT INTO GT (XP_ID, OFFSET, PMAX) VALUES( 155, 3, 5);
INSERT INTO GT (XP_ID, OFFSET, PMAX) VALUES( 173, 3, 7.3);

-- I want to update the table and set the PRIOR_PMAX column values to be as follows

SELECT XP_ID, OFFSET, PMAX,
LAG(PMAX, 1, NULL) OVER (PARTITION BY XP_ID
ORDER BY XP_ID, OFFSET) PRIOR_PMAX
FROM GT
ORDER BY XP_ID, OFFSET;

XP_ID OFFSET PMAX PRIOR_PMAX
---------- ---------- ---------- ----------
123 1 3
123 2 8 3
155 3 5
173 3 7.3

-- My insert to do this tells me "4 rows updated.", but does not do what I want.

UPDATE GT A
SET PRIOR_PMAX = (
SELECT LAG(B.PMAX, 1, NULL) OVER (PARTITION BY B.XP_ID
ORDER BY B.XP_ID, B.OFFSET) PRIOR_PMAX
FROM GT B
WHERE A.ROWID = B.ROWID
);

-- but I get

SELECT xp_id, offset, pmax, prior_pmax
FROM GT
ORDER BY xp_id, offset;

XP_ID OFFSET PMAX PRIOR_PMAX
---------- ---------- ---------- ----------
123 1 3
123 2 8
155 3 5
173 3 7.3

-- Oracle doc states
-- "Therefore, analytic functions can appear only in the select list or ORDER BY clause."
-- which is perhaps a little ambiguous in this case.
-- Is there a way to do this update is "Straight SQL"?

May 18, 2005 - 12:54 pm UTC

you can merge

merge into gt a
using ( SELECT rowid rid, XP_ID, OFFSET, PMAX,
LAG(PMAX, 1, NULL) OVER (PARTITION BY XP_ID
ORDER BY XP_ID, OFFSET) PRIOR_PMAX
FROM GT )b
on (a.rowid = b.rowid)
when matched then update ...
when not matched (never happens, just do a dummy insert of a single null in 9i or leave off entirely in 10g)

special sum

Kuldeep, May 19, 2005 - 1:09 am UTC

My requirement was like this : I have receivables (bills, debit notes etc.) which I adjusts against the received payments and credit note (both are in seperate tables). To know the outstanding I was joining (outer join) my receivables with payments and credit notes.

Because one receivable can be adjusted against many payments and credit notes so outstanding payment was like this:

outstanding = receivable amount - sum(payment amount) - sum(credit note amount)

this simple query using outer join was giving wrong result if a receivable is adjusted against one payment and more than one credit note or vice versa.

in this case where
receivable : 1000 payment : 400 CN : 400, 200

will appear as
1000 400 400
1000 400 200
--- ---
800 600 outstanding = -400 (wrong)

My t1, t2 and t3 has 600,000, 350,000 and 80,000 row respectively.

This is my actual inline view query
-----------------------------------
SELECT a.bill_type, a.bill_exact_type, a.period_id,
a.scheme_id, a.property_number, a.bill_number,
a.bill_amount, SUM(NVL(c.adj_amt,0)+NVL(p.adjust_amount,0)) adj_amt,
NVL(a.bill_amount,0) - SUM(NVL(c.adj_amt,0)+NVL(p.adjust_amount,0)) pending_amt
FROM ALL_RECEIVABLE a,
(SELECT bill_type, scheme_id, property_number, bill_exact_type, period_id, bill_number, SUM(adj_amt) adj_amt
FROM CREDIT_NOTE_RECEIVABLE
WHERE bill_type=p_bill_type
AND scheme_id=p_scheme
AND property_number=p_prop
GROUP BY bill_type, scheme_id, property_number, bill_exact_type, period_id, bill_number) c,
(SELECT bill_type, scheme_id, property_number, bill_exact_type, period_id, bill_number, SUM(adjust_amount) adjust_amount
FROM PAYMENT_RECEIPT_ADJ
WHERE bill_type=p_bill_type
AND scheme_id=p_scheme
AND property_number=p_prop
GROUP BY bill_type, scheme_id, property_number, bill_exact_type, period_id, bill_number) p
WHERE a.bill_type=P_BILL_TYPE
AND a.scheme_id=P_SCHEME
AND a.property_number=P_PROP
AND a.bill_type=c.bill_type(+)
AND a.bill_exact_type=c.bill_exact_type(+)
AND a.period_id=c.period_id(+)
AND a.scheme_id=c.scheme_id(+)
AND a.property_number=c.property_number(+)
AND a.bill_number=c.bill_number(+)
AND a.bill_type=p.bill_type(+)
AND a.bill_exact_type=p.bill_exact_type(+)
AND a.period_id=p.period_id(+)
AND a.scheme_id=p.scheme_id(+)
AND a.property_number=p.property_number(+)
AND a.bill_number=p.bill_number(+)
GROUP BY a.bill_type, a.bill_exact_type, a.period_id, a.scheme_id,
a.property_number, a.bill_number, a.bill_date, a.bill_amount
HAVING (NVL(a.bill_amount,0) - SUM(NVL(c.adj_amt,0)+NVL(p.adjust_amount,0))) > 0
ORDER BY a.bill_date;
-----------------------------------

It is not reporting just the first hit of t1 in t2 and t3. Here in my last posting, I was trying just to exclude any repeat of t2 and t3's ROW in sum calculation. That means one row of t2 and t3 should be calculated only once.

I have tried this query putting more rows and applied the same on actual query, it is working fine and giving the same result as previous inline view query was giving.

kuldeep@dlfscg> SELECT t1.key_id,
2 t2.ROWID t2_rowid, row_number() over (PARTITION BY t2.ROWID ORDER BY t3.ROWID) t2_rn, t2.key_val,
3 t3.ROWID t3_rowid, row_number() over (PARTITION BY t3.ROWID ORDER BY t2.ROWID) t2_rn, t3.key_val
4 FROM t1, t2, t3
5 WHERE t1.key_id=t2.key_id(+)
6 AND t1.key_id=t3.key_id(+)
7 ORDER BY t1.key_id
8 /

KEY_ID T2_ROWID T2_RN KEY_VAL T3_ROWID T2_RN KEY_VAL
---------- ------------------ ---------- ---------- ------------------ ---------- ----------
1 AAANZ5AAHAAAD94AAA 1 500 AAANZ4AAHAAAD9wAAA 1 1000
1 AAANZ5AAHAAAD94AAA 2 500 AAANZ4AAHAAAD9wAAB 1 750
1 AAANZ5AAHAAAD94AAA 3 500 AAANZ4AAHAAAD9wAAC 1 25
2 AAANZ5AAHAAAD91AAA 1 550 AAANZ4AAHAAAD9tAAA 1 900
2 AAANZ5AAHAAAD91AAB 1 575 AAANZ4AAHAAAD9tAAA 2 900
3 AAANZ5AAHAAAD91AAC 1 222 1
3 AAANZ5AAHAAAD91AAD 1 223 2
4 1 AAANZ4AAHAAAD9tAAB 1 333

8 rows selected.

kuldeep@dlfscg> SELECT key_id,
2 SUM(DECODE(t2_rn,1,t2_key_val,0)) t2_key_val,
3 SUM(DECODE(t3_rn,1,t3_key_val,0)) t3_key_val
4 FROM (SELECT t1.key_id,
5 t2.ROWID t2_rowid, row_number() over (PARTITION BY t2.ROWID ORDER BY t3.ROWID) t2_rn, t2.key_val t2_key_val,
6 t3.ROWID t3_rowid, row_number() over (PARTITION BY t3.ROWID ORDER BY t2.ROWID) t3_rn, t3.key_val t3_key_val
7 FROM t1, t2, t3
8 WHERE t1.key_id=t2.key_id(+)
9 AND t1.key_id=t3.key_id(+))
10 GROUP BY key_id
11 /

KEY_ID T2_KEY_VAL T3_KEY_VAL
---------- ---------- ----------
1 500 1775
2 1125 900
3 445 0
4 333

kuldeep@dlfscg>

thanks for your responses.

regards,

May 19, 2005 - 7:47 am UTC

do not order by rowid to get a last row -- is that what you are trying to do??

which row do you want to get from t2 to join with t1
and which row do you want to get from t3 to join with t1

You must specify that based on attributes you manage (eg: there must be an orderable field that helps you determine WHICH record is the right one)

consider rowid to be a random number that does not have any meaning when ordered by, it does not imply order of insertion or anything.

null record

yeshk, May 25, 2005 - 4:12 pm UTC

I need help with this query - This is just a part of the query I am working with.

I am not able to generate a NULL RECORD in between the result set.
I should be able to pass this information out as a reference cursor.

create table test(state varchar2(2),svc_cat varchar2(3),measd_tkt number,non_measd_tkt number);

insert into test values('CA','NDS',100,200);
insert into test values('IL','DSL',200,300);
insert into test values('CA','DSL',100,300);
insert into test values('MO','NDS',1000,300);
insert into test values('MO','DSL',100,200);

I need a result like this
STATE SVC_CAT MEASD_TKT NON MEASD TKT
CA DSL 200 300
CA NDS 100 200
TOTAL 300 500

IL DSL 200 300
TOTAL 200 300

MO DSL 100 200
MO NDS 1000 300
TOTAL 1100 500

I am able to generate the result using a query with analytics.But I dont know how to get an empty row after each state total

Also,Which is better using cursor
1) cursor based on state.
2) get the data and insert into a temporary table.
3) insert a null record

or use analytics to get complete data and put into a reference cursor.

Thanks
yeshk

May 25, 2005 - 7:57 pm UTC

well, that would sort of be the job of the "pretty printing routine" -- eg: the report generator?

what tool is printing this out?

null record

yeshk, May 26, 2005 - 9:20 am UTC

we need to give the resultant set with a null record after each state calculation to front-end VB application. It will be given in a reference cursor.They will just select * from reference cursor and display it on a report.

May 26, 2005 - 10:02 am UTC

the VB application should do this, (it should be able to do something shouldn't it...)

ops$tkyte@ORA9IR2> select decode( grp, 0, state ) state,
  2         decode( grp, 0, svc_cat) svc_cat,
  3             decode( grp, 0, sum_mt ) sum_mt,
  4             decode( grp, 0, sum_nmt ) sum_nmt
  5    from (
  6  select grouping(dummy) grp, state, svc_cat, sum(measd_tkt) sum_mt, sum(non_measd_tkt) sum_nmt
  7    from (
  8  select state, svc_cat, 1 dummy, measd_tkt, non_measd_tkt
  9    from test
 10         )
 11   group by rollup( state, dummy, svc_cat )
 12         )
 13  /
 
ST SVC     SUM_MT    SUM_NMT
-- --- ---------- ----------
CA DSL        100        300
CA NDS        100        200
CA            200        500
 
IL DSL        200        300
IL            200        300
 
MO DSL        100        200
MO NDS       1000        300
MO           1100        500
 
 
 
12 rows selected.

Can rollup do the thing??

Bhavesh Ghodasara, May 26, 2005 - 9:39 am UTC

Hi yeshk,

create table test(state varchar2(2),svc_cat varchar2(3),measd_tkt
number,non_measd_tkt number);

insert into test values('CA','NDS',100,200);........

insert into test values('CA','DSL',100,300);....

STATE SVC_CAT MEASD_TKT NON MEASD TKT
CA DSL 200 300 <==From where measd_tkt=200 comes from??????
CA NDS 100 200
TOTAL 300 500

Tom,,Can we do like this,

break on state
select STATE,SVC_CAT,sum(measd_tkt),sum(non_measd_tkt)
from test
group by rollup(STATE,SVC_CAT)
order by state

............
If i make any mistake than please tell me..
Thanks in advance.

May 26, 2005 - 10:19 am UTC

see above

Which analytics to use?

Marc-Andre Larochelle, May 30, 2005 - 9:10 pm UTC

Hi Tom,

I have this 3rd party table:

drop table t;
create table t (atype varchar2(4),
acol# varchar2(3),
adin varchar2(8),
ares varchar2(8));

insert into t (atype, acol#, adin) values ('DUPT','001','02246569');
insert into t (atype, acol#, adin) values ('DUPT','002','00021474');
insert into t (atype, acol#, adin) values ('DUPT','003','02246569');
insert into t (atype, acol#, ares) values ('MACT','1','02246569');
insert into t (atype, acol#, ares) values ('MACT','6','02246569');
insert into t (atype, acol#, ares) values ('MACT','7','00021474');

select * from t;

ATYPE ACOL# ADIN ARES
----- ----- -------- --------
DUPT 001 02246569
DUPT 002 00021474
DUPT 003 02246569
MACT 1 02246569
MACT 6 02246569
MACT 7 00021474

I would like to get the following result :

DUPT 001 02246569 MACT 1 02246569
DUPT 002 00021474 MACT 7 00021474
DUPT 003 02246569 MACT 6 02246569

I need to match DUPT.adin=MACT.ares together but making sure MACT.acol# is different for every DUPT.acol#. Bsically this table has different values in column depending on the type of rows (atype).

I have tried using lag, lead, rank and nothing seems to work but I am pretty sure it is doable with analytics which is why I posted my question here.

Any hint/help would be appreciated.

Thank you,

Marc-Andre

May 31, 2005 - 7:30 am UTC

question for you.

How did you know to put:

DUPT 001 02246569 together with MACT 1 02246569 and
DUPT 003 02246569 together with MACT 6 02246569

and not

DUPT 001 02246569 MACT 6 02246569
DUPT 003 02246569 MACT 1 02246569

for example. some missing logic here.

Am I Correct??

Bhavesh Ghodasara, May 31, 2005 - 5:15 am UTC

Hi tom,
I solved above problem...
The query like :

select atyp,acol,aadin,batype,bacol,bares
from (
select a.atype atyp,a.acol# acol,a.adin aadin,b.atype batype,b.acol# bacol,b.ares bares,
nvl(lead(b.acol# ) over(order by a.adin),0) lb,
count(*) over(partition by a.acol#) cnt
from t a,t b
where a.adin=b.ares
order by atyp,acol) t
where bacol<>lb

What i think is there must be a better way...
I know You will do it in much much better way..
Please suggest the corrections.
Thanks in Advance..

May 31, 2005 - 8:17 am UTC

ATYP ACO AADIN BATY BAC BARES
---- --- -------- ---- --- --------
DUPT 001 02246569 MACT 6 02246569
DUPT 002 00021474 MACT 7 00021474
DUPT 003 02246569 MACT 1 02246569

well, it gives a different result than the one you posted, it gives my hypothetical answer -- where 001 was combined with 6, not 1.

We can do this..

Bhavesh Ghodasara, May 31, 2005 - 8:28 am UTC

Hi tom,
I can further modified my query:
now its give desired result....
(Agree that question is ambiguous)
select atyp,acol,aadin,batype,bacol,bares
from (
select a.atype atyp,a.acol# acol,a.adin aadin,b.atype batype,b.acol#
bacol,b.ares bares,
nvl(lead(b.acol# ) over(order by a.adin),0) lb,
min(b.acol#) over(partition by a.acol#) cnt
from t a,t b
where a.adin=b.ares
order by atyp,acol) t
where bacol=lb
or cnt>1

OUTPUT:
ATYP ACO AADIN BATY BAC BARES
---- --- -------- ---- --- --------
DUPT 001 02246569 MACT 1 02246569
DUPT 002 00021474 MACT 7 00021474
DUPT 003 02246569 MACT 6 02246569

So any corrections now??
Thanks in advance
Bhavesh

May 31, 2005 - 8:43 am UTC

I don't know your data well enough, but your query is non-deterministic if you care.  Consider:


ops$tkyte@ORA10G> create table t (atype varchar2(4),
  2                  acol# varchar2(3),
  3                  adin varchar2(8),
  4                  ares varchar2(8));
 
Table created.
 
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> insert into t (atype, acol#, adin) values ('DUPT','001','02246569');
 
1 row created.
 
ops$tkyte@ORA10G> insert into t (atype, acol#, adin) values ('DUPT','002','00021474');
 
1 row created.
 
ops$tkyte@ORA10G> insert into t (atype, acol#, adin) values ('DUPT','003','02246569');
 
1 row created.
 
ops$tkyte@ORA10G> insert into t (atype, acol#, ares) values ('MACT','1','02246569');
 
1 row created.
 
ops$tkyte@ORA10G> insert into t (atype, acol#, ares) values ('MACT','5','02246569');
 
1 row created.
 
ops$tkyte@ORA10G> insert into t (atype, acol#, ares) values ('MACT','6','02246569');
 
1 row created.
 
ops$tkyte@ORA10G> insert into t (atype, acol#, ares) values ('MACT','7','00021474');
 
1 row created.
 
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> select atyp,acol,aadin,batype,bacol,bares
  2  from (
  3  select a.atype atyp,a.acol# acol,a.adin aadin,b.atype batype,b.acol#
  4  bacol,b.ares bares,
  5  nvl(lead(b.acol# ) over(order by a.adin),0) lb,
  6  min(b.acol#) over(partition by a.acol#) cnt
  7  from t a,t b
  8  where a.adin=b.ares
  9  order by atyp,acol) t
 10  where bacol=lb
 11  or cnt>1;
 
ATYP ACO AADIN    BATY BAC BARES
---- --- -------- ---- --- --------
DUPT 002 00021474 MACT 7   00021474
 
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> truncate table t;
 
Table truncated.
 
ops$tkyte@ORA10G> insert into t (atype, acol#, adin) values ('DUPT','001','02246569');
 
1 row created.
 
ops$tkyte@ORA10G> insert into t (atype, acol#, adin) values ('DUPT','002','00021474');
 
1 row created.
 
ops$tkyte@ORA10G> insert into t (atype, acol#, adin) values ('DUPT','003','02246569');
 
1 row created.
 
ops$tkyte@ORA10G> insert into t (atype, acol#, ares) values ('MACT','1','02246569');
 
1 row created.
 
ops$tkyte@ORA10G> insert into t (atype, acol#, ares) values ('MACT','6','02246569');
 
1 row created.
 
ops$tkyte@ORA10G> insert into t (atype, acol#, ares) values ('MACT','7','00021474');
 
1 row created.
 
ops$tkyte@ORA10G> insert into t (atype, acol#, ares) values ('MACT','5','02246569');
 
1 row created.
 
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> select atyp,acol,aadin,batype,bacol,bares
  2  from (
  3  select a.atype atyp,a.acol# acol,a.adin aadin,b.atype batype,b.acol#
  4  bacol,b.ares bares,
  5  nvl(lead(b.acol# ) over(order by a.adin),0) lb,
  6  min(b.acol#) over(partition by a.acol#) cnt
  7  from t a,t b
  8  where a.adin=b.ares
  9  order by atyp,acol) t
 10  where bacol=lb
 11  or cnt>1;
 
ATYP ACO AADIN    BATY BAC BARES
---- --- -------- ---- --- --------
DUPT 001 02246569 MACT 6   02246569
DUPT 002 00021474 MACT 7   00021474


Same data both times, just different order of insertions.  With analytics and order by, you need to be concerned about duplicates.

Answers

Marc-Andre Larochelle, May 31, 2005 - 11:35 am UTC

Tom, Bhavesh,

The problem resides exactly there: no logic to match the records. I know that DUPT.din1 must have a MACT.din1 somewhere. I just don't know which one (1st one, 2nd one?). This is a decision I will have to take.

DUPT 001 02246569 MACT 1 02246569
DUPT 003 02246569 MACT 6 02246569
and
DUPT 001 02246569 MACT 6 02246569
DUPT 003 02246569 MACT 1 02246569

are the same to me. But when I run the query, I want to always get the same results.

Anyways, all in all, your queries (Bhavesh - thank you - and yours) seem to answer to my question. I will watch out for duplicates.

Thank you very much for the quick help.

Marc-Andre

What I found

Marc-Andre Larochelle, May 31, 2005 - 5:02 pm UTC

Hi Tom,

Testing the SQL statement Bhavesh provided, I quickly discovered what you meant when saying the query was non-deterministic. When I added a 4th record :

insert into t (atype,acol#,adin) values ('DUPT','004','02246569');
insert into t (atype,acol#,ares) values ('MACT','5','02246569');

only one row was returned. I played with the query and here is what I came up with :

select atyp,acol,aadin,batype,bacol,bares
from (
select atyp,acol,aadin,batype,bacol,bares,drnk ,
rank() over (partition by acol order by bacol) rnk
from (
select a.atype atyp,
a.acol# acol,
a.adin aadin,
b.atype batype,
b.acol# bacol,
b.ares bares,
dense_rank() over (partition by a.atype,a.adin order by a.acol#) drnk
from t a,t b
where a.adin=b.ares))
where drnk=rnk;

Feel free to comment.

Again thank you (and Bhavesh).

Marc-Andre

Using Analytical Values to find latest info

anirudh, June 03, 2005 - 10:41 am UTC

Hi Tom,

we have a fairly large table with about 100 million rows, among others this table has
the following columns

CREATE TABLE my_fact_table (
staff_number VARCHAR2 (10), -- staff number
per_end_dt DATE, -- last day of month
engagement_code VARCHAR2 (30), -- engagement code
client_code VARCHAR2 (20), -- client code
revenue NUMBER (15,2) -- revenue
)

in this table the same engagement code can have different client codes for diffenet periods. This was at one point desirable and that is the reason client code was stored in this fact table instead of the engagement dimension.

Our users now want us to update the client code in these transactions to the latest value of the client code (meaning - pick the client from the latest month for which we have got any transactions for that engagement)

This situation where same engagement has multiple clients across periods is there for about 5 % of the rows.

[btw - we do plan to do data-model change to reflect the new relationships - but that may take some time - hence the interim need to just update the fact table]

to implemnt these updates that may happen for several months, I'm trying to take the approach below
which involve multiple queries and creation of a couple of temp tables - does it seem reasonable. i have a lurking feeling that with a deeper understanding of Analytic functions this can be further simplified - will appreciate your thoughts.

============= My Approach =================

-- Find the Engagements that have multiple Clients
CREATE TABLE amtest_mult_cli AS
WITH
v1 AS (SELECT DISTINCT engagement_code,client_code
FROM my_fact_table)
SELECT engagement_code
FROM v1
GROUP BY engagement_code
HAVING COUNT(*) > 1

-- Find What should be the correct client for those engagements
CREATE TABLE amtest_use_cli AS
SELECT engagement_code,per_end_dt,client_code
FROM
(
SELECT engagement_code,per_end_dt,client_code
row_number() OVER (PARTITION BY engagement_code
ORDER BY per_end_dt DESC, client_code DESC)
row_num
FROM my_fact_table a,
amtest_mult_cli b
WHERE a.engagement_code = b.engagement_code
)
WHERE row_num = 1;

-- Update Correct Clients for those engagements
UPDATE my_fact_table a
SET a.client_code =
(SELECT b.client_code
FROM amtest_use_cli b
WHERE a.engagement_code = b.engagement_code)
WHERE EXISTS
(SELECT 1
FROM amtest_use_cli c
WHERE a.engagement_code = c.engagement_code);

======================================================

June 03, 2005 - 12:14 pm UTC

why not:

merge into my_fact_table F
using
( select engagement_code,
substr(max(to_char(per_end_dt,'yyyymmddhh24miss')||client_code ),15) cc
from my_fact_table
group by engagement_code
having count(distinct client_code) > 1 ) X
on ( f.engagement_code = x.engagement_code )
when matched
then update set client_code = x.cc
when not matched
then insert ( client_code ) values ( null ); <<== never can happen
<<== in 10g, not needed!

That select finds the client_code for the max per_end_dt by engagement_code for engagement_code's that have more than one distinct client_code....

first_value(client_code)
over (partition by engagement_code
order by per_end_dt desc, client_code desc ),
count(distinct client_code)

help with lead

Adolph, June 09, 2005 - 1:24 am UTC

I have a table in the following structure:

create table cs_fpc_pr
(PRGM_C VARCHAR2(10) not null,
fpc_date date not null,
TIME_code VARCHAR2(3) not null,
SUN_TYPE varchar2(1))

insert into cs_fpc_pr values ('PRGM000222', to_date('08-may-2005','dd-mon-rrrr'), '33','1');

insert into cs_fpc_pr values ('PRGM000222', to_date('09-may-2005','dd-mon-rrrr'), '05','3');
insert into cs_fpc_pr values ('PRGM000222', to_date('09-may-2005','dd-mon-rrrr'), '25','1');
insert into cs_fpc_pr values ('PRGM000222', to_date('09-may-2005','dd-mon-rrrr'), '45','3');

insert into cs_fpc_pr values ('PRGM000222', to_date('10-may-2005','dd-mon-rrrr'), '05','3');
insert into cs_fpc_pr values ('PRGM000222', to_date('10-may-2005','dd-mon-rrrr'), '25','1');
insert into cs_fpc_pr values ('PRGM000222', to_date('10-may-2005','dd-mon-rrrr'), '45','3');

insert into cs_fpc_pr values ('PRGM000222', to_date('14-may-2005','dd-mon-rrrr'), '05','3');
insert into cs_fpc_pr values ('PRGM000222', to_date('14-may-2005','dd-mon-rrrr'), '24','1');

insert into cs_fpc_pr values ('PRGM000242', to_date('08-may-2005','dd-mon-rrrr'), '07','3');
insert into cs_fpc_pr values ('PRGM000242', to_date('08-may-2005','dd-mon-rrrr'), '23','1');
insert into cs_fpc_pr values ('PRGM000242', to_date('08-may-2005','dd-mon-rrrr'), '47','3');
insert into cs_fpc_pr values ('PRGM000242', to_date('08-may-2005','dd-mon-rrrr'), '48','3');

insert into cs_fpc_pr values ('PRGM000242', to_date('09-may-2005','dd-mon-rrrr'), '07','3');
insert into cs_fpc_pr values ('PRGM000242', to_date('09-may-2005','dd-mon-rrrr'), '33','1');
insert into cs_fpc_pr values ('PRGM000242', to_date('09-may-2005','dd-mon-rrrr'), '46','3');

insert into cs_fpc_pr values ('PRGM000242', to_date('10-may-2005','dd-mon-rrrr'), '07','3');
insert into cs_fpc_pr values ('PRGM000242', to_date('10-may-2005','dd-mon-rrrr'), '33','1');
insert into cs_fpc_pr values ('PRGM000242', to_date('10-may-2005','dd-mon-rrrr'), '46','3');

insert into cs_fpc_pr values ('PRGM000242', to_date('11-may-2005','dd-mon-rrrr'), '07','3');
insert into cs_fpc_pr values ('PRGM000242', to_date('11-may-2005','dd-mon-rrrr'), '33','1');
insert into cs_fpc_pr values ('PRGM000242', to_date('11-may-2005','dd-mon-rrrr'), '46','3');

insert into cs_fpc_pr values ('PRGM000242', to_date('14-may-2005','dd-mon-rrrr'), '07','3');
insert into cs_fpc_pr values ('PRGM000242', to_date('14-may-2005','dd-mon-rrrr'), '23','1');

commit;

select prgm_c,fpc_date,time_code,sun_type,
lead(fpc_date) over(partition by prgm_C order by fpc_date) next_date
from cs_fpc_pr
order by prgm_c,fpc_date,time_code;

PRGM_C FPC_DATE TIM S NEXT_DATE
---------- --------- --- - ---------
PRGM000222 08-MAY-05 33 1 09-MAY-05
PRGM000222 09-MAY-05 05 3 09-MAY-05
PRGM000222 09-MAY-05 25 1 09-MAY-05
PRGM000222 09-MAY-05 45 3 10-MAY-05
PRGM000222 10-MAY-05 05 3 10-MAY-05
PRGM000222 10-MAY-05 25 1 10-MAY-05
PRGM000222 10-MAY-05 45 3 14-MAY-05
PRGM000222 14-MAY-05 05 3 14-MAY-05
PRGM000222 14-MAY-05 24 1
PRGM000242 08-MAY-05 07 3 08-MAY-05
PRGM000242 08-MAY-05 23 1 08-MAY-05
PRGM000242 08-MAY-05 47 3 08-MAY-05
PRGM000242 08-MAY-05 48 3 09-MAY-05
PRGM000242 09-MAY-05 07 3 09-MAY-05
PRGM000242 09-MAY-05 33 1 09-MAY-05
PRGM000242 09-MAY-05 46 3 10-MAY-05
PRGM000242 10-MAY-05 07 3 10-MAY-05
PRGM000242 10-MAY-05 33 1 10-MAY-05
PRGM000242 10-MAY-05 46 3 11-MAY-05
PRGM000242 11-MAY-05 07 3 11-MAY-05
PRGM000242 11-MAY-05 33 1 11-MAY-05
PRGM000242 11-MAY-05 46 3 14-MAY-05
PRGM000242 14-MAY-05 07 3 14-MAY-05
PRGM000242 14-MAY-05 23 1

I need to find the for a particular 'prgm_c' the next date & time code where the 'sun_type' field = '1'.

A sample of the output should look something like this:

PRGM_C FPC_DATE TIM S NEXT_DATE next_time
---------- --------- --- - --------- -------
PRGM000222 08-MAY-05 33 1 09-MAY-05 25
PRGM000222 09-MAY-05 05 3 09-MAY-05 25
PRGM000222 09-MAY-05 25 1 10-MAY-05 25
PRGM000222 09-MAY-05 45 3 10-MAY-05 25
PRGM000222 10-MAY-05 05 3 10-MAY-05 25
PRGM000222 10-MAY-05 25 1 14-MAY-05 24
PRGM000222 10-MAY-05 45 3 14-MAY-05 24
PRGM000222 14-MAY-05 05 3 14-MAY-05 24
PRGM000222 14-MAY-05 24 1

Tom, Can you please help me with with this?

Regards

June 09, 2005 - 6:53 am UTC

PRGM000222 10-MAY-05 05 3 10-MAY-05
PRGM000222 10-MAY-05 25 1 10-MAY-05
PRGM000222 10-MAY-05 45 3 14-MAY-05
PRGM000222 14-MAY-05 05 3 14-MAY-05
PRGM000222 14-MAY-05 24 1

you've got a problem with those fpc_dates and ordering by them. you have "dups" so no one of those 10-may-05 comes "first" same with the 14th. You need to figure out how to really order this data deterministically first.

My first attempt at this is:

tkyte@ORA9IR2W> select prgm_c, fpc_date, time_code, sun_type,
2 to_date(substr( max(data)
over (partition by prgm_c order by fpc_date desc),
6, 14 ),'yyyymmddhh24miss') ndt,
3 to_number( substr( max(data)
over (partition by prgm_c order by fpc_date desc), 20) ) ntc
4 from (
5 select prgm_c,
6 fpc_date,
7 time_code,
8 sun_type,
9 case when lag(sun_type)
over (partition by prgm_c order by fpc_date desc) = '1'
10 then to_char( row_number()
over (partition by prgm_c order by fpc_date desc) , 'fm00000') ||
11 to_char(lag(fpc_date)
over (partition by prgm_c order by fpc_date desc),'yyyymmddhh24mi
ss')||
12 lag(time_code) over (partition by prgm_c order by fpc_date desc)
13 end data
14 from cs_fpc_pr
15 )
16 order by prgm_c,fpc_date,time_code
17 /

PRGM_C FPC_DATE TIM S NDT NTC
---------- --------- --- - --------- ----------
PRGM000222 08-MAY-05 33 1 09-MAY-05 25
PRGM000222 09-MAY-05 05 3 09-MAY-05 25
PRGM000222 09-MAY-05 25 1 09-MAY-05 25
PRGM000222 09-MAY-05 45 3 09-MAY-05 25
PRGM000222 10-MAY-05 05 3 10-MAY-05 25
PRGM000222 10-MAY-05 25 1 10-MAY-05 25
PRGM000222 10-MAY-05 45 3 10-MAY-05 25
PRGM000222 14-MAY-05 05 3
PRGM000222 14-MAY-05 24 1
PRGM000242 08-MAY-05 07 3 08-MAY-05 23
PRGM000242 08-MAY-05 23 1 08-MAY-05 23
PRGM000242 08-MAY-05 47 3 08-MAY-05 23
PRGM000242 08-MAY-05 48 3 08-MAY-05 23
PRGM000242 09-MAY-05 07 3 10-MAY-05 33
PRGM000242 09-MAY-05 33 1 10-MAY-05 33
PRGM000242 09-MAY-05 46 3 10-MAY-05 33
PRGM000242 10-MAY-05 07 3 10-MAY-05 33
PRGM000242 10-MAY-05 33 1 10-MAY-05 33
PRGM000242 10-MAY-05 46 3 10-MAY-05 33
PRGM000242 11-MAY-05 07 3 14-MAY-05 23
PRGM000242 11-MAY-05 33 1 14-MAY-05 23
PRGM000242 11-MAY-05 46 3 14-MAY-05 23
PRGM000242 14-MAY-05 07 3
PRGM000242 14-MAY-05 23 1

24 rows selected.

but the lack of distinctness on the fpc_date means you might get "a different answer" with the same set of data.

reply

Adolph, June 09, 2005 - 7:48 am UTC

Sorry for not being clear at the first instance so here goes.... A program (prgm_C) will have a maximum of one entry in the table for a combination of a (fpc_date & time_code).

This time_code actually maps to another table where '01' is '01:00:00' , '02' is '01:30:00' & so on (i.e. times stored in varchar2 formats )

So basically a program will exist for a fpc_date and a time_code only once

I hope i'm making sense.

Regards

June 09, 2005 - 7:58 am UTC

tkyte@ORA9IR2W> select prgm_c,
2 fpc_date,
3 time_code,
4 sun_type,
5 to_date(
6 substr( max(data)
7 over (partition by prgm_c
8 order by fpc_date desc,
9 time_code desc),
10 6, 14 ),'yyyymmddhh24miss') ndt,
11 to_number(
12 substr( max(data)
13 over (partition by prgm_c
14 order by fpc_date desc,
15 time_code desc), 20) ) ntc
16 from (
17 select prgm_c,
18 fpc_date,
19 time_code,
20 sun_type,
21 case when lag(sun_type)
22 over (partition by prgm_c
23 order by fpc_date desc,
24 time_code desc) = '1'
25 then
26 to_char( row_number()
27 over (partition by prgm_c
28 order by fpc_date desc,
29 time_code desc) , 'fm00000') ||
30 to_char(lag(fpc_date)
31 over (partition by prgm_c
32 order by fpc_date desc,
33 time_code desc),'yyyymmddhh24mi ss')||
34 lag(time_code)
35 over (partition by prgm_c
36 order by fpc_date desc,
37 time_code desc)
38 end data
39 from cs_fpc_pr
40 )
41 order by prgm_c,fpc_date,time_code
42 /

PRGM_C FPC_DATE TIM S NDT NTC
---------- --------- --- - --------- ----------
PRGM000222 08-MAY-05 33 1 09-MAY-05 25
PRGM000222 09-MAY-05 05 3 09-MAY-05 25
PRGM000222 09-MAY-05 25 1 10-MAY-05 25
PRGM000222 09-MAY-05 45 3 10-MAY-05 25
PRGM000222 10-MAY-05 05 3 10-MAY-05 25
PRGM000222 10-MAY-05 25 1 14-MAY-05 24
PRGM000222 10-MAY-05 45 3 14-MAY-05 24
PRGM000222 14-MAY-05 05 3 14-MAY-05 24
PRGM000222 14-MAY-05 24 1
PRGM000242 08-MAY-05 07 3 08-MAY-05 23
PRGM000242 08-MAY-05 23 1 09-MAY-05 33
PRGM000242 08-MAY-05 47 3 09-MAY-05 33
PRGM000242 08-MAY-05 48 3 09-MAY-05 33
PRGM000242 09-MAY-05 07 3 09-MAY-05 33
PRGM000242 09-MAY-05 33 1 10-MAY-05 33
PRGM000242 09-MAY-05 46 3 10-MAY-05 33
PRGM000242 10-MAY-05 07 3 10-MAY-05 33
PRGM000242 10-MAY-05 33 1 11-MAY-05 33
PRGM000242 10-MAY-05 46 3 11-MAY-05 33
PRGM000242 11-MAY-05 07 3 11-MAY-05 33
PRGM000242 11-MAY-05 33 1 14-MAY-05 23
PRGM000242 11-MAY-05 46 3 14-MAY-05 23
PRGM000242 14-MAY-05 07 3 14-MAY-05 23
PRGM000242 14-MAY-05 23 1

24 rows selected.

Just needed to add "time_code DESC"

See

</code> https://www.oracle.com/technetwork/issue-archive/2014/14-mar/o24asktom-2147206.html <code>

analytics to the rescue

for the "carry down" technique I used here. In 10g, we'd simplify using "ignore nulls" in the LAST_VALUE function instead of the max() and row_number() trick

brilliant

Adolph, June 09, 2005 - 9:53 am UTC

Thank you very much Tom. The query works like a charm.I will read up the link. Analytics do rock n roll :)

Working on an Analytic Query

Scott, June 09, 2005 - 12:15 pm UTC

Tom,
From your example for Mark's problem on 4/8, it seems that you need to specify a number of columns to output this way. Is there a way to have a varying number of columns. For example, I need to have a query that takes a date range, and makes each date a column heading. Any help would be greatly appreciated.
Thanks,
Scott

June 09, 2005 - 6:15 pm UTC

you need dynamic sql. the number of columns in a query is "well defined, known at parse time" by definition.

If you have access to expert one on one Oracle, I demostrated how to do this with ref cursors in a stored procedure. but you have to run a query, to get the set of column "headings" and write a query bsaed on that.

Tom any idea how I can re write this piece of code

A reader, June 09, 2005 - 3:00 pm UTC

decode ((SELECT ih.in_date
FROM major_sales ih
WHERE ih.container = i.container
AND sales > i.container_id
AND sales = (SELECT MIN(ihh.container_id)
FROM major_sales ihh
WHERE ihh.container_id > i.container_id
AND ihh.container = i.container)), NULL,

June 09, 2005 - 6:35 pm UTC

not out of context, no.

I am still having problem with analytical function

A reader, July 01, 2005 - 12:31 pm UTC

select i.container,ssl_user_code,ssl_user_code ssl,cl.code length_code, out_trucker_code, i.chassis,
lead(in_date) over (partition by i.container order by in_date) next_in_date,
out_date,
lead (out_date) over (partition by i.container order by in_date) o_date
from his_containers i,
container_masters cm,
tml_container_lhts clht,
tml_container_lengths cl
WHERE cm.container = i.container
and cm.lht_code = clht.code
and clht.length_code = cl.code
and ssl_user_code = 'ACL'
and i.container like '%408014'
and voided_date is null
and ((in_date between to_date('01-MAR-05 00:00:00', 'DD-MON-RR HH24:MI:SS')
and to_date('31-MAR-05 23:59:59', 'DD-MON-RR HH24:MI:SS')) OR
(out_date between to_date('01-MAR-05 00:00:00', 'DD-MON-RR HH24:MI:SS')
and to_date('31-MAR-05 23:59:59', 'DD-MON-RR HH24:MI:SS')))

results:
----------
CONTAINER SSL_USER_CODE SSL LENGTH_CODE OUT_TRUCKER_CODE CHASSIS NEXT_IN_DATE OUT_DATE O_DATE
ACLU408014 ACL ACL 4 R0480 3/22/2005 2:52:41 PM 3/21/2005 3:45:48 PM 4/6/2005 2:25:59 PM
ACLU408014 ACL ACL 4 J1375 4/6/2005 2:25:59 PM

1. how can I get rid of the 4/6/2005 2:25:59 PM???

July 01, 2005 - 1:52 pm UTC

can you be more specific about why you don't like April 6th as 2:25:59pm? what is it about that you don't like?

That'll help me tell you how to in general remove it. What is the criteria for removal

analytical query

A reader, July 01, 2005 - 2:19 pm UTC

Tom,

We are trying to build the client within a the month, in this case is within april. I also would like to know how many days elapsed during 2 days so I can bill them.

July 01, 2005 - 3:15 pm UTC

"how many days elapsed between 2 days"

the answer is: 2

but are you asking how to do date arithmetic? Just subtract.

sorry...within March

A reader, July 01, 2005 - 2:21 pm UTC

more information

A reader, July 01, 2005 - 2:29 pm UTC

Tom,

This is how the data looks

IN_DATE OUT_DATE CONTAINER
1/3/2005 2:23:05 PM 1/10/2005 5:05:16 PM ACLU408014
1/11/2005 1:04:49 PM 1/12/2005 8:49:06 AM ACLU408014
1/14/2005 12:09:50 PM 1/18/2005 6:39:10 AM ACLU408014
3/19/2005 2:10:24 AM 3/21/2005 3:45:48 PM ACLU408014
3/22/2005 2:52:41 PM 4/6/2005 2:25:59 PM ACLU408014
4/7/2005 1:24:43 PM 4/10/2005 2:21:59 AM ACLU408014

and I would like to get the pair within the same month

July 01, 2005 - 3:16 pm UTC

the pair of "what"?

I would like to get all the dates within the month

A reader, July 01, 2005 - 4:03 pm UTC

July 01, 2005 - 4:13 pm UTC

please be much much more specific. pretend you trying to explain this to a newbie...

not following the requirement at all..
</code> http://asktom.oracle.com/Misc/how-to-ask-questions.html <code>

one more try

A reader, July 01, 2005 - 4:26 pm UTC

This is how the data looks as of now with the above query.
IN_DATE OUT_DATE CONTAINER
1/3/2005 2:23:05 PM 1/10/2005 5:05:16 PM ACLU408014
1/11/2005 1:04:49 PM 1/12/2005 8:49:06 AM ACLU408014
1/14/2005 12:09:50 PM 1/18/2005 6:39:10 AM ACLU408014
3/19/2005 2:10:24 AM 3/21/2005 3:45:48 PM ACLU408014
3/22/2005 2:52:41 PM 4/6/2005 2:25:59 PM ACLU408014
4/7/2005 1:24:43 PM 4/10/2005 2:21:59 AM ACLU408014

I Would like to get it as the following

IN_DATE OUT_DATE CONTAINER

3/19/2005 2:10:24 AM 3/21/2005 3:45:48 PM ACLU408014
3/22/2005 2:52:41 PM

This is what I am looking for.....this way.

July 01, 2005 - 4:46 pm UTC

still not much of a specification (important thing for those of us in this industry - being able to describe the problem at hand in detail, so someone else can take the problem definition and code it).

Let me try, this is purely a speculative guess on my part:

I would like all records in the table such that the in_date-out_date range covered at least part of the month of march in the year 2005.

If the out_date falls AFTER march, I would like it nulled out.

(this part is a total guess) if the in_date falls BEFORE march, i would like it nulled out as well (for consistency?)

Ok, stated like that I can give you untested psuedo code since there are no create tables and no inserts to play with:

select case when in_date between to_date( :x, 'dd-mon-yyyy' )
and to_date( :y, 'dd-mon-yyyy' )-1/24/60/60
then in_date end,
case when out_date between to_date( :x, 'dd-mon-yyyy' )
and to_date( :y, 'dd-mon-yyyy' )-1/24/60/60
then out_date end,
container
from T
where in_date <= to_date( :y, 'dd-mon-yyyy' )-1/24/60/60
and out_date >= to_date( :x, 'dd-mon-yyyy' )

bind in :x = '01-mar-2005' and :y = '01-apr-2005' for your dates.

As you requested

A reader, July 01, 2005 - 5:01 pm UTC

CREATE TABLE CONTAINER_MASTERS
(
CONTAINER VARCHAR2(10 BYTE) NOT NULL,
CHECK_DIGIT VARCHAR2(1 BYTE) NOT NULL,
SSL_OWNER_CODE VARCHAR2(5 BYTE) NOT NULL,
LHT_CODE VARCHAR2(5 BYTE) NOT NULL

)

INSERT INTO CONTAINER_MASTERS ( CONTAINER, CHECK_DIGIT, SSL_OWNER_CODE,
LHT_CODE ) VALUES ( '045404', '1', 'BCL', '5AV');
commit;

CREATE TABLE TML_CONTAINER_LHTS
(
CODE VARCHAR2(5 BYTE) NOT NULL,
SHORT_DESCRIPTION VARCHAR2(10 BYTE) NOT NULL,
LONG_DESCRIPTION VARCHAR2(30 BYTE) NOT NULL,
ISO VARCHAR2(4 BYTE) NOT NULL,
LENGTH_CODE VARCHAR2(5 BYTE) NOT NULL

)

INSERT INTO TML_CONTAINER_LHTS ( CODE, SHORT_DESCRIPTION, LONG_DESCRIPTION, ISO, LENGTH_CODE,
HEIGHT_CODE, TYPE_CODE ) VALUES ( '5BR', '5BR', '45'' 9''6" Reefer', '5432', '5', 'B', 'R');
commit;

CREATE TABLE TML_CONTAINER_LENGTHS
(
CODE VARCHAR2(5 BYTE) NOT NULL,
SHORT_DESCRIPTION VARCHAR2(10 BYTE) NOT NULL,

)

INSERT INTO TML_CONTAINER_LENGTHS ( CODE, SHORT_DESCRIPTION,
LONG_DESCRIPTION ) VALUES (
'2', '20''', '20 Ft');
INSERT INTO TML_CONTAINER_LENGTHS ( CODE, SHORT_DESCRIPTION,
LONG_DESCRIPTION ) VALUES (
'4', '40''', '40 Ft');
commit;

July 01, 2005 - 6:06 pm UTC

umm, specification?

did I get it right? if so, did you *try* the query at all???

Here is a SQL puzzle for analytics zealots

Mikito Harakiri, July 01, 2005 - 10:33 pm UTC

OK, if anybody suceed writing the following with analytics, I would convert to analytics once and forever. Credit it in the book, of course.

Given:
table Hotels (
name string,
price integer,
distance
)

Here is a query that sounds very analytical:
Order hotels by price, distance. Compare each record with its neighbour (lag?), and one of them is inferior to the other by both criteria -- more pricey and father from the beach -- then throw it away from the result.

July 02, 2005 - 9:20 am UTC

define neighbor.

is neighbor defined by price or by distance? your specification is lacking many many details (seems to be a recurring theme on this page for some reason)

sounds like you want the cheapest closest hotel to the beach. for each row, if something closer and cheaper exists in the original set, do not keep that row.

sounds like a where not exists, not analytics to me. but then - the specification is lacking.

And lets see, in order to appreciate a tool, you have to be shown that the tool can be the end all, be all answer to everything??!?? that is downright silly don't you think.

Let's see:

"if anyone succeeds in making the Oracle 9i merge command select data, I would convert to merge once and forever"

"if anyone succeeds in making my car fly into outer space, I would convert to cars once and forever"

Think about your logic here.

There are no zealots here, there are people willing to read the documentation, understand that things work the way they work, not the way THEY think they should have been made to work, and have jobs to do, pragmatic practical things to accomplish and are willing to use the best tool for the job.

specs

Mikito Harakiri, July 03, 2005 - 11:07 pm UTC

Yes, find all the hotels that are not dominated by the others by both price and distance. That is "not exists" query, but it is a very inefficient one:

select * from hotels h
where not exists (select * from hotels hh
where hh.price < h.price and hh.distance <= h.distance
or hh.price <= h.price and hh.distance < h.distance
)

The one that reformulated is much more efficient, but how do I express it in SQL?

July 04, 2005 - 10:25 am UTC

the one that reforumulated?  

and why do you have the or in there at all.  to dominate by both pric and distance would simply be:

where not exists ( select NULL
                     from hotels hh
                    where hh.price < h.price 
                      AND hh.distinct < h.distance )

You said "by BOTH price and distance", nothing but nothing about ties.


ops$tkyte@ORA9IR2> /*
DOC>
DOC>drop table hotels;
DOC>
DOC>create table hotels
DOC>as
DOC>select object_name name, object_id price, object_id distance, all_objects.*
DOC>  from all_objects;
DOC>
DOC>create index hotel_idx on hotels(price,distance);
DOC>
DOC>exec dbms_stats.gather_table_stats( user, 'T', cascade=>true );
DOC>*/
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select h1.name, h1.price, h1.distance
  2    from hotels h1
  3   where not exists ( select NULL
  4                        from hotels h2
  5                       where h2.price < h1.price
  6                         AND h2.distance < h1.distance )
  7  /
 
NAME                                PRICE   DISTANCE
------------------------------ ---------- ----------
I_OBJ#                                  3          3
 
Elapsed: 00:00:00.22
ops$tkyte@ORA9IR2> select count(*) from hotels;
 
  COUNT(*)
----------
     27837
 
Elapsed: 00:00:00.00

it doesn't seem horribly inefficient.

Tom Can we give it one more try

A reader, July 05, 2005 - 9:20 am UTC

Tom, When I ran the query it returned nothing. I am sending you the whole test case. This is what I would like to see
in the report.

out_date in_date container
1/18/2005 6:39:10 AM 3/19/2005 2:10:24 AM ACLU408014
3/21/2005 3:45:48 PM 3/22/2005 2:52:41 PM ACLU408014

CREATE TABLE BETA
(
IN_DATE DATE NOT NULL,
OUT_DATE DATE,
CONTAINER VARCHAR2(10 BYTE) NOT NULL
)

INSERT INTO BETA ( IN_DATE, OUT_DATE, CONTAINER ) VALUES (
TO_Date( '01/03/2005 02:23:05 PM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '01/10/2005 05:05:16 PM', 'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU408014');
INSERT INTO BETA ( IN_DATE, OUT_DATE, CONTAINER ) VALUES (
TO_Date( '01/11/2005 01:04:49 PM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '01/12/2005 08:49:06 AM', 'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU408014');
INSERT INTO BETA ( IN_DATE, OUT_DATE, CONTAINER ) VALUES (
TO_Date( '01/14/2005 12:09:50 PM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '01/18/2005 06:39:10 AM', 'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU408014');
INSERT INTO BETA ( IN_DATE, OUT_DATE, CONTAINER ) VALUES (
TO_Date( '03/19/2005 02:10:24 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '03/21/2005 03:45:48 PM', 'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU408014');
INSERT INTO BETA ( IN_DATE, OUT_DATE, CONTAINER ) VALUES (
TO_Date( '03/22/2005 02:52:41 PM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '04/06/2005 02:25:59 PM', 'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU408014');
INSERT INTO BETA ( IN_DATE, OUT_DATE, CONTAINER ) VALUES (
TO_Date( '04/07/2005 01:24:43 PM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '04/10/2005 02:21:59 AM', 'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU408014');
commit;

select in_date, out_date,container,
case when in_date between to_date('01-mar-2005', 'dd-mon-yyyy' )
and to_date( '31-mar-2005', 'dd-mon-yyyy' )-1/24/60/60
then in_date end,
case when out_date between to_date( '01-mar-2005', 'dd-mon-yyyy' )
and to_date( '31-mar-2005', 'dd-mon-yyyy' )-1/24/60/60
then out_date end
container
from BETA
WHERE in_date <= to_date( '01-mar-2005', 'dd-mon-yyyy' )-1/24/60/60
and out_date >= to_date( '31-mar-2005', 'dd-mon-yyyy' )

July 05, 2005 - 9:54 am UTC

you know, this is going beyond....

*s*p*e*c*i*f*i*c*a*t*i*o*n*

pretend you were explaining to your mother (who presumably doesn't work in IT and doesn't know sql or databases or whatever) what needed to be done.  

that is what I need to see.  I obviously don't know your logic of getting from "A (inputs) to B (outputs)" and you need to explain that.


and when I run my query:

ops$tkyte@ORA10G> variable x varchar2(20)
ops$tkyte@ORA10G> variable y varchar2(20)
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> exec :x := '01-mar-2005'; :y := '01-apr-2005'
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA10G> select case when in_date between to_date( :x, 'dd-mon-yyyy' )
  2                               and to_date( :y, 'dd-mon-yyyy' )-1/24/60/60
  3              then in_date end,
  4         case when out_date between to_date( :x, 'dd-mon-yyyy' )
  5                                and to_date( :y, 'dd-mon-yyyy' )-1/24/60/60
  6              then out_date end,
  7         container
  8    from beta
  9   where in_date <= to_date( :y, 'dd-mon-yyyy' )-1/24/60/60
 10     and out_date >= to_date( :x, 'dd-mon-yyyy' )
 11  /
 
CASEWHENI CASEWHENO CONTAINER
--------- --------- ----------
19-MAR-05 21-MAR-05 ACLU408014
22-MAR-05           ACLU408014


I do get output, not what you say you want, but output.  you need to tell me THE LOGIC here.  (and maybe when you write it down, specify it, the answer will just naturally appear)

so yes, we can definitely give it one more try but if and only if you provide the details, the specification, the logic, the thoughts behind this.

Not just "i have this and want that", it doesn't work that way.

in english

Jean, July 05, 2005 - 10:33 am UTC

We are trying to bill from the time the truck left to the
time it returned. For example in the above query.
I would like to bill him from 1/18/2005 to 3/19/2005. So it must be part of the report. That's the the whole key here.

clarification!!

A reader, July 05, 2005 - 10:56 am UTC

the time he left 1/18/2005 6:39:10 AM
the time he came back 3/22/2005 2:52:41 PM

hope this helps....

July 05, 2005 - 11:28 am UTC

ops$tkyte@ORA9IR2> select * from beta order by in_date;
 
IN_DATE   OUT_DATE  CONTAINER
--------- --------- ----------
03-JAN-05 10-JAN-05 ACLU408014
11-JAN-05 12-JAN-05 ACLU408014   <<<=== gap, no 13
14-JAN-05 18-JAN-05 ACLU408014   <<=== big gap, no 19.... mar 18
19-MAR-05 21-MAR-05 ACLU408014
22-MAR-05 06-APR-05 ACLU408014
07-APR-05 10-APR-05 ACLU408014
 
6 rows selected.


I don't get it.  I don't get it AT ALL.   does anyone else ?  

nope, not getting it even a teeny tiny bit myself.


give us LOGIC, ALGORITHM, INFORMATION.


like I said, pretend I'm your mother who has never seen a computer -- explain the logic at that level (or I just give up)

BETTER TABLE

A reader, July 05, 2005 - 11:57 am UTC

INSERT INTO BETA ( IN_DATE, OUT_DATE, CONTAINER ) VALUES (
TO_Date( '01/03/2005 02:23:05 PM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '01/10/2005 05:05:16 PM', 'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU408014');
INSERT INTO BETA ( IN_DATE, OUT_DATE, CONTAINER ) VALUES (
TO_Date( '01/11/2005 01:04:49 PM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '01/12/2005 08:49:06 AM', 'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU408014');
INSERT INTO BETA ( IN_DATE, OUT_DATE, CONTAINER ) VALUES (
TO_Date( '01/14/2005 12:09:50 PM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '01/18/2005 06:39:10 AM', 'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU408014');
INSERT INTO BETA ( IN_DATE, OUT_DATE, CONTAINER ) VALUES (
TO_Date( '03/19/2005 02:10:24 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '03/21/2005 03:45:48 PM', 'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU408014');
INSERT INTO BETA ( IN_DATE, OUT_DATE, CONTAINER ) VALUES (
TO_Date( '04/07/2005 01:24:43 PM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '04/10/2005 02:21:59 AM', 'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU408014');
INSERT INTO BETA ( IN_DATE, OUT_DATE, CONTAINER ) VALUES (
TO_Date( '03/22/2005 02:52:41 PM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '04/06/2005 02:25:59 PM', 'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU408014');
commit;

OUT_DATE IN_DATE
1/18/2005 6:39:10 AM 3/19/2005 2:10:24 AM
3/21/2005 3:45:48 PM 3/22/2005 2:52:41 PM

LEFT 1/18 CAME BACK 3/19
LEFT 3/21 CAME BACK 3/22

July 05, 2005 - 12:20 pm UTC

you have totally and utterly missed my point.

IN_DATE OUT_DATE CONTAINER
--------- --------- ----------
03-JAN-05 10-JAN-05 ACLU408014
11-JAN-05 12-JAN-05 ACLU408014
14-JAN-05 18-JAN-05 ACLU408014
19-MAR-05 21-MAR-05 ACLU408014
22-MAR-05 06-APR-05 ACLU408014
07-APR-05 10-APR-05 ACLU408014

6 rows selected.

sigh.

what if the records are

IN_DATE OUT_DATE CONTAINER
--------- --------- ----------
03-JAN-05 10-JAN-05 ACLU408014
11-JAN-05 12-JAN-05 ACLU408014
14-JAN-05 18-JAN-05 ACLU408014
07-APR-05 10-APR-05 ACLU408014

specification, you know what, without it, I'm not even going to look anymore. Textual description of precisely what you want. I'm tired of guessing. I think I can guess, but I don't even want to guess about "missing" months like my second example here.

English Explanation

A reader, July 05, 2005 - 1:06 pm UTC

Sorry for going back and forth on this report. All I want is the following: We have trucks that comes and out of yard. All we are looking for is when the truck came in and the "next record" nothing in between because a truck can come in many times during a month. So we want when it first came in and the very last time he went out for a particular month.That is to say the last time he left the yard. So the date and time should give us this information. Finally this report should be within a month.

example:

IN_DATE OUT_DATE CONTAINER
--------- --------- ----------
03-JAN-05 10-JAN-05 ACLU408014
11-JAN-05 12-JAN-05 ACLU408014
14-JAN-05 18-JAN-05 ACLU408014
19-MAR-05 21-MAR-05 ACLU408014
22-MAR-05 06-APR-05 ACLU408014
07-APR-05 10-APR-05 ACLU408014

6 rows selected.

in this case we want

in_date out_date
-------- --------
3/22/2005 2:52:41PM 1/18/2005 6:39:10 AM

July 05, 2005 - 1:17 pm UTC

so what happened to the 21st/22nd of march this time. the answer keeps changing?

and what if, there are no records for march in the table (nothing in_date/out_date wise)

follow up

jean, July 05, 2005 - 1:57 pm UTC

Tom,

We realized that it maybe too much to get the dates in between
so we opt for just getting the in_date and out_date. By the way there will always be data so do not worry about if....

Thanks!!

July 05, 2005 - 3:11 pm UTC

feb, what about feb? you said there would always be data? I want to run this for feb?

do you or do you not need to be concerned about a missing month.

do not be concerned!

A reader, July 05, 2005 - 3:22 pm UTC

Please do not be concerned about missing a month. This is a report.

July 05, 2005 - 3:46 pm UTC

umm, I want the report for feburary

it is blank.

now what?  it should not be blank should it?  this is a problem, this is a problem in our industry in general.  You get what you ask for (sometimes) and if you ask repeatedly for the wrong thing, that's what you'll get.  I am concerned -- by this line of question here.

Hey, here you go:

ops$tkyte-ORA9IR2> select *
  2    from (
  3  select
  4         lag(out_date) over (partition by container order by in_date) last_out_date,
  5         in_date,
  6             container
  7    from beta
  8         )
  9   where trunc(in_date,'mm') = to_date('01-mar-2005','dd-mon-yyyy')
 10      or trunc(last_out_date,'mm') = to_date('01-mar-2005','dd-mon-yyyy');
 
LAST_OUT_ IN_DATE   CONTAINER
--------- --------- ----------
18-JAN-05 19-MAR-05 ACLU408014
21-MAR-05 22-MAR-05 ACLU408014

gets the answer given your data, makes a zillion assumptions (50% of which are probably wrong), won't work for FEB, probably doesn't answer the question behind the question, but hey, there you go.

Thanks!!!

A reader, July 06, 2005 - 9:00 am UTC

I will try it ...Thanks a zillion for your efforts and your patient.

Thanks!

A reader, July 06, 2005 - 11:55 am UTC

CREATE TABLE BETA3
(
IN_DATE DATE NOT NULL,
OUT_DATE DATE,
CONTAINER VARCHAR2(10 BYTE) NOT NULL
)

INSERT INTO BETA3 ( IN_DATE, OUT_DATE, CONTAINER ) VALUES (
TO_Date( '07/20/2004 03:08:49 PM', 'MM/DD/YYYY HH:MI:SS AM'),
TO_Date( '08/10/2004 02:45:52 AM', 'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU040312');
INSERT INTO BETA3 ( IN_DATE, OUT_DATE, CONTAINER ) VALUES (
TO_Date( '03/19/2005 01:55:06 AM', 'MM/DD/YYYY HH:MI:SS AM'),
TO_Date( '03/27/2005 05:05:36 AM', 'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU040312');
commit;

Tom I was able to get the first pair as show

last_out_date in_date container
8/10/2004 2:45:52 AM 3/19/2005 1:55:06 AM ACLU040312

which is fine...

But can I get the other pair?

last_out_date in_date container
3/27/2005 5:05:36 AM

July 06, 2005 - 12:44 pm UTC

problem is, you are "missing" a row and 'making up' data is hard.

it might be

ops$tkyte-ORA10G> select decode( r, 1, last_out_date, out_date ),
  2         decode( r, 1, in_date, next_in_date )
  3    from (
  4  select
  5         lag(out_date) over (partition by container order by in_date) last_out_date,
  6         in_date, out_date,
  7         lead(in_date) over (partition by container order by in_date) next_in_date,
  8             container
  9    from beta3
 10         ), ( select 1 r from dual union all select 2 r from dual )
 11   where ((
 12          trunc(in_date,'mm') = to_date('01-mar-2005','dd-mon-yyyy')
 13          or
 14                  trunc(last_out_date,'mm') = to_date('01-mar-2005','dd-mon-yyyy')
 15             ) and r = 1 )
 16             or
 17             ( next_in_date is null and r = 2 )
 18  /
 
DECODE(R,1,LAST_OUT_ DECODE(R,1,IN_DATE,N
-------------------- --------------------
10-aug-2004 02:45:52 19-mar-2005 01:55:06
27-mar-2005 05:05:36

still curious what happens in feb.

Please refer some books to learn Oracle Analytic functions

Vijay, July 07, 2005 - 7:58 am UTC

July 07, 2005 - 9:47 am UTC

data warehousing guide (freely available on otn.oracle.com)

Expert one on one Oracle (I have a big chapter on them in there)

Thank you very much!!

Jean, July 08, 2005 - 10:35 am UTC

I want to thank you for the last query!!! it worked very well,even tho I still get dates outside of the range. But overall it's fine.

How to get contiguous date ranges from Start_date, end_date pairs?

Bob Lyon, July 11, 2005 - 3:15 pm UTC

-- Tom, Suppose I have a table with data...

-- MKT_CD START_DT_GMT END_DT_GMT
-- ------ ----------------- -----------------
-- AAA 07/11/05 00:00:00 07/12/05 00:00:00
-- BBB 07/11/05 00:00:00 07/11/05 01:00:00
-- BBB 07/11/05 01:00:00 07/11/05 02:00:00
-- BBB 07/11/05 02:00:00 07/11/05 03:00:00
-- BBB 07/11/05 06:00:00 07/11/05 07:00:00
-- BBB 07/11/05 07:00:00 07/11/05 08:00:00

-- What I would like to get is the "contiguous date ranges"
-- by MKT_CD, i.e.,

-- MKT_CD START_DT_GMT END_DT_GMT
-- ------ ----------------- -----------------
-- AAA 07/11/05 00:00:00 07/12/05 00:00:00
-- BBB 07/11/05 00:00:00 07/11/05 03:00:00
-- BBB 07/11/05 06:00:00 07/11/05 08:00:00

-- I have played with LAG/LEAD/FIRST_VALUE/LAST_VALUE
-- but seem to just "go in circles" trying to code this.

-- Here is the test data setup (Oracle 9.2.0.6) :

CREATE GLOBAL TEMPORARY TABLE NM_DEMAND_BIDS_API_GT
(
MKT_CD VARCHAR2(6) NOT NULL,
START_DT_GMT DATE NOT NULL,
END_DT_GMT DATE NOT NULL
)
ON COMMIT PRESERVE ROWS;

-- This code has 24 hours
INSERT INTO NM_DEMAND_BIDS_API_GT ( MKT_CD, START_DT_GMT, END_DT_GMT )
VALUES ('AAA', TRUNC(SYSDATE), TRUNC(SYSDATE) + 1);
-- A second code goes by hours
INSERT INTO NM_DEMAND_BIDS_API_GT ( MKT_CD, START_DT_GMT, END_DT_GMT )
VALUES ('BBB', TRUNC(SYSDATE)+ 00/24, TRUNC(SYSDATE) + 01/24);
INSERT INTO NM_DEMAND_BIDS_API_GT ( MKT_CD, START_DT_GMT, END_DT_GMT )
VALUES ('BBB', TRUNC(SYSDATE)+ 01/24, TRUNC(SYSDATE) + 02/24);
INSERT INTO NM_DEMAND_BIDS_API_GT ( MKT_CD, START_DT_GMT, END_DT_GMT )
VALUES ('BBB', TRUNC(SYSDATE)+ 02/24, TRUNC(SYSDATE) + 03/24);
-- and has an intentional gap
INSERT INTO NM_DEMAND_BIDS_API_GT ( MKT_CD, START_DT_GMT, END_DT_GMT )
VALUES ('BBB', TRUNC(SYSDATE)+ 06/24, TRUNC(SYSDATE) + 07/24);
INSERT INTO NM_DEMAND_BIDS_API_GT ( MKT_CD, START_DT_GMT, END_DT_GMT )
VALUES ('BBB', TRUNC(SYSDATE)+ 07/24, TRUNC(SYSDATE) + 08/24);

-- Query

SELECT MKT_CD, START_DT_GMT, END_DT_GMT
FROM NM_DEMAND_BIDS_API_GT;

July 11, 2005 - 3:49 pm UTC

based on:

https://www.oracle.com/technetwork/issue-archive/2014/14-mar/o24asktom-2147206.html


ops$tkyte@ORA9IR2> select mkt_cd, min(start_dt_gmt), max(end_dt_gmt)
  2    from (
  3  select mkt_cd, start_dt_gmt, end_dt_gmt,
  4         max(grp) over (partition by mkt_cd order by start_dt_gmt) mgrp
  5    from (
  6  SELECT MKT_CD,
  7         START_DT_GMT,
  8         END_DT_GMT,
  9         case when lag(end_dt_gmt) over (partition by mkt_cd order by start_dt_gmt) <> start_dt_gmt
 10                   or
 11                   lag(end_dt_gmt) over (partition by mkt_cd order by start_dt_gmt) is null
 12              then row_number() over (partition by mkt_cd order by start_dt_gmt)
 13          end grp
 14    FROM NM_DEMAND_BIDS_API_GT
 15         )
 16         )
 17   group by mkt_cd, mgrp
 18   order by 1, 2
 19  /
 
MKT_CD MIN(START_DT_GMT)    MAX(END_DT_GMT)
------ -------------------- --------------------
AAA    11-jul-2005 00:00:00 12-jul-2005 00:00:00
BBB    11-jul-2005 00:00:00 11-jul-2005 03:00:00
BBB    11-jul-2005 06:00:00 11-jul-2005 08:00:00

Thanks!

Bob Lyon, July 11, 2005 - 5:20 pm UTC

Wow, that was fast.

The trick here is the MAX() analytic function. I could tag the lines where a break was to occur but couldn't figure out how to carry forward the tag/grp.

Thanks Again!

Analytical functions book

Vijay, July 11, 2005 - 11:55 pm UTC

Thanks a lot

More Help

Jean, July 26, 2005 - 5:40 pm UTC

Tom,

How can I get "just" the record within the scope? I am getting record outside of march.

select container,decode( r, 1, last_out_date, out_date )out_date, decode( r, 1, in_date, next_in_date) in_date,
code length_code,chassis,out_trucker_code,ssl_user_code ssl, ssl_user_code,out_mode
from (
select lag(out_date) over (partition by i.container order by in_date)
last_out_date,
i.ssl_user_code,
in_date,
cl.code,
i.out_trucker_code,
i.ssl_user_code ssl,
i.container,
i.chassis,
out_mode,
out_date,
clht.length_code,
lead(in_date) over (partition by i.container order by in_date)
next_in_date
from his_containers i,container_masters cm,tml_container_lhts clht,tml_container_lengths cl
where cm.container = i.container
and cm.lht_code = clht.code
and cl.code = clht.length_code
and ssl_user_code = 'ACL'
and i.container = 'ACLU214285'
and voided_date is null
and chassis is null
and in_mode = 'T'
and out_mode = 'T' ), ( select 1 r from dual union all select 2 r from dual )
where (( trunc(in_date,'mm') = to_date('01-mar-2005','dd-mon-yyyy')
or trunc(last_out_date,'mm') = to_date('01-mar-2005','dd-mon-yyyy'))
and r = 1 ) or ( next_in_date is null and r = 2 )
order by out_date

July 26, 2005 - 5:57 pm UTC

select *
from (Q)
where <any other conditions you like>
order by out_date;

replace Q with your query.

that's what I got in my query.....

A reader, July 26, 2005 - 6:03 pm UTC

July 26, 2005 - 6:23 pm UTC

don't know what you mean

I thought I was doing what you suggested already...

A reader, July 26, 2005 - 6:41 pm UTC

July 26, 2005 - 6:56 pm UTC

I cannot see your output, obviously you are getting more data than you wanted it -- add to the predicate in order to filter it out. don't know what else to say.

More information..

Jean, July 27, 2005 - 9:14 am UTC

the way it was before

CONTAINER OUT_DATE IN_DATE LENGTH_CODE CHASSIS OUT_TRUCKER_CODE
ACLU217150 6/25/2004 2:58:01 PM 3/11/2005 7:36:29 PM 4 E2131 ACL ACL T

---with your changes---

CONTAINER OUT_DATE IN_DATE LENGTH_CODE CHASSIS OUT_TRUCKER_CODE ACLU217150 6/25/2004 2:58:01 PM 3/11/2005 7:36:29 PM 4 E2131

my history tables

CONTAINER_ID OUT_DATE IN_DATE
31779 6/21/2004 10:03:25 AM 6/16/2004 1:33:50 AM
55317 6/25/2004 2:58:01 PM 6/25/2004 2:19:49 PM
672863 3/2/2005 7:03:31 PM 2/26/2005 6:03:49 PM
708598 4/4/2005 3:31:03 PM 3/11/2005 7:36:29 PM
779305 4/16/2005 1:03:36 PM 4/6/2005 2:04:53 PM

as you can see I am not picking up the records within the month of march...with or without
the changes to the query.

July 27, 2005 - 10:27 am UTC

sorry -- you'll need to work through this, you see the techniques involved right -- lag, lead, analytic functions, YOU understand your data much better than I.

(because in part, frankly, the "way it was before" and "with your changes" look, well, I don't know -- the same I think to me as displayed here)

Thanks for your help!

A reader, July 27, 2005 - 1:26 pm UTC

I know the data, however I thought I was going to be something easy just to get the date within march...I guess not.

count number of rows in a number of ranges

A reader, July 27, 2005 - 6:08 pm UTC

Hi

I would like to count the number of rows I have per range of values. For example

SELECT RANGE, SUM(suma) total_per_deptno
FROM (SELECT CASE
WHEN deptno between 10 and 20 THEN '10-20'
ELSE '30'
END RANGE,
deptno, 1 SUMA
FROM scott$emp)
GROUP BY RANGE

RANGE TOTAL_PER_DEPTNO
----- ----------------
10-20 8
30 6

Can I rewrite that query in some other way so range can be dynamic such as

11-20
21-30
31-40

and counts the number of rows?

Thank you

July 27, 2005 - 6:33 pm UTC

if you can come up with a function f(x) such that f(x) returns what you want, sure.

EG:

for you 11-20, 21-30, 31-40 -- well

f(deptno) = trunc( (deptno-0.1)/10)

(assuming deptno is an integer) -- that'll bin up deptno 0..10, 11..20, 21..30 and so on into groups 0, 1, 2, 3, ....

A reader, August 02, 2005 - 1:35 pm UTC

Tom,

I hope you can provide an insight to this.

table emp1 is shown below.

EmpId Week Year Day0 Day1 ..... Day14

100 20 2005 8 8 8
200 22 2003 0 0 8
300 25 2004 8 8 0
400 06 2005 0 8 8
500 08 2002 8 0 8

create table emp1(empid varchar2(3), week varchar2(2), year varchar2(4), day0 number(2), day1 number(2), day2 number(2), day3 number(2), day4 number(2), day5 number(2), day6 number(2), day7 number(2), day8 number(2), day9 number(2), day10 number(2), day11 number(2), day12 number(2), day13 number(2), day14 number(2));

insert into emp1 values('100', '20', '2005', 8, 8, 0, 8, 0, 8, 0, 0, 8, 8, 8, 8, 0, 8);
insert into emp1 values('200', '22', '2003', 0, 8, 0, 8, 0, 8, 0, 0, 8, 8, 8, 8, 0, 8);
insert into emp1 values('300', '25', '2004', 8, 8, 0, 8, 0, 8, 0, 0, 8, 8, 8, 8, 0, 0);
insert into emp1 values('400', '06', '2005', 0, 8, 0, 8, 0, 8, 0, 0, 8, 8, 8, 8, 0, 8);
insert into emp1 values('500', '08', '2002', 8, 0, 0, 8, 0, 8, 0, 0, 8, 8, 8, 8, 0, 8);

I am trying to select emp1 records as follows:

EmpId, Date of the day, Hours worked per day

Firstly, I have to calculate date of the day of a record (first day that corresponds to Day0) using
week of the year and year. Then I have to increment the day by 1, 2 ...14
to get the hours worked for each particular date

Example: Assuming that week 20 of 2005 is 05/07/2005. It corresponds to Day0 in the same record

Day1 column corresponds to the next day which is 05/08/2005. Day2 becomes 05/09/2005 and so on ...

Then, I have to print individual rows for each empid as:

100 05/07/2005 8
100 05/08/2005 8
.....
200 05/22/2003 0
200 05/23/2003 8
.. and so on for all empid's ...

Thank you.

August 02, 2005 - 2:09 pm UTC

oh no, columns where rows should be :(

and basically you are saying "i need ROWS where these rows should be!"

tell me, how do you turn 20 into a date?

A reader, August 02, 2005 - 2:19 pm UTC

Tom,

I should've explained it better. Week 20 of 2005, here should be translated to the first day of week 20 of 2005 (Assuming it is 05/07/2005). That corresponds to Day0 of that row. Day1 becomes 05/08/2005 and so on ...

Is there a function or approach that can convert columns to rows?

August 02, 2005 - 3:30 pm UTC

no, i mean -- what function/logic/algorithm are you using to figure out "week 20 is this day"

A reader, August 02, 2005 - 9:06 pm UTC

Tom,

Sorry, firstly, the date is not calculated the way I said above. It's not clear yet how the date is obtained. This issue is under review and I think I'll obtain date by joining empid with some table (say temp1). However, I am sure I will have to use date (such as 05/07/2005), associate it with Day0 column value. Day1 becomes 05/08/2005 and so on .. However, I am trying to obtain a sql or pl/sql that can arrange the rows as described above. Any ideas? Thanks.

August 03, 2005 - 10:06 am UTC

I cannot tell you how much I object to this model.  

storing "week" and "year" - UGH.

storing them in STRINGS - UGH UGH UGH.

storing things that should be cross record in record UGH to the power of 10.

I had to fix your inserts, they did not work, added day14 of zero.


ops$tkyte@ORA10G> with dates as
  2  (select to_date( '05/07/2005','mm/dd/yyyy')+level-1 dt, level-1 l from dual connect by level <= 15 )
  3  select empid, dt,
  4         case when l = 0 then day0
  5                  when l = 1 then day1
  6                  when l = 2 then day2
  7                          /* ... */
  8                  when l = 13 then day13
  9                  when l = 14 then day14
 10                  end data
 11    from (select * from emp1 where week = 20), dates
 12  /
 
EMP DT              DATA
--- --------- ----------
100 07-MAY-05          8
100 08-MAY-05          8
100 09-MAY-05          0
100 10-MAY-05
100 11-MAY-05
100 12-MAY-05
100 13-MAY-05
100 14-MAY-05
100 15-MAY-05
100 16-MAY-05
100 17-MAY-05
100 18-MAY-05
100 19-MAY-05
100 20-MAY-05          8
100 21-MAY-05          0
 
15 rows selected.

A reader, August 03, 2005 - 3:18 pm UTC

Tom,

Thanks for the solution. I need some more help if you don't mind. The sql works excellently and I experimented with it.

However, this question is based on a change of design here ... The emp1 table is joined with trn1 table (empid ~ trnid) to obtain values x and y. x and y should be passed to a function that returns date.

The emp1 table is like:

EmpId Day0 Day1 ..... Day14

100 8 8 8
200 0 0 8
300 8 8 0
400 0 8 8
500 8 0 8

trn1 table is like:

trnid x y
100 3 18
200 4 19
300 5 20
400 6 21
500 7 22

etc ...

create table emp1(empid varchar2(3), day0 number(2), day1 number(2), day2 number(2), day3 number(2), day4 number(2), day5 number(2), day6 number(2), day7 number(2), day8 number(2), day9 number(2), day10 number(2), day11 number(2), day12 number(2), day13 number(2), day14 number(2));

insert into emp1 values('100', 8, 8, 0, 8, 0, 8, 0, 0, 8, 8, 8, 8, 0, 8, 8);
insert into emp1 values('200', 0, 8, 0, 8, 0, 8, 0, 0, 8, 8, 8, 8, 0, 8, 0);
insert into emp1 values('300', 8, 8, 0, 8, 0, 8, 0, 0, 8, 8, 8, 8, 0, 0, 0);
insert into emp1 values('400', 0, 8, 0, 8, 0, 8, 0, 0, 8, 8, 8, 8, 0, 8, 8);
insert into emp1 values('500', 8, 0, 0, 8, 0, 8, 0, 0, 8, 8, 8, 8, 0, 8, 8);

create table trn1(empid varchar2(3), x number(2), y number(2));

insert into trn1 values('100', 3, 18);
insert into trn1 values('200', 4, 19);
insert into trn1 values('300', 5, 20);
insert into trn1 values('400', 6, 21);
insert into trn1 values('500', 7, 22);

I used this function on just one row of emp1 (by hard coding x and y values).

I replaced

with dates as
(select to_date( '05/07/2005','mm/dd/yyyy')+level-1 dt, level-1 l from dual
connect by level <= 15 )

with

with dates as
(select getXYDate(x,y)+level-1 dt, level-1 l from dual
connect by level <= 15 )

However, I am trying to implement this on every row of emp1 by obtaining x and y from trn. There is no week or year in emp1 table. Any help? Thanks again.

August 03, 2005 - 6:00 pm UTC

I didn't think it was possible, but now I like this even less than before!  didn't think you could do that ;(


ops$tkyte@ORA10G> with dates as
  2  (select to_date( '05/07/2005','mm/dd/yyyy')+level-1 dt, level-1 l from dual 
connect by level <= 15 )
  3  select empid, dt,
  4         case when l = 0 then day0
  5                  when l = 1 then day1
  6                  when l = 2 then day2
  7                          /* ... */
  8                  when l = 13 then day13
  9                  when l = 14 then day14
 10                  end data
 11    from ( QUERY ), dates
 12  /

replace query with a join of emp with trn and apply the function in there.

A reader, August 03, 2005 - 7:38 pm UTC

Tom,

Sorry to bother you again. In my case, I think
(select to_date( '05/07/2005','mm/dd/yyyy') will not help me anymore because I have to basically find dates for Day0 .. Day14 of every row in emp1 table. The first date (date that corresponds to Day0) for each record should be obtained using a function by passing X and Y values of trn table .. Because each record may have different x, y values.
If it's not achievable using this way, can you suggest an alternate approach. I am trying to make a function that would use a loop. Also, the data should be written to a text file once complete, in that case I think a procedure might help and if so, could you throw some light? Thanks for your patience.

August 03, 2005 - 8:26 pm UTC

well, you just need to generate a set of 15 numbers (L)

and add them in later than. No big change. You have the "start_date" from the function right -- just add L to dt.

A reader, August 03, 2005 - 8:38 pm UTC

Ok, Can you please show that if possible?

A reader, August 03, 2005 - 9:38 pm UTC

Tom,

I tried this and am getting an error: ORA-00904: "DAY13": invalid identifier

WITH DATES AS
(SELECT FUNC_XY(17,2003)+level-1 dt, level-1 l FROM DUAL
connect by level <= 15)
select empid, day0, day14, x, y, dt,
case when l = 0 then day0
when l = 1 then day1
when l = 2 then day2
when l = 3 then day3
when l = 4 then day4
when l = 5 then day5
when l = 6 then day6
when l = 7 then day7
when l = 8 then day8
when l = 9 then day9
when l = 10 then day10
when l = 11 then day11
when l = 12 then day12
when l = 13 then day13
when l = 14 then day14
end data
from (select emp1.empid, day0, day14, x, y from emp1, trn1 where emp1.empid = trn1.empid), dates
/

As said before ... I also have to use x and y instead of 17 and 2003 in order to compute it for every row.

August 04, 2005 - 8:20 am UTC

yeah, well -- you didn't select it out in the inline view. fix that.

look the concept is thus:

with some_rows as ( select level-1 l from dual connect by level <= 15 )
select a.empid, a.dt+l, case when l=0 then a.day0
...
when l=14 then a.day14
end data
from some_rows,
(select emp1.empid, func_xy(trn1.x, trn1.y) dt,
emp1.day0, emp1.day1, .... <ALL OF THE DAYS>, emp1.day14
from emp1, trn1
where emp1.empid = trn1.empno )

A reader, August 04, 2005 - 9:15 am UTC

Tom,

Here, the sql is using a.empid, a.dt+l ...

whereas the inner sql is using emp1.day0, trn1.empid , etc ... My real inner sql well uses some more columns adn joins as well. When this gave me error, I just substituted emp1.day0, emp1.day14 etc ... with day0, day14 etc .. and it worked. However, when there are several joins with alias names, How should it be done?

To make it a bit clear, this sql looks similar to:

select emp1.empid, emp1.day0 from some_rows, (select emp1.empid, emp1.day0) ...

Any idea how to select from select and still use multiple joins etc ... Hope I am clear

August 04, 2005 - 9:56 am UTC

you can join as much as you WANT in the inline views.

Sorry, I cannot go further with this one, I've shown the technique -- it is just a pivot to turn COLUMNS THAT SHOULD HAVE BEEN ROWS into rows -- very common.

A reader, August 04, 2005 - 9:52 am UTC

Please ignore above post.

I need some help

Carlos, August 09, 2005 - 10:25 am UTC

Insert into LOU_DATE
(IN_DATE, OUT_DATE)
Values
(TO_DATE('11/15/2004 17:42:56', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('11/18/2004 15:09:19', 'MM/DD/YYYY HH24:MI:SS'));
Insert into LOU_DATE
(IN_DATE, OUT_DATE)
Values
(TO_DATE('11/24/2004 09:38:15', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('11/30/2004 04:28:09', 'MM/DD/YYYY HH24:MI:SS'));
Insert into LOU_DATE
(IN_DATE, OUT_DATE)
Values
(TO_DATE('01/03/2005 14:36:24', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('01/05/2005 10:04:15', 'MM/DD/YYYY HH24:MI:SS'));
Insert into LOU_DATE
(IN_DATE, OUT_DATE)
Values
(TO_DATE('01/07/2005 08:54:59', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('01/10/2005 10:54:07', 'MM/DD/YYYY HH24:MI:SS'));
Insert into LOU_DATE
(IN_DATE, OUT_DATE)
Values
(TO_DATE('01/12/2005 10:13:13', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('01/18/2005 04:23:41', 'MM/DD/YYYY HH24:MI:SS'));
Insert into LOU_DATE
(IN_DATE, OUT_DATE)
Values
(TO_DATE('03/03/2005 03:15:05', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('03/09/2005 18:54:11', 'MM/DD/YYYY HH24:MI:SS'));
Insert into LOU_DATE
(IN_DATE, OUT_DATE)
Values
(TO_DATE('03/11/2005 13:25:40', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('03/15/2005 21:47:41', 'MM/DD/YYYY HH24:MI:SS'));
Insert into LOU_DATE
(IN_DATE, OUT_DATE)
Values
(TO_DATE('03/22/2005 20:27:03', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('03/29/2005 17:05:04', 'MM/DD/YYYY HH24:MI:SS'));
Insert into LOU_DATE
(IN_DATE, OUT_DATE)
Values
(TO_DATE('03/22/2005 20:27:15', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('03/30/2005 08:53:13', 'MM/DD/YYYY HH24:MI:SS'));
Insert into LOU_DATE
(IN_DATE, OUT_DATE)
Values
(TO_DATE('03/30/2005 13:16:00', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('04/16/2005 13:40:44', 'MM/DD/YYYY HH24:MI:SS'));
Insert into LOU_DATE
(IN_DATE, OUT_DATE)
Values
(TO_DATE('03/30/2005 15:08:39', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('04/16/2005 13:40:44', 'MM/DD/YYYY HH24:MI:SS'));
COMMIT;

Tom,

I hope you can help since I have been struggling with this report. I would like to get something like this...

IN ORDER WORDS I WANT TO GET WHEN IT FIRST WAS LOGED IN INDATE AND WHEN IT WAS LAST LOGed IN OUT_DATE. SORT OF LIKE MIN AND MAX. In this case for example for the month of March, however it can be for any given Month. Any Ideas how I can accomplish that?

IN_DATE OUT_DATE
3/22/2005 8:27:03 PM 3/30/2005 3:08:39 PM

----from the table above for the month of March

August 09, 2005 - 10:45 am UTC

insufficient detail here, why won't min/max work for you for example.

but I don't understand the logic behind the two values you say you want, I don't get how you arrived at them.

This is what I get

A reader, August 09, 2005 - 10:57 am UTC

select in_date, out_date
from lou_date
where id = 201048
and ((out_date between to_date('01-MAR-05 00:00:00', 'DD-MON-RR HH24:MI:SS')
and to_date('31-MAR-05 23:59:59', 'DD-MON-RR HH24:MI:SS')) OR
(in_date between to_date('01-MAR-05 00:00:00', 'DD-MON-RR HH24:MI:SS')
and to_date('31-MAR-05 23:59:59', 'DD-MON-RR HH24:MI:SS')))

I get the following:

In_date out_date

3/22/2005 8:27:03 PM 3/29/2005 5:05:04 PM
3/30/2005 3:08:39 PM 4/16/2005 1:40:44 PM

August 09, 2005 - 11:19 am UTC

ok,

Insert into LOU_DATE
(IN_DATE, OUT_DATE)
Values
(TO_DATE('03/11/2005 13:25:40', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('03/15/2005
21:47:41', 'MM/DD/YYYY HH24:MI:SS'));

why didn't you get that row. for example.

A reader, August 09, 2005 - 11:48 am UTC

SQL Statement which produced this data:
select in_date, out_date
from lou_date
where ((out_date between to_date('01-MAR-05 00:00:00', 'DD-MON-RR HH24:MI:SS')
and to_date('31-MAR-05 23:59:59', 'DD-MON-RR HH24:MI:SS')) OR
(in_date between to_date('01-MAR-05 00:00:00', 'DD-MON-RR HH24:MI:SS')
and to_date('31-MAR-05 23:59:59', 'DD-MON-RR HH24:MI:SS')))
order by out_date

3/3/2005 3:15:05 AM 3/9/2005 6:54:11 PM
3/11/2005 1:25:40 PM 3/15/2005 9:47:41 PM
3/11/2005 1:25:40 PM 3/15/2005 9:47:41 PM
3/22/2005 8:27:03 PM 3/29/2005 5:05:04 PM
3/22/2005 8:27:15 PM 3/30/2005 8:53:13 AM
3/30/2005 1:16:00 PM 4/16/2005 1:40:44 PM
3/30/2005 3:08:39 PM 4/16/2005 1:40:44 PM

I guess my question is I would like to that when
I get records with beyond march it should be replace
with blank or Null...since I can't charged him/her
for April...

August 09, 2005 - 12:00 pm UTC

I am so not following you here.

A reader, August 09, 2005 - 12:24 pm UTC

Tom,

Pretend that you are charging someone for a particular month. Let's say the month of March. So you would like to do a query that reflect just that..so a group of dates are given to you and in that group of dates you have multiple records with the same id. Also some records containts records that inintiated in march but came back in April. Here is are the examples..but it can work with any dates...

example 1.

in_date out_date
3/22/2005 8:27:15 PM 3/30/2005 8:53:13 AM
3/30/2005 1:16:00 PM 4/16/2005 1:40:44 PM

would like to see:
in_date out_date
3/22/2005 8:27:15 PM 3/30/2005 1:16:00 PM

example 2

In_date out_date
3/3/2005 3:15:05 AM 3/9/2005 6:54:11 PM
3/11/2005 1:25:40 PM 3/15/2005 9:47:41 PM

would like to see:

In_date out_date
3/3/2005 3:15:05 AM 3/15/2005 9:47:41 PM

August 09, 2005 - 12:42 pm UTC

begs the question

in_date out_date
20-feb-2005 15-apr-2005

or

in_date out_date
3/22/2005 8:27:15 PM 3/25/2005 8:53:13 AM
3/30/2005 1:16:00 PM 4/16/2005 1:40:44 PM

what then. Be able to clearly specify the "goal" or the "algorithm" usually leads us straight to the query itself. There are so many ambiguities here. Pretend you were actually documenting this for a junior programmer to program. Give them the specifications. In gory detail.

please don't just answer these two what thens -- think of all of the cases (cause I'll just keep on coming back with "what then" if you don't)

Remember -- I know NOTHING about your data, not a thing. This progression from

... I WANT TO GET WHEN IT FIRST WAS LOGED IN INDATE AND WHEN IT WAS
LAST LOGed IN OUT_DATE. SORT OF LIKE MIN AND MAX....

to this has been 'strange' to say the least.

Full explanation of requirements

A reader, August 09, 2005 - 3:10 pm UTC

Sorry for the misunderstanding Tom. Here is the full requirements. I hope I can explain it this time.

The report is a billing report and the it goes as follows:
For example for the month of March we have to bill as
in the following way:

out_date date_in Bill

2/23 3/2 3/1 to 3/2

3/1 3/3 3/1 to 3/3

3/1 4/14 3/1 to 3/31

3/1 - 3/1 to 3/31

2/23 - 3/1 to 3/31

August 09, 2005 - 3:38 pm UTC

well, i hope you give your programmers more detail.  Here is the best I'll do

ops$tkyte@ORA9IR1> select t.*,
  2         greatest( in_date, to_date('mar-2005','mon-yyyy') ) fixed_in_date,
  3         least( nvl(out_date,to_date('3000','yyyy')),  last_day( to_date( 'mar-2005', 'mon-yyyy' ) ) ) fixed_out_date
  4    from t
  5   where in_date < last_day( to_date( 'mar-2005', 'mon-yyyy' ) )+1
  6     and out_date >= to_date( 'mar-2005', 'mon-yyyy' );

 
IN_DATE   OUT_DATE  FIXED_IN_ FIXED_OUT
--------- --------- --------- ---------
03-MAR-05 09-MAR-05 03-MAR-05 09-MAR-05
11-MAR-05 15-MAR-05 11-MAR-05 15-MAR-05
22-MAR-05 29-MAR-05 22-MAR-05 29-MAR-05
22-MAR-05 30-MAR-05 22-MAR-05 30-MAR-05
30-MAR-05 16-APR-05 30-MAR-05 31-MAR-05
30-MAR-05 16-APR-05 30-MAR-05 31-MAR-05
 
6 rows selected.


predicate finds records that overlap march.

select adjusts the begin/end dates.

Thank!!!

A reader, August 10, 2005 - 12:00 pm UTC

Tom,

One more request. I would like to start the report with
the first time it went out. That is to say...

how it looks now with your help...

fix_in fix_out
3/22/2005 8:27:03 PM 3/29/2005 5:05:04 PM
3/30/2005 3:08:39 PM 3/31/2005

how the data looks

fix_in fix_out
3/22/2005 8:27:03 PM 3/29/2005 5:05:04 PM---first went out
3/30/2005 3:08:39 PM 4/16/2005 1:40:44 PM

How I would like to see it since we begin billing from
the first date the truck went out.

fix_in fix_out
3/29/2005 5:05:04 PM 3/30/2005 3:08:39 PM
3/30/2005 3:08:39 PM 3/31/2005

Thanks again Tom

August 10, 2005 - 1:05 pm UTC

try to work it out yourself -- please.

why? because I'll do this little thing and it'll be "oh yeah, one more thing, when the data looks like this...."

specifying requirements is like the most important thing in the world -- it is key, it is crucial. It is obivous you know what you want (well, maybe -- it seems to change over time) but I don't "get it" myself. Your simple example here with two rows begs so so many questions, I don't even want to get started.

You have lag() and lead() at your disposal, the probably come into play here. check them out.

Thanks for help !

A reader, August 11, 2005 - 3:25 pm UTC

The report is kind of tricky. Specially when one of the dates originates in Feb. and the other pair falls in march.

Hooked on Analytics worked for me!!

Greg, August 22, 2005 - 11:15 am UTC

I think I need to find a meeting group to help with my addiction ... I think I'm addicted to analytics .. :\

Finally got a chance to read chapter 12 in "Expert Oracle" ... awesome!! 4 big, hairy Thumbs up!! heh

But I got a question ... an "odd" behaviour that I don't understand ... was wondering if you could help explain:

Test Script:
================
drop table junk2;
drop sequence seq_junk2;

create sequence seq_junk2;

create table junk2
(inv_num number,
cli_num number,
user_id number)
/

insert into junk2
values ( 123, 456, null );
insert into junk2
values ( 123, 678, null );
insert into junk2
values ( 234, 456, null );
insert into junk2
values ( 234, 678, null );

commit;

break on cli_num skip 1

select * from junk2;

select inv_num, cli_num,
NVL ( user_id, 999 ) chk1,
NVL2 ( user_id, 'NOT NULL', 'NULL' ) chk2,
seq_junk2.nextval seq,
FIRST_VALUE ( NVL ( user_id, seq_junk2.nextval ) )
OVER ( PARTITION BY cli_num ) user_id
from junk2
/
=====================

The final query shows this:

INV_NUM CLI_NUM CHK1 CHK2 SEQ USER_ID
---------- ---------- ---------- -------- ---------- ----------
123 456 999 NULL 1
234 999 NULL 2

123 678 999 NULL 3 2
234 999 NULL 4 2

4 rows selected.

and I'm kinda confused .. it appears that the analytic functions are not "processing" that sequence ... how do sequences and analytics work together?? (if at all??)

(In short, this is a simplified example of a bigger problem I tripped over. I'm trying to assign new user_ids for existing clients, but only want 1 user_id assigned per client. Trick is, each client can be associated with more 1 investment ... so I have multiple rows with same client, but I want the same user_id assigned. kind of: "Has this client got an id yet? if not, give him a new one, otherwise display the one he's already been assigned".)

FIRST_VALUE and LAST_VALUE seemed the logical choice ...
The interesting thing is, when I use DBMS_RANDOM.VALUE (to assign a random PIN to start with) ... it works fine, what am I missing/forgetting about sequences that changes their behaviour in this regards?)

August 23, 2005 - 8:56 am UTC

that will be a tricky one, lots of assumptions on orders of rows processed and such.

that should throw an ora-2287 in my opinion.

I cannot see a safe way to do that without writing a plsql function and performing a lookup off to the side by cli_num

Sorry, I don't understand ...

Greg, August 23, 2005 - 11:36 am UTC

you wrote:
"that will be a tricky one, lots of assumptions on orders of rows processed and such."

I don't understand what assumptions I'm making ... in my example, I just got 4 rows, I don't care what order they come back in, just so long as it deals with them in "groups of cli_nums" .. (hence the partition by cli_num portion) ... if I "lose" sequence numbers, that's fine, too ... I don't care about gaps in the sequence or "missing userids" ...

The only behaviour I'm seeing, is that the analytic function doesn't seem to be working with the sequence properly ...

I guess I can simplify the question even further:

Why does the following query return "NULL" ?

SQL > select first_value ( seq_junk2.nextval ) over ( )
2 from dual
3 /
------more------

FIRST_VALUE(SEQ_JUNK2.NEXTVAL)OVER()
------------------------------------

1 row selected.

(with a "normal" sequence - nothing fancy):

SQL > select seq_junk2.nextval from dual;
------more------

NEXTVAL
----------
29

1 row selected.

August 24, 2005 - 8:35 am UTC

as i said, i believe it should be raising an error (I have it on my list of things to file when I get back in town).

I cannot make it work, I cannot think of a way to do it in a single statement, short of writing a user defined function.

Connect by with self referenced parent

Joe, August 23, 2005 - 12:30 pm UTC

CONNECT BY works great but I've run into a problem when the ultimate parent is referenced in the parent record.  e.g., date looks like:
SQL> select * from t;

    OBJ_ID  PARENT_ID
---------- ----------
         1          1
         2          1
         3          1
         4          2
         5          4

But... using connect by generates an error..

SQL> select lpad(' ', 2*(level-1)) ||level "LEVEL",t.obj_id, t.parent_id
  2  from t
  3  connect by t.parent_id = prior t.obj_id;
ERROR:
ORA-01436: CONNECT BY loop in user data

If parent_id is null where obj_id = 1, then it's okay.  Any suggestion on how to handle the other case?  I'm stumped.

Solution for connect by

Logan Palanisamy, August 23, 2005 - 5:39 pm UTC

SQL> select lpad(' ', 2*(level-1)) ||level "LEVEL",t.obj_id, t.parent_id
  2  from t
  3  connect by t.parent_id = prior t.obj_id and t.parent_id <> t.obj_id;

LEVEL                    OBJ_ID  PARENT_ID
-------------------- ---------- ----------
1                             1          1
  2                           2          1
    3                         4          2
      4                       5          4
  2                           3          1
1                             2          1
  2                           4          2
    3                         5          4
1                             3          1
1                             4          2
  2                           5          4
1                             5          4

12 rows selected.

re:Solution for connect by

Joe, August 24, 2005 - 8:43 am UTC

Thanks Logan. Often the solution is so simple! Thanks.

Seq problem

Bob B, August 24, 2005 - 11:25 am UTC

SELECT
A.*,
seq_junk2.currval CURR_SEQ,
seq_junk2.nextval - ROWNUM + VAL SEQ
FROM (
SELECT
inv_num,
cli_num,
NVL ( user_id, 999 ) chk1,
NVL2 ( user_id, 'NOT NULL', 'NULL' ) chk2,
DENSE_RANK() OVER ( ORDER BY CLI_NUM ) VAL
FROM JUNK2
) A

Might be a starting point. It works on the following ASSUMPTION: ROWNUM corresponds to the number of times the sequence has been called. As Tom stated, this assumption can easily go out the window (throw an analytic function or an order by on the outer query for a simple example).

A safer solution might be to run two updates. Update 1 will give a unique id to each null user id. Update 2 will update the user id to the min or max user id for that cli_num. A little overhead, but safer and simpler than the aforementioned alternative.

Still confused ... but working on it ...

Greg, August 24, 2005 - 1:42 pm UTC

Thanks, Bob!! Yeah, that does exactly what I wanted it to do, (but still doesn't really explain the "why" part) ...

problem is, it looks like this is more a question on sequences now than analytics, so I'll see if I can find a more appropriate thread to continue this on ..

Thanks!!

A slight twist on lag/lead

Sudha Bhagavatula, September 01, 2005 - 11:08 am UTC

That was useful to me. Could do a lot of queries easily. However I'm stuck at this point.

I have data like this:

subr_id dep_nbr grp eff_date term_date
1001 001 2112 01/01/2000 12/31/2000
1001 001 2112 01/01/2001 06/30/2001
1001 001 2112 07/01/2001 12/31/2001
1001 001 7552 01/01/2003 12/31/2003
1001 001 2112 06/30/2004 12/31/9999

I want my output to look like this:

subr_id dep_nbr grp eff_date term_date
1001 001 2112 01/01/2000 12/31/2001
1001 001 7552 01/01/2003 12/31/2003
1001 001 2112 06/30/2004 12/31/9999

How do I achieve this ?

September 01, 2005 - 3:49 pm UTC

well, you should start by describing the logic from getting from A to B first.

otherwise it is just text. what are the rules that got you from inputs to outputs.

tell me the procedural algorithm you would use for example.

Rules from A to B

Sudha Bhagavatula, September 02, 2005 - 9:29 am UTC

A member is enrolled in a group for a timeframe. For all contiguous time frames for a group I can take the min(eff_date) and max(term_date). For each break in group a new row with min(eff_date) and max(term_date) again. So say a member was enrolled in a group from 01/01/2001 to 12/31/2001 and then again with the same group from 01/01/2005 to 06/30/2005 then I need 2 rows for this member
with the dates as said just now. This is the sql that I'm running, hopefully I'm on the right track but am stuck at this point:

SELECT SUBR_ID,
DEP_NBR,
GRP,
LAG_EFF_DATE,
LEAD_EFF_DATE,
EFF_DATE,
TERM_DATE,
LAG_TERM_DATE,
LEAD_TERM_DATE,
DECODE( LEAD_GRP, GRP, 1, 0 ) FIRST_OF_SET,
DECODE( LAG_GRP, GRP, 1, 0 ) LAST_OF_SET
FROM (SELECT M.SUBR_ID,
M.DEP_NBR,
LAG(GRP_NBR||SUB_GRP) OVER (PARTITION BY M.SUBR_ID, M.DEP_NBR ORDER BY CJ.EFF_DATE) LAG_GRP,
LEAD(GRP_NBR||SUB_GRP) OVER (PARTITION BY M.SUBR_ID, M.DEP_NBR ORDER BY CJ.EFF_DATE) LEAD_GRP,
GRP_NBR||SUB_GRP GRP,
CJ.EFF_DATE,
CJ.TERM_DATE,
LAG(CJ.EFF_DATE) OVER (PARTITION BY M.SUBR_ID, M.DEP_NBR ORDER BY CJ.EFF_DATE) LAG_EFF_DATE,
LEAD(CJ.EFF_DATE) OVER (PARTITION BY M.SUBR_ID, M.DEP_NBR ORDER BY CJ.EFF_DATE) LEAD_EFF_DATE,
LAG(CJ.TERM_DATE) OVER (PARTITION BY M.SUBR_ID, M.DEP_NBR ORDER BY CJ.EFF_DATE) LAG_TERM_DATE,
LEAD(CJ.TERM_DATE) OVER (PARTITION BY M.SUBR_ID, M.DEP_NBR ORDER BY CJ.EFF_DATE) LEAD_TERM_DATE
FROM DW.T_MEMBER_GROUP_JUNCTION CJ,
BCBS.T_GROUP_DIMENSION G,
BCBS.T_MEMBER_DIMENSION M
WHERE CJ.GRP_DIM_ID = G.GRP_DIM_ID
AND CJ.MBR_DIM_ID = M.MBR_DIM_ID
AND M.DEP_NBR != '000'
AND G.BENE_PKG IS NOT NULL)
WHERE LAG_GRP IS NULL
OR LEAD_GRP IS NULL
OR LEAD_GRP <> GRP
OR LAG_GRP <> GRP

Thanks for your reply.

September 03, 2005 - 7:15 am UTC

you know, without a table, rows and something more concrete.... I have no comment.

More detail

Sudha Bhagavatula, September 04, 2005 - 10:34 pm UTC

have 3 tables:

Member_dimension
Group_Dimension
Member_Group_Junction

Member_Dimension :- columns are mbr_dim_id, subr_id, dep_nbr

Group dimension :- columns are grp_dim_id, grp_nbr, sub_grp

Member_Group_Junction :- columns are mbr_dim_id, grp_dim_id, eff_date, term_date

I have to create one row for each contiguous dates of enrollment with a new row for a new group or a break in date.

Suppose a member (subr_id = 1001, dep_nbr = 001) is enrolled with a group called 001 from 01/01/2001 till 06/30/2001, he then changes group to 002 for the period 07/01/2001 till 12/31/2001. He enrolls with the same group 002 from 01/01/2002 till 06/30/2002 with a change in benefits. He then gets transferred to some other city or changes jobs. He joins back with the group 001 from 09/30/2003 till 11/30/2003 and quits again. joins back with the same group 001 from 01/01/2204 till present.The data in the junction table will be like this:

mbr_dim_id grp_dim_id eff_date term_date

1 1 01/01/2001 06/30/2001
1 2 07/01/2001 12/31/2001
1 2 01/01/2002 06/30/2002
1 1 09/30/2003 11/30/2003
1 1 01/01/2004 12/31/9999

My output should be like this:

mbr_dim_id grp_dim_id eff_date term_date

1 1 01/01/2001 06/30/2001
1 2 07/01/2001 06/30/2002
1 1 09/30/2003 11/30/2003
1 1 01/01/2004 12/31/9999

For each change in group or a break in the contiguity of the dates I should get a new row. The junction table is joined to the dimension with the respective dim_ids.

Hope I'm clearer this time.

Thanks
Sudha

September 05, 2005 - 10:11 am UTC

tell you what, see
</code> https://www.oracle.com/technetwork/issue-archive/2014/14-mar/o24asktom-2147206.html <code>

it shows a technique in the analytics to the rescue article that will be useful for grouping ranges a records using the LAG() function.

But, you need to read the text that you are supposed to read before putting an example here.

It is something I think I say a lot.

<quote>
If your followup requires a response that might include a query, you had better supply very very simple create tables and insert statements. I cannot create a table and populate it for each and every question. The SMALLEST create table possible (no tablespaces, no schema names, just like I do in my examples for you)
</quote>

that is a direct cut and paste

distinct last_value

Putchi, September 06, 2005 - 4:49 am UTC

When using last_value I am usually only intrested in the last value, hence I need a distinct in the select to get it. It gives what I want but it seems that the database hase to do the work twice, first a window sort and after that a unique sort. Is there any way to avoid the distinct but still only get one row per partion key?

create table a (num number(2), var1 varchar2(10), var2 varchar2(10));

insert into a values (1,'a','A');
insert into a values (2,'b','A');
insert into a values (3,'c','A');
insert into a values (1,'a','B');
insert into a values (2,'b','B');
insert into a values (3,'c','B');

commit;

SQL> select distinct
  2         var2
  3        ,last_value(var1) over (partition by var2 order by num
  4                                rows between unbounded preceding and unbounded following) var1
  5  from a;

VAR2       VAR1
---------- ----------
A          c
B          c


Körschema
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   SORT (UNIQUE)
   2    1     WINDOW (SORT)
   3    2       TABLE ACCESS (FULL) OF 'A'

September 06, 2005 - 8:31 am UTC

nope, analytics are not aggregates, aggregates are not analytics. 


A trick you can use to skip one or the other step is:


ops$tkyte@ORA817DEV> select var2,
  2         substr( max(to_char( num,'fm0000000000') || var1), 11 ) data
  3    from a
  4   group by var2
  5  /

VAR2       DATA
---------- -----------
A          c
B          c

Analytics to the rescue

Sudha Bhagavatula, September 06, 2005 - 11:28 am UTC

Read that article. Helped me, but now I have another twist.

Create table contracts (subr_id varchar2(15), dep_nbr varchar2(3), grp_nbr varchar2(12), eff_date date, term_date date)

insert into contracts values ('1001', '001', '2112', to_date('01/01/2000','mm/dd/yyyy'), to_date('12/31/2000','mm/dd/yyyy'));
insert into contracts values ('1001', '001', '2112', to_date('01/01/2001','mm/dd/yyyy'), to_date('06/30/2001','mm/dd/yyyy'));
insert into contracts values ('1001', '001', '2112', to_date('07/01/2001','mm/dd/yyyy'), to_date('12/31/2001','mm/dd/yyyy'));
insert into contracts values ('1001', '001', '7552', to_date('01/01/2003','mm/dd/yyyy'), to_date('12/31/2003','mm/dd/yyyy'));
insert into contracts values ('1001', '001', '2112', to_date('01/01/2004','mm/dd/yyyy'), to_date('12/31/9999','mm/dd/yyyy'));

I ran this query to identify breaks in groups and dates for the above table:

select subr_id, dep_nbr, grp,
min_eff_date,
max_term_date
from
(select subr_id, dep_nbr, grp,
min(eff_date) min_eff_date,
max(term_date) max_term_date
from
(select subr_id, dep_nbr, eff_date, term_date, grp,
max(rn)
over(partition by subr_id, dep_nbr order by eff_date) max_rn
from
(select subr_id, dep_nbr, eff_date, term_date, grp,
(case
when eff_date-lag_term_date > 1
or lag_term_date is null
or lag_grp_nbr is null
or lag_grp_nbr <> grp
then row_num
end) rn
from (
select subr_id, dep_nbr, eff_date, term_date, grp_nbr grp,
lag(term_date)
over (partition by subr_id, dep_nbr order by eff_date) lag_term_date,
lag(grp_nbr||sub_grp)
over (partition by subr_id, dep_nbr order by eff_date) lag_grp_nbr,
row_number()
over (partition by subr_id, dep_nbr order by eff_date) row_num
from contracts )))
group by subr_id, dep_nbr, grp, max_rn )
order by subr_id, dep_nbr, min_eff_date

This gave me the output as :

subr_id dep_nbr grp eff_date term_date
1001 001 2112 01/01/2000 12/31/2001
1001 001 7552 01/01/2003 12/31/2003
1001 001 2112 06/30/2004 12/31/9999

I now have another table :

create table contract_pcp_junction (subr_id varchar2(15), dep_nbr varchar2(3), pcp_id varchar2(12), eff_date date, term_date date)

insert into contract_pcp_junction values('1001','001','123765', to_date('07/01/2000','mm/dd/yyyy') to_date('06/30/2001','mm/dd/yyyy');
insert into contract_pcp_junction values('1001','001','155165', to_date('01/01/2003','mm/dd/yyyy') to_date('12/31/9999','mm/dd/yyyy');

This table identifies the provider coverage for each member. I need to identify the breaks in coverage with regards to the contracts.

Now as per the data above this member does not have a pcp from 01/01/2000 to 06/30/2000 and again from 07/01/2001 to 12/31/2001.

I need to insert the breaks into another table. This table needs to have the subr_id, dep_nbr, grp and eff_date, term_date.

create table contract_pcp_breaks (subr_id varchar2(15), dep_nbr varchar2(3), grp_nbr varchar2(12), eff_date date, term_date date)

This table needs to have the data for the breaks

subr_id dep_nbr grp_nbr eff_date term_date

1001 001 2112 01/01/2000 06/30/2000
1001 001 2112 07/01/2001 12/31/2001

How do I do that and hopefully I have the necessary scripts for you to work w1th.

Thanks a lot for your patience with this.

--Sudha

September 06, 2005 - 8:51 pm UTC

yah, I have scripts, but no real idea how these tables relate. Your query looks overly complex for the single table.

cannot you take your data, join it, get some "flat relation" that just simply using lag() on will solve the problem?

(please remember, you have been looking at this for hours. To you this data is natural. to everyone else, it is just bits and bytes on the screen)

Combining two tables

Putchi, September 09, 2005 - 6:39 am UTC

Hi Tom!

I want to combine from/to history values from two tables into one sequence like this:

create table a (a varchar2(2)
,from_date date
,to_date date);

create table b (b varchar2(2)
,from_date date
,to_date date);

insert into a ( a, from_date, to_date ) values (
'a1', to_date( '01/13/2005', 'mm/dd/yyyy'), to_date('02/10/2005', 'mm/dd/yyyy'));
insert into a ( a, from_date, to_date ) values (
'a2', to_date( '02/10/2005', 'mm/dd/yyyy'), to_date( '05/01/2005', 'mm/dd/yyyy'));
insert into a ( a, from_date, to_date ) values (
'a3', to_date( '05/01/2005', 'mm/dd/yyyy'), to_date( '08/12/2005', 'mm/dd/yyyy'));
insert into b ( b, from_date, to_date ) values (
'b1', to_date( '01/13/2005', 'mm/dd/yyyy'), to_date( '01/22/2005', 'mm/dd/yyyy'));
insert into b ( b, from_date, to_date ) values (
'b2', to_date( '01/22/2005', 'mm/dd/yyyy'), to_date( '04/01/2005', 'mm/dd/yyyy'));
insert into b ( b, from_date, to_date ) values (
'b3', to_date( '04/01/2005', 'mm/dd/yyyy'), to_date( '09/07/2005', 'mm/dd/yyyy'));
commit;

select * from ("Magic");

A B FROM_DATE TO_DATE
-- -- ---------- ----------
a1 b1 2005-01-13 2005-01-22
a1 b2 2005-01-22 2005-02-10
a2 b2 2005-02-10 2005-04-01
a2 b3 2005-04-01 2005-05-01
a3 b3 2005-05-01 2005-08-12

Is it possible?

September 09, 2005 - 8:30 am UTC

ops$tkyte@ORA10G> select a.* , b.*,
  2         greatest(a.from_date,b.from_date),
  3             least(a.to_date,b.to_date)
  4    from a, b
  5   where a.from_date <=  b.to_date
  6     and a.to_date >= b.from_date;
 
A  FROM_DATE TO_DATE   B  FROM_DATE TO_DATE   GREATEST( LEAST(A.T
-- --------- --------- -- --------- --------- --------- ---------
a1 13-JAN-05 10-FEB-05 b1 13-JAN-05 22-JAN-05 13-JAN-05 22-JAN-05
a1 13-JAN-05 10-FEB-05 b2 22-JAN-05 01-APR-05 22-JAN-05 10-FEB-05
a2 10-FEB-05 01-MAY-05 b2 22-JAN-05 01-APR-05 10-FEB-05 01-APR-05
a2 10-FEB-05 01-MAY-05 b3 01-APR-05 07-SEP-05 01-APR-05 01-MAY-05
a3 01-MAY-05 12-AUG-05 b3 01-APR-05 07-SEP-05 01-MAY-05 12-AUG-05


It won't be blindingly fast on huge things I would guess...

Putchi, September 09, 2005 - 9:14 am UTC

OK, I will try if it works, the real tables will have hundred of thousands records. I tried this myself, but I couldn't come up with something that filled in the "null" values.

SQL> select a,b,from_date,lead(from_date) over (order by from_date)
  2  from (
  3  select a,null b,from_date,to_date from a
  4  union all
  5  select null a,b,from_date,to_date from b
  6  order by from_date
  7  );

A  B  FROM_DATE  LEAD(FROM_
-- -- ---------- ----------
a1    2005-01-13 2005-01-13
   b1 2005-01-13 2005-01-22
   b2 2005-01-22 2005-02-10
a2    2005-02-10 2005-04-01
   b3 2005-04-01 2005-05-01
a3    2005-05-01

September 09, 2005 - 9:36 am UTC

that query won't work -- you need to join.

How to get the 1ST row of this distinct value in a single SELECT

Sean Chang, September 16, 2005 - 11:48 am UTC

Thank you, Tom.

I have read the analytic function for a while, but still
can't figure out a way to select the first row of a distinct
column value in a single SELECT statement. I.E
>>by running below Create and Insert

create table INV (
inv# number(7),
add_time date ,
inv_type varchar2(10),
amount number(8,2));

insert into inv values(1, sysdate-1, 'CASH', 100);
insert into inv values(1, sysdate, 'VISA', 200);
insert into inv values(1, sysdate+1, 'COD', 100);
insert into inv values(1, sysdate, 'VISA', 200);
insert into inv values(2, sysdate, 'MC', 10);
insert into inv values(3, sysdate-1, 'AMEX', 30);
insert into inv values(3, sysdate, 'CASH', 30);

I can get the first row of distinct INV# this way:

select * from (select a.*,
rank() over (partition by inv# order by add_time) time_order
from inv a) where time_order=1;

But how can I acheive this by a single SELECT statement?
The reason is that we have lots of tables we only need
look the very first row of the same Column value and I
don't want endup with lots of in-line views in SELECT
statement.

September 16, 2005 - 1:59 pm UTC

that is a single select.

why not? (on the lots of in-line views). If you think they are evil - then you wouldn't like my code ;)

Is analytical fitting in this situtation?

A reader, October 03, 2005 - 10:29 am UTC

select b.damage_inspection_date,
b.damage_inspection_by
,b.status
,NVL(a.cnt,0) CNT

from
(select aa.damage_inspection_date,
aa.damage_inspection_by,
bb.status
from (select distinct trunc(gc.damage_inspection_date) damage_inspection_date, gc.damage_inspection_by
from gate_damages gd, gate_containers gc
where gd.gate_id = gc.gate_id
) aa,
(select *
from (select 'MAJOR' STATUS from dual
union all
select 'MINOR' STATUS from dual
union all
select 'TOTAL' STATUS from dual
)
) bb
)b,

((SELECT damage_inspection_date,
damage_inspection_by,
Status,
cnt
FROM (select trunc(c.damage_inspection_date) damage_inspection_date,
c.damage_inspection_by,
'MAJOR' STATUS,
count(distinct c.gate_id) cnt
from gate_containers c,
gate_damages d
where c.gate_id = d.gate_id and
d.damage_type_code = 'A'
group by trunc(c.damage_inspection_date),c.damage_inspection_by

UNION ALL

select trunc(g.damage_inspection_date) damage_inspection_date,
g.damage_inspection_by,
'MINOR' STATUS,
count(distinct g.gate_id) cnt
from gate_containers g,
gate_damages z
where g.gate_id = z.gate_id and
z.damage_type_code = 'F'
group by trunc(g.damage_inspection_date),g.damage_inspection_by

UNION ALL

select trunc(ab.damage_inspection_date) damage_inspection_date,
ab.damage_inspection_by,
'TOTAL' STATUS,
count(distinct ab.gate_id) cnt
from gate_containers ab,
gate_damages ac
where ab.gate_id = ac.gate_id
group by trunc(ab.damage_inspection_date),ab.damage_inspection_by

)
group by damage_inspection_date, damage_inspection_by, status, cnt
)
) a

where b.damage_inspection_by = a.damage_inspection_by(+)
and b.damage_inspection_date = a.damage_inspection_date(+)
and b.status = a.status(+);

October 03, 2005 - 11:29 am UTC

((SELECT damage_inspection_date,
damage_inspection_by,
Status,
cnt
FROM (select trunc(c.damage_inspection_date) damage_inspection_date,
c.damage_inspection_by,
'MAJOR' STATUS,
count(distinct c.gate_id) cnt
from gate_containers c,
gate_damages d
where c.gate_id = d.gate_id and
d.damage_type_code = 'A'
group by trunc(c.damage_inspection_date),c.damage_inspection_by

UNION ALL

select trunc(g.damage_inspection_date) damage_inspection_date,
g.damage_inspection_by,
'MINOR' STATUS,
count(distinct g.gate_id) cnt
from gate_containers g,
gate_damages z
where g.gate_id = z.gate_id and
z.damage_type_code = 'F'
group by trunc(g.damage_inspection_date),g.damage_inspection_by

UNION ALL

select trunc(ab.damage_inspection_date) damage_inspection_date,
ab.damage_inspection_by,
'TOTAL' STATUS,
count(distinct ab.gate_id) cnt
from gate_containers ab,
gate_damages ac
where ab.gate_id = ac.gate_id
group by trunc(ab.damage_inspection_date),ab.damage_inspection_by

)

should be a single query without union's - you don't need to make three passes on that data

select ..., count(distinct case when damage_code = 'A' then gate_id),
count(distinct case when damage_code = 'F' then gate_id end),
count(distinct gate_id)

Great!

A reader, October 03, 2005 - 4:05 pm UTC

Tom,

When I put the changes. It saying "missing keyword" What am I doing wrong?

select b.damage_inspection_date,
b.damage_inspection_by
,b.status
,NVL(a.cnt,0) CNT
from
(select aa.damage_inspection_date,
aa.damage_inspection_by,
bb.status
from (select distinct trunc(gc.damage_inspection_date)
damage_inspection_date, gc.damage_inspection_by
from gate_damages gd, gate_containers gc
where gd.gate_id = gc.gate_id
) aa,
(select *
from (select 'MAJOR' STATUS from dual
union all
select 'MINOR' STATUS from dual
union all
select 'TOTAL' STATUS from dual
)
) bb
)b,
((SELECT damage_inspection_date,
damage_inspection_by,
Status,
count(distinct case when damage_code = 'A' then gate_id),
count(distinct case when damage_code = 'F' then gate_id end),
count(distinct gate_id))
from gate_containers ab,gate_damages ac
where ab.gate_id = ac.gate_id
group by trunc(ab.damage_inspection_date),ab.damage_inspection_by
)
where b.damage_inspection_by = a.damage_inspection_by(+)
and b.damage_inspection_date = a.damage_inspection_date(+)
and b.status = a.status(+);

October 03, 2005 - 8:57 pm UTC

sorry, I am not a sql compiler, I cannot reproduce since I don't have the tables or anything.

Case when ... then ... end

Greg, October 04, 2005 - 8:26 am UTC

Just lucked out an saw this:

"select ..., count(distinct case when damage_code = 'A' then gate_id),
count(distinct case when damage_code = 'F' then gate_id end),
count(distinct gate_id)"

Should be:

"select ..., count(distinct case when damage_code = 'A' then gate_id end),
count(distinct case when damage_code = 'F' then gate_id end),
count(distinct gate_id)"

Tom just missed the "end" for the case statement ... (I got lucky and spotted it .. heh)

October 04, 2005 - 4:25 pm UTC

(that is why i always ask for create tables and inserts - without them, it is not possible to test)

thanks!!

A reader, October 04, 2005 - 2:24 pm UTC

Well Taken

A reader, October 05, 2005 - 10:53 am UTC

Tom,

This is what I would like to see..

damage_inspection_date damage_inspection_by counts
xx/xx/xxxx Louis 2 minors
xx/xx/xxxx juan 1 major

thanks.

can analytics help me?

Susan, October 05, 2005 - 2:41 pm UTC

My result set be ordered by the sum of multiple columns with weight assigned to the columns. The SQL below works and gives me what I want, but maybe there is an analytical function solution? Thanks for all your help.

SELECT ename, job, sal, comm FROM scott.BONUS
ORDER BY DECODE(job, -2, 0, job)*100000+DECODE(sal, -2, 0, sal)*10000+DECODE(comm, -2,0,comm)*100 DESC

October 05, 2005 - 3:05 pm UTC

not in this case - you want to order by a simple function of attributes of a single row.

You don't need to look across rows - analytics look across rows.

Thanks Tom

Susan, October 05, 2005 - 3:58 pm UTC

Thanks for your reply. Do you agree with the DECODE approach or am I missing a more elegant solution?

October 05, 2005 - 8:23 pm UTC

the decode looks fine here - shorter than case but in this "case" just as easy to read.

Tom

A reader, October 05, 2005 - 4:25 pm UTC

Tom,

Can you please point in the right direction...

This is what I am getting with the following query...

damage_inspection_date damage_inspection_by status
6/12/2004 CCCT MAJOR
6/12/2004 CCCT MINOR
6/12/2004 CCCT TOTAL
6/12/2004 LOU MAJOR
6/12/2004 LOU MINOR

and this is what I would like to get....

damage_inspection_date damage_inspection_by status count
6/12/2004 CCCT MAJOR 2
6/12/2004 CCCT MINOR 2
6/12/2004 CCCT TOTAL 1

select b.damage_inspection_date,
b.damage_inspection_by
,b.status
from
(select aa.damage_inspection_date,
aa.damage_inspection_by,
bb.status
from (select distinct trunc(gc.damage_inspection_date) damage_inspection_date, gc.damage_inspection_by
from gate_damages gd, gate_containers gc
where gd.gate_id = gc.gate_id
) aa,
(select *
from (select 'MAJOR' STATUS from dual
union all
select 'MINOR' STATUS from dual
union all
select 'TOTAL' STATUS from dual
)
) bb
)b,
((SELECT ab.damage_inspection_date,
damage_inspection_by,
STATUS_CODE,
count(distinct case when ac.damage_location_code = 'A' then ab.gate_id end),
count(distinct case when ac.damage_location_code = 'F' then ab.gate_id end),
count(distinct ab.gate_id )
from gate_containers ab,gate_damages ac
where ab.gate_id = ac.gate_id
group by ab.damage_inspection_date,ab.damage_inspection_by,status_code, ab.gate_id))a
where b.damage_inspection_by = a.damage_inspection_by(+)
and b.damage_inspection_date = a.damage_inspection_date(+)
group by (b.damage_inspection_date, b.damage_inspection_by,b.status)

October 05, 2005 - 8:35 pm UTC

....
damage_inspection_date damage_inspection_by status
6/12/2004 CCCT MAJOR
6/12/2004 CCCT MINOR
6/12/2004 CCCT TOTAL
6/12/2004 LOU MAJOR
6/12/2004 LOU MINOR

and this is what I would like to get....

damage_inspection_date damage_inspection_by status count
6/12/2004 CCCT MAJOR 2
6/12/2004 CCCT MINOR 2
6/12/2004 CCCT TOTAL 1
...

by what "logic"? can you explain how you get from A to B?

follow up

A reader, October 06, 2005 - 9:48 am UTC

Tom,

I already got the first part done. All I need to show is to somehow have the count in another column, how many minor, major and total I have. Can that be possible?
Just maybe like in the second example.

October 06, 2005 - 11:54 am UTC

first part of WHAT?

more information

A reader, October 06, 2005 - 12:47 pm UTC

Sorry about the lack of information before.

Here I will try to do bettter. I am trying to
a query where I need to count the major, minor
and then get a total.

requirements:

1. if there is a container with majors and a minors total the
counts = major+ minor = total count

2. where container has minor and no major count the minor only.
count = minor

inspector major minor total

1 major, 0 minor , other 1 1

inspector
2 major , 1 minor , other 2 1 3

inspector

0 major, 1 minor, other 0 1 1

October 06, 2005 - 1:25 pm UTC

sorry -- going back to your original example, I still cannot see the logic behind "what I have" and "what I want" there.

I don't know what you mean by "i have the first part"

this what I have now

A reader, October 06, 2005 - 2:11 pm UTC

Tom,

This is my query and result...

select b.damage_inspection_date,
b.damage_inspection_by
,b.status
,NVL(a.cnt,0) CNT

from
(select aa.damage_inspection_date,
aa.damage_inspection_by,
bb.status
from (select distinct trunc(gc.damage_inspection_date) damage_inspection_date, gc.damage_inspection_by
from gate_damages gd, gate_containers gc
where gd.gate_id = gc.gate_id
) aa,
(select *
from (select 'MAJOR' STATUS from dual
union all
select 'MINOR' STATUS from dual
union all
select 'TOTAL' STATUS from dual
)
) bb
)b,

((SELECT damage_inspection_date,
damage_inspection_by,
Status,
cnt
FROM (select trunc(c.damage_inspection_date) damage_inspection_date,
c.damage_inspection_by,
'MAJOR' STATUS,
count(distinct c.gate_id) cnt
from gate_containers c,
gate_damages d
where c.gate_id = d.gate_id and
d.damage_type_code = 'F'
group by trunc(c.damage_inspection_date),c.damage_inspection_by

UNION ALL

select trunc(g.damage_inspection_date) damage_inspection_date,
g.damage_inspection_by,
'MINOR' STATUS,
count(distinct g.gate_id) cnt
from gate_containers g,
gate_damages z
where g.gate_id = z.gate_id and
z.damage_type_code = 'A'
group by trunc(g.damage_inspection_date),g.damage_inspection_by

UNION ALL

select trunc(ab.damage_inspection_date) damage_inspection_date,
ab.damage_inspection_by,
'TOTAL' STATUS,
count(distinct ab.gate_id) cnt
from gate_containers ab,
gate_damages ac
where ab.gate_id = ac.gate_id(+) and
SUBSTR(ab.action,2,1) != 'C'
group by trunc(ab.damage_inspection_date),ab.damage_inspection_by

)
group by damage_inspection_date, damage_inspection_by, status, cnt
)
) a

where b.damage_inspection_by = a.damage_inspection_by(+)
and b.damage_inspection_date = a.damage_inspection_date(+)
and b.status = a.status(+);

RESULT:

SQL Statement which produced this data:
select * from MAJOR_MINOR_COUNT_VIEW
where rownum < 10

6/12/2004 CCCT TOTAL 1
6/12/2004 CRAIG TOTAL 6
6/13/2004 CCCT TOTAL 5
6/14/2004 CCCT TOTAL 46
6/14/2004 FYFE TOTAL 30
6/14/2004 HALM TOTAL 38
6/14/2004 MUTH MAJOR 2
6/14/2004 MUTH MINOR 14
6/14/2004 MUTH TOTAL 40

AND I WOULD LIK TO HAVE LIKE AS
THE REQUIREMENTS ABOVE...HOPE THIS HELP.

October 06, 2005 - 2:57 pm UTC

take your query - call it Q

select inspector,
max(decode(status,'MINOR',cnt)) minor,
max(decode(status,'MAJOR',cnt)) major,
max(decode(status,'TOTAL',cnt)) total
from (Q)
group by inspector

Year to dt + month to date

reader, October 06, 2005 - 2:35 pm UTC

CREATE TABLE TEST (ID VARCHAR2(10),sale_dt DATE ,amount NUMBER(6,2) )

INSERT INTO TEST VALUES ('aa','14-OCT-2005',65.25);
INSERT INTO TEST VALUES ('aa','14-OCT-2005',56.25);
INSERT INTO TEST VALUES ('aa','15-SEP-2005',72.25);
INSERT INTO TEST VALUES ('aa','19-OCT-2005',43.25);
INSERT INTO TEST VALUES ('bb','14-SEP-2005',67.25);
INSERT INTO TEST VALUES ('bb','13-OCT-2005',235.25);
INSERT INTO TEST VALUES ('bb','15-OCT-2005',365.25);
INSERT INTO TEST VALUES ('bb','14-NOV-2005',465.25);
INSERT INTO TEST VALUES ('bb','14-SEP-2005',165.25);
commit;

SELECT DISTINCT id,sale_dt,SUM (amount)
OVER (PARTITION BY id ORDER BY sale_dt ASC) sale_daily,
SUM (amount)
OVER (PARTITION BY id, TO_CHAR(invoice_dt, 'MON-YYYY') ORDER BY TO_CHAR(sale_dt, 'MON-YYYY') ASC) mon_sal,
SUM (sale_price_usd * qty_sold)
OVER (PARTITION BY id, TO_CHAR(sale_dt, 'YYYY') ORDER BY TO_CHAR(sale_dt, 'YYYY') ASC) yr_sal,
FROM test

ID SALE_DT SALE_DAILY MON_SAL YR_SAL
---------- --------- ---------- ---------- ----------
aa 15-SEP-05 72.25 72.25 237
aa 14-OCT-05 121.5 164.75 237
aa 19-OCT-05 43.25 164.75 237
bb 14-SEP-05 232.5 232.5 1298.25
bb 13-OCT-05 235.25 600.5 1298.25
bb 15-OCT-05 365.25 600.5 1298.25
bb 14-NOV-05 465.25 465.25 1298.25

7 rows selected.

Ideally ,it should have been ----

ID SALE_DT SALE_DAILY MON_SAL YR_SAL
---------- --------- ---------- ---------- ----------
aa 15-SEP-05 72.25 72.25 72.25
aa 14-OCT-05 121.5 121.5 193.75
aa 19-OCT-05 43.25 164.75 237
bb 14-SEP-05 232.5 232.5 232.5
bb 13-OCT-05 235.25 235.25 467.5
bb 15-OCT-05 365.25 600.5 833.0
bb 14-NOV-05 465.25 465.25 1298.25

How can I do this ?

Will appreciate your help .

THANKS

October 06, 2005 - 3:02 pm UTC

ideally - there would be a qty_sold column somewhere :)

ideally you will ONLY use to_char to *format* data, never to process it.

trunc(invoice_dt,'y') NOT to_char(invoice_dt,'yyyy')
trunc(sale_dt,'mm') NOT to_char(sale_dt, 'MON-YYYY' )

Year to Date and Month to date

READER, October 06, 2005 - 10:04 pm UTC

As per your suggestion ,I made the changes but ...still need your help .

CREATE TABLE TEST (ID VARCHAR2(10),sale_dt DATE ,amount NUMBER(6,2) )

INSERT INTO TEST VALUES ('aa','14-OCT-2005',65.25);
INSERT INTO TEST VALUES ('aa','14-OCT-2005',56.25);
INSERT INTO TEST VALUES ('aa','15-SEP-2005',72.25);
INSERT INTO TEST VALUES ('aa','19-OCT-2005',43.25);
INSERT INTO TEST VALUES ('bb','14-SEP-2005',67.25);
INSERT INTO TEST VALUES ('bb','13-OCT-2005',235.25);
INSERT INTO TEST VALUES ('bb','15-OCT-2005',365.25);
INSERT INTO TEST VALUES ('bb','14-NOV-2005',465.25);
INSERT INTO TEST VALUES ('bb','14-SEP-2005',165.25);
commit;

SELECT DISTINCT id,sale_dt,SUM (amount)
OVER (PARTITION BY id ORDER BY sale_dt ASC) sale_daily,
SUM (amount)
OVER (PARTITION BY id,trunc(sale_dt,'MM') ORDER BY trunc(sale_dt,'MM') ASC) mon_sal,
SUM (amount)
OVER (PARTITION BY id,trunc(sale_dt,'Y') ORDER BY trunc(sale_dt,'Y') ASC) yr_sal
FROM test
ID SALE_DT SALE_DAILY MON_SAL YR_SAL
---------- --------------------- ---------- ---------- ----------
aa 9/15/2005 72.25 72.25 237
aa 10/14/2005 193.75 164.75 237
aa 10/19/2005 237 164.75 237
bb 9/14/2005 232.5 232.5 1298.25
bb 10/13/2005 467.75 600.5 1298.25
bb 10/15/2005 833 600.5 1298.25
bb 11/14/2005 1298.25 465.25 1298.25
7 rows selected

Ideally ,it should have been ----

ID SALE_DT SALE_DAILY MON_SAL YR_SAL
---------- --------- ---------- ---------- ----------
aa 15-SEP-05 72.25 72.25 72.25
aa 14-OCT-05 121.5 121.5 193.75
aa 19-OCT-05 43.25 164.75 237
bb 14-SEP-05 232.5 232.5 232.5
bb 13-OCT-05 235.25 235.25 467.5
bb 15-OCT-05 365.25 600.5 833.0
bb 14-NOV-05 465.25 465.25 1298.25

Thanks again .

October 07, 2005 - 8:13 am UTC

you shall have to explain how you derived your "optimal" output.

certainly isn't sorted by anything? I don't get the numbers.

Year to date /Month to date

Reader, October 07, 2005 - 9:49 am UTC

I wish to create a summary table where we will have sale for every day ,sale up to that day in that month and then upto that day in that year
ie running total or cummulative total

Thanks

October 07, 2005 - 8:22 pm UTC

ok?

Follo up

A reader, October 07, 2005 - 9:52 am UTC

Tom,

The above pivot worked well, however my count are off since
I ONLY want to count the minor when there is no Major.
Something like this..

major minor count

1 major, 0 minor , other 1 1
2 major , 1 minor , other 2 2
0 major, 1 minor, other 0 1 1

* count the minor when there is no major

CREATE TABLE GATE_CONTAINERS
(
GATE_ID NUMBER ,
VISIT NUMBER ,
REFERENCE_ID NUMBER ,
DAMAGE_INSPECTION_BY VARCHAR2(30),
DAMAGE_INSPECTION_DATE DATE,

)

Insert into GATE_TBL
(GATE_ID, VISIT)
Values
(1, 1);
Insert into GATE_TBL
(GATE_ID, VISIT)
Values
(17, 10);
Insert into GATE_TBL
(GATE_ID, VISIT)
Values
(21, 12);
Insert into GATE_TBL
(GATE_ID, VISIT)
Values
(31, 18);
Insert into GATE_TBL
(GATE_ID, VISIT)
Values
(33, 19);
Insert into GATE_TBL
(GATE_ID, VISIT, DAMAGE_INSPECTION_DATE, DAMAGE_INSPECTION_BY)
Values
(36, 22, TO_DATE('06/12/2004 11:48:49', 'MM/DD/YYYY HH24:MI:SS'), 'CRAIG');
Insert into GATE_TBL
(GATE_ID, VISIT, DAMAGE_INSPECTION_DATE, DAMAGE_INSPECTION_BY)
Values
(37, 23, TO_DATE('06/12/2004 11:50:11', 'MM/DD/YYYY HH24:MI:SS'), 'CRAIG');
Insert into GATE_TBL
(GATE_ID, VISIT, DAMAGE_INSPECTION_DATE, DAMAGE_INSPECTION_BY)
Values
(39, 25, TO_DATE('06/12/2004 11:48:19', 'MM/DD/YYYY HH24:MI:SS'), 'CRAIG');
Insert into GATE_TBL
(GATE_ID, VISIT)
Values
(45, 30);
COMMIT;

CREATE TABLE GATE_DAMAGES
(
GATE_ID NUMBER NOT NULL,
DAMAGE_LOCATION_CODE VARCHAR2(5 BYTE) NOT NULL,
DAMAGE_TYPE_CODE VARCHAR2(5 BYTE) NOT NULL
)

Insert into damages_tbl
(GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE)
Values
(34, '01', '9');
Insert into damages_tbl
(GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE)
Values
(34, '02', 'C');
Insert into damages_tbl
(GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE)
Values
(37, '01', 'B');
Insert into damages_tbl
(GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE)
Values
(62, '05', 'B');
Insert into damages_tbl
(GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE)
Values
(101, '23', 'C');
Insert into damages_tbl
(GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE)
Values
(183, '99', '9');
Insert into damages_tbl
(GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE)
Values
(188, '01', 'D');
Insert into damages_tbl
(GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE)
Values
(188, '04', 'B');
Insert into damages_tbl
(GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE)
Values
(188, '07', 'B');
COMMIT;

October 07, 2005 - 8:35 pm UTC

The above pivot worked well, however my count are off since
I ONLY want to count the minor when there is no Major.
Something like this..

major minor count

1 major, 0 minor , other 1 1
2 major , 1 minor , other 2 2
0 major, 1 minor, other 0 1 1

so tell me why there are minor counts when major > 0???

and this is my query

A reader, October 07, 2005 - 9:53 am UTC

select damage_inspection_date,damage_inspection_by,
max(decode(status,'MINOR',cnt)) minor,
max(decode(status,'MAJOR',cnt)) major,
max(decode(status,'TOTAL',cnt)) total
from (select b.damage_inspection_date,
b.damage_inspection_by
,b.status
,NVL(a.cnt,0) CNT
from
(select aa.damage_inspection_date,
aa.damage_inspection_by,
bb.status
from (select distinct trunc(gc.damage_inspection_date) damage_inspection_date, gc.damage_inspection_by
from gate_damages gd, gate_containers gc
where gd.gate_id = gc.gate_id
) aa,
(select *
from (select 'MAJOR' STATUS from dual
union all
select 'MINOR' STATUS from dual
union all
select 'TOTAL' STATUS from dual
)
) bb
)b,
((SELECT damage_inspection_date,
damage_inspection_by,
Status,
cnt
FROM (select trunc(c.damage_inspection_date) damage_inspection_date,
c.damage_inspection_by,
'MAJOR' STATUS,
count(distinct c.gate_id) cnt
from gate_containers c,
gate_damages d
where c.gate_id = d.gate_id and
d.damage_type_code = 'F'
group by trunc(c.damage_inspection_date),c.damage_inspection_by
UNION ALL
select trunc(g.damage_inspection_date) damage_inspection_date,
g.damage_inspection_by,
'MINOR' STATUS,
count(distinct g.gate_id) cnt
from gate_containers g,
gate_damages z
where g.gate_id = z.gate_id and
z.damage_type_code = 'A'
group by trunc(g.damage_inspection_date),g.damage_inspection_by
UNION ALL
select trunc(ab.damage_inspection_date) damage_inspection_date,
ab.damage_inspection_by,
'TOTAL' STATUS,
count(distinct ab.gate_id) cnt
from gate_containers ab,
gate_damages ac
where ab.gate_id = ac.gate_id(+) and
SUBSTR(ab.action,2,1) != 'C'
group by trunc(ab.damage_inspection_date),ab.damage_inspection_by
)
group by damage_inspection_date, damage_inspection_by, status, cnt
)
) a
where b.damage_inspection_by = a.damage_inspection_by(+)
and b.damage_inspection_date = a.damage_inspection_date(+)
and b.status = a.status(+)
)
group by damage_inspection_date,damage_inspection_by

I got it....

A reader, October 07, 2005 - 11:16 am UTC

Tom,

I got it....I just had to put the following. Let me know
what you think? If you have any suggestions!
Thanks for all your patient...

select trunc(g.damage_inspection_date) damage_inspection_date,
g.damage_inspection_by,z.damage_type_code,
( case z.damage_type_code
when 'F' then 0
ELSE Count(distinct g.gate_id)
end ) CNT
--- count(distinct g.gate_id) cnt
from gate_containers g,gate_damages z
where g.gate_id = z.gate_id
and z.damage_type_code = 'A'

Year to Date and Month to date

Tim, October 07, 2005 - 11:52 pm UTC

Just a guess - could this be what your looking for?

SELECT DISTINCT id,sale_dt
,SUM (amount) OVER
(PARTITION BY id, sale_dt ORDER BY id ASC, sale_dt ASC) sale_daily
,SUM (amount) OVER
(PARTITION BY id, TRUNC(sale_dt,'MM')
ORDER BY id ASC, sale_dt ASC
RANGE UNBOUNDED PRECEDING
) mon_sal
,SUM (amount) OVER
(PARTITION BY id, TRUNC(sale_dt,'Y')
ORDER BY id ASC, sale_dt ASC
RANGE UNBOUNDED PRECEDING
) yr_sal
FROM TEST
ORDER BY id, sale_dt

ID SALE_DT SALE_DAILY MON_SAL YR_SAL
---------- --------- ---------- ---------- ----------
aa 15-SEP-05 72.25 72.25 72.25
aa 14-OCT-05 121.5 121.5 193.75
aa 19-OCT-05 43.25 164.75 237
bb 14-SEP-05 232.5 232.5 232.5
bb 13-OCT-05 235.25 235.25 467.75
bb 15-OCT-05 365.25 600.5 833
bb 14-NOV-05 465.25 465.25 1298.25

-- another variation
SELECT a.*
,SUM (sale_daily) OVER
(PARTITION BY id, TRUNC(sale_dt,'MM')
ORDER BY id ASC, sale_dt ASC
RANGE UNBOUNDED PRECEDING
) mon_sal
,SUM (sale_daily) OVER
(PARTITION BY id, TRUNC(sale_dt,'Y')
ORDER BY id ASC, sale_dt ASC
RANGE UNBOUNDED PRECEDING
) yr_sal
FROM
(
SELECT id,sale_dt
,SUM(amount) sale_daily
FROM TEST
GROUP BY id, sale_dt
) a

EXCELLENT !

reader, October 08, 2005 - 12:24 am UTC

Thanks !

to answer your question

A reader, October 11, 2005 - 12:54 pm UTC

The above pivot worked well, however my count are off since
I ONLY want to count the minor when there is no Major.
Something like this..

major minor count

1 major, 0 minor , other 1 1
2 major , 1 minor , other 2 2
0 major, 1 minor, other 0 1 1

so tell me why there are minor counts when major > 0???

because when there are a major's and minor's I want to count
only the major's. When just the minor when there is no
major...just want to count the minor. those are the only 2 situation that there should be.

How can I ignore some selected columns in my group by?

Neil, October 12, 2005 - 3:14 am UTC

Tom,
I have a set of data that is recorded daily and I want to
compress it; so this:

87654321 1 5 21-AUG-2005 0 0 -4.186E+10 0 0 -4.186E+10
87654321 1 5 22-AUG-2005 0 0 -4.186E+10 0 0 -4.186E+10
87654321 1 5 23-AUG-2005 0 0 -4.186E+10 0 0 -4.186E+10
87654321 1 5 24-AUG-2005 0 0 -4.186E+10 0 0 -4.186E+10
87654321 1 5 25-AUG-2005 0 0 -4.186E+10 0 0 -4.186E+10
87654321 1 5 26-AUG-2005 0 0 -4.186E+10 0 0 -4.186E+10
87654321 1 5 27-AUG-2005 0 0 -4.186E+10 0 0 -4.186E+10
87654321 1 5 28-AUG-2005 0 0 -4.186E+10 0 0 -4.186E+10
87654321 1 5 29-AUG-2005 0 0 -4.186E+10 0 0 -4.186E+10
87654321 1 5 30-AUG-2005 2.7500E+10 0 -1.436E+10 2.7500E+10 0 -1.436E+10
87654321 1 5 31-AUG-2005 0 0 -1.436E+10 0 0 -1.436E+10
87654321 1 5 01-SEP-2005 0 0 -1.436E+10 0 0 -1.436E+10
87654321 1 5 02-SEP-2005 0 0 -1.436E+10 0 0 -1.436E+10
87654321 1 5 03-SEP-2005 0 0 -1.436E+10 0 0 -1.436E+10
87654321 1 5 04-SEP-2005 0 0 -1.436E+10 0 0 -1.436E+10
87654321 1 5 05-SEP-2005 0 0 -1.436E+10 0 0 -1.436E+10
87654321 1 5 06-SEP-2005 0 0 -1.436E+10 0 0 -1.436E+10
87654321 1 5 07-SEP-2005 0 0 -1.436E+10 0 0 -1.436E+10
87654321 1 5 08-SEP-2005 0 0 -1.436E+10 0 0 -1.436E+10
87654321 1 5 09-SEP-2005 0 0 -1.436E+10 0 0 -1.436E+10
87654321 1 5 10-SEP-2005 0 0 -1.436E+10 0 0 -1.436E+10
87654321 1 5 11-SEP-2005 0 0 -1.436E+10 0 0 -1.436E+10
87654321 1 5 12-SEP-2005 0 0 -1.436E+10 0 0 -1.436E+10
87654321 1 5 13-SEP-2005 2.7500E+10 -3.306E+10 -1.991E+10 2.7500E+10 -3.306E+10 -1.991E+10
87654321 1 5 14-SEP-2005 0 0 -1.991E+10 0 0 -1.991E+10
87654321 1 5 15-SEP-2005 0 0 -1.991E+10 0 0 -1.991E+10
87654321 1 5 16-SEP-2005 0 0 -1.991E+10 0 0 -1.991E+10
87654321 1 5 17-SEP-2005 0 0 -1.991E+10 0 0 -1.991E+10
87654321 1 5 18-SEP-2005 0 0 -1.991E+10 0 0 -1.991E+10
87654321 1 5 19-SEP-2005 0 0 -1.991E+10 0 0 -1.991E+10
87654321 1 5 20-SEP-2005 5555550000 0 -1.436E+10 5555550000 0 -1.436E+10
87654321 1 5 21-SEP-2005 0 0 -1.436E+10 0 0 -1.436E+10
87654321 1 5 22-SEP-2005 0 0 -1.436E+10 0 0 -1.436E+10
87654321 1 5 23-SEP-2005 0 0 -1.436E+10 0 0 -1.436E+10
87654321 1 5 24-SEP-2005 0 0 -1.436E+10 0 0 -1.436E+10
87654321 1 5 25-SEP-2005 0 0 -1.436E+10 0 0 -1.436E+10
87654321 1 5 26-SEP-2005 0 0 -1.436E+10 0 0 -1.436E+10

Needs to be converted into this:

87654321 1 5 21-AUG-2005 29-AUG-2005 0 0 -4.186E+10 0 0 -4.186E+10
87654321 1 5 30-AUG-2005 12-SEP-2005 2.7500E+10 0 -1.436E+10 2.7500E+10 0 -1.436E+10
87654321 1 5 13-SEP-2005 19-SEP-2005 2.7500E+10 -3.306E+10 -1.991E+10 2.7500E+10 -3.306E+10 -1.991E+10
87654321 1 5 20-SEP-2005 01-JAN-4000 5555550000 0 -1.436E+10 5555550000 0 -1.436E+10

The column of interest is the 7th one. Whenever it changes,
I want to create a new row beginning with the day's date,
and ending either on the day before the next change, or, if
there is no next change (LEAD analytic function), substitute
in 01-JAN-4000 to show that this is the current amount.
The problem is, I need to ignore the other figures in columns 5 & 6 and 8 & 9. If I group by all the columns, I get separate
entries for these lines. That's my stumbling block - I have got close with analytics, but so far, no cigar!
I'm on 8.1.7, although I'd be interested in solutions possible in
later versions, too. If you think this is possible, I can paste a create table and SQL*Loader script here, but it would detract from the post: It's a bit of a mess anyway - if only AskTom allowed 132 columns :)
T.I.A

October 12, 2005 - 7:26 am UTC

"i want to create a new row" - that is hard as analytics don't "create rows", they just don't "squish them out" like an aggregate wou.d

make the example smaller - you don't need all of the columns, seems two or three might suffice. show the table, the data (via inserts) and the expected output if you like.

Maybe this should be a GROUP BY question, then

Neil, October 13, 2005 - 4:12 am UTC

OK - here's the table creation scripts and a couple of loader files.
My goal is to create a SQL statement to change the old data into the new.
I can use analytics to give me the start and end dates, but my problem is that I wish to ignore the actin, actout, expin and expout columns and concentrate on the act column. When it changes, I want to take the row, and give it an end date of the day before the date on which it changes again, or the default date of 01-JAN-4000 if no such row exists.
If I could just partition by the earliest date and the latest date where the act figure is the same within serial, volume and part, I could pick off the FIRST and the LAST and use LAG and LEAD to work out the dates...
CREATE TABLE t_old (
DEPOT VARCHAR2(6)
,SERIAL VARCHAR2(8)
,VOLUME NUMBER(4)
,PART NUMBER(2)
,ASAT DATE
,ACTIN NUMBER(8)
,ACTOUT NUMBER(8)
,ACT NUMBER(8)
,EXPIN NUMBER(8)
,EXPOUT NUMBER(8)
,EXPD NUMBER(8)
)
/

LOAD DATA
INFILE *
INTO TABLE t_old
TRUNCATE
FIELDS TERMINATED BY WHITESPACE
(DEPOT
,SERIAL
,VOLUME
,PART
,ASAT
,ACTIN
,ACTOUT
,ACT
,EXPIN
,EXPOUT
,EXPD)
BEGINDATA
DEPOT1 00822000 6086 5 24-SEP-2005 0 0 -1796200 0 0 -1796200
DEPOT1 00822000 6086 5 25-SEP-2005 0 0 -1796200 0 0 -1796200
DEPOT1 00822000 6086 5 26-SEP-2005 0 0 -1796200 0 0 -1796200
DEPOT1 08226111 1 5 29-AUG-2005 0 0 -4185550 0 0 -4185550
DEPOT1 08226111 1 5 30-AUG-2005 2750000 0 -1435550 2750000 0 -1435550
DEPOT1 08226111 1 5 31-AUG-2005 0 0 -1435550 0 0 -1435550
DEPOT1 08226111 1 5 01-SEP-2005 0 0 -1435550 0 0 -1435550
DEPOT1 08226111 1 5 02-SEP-2005 0 0 -1435550 0 0 -1435550
DEPOT1 08226111 1 5 03-SEP-2005 2750000 -3305555 -1991105 2750000 -3305555 -1991105
DEPOT1 08226111 1 5 04-SEP-2005 0 0 -1991105 0 0 -1991105
DEPOT1 08226111 1 5 05-SEP-2005 0 0 -1991105 0 0 -1991105
DEPOT1 08226111 1 5 06-SEP-2005 555555 0 -1435550 555555 0 -1435550
DEPOT1 08226111 1 5 07-SEP-2005 0 0 -1435550 0 0 -1435550
DEPOT1 08226111 1 5 08-SEP-2005 0 0 -1435550 0 0 -1435550
DEPOT1 08226111 420 5 11-SEP-2005 0 0 1150 0 0 1150
DEPOT1 08226111 420 5 12-SEP-2005 0 0 1150 0 0 1150
DEPOT1 08226111 420 5 13-SEP-2005 3329555 -2775150 555555 3329555 -2775150 555555
DEPOT1 08226111 420 5 14-SEP-2005 0 0 555555 0 0 555555
DEPOT1 08226111 420 5 15-SEP-2005 0 0 555555 0 0 555555
DEPOT1 08226111 420 5 16-SEP-2005 0 -555555 0 0 -555555 0
DEPOT1 08226111 420 5 17-SEP-2005 0 0 0 0 0 0
DEPOT1 08226111 495 5 18-SEP-2005 0 0 0 0 0 0
DEPOT1 08226111 495 5 19-SEP-2005 555555 0 555555 555555 0 555555
DEPOT1 08226111 495 5 20-SEP-2005 0 0 555555 0 0 555555
DEPOT1 08226111 495 5 21-SEP-2005 0 0 555555 0 0 555555
DEPOT1 08226111 495 5 22-SEP-2005 0 -555555 0 0 -555555 0
DEPOT1 08226111 495 5 23-SEP-2005 0 0 0 0 0 0
DEPOT1 08226111 664 5 28-AUG-2005 0 0 4228550 0 0 4228550
DEPOT1 08226111 664 5 29-AUG-2005 0 0 4228550 0 0 4228550
DEPOT1 08226111 664 5 30-AUG-2005 0 -2750000 1478550 0 -2750000 1478550
DEPOT1 08226111 664 5 31-AUG-2005 0 0 1478550 0 0 1478550
DEPOT1 08226111 664 5 01-SEP-2005 0 0 1478550 0 0 1478550

CREATE TABLE t_new (
DEPOT VARCHAR2(6)
,SERIAL VARCHAR2(8)
,VOLUME NUMBER(4)
,PART NUMBER(2)
,FROM_D DATE
,UNTIL_D DATE
,ACTIN NUMBER(8)
,ACTOUT NUMBER(8)
,ACT NUMBER(8)
,EXPIN NUMBER(8)
,EXPOUT NUMBER(8)
,EXPD NUMBER(8)
)
/

LOAD DATA
INFILE *
INTO TABLE t_new
TRUNCATE
FIELDS TERMINATED BY WHITESPACE
(DEPOT
,SERIAL
,VOLUME
,PART
,FROM_D
,UNTIL_D
,ACTIN
,ACTOUT
,ACT
,EXPIN
,EXPOUT
,EXPD)
BEGINDATA
DEPOT1 00822000 6086 5 24-SEP-2005 01-JAN-4000 0 0 -1796200 0 0 -1796200
DEPOT1 08226111 1 5 29-AUG-2005 29-AUG-2005 0 0 -4185550 0 0 -4185550
DEPOT1 08226111 1 5 30-AUG-2005 02-SEP-2005 2750000 0 -1435550 2750000 0 -1435550
DEPOT1 08226111 1 5 03-SEP-2005 05-SEP-2005 2750000 -3305555 -1991105 2750000 -3305555 -1991105
DEPOT1 08226111 1 5 06-SEP-2005 01-JAN-4000 555555 0 -1435550 555555 0 -1435550
DEPOT1 08226111 420 5 11-SEP-2005 12-SEP-2005 0 0 1150 0 0 1150
DEPOT1 08226111 420 5 13-SEP-2005 15-SEP-2005 3329555 -2775150 555555 3329555 -2775150 555555
DEPOT1 08226111 420 5 16-SEP-2005 18-SEP-2005 0 -555555 0 0 -555555 0
DEPOT1 08226111 495 5 19-SEP-2005 21-SEP-2005 555555 0 555555 555555 0 555555
DEPOT1 08226111 495 5 22-SEP-2005 01-JAN-4000 0 -555555 0 0 -555555 0
DEPOT1 08226111 664 5 28-AUG-2005 29-AUG-2005 0 0 4228550 0 0 4228550
DEPOT1 08226111 664 5 30-AUG-2005 01-JAN-4000 0 -2750000 1478550 0 -2750000 1478550

Some Help needed!

A reader, October 18, 2005 - 10:54 am UTC

Tom,

How can I count the double moves as 1. For example,
in the case of 1690371?

I want to count
1690371 63 A
1690371 63 X
1690371 64 A
1690371 64 L

I want to count "A" AS ONE MOVE using this query

select trunc(g.damage_inspection_date) damage_inspection_date,g.damage_inspection_by, 'MINOR' STATUS,
count(distinct g.gate_id) cnt
from gate_containers g,
gate_damages z
where g.gate_id = z.gate_id
and z.damage_type_code = 'A'
and g.DAMAGE_INSPECTION_BY = 'COLUMBO'
and trunc(G.damage_inspection_date) = to_date('06-14-2005','mm-dd-yyyy')
group by trunc(g.damage_inspection_date),g.damage_inspection_by

1690355 59 A
1690355 59 E

1690371 63 A
1690371 63 X
1690371 64 A
1690371 64 L

1690405 71 A
1690405 71 I

1690433 71 A
1690433 71 I

1690486 54 F
1690486 54 L
1690486 72 F
1690486 72 I

1690540 59 A
1690540 59 E

1690636 63 A
1690636 63 X

1690781 67 X

One solution

A reader, October 19, 2005 - 9:29 am UTC

Tom,

Can decode work here...

decode(count(distinct g.gate_id,'A','F',0,NULL)

October 19, 2005 - 9:45 am UTC

I didn't really understand the question right above, nor did I see any table creates or inserts, so I sort of ignored it...

More information

A reader, October 19, 2005 - 10:44 am UTC

create table gate_containers
(gate_id number,
action varchar2(5),
damage_inspection_date date,
damage_inspection_by varchar2(30))

Insert into GATE_CONTAINER
(GATE_ID, ACTION, DAMAGE_INSPECTION_DATE, DAMAGE_INSPECTION_BY)
Values
(1686439, 'RNC', TO_DATE('06/14/2005 11:16:16', 'MM/DD/YYYY HH24:MI:SS'), 'COLUMBO');
Insert into GATE_CONTAINER
(GATE_ID, ACTION, DAMAGE_INSPECTION_DATE, DAMAGE_INSPECTION_BY)
Values
(1688372, 'RNC', TO_DATE('06/14/2005 13:26:59', 'MM/DD/YYYY HH24:MI:SS'), 'COLUMBO');
Insert into GATE_CONTAINER
(GATE_ID, ACTION, DAMAGE_INSPECTION_DATE, DAMAGE_INSPECTION_BY)
Values
(1688374, 'RNC', TO_DATE('06/14/2005 13:27:08', 'MM/DD/YYYY HH24:MI:SS'), 'COLUMBO');
Insert into GATE_CONTAINER
(GATE_ID, ACTION, DAMAGE_INSPECTION_DATE, DAMAGE_INSPECTION_BY)
Values
(1688235, 'RNC', TO_DATE('06/14/2005 13:18:15', 'MM/DD/YYYY HH24:MI:SS'), 'COLUMBO');
Insert into GATE_CONTAINER
(GATE_ID, ACTION, DAMAGE_INSPECTION_DATE, DAMAGE_INSPECTION_BY)
Values
(1688609, 'RNC', TO_DATE('06/14/2005 13:43:35', 'MM/DD/YYYY HH24:MI:SS'), 'COLUMBO');
Insert into GATE_CONTAINER
(GATE_ID, ACTION, DAMAGE_INSPECTION_DATE, DAMAGE_INSPECTION_BY)
Values
(1686827, 'RNC', TO_DATE('06/14/2005 11:42:22', 'MM/DD/YYYY HH24:MI:SS'), 'COLUMBO');
Insert into GATE_CONTAINER
(GATE_ID, ACTION, DAMAGE_INSPECTION_DATE, DAMAGE_INSPECTION_BY)
Values
(1688508, 'RNC', TO_DATE('06/14/2005 13:36:38', 'MM/DD/YYYY HH24:MI:SS'), 'COLUMBO');
Insert into GATE_CONTAINER
(GATE_ID, ACTION, DAMAGE_INSPECTION_DATE, DAMAGE_INSPECTION_BY)
Values
(1686044, 'RNC', TO_DATE('06/14/2005 10:50:47', 'MM/DD/YYYY HH24:MI:SS'), 'COLUMBO');
Insert into GATE_CONTAINER
(GATE_ID, ACTION, DAMAGE_INSPECTION_DATE, DAMAGE_INSPECTION_BY)
Values
(1685720, 'RNC', TO_DATE('06/14/2005 10:27:38', 'MM/DD/YYYY HH24:MI:SS'), 'COLUMBO');
Insert into GATE_CONTAINER
(GATE_ID, ACTION, DAMAGE_INSPECTION_DATE, DAMAGE_INSPECTION_BY)
Values
(1686276, 'RNC', TO_DATE('06/14/2005 11:05:23', 'MM/DD/YYYY HH24:MI:SS'), 'COLUMBO');
Insert into GATE_CONTAINER
(GATE_ID, ACTION, DAMAGE_INSPECTION_DATE, DAMAGE_INSPECTION_BY)

CREATE TABLE GATE_DAMAGES
(
GATE_ID NUMBER NOT NULL,
DAMAGE_LOCATION_CODE VARCHAR2(5 BYTE) NOT NULL,
DAMAGE_TYPE_CODE VARCHAR2(5 BYTE) NOT NULL
)

--
--SQL Statement which produced this data:
-- SELECT * FROM GATE_DAMAGES
-- WHERE ROWNUM < 20
--
Insert into GATE_DAMAGES
(GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE)
Values
(34, '01', '9');
Insert into GATE_DAMAGES
(GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE)
Values
(34, '02', 'C');
Insert into GATE_DAMAGES
(GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE)
Values
(37, '01', 'B');
Insert into GATE_DAMAGES
(GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE)
Values
(62, '05', 'B');
Insert into GATE_DAMAGES
(GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE)
Values
(101, '23', 'C');
Insert into GATE_DAMAGES
(GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE)
Values
(183, '99', '9');
Insert into GATE_DAMAGES
(GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE)
Values
(188, '01', 'D');
Insert into GATE_DAMAGES
(GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE)
Values
(188, '04', 'B');
Insert into GATE_DAMAGES
(GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE)
Values
(188, '07', 'B');
Insert into GATE_DAMAGES
(GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE)
Values
(188, '08', 'D');
Insert into GATE_DAMAGES
(GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE)
Values
(188, '11', 'D');
Insert into GATE_DAMAGES
(GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE)
Values
(188, '18', 'D');
Insert into GATE_DAMAGES
(GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE)
Values
(188, '22', 'D');
Insert into GATE_DAMAGES
(GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE)
Values
(188, '24', 'D');
Insert into GATE_DAMAGES
(GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE)
Values
(189, '01', 'B');
Insert into GATE_DAMAGES
(GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE)
Values
(189, '08', 'D');
Insert into GATE_DAMAGES
(GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE)
Values
(189, '11', 'B');
Insert into GATE_DAMAGES
(GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE)
Values
(189, '18', 'D');
Insert into GATE_DAMAGES
(GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE)
Values
(189, '22', 'D');
COMMIT;
Values
(1686279, 'RNC', TO_DATE('06/14/2005 11:05:34', 'MM/DD/YYYY HH24:MI:SS'), 'COLUMBO');
Insert into GATE_CONTAINER
(GATE_ID, ACTION, DAMAGE_INSPECTION_DATE, DAMAGE_INSPECTION_BY)
Values
(1686285, 'RNC', TO_DATE('06/14/2005 11:05:43', 'MM/DD/YYYY HH24:MI:SS'), 'COLUMBO');
Insert into GATE_CONTAINER
(GATE_ID, ACTION, DAMAGE_INSPECTION_DATE, DAMAGE_INSPECTION_BY)
Values
(1685831, 'RNC', TO_DATE('06/14/2005 10:36:22', 'MM/DD/YYYY HH24:MI:SS'), 'COLUMBO');
Insert into GATE_CONTAINER
(GATE_ID, ACTION, DAMAGE_INSPECTION_DATE, DAMAGE_INSPECTION_BY)
Values
(1685417, 'RNC', TO_DATE('06/14/2005 10:06:00', 'MM/DD/YYYY HH24:MI:SS'), 'COLUMBO');
Insert into GATE_CONTAINER
(GATE_ID, ACTION, DAMAGE_INSPECTION_DATE, DAMAGE_INSPECTION_BY)
Values
(1685579, 'RNC', TO_DATE('06/14/2005 10:17:18', 'MM/DD/YYYY HH24:MI:SS'), 'COLUMBO');
Insert into GATE_CONTAINER
(GATE_ID, ACTION, DAMAGE_INSPECTION_DATE, DAMAGE_INSPECTION_BY)
Values
(1685828, 'RNC', TO_DATE('06/14/2005 10:34:44', 'MM/DD/YYYY HH24:MI:SS'), 'COLUMBO');
Insert into GATE_CONTAINER
(GATE_ID, ACTION, DAMAGE_INSPECTION_DATE, DAMAGE_INSPECTION_BY)
Values
(1686007, 'RNC', TO_DATE('06/14/2005 10:47:43', 'MM/DD/YYYY HH24:MI:SS'), 'COLUMBO');
Insert into GATE_CONTAINER
(GATE_ID, ACTION, DAMAGE_INSPECTION_DATE, DAMAGE_INSPECTION_BY)
Values
(1686131, 'RNC', TO_DATE('06/14/2005 10:56:42', 'MM/DD/YYYY HH24:MI:SS'), 'COLUMBO');
Insert into GATE_CONTAINER
(GATE_ID, ACTION, DAMAGE_INSPECTION_DATE, DAMAGE_INSPECTION_BY)
Values
(1688019, 'RNC', TO_DATE('06/14/2005 13:05:56', 'MM/DD/YYYY HH24:MI:SS'), 'COLUMBO');
COMMIT;

Tom,

Let me see if I can try now to explain it better. I am looking for
a report like this..

damages_inspection_date damages_inspection_by cnt minor major total
6/12/2005 MUTH XX XXX XX XX

requirements

MINOR = A
MAJOR = F
TOTAL != 'C'

and also but very important is that when there is a major and a minor
I should only count the major thus ignoring the minor.

October 19, 2005 - 12:34 pm UTC

sorry - but I'll need much more "text" than that. Remember, I haven't been staring at these tables for hours/days, I'm not familar with your vernacular, I don't know what problem you are trying to solve.

spec it out like we used to in the olden days - someone wrote spec (requirements) and someone else might have written the code from the spec.

follow up

A reader, October 19, 2005 - 2:51 pm UTC

GATE_ID DAMAGE_TYPE_CODE
1690355 59 A A
1690355 59 E
1690371 63 A A
1690371 63 X
1690371 64 A
1690371 64 L
1690405 71 A A
1690405 71 I
1690433 71 A A
1690433 71 I
1690486 54 F F
1690486 54 L
1690486 72 F
1690486 72 I
1690540 59 A A
1690540 59 E
1690636 63 A A
1690636 63 X
1690781 67 X

1687912 56 F F
1687912 56 I
1687912 66 A
1687912 66 X

I think this is a good example. In this case I got
A = MINOR DAMAGES
F = MAJOR DAMAGES
TOTAL = NOT EQUAL TO C ( !C)

1. If you look at it closely you can see for every
every gate_id where I have multiple
Major damages just count them as one like for example
Gate_id = 1690371

2 When you have multiples MINORS (Fs)
Like gate_id = 1690486 count them as one

3 When you have gate_id with F and A like
Gate_id = 1687912 then just Count F(MAJORS)

damages_inspection_date damages_inspection_by cnt minor major Total
6/12/2005 MUTH XX XXX XXX XX

October 19, 2005 - 4:30 pm UTC

you seem to be using F as major:

F = MAJOR DAMAGES

but also as minor:

When you have multiples MINORS (Fs)

sorry, I'm not being a "hard whatever", I'm not getting it. step back, pretend you were trying to explain this to your mom.

this is what I got so far

A reader, October 19, 2005 - 3:02 pm UTC

Tom,

This is what I got so far, but the query is
not following the rule with the MINORS DAMAGES..
----------------------------------------
select damage_inspection_date,damage_inspection_by,
max(decode(status,'MINOR',cnt)) minor,
max(decode(status,'MAJOR',cnt)) major,
max(decode(status,'TOTAL',cnt)) total
from (select b.damage_inspection_date,
b.damage_inspection_by
,b.status
,NVL(a.cnt,0) CNT
from
(select aa.damage_inspection_date,
aa.damage_inspection_by,
bb.status
from (select distinct trunc(gc.damage_inspection_date) damage_inspection_date, gc.damage_inspection_by
from gate_damages gd, gate_containers gc
where gd.gate_id = gc.gate_id
) aa,
(select *
from (select 'MAJOR' STATUS from dual
union all
select 'MINOR' STATUS from dual
union all
select 'TOTAL' STATUS from dual
)
) bb
)b,
((SELECT damage_inspection_date,
damage_inspection_by,
Status,
cnt
FROM (select trunc(c.damage_inspection_date) damage_inspection_date,
c.damage_inspection_by,
'MAJOR' STATUS,
count(distinct c.gate_id) cnt
from gate_containers c,
gate_damages d
where c.gate_id = d.gate_id and
d.damage_type_code = 'F'
group by trunc(c.damage_inspection_date),c.damage_inspection_by
UNION ALL
select trunc(g.damage_inspection_date) damage_inspection_date,
g.damage_inspection_by,
'MINOR' STATUS,
count(distinct g.gate_id)
from gate_containers g, gate_damages z
where g.gate_id = z.gate_id
group by trunc(g.damage_inspection_date),g.damage_inspection_by
UNION ALL
select trunc(ab.damage_inspection_date) damage_inspection_date,
ab.damage_inspection_by,
'TOTAL' STATUS,
count(distinct ab.gate_id) cnt
from gate_containers ab,
gate_damages ac
where ab.gate_id = ac.gate_id(+) and
SUBSTR(ab.action,2,1) != 'C'
group by trunc(ab.damage_inspection_date),ab.damage_inspection_by
)
group by damage_inspection_date, damage_inspection_by, status, cnt
)
) a
where b.damage_inspection_by = a.damage_inspection_by(+)
and b.damage_inspection_date = a.damage_inspection_date(+)
and b.status = a.status(+)
)
group by damage_inspection_date,damage_inspection_by

SORRY!!

A reader, October 19, 2005 - 3:05 pm UTC

COPY THE WRONG QUERY...

This is what I got so far but the query
is not following the rules with the MINOR
AS stated above.

select damage_inspection_date,damage_inspection_by,
max(decode(status,'MINOR',cnt)) minor,
max(decode(status,'MAJOR',cnt)) major,
max(decode(status,'TOTAL',cnt)) total
from (select b.damage_inspection_date,
b.damage_inspection_by
,b.status
,NVL(a.cnt,0) CNT
from
(select aa.damage_inspection_date,
aa.damage_inspection_by,
bb.status
from (select distinct trunc(gc.damage_inspection_date) damage_inspection_date, gc.damage_inspection_by
from gate_damages gd, gate_containers gc
where gd.gate_id = gc.gate_id
) aa,
(select *
from (select 'MAJOR' STATUS from dual
union all
select 'MINOR' STATUS from dual
union all
select 'TOTAL' STATUS from dual
)
) bb
)b,
((SELECT damage_inspection_date,
damage_inspection_by,
Status,
cnt
FROM (select trunc(c.damage_inspection_date) damage_inspection_date,
c.damage_inspection_by,
'MAJOR' STATUS,
count(distinct c.gate_id) cnt
from gate_containers c,
gate_damages d
where c.gate_id = d.gate_id and
d.damage_type_code = 'F'
group by trunc(c.damage_inspection_date),c.damage_inspection_by
UNION ALL
select trunc(g.damage_inspection_date) damage_inspection_date,
g.damage_inspection_by,
'MINOR' STATUS,
count(distinct g.gate_id)
from gate_containers g, gate_damages z
where g.gate_id = z.gate_id
d.damage_type_code = 'A'
group by trunc(g.damage_inspection_date),g.damage_inspection_by
UNION ALL
select trunc(ab.damage_inspection_date) damage_inspection_date,
ab.damage_inspection_by,
'TOTAL' STATUS,
count(distinct ab.gate_id) cnt
from gate_containers ab,
gate_damages ac
where ab.gate_id = ac.gate_id(+) and
SUBSTR(ab.action,2,1) != 'C'
group by trunc(ab.damage_inspection_date),ab.damage_inspection_by
)
group by damage_inspection_date, damage_inspection_by, status, cnt
)
) a
where b.damage_inspection_by = a.damage_inspection_by(+)
and b.damage_inspection_date = a.damage_inspection_date(+)
and b.status = a.status(+)
)
group by damage_inspection_date,damage_inspection_by

follow up

A reader, October 19, 2005 - 4:47 pm UTC

1. If you look at it closely you can see for every
every gate_id where I have multiple
Major damages just count them as one like for example
Gate_id = 1690371

2 When you have multiples MINORS (Fs)
Like gate_id = 1690486 count them as one

3 When you have gate_id with F and A ALIKE
Gate_id = 1687912 then just Count F(MAJORS)

F FOR MAJOR
A FOR MINOR

SOMETIMES IN THE RECORD WILL HAVE MAJOR AND MINORS
I JUST WANT TO COUNT THE MAJOR AND IGNORE THE MINORS.
I HAVE GIVEN 3 EXAMPLES OF THE RULES....DON'T KNOW
WHAT ELSE TO SAY...ALSO PLEASE LOOK AT THE QUERY
IT'S ALL IN THE UNION. THE ONLY PROBLEM THAT I HAVE
IS THAT I AM COUNTING THE MAJOR AND THE MINORS IN
THE MINOR UNION.

October 19, 2005 - 4:57 pm UTC

1) Ok, I'm looking at that gate id:

1690371 63 A A
1690371 63 X
1690371 64 A
1690371 64 L

IF f is for major
AND that gate id is a prime example of multiple majors
THEN where the heck is f?

2) Ok, I'm looking at that gate id:

1690486 54 F F
1690486 54 L
1690486 72 F
1690486 72 I

Now, I see F's and you said "F IS FOR MAJOR", but now you are saying this is the primary example of multiple MINORS... Maybe I'm being "dumb", but I don't get it?

3) Ok, I'm looking at that:

1687912 56 F F
1687912 56 I
1687912 66 A
1687912 66 X

and we are back to F being a major, not a minor again?

So, no, I don't get it, it is not clear, you can shout loud, but it won't matter.

(am I the only one not really following this??)

ANOTHER EXAMPLE!

A reader, October 19, 2005 - 4:55 pm UTC

1690355 59 A A
1690355 59 E
1690371 63 A A
1690371 63 X
1690371 64 A
1690371 64 L
1690405 71 A A
1690405 71 I
1690433 71 A A
1690433 71 I
1690486 54 F F
1690486 54 L
1690486 72 F
1690486 72 I
1690540 59 A A
1690540 59 E
1690636 63 A A
1690636 63 X
1690781 67 X
A 12
F 9

select trunc(g.damage_inspection_date) damage_inspection_date,
g.damage_inspection_by, 'MINOR' status,
count(distinct g.gate_id) cnt
from gate_containers g, gate_damages z
where g.gate_id = z.gate_id
and z.damage_type_code = 'A'
and damage_inspection_by = 'COLUMBO'
and trunc(damage_inspection_date) = to_date('06-14-2005','mm-dd-yyyy')
group by trunc(g.damage_inspection_date),g.damage_inspection_by

result from the query:

damage_inspection_date
6/14/2005

damage_inspection_by
COLUMBO

status
MINOR

CNT
9 --------IT SHOULD BE 8 WHY? BECAUSE I WANT TO COUNT
THE "A"(MINORS) UNIQUELY,DISTINCTLY. IN OTHER WORDS

follow up

A reader, October 19, 2005 - 5:15 pm UTC

) Ok, I'm looking at that gate id:

1690371 63 A A
1690371 63 X
1690371 64 A
1690371 64 L

*****this count as one A = MINOR

IF f is for major
AND that gate id is a prime example of multiple majors
THEN where the heck is f?

2) Ok, I'm looking at that gate id:

1690486 54 F F
1690486 54 L
1690486 72 F
1690486 72 I

*******THIS COUNT AS ONE F = MAJOR

Now, I see F's and you said "F IS FOR MAJOR", but now you are saying this is the
primary example of multiple MINORS... Maybe I'm being "dumb", but I don't get
it?

3) Ok, I'm looking at that:

1687912 56 F F
1687912 56 I
1687912 66 A
1687912 66 X

***** IN THIS CASE IGNORE THE MINOR(A) AND COUNT JUST THE MAJOR(F)

****THE FAR LETTER ON THE RIGHT IS TO SHOW YOU HOW I AM
CALCULATING WHAT IS ON THE RIGHT...

October 19, 2005 - 7:44 pm UTC

so, by gate_id compute how many A's and how many F's

select gate_id, count(case when col='A' then col end) A,
count(case when col='F' then col end) F
from t;

now you have gate_id and a count of A's and F's

call that Q

select ...
from (Q);

use CASE to look at A and F and return whatever you want.

Year to date Business Day

Reader, October 19, 2005 - 6:48 pm UTC

I am trying to calculate the number of days we did business ie sold anything and then a running total for the year .

CREATE TABLE TEST (ID VARCHAR2(10),sale_dt DATE ,amount NUMBER(6,2) )

INSERT INTO TEST VALUES ('aa','14-OCT-2005',65.25);
INSERT INTO TEST VALUES ('aa','14-OCT-2005',56.25);
INSERT INTO TEST VALUES ('aa','15-SEP-2005',72.25);
INSERT INTO TEST VALUES ('aa','19-OCT-2005',43.25);
INSERT INTO TEST VALUES ('bb','14-SEP-2005',67.25);
INSERT INTO TEST VALUES ('bb','13-OCT-2005',235.25);
INSERT INTO TEST VALUES ('bb','15-OCT-2005',365.25);
INSERT INTO TEST VALUES ('bb','14-NOV-2005',465.25);
INSERT INTO TEST VALUES ('bb','14-SEP-2005',165.25);
COMMIT;

SELECT a.*
,SUM (sale_daily) OVER (PARTITION BY ID, TRUNC(sale_dt,'MM')
ORDER BY ID ASC, sale_dt ASC RANGE UNBOUNDED PRECEDING ) mon_sal
,SUM (sale_daily) OVER (PARTITION BY ID, TRUNC(sale_dt,'Y')
ORDER BY ID ASC, sale_dt ASC RANGE UNBOUNDED PRECEDING ) yr_sal
,COUNT(sale_daily) OVER (PARTITION BY ID, TRUNC(sale_dt,'MM')
ORDER BY ID ASC, sale_dt ASC RANGE UNBOUNDED PRECEDING ) mtd_day_of_business
,COUNT(sale_daily) OVER (PARTITION BY ID, TRUNC(sale_dt,'Y')
ORDER BY ID ASC, sale_dt ASC RANGE UNBOUNDED PRECEDING ) ytd_day_of_business
FROM
(
SELECT ID,sale_dt,SUM(amount) sale_daily FROM TEST
GROUP BY ID, sale_dt
) a
ID SALE_DT SALE_DAILY MON_SAL YR_SAL MTD_DAYOFBUS YTD_DAYOFBUS
---------- --------- ---------- ---------- ---------- ------------ ------------
aa 15-SEP-05 216.75 216.75 216.75 1 1
aa 14-OCT-05 299.25 299.25 516 1 2
aa 19-OCT-05 129.75 429 645.75 2 3
bb 14-SEP-05 697.5 697.5 697.5 1 1
bb 13-OCT-05 705.75 705.75 1403.25 1 2
bb 15-OCT-05 1095.75 1801.5 2499 2 3
bb 14-NOV-05 1395.75 1395.75 3894.75 1 4

Ideally ,the business days in the month of sep -14,15
oct-13,14,15,19 nov 1

So the year count should be 7 and the monthly count should be sep 1,2 oct 1,2,3,4 and Nov 1 .

Can this be done using analytical function or is there any other way .

Thanks

October 19, 2005 - 7:56 pm UTC

you mean like this?

ops$tkyte@ORA10GR1> SELECT a.*
  2  ,SUM (sale_daily) OVER (PARTITION BY ID, TRUNC(sale_dt,'MM')
  3       ORDER BY ID ASC, sale_dt ASC RANGE UNBOUNDED PRECEDING ) mon_sal
  4  ,SUM (sale_daily) OVER (PARTITION BY ID, TRUNC(sale_dt,'Y')
  5       ORDER BY ID ASC, sale_dt ASC RANGE UNBOUNDED PRECEDING ) yr_sal
  6  ,COUNT(sale_daily) OVER (PARTITION BY TRUNC(sale_dt,'MM') order by sale_dt )
  7  mtd_day_of_business
  8  ,COUNT(sale_daily) OVER (PARTITION BY TRUNC(sale_dt,'Y') )
  9  ytd_day_of_business
 10  FROM
 11  (
 12  SELECT ID,sale_dt,SUM(amount) sale_daily FROM TEST
 13  GROUP BY ID, sale_dt
 14  ) a
 15  order by sale_dt
 16  /

ID         SALE_DT   SALE_DAILY    MON_SAL     YR_SAL MTD_DAY_OF_BUSINESS YTD_DAY_OF_BUSINESS
---------- --------- ---------- ---------- ---------- ------------------- -------------------
bb         14-SEP-05     2092.5     2092.5     2092.5                   1                   7
aa         15-SEP-05     650.25     650.25     650.25                   2                   7
bb         13-OCT-05    2117.25    2117.25    4209.75                   1                   7
aa         14-OCT-05     1093.5     1093.5    1743.75                   2                   7
bb         15-OCT-05    3287.25     5404.5       7497                   3                   7
aa         19-OCT-05     389.25    1482.75       2133                   4                   7
bb         14-NOV-05    4187.25    4187.25   11684.25                   1                   7

7 rows selected.

Year to date ..

Reader, October 20, 2005 - 12:27 am UTC

The months are fine .
But year should increment ie 1,2,3,4,5,6,7
now it 7th business day for all transactions .

Thanks

October 20, 2005 - 8:06 am UTC

I did that, because you asked for that.

... So the year count should be 7 and the monthly count should be sep 1,2 oct
1,2,3,4 and Nov 1 .....

add the order by to the year count just like I did for the month.

the order by will make it a running total.

Thank you

A reader, October 20, 2005 - 8:56 am UTC

Tom,

Thank you for your solution, above. However,
Can you give me a solution using CASE when
I can count either A and NOT F?

Thank you again.

October 20, 2005 - 8:59 am UTC

select case when cnta > 0 and cntf > 0
then ...
when cnta = 0 and cntf > 0
then ...
when cnta > 0 and cntf = 0
then ...

use a boolean expression after computing the cnt of A and the cnt of F

SOMETHING LIKE THIS...

A reader, October 20, 2005 - 10:18 am UTC

Tom,

you mean something like this..

select trunc(g.damage_inspection_date) damage_inspection_date, g.damage_inspection_by,
'MINOR' STATUS,
SUM (case WHEN z.DAMAGE_TYPE_CODE= 'A' THEN 1 ELSE 0 end)
from gate_containers g, gate_damages z
where g.gate_id = z.gate_id
and z.damage_type_code = 'A'
AND damage_inspection_by = 'COLUMBO'
and trunc(damage_inspection_date) = to_date('06-14-2005','mm-dd-yyyy')
group by trunc(g.damage_inspection_date),g.damage_inspection_by

October 20, 2005 - 4:33 pm UTC

sure

FINAL SOLUTION

A reader, October 20, 2005 - 11:04 am UTC

Tom,

here is the problem that I was facing....I hope
this clear things out.

SQL Statement which produced this data:
select trunc(g.damage_inspection_date) damage_inspection_date, g.damage_inspection_by,'MINOR' STATUS, g.gate_id, z.damage_type_code
from gate_containers g, gate_damages z
where g.gate_id = z.gate_id
and z.damage_type_code = 'A'
AND damage_inspection_by = 'COLUMBO'
--and z.gate_id = '1688273'
and trunc(damage_inspection_date) = to_date('06-14-2005','mm-dd-yyyy')
-------------------------------------------------------------------------

6/14/2005 COLUMBO MINOR 1688235 A
6/14/2005 COLUMBO MINOR 1688609 A
6/14/2005 COLUMBO MINOR 1688273 A------was counting this a minor when it should be counted as a major
6/14/2005 COLUMBO MINOR 1686769 A
6/14/2005 COLUMBO MINOR 1686517 A
6/14/2005 COLUMBO MINOR 1687985 A
6/14/2005 COLUMBO MINOR 1686483 A
6/14/2005 COLUMBO MINOR 1685361 A
6/14/2005 COLUMBO MINOR 1686414 A

SQL Statement which produced this data:
select trunc(g.damage_inspection_date) damage_inspection_date, g.damage_inspection_by,'MINOR' STATUS, g.gate_id, z.damage_type_code
from gate_containers g, gate_damages z
where g.gate_id = z.gate_id
--and z.damage_type_code = 'A'
AND damage_inspection_by = 'COLUMBO'
and z.gate_id = '1688273'
and trunc(damage_inspection_date) = to_date('06-14-2005','mm-dd-yyyy')

6/14/2005 COLUMBO MINOR 1688273 A--------throw away!
6/14/2005 COLUMBO MINOR 1688273 E
6/14/2005 COLUMBO MINOR 1688273 C
6/14/2005 COLUMBO MINOR 1688273 F-------keep
6/14/2005 COLUMBO MINOR 1688273 I

follow up

A reader, October 20, 2005 - 5:01 pm UTC

Tom,

I am still not able to get the corrent result using
a case statement. Maybe I should use a function to return only the F when there is a F and A in the record.

select trunc(g.damage_inspection_date) damage_inspection_date,
g.damage_inspection_by,
'MINOR' STATUS,
sum (case when z.damage_type_code = 'F' then 1 else 0 end) cnt
from gate_containers g, gate_damages z
where g.gate_id = z.gate_id and
z.damage_type_code = 'F'
group by trunc(g.damage_inspection_date),g.damage_inspection_by

Can this be done using analytical functions??

A reader, October 25, 2005 - 2:59 pm UTC

Tom,

I am done with my query....I am looking for a better
approach or how I can improve it. Thanks!!

select damage_inspection_date, damage_inspection_by,
max(decode(status,'MINOR',cnt)) minor,
max(decode(status,'MAJOR',cnt)) major,
max(decode(status,'TOTAL',cnt)) total
from (select b.damage_inspection_date,
b.damage_inspection_by
,b.status
,NVL(a.cnt,0) CNT
from
(select aa.damage_inspection_date,
aa.damage_inspection_by,
bb.status
from (select distinct trunc(gc.damage_inspection_date)
damage_inspection_date, gc.damage_inspection_by
from gate_damages gd, gate_containers gc
where gd.gate_id = gc.gate_id
) aa,
(select *
from (select 'MAJOR' STATUS from dual
union all
select 'MINOR' STATUS from dual
union all
select 'TOTAL' STATUS from dual
)
) bb
)b,
((SELECT damage_inspection_date,
damage_inspection_by,
Status,
cnt
FROM (select trunc(c.damage_inspection_date) damage_inspection_date,
c.damage_inspection_by,
'MAJOR' STATUS,
count(distinct c.gate_id) cnt
from gate_containers c,
gate_damages d
where c.gate_id = d.gate_id and
d.damage_type_code = 'F'
group by trunc(c.damage_inspection_date),c.damage_inspection_by
UNION ALL
select trunc(g.damage_inspection_date) damage_inspection_date, g.damage_inspection_by, 'MINOR' status,
count(distinct g.gate_id) cnt
from gate_containers g, gate_damages z
where g.gate_id = z.gate_id
and z.damage_type_code = 'A'
and not exists
(select z.gate_id from gate_damages z
where z.gate_id = g.gate_id
and z.damage_type_code = 'F')
group by trunc(g.damage_inspection_date),g.damage_inspection_by
UNION ALL
select trunc(ab.damage_inspection_date) damage_inspection_date,
ab.damage_inspection_by,
'TOTAL' STATUS,
count(distinct ab.gate_id) cnt
from gate_containers ab,
gate_damages ac
where ab.gate_id = ac.gate_id(+) and
SUBSTR(ab.action,2,1) != 'C'
group by trunc(ab.damage_inspection_date),ab.damage_inspection_by
)
group by damage_inspection_date, damage_inspection_by, status, cnt
)
) a
where b.damage_inspection_by = a.damage_inspection_by(+)
and b.damage_inspection_date = a.damage_inspection_date(+)
and b.status = a.status(+))
group by damage_inspection_date, damage_inspection_by;

October 26, 2005 - 11:24 am UTC

sorry - too big to reverse engineer here as a review/followup....

Analytic Question

Yoav, November 20, 2005 - 8:16 am UTC

Hi Tom.
Im tring to calculating Weighted moving average.
Im having a problem to calculate the values under column SUM_D.
Can you please demonstrate how to achieve the values that appears under the column SUM_D ?

create table t
(stock_date date,
close_value number(8,2));

INSERT INTO TEST(STOCK_DATE, CLOSE_VALUE) VALUES ('02-OCT-2005',759.56);
INSERT INTO TEST(STOCK_DATE, CLOSE_VALUE) VALUES ('29-SEP-2005',753.59);
INSERT INTO TEST(STOCK_DATE, CLOSE_VALUE) VALUES ('28-SEP-2005',749.20);
INSERT INTO TEST(STOCK_DATE, CLOSE_VALUE) VALUES ('27-SEP-2005',741.71);
INSERT INTO TEST(STOCK_DATE, CLOSE_VALUE) VALUES ('26-SEP-2005',729.93);
INSERT INTO TEST(STOCK_DATE, CLOSE_VALUE) VALUES ('25-SEP-2005',719.48);
INSERT INTO TEST(STOCK_DATE, CLOSE_VALUE) VALUES ('22-SEP-2005',727.30);
INSERT INTO TEST(STOCK_DATE, CLOSE_VALUE) VALUES ('21-SEP-2005',735.81);
INSERT INTO TEST(STOCK_DATE, CLOSE_VALUE) VALUES ('20-SEP-2005',740.38);
INSERT INTO TEST(STOCK_DATE, CLOSE_VALUE) VALUES ('19-SEP-2005',739.86);
INSERT INTO TEST(STOCK_DATE, CLOSE_VALUE) VALUES ('18-SEP-2005',745.48);
INSERT INTO TEST(STOCK_DATE, CLOSE_VALUE) VALUES ('15-SEP-2005',744.65);
COMMIT;

select RN, day, stock_date,close_value,weight
from(
select rownum RN,to_char(stock_date,'d') Day,
stock_date,close_value,
(case when to_char(stock_date,'d') = 1 then
1*close_value
when to_char(stock_date,'d') = 2 then
2*close_value
when to_char(stock_date,'d') = 3 then
3*close_value
when to_char(stock_date,'d') = 4 then
4*close_value
when to_char(stock_date,'d') = 5 then
5*close_value
end) weight
from( select rownum,stock_date,close_value
from test
order by stock_date asc)
order by 1
)
ORDER BY 1
/

RN D STOCK_DAT CLOSE_VALUE WEIGHT SUM_D
--------- - --------- ----------- ---------- ----------
1 5 15-SEP-05 744.65 3723.25 5
2 1 18-SEP-05 745.48 745.48 1 <==
3 2 19-SEP-05 739.86 1479.72 3
4 3 20-SEP-05 740.38 2221.14 6
5 4 21-SEP-05 735.81 2943.24 10
6 5 22-SEP-05 727.3 3636.5 15
7 1 25-SEP-05 719.48 719.48 1 <==
8 2 26-SEP-05 729.93 1459.86 3
9 3 27-SEP-05 741.71 2225.13 6
10 4 28-SEP-05 749.2 2996.8 10
11 5 29-SEP-05 753.59 3767.95 15

RN D STOCK_DAT CLOSE_VALUE WEIGHT SUM_D
--------- - --------- ----------- ---------- ----------
12 1 02-OCT-05 759.56 759.56 1

Thank You.

November 20, 2005 - 8:31 am UTC

would you like to explain what sum_d is to you? explain the logic behind it.

Analytic Question

Yoav, November 20, 2005 - 10:10 am UTC

Hi Tom.
Im sorry if my explanation wasnt clear enough.
The column SUM_D is actualy a "running total" of the column
Day.
The thing is that i need to reset the value of the
column SUM_D to 1 at the beginning of each week (Sunday).

select RN, day, TheDayIs, stock_date
from(
select rownum RN,to_char(stock_date,'d') Day,
to_char(stock_date,'Day')TheDayIs,
stock_date,close_value
from( select rownum,stock_date,close_value
from test
order by stock_date asc)
order by 1
)
ORDER BY 1
/

RN D SUM_D THEDAYIS STOCK_DAT
--------- - ----- --------- ---------
1 5 5 Thursday 15-SEP-05
2 1 1 Sunday 18-SEP-05
3 2 3 Monday 19-SEP-05
4 3 6 Tuesday 20-SEP-05
5 4 10 Wednesday 21-SEP-05
6 5 15 Thursday 22-SEP-05
7 1 1 Sunday 25-SEP-05
8 2 3 Monday 26-SEP-05
9 3 6 Tuesday 27-SEP-05
10 4 10 Wednesday 28-SEP-05
11 5 15 Thursday 29-SEP-05
12 1 1 Sunday 02-OCT-05

Thank you for you quick response

November 21, 2005 - 8:20 am UTC

you might have to "adjust" your stock_date by a day if 'ww' doesn't group right for you with your NLS settings (sometimes the week ends on a different day depending on your NLS settings - locale issue)



ops$tkyte@ORA9IR2> select row_number() over (order by stock_date) rn,
  2         to_char(stock_date,'d') day,
  3         stock_date,
  4         close_value,
  5         to_number(to_char(stock_date,'d'))*close_value weight,
  6          sum(to_number(to_char(stock_date,'d')))  
                  over (partition by to_char(stock_date,'ww') 
                            order by stock_date) sum_d
  7    from t
  8   order by stock_date
  9  /
 
        RN D STOCK_DAT CLOSE_VALUE     WEIGHT      SUM_D
---------- - --------- ----------- ---------- ----------
         1 5 15-SEP-05      744.65    3723.25          5
         2 1 18-SEP-05      745.48     745.48          1
         3 2 19-SEP-05      739.86    1479.72          3
         4 3 20-SEP-05      740.38    2221.14          6
         5 4 21-SEP-05      735.81    2943.24         10
         6 5 22-SEP-05       727.3     3636.5         15
         7 1 25-SEP-05      719.48     719.48          1
         8 2 26-SEP-05      729.93    1459.86          3
         9 3 27-SEP-05      741.71    2225.13          6
        10 4 28-SEP-05       749.2     2996.8         10
        11 5 29-SEP-05      753.59    3767.95         15
        12 1 02-OCT-05      759.56     759.56          1
 
12 rows selected.

Analytics Question

Yoav, November 21, 2005 - 7:41 am UTC

Hi Tom.
Im Sorry for wasting you time.
i found the solution.

select RN, day, week_no,
sum(day) over
(partition by week_no
order by day) sum_d,
stock_date
from(
select RN, day, week_no, stock_date
from(select rownum RN,to_char(stock_date,'d') Day,
to_char(stock_date,'ww') week_no,
stock_date,close_value,
(case when to_char(stock_date,'d') = 1 then
1*close_value
when to_char(stock_date,'d') = 2 then
2*close_value
when to_char(stock_date,'d') = 3 then
3*close_value
when to_char(stock_date,'d') = 4 then
4*close_value
when to_char(stock_date,'d') = 5 then
5*close_value
end) weight
from( select rownum,'Y',stock_date,close_value
from test
order by stock_date asc)
order by 1)
)
order by 1
/

RN D WE SUM_D STOCK_DAT
------ - -- ---------- ---------
1 5 37 5 15-SEP-05
2 1 38 1 18-SEP-05
3 2 38 3 19-SEP-05
4 3 38 6 20-SEP-05
5 4 38 10 21-SEP-05
6 5 38 15 22-SEP-05
7 1 39 1 25-SEP-05
8 2 39 3 26-SEP-05
9 3 39 6 27-SEP-05
10 4 39 10 28-SEP-05
11 5 39 15 29-SEP-05

RN D WE SUM_D STOCK_DAT
------ - -- ---------- ---------
12 1 40 1 02-OCT-05

Thank You. !!

November 21, 2005 - 8:52 am UTC

see above, you can skip lots of steps here!

Analytics Question

Yoav, November 22, 2005 - 5:29 am UTC

Tom.
Your solution is better then my.
Thank you !

Could you please help me with this

A reader, November 29, 2005 - 4:17 am UTC

I am trying to output a report with different aggregates for different price ranges

create table t(
id number(3),
year number(4),
month number(2),
slno number(2),
colorcd number(2),
sizecd number(2),
itemid number(4),
prdno number(3),
price number(4),
st_qty number(3),
sl_qty number(3),
constraint pk_t primary key(id, year, month, slno));

create table p(
itemid number(4) primary key,
displaycd varchar2(2),
itemname varchar2(10));

insert into t values (1,2005,1,1,1,10,1000,101,150,100,10);
insert into t values (1,2005,1,2,1,11,1000,101,150,120,2);
insert into t values (1,2005,1,3,1,12,1000,101,150,100,10);
insert into t values (1,2005,1,4,1,13,1000,102,150,200,2);
insert into t values (1,2005,2,5,2,10,1000,102,150,100,20);
insert into t values (1,2005,2,6,2,11,1000,102,150,100,12);
insert into t values (1,2005,2,7,3,10,1000,103,150,100,20);
insert into t values (1,2005,3,8,4,10,1000,103,150,100,22);
insert into t values (1,2005,4,9,4,11,1000,103,150,100,12);
insert into t values (1,2005,1,10,5,10,1000,104,450,100,10);
insert into t values (1,2005,1,11,5,11,1000,104,450,120,2);
insert into t values (1,2005,1,12,5,12,1000,104,450,100,10);
insert into t values (1,2005,1,13,5,13,1000,104,450,200,2);
insert into t values (1,2005,2,14,5,14,1000,104,450,100,20);
insert into t values (1,2005,1,15,6,10,1001,105,150,100,10);
insert into t values (1,2005,1,16,6,11,1001,105,150,120,2);
insert into t values (1,2005,1,17,6,12,1001,105,150,100,10);
insert into t values (1,2005,1,18,6,13,1001,105,150,200,2);
insert into t values (1,2005,2,19,7,10,1001,105,150,100,20);
insert into t values (1,2005,2,20,7,11,1002,106,400,100,12);
insert into t values (1,2005,2,21,8,10,1002,106,400,100,20);
insert into t values (1,2005,3,22,9,10,1002,107,400,100,22);
insert into t values (1,2005,4,23,10,11,1002,107,400,100,12);

insert into p values(1000,'AA','Item0');
insert into p values(1001,'AB','Item1');
insert into p values(1002,'AC','Item2');
insert into p values(1003,'AD','Item3');

Desc Itemname <199 <299 <399 <499
----------------------------------------------------------------------------
Count of distinct prdnos Item0 3 null null 1
(Count of distinct prdnos
group by colorcd, sizecd) 3 null null 1
sum of sl_qty group by itemid 110 null null 44

Count of distinct prdnos Item1 1 null null null
(Count of distinct prdnos
group by colorcd, sizecd) 1 null null null
sum of sl_qty group by itemid 44 null null null

Count of distinct prdnos Item2 null null null 2
(Count of distinct prdnos
group by colorcd, sizecd) null null null 1
sum of sl_qty group by itemid null null null 66

Is this possible? The Desc column is not needed and 'null' should be blank.

Thank you

November 29, 2005 - 10:22 am UTC

I don't get the "group by colorcd, sizecd" bit. If you group by those attributes, you'll get a row per unique ITEMNAME, COLORCD, SIZECD.

I don't understand the logic.

A reader, November 29, 2005 - 10:48 am UTC

Dear Tom,

For each unique record of ITEMNAME, COLORCD, SIZECD, the PRODNOs are repeating isn't it? I need a count of distinct prodnos.

COLORCD-SIZECD-ITEMID-PRODNO in that order, please see below

first group
-------------------------
1-10-1000-101
1-11-1000-101
1-12-1000-101
----------------------
second group
----------------------
1-13-1000-102
2-10-1000-102
2-11-1000-102

Both these groups comes under price range < 199. So the distinct count of PRODNO for price range < 199 = 2

Hope this make sense.

Thank you

November 30, 2005 - 10:46 am UTC

not understanding how this gets down to a single row. I did not get it.

why would that be different than the count of disintct prodno's by itemid.

how about this query

steve, November 30, 2005 - 8:04 pm UTC

Hi Tom,

Is there simple way to do it by analytic function?

select dept_num, id, sum(curr_adj_qty)
from
(
select dept_num, id, sum(current_adjust_qty) curr_adj_qty
from adjust
where applied_ind = 'N'
and expired_ind = 'N'
group by dept_num, id
UNION
select dept_num,id,(sum(current_adjust_qty)*-1)curr_adj_qty
from adjust
where expired_ind = 'N'
and applied_ind = 'Y'
group by dept_num, id
) adj_tmp
group by dept_num, id

Thanks a lot!
Steve

November 30, 2005 - 9:17 pm UTC

hows about you

a) set up a small example
b) explain what it is supposed to do in text (so we don't have to reverse engineer what you might have been thinking)

[RE] to Steve NYC

Marcio Portes, November 30, 2005 - 10:43 pm UTC

May be he is looking for this

ops$marcio@LNX10GR2> select dept_num, id, sum(curr_adj_qty)
2 from
3 (
4 select dept_num, id, sum(current_adjust_qty) curr_adj_qty
5 from adjust
6 where applied_ind = 'N'
7 and expired_ind = 'N'
8 group by dept_num, id
9 UNION
10 select dept_num,id,(sum(current_adjust_qty)*-1)curr_adj_qty
11 from adjust
12 where expired_ind = 'N'
13 and applied_ind = 'Y'
14 group by dept_num, id
15 ) adj_tmp
16 group by dept_num, id
17 /

DEPT_NUM ID SUM(CURR_ADJ_QTY)
------------- ------------- -----------------
1 0 185
1 2 186
0 2 77
0 0 81
1 1 165
0 1 56

6 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2815735809

--------------------------------------------------------------------------------
|Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 390 | 11 (46)| 00:00:01 |
| 1 | HASH GROUP BY | | 10 | 390 | 11 (46)| 00:00:01 |
| 2 | VIEW | | 10 | 390 | 10 (40)| 00:00:01 |
| 3 | SORT UNIQUE | | 10 | 120 | 10 (70)| 00:00:01 |
| 4 | UNION-ALL | | | | | |
| 5 | HASH GROUP BY | | 5 | 60 | 5 (40)| 00:00:01 |
|* 6 | TABLE ACCESS FULL| ADJUST | 250 | 3000 | 3 (0)| 00:00:01 |
| 7 | HASH GROUP BY | | 5 | 60 | 5 (40)| 00:00:01 |
|* 8 | TABLE ACCESS FULL| ADJUST | 250 | 3000 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

6 - filter("APPLIED_IND"='N' AND "EXPIRED_IND"='N')
8 - filter("EXPIRED_IND"='N' AND "APPLIED_IND"='Y')

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
10 consistent gets
0 physical reads
0 redo size
624 bytes sent via SQL*Net to client
385 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
6 rows processed

ops$marcio@LNX10GR2>
ops$marcio@LNX10GR2> select dept_num, id,
2 sum( case when applied_ind = 'N'
3 then current_adjust_qty
4 else 0 end )
5 - sum( case when applied_ind = 'Y'
6 then current_adjust_qty
7 else 0 end ) curr_adj_qty
8 from adjust
9 where expired_ind = 'N'
10 group by dept_num, id
11 /

DEPT_NUM ID CURR_ADJ_QTY
------------- ------------- -------------
1 0 185
1 2 186
0 2 77
1 1 165
0 0 81
0 1 56

6 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3658272021

-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 60 | 4 (25)| 00:00:01 |
| 1 | HASH GROUP BY | | 5 | 60 | 4 (25)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| ADJUST | 500 | 6000 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("EXPIRED_IND"='N')

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
622 bytes sent via SQL*Net to client
385 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
6 rows processed

I used this script to produce output above.
set echo on
drop table adjust purge;

create table
adjust (
dept_num int,
id int,
applied_ind char(1),
expired_ind char(1),
current_adjust_qty int
);

insert /*+ append */ into adjust
with v
as ( select level l from dual connect by level <= 1000 )
select mod(l, 2), mod(l, 3),
decode(mod(l,8), 0, 'Y', 'N'),
decode(mod(l,5), 0, 'N', 'Y'),
trunc(dbms_random.value(1,10.000))
from v
/
commit;
exec dbms_stats.gather_table_stats( user, 'adjust' )
set autotrace on
select dept_num, id, sum(curr_adj_qty)
from
(
select dept_num, id, sum(current_adjust_qty) curr_adj_qty
from adjust
where applied_ind = 'N'
and expired_ind = 'N'
group by dept_num, id
UNION
select dept_num,id,(sum(current_adjust_qty)*-1)curr_adj_qty
from adjust
where expired_ind = 'N'
and applied_ind = 'Y'
group by dept_num, id
) adj_tmp
group by dept_num, id
/

select dept_num, id,
sum( case when applied_ind = 'N'
then current_adjust_qty
else 0 end )
- sum( case when applied_ind = 'Y'
then current_adjust_qty
else 0 end ) curr_adj_qty
from adjust
where expired_ind = 'N'
group by dept_num, id
/
set autotrace off
set echo off

Multiple aggregates

Raj, December 01, 2005 - 7:45 am UTC

Dear Tom,

This is continuing with my previous post where the given data for the problem was wrong. I was trying to make a sample testcase. Here is the requirements along with the create table statements and corrected data.

This is to output sales figures for a given period of different products.

The output format should be,

1. ITEMNAME - All the items from item table whether a match occurs or not.
2. DISPLAYCD
3. PRICE
4. Count of distinct PRODNOs for an item group by PRICE
5. Total count of distinct( PRODNO+COLORCD+SIZECD) for an item group by PRICE
6. Total SL_QTY for an item group by PRICE
7. Total SL_QTY*PRICE for an item group by PRICE
8. Avg of PRICE for an item
9. Avg of (ST_QTY/SL_QTY) * 7 for an item

create table t(
id number(3),
slno number(2),
year number(4),
month number(2),
itemid number(4),
prdno number(3),
colorcd number(2),
sizecd number(2),
price number(4),
st_qty number(3),
sl_qty number(3),
constraint pk_t primary key(id, year, month, slno));

create table p(
itemid number(4) primary key,
displaycd varchar2(2),
itemname varchar2(10));

With Items as(
select
itemid, displaycd, itemname
from
p),
DistinctCounts as(
select
min(itemid) itemid, min(prdno) prdno, count(prdno) c2, price
from
(select
distinct prdno,colorcd, sizecd, price, itemid
from
t
where
id = 1 and
year= 2005 and
month in(1,2,3)
order by
price, prdno, colorcd, sizecd )
group by price),
Aggregates as(
select
price, min(itemid) itemid, min(prdno) prdno, max(c1) c1, sum(c3) c3,
sum(c4) c4, avg(price) c5, trunc(avg(c6),1) c6
from
(
select
itemid, month, prdno,colorcd, sizecd,
count(distinct prdno) over (partition by price) c1,
sl_qty c3,
sl_qty*price c4,
price,
trunc(st_qty/decode(sl_qty,0,1,sl_qty),1)*7 c6
from
t
where
id = 1 and
year= 2005 and
month in(1,2,3)
order by
prdno,colorcd, sizecd)
group by
price)
select
a.itemid, i.itemname, a.price, sum(c1) prdno_cnt,
c2 sku_cnt, sum(c3) sale_cnt, sum(c4) sale_price, avg(c5) avg_price,
avg(c6) avg_trend
from
DistinctCounts d, Aggregates a, Items i
where
d.prdno=a.prdno and
d.price=a.price and
i.itemid=d.itemid
group by
a.price,a.itemid, i.itemname, c2
order by
a.itemid, i.itemname, a.price
/

With this query I am able to get the report like this,

ITEMID ITEMNAME PRICE PRDNO_CNT SKU_CNT SALE_CNT SALE_PRICE AVG_PRICE AVG_TREND
---------- -------------------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
1000 Item0 150 2 8 280 42000 150 40
1000 Item0 450 1 4 110 49500 450 46.6
1001 Item1 350 1 5 270 94500 350 32.6

Is it possible to get the report in the following format along with the nonmatching itemnames and null cells as blanks.

Itemname <199 <299 <399 <499
------------------------------------------------
Item0 2 null null 1
8 null null 4
280 null null 110
42000 null null 49500
150 null null 450
40.0 null null 46.6

Item1 null null 1 null
null null 5 null
null null 270 null
null null 94500 null
null null 350 null
null null 32.6 null

Item2 null null null null
null null null null
null null null null
null null null null
null null null null
null null null null

Item3 null null null null
null null null null
null null null null
null null null null
null null null null
null null null null

Many thanks for your help and patience

Sorry

Raj, December 01, 2005 - 9:45 pm UTC

Dear Tom,

I am sorry, I did'nt post the insert statements. Sorry for being careless. I did execute everything in my system and was formating and copying it one by one, previewed and reread it before posting and still missed it. I will repost the requirements below,

This is to output sales figures for a given period of different products.

The output format should be,

1. ITEMNAME - All the items from item table whether a match occurs or not.
2. DISPLAYCD
3. PRICE
4. Count of distinct PRODNOs for an item grouped by PRICE
5. Total count of distinct( PRODNO+COLORCD+SIZECD) for an item grouped by PRICE
6. Total SL_QTY for an item grouped by PRICE
7. Total SL_QTY*PRICE for an item grouped by PRICE
8. Avg of PRICE for an item
9. Avg of (ST_QTY/SL_QTY) * 7 for an item

drop table t;
drop table p;

create table t(
id number(3),
slno number(2),
year number(4),
month number(2),
itemid number(4),
prdno number(3),
colorcd number(2),
sizecd number(2),
price number(4),
st_qty number(3),
sl_qty number(3),
constraint pk_t primary key(id, year, month, slno));

create table p(
itemid number(4) primary key,
displaycd varchar2(2),
itemname varchar2(10));

insert into t values (1,1,2005,1,1000,101,1,10,150,90,10);
insert into t values (1,2,2005,1,1000,101,1,11,150,80,20);
insert into t values (1,3,2005,1,1000,101,1,12,150,90,10);
insert into t values (1,4,2005,1,1000,101,1,13,150,80,20);
insert into t values (1,5,2005,2,1000,101,1,10,150,80,20);
insert into t values (1,25,2005,1,1000,104,1,11,150,80,20);
insert into t values (1,27,2005,1,1000,104,1,13,150,80,20);
insert into t values (1,25,2005,2,1000,104,1,11,150,80,20);
insert into t values (1,27,2005,2,1000,104,1,13,150,80,20);
insert into t values (1,26,2005,2,1000,104,1,12,150,90,10);
insert into t values (1,24,2005,2,1000,104,1,10,150,90,10);
insert into t values (1,26,2005,1,1000,104,1,12,150,90,10);
insert into t values (1,24,2005,1,1000,104,1,10,150,90,10);
insert into t values (1,6,2005,2,1000,101,1,11,150,60,40);
insert into t values (1,7,2005,2,1000,101,1,12,150,80,20);
insert into t values (1,14,2005,2,1000,101,1,13,150,80,20);
insert into t values (1,15,2005,1,1001,103,1,10,350,90,10);
insert into t values (1,23,2005,3,1001,103,1,11,350,10,90);
insert into t values (1,22,2005,3,1001,103,1,10,350,90,10);
insert into t values (1,21,2005,2,1001,103,1,11,350,80,20);
insert into t values (1,20,2005,2,1001,103,1,10,350,80,20);
insert into t values (1,19,2005,1,1001,103,1,14,350,80,20);
insert into t values (1,18,2005,1,1001,103,1,13,350,70,30);
insert into t values (1,17,2005,1,1001,103,1,12,350,40,60);
insert into t values (1,16,2005,1,1001,103,1,11,350,90,10);
insert into t values (1,8,2005,1,1000,102,1,10,450,80,20);
insert into t values (1,9,2005,1,1000,102,1,11,450,90,10);
insert into t values (1,10,2005,1,1000,102,1,12,450,90,10);
insert into t values (1,11,2005,1,1000,102,1,13,450,90,10);
insert into t values (1,12,2005,2,1000,102,1,10,450,80,10);
insert into t values (1,13,2005,2,1000,102,1,11,450,50,50);

insert into p values(1000,'AA','Item0');
insert into p values(1001,'AB','Item1');
insert into p values(1002,'AC','Item2');
insert into p values(1003,'AD','Item3');

commit;

With Items as(
select
itemid, displaycd, itemname
from
p),
DistinctCounts as(
select
min(itemid) itemid, min(prdno) prdno, count(prdno) c2, price
from
(select
distinct prdno,colorcd, sizecd, price, itemid
from
t
where
id = 1 and
year= 2005 and
month in(1,2,3)
order by
price, prdno, colorcd, sizecd )
group by price),
Aggregates as(
select
price, min(itemid) itemid, min(prdno) prdno, max(c1) c1, sum(c3) c3,
sum(c4) c4, avg(price) c5, trunc(avg(c6),1) c6
from
(
select
itemid, month, prdno,colorcd, sizecd,
count(distinct prdno) over (partition by price) c1,
sl_qty c3,
sl_qty*price c4,
price,
trunc(st_qty/decode(sl_qty,0,1,sl_qty),1)*7 c6
from
t
where
id = 1 and
year= 2005 and
month in(1,2,3)
order by
prdno,colorcd, sizecd)
group by
price)
select
a.itemid, i.itemname, a.price, sum(c1) prdno_cnt,
c2 sku_cnt, sum(c3) sale_cnt, sum(c4) sale_price, avg(c5) avg_price,
avg(c6) avg_trend
from
DistinctCounts d, Aggregates a, Items i
where
d.prdno=a.prdno and
d.price=a.price and
i.itemid=d.itemid
group by
a.price,a.itemid, i.itemname, c2
order by
a.itemid, i.itemname, a.price
/
ITEMID ITEMNAME PRICE PRDNO_CNT SKU_CNT SALE_CNT SALE_PRICE AVG_PRICE AVG_TREND
------ -------- ------ --------- -------- --------- ----------- ---------- ----------
1000 Item0 150 2 8 280 42000 150 40
1000 Item0 450 1 4 110 49500 450 47
1001 Item1 350 1 5 270 94500 350 33

Is it possible to get the report in the following format along with the
nonmatching itemnames and null cells as blanks.

Itemname <199 <299 <399 <499
------------------------------------------------
Item0 2 null null 1
8 null null 4
280 null null 110
42000 null null 49500
150 null null 450
40.0 null null 46.6

Item1 null null 1 null
null null 5 null
null null 270 null
null null 94500 null
null null 350 null
null null 32.6 null

Item2 null null null null
null null null null
null null null null
null null null null
null null null null
null null null null

Item3 null null null null
null null null null
null null null null
null null null null
null null null null
null null null null

Thanking you

December 02, 2005 - 10:48 am UTC

it wasn't just that - it was "this was too big to answer in a coupld of seconds and since I get over 1,000 of these a month, I cannot spend too much time on each one, I'd rather take NEW questions sometimes"

a single query to collapse date ranges

Bob Lyon, December 06, 2005 - 12:56 pm UTC

Tom,

I know what I want to do but can't quite get my mind around the syntax...

We want a single query to collapse date ranges under the assumption that a date range that
starts later than another range has a better value.

So given this test case

CREATE GLOBAL TEMPORARY TABLE RDL (
DATE_FROM DATE,
DATE_TO DATE,
VALUE NUMBER
);

INSERT INTO RDL VALUES (TO_DATE('01/03/2005', 'MM/DD/YYYY'), TO_DATE('01/12/2005', 'MM/DD/YYYY'), 5);
INSERT INTO RDL VALUES (TO_DATE('01/05/2005', 'MM/DD/YYYY'), TO_DATE('01/10/2005', 'MM/DD/YYYY'), 8);

-- I assume the innermost subquery would
-- use the DUAL CONNECT BY LEVEL trick to generate individual days for each grouping
-- ORDER BY DATE_FROM

1 01/03/2005 01/04/2005 5
1 01/04/2005 01/05/2005 5
1 01/05/2005 01/06/2005 5
1 01/06/2005 01/07/2005 5
1 01/07/2005 01/08/2005 5
1 01/08/2005 01/09/2005 5
1 01/09/2005 01/10/2005 5
1 01/10/2005 01/11/2005 5
1 01/11/2005 01/12/2005 5

2 01/05/2005 01/06/2005 8
2 01/06/2005 01/07/2005 8
2 01/07/2005 01/08/2005 8
2 01/08/2005 01/09/2005 8
2 01/09/2005 01/10/2005 8

-- an outer subquery would use analytics to get the max grouping

1 01/03/2005 01/04/2005 5
1 01/04/2005 01/05/2005 5
2 01/05/2005 01/06/2005 8
2 01/06/2005 01/07/2005 8
2 01/07/2005 01/08/2005 8
2 01/08/2005 01/09/2005 8
2 01/09/2005 01/10/2005 8
1 01/10/2005 01/11/2005 5
1 01/11/2005 01/12/2005 5

-- And the outermost subquery would use analytics to collapse the dates into contiguous groups
-- for the desired result

1 01/03/2005 01/05/2005 5
2 01/05/2005 01/10/2005 8
1 01/10/2005 01/12/2005 5

The trick is to do all of the above in a single query!

Any suggestions (Yeah, I know, REALLY learn analytics!)

Thanks in advance,

Bob Lyon

OK, I think I got it

Bob Lyon, December 06, 2005 - 2:25 pm UTC

SELECT d date_from, d2 date_to, value
FROM (
SELECT D, LEAD (d) OVER (ORDER BY D) d2, VALUE
FROM (
SELECT DATE_FROM D, VALUE FROM RDL
UNION
SELECT DATE_TO D, LAG (VALUE) OVER (ORDER BY DATE_FROM) VALUE FROM RDL
)
)
WHERE D2 IS NOT NULL
/

DATE_FROM DATE_TO VALUE
----------------- ----------------- ----------
01/03/05 00:00:00 01/05/05 00:00:00 5
01/05/05 00:00:00 01/10/05 00:00:00 8
01/10/05 00:00:00 01/12/05 00:00:00 5

December 06, 2005 - 3:50 pm UTC

depends on how many overlaps you allow, take your create and:


...
ops$tkyte@ORA9IR2> INSERT INTO RDL VALUES (TO_DATE('01/06/2005', 'MM/DD/YYYY'),
  2  TO_DATE('01/7/2005', 'MM/DD/YYYY'), 99);

1 row created.

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> SELECT d date_from, d2 date_to, value
  2  FROM (
  3     SELECT D, LEAD (d) OVER (ORDER BY D) d2, VALUE
  4     FROM (
  5     SELECT DATE_FROM D, VALUE FROM RDL
  6     UNION
  7     SELECT DATE_TO   D, LAG (VALUE) OVER (ORDER BY DATE_FROM) VALUE FROM RDL
  8     )
  9  )
 10  WHERE D2 IS NOT NULL
 11  /

DATE_FROM DATE_TO        VALUE
--------- --------- ----------
03-JAN-05 05-JAN-05          5
05-JAN-05 06-JAN-05          8
06-JAN-05 07-JAN-05         99
07-JAN-05 10-JAN-05          8
10-JAN-05 12-JAN-05          5


so, maybe we expand out and keep the row we want:

ops$tkyte@ORA9IR2> with
  2  data
  3  as
  4  (select level-1 l
  5    from (select max(date_to-date_from+1) n from rdl) n
  6  connect by level <= n)
  7  select rdl.date_from+l,
  8         to_number( substr( max( to_char(date_from,'yyyymmdd') ||  value ), 9 ) ) value
  9    from rdl, data
 10   where data.l <= rdl.date_to-rdl.date_from
 11   group by rdl.date_from+l
 12  ;

RDL.DATE_      VALUE
--------- ----------
03-JAN-05          5
04-JAN-05          5
05-JAN-05          8
06-JAN-05         99
07-JAN-05         99
08-JAN-05          8
09-JAN-05          8
10-JAN-05          8
11-JAN-05          5
12-JAN-05          5

10 rows selected.

Select with Analytics Working Partially

denni50, December 06, 2005 - 4:34 pm UTC

Hi Tom

have a question with the below script that is puzzling.
I'm using the 4 idnumbers as test data. I'm looking to select the most recent record where the appealcode is like '_R%' for 2005.

when I run the script it only pulls 2 of the idnumbers.
I've been looking at this and can't see why the other two are being bypassed.I'm trying to use more analytics in my code, it's working for two records and not the other two.

any tips/help greatly appreciated.


SQL> select idnumber,usercode1,substr(mon_raw,1,1)||lower(substr(mon_raw,2))||'2005' firstpaycode,
  2  paydate,payamount,transnum,ltransnum,appealcode
  3    from (
  4  select x.*, row_number() over (partition by idnumber order by payamount desc, transnum desc) rn
 
  5    from (
  6  select idnumber,usercode1,to_char(paydate,'MON') mon_raw, paydate,
  7  payamount, transnum,ltransnum,appealcode,
  8  max(paydate) over (partition by idnumber) maxpd
  9         from payment
 10   where paydate between to_date('01-JAN-2005','DD-MON-YYYY') and to_date('31-OCT-2005','DD-MON-Y
YYY')
 11         ) x
 12    where appealcode like '_R%' 
 13    and paydate=maxpd
 14    and idnumber in(4002401,4004594,5406454,5618190)
 15         )
 16   where rn = 1;

  IDNUMBER USER FIRSTPA PAYDATE    PAYAMOUNT   TRANSNUM  LTRANSNUM APPEALCODE
---------- ---- ------- --------- ---------- ---------- ---------- ----------
   4004594 ACDC May2005 17-MAY-05          0   10159410   10086183 DRE0505
   5618190 ACDC Mar2005 11-MAR-05          0    9918802    9845638 DRJ0503

SQL> 


**** 4 TEST IDNUMBERS FROM BASE TABLE*********************

SQL> select idnumber,appealcode,paydate,payamount,transnum
  2  from payment where appealcode like '_R%'
  3  and idnumber=4004594 order by paydate desc;

  IDNUMBER APPEALCODE PAYDATE    PAYAMOUNT   TRANSNUM
---------- ---------- --------- ---------- ----------
   4004594 DRE0505    17-MAY-05          0   10159410
   4004594 GRG0502    08-FEB-05          0    9804766
   4004594 GRF0501    31-JAN-05          0    9750332
   4004594 GRK0410    01-NOV-04          0    9303510
   4004594 GRC0403    19-MAR-04          0    8371053
   4004594 GRA0305    12-AUG-03          0    7543911
   4004594 GRG0303    16-APR-03          0    7209503
   4004594 GRA0301    16-FEB-03          0    7026840




  IDNUMBER APPEALCODE PAYDATE    PAYAMOUNT   TRANSNUM
---------- ---------- --------- ---------- ----------
   4002401 GRG0502    16-MAR-05         25    9862647
   4002401 GRG0502    23-FEB-05          0    9826142
   4002401 GRA0501    19-JAN-05          0    9712904
   4002401 GRF0412    05-JAN-05          0    9630884
   4002401 GRK0410    21-OCT-04          0    9299106
   4002401 GRG0303    03-MAR-03          0    7066423
   4002401 GRA0301    09-FEB-03          0    7022121



  IDNUMBER APPEALCODE PAYDATE    PAYAMOUNT   TRANSNUM
---------- ---------- --------- ---------- ----------
   5406454 DRJ0503    03-MAR-05          0    9887770
   5406454 DRG0502    28-FEB-05          0    9870637



  IDNUMBER APPEALCODE PAYDATE    PAYAMOUNT   TRANSNUM
---------- ---------- --------- ---------- ----------
   5618190 DRJ0503    11-MAR-05          0    9918802
   5618190 DRG0502    28-FEB-05          0    9870090
   5618190 GRG0502    21-FEB-05          0    9824705

December 07, 2005 - 1:32 am UTC

(i would need a create table and insert statements if you really want me to play with it)

but this predicate:

12 where appealcode like '_R%'
13 and paydate=maxpd
14 and idnumber in(4002401,4004594,5406454,5618190)

says

"only keep _R% records that had the max paydate over ALL records for that id"

to satisfy:

I'm looking to select the most recent
record where the appealcode is like '_R%' for 2005.

perhaps you mean:

select *
from (select t.*,
row_number() over (partition by idnumber sort by paydate DESC)rn
from t
where appealcode like '_R%'
and idnumber in ( 1,2,3,4 ) )
where rn = 1;

that says

"find the _R% records"
"break them up by idnumber"
"sort each group from big to small by paydate"
"keep only the first record in each group"

if

to dennis

Oraboy, December 06, 2005 - 6:09 pm UTC

Hi ,

I tried your problem and looks its working fine.

Just a quick question..Did you check the dates are really 2005 and not 0005?

(Create scripts for anyone who wants to try in future)

Create table Test_T
(IdNumber number,
AppealCode Varchar2(100),
PayDate date,
PayAmount NUmber,
TransNum Number)
/

Insert into Test_t values ( 4004594 ,'DRE0505',to_date('17-May-05','DD-MON-RR'),0,10159410 );
Insert into Test_t values ( 4004594 ,'GRG0502',to_date('8-Feb-05','DD-MON-RR'),0,9804766 );
Insert into Test_t values ( 4004594 ,'GRF0501',to_date('31-Jan-05','DD-MON-RR'),0,9750332 );
Insert into Test_t values ( 4004594 ,'GRK0410',to_date('1-Nov-04','DD-MON-RR'),0,9303510 );
Insert into Test_t values ( 4004594 ,'GRC0403',to_date('19-Mar-04','DD-MON-RR'),0,8371053 );
Insert into Test_t values ( 4004594 ,'GRA0305',to_date('12-Aug-03','DD-MON-RR'),0,7543911 );
Insert into Test_t values ( 4004594 ,'GRG0303',to_date('16-Apr-03','DD-MON-RR'),0,7209503 );
Insert into Test_t values ( 4004594 ,'GRA0301',to_date('16-Feb-03','DD-MON-RR'),0,7026840 );
Insert into Test_t values ( 4002401 ,'GRG0502',to_date('16-Mar-05','DD-MON-RR'),25,9862647 );
Insert into Test_t values ( 4002401 ,'GRG0502',to_date('23-Feb-05','DD-MON-RR'),0,9826142 );
Insert into Test_t values ( 4002401 ,'GRA0501',to_date('19-Jan-05','DD-MON-RR'),0,9712904 );
Insert into Test_t values ( 4002401 ,'GRF0412',to_date('5-Jan-05','DD-MON-RR'),0,9630884 );
Insert into Test_t values ( 4002401 ,'GRK0410',to_date('21-Oct-04','DD-MON-RR'),0,9299106 );
Insert into Test_t values ( 4002401 ,'GRG0303',to_date('3-Mar-03','DD-MON-RR'),0,7066423 );
Insert into Test_t values ( 4002401 ,'GRA0301',to_date('9-Feb-03','DD-MON-RR'),0,7022121 );
Insert into Test_t values ( 5406454 ,'DRJ0503',to_date('3-Mar-05','DD-MON-RR'),0,9887770 );
Insert into Test_t values ( 5406454 ,'DRG0502',to_date('28-Feb-05','DD-MON-RR'),0,9870637 );
Insert into Test_t values ( 5618190 ,'DRJ0503',to_date('11-Mar-05','DD-MON-RR'),0,9918802 );
Insert into Test_t values ( 5618190 ,'DRG0502',to_date('28-Feb-05','DD-MON-RR'),0,9870090 );
Insert into Test_t values ( 5618190 ,'GRG0502',to_date('21-Feb-05','DD-MON-RR'),0,9824705 );

--since the other columns are not relevant , I used dummy values in your Select statement

s61>l
1 select
2 idnumber,
3 usercode1,
4 substr(mon_raw,1,1)||lower(substr(mon_raw,2))||'2005' firstpaycode,
5 paydate,
6 payamount,
7 transnum,
8 ltransnum,
9 appealcode
10 from (
11 select x.*,
12 row_number() over (partition by idnumber order by payamount desc, transnum desc) rn
13 from (
14 select
15 idnumber,1 usercode1,to_char(paydate,'MON') mon_raw, paydate,
16 payamount, transnum,transnum ltransnum,appealcode,
17 max(paydate) over (partition by idnumber) maxpd
18 from Test_t
19 where paydate between to_date('01-JAN-2005','DD-MON-YYYY')
20 and to_date('31-OCT-2005','DD-MON-YYYY')
21 ) x
22 where appealcode like '_R%'
23 and paydate=maxpd
24 and idnumber in(4002401,4004594,5406454,5618190)
25 )
26* where rn = 1

s61>/

IDNUMBER USERCODE1 FIRSTPA PAYDATE PAYAMOUNT TRANSNUM LTRANSNUM APPEALCODE
---------- ---------- ------- --------- ---------- ---------- ---------- -----------
4002401 1 Mar2005 16-MAR-05 25 9862647 9862647 GRG0502
4004594 1 May2005 17-MAY-05 0 10159410 10159410 DRE0505
5406454 1 Mar2005 03-MAR-05 0 9887770 9887770 DRJ0503
5618190 1 Mar2005 11-MAR-05 0 9918802 9918802 DRJ0503

--added the other two columns
s61>alter table test_t add (usercode1 varchar2(100),ltransnum varchar2(100));

Table altered.

s61>update test_t set usercode1 = chr(65+ mod(rownum,3)), ltransnum=transnum+rownum;

20 rows updated.

S61> @<<ursql.txt>>
IDNUMBER USERCODE1 FIRSTPA PAYDATE
---------- ---------------------------------------------------------------------------------------------------- ------- ---------
4002401 A Mar2005 16-MAR-05
4004594 B May2005 17-MAY-05
5406454 B Mar2005 03-MAR-05
5618190 A Mar2005 11-MAR-05

-- this is just a guess on why the other two numbers didnt
--show up in your result

-- updating 2005 to 05
s61>update test_t set paydate=add_months(paydate,-(2005*12)) where idnumber=5406454
2 /

2 rows updated.

s61>update test_t set paydate=add_months(paydate,-(2005*12)) where idnumber=4002401
2 /

7 rows updated.

s61>l
1 select
2 idnumber,
3 usercode1,
4 substr(mon_raw,1,1)||lower(substr(mon_raw,2))||'2005' firstpaycode,
5 paydate,
6 payamount,
7 transnum,
8 ltransnum,
9 appealcode
10 from (
11 select x.*,
12 row_number() over (partition by idnumber order by payamount desc, transnum desc) rn
13 from (
14 select
15 idnumber,usercode1,to_char(paydate,'MON') mon_raw, paydate,
16 payamount, transnum, ltransnum,appealcode,
17 max(paydate) over (partition by idnumber) maxpd
18 from Test_t
19 where paydate between to_date('01-JAN-2005','DD-MON-YYYY')
20 and to_date('31-OCT-2005','DD-MON-YYYY')
21 ) x
22 where appealcode like '_R%'
23 and paydate=maxpd
24 and idnumber in(4002401,4004594,5406454,5618190)
25 )
26* where rn = 1
s61>/

IDNUMBER US FIRSTPA PAYDATE PAYAMOUNT TRANSNUM LTRANSNUM APPEALCODE
---------- -- ------- --------- ---------- ---------- ---------- -------------------------------------
4004594 B May2005 17-MAY-05 0 10159410 10159411 DRE0505
5618190 A Mar2005 11-MAR-05 0 9918802 9918820 DRJ0503

-- same as what you see

Thanks Tom and Oraboy

denni50, December 07, 2005 - 8:37 am UTC

Oraboy...you brought up a good possibility..although
the data gets posted through canned software...users
are responsible for creating the batch headers before
posting batches and what may have happened is a user
inadvertently inserted year 0005 instead of '2005'
for a particular batch that included those idnumbers
I am testing with here. It's happened before.

thanks for that helpful tip!

:~)

Oraboy and Tom

denni50, December 07, 2005 - 8:57 am UTC

Oraboy:
it was not the year, I did testing using '0005' to see
if those two records would output results and it did not.

I changed the script based on logic that Tom suggested
and it worked...see changes below.

thanks Oraboy for your input and help.

SQL> select idnumber,usercode1,substr(mon_raw,1,1)||lower(substr(mon_raw,2))||'2005' firstpaycode
  2  paydate,payamount,transnum,ltransnum,appealcode
  3    from (
  4  select x.*, row_number() over (partition by idnumber order by paydate desc,payamount desc, t
snum desc) rn 
  5    from (
  6  select idnumber,usercode1,to_char(paydate,'MON') mon_raw, paydate,
  7  payamount, transnum,ltransnum,appealcode
  8  --max(paydate) over (partition by idnumber) maxpd
  9         from payment
 10   where paydate between to_date('01-JAN-2005','DD-MON-YYYY') and to_date('31-OCT-2005','DD-MO
YYY')
 11         ) x
 12    where appealcode like '_R%' 
 13    --and paydate=maxpd
 14    and idnumber in(4002401,4004594,5406454,5618190)
 15         )
 16   where rn = 1;

  IDNUMBER USER FIRSTPA PAYDATE    PAYAMOUNT   TRANSNUM  LTRANSNUM APPEALCODE
---------- ---- ------- --------- ---------- ---------- ---------- ----------
   4002401 ACGA Mar2005 16-MAR-05         25    9862647    9789477 GRG0502
   4004594 ACDC May2005 17-MAY-05          0   10159410   10086183 DRE0505
   5406454 ACDC Mar2005 03-MAR-05          0    9887770    9814606 DRJ0503
   5618190 ACDC Mar2005 11-MAR-05          0    9918802    9845638 DRJ0503

SQL>

anyway to do a dynamic lag?

Ryan, December 07, 2005 - 10:27 pm UTC

Is it possible to use lag, but you don't know how many rows you want to go back?
create table history (
history_id number,
history_sequence number,
history_status varchar2(20),
history_balance number);
insert into history(1,123,'HISTORY 1',10);
insert into history(1,128,'PROCESSED',0);
insert into history(1,130,'PROCESSED',0);
insert into history(1,131,'HISTORY 8',15);
insert into history(1,145,'PROCESSED',0);
for each history_id ordered by history_sequence
loop
if status = 'PROCESSED' then
history_balance = the history_balance of the last record where status != 'PROCESSED'
end if;
end loop;
Typically with lag you have to state how many rows you are looking back, in this case my discriminator is based on the value in the status field?
After this is run, I expect the values to be
1,123,'HISTORY 1',10
1,128,'PROCESSED',10
1,130,'PROCESSED',10
1,131,'HISTORY 8',15
1,145,'PROCESSED',15
I can do this with pl/sql. I am trying to figure out how to do this with straight sql.

December 08, 2005 - 2:03 am UTC

last_value with ignore nulls in 10g, or to_number(substr(max in 9i and before can be used....


ops$tkyte@ORA10GR2> select history_id, history_sequence, history_status, history_balance,
  2         last_value(
  3         case when history_status <> 'PROCESSED'
  4                  then history_balance
  5                  end IGNORE NULLS ) over (order by history_sequence ) last_hb
  6    from history
  7  /

HISTORY_ID HISTORY_SEQUENCE HISTORY_STATUS       HISTORY_BALANCE    LAST_HB
---------- ---------------- -------------------- --------------- ----------
         1              123 HISTORY 1                         10         10
         1              128 PROCESSED                          0         10
         1              130 PROCESSED                          0         10
         1              131 HISTORY 8                         15         15
         1              145 PROCESSED                          0         15

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> select history_id, history_sequence, history_status, history_balance,
  2         to_number( substr( max(
  3         case when history_status <> 'PROCESSED'
  4                  then to_char(history_sequence,'fm0000000000' ) || history_balance
  5                  end ) over (order by history_sequence ), 11 ) ) last_hb
  6    from history
  7  /

HISTORY_ID HISTORY_SEQUENCE HISTORY_STATUS       HISTORY_BALANCE    LAST_HB
---------- ---------------- -------------------- --------------- ----------
         1              123 HISTORY 1                         10         10
         1              128 PROCESSED                          0         10
         1              130 PROCESSED                          0         10
         1              131 HISTORY 8                         15         15
         1              145 PROCESSED                          0         15

Query

Mark, January 18, 2006 - 5:02 pm UTC

Hi Tom,

Given a Table:
PK_ID NUMBER (PK)
CLASS_ID NUMBER
MY_DATE DATE

I'd like to develop an output like:

CLASS_ID W1 W2 W3...

where W1, W2, W3... are 'weeks' from SYSDATE, using MY_DATE, holding the counts of the CLASS_ID for that row.

How could I do that?

Thanks!

About to read Ch. 12 in Expert one-on-one...

January 19, 2006 - 12:28 pm UTC

if

a) you have a finite number of weeks (eg: a sql query has N columns at parse time, unless you know what N is...)

b) an example <<<=== creates/inserts

we could play with this.

ok, here we go

Mark, January 20, 2006 - 1:31 pm UTC

drop table cts_temp
/
create table cts_temp
( class_id number,
cts_date date)
/
insert into cts_temp
select round(dbms_random.value(1, 20)), trunc(created) from user_objects
/
Output sought:

CLASS_ID W1 W2 W3 W4 W5 W6 W7 W8 W9 W10+
----------------------------------------
1 1 3 7 5 1 0 0 0 0 12
2 1 5 1 0 0 0 5 3 4 6
3 1 0 0 9 0 1 1 5 1 10
...

where W# = # of Weeks away from current date
therefore, W1 is within 7 days of today, W10+ everything 10 weeks and older. These numbers are are Counts of records.

I have done this in the past with DECODE statements, but am looking for a more efficient way to do this using Analytics.

January 20, 2006 - 2:47 pm UTC

but you will not want to use analytics since you NEED TO AGGREGATE.

decode (or case) is the correct approach to this problem, keep using that.

SQL Query

Parag J Patankar, January 23, 2006 - 7:26 am UTC

Hi Tom,

Wish you very happy new year 2006. I have a table 

create table t ( a number(5), b number(6), c number(1), d varchar2(8), e number(10));

insert into t values ( 09009, 1000, 1, 'RIS00001', 100);
insert into t values ( 09009, 1000, 0, 'RIS00001', 200);
insert into t values ( 09009, 1000, 0, 'RIS00001', 300);
insert into t values ( 09009, 1000, 2, 'RIS00001', 400)

insert into t values(09009, 5000, 2, 'BIC77777', 100);
insert into t values(09009, 5000, 2, 'BIC77777', 100);

insert into t values(09009, 6000, 0, 'DIG00077', 100);
insert into t values(09009, 6000, 0, 'DIG00077', 200);
commit;

17:33:38 SQL> select * from t;

         A          B          C D                 E
---------- ---------- ---------- -------- ----------
      9009       1000          1 RIS00001        100
      9009       1000          0 RIS00001        200
      9009       1000          0 RIS00001        300
      9009       5000          2 BIC77777        100
      9009       5000          2 BIC77777        100
      9009       6000          0 DIG00077        100
      9009       6000          0 DIG00077        200

7 rows selected.

In Column "C" values can be 0, 1, 2. 

Now I want to select only those set records for column c value is not 1 for the combination of same a, b, and d.

For e.g I want output like 

      9009       5000          2 BIC77777        100
      9009       5000          2 BIC77777        100
      9009       6000          0 DIG00077        100
      9009       6000          0 DIG00077        200

RIS00001 records should not appear because for column c value 1 once appeared for 09009 10000. 

How can I do this in most efficient way ?

Currently it is very large table and joined to few tables in a query.

best regards
pjp

January 23, 2006 - 10:34 am UTC

for big "all rows" I would go analytics

ops$tkyte@ORA9IR2> select *
  2    from (
  3  select t.*,
  4         max( case when c = 1 then c end ) over (partition by a, b, d ) c_one
  5    from t
  6        )
  7   where c_one is null
  8  /

         A          B          C D                 E      C_ONE
---------- ---------- ---------- -------- ---------- ----------
      9009       5000          2 BIC77777        100
      9009       5000          2 BIC77777        100
      9009       6000          0 DIG00077        100
      9009       6000          0 DIG00077        200


for possibly getting "first row as fast as I can", I might opt for not exists or not in


ops$tkyte@ORA9IR2> select *
  2    from t
  3   where (a,b,d) not in (select a,b,d from t where c = 1 and a is not null and b is not null and d is
  4   not null )
  5  /

         A          B          C D                 E
---------- ---------- ---------- -------- ----------
      9009       5000          2 BIC77777        100
      9009       5000          2 BIC77777        100
      9009       6000          0 DIG00077        100
      9009       6000          0 DIG00077        200

ops$tkyte@ORA9IR2> select *
  2    from t
  3   where not exists (select null from t t2
  4                      where t2.a = t.a and t2.b = t.b and t2.d = t.d and t2.c = 1 )
  5  /

         A          B          C D                 E
---------- ---------- ---------- -------- ----------
      9009       5000          2 BIC77777        100
      9009       5000          2 BIC77777        100
      9009       6000          0 DIG00077        100
      9009       6000          0 DIG00077        200

Eliminating distinct rows

Avishay, February 20, 2006 - 8:48 am UTC

Hello Tom,

I have a view that UNION ALL 3 tables
here is the result for a select * on that view(f_emp_v):

PER_ID C_A_ID WORK_H FTE CALC_FTE IN_USE_FROM IN_USE_UNTIL
111111 20 1/1/2005 5/12/2005
111111 123456 1/23/2005 1/24/2005
111111 123459 1/25/2005
111111 60 75 5/12/2005 5/13/2005
111111 30 5/13/2005 1/1/2006
111111 85 55 5/13/2005

Using the following SQL and analytical functions I fillied the NULL's, and created the IN_USE_FROM,IN_USE_UNTIL columns in a different way the IN_USE_UNTIL recieves the next date of the IN_USE_FROM ASC.
Here is the SQL:

SELECT Person_Id,
Substr(MAX(Decode(Cost_Account_Id,
NULL,
NULL,
To_Char(In_Use_From, 'yyyymmddhh24miss') ||
Cost_Account_Id))
Over(PARTITION BY Person_Id ORDER BY In_Use_From),
15) Cost_Account_Id,
Substr(MAX(Decode(Cost_Account_Code,
NULL,
NULL,
To_Char(In_Use_From, 'yyyymmddhh24miss') ||
Cost_Account_Code))
Over(PARTITION BY Person_Id ORDER BY In_Use_From),
15) Cost_Account_Code,
Substr(MAX(Decode(Working_Hours,
NULL,
NULL,
To_Char(In_Use_From, 'yyyymmddhh24miss') ||
Working_Hours))
Over(PARTITION BY Person_Id ORDER BY In_Use_From),
15) Working_Hours,
Substr(MAX(Decode(Fte,
NULL,
NULL,
To_Char(In_Use_From, 'yyyymmddhh24miss') || Fte))
Over(PARTITION BY Person_Id ORDER BY In_Use_From),
15) Fte,
Substr(MAX(Decode(Calc_Fte_Type,
NULL,
NULL,
To_Char(In_Use_From, 'yyyymmddhh24miss') ||
Calc_Fte_Type))
Over(PARTITION BY Person_Id ORDER BY In_Use_From),
15) Calc_Fte_Type,
In_Use_From,
Lead(t.In_Use_From, 1, In_Use_Until) Over(PARTITION BY t.Person_Id ORDER BY t.In_Use_From ASC) In_Use_Until
FROM Fact_Employee_List_v t
WHERE person_id = 111111
ORDER BY Person_Id,
In_Use_From;
Here are the results:

PER_ID C_A_ID WORK_H FTE CALC_FTE IN_USE_FROM IN_USE_UNTIL
111111 20 1/1/2005 1/23/2005
111111 123456 20 1/23/2005 1/25/2005
111111 123459 20 1/25/2005 5/12/2005
111111 123459 20 60 75 5/12/2005 5/13/2005
111111 123459 30 85 55 5/13/2005 5/13/2005
111111 123459 30 85 55 5/13/2005

The table "Fills" in accordance with the dates.
As you can see the last 2 rows except for the IN_USE_UNTIL are identical.
How can I get 'Rid' of the row with IN_USE_UNTIL NOT NULL ?
Is there a way to do it in the above select?
Maybe change the way the analytical function for IN_US_UNTIL ?

Your remarks will be appreciated
Best Regards,
Avishay

Update using LAG

Zahir M, February 21, 2006 - 2:16 pm UTC

SQL> desc tab1
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 MEMBER_ID                                          NUMBER(10)
 START_DATE                                         DATE
 STOP_DATE                                          DATE

SQL> Select * from tab1 where member_id = 125;

 MEMBER_ID START_DAT STOP_DATE
---------- --------- ---------
       125 23-OCT-00
       125 05-MAY-04
       125 10-MAY-04
       125 30-MAR-05

SQL> Select wh.* , lag(start_date) over ( partition by member_id order by start_date asc)  - 1 new_s
top_date
  2   from tab1 wh where member_id = 125
  3  
SQL> /

 MEMBER_ID START_DAT STOP_DATE NEW_STOP_
---------- --------- --------- ---------
       125 23-OCT-00
       125 05-MAY-04           22-OCT-00
       125 10-MAY-04           04-MAY-04
       125 30-MAR-05           09-MAY-04

SQL> Update tab1 a
  2  set a.stop_date = ( SElect 
  3        lag(b.start_date) 
  4        over ( partition by b.member_id order by b.start_date asc)  - 1 new_stop_date 
  5               from tab1 b where a.member_id = b.member_id and a.rowid = b.rowid ) ;

4 rows updated.

SQL>  Select * from tab1 where member_id = 125;

 MEMBER_ID START_DAT STOP_DATE
---------- --------- ---------
       125 23-OCT-00
       125 05-MAY-04
       125 10-MAY-04
       125 30-MAR-05


SQL>  select * from v$version
  2  /

BANNER
------------------------------------------------------------
Oracle8i Enterprise Edition Release 8.1.7.4.1 - Production
PL/SQL Release 8.1.7.4.0 - Production
CORE    8.1.7.2.1       Production
TNS for 32-bit Windows: Version 8.1.7.4.0 - Production
NLSRTL Version 3.4.1.0.0 - Production

I am trying to use the LAG analytic function for an update statement . It does not seem to work.

Still , the column STOP_DATE is not updated with the new values( ie from the lag ).

Please advise.

February 22, 2006 - 8:16 am UTC

Oh, it is absolutely working!!! The problem is - where clause is applied

where a.member_id = b.member_id and a.rowid = b.rowid

AND THEN analytic is performed - of course, there is only one row however, so, well - the "previous" row isn't there anymore.



ops$tkyte@ORA10GR2> create table emp
  2  as
  3  select job, hiredate, to_date(null) last_hiredate
  4    from scott.emp;

Table created.

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> merge into emp
  2  using ( select rowid rid, lag(hiredate) over (partition by job order by hiredate) last_hiredate
  3            from emp) e2
  4  on ( emp.rowid = e2.rid )
  5  when matched then update set last_hiredate = e2.last_hiredate
  6  -- when not matched then insert (job) values (NULL)
  7  /

14 rows merged.

ops$tkyte@ORA10GR2> select * from emp order by job, hiredate;

JOB       HIREDATE  LAST_HIRE
--------- --------- ---------
ANALYST   03-DEC-81
ANALYST   09-DEC-82 03-DEC-81
CLERK     17-DEC-80
CLERK     03-DEC-81 17-DEC-80
CLERK     23-JAN-82 03-DEC-81
CLERK     12-JAN-83 23-JAN-82
MANAGER   02-APR-81
MANAGER   01-MAY-81 02-APR-81
MANAGER   09-JUN-81 01-MAY-81
PRESIDENT 17-NOV-81
SALESMAN  20-FEB-81
SALESMAN  22-FEB-81 20-FEB-81
SALESMAN  08-SEP-81 22-FEB-81
SALESMAN  28-SEP-81 08-SEP-81

14 rows selected.


In 9i, you need the "insert" part - but it'll never happen.


(but really, this looks like a bad idea, you'll have to just keep doing this over and over and over)


In 8i, you'll likely want to "two step this", create global temporary table, insert and update the join.

LEAD UPDATE

Zahir M, February 22, 2006 - 10:45 am UTC

Tom ,

I did the "two steps process" in 8i as you have suggested .
( except that I created a another table - not a global temp table ) . But it takes long duration to perform the update .

130874 rows updated.

Elapsed: 07:469:28146.33

Statistics
----------------------------------------------------------
0 recursive calls
4455554 db block gets
105093306 consistent gets
14702345 physical reads
30683572 redo size
852 bytes sent via SQL*Net to client
670 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
3 sorts (memory)
0 sorts (disk)
130874 rows processed

February 22, 2006 - 11:05 am UTC

no example :(


should only take seconds for so few rows.  



ops$xp8i\tkyte@ORA8IR3W> alter table big_table add last_created date;

Table altered.

Elapsed: 00:00:00.63
ops$xp8i\tkyte@ORA8IR3W> create table t
  2  ( rid rowid primary key, last_created date );

Table created.

Elapsed: 00:00:00.78
ops$xp8i\tkyte@ORA8IR3W>
ops$xp8i\tkyte@ORA8IR3W> insert into t
  2  select rowid, lag(created) over (partition by object_type order by created)
  3    from big_table;

130874 rows created.

Elapsed: 00:00:36.03
ops$xp8i\tkyte@ORA8IR3W> exec dbms_stats.gather_table_stats( user, 'T' );

PL/SQL procedure successfully completed.

Elapsed: 00:00:06.38
ops$xp8i\tkyte@ORA8IR3W>
ops$xp8i\tkyte@ORA8IR3W> update (select a.last_created new_last_created,
  2                 b.last_created old_last_created
  3                    from t a, big_table b
  4                   where a.rid = b.rowid )
  5     set old_last_created = new_last_created;

130874 rows updated.

Elapsed: 00:00:32.41



Maybe you are getting blocked by other users, since you are updating every row, lock the table first - then update it.

LAG Update

Zahir M, February 22, 2006 - 11:32 am UTC

Thanks , Tom.

I re-ran the update after locking the table.
It took only 47 seconds for the update operation.

I guess it was locked by some other users / processes.

Thanks again !

Reposting

Avishay, February 23, 2006 - 4:28 am UTC

Hello Tom,

I have a view that UNION ALL 3 tables
here is the result for a select * on that view(f_emp_v):

PER_ID C_A_ID WORK_H FTE CALC_FTE IN_USE_FROM IN_USE_UNTIL
111111 20 1/1/2005 5/12/2005
111111 123456 1/23/2005 1/24/2005
111111 123459 1/25/2005
111111 60 75 5/12/2005 5/13/2005
111111 30 5/13/2005 1/1/2006
111111 85 55 5/13/2005

Using the following SQL and analytical functions I fillied the NULL's, and
created the IN_USE_FROM,IN_USE_UNTIL columns in a different way the IN_USE_UNTIL
recieves the next date of the IN_USE_FROM ASC.
Here is the SQL:

SELECT Person_Id,
Substr(MAX(Decode(Cost_Account_Id,
NULL,
NULL,
To_Char(In_Use_From, 'yyyymmddhh24miss') ||
Cost_Account_Id))
Over(PARTITION BY Person_Id ORDER BY In_Use_From),
15) Cost_Account_Id,
Substr(MAX(Decode(Cost_Account_Code,
NULL,
NULL,
To_Char(In_Use_From, 'yyyymmddhh24miss') ||
Cost_Account_Code))
Over(PARTITION BY Person_Id ORDER BY In_Use_From),
15) Cost_Account_Code,
Substr(MAX(Decode(Working_Hours,
NULL,
NULL,
To_Char(In_Use_From, 'yyyymmddhh24miss') ||
Working_Hours))
Over(PARTITION BY Person_Id ORDER BY In_Use_From),
15) Working_Hours,
Substr(MAX(Decode(Fte,
NULL,
NULL,
To_Char(In_Use_From, 'yyyymmddhh24miss') || Fte))
Over(PARTITION BY Person_Id ORDER BY In_Use_From),
15) Fte,
Substr(MAX(Decode(Calc_Fte_Type,
NULL,
NULL,
To_Char(In_Use_From, 'yyyymmddhh24miss') ||
Calc_Fte_Type))
Over(PARTITION BY Person_Id ORDER BY In_Use_From),
15) Calc_Fte_Type,
In_Use_From,
Lead(t.In_Use_From, 1, In_Use_Until) Over(PARTITION BY t.Person_Id ORDER
BY t.In_Use_From ASC) In_Use_Until
FROM Fact_Employee_List_v t
WHERE person_id = 111111
ORDER BY Person_Id,
In_Use_From;
Here are the results:

PER_ID C_A_ID WORK_H FTE CALC_FTE IN_USE_FROM IN_USE_UNTIL
111111 20 1/1/2005 1/23/2005
111111 123456 20 1/23/2005 1/25/2005
111111 123459 20 1/25/2005 5/12/2005
111111 123459 20 60 75 5/12/2005 5/13/2005
111111 123459 30 85 55 5/13/2005 5/13/2005
111111 123459 30 85 55 5/13/2005

The table "Fills" in accordance with the dates.
As you can see the last 2 rows except for the IN_USE_UNTIL are identical.
How can I get 'Rid' of the row with IN_USE_UNTIL NOT NULL ?
Is there a way to do it in the above select?
Maybe change the way the analytical function for IN_US_UNTIL ?

Your remarks will be appreciated
Best Regards,
Avishay

February 23, 2006 - 8:07 am UTC

why did you repost it.

You must have seen the page you used to post this. did you *READ* that page? I ignore all things that look like this - you have the classic example of what I ignore.

slow down, read the page you used to post this repost.

Analytics

Mark, February 23, 2006 - 10:42 am UTC

Hi Tom,

Oracle 9i latest and greatest...

At a loss for this one. Don't know where to start...

I have a table:
Create Table MY_NUMS
(N1 NUMBER, N2 NUMBER, N3 NUMBER, N4 NUMBER, N5 NUMBER, N6 NUMBER)
/

I populate each column with random values between 1-46:

INSERT INTO my_nums
SELECT TRUNC(DBMS_RANDOM.VALUE(1, 47), 0)
,TRUNC(DBMS_RANDOM.VALUE(1, 47), 0)
,TRUNC(DBMS_RANDOM.VALUE(1, 47), 0)
,TRUNC(DBMS_RANDOM.VALUE(1, 47), 0)
,TRUNC(DBMS_RANDOM.VALUE(1, 47), 0)
,TRUNC(DBMS_RANDOM.VALUE(1, 47), 0)
FROM user_objects;
/

and I get data similar to this:
HT4:DEVDB001101007:10:10 - DEV> select * from my_nums where rownum <= 10
2 /

N1 N2 N3 N4 N5 N6
---------- ---------- ---------- ---------- ---------- ----------
6 13 11 6 21 36
33 23 45 11 24 32
36 19 43 19 8 44
11 39 9 14 35 25
42 8 29 15 26 4
1 25 12 41 21 10
20 6 43 29 39 28
16 18 36 15 38 26
16 33 15 16 40 18
17 1 20 39 20 46

10 rows selected.

And my question is how do I get counts of how many times each number appears with every other number?

Output might look like this:

MY_NUMBER OTHER_NUM COUNT(*)
--------- --------- --------
16 15 3
...

This says that the number 16 appeared 3 times with the number 15 in the same row.

I'd have to check each column (N1 - N6) against each of the other 5 columns for each value and sum them up...

Regards,
Mark

February 23, 2006 - 10:47 am UTC

are my_number and other_num INPUTS into your query or what? where did 16 and 15 come from.

Analytics

Mark, February 23, 2006 - 12:28 pm UTC

Oh, ok.

MY_NUMBER is the number I am counting the combinations with all OTHER_NUMBERs.

Output would ideally look like this:

MY_NUMBER OTHER_NUMBER COUNT(*)
--------- ------------ --------
1 2 5
1 3 6
1 4 2
...
2 3 4
2 4 7
2 5 8
...

etc., all the way down to
MY_NUMBER OTHER_NUMBER COUNT(*)
--------- ------------ --------
45 46 7

There are no INPUTS to the query as it calculates counts for all combinations.
This query somewhat does it, but I feel there is a way better way than doing it with decodes:

SELECT n1
,SUM(n2_2 + n3_2 + n4_2 + n5_2 + n6_5) two
,SUM(n2_3 + n3_3 + n4_3 + n5_3 + n6_5) three
,SUM(n2_4 + n3_4 + n4_4 + n5_4 + n6_5) four
,SUM(n2_5 + n3_5 + n4_5 + n5_5 + n6_5) five
,SUM(n2_46 + n3_46 + n4_46 + n5_46 + n6_46) fortysix
FROM (SELECT n1
,DECODE(n2, 2, 1, 0) n2_2
,DECODE(n2, 3, 1, 0) n2_3
,DECODE(n2, 4, 1, 0) n2_4
,DECODE(n2, 5, 1, 0) n2_5
/* n2_6 through n2_45 here ... */
,DECODE(n2, 46, 1, 0) n2_46
,DECODE(n3, 2, 1, 0) n3_2
,DECODE(n3, 3, 1, 0) n3_3
,DECODE(n3, 4, 1, 0) n3_4
,DECODE(n3, 5, 1, 0) n3_5
,DECODE(n3, 46, 1, 0) n3_46
,DECODE(n4, 2, 1, 0) n4_2
,DECODE(n4, 3, 1, 0) n4_3
,DECODE(n4, 4, 1, 0) n4_4
,DECODE(n4, 5, 1, 0) n4_5
,DECODE(n4, 46, 1, 0) n4_46
,DECODE(n5, 2, 1, 0) n5_2
,DECODE(n5, 3, 1, 0) n5_3
,DECODE(n5, 4, 1, 0) n5_4
,DECODE(n5, 5, 1, 0) n5_5
,DECODE(n5, 46, 1, 0) n5_46
,DECODE(n6, 2, 1, 0) n6_2
,DECODE(n6, 3, 1, 0) n6_3
,DECODE(n6, 4, 1, 0) n6_4
,DECODE(n6, 5, 1, 0) n6_5
,DECODE(n6, 46, 1, 0) n6_46
FROM my_nums)
GROUP BY n1
/
N1 TWO THREE FOUR FIVE FORTYSIX
---------- ---------- ---------- ---------- ---------- ----------
1 27 10 14 19 16
2 10 12 12 22 21
3 13 15 13 12 16
4 25 15 24 13 23
5 19 15 16 14 19
6 16 15 13 8 30
7 18 13 13 19 15
8 14 14 12 18 14
9 16 16 18 16 22
10 14 12 16 14 19
...

for the entire matrix of number.

This output says that the number 2 in columns N2,N3,N4,N5,N6 appeared 27 times with the number 1. The number 3 in columns N2 - N6 appeared 10 times with the number 1, etc.

A sort of 'how many times does this number appear with all thes other numbers in the same row" query...

Thanks.

February 23, 2006 - 7:01 pm UTC

I was beaten to the punch on this one ;) see below

A solution

Michel Cadot, February 23, 2006 - 3:10 pm UTC

Hi Mark,

Here's a solution to your issue.
I changed the values to be able to post the whole example but the query does not care about what's inside the table.
I let all the steps but i think you can compact the query.

SQL> INSERT INTO my_nums
  2     SELECT TRUNC(DBMS_RANDOM.VALUE(1, 9), 0)
  3           ,TRUNC(DBMS_RANDOM.VALUE(1, 9), 0)
  4           ,TRUNC(DBMS_RANDOM.VALUE(1, 9), 0)
  5           ,TRUNC(DBMS_RANDOM.VALUE(1, 9), 0)
  6           ,TRUNC(DBMS_RANDOM.VALUE(1, 9), 0)
  7           ,TRUNC(DBMS_RANDOM.VALUE(1, 9), 0)
  8     FROM user_views
  9     where rownum <= 3
 10  /

3 rows created.

SQL> commit;

Commit complete.

SQL> select * from my_nums;
        N1         N2         N3         N4         N5         N6
---------- ---------- ---------- ---------- ---------- ----------
         7          4          6          8          2          1
         1          2          8          8          7          2
         6          3          5          1          4          8

3 rows selected.

SQL> with 
  2    a as ( select row_number () over (order by n1) rn,
  3                  n1, n2, n3, n4, n5, n6
  4           from my_nums
  5         ),
  6    b as ( select rn, n1, n2, n3, n4, n5, n6, part
  7           from a, 
  8                (select rownum part from dual connect by level <= 15)
  9         ),
 10    c as ( select distinct rn, 
 11                           case part
 12                           when  1 then least(n1,n2)
 13                           when  2 then least(n1,n3)
 14                           when  3 then least(n1,n4)
 15                           when  4 then least(n1,n5)
 16                           when  5 then least(n1,n6)
 17                           when  6 then least(n2,n3)
 18                           when  7 then least(n2,n4)
 19                           when  8 then least(n2,n5)
 20                           when  9 then least(n2,n6)
 21                           when 10 then least(n3,n4)
 22                           when 11 then least(n3,n5)
 23                           when 12 then least(n3,n6)
 24                           when 13 then least(n4,n5)
 25                           when 14 then least(n4,n6)
 26                           when 15 then least(n5,n6)
 27                           end v1,
 28                           case part
 29                           when  1 then greatest(n1,n2)
 30                           when  2 then greatest(n1,n3)
 31                           when  3 then greatest(n1,n4)
 32                           when  4 then greatest(n1,n5)
 33                           when  5 then greatest(n1,n6)
 34                           when  6 then greatest(n2,n3)
 35                           when  7 then greatest(n2,n4)
 36                           when  8 then greatest(n2,n5)
 37                           when  9 then greatest(n2,n6)
 38                           when 10 then greatest(n3,n4)
 39                           when 11 then greatest(n3,n5)
 40                           when 12 then greatest(n3,n6)
 41                           when 13 then greatest(n4,n5)
 42                           when 14 then greatest(n4,n6)
 43                           when 15 then greatest(n5,n6)
 44                           end v2
 45           from b
 46         )
 47  select v1, v2, count(*) nb
 48  from c
 49  where v1 != v2
 50  group by v1, v2
 51  order by v1, v2
 52  /
        V1         V2         NB
---------- ---------- ----------
         1          2          2
         1          3          1
         1          4          2
         1          5          1
         1          6          2
         1          7          2
         1          8          3
         2          4          1
         2          6          1
         2          7          2
         2          8          2
         3          4          1
         3          5          1
         3          6          1
         3          8          1
         4          5          1
         4          6          2
         4          7          1
         4          8          2
         5          6          1
         5          8          1
         6          7          1
         6          8          2
         7          8          2

24 rows selected.

Regards
Michel

Excellent

Mark, February 23, 2006 - 3:35 pm UTC

Thanks

To Mark ... COUNT of what?

A reader, February 23, 2006 - 4:07 pm UTC

<quote>And my question is how do I get counts of how many times each number appears with every other number?</quote>

flip@FLOP> select * from my_nums;

N1 N2 N3 N4 N5 N6
---------- ---------- ---------- ---------- ---------- ----------
1 2 2 2 2 2

flip@FLOP> @michel_qry

V1 V2 NB
---------- ---------- ----------
1 2 1

Should the answer here be:
A. "1" [as in the number of rows where 1 is together with 2]
or
B. "5"
?

To A reader

Michel Cadot, February 23, 2006 - 4:46 pm UTC

The "distinct" in c definition is there to count only 1 per each row.
If you want to count all occurrences then remove "distinct".

Regards
Michel

Analytics

Mark, February 23, 2006 - 5:30 pm UTC

The answer to that should be "5" as 1 and 2 appear as a combination 5 times.

Regards,
Mark

Difficulty with min(id) over partition by

Ken, February 27, 2006 - 11:38 pm UTC

Hi Tom:

We have a table with over 4 million records. A bug in the java application has created duplicate records in this table. I am brought in to dedupe the records based on certain criteria. For every duplicate record set, we need to keep the record with the smallest value in the ID field and delete the rest.

I am close to identifying the IDs to be deleted but the query is not identifying the records with min(id) consistently. Here is the table structure (created for this test case -- no index or anything) and query.

Thanks in advance for your help.

ken@DEV9206> desc DEDUPE_TEST_01
Name Null? Type
----------------------------------- -------- --------
EQUIP_ID NOT NULL NUMBER
EQUIP_TYPE_ID NOT NULL NUMBER
TYPE_ID NOT NULL NUMBER
CREATED NOT NULL DATE
COL_1 NUMBER
COL_0 NUMBER
ID NOT NULL NUMBER
CT NUMBER

ken@DEV9206> l
1 SELECT equip_id, equip_type_id, type_id, TO_CHAR(CREATED, 'DDMMYYYY HH24:MI:SS') created,
2 col_1, col_0, MIN(id) OVER (PARTITION BY equip_id
3 ORDER BY equip_id, equip_type_id, type_id, TO_CHAR(created, 'DDMMYYYY HH24:MI:SS'),
4 col_1, col_0) min_id,
5 id
6 from dedupe_test_01
7* where rownum < 31
ken@DEV9206> /

EQUIP_ID EQUIP_TYPE_ID TYPE_ID CREATED COL_1 COL_0 MIN_ID ID
-------- ------------- ------- ----------------- ----- ----- --------- --------
3011 29 221 30110002 00:00:00 3 77 26445635 26445635
3011 29 221 30110002 00:00:00 3 77 26445651
3011 29 221 30110002 00:00:00 3 86 26445626 26445626
3011 29 221 30110002 00:00:00 3 86 26445653
3011 29 221 30110002 00:00:00 3 112 26445617 26445617
3011 29 221 30110002 00:00:00 3 112 26445620
3011 29 221 30110002 00:00:00 3 125 26445631*
3011 29 221 30110002 00:00:00 3 125 26445641
3551 29 221 11032005 17:01:00 1 186 3209093 3209093
3551 29 221 11032005 17:01:00 1 186 6072228
3551 29 221 11032005 17:01:00 1 186 8894681
3551 29 221 11032005 17:01:00 1 186 3837758
3551 29 221 11032005 17:01:00 1 186 3837738
3551 29 221 11032005 20:44:00 1 190 3209092 3209092
3551 29 221 11032005 20:44:00 1 190 3837757
3551 29 221 11032005 20:44:00 1 190 6072227
3551 29 221 11032005 20:44:00 1 190 8894680
3551 29 221 11032005 20:44:00 1 190 3837737
3551 29 221 11032005 23:00:00 1 227 3209091 3209091
3551 29 221 11032005 23:00:00 1 227 3837736
3551 29 221 30110002 00:00:00 3 112 3209094*
3551 29 221 30110002 00:00:00 3 112 3837739
3551 29 221 30110002 00:00:00 3 112 3837759
3551 29 221 30110002 00:00:00 3 112 6072229
3551 29 221 30110002 00:00:00 3 112 8894682
3551 29 221 30110002 00:00:00 3 118 3209095*
3551 29 221 30110002 00:00:00 3 118 3837740
3551 29 221 30110002 00:00:00 3 118 3837760
3551 29 221 30110002 00:00:00 3 118 6072230
3551 29 221 30110002 00:00:00 3 118 8894683

30 rows selected.

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 WINDOW (SORT)
2 1 COUNT (STOPKEY)
3 2 TABLE ACCESS (FULL) OF 'DEDUPE_TEST_01'

February 28, 2006 - 7:14 am UTC

... A bug in the java application has
created duplicate records in this table. ...

You really meant to say "a horribly flawed implementation whereby the java coders decided to do all DATA LOGIC in the wrongest place in the world - the application has exposed itself. We know there are dozens more of these lying in wait for us"

That is identifying the min id very very very very consistently.

but why are you sorting? If you want the min(id) by equip_id, you should leave out the order by - else you get the minimum ID for the current row and every row in front of it - not over all rows by equip_id.

Don't know why you are using:

TO_CHAR(created, 'DDMMYYYY HH24:MI:SS')

why would you take something incredibly sortable (a DATE) and turn it into an ascii string that doesn't even sort correctly? (it will not sort by date at all, it'll sort as a string, wrong)

But - I don't believe you want the order by - but I cannot really say because you do not tell us what the primary key that should have been in place is?

Believe me - data and business logic belongs in the DB

Ken, February 28, 2006 - 8:32 am UTC

I have been telling the Java folks the same. And I have been preaching about the bind variables as well. The database was designed by someone quite famous in the Java tech community a few years back. The data abstraction has created so much issues and only the application knows what column contains what data based on a value in another field. But they are listening. They have changed updated the code to use preparedStatements and more changes are coming. A tiny victory, thanks in large part to your books and the forum. (The blog too, I must add. But I enjoy the non-tech pieces more since they show the otherside we hardly see here. Keep it up!)

The only reason in using TO_CHAR was to expose the time values but this can be done later. I will remove this and sort and run it again.

It is unbelievable how this app works. Pulling all the data and sorting in java, yikes!

Thank you so much, Tom.

February 28, 2006 - 9:05 am UTC

but dates sort by

century, year, month, day, hour, minute, second

quite naturally!!!!!

By "exposing" the time in this example, you mucked up the sort order!!!

You put DD first - it would put the first of ANY MONTH before the second of ANY MONTH - messing it up.

it should just order by DATE - period. Never order by to_char(date)....

Analytics rock Analytics roll

Alf, February 28, 2006 - 2:11 pm UTC

Hello Tom,

I have four tables proc_event, visit, patient, and event

I need to get a list of patients with all records in the proc_event table for proc_id = 123 and 3456.

Patient proc_id
Patient A 123
3456
Patient B 123
3456

The query below is listing all records from the patient table that have either of the proc_id in the proc_event table.

Would you please direct me how or if I might be able to rewrite this in order to gets the desire listing?

Any information would be greatly appreciated. Thanks.

SELECT p.P_NAME,
p.MRN,
pe.proc_id,
e.date_time
FROM ud.patient p,
ud.proc_event pe,
ud.event e,
ud.visit v
WHERE ( (p.patient_id = v.patient_id)
AND (v.visit_id = e.visit_id)
AND (e.event_id(+) = pe.event_id)
AND (pe.proc_id in('123','3456'))
AND (e.date_time BETWEEN
to_date('01-jan-2005 00:00','dd-mon-yyyy hh24:mi') AND
to_date('20-jan-2005 23:59','dd-mon-yyyy hh24:mi')
)
)
GROUP BY p.P_NAME,
p.mrn,
pe.proc_id,
e.date_time
ORDER BY p.P_NAME ASC,
p.mrn ASC,
pe.proc_id ASC

March 01, 2006 - 7:57 am UTC

...
I have four tables proc_event, visit, patient, and event

I need to get a list of patients with all records in the proc_event table for
proc_id = 123 and 3456.
...

Well given that

a) we don't know how these relate
b) what columns might be available
c) pretty much don't know anything

It is sort of difficult.

It sounds like you simply want to

a) join patients to proc_event (but we don't even know if you CAN!!!!!!!)
b) use an IN

Difficulty with min(id) over partition by -- part deux

Ken, March 01, 2006 - 9:29 am UTC

Hi Tom:

I am still not getting this to group correctly. The date is date, and orader is removed. Could you please let me know what could be causing this? Thanks.

ken@DEV9206> break on min_id skip 1
ken@DEV9206> l
1 SELECT equip_id, equip_type_id, type_id, created,
2 col_1, col_0,
3 ID,
4 MIN(id) OVER (PARTITION BY equip_id
5 ORDER BY equip_type_id,
6 type_id,
7 created,
8 col_1,
9 col_0) min_id
10 FROM dedupe_test_01
11* WHERE ROWNUM < 51
ken@DEV9206> /

EQUIP_ID EQUIP_TYPE_ID TYPE_ID CREATED COL_1 COL_0 ID MIN_ID
---------- ------------- ---------- --------- ---------- ---------- ---------- ----------
3011 29 221 30-NOV-02 3 77 26445635 26445635
3011 29 221 30-NOV-02 3 77 26445651

3011 29 221 30-NOV-02 3 86 26445626 26445626
3011 29 221 30-NOV-02 3 86 26445653

3011 29 221 30-NOV-02 3 112 26445617 26445617
3011 29 221 30-NOV-02 3 112 26445620
3011 29 221 30-NOV-02 3 125 26445631
3011 29 221 30-NOV-02 3 125 26445641

3551 29 221 30-NOV-02 3 112 3209094 3209094
3551 29 221 30-NOV-02 3 112 3837739
3551 29 221 30-NOV-02 3 112 3837759
3551 29 221 30-NOV-02 3 112 6072229
3551 29 221 30-NOV-02 3 112 8894682
3551 29 221 30-NOV-02 3 118 3209095***
3551 29 221 30-NOV-02 3 118 3837740
3551 29 221 30-NOV-02 3 118 3837760
3551 29 221 30-NOV-02 3 118 6072230
3551 29 221 30-NOV-02 3 118 8894683

3551 29 221 11-MAR-05 1 186 3209093 3209093
3551 29 221 11-MAR-05 1 186 3837738
3551 29 221 11-MAR-05 1 186 3837758
3551 29 221 11-MAR-05 1 186 6072228
3551 29 221 11-MAR-05 1 186 8894681

3551 29 221 11-MAR-05 1 190 3209092 3209092
3551 29 221 11-MAR-05 1 190 3837737
3551 29 221 11-MAR-05 1 190 3837757
3551 29 221 11-MAR-05 1 190 6072227
3551 29 221 11-MAR-05 1 190 8894680

3551 29 221 11-MAR-05 1 227 3209091 3209091
3551 29 221 11-MAR-05 1 227 3837736
3551 29 221 11-MAR-05 1 227 3837756
3551 29 221 11-MAR-05 1 227 6072226
3551 29 221 11-MAR-05 1 227 8894679

3551 29 221 12-MAR-05 1 153 3209090 3209090
3551 29 221 12-MAR-05 1 153 3837735
3551 29 221 12-MAR-05 1 153 3837755
3551 29 221 12-MAR-05 1 153 6072225
3551 29 221 12-MAR-05 1 153 8894678
3551 29 221 16-MAR-05 1 109 3837734***
3551 29 221 16-MAR-05 1 109 3837754
3551 29 221 16-MAR-05 1 109 6072224
3551 29 221 16-MAR-05 1 109 8894677
3551 29 221 16-MAR-05 1 128 3837733***
3551 29 221 16-MAR-05 1 128 3837753
3551 29 221 16-MAR-05 1 128 6072223
3551 29 221 16-MAR-05 1 128 8894676
3551 29 221 17-MAR-05 1 181 3837732
3551 29 221 17-MAR-05 1 181 3837752
3551 29 221 17-MAR-05 1 181 6072222
3551 29 221 17-MAR-05 1 181 8894675

50 rows selected.

March 01, 2006 - 9:52 am UTC

I said you don't want the order by at all.

If you are trying to associate the MIN(ID) to every record in a group - you just partition - you do NOT order by (else you get the min id of every record from the current on one up)

I don't know what you are trying to retrieve really since all I have is "non functioning sql" to work with - I don't know what your logic is.

query is doing precisely what you asked - break data up by X, sort by A,B,C and assign min(id) from the current row on up on the group.

I think "no order by" is called for.

Difficulty with min(id) over partition by -- part deux - take 2

Ken, March 01, 2006 - 9:49 am UTC

Please disregard my previous posting. It did not come out right.

I still cannot get this to group correctly. TO_CHAR was removed and order WAS removed. This gave me the following.

1 SELECT rownum, equip_id, equip_type_id, type_id, created,
2 col_1, col_0,
3 ID,
4 MIN(id) OVER (PARTITION BY equip_id) min_id
5 FROM dedupe_test_01
6* WHERE ROWNUM < 51
ken@DEV9206/

ROWNUM EQUIP_ID EQUIP_TYPE_ID TYPE_ID CREATED COL_1 COL_0 ID MIN_ID
------ -------- ------------- ------- --------- ----- ----- -------- --------
1 3011 29 221 30-NOV-02 3 77 26445635 26445617
2 3011 29 221 30-NOV-02 3 77 26445651
3 3011 29 221 30-NOV-02 3 86 26445626
4 3011 29 221 30-NOV-02 3 86 26445653
5 3011 29 221 30-NOV-02 3 112 26445617
6 3011 29 221 30-NOV-02 3 112 26445620
7 3011 29 221 30-NOV-02 3 125 26445631
8 3011 29 221 30-NOV-02 3 125 26445641

9 3551 29 221 30-NOV-02 3 112 3209094 3209090
10 3551 29 221 30-NOV-02 3 112 3837739
11 3551 29 221 30-NOV-02 3 112 3837759
12 3551 29 221 30-NOV-02 3 112 6072229
13 3551 29 221 30-NOV-02 3 112 8894682
14 3551 29 221 30-NOV-02 3 118 3209095
15 3551 29 221 30-NOV-02 3 118 3837740
16 3551 29 221 30-NOV-02 3 118 3837760
17 3551 29 221 30-NOV-02 3 118 6072230
18 3551 29 221 30-NOV-02 3 118 8894683
19 3551 29 221 11-MAR-05 1 186 3209093
20 3551 29 221 11-MAR-05 1 186 3837738
21 3551 29 221 11-MAR-05 1 186 3837758
22 3551 29 221 11-MAR-05 1 186 6072228
23 3551 29 221 11-MAR-05 1 186 8894681
24 3551 29 221 11-MAR-05 1 190 3209092
25 3551 29 221 11-MAR-05 1 190 3837737
26 3551 29 221 11-MAR-05 1 190 3837757
27 3551 29 221 11-MAR-05 1 190 6072227
28 3551 29 221 11-MAR-05 1 190 8894680
29 3551 29 221 11-MAR-05 1 227 3209091
30 3551 29 221 11-MAR-05 1 227 3837736
31 3551 29 221 11-MAR-05 1 227 3837756
32 3551 29 221 11-MAR-05 1 227 6072226
33 3551 29 221 11-MAR-05 1 227 8894679
34 3551 29 221 12-MAR-05 1 153 3209090
35 3551 29 221 12-MAR-05 1 153 3837735
36 3551 29 221 12-MAR-05 1 153 3837755
37 3551 29 221 12-MAR-05 1 153 6072225
38 3551 29 221 12-MAR-05 1 153 8894678
39 3551 29 221 16-MAR-05 1 109 3837734
40 3551 29 221 16-MAR-05 1 109 3837754
41 3551 29 221 16-MAR-05 1 109 6072224
42 3551 29 221 16-MAR-05 1 109 8894677
43 3551 29 221 16-MAR-05 1 128 3837733
44 3551 29 221 16-MAR-05 1 128 3837753
45 3551 29 221 16-MAR-05 1 128 6072223
46 3551 29 221 16-MAR-05 1 128 8894676
47 3551 29 221 17-MAR-05 1 181 3837732
48 3551 29 221 17-MAR-05 1 181 3837752
49 3551 29 221 17-MAR-05 1 181 6072222
50 3551 29 221 17-MAR-05 1 181 8894675

50 rows selected.

I added the ORDER BY back in the PARTITION BY clause. Here's the result:

ken@DEV9206get q_1
1 SELECT rownum, equip_id, equip_type_id, type_id, created,
2 col_1, col_0,
3 ID,
4 MIN(id) OVER (PARTITION BY equip_id
5 ORDER BY equip_type_id,
6 type_id,
7 created,
8 col_1,
9 col_0) min_id
10 FROM dedupe_test_01
11* WHERE ROWNUM < 51
ken@DEV9206/

ROWNUM EQUIP_ID EQUIP_TYPE_ID TYPE_ID CREATED COL_1 COL_0 ID MIN_ID
------ -------- ------------- ------- --------- ----- ----- -------- --------
1 3011 29 221 30-NOV-02 3 77 26445635 26445635
2 3011 29 221 30-NOV-02 3 77 26445651

3 3011 29 221 30-NOV-02 3 86 26445626 26445626
4 3011 29 221 30-NOV-02 3 86 26445653

5 3011 29 221 30-NOV-02 3 112 26445617 26445617
6 3011 29 221 30-NOV-02 3 112 26445620
7 3011 29 221 30-NOV-02 3 125 26445631
8 3011 29 221 30-NOV-02 3 125 26445641

9 3551 29 221 30-NOV-02 3 112 3209094 3209094
10 3551 29 221 30-NOV-02 3 112 3837739
11 3551 29 221 30-NOV-02 3 112 3837759
12 3551 29 221 30-NOV-02 3 112 6072229
13 3551 29 221 30-NOV-02 3 112 8894682
14 3551 29 221 30-NOV-02 3 118 3209095
15 3551 29 221 30-NOV-02 3 118 3837740
16 3551 29 221 30-NOV-02 3 118 3837760
17 3551 29 221 30-NOV-02 3 118 6072230
18 3551 29 221 30-NOV-02 3 118 8894683

19 3551 29 221 11-MAR-05 1 186 3209093 3209093
20 3551 29 221 11-MAR-05 1 186 3837738
21 3551 29 221 11-MAR-05 1 186 3837758
22 3551 29 221 11-MAR-05 1 186 6072228
23 3551 29 221 11-MAR-05 1 186 8894681

24 3551 29 221 11-MAR-05 1 190 3209092 3209092
25 3551 29 221 11-MAR-05 1 190 3837737
26 3551 29 221 11-MAR-05 1 190 3837757
27 3551 29 221 11-MAR-05 1 190 6072227
28 3551 29 221 11-MAR-05 1 190 8894680

29 3551 29 221 11-MAR-05 1 227 3209091 3209091
30 3551 29 221 11-MAR-05 1 227 3837736
31 3551 29 221 11-MAR-05 1 227 3837756
32 3551 29 221 11-MAR-05 1 227 6072226
33 3551 29 221 11-MAR-05 1 227 8894679

34 3551 29 221 12-MAR-05 1 153 3209090 3209090
35 3551 29 221 12-MAR-05 1 153 3837735
36 3551 29 221 12-MAR-05 1 153 3837755
37 3551 29 221 12-MAR-05 1 153 6072225
38 3551 29 221 12-MAR-05 1 153 8894678
39 3551 29 221 16-MAR-05 1 109 3837734
40 3551 29 221 16-MAR-05 1 109 3837754
41 3551 29 221 16-MAR-05 1 109 6072224
42 3551 29 221 16-MAR-05 1 109 8894677
43 3551 29 221 16-MAR-05 1 128 3837733
44 3551 29 221 16-MAR-05 1 128 3837753
45 3551 29 221 16-MAR-05 1 128 6072223
46 3551 29 221 16-MAR-05 1 128 8894676
47 3551 29 221 17-MAR-05 1 181 3837732
48 3551 29 221 17-MAR-05 1 181 3837752
49 3551 29 221 17-MAR-05 1 181 6072222
50 3551 29 221 17-MAR-05 1 181 8894675

50 rows selected.

Row numbers 14, 39, 43, and 47 should have started new groups. What am I doing wrong that it is not seeing these as new groups?

Thanks much for your help.

March 01, 2006 - 10:32 am UTC

you need to just state in english what you are trying to do rather than posting SQL that does not achieve it.

Tell us your LOGIC.

Difficulty with min(id) over partition by -- part deux - take 3

Ken, March 01, 2006 - 10:38 am UTC

Thanks, Tom, fair enough.

We would like to select duplicate rows from a table based on the values in certain fields (not all fields will have duplicate data since some are unique to each row). And identify the row within this set with the lowest value in one column --- min(id), and have this value be shown in a separate column.

Please let me know how we can achieve that.

Thanks again.

March 01, 2006 - 10:42 am UTC

tell me what values perhaps.

In general, you will partition by the unique key
You will get the MIN(ID) by that key

No order by

Difficulty with min(id) over partition by -- part deux - take 4

Ken, March 01, 2006 - 11:54 am UTC

Thanks, Tom, but I think we are not on the same page yet.
Please let me try again re: logic.

We have the following table:
ken@DEV9206desc dedupe_test_01
Name Null? Type
----------------------------- -------- ------------------
ID NOT NULL NUMBER
EQUIP_ID NOT NULL NUMBER
EQUIP_TYPE_ID NOT NULL NUMBER
TYPE_ID NOT NULL NUMBER
CREATED NOT NULL DATE
COL_0 NUMBER
COL_1 NUMBER

ken@DEV9206l
1 select * from dedupe_test_01
2 where equip_id in (3011, 3551)
3 and rownum < 51
4* order by 1,2,3,4,5,6
ken@DEV9206/

EQUIP_ID EQUIP_TYPE_ID TYPE_ID CREATED COL_0 COL_1 ID
-------- ------------- ------- --------- ----- ----- --------
3011 29 221 30-NOV-02 77 3 26445635
3011 29 221 30-NOV-02 77 3 26445651
3011 29 221 30-NOV-02 86 3 26445626
3011 29 221 30-NOV-02 86 3 26445653
3011 29 221 30-NOV-02 112 3 26445617
3011 29 221 30-NOV-02 112 3 26445620
3011 29 221 30-NOV-02 125 3 26445631
3011 29 221 30-NOV-02 125 3 26445641
3551 29 221 30-NOV-02 112 3 3209094
3551 29 221 30-NOV-02 112 3 3837739
3551 29 221 30-NOV-02 112 3 3837759
3551 29 221 30-NOV-02 112 3 6072229
3551 29 221 30-NOV-02 112 3 8894682
3551 29 221 30-NOV-02 118 3 3209095
3551 29 221 30-NOV-02 118 3 3837740
3551 29 221 30-NOV-02 118 3 3837760
3551 29 221 30-NOV-02 118 3 6072230
3551 29 221 30-NOV-02 118 3 8894683
3551 29 221 11-MAR-05 186 1 3209093
3551 29 221 11-MAR-05 186 1 3837738
3551 29 221 11-MAR-05 186 1 3837758
3551 29 221 11-MAR-05 186 1 6072228
3551 29 221 11-MAR-05 186 1 8894681
3551 29 221 11-MAR-05 190 1 3209092
3551 29 221 11-MAR-05 190 1 3837737
3551 29 221 11-MAR-05 190 1 3837757
3551 29 221 11-MAR-05 190 1 6072227
3551 29 221 11-MAR-05 190 1 8894680
3551 29 221 11-MAR-05 227 1 3209091
3551 29 221 11-MAR-05 227 1 3837736
3551 29 221 11-MAR-05 227 1 3837756
3551 29 221 11-MAR-05 227 1 6072226
3551 29 221 11-MAR-05 227 1 8894679
3551 29 221 12-MAR-05 153 1 3209090
3551 29 221 12-MAR-05 153 1 3837735
3551 29 221 12-MAR-05 153 1 3837755
3551 29 221 12-MAR-05 153 1 6072225
3551 29 221 12-MAR-05 153 1 8894678
3551 29 221 16-MAR-05 109 1 3837734
3551 29 221 16-MAR-05 109 1 3837754
3551 29 221 16-MAR-05 109 1 6072224
3551 29 221 16-MAR-05 109 1 8894677
3551 29 221 16-MAR-05 128 1 3837733
3551 29 221 16-MAR-05 128 1 3837753
3551 29 221 16-MAR-05 128 1 6072223
3551 29 221 16-MAR-05 128 1 8894676
3551 29 221 17-MAR-05 181 1 3837732
3551 29 221 17-MAR-05 181 1 3837752
3551 29 221 17-MAR-05 181 1 6072222
3551 29 221 17-MAR-05 181 1 8894675

50 rows selected.

And here's the answer I need:
EQUIP_ID EQUIP_TYPE_ID TYPE_ID CREATED COL_0 COL_1 ID MIN_ID
-------- ------------- ------- --------- ----- ----- -------- --------
3011 29 221 30-NOV-02 77 3 26445635 26445635
3011 29 221 30-NOV-02 77 3 26445651
3011 29 221 30-NOV-02 86 3 26445626 26445626
3011 29 221 30-NOV-02 86 3 26445653
3011 29 221 30-NOV-02 112 3 26445617 26445617
3011 29 221 30-NOV-02 112 3 26445620
3011 29 221 30-NOV-02 125 3 26445631 26445631
3011 29 221 30-NOV-02 125 3 26445641
3551 29 221 30-NOV-02 112 3 3209094 3209094
3551 29 221 30-NOV-02 112 3 3837739
3551 29 221 30-NOV-02 112 3 3837759
3551 29 221 30-NOV-02 112 3 6072229
3551 29 221 30-NOV-02 112 3 8894682
3551 29 221 30-NOV-02 118 3 3209095 3209095
3551 29 221 30-NOV-02 118 3 3837740
3551 29 221 30-NOV-02 118 3 3837760
3551 29 221 30-NOV-02 118 3 6072230
3551 29 221 30-NOV-02 118 3 8894683
3551 29 221 11-MAR-05 186 1 3209093 3209093
3551 29 221 11-MAR-05 186 1 3837738
3551 29 221 11-MAR-05 186 1 3837758
3551 29 221 11-MAR-05 186 1 6072228
3551 29 221 11-MAR-05 186 1 8894681
3551 29 221 11-MAR-05 190 1 3209092 3209092
3551 29 221 11-MAR-05 190 1 3837737
3551 29 221 11-MAR-05 190 1 3837757
3551 29 221 11-MAR-05 190 1 6072227
3551 29 221 11-MAR-05 190 1 8894680

Perhaps, PARTITION BY is not appropriate? Thanks.

March 01, 2006 - 1:49 pm UTC

I know we are not on the same page - because all i want is a textual description of the logic.

I do not, will not, reverse engineer "this is what I get" and "this is what I want"

I want you to write it down as if you would give it to someone to implement

because....... That is precisely what you are doing.

Difficulty with min(id) over partition by -- part deux - take 5

Ken, March 01, 2006 - 12:23 pm UTC

Thanks, Tom. Your questions had forced me to look at it again. I was able to resolve it without using PARTITION BY.

Thanks again.

Analytics Question

Mike, March 01, 2006 - 2:48 pm UTC

I am having an issue with using analytics. What I need to be able to accomplish is
I have a table called entity, this table is recursive in the fact the every entity
with the exception of the highest level entity has a parent entity.

Normally this list is easily generated by using 'connect by' in the SQL to show the hierarchy.

The problem I have is now we have to show all the tickets that have been opened for each entity
and all the entities under it.

A simple join will get the open tickets for each entity, but in order to sum all of the tickets
for each entity and all entities under it takes too many resources to be considered usable.

I have tried several ways to get this query to work with as few resources as possible but still
believe that this SQL will generate to unnacceptable as the table(s) grow.

what I really need to do is sum the entities tickets up in reverse order from the bottom up to
the level 1 entity so that I don't have to traverse the tree from level 1 to level 6, then each
level2 to level6, etc,..

I wrote a quick procedure to populate a table (t) with each uuid that has a ticket, it's sum of
the tickets tied to that entity and a concatenated list of uuids under that entity in the tree,
this resulted in a much smaller table but this is not the way I would want to implemet it

This table is going to grow into several hundred thousand entities and I need to be able
to generate this list much faster and using less resources than the connect by statements below.
Any help would definitely be appreciated.

Without using a procedure and a two step process here is what I have to work with for a test,

CREATE TABLE ENTITY (
ENTITY_UUID VARCHAR2(32),
NAME VARCHAR2(256),
PARENT_UUID VARCHAR2(32)
)
/

Table created.

CREATE TABLE ENTITY_TCKT (
ENTITY_UUID VARCHAR2(32),
CURRENT_LIFECYCLE_STATE NUMBER
)
/

Table created.

Then populate the entity table,

insert into entity values ('13E7CAA5FDEB42518A798A77A19F70B0','Level1 Entity',NULL);
insert into entity values ('66A6A6EFFA9D46BE82EC8F5CFFAC91B9','Level4 Entity','536FCF7E4A5D457B8C3AECBED878FDBF');
insert into entity values ('DCF6B6366D6449DB95A5AEA6B14F31F7','Level5 Entity','66A6A6EFFA9D46BE82EC8F5CFFAC91B9');
insert into entity values ('E2FD444948714528805EBFFA102511F5','Level5 Entity','CB4E1B74035947B9A5B9B0FE264DF4E7');
insert into entity values ('2E1E0646AC9F4BB9A6E4A747B20B2595','Level2 Entity','13E7CAA5FDEB42518A798A77A19F70B0');
insert into entity values ('54133391FDD54221B11382A20DFC38AA','Level2 Entity','13E7CAA5FDEB42518A798A77A19F70B0');
insert into entity values ('95A5F85D68184DB7A49F9DF7A236F9AF','Level5 Entity','99762DC75A5D42DCBEA6950D7011F130');
insert into entity values ('EAD30C5578BD491991B0D7049CD4F277','Level4 Entity','536FCF7E4A5D457B8C3AECBED878FDBF');
insert into entity values ('536FCF7E4A5D457B8C3AECBED878FDBF','Level3 Entity','54133391FDD54221B11382A20DFC38AA');
insert into entity values ('883FD970DF264B7A9DC0DFBAF225012A','Level4 Entity','536FCF7E4A5D457B8C3AECBED878FDBF');
insert into entity values ('C23E104B5AA044F795A6896B1C0B08E4','Level3 Entity','54133391FDD54221B11382A20DFC38AA');
insert into entity values ('64BA9F3295194A3B955FD446DBB2E7EC','Level4 Entity','C23E104B5AA044F795A6896B1C0B08E4');
insert into entity values ('0FE28D72C40C4D6FBB439A49B0BE6D3F','Level5 Entity','64BA9F3295194A3B955FD446DBB2E7EC');
----------list goes on------------------

Then to generate some random ticket data

DECLARE
l_tckt_ktr NUMBER;
l_entity_ktr NUMBER;
BEGIN
FOR i IN (select entity_uuid from entity)
LOOP
l_entity_ktr := round(dbms_random.value(1,6),0);
DBMS_OUTPUT.PUT_LINE('Processing entity -> '||i.entity_uuid||' l_entity_ktr = '||l_entity_ktr);
IF l_entity_ktr = 1 THEN
l_tckt_ktr := round(dbms_random.value(1,10),0);
FOR n IN 1 .. l_tckt_ktr
LOOP
insert into entity_tckt values (i.entity_uuid,0);
END LOOP;
ELSIF l_entity_ktr = 3 THEN
l_tckt_ktr := round(dbms_random.value(50,100),0);
FOR n IN 1 .. l_tckt_ktr
LOOP
insert into entity_tckt values (i.entity_uuid,0);
END LOOP;
ELSIF l_entity_ktr = 5 THEN
l_tckt_ktr := round(dbms_random.value(500,1000),0);
FOR n IN 1 .. l_tckt_ktr
LOOP
insert into entity_tckt values (i.entity_uuid,0);
END LOOP;
END IF;
END LOOP;
END;
/

PL/SQL procedure successfully completed.

Now that the tables have data, try two ways to get the query

set lines 256
column lpad('',2*level)||e.name format a50
column lpad('',2*level)||e.entity_uuid format a50
select lpad(' ', 2 * level)||e.name, lpad(' ', 2 * level)||e.entity_uuid, level,
(select count(*) from entity_tckt where current_lifecycle_state = 0 and entity_uuid in (
select a.entity_uuid from entity a start with a.entity_uuid = e.entity_uuid connect by prior a.entity_uuid = a.parent_uuid)) as nbr
from entity e
start with e.entity_uuid = '13E7CAA5FDEB42518A798A77A19F70B0' connect by prior e.entity_uuid = e.parent_uuid
/
LPAD('',2*LEVEL)||E.NAME LPAD('',2*LEVEL)||E.ENTITY_UUID LEVEL NBR
-------------------------------------------------- -------------------------------------------------- ---------- ----------
Level1 Entity 13E7CAA5FDEB42518A798A77A19F70B0 1 15533
Level2 Entity 2E1E0646AC9F4BB9A6E4A747B20B2595 2 7
Level2 Entity 54133391FDD54221B11382A20DFC38AA 2 15526
Level3 Entity 536FCF7E4A5D457B8C3AECBED878FDBF 3 890
Level4 Entity 66A6A6EFFA9D46BE82EC8F5CFFAC91B9 4 51
Level5 Entity DCF6B6366D6449DB95A5AEA6B14F31F7 5 0
Level4 Entity EAD30C5578BD491991B0D7049CD4F277 4 76
Level4 Entity 883FD970DF264B7A9DC0DFBAF225012A 4 0
Level4 Entity 0F01178489CB421CA5A4C55AEC98300E 4 4
Level5 Entity 5B3B033B4A30443BB1B6F57C1BB05399 5 4
Level4 Entity CB4E1B74035947B9A5B9B0FE264DF4E7 4 69

LPAD('',2*LEVEL)||E.NAME LPAD('',2*LEVEL)||E.ENTITY_UUID LEVEL NBR
-------------------------------------------------- -------------------------------------------------- ---------- ----------
Level5 Entity E2FD444948714528805EBFFA102511F5 5 69
Level4 Entity 99762DC75A5D42DCBEA6950D7011F130 4 607
Level5 Entity 95A5F85D68184DB7A49F9DF7A236F9AF 5 517
-------------------and on and on----------------

122 rows selected.

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 SORT (AGGREGATE)
2 1 FILTER
3 2 TABLE ACCESS (FULL) OF 'ENTITY_TCKT'
4 2 FILTER
5 4 CONNECT BY (WITH FILTERING)
6 5 NESTED LOOPS
7 6 TABLE ACCESS (FULL) OF 'ENTITY'
8 6 TABLE ACCESS (BY USER ROWID) OF 'ENTITY'
9 5 NESTED LOOPS
10 9 BUFFER (SORT)
11 10 CONNECT BY PUMP
12 9 TABLE ACCESS (FULL) OF 'ENTITY'
13 0 CONNECT BY (WITH FILTERING)
14 13 NESTED LOOPS
15 14 TABLE ACCESS (FULL) OF 'ENTITY'
16 14 TABLE ACCESS (BY USER ROWID) OF 'ENTITY'
17 13 NESTED LOOPS
18 17 BUFFER (SORT)
19 18 CONNECT BY PUMP
20 17 TABLE ACCESS (FULL) OF 'ENTITY'

Statistics
----------------------------------------------------------
4 recursive calls
0 db block gets
208383 consistent gets
0 physical reads
0 redo size
9007 bytes sent via SQL*Net to client
591 bytes received via SQL*Net from client
10 SQL*Net roundtrips to/from client
27913 sorts (memory)
0 sorts (disk)
122 rows processed

And now using 'WITH'

with alm as (select entity_uuid, count(*) as nbr
from entity_tckt
group by entity_uuid)
select e.entity_uuid, sum(nbr) from entity e, alm
where alm.entity_uuid in (select entity_uuid from entity start with entity_uuid = e.entity_uuid connect by prior entity_uuid = parent_uuid)
group by e.entity_uuid
/

ENTITY_UUID SUM(NBR)
-------------------------------- ----------
026D4A6B547544E08CFEC5DDED3B7777 772
02AD7930E9B94A20A9F4E245B3F4B8C4 1747
04B3AD5F804B4A66AC6C91606EA7019B 74
070547E30C604D4680089959A6DB7684 677
08ABA73521F94F2DB19FF8C1C53ADF06 2
0D65B9BB906B442B9A6BDFEEF3D858AC 86
0F01178489CB421CA5A4C55AEC98300E 4
0FE28D72C40C4D6FBB439A49B0BE6D3F 716
10801AFBCBC44F2F9851E16BC1CCA442 703
13E7CAA5FDEB42518A798A77A19F70B0 15533
1677A02C0B89479BA27F95D7AA3DAC03 9516

ENTITY_UUID SUM(NBR)
-------------------------------- ----------
1AE6BCDA1FA74B0B8DEBEFF13F9A444A 96
1F9BD4CD72FA4879AC338397FB59FD39 74
21FAD03A5EB34D79908E96DA647FF24C 592
235F6C1600584CDD8B3FB1FFCD742E7E 839
23EA0B4618A94C1586BDF08437B97EB0 3067
25CFE1E64F36468DB291CBCF0867B314 59
26C635064E83447B91BA8D125FE74C2A 776
2ABC45D4A5D84684AC29A578C5CCBD3E 1759
2DF37BBB64254DB29081F03D38B5CE33 876
2E1E0646AC9F4BB9A6E4A747B20B2595 7
2FF85FEF82864C0BB75BA4513885ED0E 76

----------------more data ----------------
71 rows selected.

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 SORT (GROUP BY)
2 1 FILTER
3 2 NESTED LOOPS
4 3 VIEW
5 4 SORT (GROUP BY)
6 5 TABLE ACCESS (FULL) OF 'ENTITY_TCKT'
7 3 TABLE ACCESS (FULL) OF 'ENTITY'
8 2 FILTER
9 8 CONNECT BY (WITH FILTERING)
10 9 NESTED LOOPS
11 10 TABLE ACCESS (FULL) OF 'ENTITY'
12 10 TABLE ACCESS (BY USER ROWID) OF 'ENTITY'
13 9 NESTED LOOPS
14 13 BUFFER (SORT)
15 14 CONNECT BY PUMP
16 13 TABLE ACCESS (FULL) OF 'ENTITY'

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
197127 consistent gets
0 physical reads
0 redo size
3707 bytes sent via SQL*Net to client
547 bytes received via SQL*Net from client
6 SQL*Net roundtrips to/from client
27902 sorts (memory)
0 sorts (disk)
71 rows processed

The logical reads will kill us as the table grows - Any help you could give would be fantastic

P.S. - I love the site, use it every day - Thanx
Mike

Analytics rock Analytics roll

Alf, March 01, 2006 - 2:50 pm UTC

Sorry, I should have mention that there is not direct relation between patient and proc_event:

The only way that Im able to relate patients to proc_event is by joining event to proc_event as (e.event_id(+) = pe.event_id) and then visit to patient (p.patient_id = v.patient_id). The IN operator is working as expected, however I try to use AND as pe.proc_id = '123' AND pe.proc_id ='3456' because this would return only records for patients who would have both 123 and 3456 (proc_di) records in the proc_event table.

Hello Tom, Analytics rock Analytics roll

Alf, March 02, 2006 - 4:51 pm UTC

Hi Tom,

I've been trying many different approach for this

Here are the table desc, I'm including the whole output from de desc command. Not sure if you'd want me to include everything or cut the not revelant culmns out.

SQL> desc ud_master.patient
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 PATIENT_ID                                NOT NULL NUMBER(12)
 NAME                                               VARCHAR2(100)
 TITLE_ID                                           NUMBER(12)
 MEDICAL_RECORD_NUMBER                              VARCHAR2(20)
 SEX                                                VARCHAR2(8)
 BIRTHDATE                                          DATE
 DATE_OF_DEATH                                      DATE
 APT_SUITE                                          VARCHAR2(100)
 STREET_ADDRESS                                     VARCHAR2(100)
 CITY                                               VARCHAR2(50)
 STATE                                              VARCHAR2(50)
 COUNTRY                                            VARCHAR2(50)
 MAILING_CODE                                       VARCHAR2(50)
 MARITAL_STATUS_ID                                  NUMBER(12)
 RACE_ID                                            NUMBER(12)
 RELIGION_ID                                        NUMBER(12)
 FREE_TEXT_RELIGION                                 VARCHAR2(100)
 OCCUPATION_ID                                      NUMBER(12)
 FREE_TEXT_OCCUPATION                               VARCHAR2(100)
 EMPLOYER_ID                                        NUMBER(12)
 FREE_TEXT_EMPLOYER                                 VARCHAR2(150)
 MOTHER_PATIENT_ID                                  NUMBER(12)
 COLLAPSED_INTO_PATIENT_ID                          NUMBER(12)
 SOCIAL_SECURITY_NUMBER                             VARCHAR2(15)
 LIFECARE_VISIT_ID                                  NUMBER(12)
 CONFIDENTIAL_FLAG                                  VARCHAR2(1)
 HOME_PHONE                                         VARCHAR2(20)
 DAY_PHONE                                          VARCHAR2(20)
 SMOKER_FLAG                                        VARCHAR2(1)
 CURRENT_LOCATION                                   VARCHAR2(15)
 SEC_LANG_NAME                                      VARCHAR2(100)
 ADDR_STRING                                        VARCHAR2(50)
 BLOCK_CODE                                         VARCHAR2(50)

SQL> desc ud_master.visit
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 VISIT_ID                                  NOT NULL NUMBER(12)
 VISIT_NUMBER                                       VARCHAR2(40)
 PATIENT_ID                                         NUMBER(12)
 VISIT_TYPE_ID                                      NUMBER(12)
 VISIT_SUBTYPE_ID                                   NUMBER(12)
 VISIT_STATUS_ID                                    NUMBER(12)
 FACILITY_ID                                        NUMBER(12)
 ATTENDING_EMP_PROVIDER_ID                          NUMBER(12)
 RESIDENT_EMP_PROVIDER_ID                           NUMBER(12)
 ADMISSION_DATE_TIME                                DATE
 DISCHARGE_DATE_TIME                                DATE
 DISCHARGE_TYPE_ID                                  NUMBER(12)
 MARITAL_STATUS_ID                                  NUMBER(12)
 RELIGION_ID                                        NUMBER(12)
 FREE_TEXT_RELIGION                                 VARCHAR2(100)
 FINANCIAL_CLASS_ID                                 NUMBER(12)
 OCCUPATION_ID                                      NUMBER(12)
 FREE_TEXT_OCCUPATION                               VARCHAR2(100)
 EMPLOYER_ID                                        NUMBER(12)
 FREE_TEXT_EMPLOYER                                 VARCHAR2(100)
 PHYSICIAN_SERVICE_ID                               VARCHAR2(12)
 LOCATION_ID                                        VARCHAR2(12)
 ADDL_RESP_EMP_PROVIDER_ID                          NUMBER(12)
 ADDL_RESP_STRING                                   VARCHAR2(100)
 ATTENDING_STRING                                   VARCHAR2(100)
 RESIDENT_STRING                                    VARCHAR2(100)
 LAST_LOCATION                                      VARCHAR2(15)
 ADDL_RESP_RESIDENT_SERVICE_ID                      NUMBER(12)
 TRIAGE_ACUITY_ID                                   NUMBER(12)
 SERIES_VISIT_FLAG                                  VARCHAR2(5)
 NEWBORN_FLAG                                       VARCHAR2(5)

SQL> desc ud_master.proc_event
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 VISIT_ID                                  NOT NULL NUMBER(12)
 EVENT_ID                                  NOT NULL NUMBER(12)
 ORDER_SPAN_ID                                      NUMBER(12)
 ORDER_SPAN_STATE_ID                                NUMBER(12)
 PROC_ID                                            NUMBER(12)
 ORIG_SCHEDULE_BEGIN_DATE_TIME                      DATE
 ORIG_SCHEDULE_END_DATE_TIME                        DATE
 FINAL_SCHEDULE_BEGIN_DATE_TIME                     DATE
 FINAL_SCHEDULE_END_DATE_TIME                       DATE
 ABNORMAL_STATE_ID                                  VARCHAR2(3)
 MODIFIED_PROC_NAME                                 VARCHAR2(250)
 FACILITY_ID                                        NUMBER(12)
 PRIORITY_ID                                        NUMBER(12)
 CORRECTED_FLAG                                     VARCHAR2(5)
 RX_FLAG                                            VARCHAR2(5)
 SPEC_RECOLLECT_FLAG                                VARCHAR2(5)
 COMPLETE_RESULT_RPT                                NUMBER(12)
 ORDER_VISIT_ID                                     NUMBER(12)
 ORDER_DEFINITION_ID                                VARCHAR2(25)
 PROC_ORDER_NBR                                     NUMBER(12)

SQL> desc ud_master.event
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 VISIT_ID                                  NOT NULL NUMBER(12)
 EVENT_ID                                  NOT NULL NUMBER(12)
 DATE_TIME                                          DATE
 EVENT_STATUS_ID                                    NUMBER(12)
 EVENT_TYPE_ID                                      NUMBER(12)
 PATIENT_SCHEDULE_DISPLAY                           VARCHAR2(100)

I've created the following query with a sub and with the exists clause, would please review it and let me know whatever I need to correct, many thanks in advance.

SELECT DISTINCT  patient.NAME,
         patient.medical_record_number Medical_RN,
         sub_epe.proc_id,
         to_date(sub_epe.date_time,'dd-mon-yy') event_dt
    FROM ud_master.patient,
         (SELECT proc_event.proc_id, event.date_time,proc_event.VISIT_ID
          FROM ud_master.event, ud_master.proc_event
          WHERE (exists (select * 
                         from ud_master.proc_event sub_p
                        where sub_p.proc_id = sub_p.proc_id) 
                          and (proc_event.visit_id = event.visit_id)))sub_epe,ud_master.visit
    WHERE (   (patient.patient_id = visit.patient_id)
          AND (sub_epe.proc_id = 21078) and (sub_epe.proc_id = 22025)
          AND (sub_epe.date_time BETWEEN to_date('25-jan-2005','dd-mon-yyyy') AND
                                         to_date('31-jan-2005','dd-mon-yyyy')))

GROUP BY patient.NAME,
         patient.medical_record_number,
         sub_epe.proc_id,
         sub_epe.date_time

Another Select Query

A reader, March 08, 2006 - 2:55 am UTC

Hi Tom,
Have a requirement like this:
Create table test(
main number,
a1 varchar2(10),
a2 varchar2(10),
a3 varchar2(10),
a4 varchar2(10),
a5 varchar2(10)) nologging;

Insert into test (1,'A1','A2','A3','A4','A5');

Normal Output:

Main A1 A2 A3 A4 A5
---- -- -- -- -- --
1 A1 A2 A3 A4 A5

Desired Output:
---------------
Main Txt
---- ---
1 A1
1 A2
1 A3
1 A4
1 A5

I know it is a simple query.
Please advice

March 09, 2006 - 12:13 pm UTC

ops$tkyte@ORA10GR2> with r as
  2  (select level l from dual connect by level <= 5)
  3  select test.main,
  4         decode( r.l, 1, a1, 2, a2, 3, a3, 4, a4, 5, a5 ) data
  5    from test, r;

      MAIN DATA
---------- ----------
         1 A1
         1 A2
         1 A3
         1 A4
         1 A5

Analytics rock Analytics roll

Alf, March 09, 2006 - 4:37 pm UTC

Hello Tom,

How about this?

I have four tables (descs included below) that holds information about patients along with their visits and tests (procedures hence the proc table) perform. In some case some patients are ordered two tests/procedures with a certain period of time.

I to create a report listing patients that had had at least two procedures done in this case procedure 21078 and 22025, the desire out put would be:

P_name P_MRN Proc_ID Event_Date_Time
---------------- ----------- ----------- --------------------
Patient Test, A 44422244555 21078 01-jan-2005 00:00
22025 05-jan-2005 10:35
Patient Test, B 3334442222 21078 28-feb-2005 11:15
22025 31-jul-2005 01:35
...

My challenge is there's not direct relation between patient and proc_event, would please review these tables and give a hit how to approach this? Thanks.

Patient
=======
Name Null? Type
----------------------------------------- -------- ----------------------------
PATIENT_ID NOT NULL NUMBER(12)
NAME VARCHAR2(100)
TITLE_ID NUMBER(12)
MEDICAL_RECORD_NUMBER VARCHAR2(20)
SEX VARCHAR2(8)
.....

Visit
=====
Name Null? Type
----------------------------------------- -------- ----------------------------
VISIT_ID NOT NULL NUMBER(12)
VISIT_NUMBER VARCHAR2(40)
PATIENT_ID NUMBER(12)
VISIT_TYPE_ID NUMBER(12)
VISIT_SUBTYPE_ID NUMBER(12)
VISIT_STATUS_ID NUMBER(12)
FACILITY_ID NUMBER(12)
......

proc_event
=====
Name Null? Type
----------------------------------------- -------- ----------------------------
VISIT_ID NOT NULL NUMBER(12)
EVENT_ID NOT NULL NUMBER(12)
ORDER_SPAN_ID NUMBER(12)
ORDER_SPAN_STATE_ID NUMBER(12)
PROC_ID NUMBER(12)
......

event
=====
Name Null? Type
----------------------------------------- -------- ----------------------------
VISIT_ID NOT NULL NUMBER(12)
EVENT_ID NOT NULL NUMBER(12)
DATE_TIME DATE
EVENT_STATUS_ID NUMBER(12)
EVENT_TYPE_ID NUMBER(12)
PATIENT_SCHEDULE_DISPLAY VARCHAR2(100)
......

Analytics Rock

Alf, March 10, 2006 - 3:31 pm UTC

Hello Tom,

Follow-up for my pervious above question regarding a list of patients who have had two specific tests performed/ordered in this case proc_id 22025 and 21078

I finally got the below query to work this time. Would you please review it and let me know if there's anything I need to change to improve speed performance? Thanks.

SELECT
DISTINCT patient.medical_record_number mrn,
patient.name p_name,
proc.proc_id,proc.name,
last_day(to_date(event.date_time, 'dd-mon-yy')) D_time
FROM ud_master.event,
ud_master.proc_event,
ud_master.patient,
ud_master.visit,
ud_master.proc
WHERE (
(patient.patient_id = visit.patient_id)
AND (visit.visit_id = proc_event.visit_id)
AND (visit.visit_id = event.visit_id)
AND (proc_event.visit_id = event.visit_id)
AND (proc_event.event_id = event.event_id)
AND (proc.proc_id in (22025,21078))
AND (event.date_time BETWEEN to_date('01-jan-2005','dd-mon-yyyy') AND
to_date('31-dec-2005','dd-mon-yyyy')))

GROUP BY patient.medical_record_number,
patient.name,
proc.proc_id,
proc.name,
to_date(event.date_time, 'dd-mon-yy')

ORDER BY patient.medical_record_number,'p_name',
proc.proc_id,
proc.name,'D_time'

Analytic function

Mohamed from France, March 23, 2006 - 4:20 am UTC

Dear Tom,

I have a table T1 as follows:

IDE NOT NULL NUMBER
PTFL_IDE NOT NULL NUMBER
BATY_TYP NOT NULL VARCHAR2(6)
CTP_IC NOT NULL NUMBER
LABL_PLY NOT NULL VARCHAR2(6)
NMFI_IDE NOT NULL NUMBER
QTY NOT NULL NUMBER

and a table T2 as follows

IDE NOT NULL NUMBER
SNAP_NMFI_IDE NOT NULL NUMBER
IDE_PORTF VARCHAR2(50) ===>
SNAP_BATY_TYP VARCHAR2(6)
USR_INS VARCHAR2(48)
DAT_INS DATE
USR_UPD VARCHAR2(48)
DAT_UPD DATE
PGM_INS VARCHAR2(48)
PGM_UPD VARCHAR2(48)

and table T3 as follows

PTFL_IDE NOT NULL NUMBER
IDE_PORTF NOT NULL VARCHAR2(50)
PTF_STATUS NOT NULL VARCHAR2(3)
NLAB_COD_LABL NOT NULL VARCHAR2(6)

I would like to get from table T1 the sum(T1.QTY) grouped by T1.PTFL_IDE,T1.BATY_TYP,T1.CTP_IC, T1.LABL_PLY +
the max(T2.dat_ins) for each (T3.IDE_PORTF,T3.SNAP_BATY_TYP) in this group

Remark: T3.SNAP_BATY_TYP = T1. BATY_TYP
T3.IDE_PORTF = SELECT T3.IDE_PORTF FROM T3 WHERE T3.PTFL_IDE = T1.PTFL_IDE

Thanks in advance for your help

March 23, 2006 - 10:54 am UTC

sorry, no help.

no create table.
no insert into.
no real explanation of the goal. the remark doesn't make sense (don't know what you are trying to remark), the "i would like to get" isn't very clear either.

Simple Analytical Query

A reader, March 24, 2006 - 2:31 am UTC

Hi Tom,

I have finally got a chance to make use of analytical Query but I am not able to use it. Following is my table data:-

ITEM          MYORDERS  QTYONHAND SAFETYSTOCK
---------- ---------- ---------- -----------
ABC                 5         50          10
ABC                45         50          10
ABC                25         50          10
DEF                30         60          10
DEF                40         60          10
DEF                30         50          10
DEF                30         60          10
DEF                40         60          10
XYZ                20         80          10
XYZ                10         80          10
XYZ                10         80          10

ITEM          MYORDERS  QTYONHAND SAFETYSTOCK
---------- ---------- ---------- -----------
XYZ                20         80          10

Now I want the Output to be as :-
Item     MYORDERS   qtyonhand  safetystock  tomake  --> (qtyonhand-safetystock-sum(MYORDERS))
====  =========  =======   ============  ========
ABC           5       50             10         0   (haven't depleted stock)
ABC          45       50             10        10   -1 x (50-10-5-45)
ABC          25       50             10        35   -1 x (50-10-5-45-25)
so on........

I tried the following Query but it does'nt give me a running total for the MYORDERS column :-
SQL> select item,MYORDERS,sum(MYORDERS) over (partition by item) tomake from test;
ITEM          MYORDERS    TOMAKE
---------- ---------- ----------
ABC                 5         75
ABC                45         75
ABC                25         75
DEF                30        170
DEF                40        170
DEF                30        170
DEF                30        170
DEF                40        170
XYZ                20         60
XYZ                10         60
XYZ                10         60

ITEM          MYORDERS    TOMAKE
---------- ---------- ----------
XYZ                20         60

12 rows selected.

If I get a running total in TOMAKE then I can just subtract it from the value of SAFETYSTOCK for that particular row to get my desired Output.
I know I am missing the order by clause but I tried using the other 2 columns (QTYONHAND, SAFETYSTOCK) but did not get the desired output. Can you please help me out?

--Create Table Script
CREATE TABLE TEST
(
  ITEM         VARCHAR2(10),
  MYORDERS     NUMBER,
  QTYONHAND    NUMBER,
  SAFETYSTOCK  NUMBER
)

--Insert Statements
INSERT INTO TEST ( ITEM, MYORDERS, QTYONHAND, SAFETYSTOCK ) VALUES ( 
'ABC', 5, 50, 10); 
INSERT INTO TEST ( ITEM, MYORDERS, QTYONHAND, SAFETYSTOCK ) VALUES ( 
'ABC', 45, 50, 10); 
INSERT INTO TEST ( ITEM, MYORDERS, QTYONHAND, SAFETYSTOCK ) VALUES ( 
'ABC', 25, 50, 10); 
INSERT INTO TEST ( ITEM, MYORDERS, QTYONHAND, SAFETYSTOCK ) VALUES ( 
'DEF', 30, 60, 10); 
INSERT INTO TEST ( ITEM, MYORDERS, QTYONHAND, SAFETYSTOCK ) VALUES ( 
'DEF', 40, 60, 10); 
INSERT INTO TEST ( ITEM, MYORDERS, QTYONHAND, SAFETYSTOCK ) VALUES ( 
'DEF', 30, 60, 10); 
INSERT INTO TEST ( ITEM, MYORDERS, QTYONHAND, SAFETYSTOCK ) VALUES ( 
'DEF', 40, 60, 10); 
INSERT INTO TEST ( ITEM, MYORDERS, QTYONHAND, SAFETYSTOCK ) VALUES ( 
'DEF', 30, 50, 10); 
INSERT INTO TEST ( ITEM, MYORDERS, QTYONHAND, SAFETYSTOCK ) VALUES ( 
'XYZ', 10, 80, 10); 
INSERT INTO TEST ( ITEM, MYORDERS, QTYONHAND, SAFETYSTOCK ) VALUES ( 
'XYZ', 20, 80, 10);
INSERT INTO TEST ( ITEM, MYORDERS, QTYONHAND, SAFETYSTOCK ) VALUES ( 
'XYZ', 10, 80, 10); 
INSERT INTO TEST ( ITEM, MYORDERS, QTYONHAND, SAFETYSTOCK ) VALUES ( 
'XYZ', 20, 80, 10);  

COMMIT;

Thanks

March 24, 2006 - 9:49 am UTC

you seem to be missing something to sort by? what orders this data - a running total implies "SORT BY SOMETHING"

Simple Analytical Query

A reader, March 24, 2006 - 12:16 pm UTC

Any one of the other 2 columns can be used to sort the data.

Thanks

March 24, 2006 - 3:34 pm UTC

so, order by them. in the over () clause.

Simple Analytical Query

A reader, March 24, 2006 - 3:50 pm UTC

Hi Tom

I think the below query will work fine for me I have ordered the data by rowid:-

SQL> select item,toorder,sum(toorder) over (partition by item order by rowid) run_tot from test;

ITEM          TOORDER    RUN_TOT
---------- ---------- ----------
ABC                25         25
ABC                45         70
ABC                 5         75
DEF                30         30
DEF                40         70
DEF                30        100
DEF                40        140
DEF                30        170
XYZ                10         10
XYZ                20         30
XYZ                10         40

ITEM          TOORDER    RUN_TOT
---------- ---------- ----------
XYZ                20         60

12 rows selected.

Thanks for all your help

March 24, 2006 - 4:16 pm UTC

as long as you don't care that it gives different answers for the same data on different databases, sure.

Full Join of inline views with analytics

Anwar, March 25, 2006 - 8:22 am UTC

I have a table with date column. I want to display dates from two different months in two columns. I want dates from both months in same rows and not alternate ones.

create table test
(tdate date);
insert into test values('01-jan-2006');
insert into test values('05-jan-2006');
insert into test values('15-jan-2006');
insert into test values('02-feb-2006');
insert into test values('07-feb-2006');
insert into test values('25-feb-2006');

Then i issue following command to retrieve data.

  1  select jan06,feb06
  2  from
  3     (select tdate jan06,
  4     row_number() over (order by tdate) rn
  5     from test
  6     where tdate between '01-jan-2006' and '31-jan-2006') j,
  7     (select tdate feb06,
  8     row_number() over (order by tdate) rn
  9     from test
 10     where tdate between '01-feb-2006' and '28-feb-2006') f
 11* where j.rn=f.rn
SQL> /

JAN06     FEB06
--------- ---------
01-JAN-06 02-FEB-06
05-JAN-06 07-FEB-06
15-JAN-06 25-FEB-06

So far so good. But then i insert another record in the table.

insert into test values('27-feb-2006');

Now number of records for february is more than that of january. I use full outer join and have this unexpected result.

  1  select j.rn,jan06,f.rn,feb06
  2  from
  3     (select tdate jan06,
  4     row_number() over (order by tdate) rn
  5     from test
  6     where tdate between '01-jan-2006' and '31-jan-2006') j
  7  full join
  8     (select tdate feb06,
  9     row_number() over (order by tdate) rn
 10     from test
 11     where tdate between '01-feb-2006' and '28-feb-2006') f
 12* on j.rn=f.rn
SQL> /

        RN JAN06             RN FEB06
---------- --------- ---------- ---------
         1 01-JAN-06          1 02-FEB-06
         2 05-JAN-06          2 07-FEB-06
         3 15-JAN-06          3 25-FEB-06
                                27-FEB-06
                                25-FEB-06
                                07-FEB-06
                                02-FEB-06

7 rows selected.

I create one table each for both months.

create table j_test
as select tdate JAN06,row_number() over (order by tdate) rn
from test
where tdate between '01-jan-2006' and '31-jan-2006'
/
create table f_test
as select tdate FEB06,row_number() over (order by tdate) rn
from test
where tdate between '01-feb-2006' and '28-feb-2006'
/

SQL> select j.rn,jan06,f.rn,feb06
  2  from j_test j full join f_test f
  3  on j.rn=f.rn
  4  /

        RN JAN06             RN FEB06
---------- --------- ---------- ---------
         1 01-JAN-06          1 02-FEB-06
         2 05-JAN-06          2 07-FEB-06
         3 15-JAN-06          3 25-FEB-06
                              4 25-FEB-06

This result is what i was expecting with inline views. Why is the behaviour different with analytics in inline views and tables?

Regards
Anwar

March 25, 2006 - 9:13 am UTC

reproduces in 9ir2, 10gr1 but not 10gr2 - please contact support with this test case.

Full Join of inline views with analytics - follow-up

Michel Cadot, March 25, 2006 - 10:27 am UTC

I don't get exactly the same thing on 9.2.0.6 but still get a wrong answer (btw, the answer showed for the last query is not correct, maybe there are not the same input rows, and i wonder why f.rn is not filled in the previous query answer):

SQL> select j.rn,jan06,f.rn,feb06
  2  from
  3     (select tdate jan06,
  4     row_number() over (order by tdate) rn
  5     from test
  6     where tdate between '01-jan-2006' and '31-jan-2006') j
  7  full join
  8     (select tdate feb06,
  9     row_number() over (order by tdate) rn
 10     from test
 11     where tdate between '01-feb-2006' and '28-feb-2006') f
 12  on j.rn=f.rn
 13  /
        RN JAN06               RN FEB06
---------- ----------- ---------- -----------
         1 01-jan-2006          1 02-feb-2006
         2 05-jan-2006          2 07-feb-2006
         3 15-jan-2006          3 25-feb-2006
                                3 25-feb-2006
                                4 27-feb-2006

5 rows selected.

But this does not happen if you use factoring clause:

SQL> with
  2    j_test as 
  3     (select tdate jan06,
  4     row_number() over (order by tdate) rn
  5     from test
  6     where tdate between '01-jan-2006' and '31-jan-2006'),
  7   f_test as 
  8     (select tdate feb06,
  9     row_number() over (order by tdate) rn
 10     from test
 11     where tdate between '01-feb-2006' and '28-feb-2006') 
 12  select j.rn,jan06,f.rn,feb06
 13  from j_test j full join f_test f
 14  on j.rn=f.rn
 15  /
        RN JAN06               RN FEB06
---------- ----------- ---------- -----------
         1 01-jan-2006          1 02-feb-2006
         2 05-jan-2006          2 07-feb-2006
         3 15-jan-2006          3 25-feb-2006
                                4 27-feb-2006

4 rows selected.

Regards
Michel

But is it possible to divide x days over a week using analytics?

Martijn, April 13, 2006 - 9:39 am UTC

Hello Tom,

First of all: Thanks for your great work and site!

I wonder if its possible to execute the following in SQL, using rocking and rolling analytics:

Each day of the week a query has to select a balanced set of  ids from a table, for example the all_objects table.

Connected to Oracle9i Enterprise Edition Release 9.2.0.5.0 

SQL> select count(*) from all_objects;

  COUNT(*)
----------
     26652

SQL> select round(count(*)/7) from all_objects;

ROUND(COUNT(*)/7)
-----------------
             3807

So each day about 3807 records should be selected.

SQL> SELECT distinct(MOD(t.object_id, 7))+1                     day_of_week
  2  ,      count(*) over (PARTITION BY (MOD(t.object_id,  7))) nof_rec_by_objid
  3  FROM   all_objects t
  4  ;

DAY_OF_WEEK NOF_REC_BY_OBJID
----------- ----------------
          1             3810
          2             3820
          3             3808
          4             3811
          5             3807
          6             3791
          7             3805

7 rows selected

Now the data is more or less evenly distributed per day over a week. Not much of a problem BUT:

By using a parameter we want to be able to do the following:
Imagine that the number of object_id-records queried each day represents a set of persons an email will be sent to. I want to be able to send the same person 'x' emails per week, equally spread over the week. So if  today is day 1 of the week and the parameter value = 3 (emails per week) then I would like to get this result:
DAY_OF_WEEK NOF_REC_BY_OBJID
----------- ----------------
          1             3810      <-mailrun 1
          2             3820
          3             3808      <-mailrun 2
          4             3811
          5             3807      <-mailrun 3 
          6             3791                   
          7             3805

So:

if parameter = 3 and day = 1 then
Select a group of numbers: 1, 3, 5
All persons with mod(person_uid, 7)+1 in (1, 3, 5) need to be selected on day 1, 3 and 5 of the week.

if arameter = 3 and day of week  = 2 then
Select a group of numbers: 2, 4, 6 
All persons with mod(person_uid, 7)+1 in (2, 4, 6) need to be selected on day 2, 4 and 6 of the week.

Parameter = 3, day of week  = 3
Select a group of numbers: 3, 5, 7 
All persons with mod(person_uid, 7)+1 in ( 3, 5, 7) need to be selected on day 3, 5 and 7 of the week.

Etc. (for all days in 1..7).

If the parameter is set to 2 then the query schould get the following:

Parameter = 2, day of week = 1
Select a group of numbers: 1, 5 
All persons with mod(person_uid, 7)+1 in (1, 5) need to be selected on day 1, 3 and 5 of the week.

Etc. for all days of the week.

This is the current version of the daily selection:

SELECT ao.object_id
FROM   all_objects ao
WHERE  mod(ao.object_id, 7)+1 IN 
( to_number(to_char(sysdate, 'D'))
, to_number(to_char(sysdate, 'D'))+3
, to_number(to_char(sysdate, 'D'))-4);

I would like to get rid of the hardcoded ' +3  , -4' lines and replace them by a more flexible approach. Can analytics be of help in this question or is the only solution using  a PL/SQL Function? So the current query needs to be rewritten to:

SELECT ao.object_id
FROM   all_objects ao
WHERE  mod(ao.object_id, 7)+1 IN (select all available days of the week, evenly distributed over the week, based on the value of the (cursor) parameter );

Or

SELECT ao.object_id
FROM   all_objects ao
WHERE EXISTS  (select all available days of the week, evenly distributed over the week, based on the value of the (cursor) parameter where mod(ao.object_id, 7)+1  IN (the available days) );


Thanks in advance for any clue!

Regards,
Martijn

Continued from earlier post

Martijn, April 19, 2006 - 8:58 am UTC

With a PL/SQL function,

the point where I get stuck is the fact that if the value of the parameter is 3, I can't get the distribution right (in an elegant way that is). The odd daynumers mess up the distribution.
I.e., this is what i get when i choose 3 times a week for groups to receive email:

Day Mailto "mods" with no.
-------------------------------
1 1, 3, 5
2 2, 4, 6
3 3, 5, 7 (7 should be 1?)
4 4, 6, 2
5 5, 7, 1 (wrong too)
6 6, 2, 4
7 7, 1, 3 (wrong too)

Still I'm sure that this could be done in SQL, with some formule, analytics, but got lost in finding the correct way to do so.

Analytical function

Amit, April 26, 2006 - 7:50 pm UTC

I have a result set like this

SELECT sku, price, pct_off,
price * ((100 - pct_off) / 100) calculated_price
FROM (SELECT 101 sku, 1000 price, 9 + rownum pct_off
FROM user_all_tables
WHERE rownum <= 5);

SKU PRICE PCT_OFF CALCULATED_PRICE
--------- --------- --------- ----------------
101 1000 10 900
101 1000 11 890 -> 801
101 1000 12 880 -> 704.88
101 1000 13 870 -> 613.25
101 1000 14 860 -> 527.39

The calculated price is simply being calculated here
as Price - pct_off.

What I would like to do is to calculate the price based on previous calculated value.

So for row2, I want the calculated price to be
900 - 11% off = 801
for row3, it should be
801 - 12% off = 704.88 and so on.

April 27, 2006 - 2:54 pm UTC

no create
no insert intos
no example basically

no look, not promising anything, I just don't look at these until I have an example I can cut and paste.

read about LAG() - it will do this, you can figure this out - I'm sure you can (you will need an ORDER BY which is WOEFULLY missing from your example - rows have NO ORDER until you give it to them)

Analytical function

Amit, April 27, 2006 - 4:39 pm UTC

Tom, 
There are no inserts needed to run my query. You can directly run this statement. Please run this statement once and you'll understand what I am talking about.

SELECT sku, price, pct_off, 
       price * ((100 - pct_off) / 100) calculated_price
 FROM (SELECT 101 sku, 1000 price, 9 + rownum pct_off
           FROM user_all_tables
          WHERE rownum <= 5);

I have read about Lag() function and tried to use it here but it does not give me the results I need.

SQL> ed
Wrote file afiedt.buf

  1  SELECT sku, price, pct_off, cal_price,
  2      lag(cal_price, 1, cal_price) over(ORDER BY 1) new_cal_price
  3    FROM (SELECT sku, price, pct_off, price * ((100 - pct_off) / 100) cal_price
  4       FROM (SELECT 101 sku, 1000 price, 9 + rownum pct_off
  5       FROM user_all_tables
  6*        WHERE rownum <= 5))
SQL> /

      SKU     PRICE   PCT_OFF CAL_PRICE NEW_CAL_PRICE
--------- --------- --------- --------- -------------
      101      1000        10       900           900 -- Correct
      101      1000        11       890           900 -- Should be 801
      101      1000        12       880           890 -- Should be 704.88
      101      1000        13       870           880 -- Should be 613.25
      101      1000        14       860           870 -- Should be 527.39

I want to calculate the price based on previous calculated 
value.

For row2, I want the calculated price to be
900 - 11% off  = 801 
for row3, it should be
801 - 12% off = 704.88 and so on.

April 27, 2006 - 4:49 pm UTC

umm, without an order by - I'm afraid there is no "prior row" concept AT ALL.

Need real example, YOU NEED SOMETHING TO ACTUALLY SORT THE DATA BY

order by 1

that orders (in this case) by the CONSTANT NUMBER ONE, which means "they are all equally "first" or "last" or "next")

order order in the court. give us a real example to deal with here.

Analytical question

A reader, April 27, 2006 - 4:50 pm UTC

I have tried this variation. This is closer to my desired result but only upto row 2.

SQL> SELECT sku, price, pct_off, cal_price,
  2      lag(cal_price * ((100 - pct_off) / 100), 1, cal_price) over(ORDER BY 1) new_cal_price
  3    FROM (SELECT sku, price, pct_off, price * ((100 - pct_off) / 100) cal_price
  4       FROM (SELECT 101 sku, 1000 price, 9 + rownum pct_off
  5       FROM user_all_tables
  6         WHERE rownum <= 5));

      SKU     PRICE   PCT_OFF CAL_PRICE NEW_CAL_PRICE
--------- --------- --------- --------- -------------
      101      1000        10       900           900
      101      1000        11       890           810
      101      1000        12       880         792.1
      101      1000        13       870         774.4
      101      1000        14       860         756.9

April 28, 2006 - 1:43 am UTC

I give up.

These rows have NO ORDER WHATSOEVER, you need something to order by.

You don't have anything in your example to tell me what "row 2" is. Your rows could be anywhere - you could run this query tomorrow and your row 2 might become row 4 - or row 5000 and not even appear.

Analytical question.

Amit, April 27, 2006 - 5:12 pm UTC

Ok..let me give you some real data with an order by clause.

SQL> CREATE TABLE A AS
  2   SELECT 101 sku, 1000 price, 9 + rownum pct_off, trunc(sysdate) + (2 * rownum) apply_dt
  3     FROM user_all_tables
  4   WHERE rownum <= 5;

Table created.

SQL> select * from a;

      SKU     PRICE   PCT_OFF APPLY_DT
--------- --------- --------- --------
      101      1000        10 04/29/06
      101      1000        11 05/01/06
      101      1000        12 05/03/06
      101      1000        13 05/05/06
      101      1000        14 05/07/06

SQL> SELECT sku, price, pct_off, apply_dt, cal_price,
  2      lag(cal_price * ((100 - pct_off) / 100), 1, cal_price) over(ORDER BY 1) new_cal_price
  3    FROM (SELECT sku, price, pct_off, price * ((100 - pct_off) / 100) cal_price, apply_dt
  4       FROM a
  5      ORDER BY apply_dt);

      SKU     PRICE   PCT_OFF APPLY_DT CAL_PRICE NEW_CAL_PRICE
--------- --------- --------- -------- --------- -------------
      101      1000        10 04/29/06       900           900
      101      1000        11 05/01/06       890           810
      101      1000        12 05/03/06       880         792.1
      101      1000        13 05/05/06       870         774.4
      101      1000        14 05/07/06       860         756.9

model is great - but 10G only

vlad, April 27, 2006 - 7:09 pm UTC

SQL> select sku, price, pct_off, new_price
  2    from (select aaa.*, rownum rn from aaa order by apply_dt)
  3     model
  4      return updated rows
  5     partition by(sku) dimension by(rn)
  6     measures(price, pct_off,0 new_price)
  7     rules(new_price [any] = nvl(new_price[cv()-1]/100*(100-pct_off[cv()]),price[cv()]/100*(100-pct_off[cv()])));

       SKU      PRICE    PCT_OFF  NEW_PRICE
---------- ---------- ---------- ----------
       101       1000         10        900
       101       1000         11        801
       101       1000         12     704.88
       101       1000         13   613.2456
       101       1000         14 527.391216

SQL>

10G is cool, but all you need is to search carefully Tom's archive :-))

Vlad, April 27, 2006 - 10:19 pm UTC

Amit,
it took me some time to figure out that there is no need for all this model cool features - just common sence and Tom's archive. You actually don't need previous price value - it's enough to know initial value and all factors for calculating new_price for the current row:
SQL> select ord, 1000 * exp(sum(ln((100 - pct) / 100)) over(order by ord)) new_price,
  2         price,
  3         pct
  4    from (select rownum ord, rownum + 9 pct, 1000 price from all_tables where rownum < 10);

       ORD  NEW_PRICE      PRICE        PCT
---------- ---------- ---------- ----------
         1        900       1000         10
         2        801       1000         11
         3     704.88       1000         12
         4   613.2456       1000         13
         5 527.391216       1000         14
         6 448.282533       1000         15
         7 376.557328       1000         16
         8 312.542582       1000         17
         9 256.284917       1000         18

9 rows selected

SQL>

Thanks Vlad

Amit, April 27, 2006 - 11:06 pm UTC

Thanks Vlad. This is exactly what I needed.

I will also checkout the model function in 10G. It looks very interesting.

Is it possible to implement with LAG?

A reader, April 28, 2006 - 1:20 pm UTC

Tom, just curious - how were you going to implement this with LAG function call? Doesn't look like this was the right suggestion (if possible at all?)

April 28, 2006 - 1:32 pm UTC

I did not really look at it until there was an example to go with it (said "not promising anything"). I don't look at them in detail until everything is good to go.

And by then, someone else had already done it :)

Analytical question

Amit, April 28, 2006 - 1:43 pm UTC

I need some more help on the question I asked earlier.
The solution Vlad provided works very well if I need to do percent off in all the rows.
I need to break this pattern based on conditions and then continue again.


create table AA
(
  BATCH_ID       NUMBER(10) not null,
  APPLY_DT       DATE not null,
  SKU_NO         NUMBER(9) not null,
  SITE_NO        NUMBER(4) not null,
  PCT           NUMBER(5,2) not null,
  PRICE          NUMBER(10,2) not null,
  QTY          NUMBER(4) not null,
  UNIT_PRICE      NUMBER
);

insert into AA (BATCH_ID, APPLY_DT, SKU_NO, SITE_NO, PCT, PRICE, QTY, UNIT_PRICE)
values (6282, to_date('06-09-2006', 'dd-mm-yyyy'), 10007, 999, 12, 0, 0, 8.55);
insert into AA (BATCH_ID, APPLY_DT, SKU_NO, SITE_NO, PCT, PRICE, QTY, UNIT_PRICE)
values (6283, to_date('15-09-2006', 'dd-mm-yyyy'), 10007, 999, 5, 0, 0, 8.55);
insert into AA (BATCH_ID, APPLY_DT, SKU_NO, SITE_NO, PCT, PRICE, QTY, UNIT_PRICE)
values (6287, to_date('16-09-2006', 'dd-mm-yyyy'), 10007, 999, 0, 12, 1, 8.55);
insert into AA (BATCH_ID, APPLY_DT, SKU_NO, SITE_NO, PCT, PRICE, QTY, UNIT_PRICE)
values (6286, to_date('22-09-2006', 'dd-mm-yyyy'), 10007, 999, 10, 0, 0, 8.55);
commit;


SQL> SELECT apply_dt, pct, price, qty, unit_price,
  2      round(CASE
  3       WHEN pct > 0 THEN
  4        unit_price * exp(SUM(ln((100 - pct) / 100)) over(ORDER BY apply_dt))
  5       ELSE
  6        price / qty
  7      END, 2) new_price
  8    FROM aa;

APPLY_DT       PCT     PRICE       QTY UNIT_PRICE NEW_PRICE
-------- --------- --------- --------- ---------- ---------
09/06/06        12         0         0       8.55      7.52-> 12% Off 8.55
09/15/06         5         0         0       8.55      7.15-> 5% Off 7.52
09/16/06         0        12         1       8.55        12-> pct = 0 then                                                                    price/qty
09/22/06        10         0         0       8.55      6.43-> 10% off 12

In the example above row4 is the problem. 
In row3 I do not want to do pct off because pct = 0 for row3. I want to use price/qty instead. I am able to do what using case statement.

Now, for row4, I want 10% off row3 value. I want 10% of 12 and should get 10.8 and not 10% off 7.15 which is 6.43.

added some more rows to your table...

Vlad, April 28, 2006 - 5:18 pm UTC

SQL> select zzz.*,
  2         first_value(unit_price) over(partition by part order by apply_dt) *
  3         exp(SUM(ln((100 - pct) / 100)) over(partition by part ORDER BY apply_dt)) new_price
  4    from (select zz.*,
  5                 nvl((sum(rn) over(order by apply_dt desc) - nvl(rn, 0)),0) part
  6            from (select aa.batch_id,
  7                         aa.apply_dt,
  8                         aa.sku_no,
  9                         aa.site_no,
 10                         aa.pct,
 11                         aa.price,
 12                         aa.qty,
 13                         decode(aa.pct, 0, aa.price / aa.qty, aa.unit_price) unit_price,
 14                         z.rn
 15                    from aa,
 16                         (select rowid rd, rownum rn
 17                            from aa
 18                           where pct = 0
 19                           order by apply_dt) z
 20                   where aa.rowid = z.rd(+)
 21                   order by apply_dt) zz) zzz
 22   order by apply_dt;

   BATCH_ID APPLY_DT        SKU_NO SITE_NO     PCT        PRICE   QTY UNIT_PRICE         RN       PART  NEW_PRICE
----------- ----------- ---------- ------- ------- ------------ ----- ---------- ---------- ---------- ----------
       6282 9/6/2006         10007     999   12.00         0.00     0       8.55                     3      7.524
       6283 9/15/2006        10007     999    5.00         0.00     0       8.55                     3     7.1478
       6287 9/16/2006        10007     999    0.00        12.00     1         12          1          2         12
       6286 9/22/2006        10007     999   10.00         0.00     0       8.55                     2       10.8
       6282 9/6/2007         10007     999   12.00         0.00     0       8.55                     2      9.504
       6283 9/15/2007        10007     999    5.00         0.00     0       8.55                     2     9.0288
       6287 9/16/2007        10007     999    0.00        12.00     1         12          2          0         12
       6286 9/22/2007        10007     999   10.00         0.00     0       8.55                     0       10.8

8 rows selected

SQL>

Excellent Vlad

Amit, April 28, 2006 - 9:08 pm UTC

This works very well. Excellent work Vlad. Thanks

Tom, can this be done in any other way ?

Is it possible with Analytics function ?

Jonty, May 01, 2006 - 2:25 pm UTC

Hello Tom,

This will help you to create table and populate data for my problem.

CREATE TABLE T_Order
(Item VARCHAR2(20),
Order_Date DATE NOT NULL,
Order_No NUMBER(10,0))
/

INSERT INTO t_order
(ITEM,ORDER_DATE,ORDER_NO)
VALUES
('101','1-JAN-2006',9001)
/
INSERT INTO t_order
(ITEM,ORDER_DATE,ORDER_NO)
VALUES
('101','2-JAN-2006',9002)
/
INSERT INTO t_order
(ITEM,ORDER_DATE,ORDER_NO)
VALUES
('101','8-JAN-2006',9003)
/
INSERT INTO t_order
(ITEM,ORDER_DATE,ORDER_NO)
VALUES
('101','16-JAN-2006',9004)
/
INSERT INTO t_order
(ITEM,ORDER_DATE,ORDER_NO)
VALUES
('101','18-JAN-2006',9005)
/
INSERT INTO t_order
(ITEM,ORDER_DATE,ORDER_NO)
VALUES
('102','1-JAN-2006',9006)
/
INSERT INTO t_order
(ITEM,ORDER_DATE,ORDER_NO)
VALUES
('102','5-JAN-2006',9007)
/
INSERT INTO t_order
(ITEM,ORDER_DATE,ORDER_NO)
VALUES
('102','7-JAN-2006',9008)
/
INSERT INTO t_order
(ITEM,ORDER_DATE,ORDER_NO)
VALUES
('102','20-JAN-2006',9009)
/
INSERT INTO t_order
(ITEM,ORDER_DATE,ORDER_NO)
VALUES
('103','5-JAN-2006',9010)
/
INSERT INTO t_order
(ITEM,ORDER_DATE,ORDER_NO)
VALUES
('103','7-JAN-2006',9011)
/
INSERT INTO t_order
(ITEM,ORDER_DATE,ORDER_NO)
VALUES
('103','9-JAN-2006',9012)
/

SELECT a.item, a.order_date, a.order_no
FROM t_order a
/

I have following data in my table.

Row# Item Order_Date Order_No
---- ----- ---------- ---------
1 101 1-Jan-2006 9001
2 101 2-Jan-2006 9002
3 101 8-Jan-2006 9003
4 101 16-Jan-2006 9004
5 101 18-Jan-2006 9005

6 102 1-Jan-2006 9006
7 102 5-Jan-2006 9007
8 102 7-Jan-2006 9008
9 102 20-Jan-2006 9009

10 103 5-Jan-2006 9010
11 103 7-Jan-2006 9011
12 103 9-Jan-2006 9012

I need to find out which are the latest group of Orders for each Item.

Group of Ordres - any 2 orders order_date are within 1 week time gap.

In above example,
For
Item 101 - Row # 4,5 will be group # 1
Row# 1,2,3 wiil be group # 2

Item 102 - Row # 9 will be group # 1
Row # 6,7,8 will be group # 2

Item 103 - Row # 10, 11, 12 will be group # 1

I am only interested in group # 1. Query results should be like this

Item # Order No
-------- ------------
101 9004
101 9005

102 9009

103 9010
103 9011
103 9012

I really appreciate your help on this.

May 02, 2006 - 3:21 am UTC

a tad "ambigous" - 

do you mean within the last week of the LAST RECORD (eg: for item 101 - any date between 18-Jan-2006 and 12-Jan-2006)

OR

any set of records such that the date of the current record is within one week of the prior record (eg: if item 101 had observations:

18-jan, 15-jan, 12-jan, 9-jan, 6-jan, 3-jan

they would all "count")

we can do either, here is the "first" one:

ops$tkyte@ORA10GR2> select *
  2    from (
  3  SELECT a.item, a.order_date, a.order_no,
  4         max(order_date) over (partition by item) max_order_date
  5    FROM t_order a
  6         )
  7   where max_order_date - order_date <= 7
  8  /

ITEM                 ORDER_DAT   ORDER_NO MAX_ORDER
-------------------- --------- ---------- ---------
101                  16-JAN-06       9004 18-JAN-06
                     18-JAN-06       9005 18-JAN-06

102                  20-JAN-06       9009 20-JAN-06

103                  05-JAN-06       9010 09-JAN-06
                     07-JAN-06       9011 09-JAN-06
                     09-JAN-06       9012 09-JAN-06


6 rows selected.

2nd Option

Jonty, May 02, 2006 - 10:11 am UTC

Thanks a lot for quick response.

Yes, I am looking for this option answer.

"any set of records such that the date of the current record is within one week
of the prior record (eg: if item 101 had observations:
18-jan, 15-jan, 12-jan, 9-jan, 6-jan, 3-jan
they would all "count")"

Here is the Insert St. for an example.

INSERT INTO t_order
(ITEM,ORDER_DATE,ORDER_NO)
VALUES
('104','01-JAN-2006',9091)
/
INSERT INTO t_order
(ITEM,ORDER_DATE,ORDER_NO)
VALUES
('104','02-JAN-2006',9094)
/
INSERT INTO t_order
(ITEM,ORDER_DATE,ORDER_NO)
VALUES
('104','08-JAN-2006',9092)
/
INSERT INTO t_order
(ITEM,ORDER_DATE,ORDER_NO)
VALUES
('104','14-JAN-2006',9093)
/
INSERT INTO t_order
(ITEM,ORDER_DATE,ORDER_NO)
VALUES
('104','18-JAN-2006',9095)
/

ITEM ORDER_DATE ORDER_NO
-------------------- ---------------------- ----------
104 1-Jan-2006 9091
104 2-Jan-2006 9094
104 18-Jan-2006 9092
104 24-Jan-2006 9093
104 28-Jan-2006 9095

Results should bring 18-jan, 24-jan, 28-jan records.

Thanks in advance.

May 02, 2006 - 3:47 pm UTC

arg - why do selects NOT match "supplied data"

inserts <> select....

oh well, work with this:

ops$tkyte@ORA10GR2> select * from t_order;

ITEM                 ORDER_DAT   ORDER_NO
-------------------- --------- ----------
101                  01-JAN-06       9001
101                  02-JAN-06       9002
101                  08-JAN-06       9003
101                  16-JAN-06       9004
101                  18-JAN-06       9005
102                  01-JAN-06       9006
102                  05-JAN-06       9007
102                  07-JAN-06       9008
102                  20-JAN-06       9009
103                  05-JAN-06       9010
103                  07-JAN-06       9011
103                  09-JAN-06       9012
104                  01-JAN-06       9091
104                  02-JAN-06       9094
104                  08-JAN-06       9092
104                  14-JAN-06       9093
104                  18-JAN-06       9095

17 rows selected.

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> select *
  2    from (
  3  select item, order_date, order_no,
  4         max(grp) over (partition by item order by order_date DESC) maxgrp
  5    from (
  6  select item, order_date, order_no,
  7         case when (lag(order_date) over (partition by item order by order_date DESC)-order_date)>7
  8                  then row_number() over (partition by item order by order_date DESC)
  9              when lag(order_date) over (partition by item order by order_date DESC) is null then 1
 10             end grp
 11    from t_order
 12         )
 13             )
 14   where maxgrp = 1
 15   order by item, order_date
 16  /

ITEM                 ORDER_DAT   ORDER_NO     MAXGRP
-------------------- --------- ---------- ----------
101                  16-JAN-06       9004          1
101                  18-JAN-06       9005          1
102                  20-JAN-06       9009          1
103                  05-JAN-06       9010          1
103                  07-JAN-06       9011          1
103                  09-JAN-06       9012          1
104                  01-JAN-06       9091          1
104                  02-JAN-06       9094          1
104                  08-JAN-06       9092          1
104                  14-JAN-06       9093          1
104                  18-JAN-06       9095          1

11 rows selected.

It works.

Jonty, May 03, 2006 - 10:19 am UTC

Thanks a lot.

It works well.

Where can I find detailed material about Analytic Functions ?

May 03, 2006 - 1:01 pm UTC

</code> http://docs.oracle.com/docs/cd/B19306_01/server.102/b14223/analysis.htm#i1007779 <code>

Merge rows with first non-null value for each column

narendra, May 08, 2006 - 7:48 am UTC

Tom,

It sounds pretty simple to achieve when I put it in english. However, I am not able to write a SQL to achieve the same.
Following are the details:
select * from v$version ;

BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.6.0 - 64bit Production
PL/SQL Release 9.2.0.6.0 - Production
CORE 9.2.0.6.0 Production
TNS for Solaris: Version 9.2.0.6.0 - Production
NLSRTL Version 9.2.0.6.0 - Production

create table t_src (id number not null, name varchar2(10), st_id number, created_date date not null)
/

Insert into T_SRC ("ID","NAME","ST_ID","CREATED_DATE") values ('6',null,'2',to_date('16-MAY-06','DD-MON-YY'))
/
Insert into T_SRC ("ID","NAME","ST_ID","CREATED_DATE") values ('6',null,null,to_date('08-MAY-06','DD-MON-YY'))
/
Insert into T_SRC ("ID","NAME","ST_ID","CREATED_DATE") values ('6','Row6','1',to_date('05-MAY-06','DD-MON-YY'))
/
Insert into T_SRC ("ID","NAME","ST_ID","CREATED_DATE") values ('7',null,null,to_date('07-MAY-06','DD-MON-YY'))
/
Insert into T_SRC ("ID","NAME","ST_ID","CREATED_DATE") values ('7','Row10','1',to_date('06-MAY-06','DD-MON-YY'))
/

select * from t_src order by id, created_date desc ;

ID NAME ST_ID CREATED_DATE
---------- ---------- ---------- -------------------
6 2 16-05-2006 00:00:00
6 08-05-2006 00:00:00
6 Row6 1 05-05-2006 00:00:00
7 07-05-2006 00:00:00
7 Row10 1 06-05-2006 00:00:00

Desired Output:

ID NAME ST_ID CREATED_DATE
---------- ---------- ---------- -------------------
6 Row6 2 16-05-2006 00:00:00
7 Row10 1 07-05-2006 00:00:00

The logic behind desired output is:
Generate a row for each ID, with latest CREATED_DATE.
Fill columns NAME & ST_ID with first not null value in respective columns for the ID.

Can you please help me out ?

May 08, 2006 - 8:31 am UTC

...
Generate a row for each ID, with latest CREATED_DATE.
Fill columns NAME & ST_ID with first not null value in respective columns for 
the ID.
.....

your example does not do that.  Your ST_ID for row6 is for the latest row - not the first non-null value.

This is what I come up with:

ops$tkyte@ORA10GR2> select id, name, st_id, max_date
  2    from (
  3  select id, name, st_id, created_date,
  4         max(created_date) over (partition by id) max_date ,
  5             min(case when name is not null then created_date end) over (partition by id) min_date
  6    from t_src
  7         )
  8   where created_date = min_date
  9   order by id, created_date
 10  /
 
        ID NAME                                ST_ID MAX_DATE
---------- ------------------------------ ---------- ---------
         6 Row6                                    1 16-MAY-06
         7 Row10                                   1 07-MAY-06

Incomplete Test Data

Narendra, May 08, 2006 - 11:57 pm UTC

Sorry Tom,

My sample data did not cater for all conditions.
select * from t_src order by id, created_date desc ;

ID NAME ST_ID CREATED_DATE
---------- ---------- ---------- -------------------
6 2 16-05-2006 00:00:00
6 Row5 08-05-2006 00:00:00
6 Row6 1 05-05-2006 00:00:00
7 07-05-2006 00:00:00
7 Row10 1 06-05-2006 00:00:00
7 Row11 3 06-05-2006 00:00:00

Desired Output:

ID NAME ST_ID CREATED_DATE
---------- ---------- ---------- -------------------
6 Row5 2 16-05-2006 00:00:00
7 Row10 1 07-05-2006 00:00:00

"Row5" in first row is due to
i) "Name" column for ID = 6 with max created date is null &
ii) For remaining rows having ID = 6, "Row5" is first not null value.

With same logic, row with ID = 7 has "Row10" value in NAME column & "1" in ST_ID column.

Hope I am clear.

May 09, 2006 - 7:42 am UTC

I didn't see any inserts here for me to use - did I miss something?

and your explaination is still not very clear.

I believe you are trying to say:

get me the unique ID's
for each unique ID show me:
a) the max(created_date)
b) the last non-null value of name when sorted by created_date
c) the last non-null value of st_id when sorted by created_date

You don't seem to be mentioning "how row with id=6" got st_id = 2.

More details

Narendra, May 10, 2006 - 12:13 am UTC

Sorry Tom for missing out on inserts...

Following are details.

create table t_src (id number not null, name varchar2(10), st_id number,
created_date date not null)
/
Insert into T_SRC ("ID","NAME","ST_ID","CREATED_DATE") values ('6',null,null,to_date('16-MAY-06','DD-MON-YY'))
/
Insert into T_SRC ("ID","NAME","ST_ID","CREATED_DATE") values ('6','asdf',null,to_date('10-MAY-06','DD-MON-YY'))
/
Insert into T_SRC ("ID","NAME","ST_ID","CREATED_DATE") values ('6','Row5',null,to_date('08-MAY-06','DD-MON-YY'))
/
Insert into T_SRC ("ID","NAME","ST_ID","CREATED_DATE") values ('6',null,'6',to_date('05-MAY-06','DD-MON-YY'))
/
Insert into T_SRC ("ID","NAME","ST_ID","CREATED_DATE") values ('7',null,null,to_date('07-MAY-06','DD-MON-YY'))
/
Insert into T_SRC ("ID","NAME","ST_ID","CREATED_DATE") values ('7',null,'1',to_date('06-MAY-06','DD-MON-YY'))
/
Insert into T_SRC ("ID","NAME","ST_ID","CREATED_DATE") values ('7','Row11','3',to_date('04-MAY-06','DD-MON-YY'))
/

select * from t_src order by id, created_date desc
/
ID NAME ST_ID CREATED_DATE
6 2 16-MAY-06
6 10-MAY-06
6 Row5 08-MAY-06
6 6 05-MAY-06
7 07-MAY-06
7 1 06-MAY-06
7 Row11 3 04-MAY-06

Desired Output:
ID NAME ST_ID CREATED_DATE
6 Row5 2 16-MAY-06
7 Row11 1 07-MAY-06

Let me try explaining what I am trying to achieve.
1. For each uniqued ID, get the record with max CREATED_DATE. So in above example, I get following 2 rows:
ID NAME ST_ID CREATED_DATE
6 16-MAY-06
7 07-MAY-06

2. For each record (i.e. unique ID) derived in step (1),
fill rest of the columns as follows
2.1 If the record for a unique ID, with max CREATED_DATE, has any of the other columns with not null values, get those values as it is. So the above results become:
ID NAME ST_ID CREATED_DATE
6 2 16-MAY-06
7 07-MAY-06

2.2 For all columns of the record with unique ID, having max CREATED_DATE, get the FIRST NOT NULL value from remaining rows (from the output of query "select * from t_src order by id, created_date desc") for the same ID.
So the above results become:
ID NAME ST_ID CREATED_DATE
6 Row5 2 16-MAY-06
7 Row11 1 07-MAY-06

Hope I am clear about the same.
Awaiting your reply.

Regards

May 10, 2006 - 7:52 am UTC

no, this is not clear - 2.2 is NOT clear at all. Not really.  

Actually, I don't see why there is a 2.1 and a 2.2 - seems you could say it all at once.

Are the columns treated independantly so that what I stated:

get me the unique ID's
for each unique ID show me:
  a) the max(created_date)
  b) the last non-null value of name when sorted by created_date desc
  c) the last non-null value of st_id when sorted by created_date desc

is actually what you mean.  It seems so.  In which case - I disagree with your sample output.  Your input data:

ops$tkyte@ORA10GR2> select * from t_src order by id, created_date desc;
 
        ID NAME                                ST_ID CREATED_D
---------- ------------------------------ ---------- ---------
         6                                           16-MAY-06
         6 asdf                                      10-MAY-06
         6 Row5                                      08-MAY-06
         6                                         6 05-MAY-06
         7                                           07-MAY-06
         7                                         1 06-MAY-06
         7 Row11                                   3 04-MAY-06
 
7 rows selected.


So, I don't see ANY row with id=6 and st_id = 2.  I don't get where your example comes up with that.  Also, I see that when id=6, asdf is going to be the last non-null value for NAME when sorted by created_date desc.

Stating a requirement and having examples that correspond to what you type in - sort of important.


Ok, I'll guess my statement of the problem is close to what you really mean, then:

ops$tkyte@ORA10GR2> select id,
  2         max(created_date),
  3         substr( max( case when name is not null
  4                    then to_char(created_date,'yyyymmddhh24miss')||name
  5                       end ), 15 ) name,
  6         to_number( substr( max( case when st_id is not null
  7                     then to_char(created_date,'yyyymmddhh24miss')|| st_id
  8                                  end ), 15 ) ) st_id
  9    from t_src
 10   group by id;
 
        ID MAX(CREAT NAME                                ST_ID
---------- --------- ------------------------------ ----------
         6 16-MAY-06 asdf                                    6
         7 07-MAY-06 Row11                                   1
 

is what you were really looking for perhaps.

Results....bang on target

Narendra, May 11, 2006 - 12:10 am UTC

Tom,

Thanks a lot.
That is the exact query I am looking at.
However, just one question in the confusion created by my explanation.
You said:
get me the unique ID's
for each unique ID show me:
a) the max(created_date)
b) the last non-null value of name when sorted by created_date desc
c) the last non-null value of st_id when sorted by created_date desc

With input data as:
ID NAME ST_ID CREATED_D
---------- ------------------------------ ---------- ---------
6 16-MAY-06
6 asdf 10-MAY-06
6 Row5 08-MAY-06
6 6 05-MAY-06
7 07-MAY-06
7 1 06-MAY-06
7 Row11 3 04-MAY-06

Now, "the last non-null value of name when sorted by created_date desc" for ID = 6 becomes "Row5" and not "asdf", Right?
Anyway, that is what I needed. "The first non-null value of name when sorted by created_date desc".
I think my interpretation of "first" and "last" created confusion. Sorry for the same.

Thanks once again.
( Will keep bothering you...:)

Should I use Analytics

David Piazza, May 25, 2006 - 11:08 pm UTC

Tom,

I have the following problem. Table A has multiple x values which has just once value in Table B. I need to update column y in table B based on the criteria of the latest date in column z as long as the record is older than 16-MAY-06. The columns are shown below along with the way I'd like table B to end up:

Table A (Test Schema) Table B
x y z x y
-- -- --------- -- --
1 10 14-MAY-06 1 10
2 25 14_MAY-06 2 NULL
2 2 15_MAY-06 3 11
3 25 10_MAY-06 4 24
3 3 18_MAY-06
3 1 15_MAY-06
4 42 14_APR-06
4 4 14_MAY-05

AFTER:
Table B
x y
-- --
1 10
2 2
3 1
4 42

I have successfully updated one value for value x, but when I try them all using the following query, I don't get the results I need. I've spent a lot of time trying different things but to no avail. Can this be done without analytics, and how would you do it?

UPDATE B
SET y=
(SELECT *
FROM (SELECT y
FROM test.A,B
WHERE test.A.x=B.x AND
test.z < TO_DATE('16-MAY-2006','dd-mon-yy')
ORDER by test.z DESC )
WHERE rownum = 1);

May 26, 2006 - 8:36 am UTC

I would start by setting up a create table and inserts so that when I asked someone to play around with it - they would have an out of the box test case to work with.

Either MERGE using an aggregate query, or a correlated subquery likely.

count's between boundaries

A reader, June 01, 2006 - 5:25 am UTC

tom,

can you please show me a more elegant solution for my query below?



TEST @ adm1 SQL>create table emp as select * from scott.emp;

Tabelle wurde erstellt.

Abgelaufen: 00:00:00.19
TEST @ adm1 SQL>insert into emp select * from scott.emp;

15 Zeilen wurden erstellt.

Abgelaufen: 00:00:00.01
TEST @ adm1 SQL>insert into emp select * from scott.emp;

15 Zeilen wurden erstellt.

Abgelaufen: 00:00:00.00
TEST @ adm1 SQL>edit
file afiedt.buf wurde geschrieben

  1  select lb, rb, count(*) sal_between
  2  from
  3  (
  4  select b.l_bound lb, b.r_bound rb, s.sal ss
  5  from
  6  (
  7  select (rownum - 1) * 1000        l_bound
  8        ,(rownum - 1) * 1000 + 1000 r_bound
  9  from   all_objects
 10  where  rownum <= 20                -- max. l_bound / r_bound.
 11  ) b,
 12  (
 13  select sal
 14  from   emp
 15  where  deptno between 10 and 30    -- data from.
 16  ) s
 17  where s.sal between b.l_bound and b.r_bound
 18  )
 19  group by lb, rb
 20* order by lb, rb
 21  /

        LB         RB SAL_BETWEEN
---------- ---------- -----------
         0       1000           6
      1000       2000          18
      2000       3000          15
      3000       4000           6
      4000       5000           3
      5000       6000           3

6 Zeilen ausgewählt.

Abgelaufen: 00:00:00.20

June 01, 2006 - 10:36 am UTC

it looks like you just want to "divide" doesn't it?? This is not 100% identical to yours (you double count things on the "boundary", dividing won't)

ops$tkyte@ORA10GR2> select lb, rb, count(*) sal_between
  2  from
  3  (
  4  select b.l_bound lb, b.r_bound rb, s.sal ss
  5  from
  6  (
  7  select (rownum - 1) * 1000        l_bound
  8        ,(rownum - 1) * 1000 + 1000 r_bound
  9  from   all_objects
 10  where  rownum <= 20                -- max. l_bound / r_bound.
 11  ) b,
 12  (
 13  select sal
 14  from   emp
 15  where  deptno between 10 and 30    -- data from.
 16  ) s
 17  where s.sal between b.l_bound and b.r_bound
 18  )
 19  group by lb, rb
 20  order by lb, rb
 21  /

        LB         RB SAL_BETWEEN
---------- ---------- -----------
         0       1000           2
      1000       2000           6
      2000       3000           5  <<<==== counted 3000 here
      3000       4000           2  <<<====              and here
      4000       5000           1  <<<==== counted 5000 here
      5000       6000           1  <<<====              and here

6 rows selected.

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> select minsal, maxsal, cnt, truncsal*1000 || ' >= sal < ' || (truncsal+1)*1000 rng
  2    from (
  3  select min(sal) minsal, max(sal) maxsal, count(*) cnt, trunc(sal/1000) truncsal
  4    from emp
  5   group by trunc(sal/1000)
  6         )
  7   order by truncsal
  8  /

    MINSAL     MAXSAL        CNT RNG
---------- ---------- ---------- --------------------
       800        950          2 0 >= sal < 1000
      1100       1600          6 1000 >= sal < 2000
      2450       2975          3 2000 >= sal < 3000
      3000       3000          2 3000 >= sal < 4000
      5000       5000          1 5000 >= sal < 6000

count between boundaries

A reader, June 01, 2006 - 6:29 am UTC

ups, i made a mistake. overlapping partition boundaries.

TEST @ adm1 SQL>edit
file afiedt.buf wurde geschrieben

  1  select lb, rb, count(*) sal_between
  2  from
  3  (
  4   select b.l_bound lb, b.r_bound rb, s.sal ss
  5   from
  6   (
  7     select (rownum - 1) * 1000 + 1    l_bound
  8           ,(rownum - 1) * 1000 + 1000 r_bound
  9     from   all_objects
 10     where  rownum <= 20                -- max. l_bound / r_bound.
 11   ) b,
 12   (
 13     select sal
 14     from   emp
 15     where  deptno between 10 and 30    -- data from.
 16   ) s
 17   where s.sal between b.l_bound and b.r_bound
 18  )
 19  group by lb, rb
 20* order by lb, rb
TEST @ adm1 SQL>/

        LB         RB SAL_BETWEEN
---------- ---------- -----------
         1       1000           6
      1001       2000          18
      2001       3000          15
      4001       5000           3

Abgelaufen: 00:00:00.07

June 01, 2006 - 10:43 am UTC

see above, same concept will work if you "tweak the math in the division" to get 3000/5000 to go into the group you think they should.

Beware of decimal numbers with your solution?

count's between boundaries

A reader, June 01, 2006 - 7:03 am UTC

i tried the width_bucket analytic function. would this be the best way?

TEST @ adm1 SQL>edit
file afiedt.buf wurde geschrieben

  1  select lb, rb, count(*) rates_between
  2  from
  3  (
  4  select sal
  5        ,width_bucket (sal - 1, 0, 50000, 50) * 1000 - 999 lb
  6        ,width_bucket (sal - 1, 0, 50000, 50) * 1000       rb
  7  from emp
  8  )
  9  group by lb, rb
 10* order by lb, rb
TEST @ adm1 SQL>/

        LB         RB RATES_BETWEEN
---------- ---------- -------------
         1       1000             6
      1001       2000            18
      2001       3000            15
      4001       5000             3
                                  3

Abgelaufen: 00:00:00.01

June 01, 2006 - 10:43 am UTC

I think you just want to "divide"

count's between boundaries

A reader, June 02, 2006 - 8:37 am UTC

divide! yes, that's it! i wish i could find such a simple concise and elegant solution on the first try! ;(

what's about your book concerning "analytics"?


TEST @ adm1 SQL>SELECT   minsal, maxsal, cnt,
  2              LPAD ((DECODE (truncsal * 1000, 0, 0, truncsal * 1000 + 1)),
  3                    4
  4                   )
  5           || ' >= sal <= '
  6           || ((truncsal + 1) * 1000) rng
  7      FROM (SELECT   MIN (sal) minsal, MAX (sal) maxsal, COUNT (*) cnt,
  8                     TRUNC ((sal - 1) / 1000) truncsal
  9                FROM emp
 10            GROUP BY TRUNC ((sal - 1) / 1000))
 11  ORDER BY truncsal
 12  /

    MINSAL     MAXSAL        CNT RNG
---------- ---------- ---------- ----------------------------------------
       800        950          6    0 >= sal <= 1000
      1100       1600         18 1001 >= sal <= 2000
      2450       3000         15 2001 >= sal <= 3000
      5000       5000          3 4001 >= sal <= 5000
                               3  >= sal <=

Abgelaufen: 00:00:00.02

TO_NUMBER PROBLEM

A reader, June 07, 2006 - 7:33 pm UTC

Hi

I have external table which has column which is char(19)

when i do SELECT TO_NUMBER(Col_NUM) FROM ET_table

it comes as 4.70793001110283E15

i want output as number.

June 07, 2006 - 8:26 pm UTC

that is a number - sqlplus is printing a string so you can read it but it is in fact "a number"

you can

select to_char( to_number(col_num), '99999999999999999999.99999' ) from et_table;

but rest assured, it is a nice number and sqlplus is just trying to make it fit reasonably on the screen for us human beings to read.

Analytics or Not

Kandy Train, June 09, 2006 - 11:03 am UTC

Hi Tom,

Forgive me if this is not the proper thread to put this in.
I have never written a analytic query and came across this requiment and felt it can't be done with a normal ORDER BY.

Here is the example..

create table t1 (priority varchar2(20), planned_ship_date date, order_no varchar2(20));
insert into t1 values(1, sysdate + 5, 111111);
insert into t1 values(2, sysdate + 1, 333333);
insert into t1 values(3, sysdate + 1, 444444);
insert into t1 values(4, sysdate, 222222);
insert into t1 values(5, sysdate + 3, 555555);

--column planned_ship_date format a25
--select order_no, priority, planned_ship_date from t1;

I want the results to be sorted
1). All the Priority = 1 rows should come first
2). The rest should be orderd on the min(Planned_Ship_Date) --> earliest
3). When there is a tie on the Planned_Ship_date, priority = 2 rows should come first(If there is any).

The Order No column is there just to see the result, if we have orderes them correctly.

Can this be done??

Thank you so much for your help.

June 09, 2006 - 1:24 pm UTC

order by case when priority = 1 then 1 else 2 end,
planned_ship_date,
case when priority = 2 then 1 else 2 end
/

why is priority a varcahr2(20)? that is so wrong, hope that is not "real life"
same with order_no

never ever use a string to store a number, or a date. never.

LEAST of all dates

Mave, June 12, 2006 - 10:55 am UTC

Tom, I want to get Least of all the selected dates [Trying to get earliest date]

eg:
select least(d1,d2,d3) from (select sysdate-3 d1,sysdate-2 d2,sysdate d3 from dual)

should work. But If one of them is null, i want least of other two. Nulls should not be inlcuded in the list for LEAST Function

Thanks for all your help in advance.

June 13, 2006 - 10:04 am UTC

nvl() them and have the nvl return a date far into the future (like the year 4000)

least( nvl(d1,to_date('4000','yyyy')), nvl(d2,...)

and if you think all three might be null, use decode or case to return NULL if the returned least value is 4000

Works for me....

Mave, June 13, 2006 - 10:33 am UTC

Thanks tom, that would work for me.

But I am curious ..how does this to_date('4000','yyyy') work? I tried and got 06/01/4000 as the date. Why did it show june 1st of 4000?

Thanks,

June 13, 2006 - 12:28 pm UTC

the default day and month for yyyy is "first day of the current month"

you can of course use 01014000 and a format of ddmmyyyy if you like

If /how I can use analytics for this problem?

Ping, June 14, 2006 - 10:00 am UTC

I have a table with info like this:

c_id a_id reg_dt
10 1 1/1/2006
10 2 1/1/2006
10 3 1/1/2006
10 1 1/2/2006
10 2 1/2/2006
10 3 1/2/2006
10 1 1/3/2006
10 2 1/3/2006
10 3 1/3/2006

What I want to get is 3 rows:

c_id a_id reg_dt
10 1 1/1/2006
10 2 1/2/2006
10 3 1/3/2006

I was trying to use the lag/leap functions but didn't seem to be able to make it work. Any suggestions?

Thanks!

June 14, 2006 - 12:37 pm UTC

fascinating.

a bunch of data and no description of what it means or why you got the three rows you did.

that and there is no create table, no inserts, nothing to play with.

Query

Rambabu, June 15, 2006 - 7:10 am UTC

Hi Tom,

I have a table like this.

Pid seq value1 value2
100 1 X A
100 2 Y B
101 1 M O
101 2 N P

I want the output in the following way.

Pid value1 value2
100 X B
101 M P

For all Pids, I want value1 from the row which has "seq =1" and value2 from the row which has "seq =2".

I got the above result using the following query.

select t1.pid,t1.value1, t2.value2
from (select *
from t tt
where seq = (select min(seq) from t where pid = tt.pid)) t1,
(select *
from t tt
where seq = (select max(seq) from t where pid = tt.pid)) t2
where t1.pid = t2.pid

Here i used two instances(t1,t2) of Table t.

How can i get the above result using a single instance of Table t and using analytics.

create table t(
Pid number(6),
seq number(6),
value1 varchar2(20),
value2 varchar2(20));

insert into t values(100,1,'X','A');
insert into t values(100,2,'Y','B');
insert into t values(101,1,'M','O');
insert into t values(101,2,'N','P');
commit;

Thanks in advance,
Rambabu B

June 15, 2006 - 8:47 am UTC

ops$tkyte@ORA10GR2> select pid,
  2         max( decode( seq, 1, value1 ) ) v1,
  3             max( decode( seq, 2, value2 ) ) v2
  4    from t
  5   group by pid;

       PID V1                   V2
---------- -------------------- --------------------
       100 X                    B
       101 M                    P



<b>sorry, analytics didn't fit into the answer</b>, they would not make sense here.

You can use analytics

Anwar, June 15, 2006 - 10:56 am UTC

Tom's answer seems better, more concise, but you can use analytics if you really want to.

SQL> SELECT pid,value1,v2
  2  FROM
  3  (
  4     SELECT pid,
  5             value1,
  6             lead(value2) over(partition by pid order by seq) v2,
  7             seq
  8     FROM t
  9  )
 10  WHERE seq=1
 11  /

       PID VALUE1               V2
---------- -------------------- --------------------
       100 X                    B
       101 M                    P

Sorry, let me try...

A reader, June 15, 2006 - 11:55 am UTC

I think the table I showed earlier is coming from a cross join b/w two tables:

Table 1, stores acct_ids that are cancelled

customer_id acct_id
10 1
10 2
10 3

table 2, stores acct_ids that are currently in use

customer_id acct_id reg_dt
10 11 1/1/2006
10 12 1/2/2006
10 13 1/3/2006

The two tables join on customer_id, so it got 9 rows.

Customer_id 10 cancelled 3 of his acct_ids, and then registered 3 new acct_ids. What my client really needs is, after customer_id 10 cancaled his acct_id 1,2,3, when did he registered acct_id 11,12,13.

customer_id acct_id_cancaled acct_id_new reg_dt
10 1 11 1/1/2006
10 2 12 1/2/2006
10 3 13 1/3/2006

Hope this is clear.

Thanks, Ping

June 16, 2006 - 5:51 pm UTC

I have no idea what you are referring to.

and still, hmm, not a create, not an insert - no real detail...

For create table and insert statements...

A reader, June 15, 2006 - 12:00 pm UTC

create table a
(c_id integer,a_id integer)
;
insert into a values (10, 1);
insert into a values (10, 2);
insert into a values (10, 3);

create table b
(c_id integer,a_id integer,reg_dt date);

insert into b values (10, 11,'1/1/2006');
insert into b values (10, 12,'1/2/2006');
insert into b values (10, 13,'1/3/2006');

Thanks!

SQL Query

Sankar Kumar, June 15, 2006 - 12:50 pm UTC

Hi Tom,

I have a table like this:

Personid Change_Sequence Class Cls_EffDate Location Loc_EffDate
----------------------------------------------------------------------------
1000 1 FullTime Hawaii
1000 2 FullTime California 1/1/2005
1000 3 PartTime 1/1/2006 California 1/1/2005
1000 4 PartTime 1/1/2006 Texas 10/1/2005
1000 5 FullTime 1/1/2007 Boston 1/1/2007
1000 6 COBRA 1/1/2005 Boston 1/1/2007
1000 7 Outside 5/1/2006 Boston 1/1/2007

1. The primary key is (Personid, change_sequence)

2. The effective dates of the first row for each person is null (i.e. for change_sequence = 1)

3. For each row only one column value will be effected (i.e. Either Class or Location can be changed) and the remaining data will be copied from the previous row

4. Both Class and Location can be changed in a row, only if the effective date is same. [See where change_sequence = 5]

I am using the following query for getting Class and Location of the person as per given Effective Date.

Select ECls.personid, ECls.class, ELoc.Location, ECls.change_sequence Cls_change_seq, ELoc.change_Sequence Loc_change_seq
from (select * from (select personid, class, change_sequence,
row_number() over(partition by personid order by change_sequence desc) r
from employee
where nvl(Cls_EffDate, to_date('&givenDate','MM/DD/YYYY')) <= to_date('&givenDate','MM/DD/YYYY')) Cls
where Cls.r = 1) ECls,

(Select * from (select personid, Location, change_sequence,
row_number() over(partition by personid order by change_sequence desc) r
from employee
where nvl(Loc_EffDate, to_date('&givenDate','MM/DD/YYYY')) <= to_date('&givenDate','MM/DD/YYYY')) Loc
where Loc.r = 1) ELoc

where ECls.personid = ELoc.personid;

Now lets run query with three different dates

For 1/1/2005 the output would be

PERSONID CLASS LOCATION CLS_CHANGE_SEQ LOC_CHANGE_SEQ
------------------------------------------------------------------------------
1000 COBRA-Class California 6 3

For 1/1/2006 the output would be
PERSONID CLASS LOCATION CLS_CHANGE_SEQ LOC_CHANGE_SEQ
------------------------------------------------------------------------------
1000 COBRA-Class Texas 6 4

For 1/1/2007 the output would be
PERSONID CLASS LOCATION CLS_CHANGE_SEQ LOC_CHANGE_SEQ
------------------------------------------------------------------------------
1000 Outside Boston 7 7

The table more than 1 Lakh rows and got more than 12 Columns, similar to CLASS and LOCATION, based on effective dates. (For example Dept, Dept_EffDate, Status, Status_EffDate,.....)

In the above query I created two instances ECls, ELoc. If I want dept column too, I would need to create one more instance.

The goal is to get all the Columns using less number of joins, if possible, using only one instance of the table?

If this is not possible, what would be the better design to handle this type of data?

create table employee
(personid number(10),
change_sequence number(3),
class varchar2(50),
cls_effdate date,
location varchar2(50),
loc_effdate date,
primary key (personid,change_sequence));

insert into employee values(1000,1,'FullTime',null,'Hawaii',null);
insert into employee
values(1000,2,'FullTime',null,'California',to_date('1/1/2005','MM/DD/YYYY'));
insert into employee
values(1000,3,'PartTime',to_date('1/1/2006','MM/DD/YYYY'),'California',to_date('1/1/2005','MM/DD/YYYY'));
insert into employee
values(1000,4,'PartTime',to_date('1/1/2006','MM/DD/YYYY'),'Texas',to_date('10/1/2005','MM/DD/YYYY'));
insert into employee
values(1000,5,'FullTime',to_date('1/1/2007','MM/DD/YYYY'),'Boston',to_date('1/1/2007','MM/DD/YYYY'));
insert into employee
values(1000,6,'COBRA-Class',to_date('1/1/2005','MM/DD/YYYY'),'Boston',to_date('1/1/2007','MM/DD/YYYY'));
insert into employee
values(1000,7,'Outside',to_date('5/1/2006','MM/DD/YYYY'),'Boston',to_date('1/1/2007','MM/DD/YYYY'));

Thanks in Advance
Sankar

A reader, June 16, 2006 - 8:01 am UTC

Dear Tom,

I have read chapter 12(analytic function) of your book and don't find the solution to my problem. May be you can help me.

CREATE TABLE T
(IDE NUMBER,
COMPTECLI VARCHAR2(13),
DAT_JOB DATE,
ACTION VARCHAR2(255),
FAG_SUCCESS VARCHAR2(1));

insert into t values (115, 'XX12', TRUN(SYSDATE), '01','N');

insert into t values (120, 'YY14', TRUN(SYSDATE), '01','N');

insert into t values(145, 'ZZ25', TRUNC(SYSDATE),'01','N');

insert into t values(255, 'XX12',TRUN(SYSDATE+1), '01','N');

insert into t values(301, 'AB13',TRUN(SYSDATE+1), '02','N');

commit;

select * from t
order by dat_job;

IDE COMPTECLI DAT_JOB ACTION FAG_SUCCESS
115 XX12 16/06/2006 01 N
120 YY14 16/06/2006 01 N
145 ZZ25 16/06/2006 01 N
301 AB13 17/06/2006 02 N
255 XX12 17/06/2006 01 N

What I would like to get is:
if a same COMPTECLI with the same ACTION and FAG_SUCCES = 'N' occur many times then select only the comptecli with the "bigest" dat_job. In this example I would like to get this:

IDE COMPTECLI DAT_JOB ACTION FAG_SUCCESS
120 YY14 16/06/2006 01 N
145 ZZ25 16/06/2006 01 N
301 AB13 17/06/2006 02 N
255 XX12 17/06/2006 01 N

where comptecli XX12 of date 16/06/2006 has been ignored

Could you please help me

Thanks in advance

June 16, 2006 - 7:13 pm UTC

there actually are examples of your problem in the book... you just didn't recognize them :)

however, what about records where fag_success <> 'N' - what becomes of them?

are they treated differently?  does this only apply to 'N's?

answer is maybe this:

ops$tkyte@ORA10GR2> select * from t
  2  order by dat_job;

       IDE COMPTECLI     DAT_JOB   ACTIO F
---------- ------------- --------- ----- -
       115 XX12          16-JUN-06 01    N
       120 YY14          16-JUN-06 01    N
       145 ZZ25          16-JUN-06 01    N
       255 XX12          17-JUN-06 01    N
       301 AB13          17-JUN-06 02    N

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> select *
  2    from (
  3  select ide, comptecli, dat_job, action, fag_success,
  4         max(case when fag_success = 'N' then dat_job end) over (partition by comptecli, action) max_date
  5    from t
  6         )
  7   where fag_success <> 'N'
  8      or (fag_success = 'N' and dat_job = max_date)
  9          or max_date is null;

       IDE COMPTECLI     DAT_JOB   ACTIO F MAX_DATE
---------- ------------- --------- ----- - ---------
       301 AB13          17-JUN-06 02    N 17-JUN-06
       255 XX12          17-JUN-06 01    N 17-JUN-06
       120 YY14          16-JUN-06 01    N 16-JUN-06
       145 ZZ25          16-JUN-06 01    N 16-JUN-06



but you didn't provide the details - and the border test cases!

A reader, June 17, 2006 - 5:42 am UTC

Thanks very much Tom,

Only Fag_Success = 'N' are considered. Records with Fag_Success = 'Y' are deleted before the process starts.

Again Thanks a lot

June 17, 2006 - 7:14 am UTC

then mentioning 'fag-success' at all in the problem description was what we call a 'red herring', something to throw us off the trail :)

distinct and average

riyaz, June 28, 2006 - 12:29 am UTC

There is a data in a select is
>
> XVAL PO_NO BILLNO BILL VALUE
>
> 000001 PO1000000 BL0000001 1000.00
> 000001 PO1000001 BL0000002 2000.00
> 000001 PO1000002 BL0000002 2000.00
> 000001 PO1000003 BL0000003 1000.00
> 000001 PO1000003 BL0000004 1000.00
>
>
> I need a report like
>
> XVAL PO_count BILL Count Avg BILL VALUE
>
> 000001 4 4 1250.00

distict (count) and average - contd

riyaz, June 28, 2006 - 12:36 am UTC

For bill no, amount is repeated twice, so only one amount to be taken
ie average --> 5000/4 = 1250

> XVAL PO_NO BILLNO BILL VALUE
>
> 000001 PO1000000 BL0000001 1000.00
> 000001 PO1000001 BL0000002 2000.00
> 000001 PO1000002 BL0000002 2000.00
(value is repeating, not orginally available in table)
> 000001 PO1000003 BL0000003 1000.00
> 000001 PO1000003 BL0000004 1000.00
>
>

Pl give us analytical query.

June 28, 2006 - 7:45 am UTC

no creates, no inserts, no look

Does it perform better?

A reader, July 05, 2006 - 11:54 am UTC

Tom, Please take a look at this query. Basically i am trying to pull all the tables and it's columns from all_tab_columns and want them listed as group
eg:
Table A Column1
Column2
Column3

Table B Column1
Column2
Column3

I do not want table name to repeat for each column [like BREAK in SQL Plus]

I got it using this query:
select
case when lag(table_name) over (order by table_name) is null
then table_name
when lag(table_name) over(order by table_name) = table_name
then ''
else table_name
end tab_name,
column_name
from all_tab_columns

Do you see any problem with this one? or this should be ok in terms of performance?

Thanks,

July 08, 2006 - 8:11 am UTC

I might have used row_number with a partition - but only because it "made more sense to me" that way... You need to incorporate OWNER in there as well though, technically your query isn't quite right without it.

select decode( row_number() over (partition by owner, table_name order by column_name),
1, owner || '.' || table_name ) tname,
column_name
from all_tab_columns
order by owner, table_name, column_name

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 3637 4.36 4.26 0 390552 0 54535
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3639 4.37 4.27 0 390552 0 54535

select
case when lag(table_name) over (order by table_name) is null
then table_name
when lag(table_name) over(order by table_name) = table_name
then ''
else table_name
end tab_name,
column_name
from all_tab_columns

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 3637 4.51 4.42 0 390552 0 54535
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3639 4.51 4.42 0 390552 0 54535

Items sold Together

Reader, July 11, 2006 - 6:43 pm UTC

I was exploring if you could help me in writing a SQL to get the popular items sold together .Something like when you go to some websites they say that people who bought this item also bought this item .
eg if a customers buys a shirt he always buys a pen .

THANKS !

July 12, 2006 - 3:28 pm UTC

sort of hard without a table isn't it?

Can i use it as analytic func in ORACLE10 G

Mariana, July 13, 2006 - 5:17 pm UTC

I have a select
SELECT SUPPLIER_NO,ACCOUNT_NO,LINE_NO,AMOUNT
FROM KM_SUP_PAYMENTS
ORDER BY SUPPLIER_NO
that gives such result:
SUPPLIER_NO ACCOUNT_NO LINE_NO AMOUNT
=========== ========== ======= ======
123 555555 10001 34
123 555555 20034 56
123 555555 30034 12
234 345555 34555 56
234 454555 45455 34

But i need to see also after each group of suppliers
total amount for supplier
like here, i tried to do it by analitic but my solution was wrong,
can you help me please in solution in order to get this result:
SUPPLIER_NO ACCOUNT_NO LINE_NO AMOUNT
=========== ========== ======= ======
123 555555 10001 34
123 555555 20034 56
123 555555 30034 12
123 102
234 345555 34555 56
234 454555 45455 34
234 90

July 13, 2006 - 6:24 pm UTC

scott@ORA10GR2> select deptno, ename, sum(sal)
2 from emp
3 group by grouping sets((deptno,ename),(deptno));

DEPTNO ENAME SUM(SAL)
---------- ---------- ----------
10 KING 5000
10 CLARK 2450
10 MILLER 1300
10 8750
20 FORD 3000
20 ADAMS 1100
20 JONES 2975
20 SCOTT 3000
20 SMITH 800
20 10875
30 WARD 1250
30 ALLEN 1600
30 BLAKE 2850
30 JAMES 950
30 MARTIN 1250
30 TURNER 1500
30 9400

17 rows selected.

Continue grouping sets in ORACLE 10G

Marirana, July 14, 2006 - 12:12 am UTC

HI Tom,
Thank for answer,but at this example:
SUPPLIER_NO ACCOUNT_NO LINE_NO AMOUNT
=========== ========== ======= ======
123 555555 10001 34
123 555555 20034 56
123 555555 30034 12
123 102
234 345555 34555 56
234 454555 45455 34
234 90
the amount for each row is not sum , but just when supllier_no changes,even though i have two identical rows
before i have to dispay it and to make sum when supplier_no changes.

July 14, 2006 - 8:22 am UTC

i don't see a create table and inserts for your example?  


but using emp.... we can do this again:


ops$tkyte@ORA10GR2> select supplier_no, account_no, line_no, sum(amount)
  2    from (
  3  select deptno supplier_no, job account_no, empno line_no, sal amount
  4    from emp
  5         )
  6   group by grouping sets((supplier_no,account_no,line_no),(supplier_no))
  7  /

SUPPLIER_NO ACCOUNT_N    LINE_NO SUM(AMOUNT)
----------- --------- ---------- -----------
         10 CLERK           7934        1300
         10 MANAGER         7782        2450
         10 PRESIDENT       7839        5000
         10                             8750
         20 CLERK           7369         800
         20 CLERK           7876        1100
         20 ANALYST         7788        3000
         20 ANALYST         7902        3000
         20 MANAGER         7566        2975
         20                            10875
         30 CLERK           7900         950
         30 MANAGER         7698        2850
         30 SALESMAN        7499        1600
         30 SALESMAN        7521        1250
         30 SALESMAN        7654        1250
         30 SALESMAN        7844        1500
         30                             9400

17 rows selected.


I don't see two *identical rows* in your example - do you?

if you pretend my 30 = your 123 and my salesman = your 555555 - there you go.

Can Analytic do this?

Steve Read, July 14, 2006 - 1:32 am UTC

Here is a simple self-contained query with no creates or inserts needed......

SELECT keynbr
,MAX(MAX(status)) OVER (PARTITION BY keynbr ORDER BY MAX(datetm) DESC) status
FROM -- should return 1, 4, X, X
( SELECT '1' keynbr, '1' status, sysdate +1 datetm FROM dual
UNION ALL SELECT '1' keynbr, '2' status, sysdate +2 datetm FROM dual
UNION ALL SELECT '1' keynbr, '3' status, sysdate +3 datetm FROM dual
UNION ALL SELECT '1' keynbr, '4' status, sysdate +4 datetm FROM dual
-- should return 2, 2, X, NULL
UNION ALL SELECT '2' keynbr, '2' status, sysdate +1 datetm FROM dual
UNION ALL SELECT '2' keynbr, '3' status, sysdate +2 datetm FROM dual
UNION ALL SELECT '2' keynbr, '4' status, sysdate +3 datetm FROM dual
UNION ALL SELECT '2' keynbr, '2' status, sysdate +4 datetm FROM dual
)
GROUP BY keynbr

What it returns now is the MAX(status), but what I actually want is the status associated with the most recent datetm. I think this is probably very simple, but I can't figure out how to do it. I read through this whole thread (took a long time) but I couldn't find the answer.

We are using 8.1.7.

Thanks in advance for your help.

Can Analytic do this, part 2

Steve Read, July 14, 2006 - 1:36 am UTC

Correcting the comment lines from the previous post (I forgot to fix them after simplifying my example):

SELECT keynbr
,MAX(MAX(status)) OVER (PARTITION BY keynbr ORDER BY MAX(datetm) DESC) status
FROM -- should return 1, 4
( SELECT '1' keynbr, '1' status, sysdate +1 datetm FROM dual
UNION ALL SELECT '1' keynbr, '2' status, sysdate +2 datetm FROM dual
UNION ALL SELECT '1' keynbr, '3' status, sysdate +3 datetm FROM dual
UNION ALL SELECT '1' keynbr, '4' status, sysdate +4 datetm FROM dual
-- should return 2, 2
UNION ALL SELECT '2' keynbr, '2' status, sysdate +1 datetm FROM dual
UNION ALL SELECT '2' keynbr, '3' status, sysdate +2 datetm FROM dual
UNION ALL SELECT '2' keynbr, '4' status, sysdate +3 datetm FROM dual
UNION ALL SELECT '2' keynbr, '2' status, sysdate +4 datetm FROM dual
)
GROUP BY keynbr

Michel Cadot, July 14, 2006 - 3:03 am UTC

SQL> with
  2    raw_data as (
  3      SELECT '1' keynbr, '1' status, sysdate +1 datetm FROM dual
  4      UNION ALL SELECT '1' keynbr, '2' status, sysdate +2 datetm FROM dual 
  5      UNION ALL SELECT '1' keynbr, '3' status, sysdate +3 datetm FROM dual 
  6      UNION ALL SELECT '1' keynbr, '4' status, sysdate +4 datetm FROM dual
  7      UNION ALL SELECT '2' keynbr, '2' status, sysdate +1 datetm FROM dual 
  8      UNION ALL SELECT '2' keynbr, '3' status, sysdate +2 datetm FROM dual 
  9      UNION ALL SELECT '2' keynbr, '4' status, sysdate +3 datetm FROM dual 
 10      UNION ALL SELECT '2' keynbr, '2' status, sysdate +4 datetm FROM dual 
 11    ),
 12    numbered_data as (
 13      select keynbr, status,
 14             row_number() over (partition by keynbr order by datetm desc) rn
 15      from raw_data
 16    )
 17  select keynbr, status
 18  from numbered_data
 19  where rn = 1
 20  order by keynbr
 21  /
K S
- -
1 4
2 2

2 rows selected.

Have a look at rank and dense_rank functions depending on the result you want in case of dupplicate dates.

But if you have huge data and (keynbr,datetm) is indexed, the following one will be faster and cheaper:

SQL> with
  2    raw_data as (
  3      SELECT '1' keynbr, '1' status, sysdate +1 datetm FROM dual
  4      UNION ALL SELECT '1' keynbr, '2' status, sysdate +2 datetm FROM dual 
  5      UNION ALL SELECT '1' keynbr, '3' status, sysdate +3 datetm FROM dual 
  6      UNION ALL SELECT '1' keynbr, '4' status, sysdate +4 datetm FROM dual
  7      UNION ALL SELECT '2' keynbr, '2' status, sysdate +1 datetm FROM dual 
  8      UNION ALL SELECT '2' keynbr, '3' status, sysdate +2 datetm FROM dual 
  9      UNION ALL SELECT '2' keynbr, '4' status, sysdate +3 datetm FROM dual 
 10      UNION ALL SELECT '2' keynbr, '2' status, sysdate +4 datetm FROM dual 
 11    ),
 12    max_data as ( 
 13     select keynbr, max(datetm) max_date
 14     from raw_data
 15     group by keynbr
 16    )
 17  select r.keynbr, r.status
 18  from raw_data r, max_data m
 19  where r.keynbr = m.keynbr
 20    and r.datetm = m.max_date
 21  order by r.keynbr
 22  /
K S
- -
1 4
2 2

2 rows selected.

Michel

Complex Query

Sankar, July 14, 2006 - 7:41 am UTC

Hi Tom,

I have a table like this:

Personid Change_Sequence Class Cls_EffDate Location Loc_EffDate
----------------------------------------------------------------------------
1000 1 FullTime Hawaii
1000 2 FullTime California 1/1/2005
1000 3 PartTime 1/1/2006 California 1/1/2005
1000 4 PartTime 1/1/2006 Texas 10/1/2005
1000 5 FullTime 1/1/2007 Boston 1/1/2007
1000 6 COBRA 1/1/2005 Boston 1/1/2007
1000 7 Outside 5/1/2006 Boston 1/1/2007

1. The primary key is (Personid, change_sequence)

2. The effective dates of the first row for each person is null (i.e. for
change_sequence = 1)

3. For each row only one column value will be effected (i.e. Either Class or
Location can be changed) and the remaining data will be copied from the previous
row

4. Both Class and Location can be changed in a row, only if the effective date
is same. [See where change_sequence = 5]

I am using the following query for getting Class and Location of the person as
per given Effective Date.

Select ECls.personid, ECls.class, ELoc.Location, ECls.change_sequence
Cls_change_seq, ELoc.change_Sequence Loc_change_seq
from (select * from (select personid, class, change_sequence,
row_number() over(partition by personid order by
change_sequence desc) r
from employee
where nvl(Cls_EffDate, to_date('&givenDate','MM/DD/YYYY')) <=
to_date('&givenDate','MM/DD/YYYY')) Cls
where Cls.r = 1) ECls,

(Select * from (select personid, Location, change_sequence,
row_number() over(partition by personid order by
change_sequence desc) r
from employee
where nvl(Loc_EffDate, to_date('&givenDate','MM/DD/YYYY')) <=
to_date('&givenDate','MM/DD/YYYY')) Loc
where Loc.r = 1) ELoc

where ECls.personid = ELoc.personid;

Now lets run query with three different dates

For 1/1/2005 the output would be

PERSONID CLASS LOCATION CLS_CHANGE_SEQ LOC_CHANGE_SEQ
------------------------------------------------------------------------------
1000 COBRA-Class California 6 3

For 1/1/2006 the output would be
PERSONID CLASS LOCATION CLS_CHANGE_SEQ LOC_CHANGE_SEQ
------------------------------------------------------------------------------
1000 COBRA-Class Texas 6 4

For 1/1/2007 the output would be
PERSONID CLASS LOCATION CLS_CHANGE_SEQ LOC_CHANGE_SEQ
------------------------------------------------------------------------------
1000 Outside Boston 7 7

The table more than 1 Lakh rows and got more than 12 Columns, similar to CLASS
and LOCATION, based on effective dates. (For example Dept, Dept_EffDate, Status,
Status_EffDate,.....)

In the above query I created two instances ECls, ELoc. If I want dept column
too, I would need to create one more instance.

The goal is to get all the Columns using less number of joins, if possible,
using only one instance of the table?

If this is not possible, what would be the better design to handle this type of
data?

create table employee
(personid number(10),
change_sequence number(3),
class varchar2(50),
cls_effdate date,
location varchar2(50),
loc_effdate date,
primary key (personid,change_sequence));

insert into employee values(1000,1,'FullTime',null,'Hawaii',null);
insert into employee
values(1000,2,'FullTime',null,'California',to_date('1/1/2005','MM/DD/YYYY'));
insert into employee
values(1000,3,'PartTime',to_date('1/1/2006','MM/DD/YYYY'),'California',to_date('1
/1/2005','MM/DD/YYYY'));
insert into employee
values(1000,4,'PartTime',to_date('1/1/2006','MM/DD/YYYY'),'Texas',to_date('10/1/2
005','MM/DD/YYYY'));
insert into employee
values(1000,5,'FullTime',to_date('1/1/2007','MM/DD/YYYY'),'Boston',to_date('1/1/2
007','MM/DD/YYYY'));
insert into employee
values(1000,6,'COBRA-Class',to_date('1/1/2005','MM/DD/YYYY'),'Boston',to_date('1/
1/2007','MM/DD/YYYY'));
insert into employee
values(1000,7,'Outside',to_date('5/1/2006','MM/DD/YYYY'),'Boston',to_date('1/1/20
07','MM/DD/YYYY'));

Thanks in Advance
Sankar

July 14, 2006 - 8:37 am UTC

it might just be me, but 4 and 5 seems to conflict with eachother.

also, why do the dates go UP and DOWN as the sequence increases??? doesn't make sense.

seems the dates should only INCREASE or stay the same over time.

Continue with grouping sets in ORACLE10G

Mariana, July 14, 2006 - 9:48 am UTC

Hi,Tom,
This is just an example:
SUPPLIER_NO ACCOUNT_NO LINE_NO AMOUNT
=========== ========== ======= ======
123 555555 10001 34
123 555555 20034 56
123 555555 30034 12
123 102
234 345555 34555 56
234 454555 45455 34
234 90
But what i really need in my project is
to run through the cursor
SELECT A.SUPPLIER_NO,
B.SUPPLIER_NAME
A.LINE_NO,
A.PAYMENT_NO
B.VAT_SW
A.AMOUNT_NO
FROM KM_SUP_PAYMENTS A,
KM_SUP_SUPPLIERS B
WHERE A.SUPPLIER_NO=B.SUPPLIER_NO
ORDER BY A.SUPPLIER_NO
For each different supplier i have to insert sum(amount) to some array,but for each and each row i have to make other checks,I thought to make sum(amount) by code but for this i have to use previous values fields from cursor,i also thought to use analitic func of sum(amount)
for every row but i am not sure it is good.Help me please.
Thank you very much

July 14, 2006 - 9:59 am UTC

SELECT A.SUPPLIER_NO,
B.SUPPLIER_NAME
A.LINE_NO,
A.PAYMENT_NO
B.VAT_SW
sum(A.AMOUNT_NO)
FROM KM_SUP_PAYMENTS A,
KM_SUP_SUPPLIERS B
WHERE A.SUPPLIER_NO=B.SUPPLIER_NO
group by grouping sets(
(a.supplier_no,b.supplier_name,a.line_no,a.payment_no,b.vat_sw),
(a.supplier_no)
)
ORDER BY A.SUPPLIER_NO

I am going to keep saying basically the same thing over and over - you want a subtotal. Hence, I'll keep bringing up GROUPING SETS!!!!!!

Answer to Complex Query

Andy, July 14, 2006 - 10:32 am UTC

To Sankar,

This query should work for you.

select
personid,
/* repeat in the same pattern for other fields */
max(cls_max_chg_seq) cls_chg_seq,
max(case when (change_sequence = cls_max_chg_seq) then class end) class,
max(case when (change_sequence = cls_max_chg_seq) then cls_effdate end) cls_effdate,
/* repeat in the same pattern for other fields */
max(loc_max_chg_seq) loc_chg_seq,
max(case when (change_sequence = loc_max_chg_seq) then location end) location,
max(case when (change_sequence = loc_max_chg_seq) then loc_effdate end) loc_effdate
from
(
select personid, change_sequence,
/* repeat in the same pattern for other fields */
class,
cls_effdate,
max(case when cls_effdate <= the_date then change_sequence else 1 end) over(partition by personid) cls_max_chg_seq,
/* repeat in the same pattern for other fields */
location,
loc_effdate,
max(case when loc_effdate <= the_date then change_sequence else 1 end) over(partition by personid) loc_max_chg_seq
from
employee
)
group by personid;

Here analytics gives you the max change sequence with effective date less than the input date (the_date => replace it with your own input).

Then group by, with max, allows you select the required value. This query can be readily extended to multiple attributes.

Continue to grouping sets

Mariana, July 14, 2006 - 10:35 am UTC

I understand you, but what i am confused is that i do it through the cursor and i have to put in grouping sets all fields that appear in select,what about the analitical func in every row sum(amount) and i'll write the sum when previous supplier_no!=current supplier_no?
Thank you.

July 14, 2006 - 12:34 pm UTC

not understanding you.

where does a cursor come into the discussion, I'm getting the answer in SQL - no code.

Less code = less bugs. Strive to write less code.

Analytics use

Thiru, July 14, 2006 - 11:55 am UTC

Hello Tom,

Great site. Thank so much.

Can you advise how analytics could be used (if possbile, or any other way to do) to get the results for the query below:

CREATE TABLE Z (A CHAR(2),B NUMBER,C NUMBER,D NUMBER);
CREATE TABLE Y (A CHAR(2) PRIMARY KEY,B NUMBER,C NUMBER);

BEGIN
INSERT INTO Y VALUES('AA',200,NULL);
INSERT INTO Y VALUES('AB',200,NULL);
INSERT INTO Y VALUES('AC',200,NULL);
INSERT INTO Y VALUES('AD',200,NULL);

INSERT INTO Z VALUES('AB',100,1,0);
INSERT INTO Z VALUES('AB',100,1,0);
INSERT INTO Z VALUES('AC',100,1,1);
INSERT INTO Z VALUES('AC',100,1,0);
INSERT INTO Z VALUES('AD',100,0,0);
INSERT INTO Z VALUES('AD',100,0,0);

COMMIT;
END;

I would like to update Table Y col c with the sum(Z.B) grouped by Z.A where SUM(Z.B)=Y.B AND Z.C =1 for all records in the group and Z.D=0 for all records in the group .
Y.A=Z.A and Y.A is primary key.

July 14, 2006 - 12:53 pm UTC

AND Z.C =1 for all records in the group and Z.D=0 for all records
in the group

you lost me there.

do you mean something like

update y
set c = b
where (a,b) in
(
select z.a, sum(z.b)
from z
where z.c = 1
and z.d = 0
group by z.a
)

since SUM(Z.B) must be Y.B, we don't need to update from Z, we just need to find the records to update right?

I messed up the question

Thiru, July 14, 2006 - 1:39 pm UTC

Sorry Tom for the confusion, I really meant:
I would like to update Table Y col c with the sum(Z.B) grouped by Z.A where
Z.C =1 for all records in the group and Z.D=0 for all records
in the group .
Y.A=Z.A and Y.A is primary key.

The sum(Z.b)=y.b is not in the condition. I will have to take the values from Z. Just coincidence that I made the sceanrio where y.b looked same as sum(z.b).

Will that make the subquery more complex?
Also can we use analytics for this looking at the millions of rows in the two tables?

July 14, 2006 - 1:57 pm UTC

Not just a conincidence - you said it:

....
grouped by Z.A where SUM(Z.B)=Y.B AND Z.C =1
........

what version you got going here?
are there any Z.A values that are NOT in Y.A?

analytics will NOT come into play here - aggregates will.

Thiru, July 14, 2006 - 2:01 pm UTC

"what version you got going here?
are there any Z.A values that are NOT in Y.A?"

I am on 10gR2.
Yes it is possible that Z.A values will not be in Y.A.

July 14, 2006 - 2:47 pm UTC


ops$tkyte@ORA10GR2> select * from y;

A           B          C
-- ---------- ----------
AA        200
AB        200
AC        200
AD        200

ops$tkyte@ORA10GR2> merge into y
  2  using
  3  (
  4  select z.a, sum(z.b) new_c
  5    from z
  6   where z.c = 1
  7     and z.d = 0
  8   group by z.a
  9  ) z
 10  on (y.a = z.a)
 11  when matched then update set c = z.new_c;

2 rows merged.

ops$tkyte@ORA10GR2> select * from y;

A           B          C
-- ---------- ----------
AA        200
AB        200        200
AC        200        100
AD        200

Thiru, July 14, 2006 - 3:03 pm UTC

The Y table should have been updated for only one row ie., 'AB' and not 'AC'.

As only AB on Table Z satisfies both conditions. All records in the group AB has 1 for column c and 0 for column d. While for 'AC' one record in Col d has a value 1. So this does not satisfy the condition.

select * from z
/ a b c d
AB 100 1 0
AB 100 1 0
AC 100 1 1
AC 100 1 0
AD 100 0 0
AD 100 0 0

July 14, 2006 - 3:16 pm UTC

that was not entirely clear.

easy enough however


ops$tkyte@ORA10GR2> merge into y
  2  using
  3  (
  4  select z.a, sum(z.b) new_c
  5    from z
  6   group by z.a
  7  having count(*) = count( case when c=1 and d=0 then 1 end)
  8  ) z
  9  on (y.a = z.a)
 10  when matched then update set c = z.new_c;

1 row merged.

ops$tkyte@ORA10GR2> select * from y;

A           B          C
-- ---------- ----------
AA        200
AB        200        200
AC        200
AD        200

Wow!

Thiru, July 14, 2006 - 3:21 pm UTC

Hats off to you! I was trying using CASE but just could not get it right.

Thanks.

Continue to grouping sets in ORACLE10G

MARIANA, July 14, 2006 - 5:12 pm UTC

What i meant here that in my store procedure i have to use this select in cursor:i have to proceed each row at this cursor because i have a logic that i need to do it by code,
and my question is: May i use the analytical function instead of grouping set?
like:
CURSOR
SELECT A.SUPPLIER_NO,
B.SUPPLIER_NAME
A.LINE_NO,
A.PAYMENT_NO
B.VAT_SW
A.AMOUNT,
SUM(A.AMOUNT) OVER (PARTITION BY A.SUPPLIER_NO)
FROM KM_SUP_PAYMENTS A,
KM_SUP_SUPPLIERS B
WHERE A.SUPPLIER_NO=B.SUPPLIER_NO
ORDER BY A.SUPPLIER_NO
Thank you

July 14, 2006 - 5:28 pm UTC

you may do whatever you like, yes.  of course.

(you could also just fetch out grouping:


ops$tkyte@ORA10GR2> select supplier_no, account_no, line_no, sum(amount),
  2         grouping(account_no)
  3    from (
  4  select deptno supplier_no, job account_no, empno line_no, sal amount
  5    from emp
  6         )
  7   group by grouping sets((supplier_no,account_no,line_no),(supplier_no))
  8  /

SUPPLIER_NO ACCOUNT_N    LINE_NO SUM(AMOUNT) GROUPING(ACCOUNT_NO)
----------- --------- ---------- ----------- --------------------
         10 CLERK           7934        1300                    0
         10 MANAGER         7782        2450                    0
         10 PRESIDENT       7839        5000                    0
         10                             8750                    1
         20 CLERK           7369         800                    0
         20 CLERK           7876        1100                    0
         20 ANALYST         7788        3000                    0
         20 ANALYST         7902        3000                    0
         20 MANAGER         7566        2975                    0
         20                            10875                    1
         30 CLERK           7900         950                    0
         30 MANAGER         7698        2850                    0
         30 SALESMAN        7499        1600                    0
         30 SALESMAN        7521        1250                    0
         30 SALESMAN        7654        1250                    0
         30 SALESMAN        7844        1500                    0
         30                             9400                    1

17 rows selected.


so you know what rows are "made up for you" instead of doing the old "if curr <> last then ..." trick.

Continue with grouping settings in ORACLE 10G

Mariana, July 14, 2006 - 6:32 pm UTC

Thank you,Tom,
I have a question about analitic func at this concept:
is this function is expensive ?
Is this function calculated for each row?
or just when supplier_no is changing?
SELECT A.SUPPLIER_NO,
B.SUPPLIER_NAME
A.LINE_NO,
A.PAYMENT_NO
B.VAT_SW
A.AMOUNT,
SUM(A.AMOUNT) OVER (PARTITION BY A.SUPPLIER_NO)
FROM KM_SUP_PAYMENTS A,
KM_SUP_SUPPLIERS B
WHERE A.SUPPLIER_NO=B.SUPPLIER_NO
ORDER BY A.SUPPLIER_NO

July 15, 2006 - 3:19 pm UTC

it'll have to

a) join
b) sort/hash/partition by supplier_no
c) compute the sum(a.amount)

"expensive" - probably less so than you doing it yourself since (a) and (b) would have to be done for you - letting the server do C is a good idea.

However, grouping sets *STILL* seems more appropriate.

Can this be done?

A reader, July 15, 2006 - 11:43 am UTC

Hi Tom,
Can this be done in one select through analytic function, instead of two sql stmnts.
1  select 'client, '||count(*) from client
  2  union all
  3* select to_char(client_key) from client
SQL> /

'CLIENT,'||COUNT(*)
------------------------------------------------
client, 10
10
12
16
22
23
24
25
26
29
31

11 rows selected.
Regards,
Tarun

July 15, 2006 - 3:24 pm UTC

assuming client_key is "unique"

scott%ORA10GR2> select decode( grouping(empno),
2 0, to_char(empno),
3 1, 'client, ' || count(empno) ) data
4 from emp
5 group by rollup(empno)
6 order by grouping(empno) DESC, empno
7 /

DATA
------------------------------------------------
client, 14
7369
7499
7521
7566
7654
7698
7782
7788
7839
7844
7876
7900
7902
7934

15 rows selected.

Good one!

Michel Cadot, July 15, 2006 - 3:48 pm UTC

Continue to grouping sets ORACLE 10G

Mariana, July 15, 2006 - 4:22 pm UTC

Hi Tom,
Thank you for question,
i didn't understand which join is executed in this analitic function?And is the analitic function executed per each row or once per different supplier_no?
SELECT A.SUPPLIER_NO,
B.SUPPLIER_NAME
A.LINE_NO,
A.PAYMENT_NO
B.VAT_SW
A.AMOUNT,
SUM(A.AMOUNT) OVER (PARTITION BY A.SUPPLIER_NO)
FROM KM_SUP_PAYMENTS A,
KM_SUP_SUPPLIERS B
WHERE A.SUPPLIER_NO=B.SUPPLIER_NO
ORDER BY A.SUPPLIER_NO

July 16, 2006 - 9:36 am UTC

you are doing the join? right there - A to B.

I was describing in effect what would have to take place to process said query.

a) join
b) sort
c) compute analytic.

If you did it yourself it would be:

a) join
b) sort
c) YOU compute the analytic

I'm going to guess it will be better for US to commpute the analytic

I'll even guess that GROUPING SETS *is still the right answer* :)

Complex Query

Sankar Kumar, July 16, 2006 - 10:59 am UTC

To Andy,

That is what I am looking for. Thanks alot

What if there is nothing to group on?

Phil, July 21, 2006 - 9:11 am UTC

Hi Tom

This is great but unlike "traditional" SQL I find it hard to learn. Can you recommend a book that deciphers it for the likes of me please?

I have some data which I would like to analyse but I am not sure if I can do it:

WHEN (date) DESCRIPTION (varchar2)
7/21/2006 1:10:44 PM Starting processX. (From 14-MAY-06 for 20 days)
7/21/2006 1:10:42 PM Completed balance update processing in pr_update_lagging_RTS_bals. 7419 balances processed
7/21/2006 11:10:41 AM Completed processing in processcdr. 433923 events,28567 accounts*days
7/21/2006 11:10:41 AM Starting balance update processing in pr_update_lagging_RTS_bals.
7/20/2006 9:42:48 PM Starting processX. (From 24-APR-06 for 20 days)
7/20/2006 9:42:46 PM Completed balance update processing in pr_update_lagging_RTS_bals. 2855 balances processed
7/20/2006 8:58:01 PM Completed processing in processcdr. 286963 events,7211 accounts*days
7/20/2006 8:58:01 PM Starting balance update processing in pr_update_lagging_RTS_bals.
7/20/2006 8:25:48 PM Starting processX. (From 04-APR-06 for 20 days)
7/20/2006 8:25:46 PM Completed balance update processing in pr_update_lagging_RTS_bals. 528 balances processed
7/20/2006 8:17:30 PM Completed processing in processX. 106057 CDR events,1009 accounts*days
7/20/2006 8:17:30 PM Starting balance update processing in pr_update_lagging_RTS_bals.
7/20/2006 8:06:00 PM Starting processX. (From 15-MAR-06 for 20 days)

I only have these two columns. Can I query this for a result as follows:

Process #1 completed in xx minutes
Process #2 completed in yy minutes
...

If I could, I could then do some extra substr to get the frequency and performance (process per min) on this aspect of the app.

Cheers,

Phil

July 23, 2006 - 7:29 am UTC

the data warehousing guide is a pretty good read. And remember "traditional" SQL looked like Klingon to you until you learned it. Then and only then did you start thinking of it as "traditional" easy stuff.

If you have access to Expert One On One Oracle - I have a chapter on them in there as well that explains them in some detail.

we could do what you need in all likelyhood - what you would have to do is

a) provide create tables
b) with inserts to populate them
c) and most importantly describe how to find the begin/end records for a process. I see two possible records - "starting processx" and "starting balance update"

A simple lag() or lead() will do this easily - we just select out the records that represent start and stop and using lag - you can get the prior record attached to the current record.

Gints, July 21, 2006 - 3:10 pm UTC

create table a1(a number);
insert into a1 values(1);
insert into a1 values(2);
insert into a1 values(3);
insert into a1 values(4);
insert into a1 values(5);
SQL> select max (a) over (order by a)
2 from a1;

MAX(A)OVER(ORDERBYA)
--------------------
1
2
3
4
5

SQL> select min (a) over (order by a desc)
2 from a1;

MIN(A)OVER(ORDERBYADESC)
------------------------
5
4
3
2
1

SQL> select max (a) over (order by a), min (a) over (order by a desc)
2 from a1;

MAX(A)OVER(ORDERBYA) MIN(A)OVER(ORDERBYADESC)
-------------------- ------------------------
5 5
4 4
3 3
2 2
1 1

Why max (a) over (order by a) changes it's appearance? :OOOO
Even trying explicitly say window clause there are wrong results although separately everything is ok:
select max (a) over (order by a rows between unbounded preceding and current row),
max (a) over (order by a desc rows between current row and unbounded following)
from a1;
gives the same....

July 23, 2006 - 7:52 am UTC

maybe when we include "a" in the query and order each query by "a" so they are presenting the data in the same order - it'll make more sense and you'll understand why the answer is in fact "correct" (remember, the rows can only be sorted ONE WAY :)


ops$tkyte%ORA10GR2> select a, max (a) over (order by a) from a1 order by a;

         A MAX(A)OVER(ORDERBYA)
---------- --------------------
         1                    1
         2                    2
         3                    3
         4                    4
         5                    5

ops$tkyte%ORA10GR2> select a, min (a) over (order by a desc) from a1 order by a;

         A MIN(A)OVER(ORDERBYADESC)
---------- ------------------------
         1                        1
         2                        2
         3                        3
         4                        4
         5                        5

ops$tkyte%ORA10GR2> select a, max (a) over (order by a), min (a) over (order by a desc) from a1 order by a;

         A MAX(A)OVER(ORDERBYA) MIN(A)OVER(ORDERBYADESC)
---------- -------------------- ------------------------
         1                    1                        1
         2                    2                        2
         3                    3                        3
         4                    4                        4
         5                    5                        5



The max(a) when you order by a in the over clause is the same as A

The min(a) when you order by a DESC in the over clause is the same as A

hence, for your data - max(a) over ... is the SAME as min(a) over ...

and both are the same as A!

Problem in Analitic function

Mariana, July 26, 2006 - 4:44 pm UTC

Hello,Tom,
I have a table:

create table Lines
(LINE_NO NUMBER(4),
MONTH DATE,
POINTS_NUM NUMBER(3));
INSERT INTO TABLE LINES
VALUES(1001,TO_DATE('01/03/2006','DD/MM/YYYY'),25);
INSERT INTO TABLE LINES
VALUES(1001,TO_DATE('01/04/2006','DD/MM/YYYY'),15);
INSERT INTO TABLE LINES
VALUES(1001,TO_DATE('01/05/2006','DD/MM/YYYY'),40);
INSERT INTO TABLE LINES
VALUES(1002,TO_DATE('01/03/2006','DD/MM/YYYY'),35);
INSERT INTO TABLE LINES
VALUES(1002,TO_DATE('01/04/2006','DD/MM/YYYY'),12);
INSERT INTO TABLE LINES
VALUES(1002,TO_DATE('01/05/2006','DD/MM/YYYY'),76);

SELECT LINE_NO,MONTH,POINTS_NUM FROM LINES
ORDER BY LINE_NO,MONTH;

LINE_NO MONTH POINTS_NUM
======= ===== ==========
1001 03/06 25
1001 04/06 15
1001 05/06 40
1002 03/06 35
1002 04/06 12
1002 05/06 76

What i really need is :
1) For each group of lines(i have 2 groups 1001 and 1002)
from the second row of each group i have to calc the percentage of change OF POINTS_NUM from previous month and for each group i have also to calc cumulative percentage of change of POINTS_NUM:
it will look like:

LINE_NO MONTH POINTS_NUM PERCENTAGE_CHANGE CUMULATIVE
======= ===== ========== ================= ==========
1001 03/06 25
1001 04/06 15 (15-25)/25*100 (15-25)/25*100
1001 05/06 40 (40-15)/15*100 (40-25)/25*100

1002 03/06 35
1002 04/06 12 (12-35)/35*100 (12-35)/35*100
1002 05/06 76 (76-12)/12*100 (76-35)/35*100

and i have to calc total cumulative percentage:
SELECT SUM(POINTS_NUM) SUM1 FROM LINES
WHERE MONTH='01-MAR-2006'
SUM1:=25+35=60;

SELECT SUM(POINTS_NUM) SUM2 FROM LINES
WHERE MONTH='01-MAY-2006'
SUM2:=40+76=116;

TOTAL CUMULATIVE PERCENTAGE=((116-60)/60)*100;
My question is -how to do by one select-analitic function everything that i wrote above?
Thank you very much,
Mariana

July 26, 2006 - 5:08 pm UTC

here is your start:

ops$tkyte%ORA10GR2> select line_no, month, points_num,
  2         lag(points_num) over (partition by line_no order by month) last_points_num,
  3         first_value(points_num) over (partition by line_no order by month) first_points_num
  4    from lines
  5  /

   LINE_NO MONTH     POINTS_NUM LAST_POINTS_NUM FIRST_POINTS_NUM
---------- --------- ---------- --------------- ----------------
      1001 01-MAR-06         25                               25
      1001 01-APR-06         15              25               25
      1001 01-MAY-06         40              15               25
      1002 01-MAR-06         35                               35
      1002 01-APR-06         12              35               35
      1002 01-MAY-06         76              12               35

6 rows selected.

Continue to calc cumulative percentage

Mariana, July 26, 2006 - 5:12 pm UTC

Hi Tom,
Thank you for answer,
but how can i calc the total cumulative percentage?

July 26, 2006 - 5:20 pm UTC

you already seem to have those queries coded?!?

Continue to cumulative percentage

Mariana, July 26, 2006 - 5:33 pm UTC

Sorry,i didn't understand you.
What do you mean?

July 26, 2006 - 5:42 pm UTC

you wrote the summation queries for that - union all them if you want.

You want analytics for the lag()/first_value()

you need plain old aggregation for the "cumulative percentage" - and the two results are entire different "shapes"

don't see how they fit together really

Can you show me by example ,please, what you mean

A reader, July 26, 2006 - 5:46 pm UTC

Mariana

July 26, 2006 - 5:53 pm UTC

problem is - I wasn't sure what you meant by cumulative percent.

but your two sum queries certainly retrieve what you say you wanted? You have already written them.

....
and i have to calc total cumulative percentage:
SELECT SUM(POINTS_NUM) SUM1 FROM LINES
WHERE MONTH='01-MAR-2006'
SUM1:=25+35=60;

SELECT SUM(POINTS_NUM) SUM2 FROM LINES
WHERE MONTH='01-MAY-2006'
SUM2:=40+76=116;

TOTAL CUMULATIVE PERCENTAGE=((116-60)/60)*100;
..........

Analytic Query with other functions

A reader, July 26, 2006 - 5:52 pm UTC

Hi Tom,

How do i write the following query:

 select 1, max(decode(dummy,null,'NN',null)) D,
 max(dummy) over (partition by dummy order by dummy desc) S
 from dual
 group by 1;

I get the following error:

SQL> /
max(dummy) over (partition by dummy order by dummy desc) S
    *
ERROR at line 2:
ORA-00979: not a GROUP BY expression

Do not have access to Scott Schema, so used dual instead.
Oracle version: 9.2.0.7

Thanks as always

July 26, 2006 - 6:00 pm UTC

why are you grouping by.

I cannot tell you how to write a query when you haven't told me what question you are trying to ask.

you don't use group by with analytics, you use group by with aggregates.

Sorry for the improper posting

A reader, July 27, 2006 - 10:58 am UTC

Hi Tom,
sorry for the question.
Well the logic is like this;

I have a query on similar lines as follows:

select a.col1, b.col2, c.col3, a.dt,
max(decode(a.col2,null,'NN',null)) D,
from tab_a a,
tab_b b,
tab_c c
where
a.col1 = b.col1
and a.col2 = c.col2
and c.col1 = b.col1
and a.dt = (select max(in_a.dt) from tab_a in_a
where in_a.col1 = a.col1
and in_a.dt = a.dt)
and b.col2 = (select max(in_b.col2) from tab_b in_b
where in_b.col1 = b.col1
and in_b.col2 = b.col2)
group by a.col1, b.col2, c.col3, a.dt;

I want to know how can i rewrite that query using Analytic Functions or in a alternate manner.

Thanks as always.

July 27, 2006 - 12:16 pm UTC

instead of using tab_a in the from list use:

(select *
from (select a.*, max(dt) over (partition by col1, col2) max_dt from tab_a a)
where dt = max_dt
)

(I assume you meant to use col1, col2 not col1, dt in the subquery)

do the same for b

and drop the correlated subquery from the where clause

Thanks for the quick response.

A reader, July 27, 2006 - 1:56 pm UTC

Hi Tom,

Dropping the corelated query was the idea. But I did not know how to reformat the query.
So as per you suggestion it should be as follows:

Original Query:
----------------

select a.col1, b.col2, c.col3, a.dt,
max(decode(a.col2,null,'NN',null)) D,
from tab_a a,
tab_b b,
tab_c c
where
a.col1 = b.col1
and a.col2 = c.col2
and c.col1 = b.col1
and a.dt = (select max(in_a.dt) from tab_a in_a
where in_a.col1 = a.col1
and in_a.dt = a.dt)
and b.col2 = (select max(in_b.col2) from tab_b in_b
where in_b.col1 = b.col1
and in_b.col2 = b.col2)
group by a.col1, b.col2, c.col3, a.dt;

Your Suggestion:
----------------

instead of using tab_a in the from list use:

(select *
from (select a.*, max(dt) over (partition by col1, col2) max_dt from tab_a a)
where dt = max_dt
)

Query Rewritten as:
--------------------
select a.col1, b.col2, c.col3, a.dt,
max(decode(a.col2,null,'NN',null)) D,
from (select *
from (select a.*, max(dt) over (partition by col1, col2) max_dt from tab_a a)
where dt = max_dt
) a,
(select *
from (select b.*, max(col2) over (partition by col1, col2) max_dt from tab_b b)
where dt = max_dt
) b,
tab_c c
where
a.col1 = b.col1
and a.col2 = c.col2
and c.col1 = b.col1
group by a.col1, b.col2, c.col3, a.dt;

Please correct me if I am wrong.
Thanks as always

July 27, 2006 - 2:28 pm UTC

that is correct, all you want is the "most current record by col1, col2"

that analytic gets that record.

second max would be max(dt), not col2

What in case...

A reader, July 28, 2006 - 3:59 pm UTC

Hi Tom,

With reference to the question above:

What should be the approach if i have to do a max(col) on 2 or more cols from the same table.
can you explain with an example using the same query.

Thanks in advance.

July 28, 2006 - 8:53 pm UTC

give me a test case to work with - not sure what you want to do with the max of these two columns

RBO and Analytical Functions

Muhammad Riaz Shahid, July 30, 2006 - 4:42 am UTC

Tom,
Can we use analytical functions while using RBO?

SQL> create table x(a number);

Table created.

SQL>  select row_number() over(order by 1) from x;

no rows selected


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   WINDOW (NOSORT)
   2    1     TABLE ACCESS (FULL) OF 'X'


DB Version 9.2.0.5.0

July 30, 2006 - 8:42 am UTC

you just did.

Analytics and RBO

Jonathan Lewis, July 31, 2006 - 5:44 am UTC

You need to be a little careful with RBO and analytics. RBO is allowed to use an index to implement an "order by" clause without sorting - but if you include an analytic function in the select list the run-time engine may have to sort the data to partition or order by inside the analytic call - leaving the data in the wrong order ... RBO does not know about everything that can happen inside analytics.

I think there's a note about this, with an example, on my website.

A reader, August 05, 2006 - 1:50 am UTC

Hi Tom,

I have the following Analytics wich works fine but I have one little problem with the query.

It must discerned how the query is executed.

The Code column can only be A or B , I use these column for my partioning clause.

The columns pos1,pos2,pos3...pos41 can only be 0,1, or 2.

0 normal count as 0
1 normal count as 1
2 normal count as 1

Now the problem 0 and 1 be calculated in one query to get an result.

The columns with the value 2 can not be calculated within the query.

I must have two result orderd by 1, 0 and 2, 0

17/07/2006;05:59:58;17/07/2006;0388;XXXXXX1;A;1;2;0;0
17/07/2006;06:00:05;17/07/2006;1242;XXXXXX2;A;1;2;1;0
17/07/2006;06:00:11;17/07/2006;1146;XXXXXX3;A;2;1;2;0
24/07/2006;05:59:58;24/07/2006;0388;XXXXXX4;B;2;1;0;0
24/07/2006;06:00:05;24/07/2006;1242;XXXXXX5;B;1;2;1;0
24/07/2006;06:00:11;24/07/2006;1146;XXXXXX6;B;1;2;2;0

for th values with 0,1 the result schould look like this :

C POS1 POS2 POS3 POS3
- ---- ---- ---- ----
A 2 1 1 0
B 2 1 1 0

for th values with 0,1 the result schould look like this :

C POS1 POS2 POS3 POS3
- ---- ---- ---- ----
A 1 2 1 0
B 1 2 1 0

select distinct code,pos1,pos2,pos3,pos4,pos5,pos6,pos7,pos8,pos9,pos10,pos11,pos12,
pos13,pos14,pos15,pos16,pos17,pos18,pos19,pos20,pos21,pos22,pos23,pos24
from (
select vin_code,
sum(pos1) over (partition by code order by code) as pos1,
sum(pos2) over (partition by code order by code) as pos2,
sum(pos3) over (partition by code order by code) as pos3,
sum(pos1) over (partition by code order by code) as pos4,
sum(pos2) over (partition by code order by code) as pos5,
sum(pos1) over (partition by code order by code) as pos6,
sum(pos2) over (partition by code order by code) as pos7,
sum(pos3) over (partition by code order by code) as pos8,
sum(pos1) over (partition by code order by code) as pos9,
sum(pos2) over (partition by code order by code) as pos10,
sum(pos3) over (partition by code order by code) as pos11,
sum(pos1) over (partition by code order by code) as pos12,
sum(pos2) over (partition by code order by code) as pos13,
sum(pos3) over (partition by code order by code) as pos14,
sum(pos1) over (partition by code order by code) as pos15,
sum(pos2) over (partition by code order by code) as pos16,
sum(pos3) over (partition by code order by code) as pos17,
sum(pos1) over (partition by code order by code) as pos18,
sum(pos2) over (partition by code order by code) as pos19,
sum(pos3) over (partition by code order by code) as pos20,
sum(pos1) over (partition by code order by code) as pos21,
sum(pos2) over (partition by code order by code) as pos22,
sum(pos3) over (partition by code order by code) as pos23,
sum(pos1) over (partition by code order by code) as pos24
from monitoring)
group by code,pos1,pos2,pos3,pos4,pos5,pos6,pos7,pos8,pos9,pos10,pos11,pos12,
pos13,pos14,pos15,pos16,pos17,pos18,pos19,pos20,pos21,pos22,pos23,pos24;

So hope you have an idea ;-)

Regards
Marcel

August 05, 2006 - 10:50 am UTC

no idea, didn't follow this logic at all.

And I see no table creates, no inserts.....

Question review

Marcel, August 07, 2006 - 1:31 am UTC

Hi Tom,

I will try it again without errors in the question !

First the create table and some inserts

create table vin_check
(
seq number(4),
vin_code varchar2(10),
system varchar2(1),
pos1 number(1),
pos2 number(1),
pos3 number(1)
);

insert into vin_check values(0001,'XXXXXXXX01','A',1,1,1);
insert into vin_check values(0002,'XXXXXXXX02','A',0,1,2);
insert into vin_check values(0003,'XXXXXXXX03','A',2,2,0);
insert into vin_check values(0004,'XXXXXXXX04','A',1,0,1);
insert into vin_check values(0005,'XXXXXXXX05','B',1,1,1);
insert into vin_check values(0006,'XXXXXXXX06','B',0,1,2);
insert into vin_check values(0006,'XXXXXXXX06','B',2,2,0);
insert into vin_check values(0007,'XXXXXXXX07','B',1,0,1);

The problem I have is the following I must calculate the errors in the system not together
but differenced by system and the factor of how they sum is the error value.

So I took system a partition column. (Query see below )

Here what the numbers mean.
0 is no error,
1 internal error,
2 external error.

SEQ VIN_CODE S POS1 POS2 POS3
---------- ---------- - ---------- ---------- ----------
1 XXXXXXXX01 A 1 1 1
2 XXXXXXXX02 A 0 1 0
3 XXXXXXXX03 A 2 2 0
4 XXXXXXXX04 A 2 0 1
5 XXXXXXXX05 B 1 1 1
6 XXXXXXXX06 B 0 1 2
6 XXXXXXXX06 B 2 2 0
7 XXXXXXXX07 B 0 0 2

select distinct system,pos1,pos2,pos3 from (
select system,
sum(pos1) over (partition by system order by system) as pos1,
sum(pos2) over (partition by system order by system) as pos2,
sum(pos3) over (partition by system order by system) as pos3
from vin_check)
group by system,pos1,pos2,pos3;

S POS1 POS2 POS3
- ---------- ---------- ----------
A 5 4 2
B 3 4 5

This would be the result with the query above, but I need an another result
the errors have not the same value ( for me ) and must treat alone. So it is recommend that I calculate the
errors with 1 and the errors with 2 seperatly.

Would I run the Query for errors with the value 1 the result should look like this:

S POS1 POS2 POS3
- ---------- ---------- ----------
A 1 2 2
B 1 2 1

Would I run the Query for errors with the value 2 the result should look like this:

S POS1 POS2 POS3
- ---------- ---------- ----------
A 2 1 0
B 1 1 2

Thanks Marcel

August 07, 2006 - 8:00 am UTC

this:

select distinct system,pos1,pos2,pos3 from (
select system,
sum(pos1) over (partition by system order by system) as pos1,
sum(pos2) over (partition by system order by system) as pos2,
sum(pos3) over (partition by system order by system) as pos3
from vin_check)
group by system,pos1,pos2,pos3;

is just wrong, you use analytics when you DON'T want to aggregate, just aggregate otherwise:

select system, sum(pos1), sum(pos2), sum(pos3) from vin_check group by system;

that is all that query should ever be.

Now, it would be nice if your supplied example data (create+inserts) matched your text example!!! but they don't, we'll make due:

tkyte%ORCL> select * from vin_check;

SEQ VIN_CODE S POS1 POS2 POS3
---- ---------- - ---- ---- ----
1 XXXXXXXX01 A 1 1 1
2 XXXXXXXX02 A 0 1 2
3 XXXXXXXX03 A 2 2 0
4 XXXXXXXX04 A 1 0 1
5 XXXXXXXX05 B 1 1 1
6 XXXXXXXX06 B 0 1 2
6 XXXXXXXX06 B 2 2 0
7 XXXXXXXX07 B 1 0 1

8 rows selected.

tkyte%ORCL> select system,
2 sum(pos1) pos1_sum,
3 sum(pos2) pos2_sum,
4 sum(pos3) pos3_sum,
5 count(case when pos1=1 then 1 end) "pos1=1",
6 count(case when pos2=1 then 1 end) "pos2=1",
7 count(case when pos2=1 then 1 end) "pos2=1",
8 count(case when pos1=2 then 1 end) "pos1=2",
9 count(case when pos2=2 then 1 end) "pos2=2",
10 count(case when pos2=2 then 1 end) "pos2=2"
11 from vin_check
12 group by system;

S POS1_SUM POS2_SUM POS3_SUM pos1=1 pos2=1 pos2=1 pos1=2 pos2=2 pos2=2
- -------- -------- -------- ------ ------ ------ ------ ------ ------
A 4 4 4 2 2 2 1 1 1
B 4 4 4 2 2 2 1 1 1

that gets it all at once, better than three queries, but if you need three queries, use the columns from above - and a simple group by.

Again confusion.....

Vinayak Awasthi, August 15, 2006 - 9:47 am UTC

Hi Tom,
Just started reading the Analytics chapter in your book and got a bit puzzled. See the following queries:
<<QUERY1>>
select sum(sal) over(order by ename,deptno) from emp
<<QUERY1>>
<<QUERY2>>
select sum(sal) over(order by deptno,ename) from emp
<<QUERY2>>
<<QUERY3>>
select sum(sal) over(order by deptno,ename),sum(sal) over(order by ename,deptno) from emp
<<QUERY3>>
<<QUERY4>>
select sum(sal) over(order by ename,deptno), sum(sal) over(order by deptno,ename) from emp
<<QUERY4>>

The result from query1 and query2 is obvious and understandable. But query3 & query4 bumped me. I am not able to understand how this result set is arrived. Moreover, the query3 & query4 are exactly the same with just the two columns interchanged but the result differs considerably.
Tom, can you please explain this behaviour of analytics.
Thanks

August 15, 2006 - 12:18 pm UTC

if you order the results consistently, you'll find 3 and 4 are identical with the columns reversed:

scott%ORA9IR2> select sum(sal) over(order by deptno,ename),sum(sal) over(order by ename,deptno)
2 from emp
3 order by rowid
4 /

SUM(SAL)OVER(ORDERBYDEPTNO,ENAME) SUM(SAL)OVER(ORDERBYENAME,DEPTNO)
--------------------------------- ---------------------------------
19625 26275
21225 2700
29025 29025
15825 14925
26275 21175
24075 5550
2450 8000
18825 25475
7450 19925
27775 27775
9850 1100
25025 11950
12850 11000
8750 22475

14 rows selected.

scott%ORA9IR2> select sum(sal) over(order by ename,deptno), sum(sal) over(order by
2 deptno,ename) from emp
3 order by rowid
4 /

SUM(SAL)OVER(ORDERBYENAME,DEPTNO) SUM(SAL)OVER(ORDERBYDEPTNO,ENAME)
--------------------------------- ---------------------------------
26275 19625
2700 21225
29025 29025
14925 15825
21175 26275
5550 24075
8000 2450
25475 18825
19925 7450
27775 27775
1100 9850
11950 25025
11000 12850
22475 8750

14 rows selected.

Addendum to my question above

Vinayak Awasthi, August 15, 2006 - 10:17 am UTC

Tom,
in addition to above question, I wanted to ask that how ORDER BY is processed by oracle in case of analytics.If the query does not have an outer order by but implicit analytic columns contain different order by, which order by oracle actually picks to generate the final result set.By running the different queries, I beleive the order by of the last column of the select.
Please advice.

August 15, 2006 - 12:19 pm UTC

unless and until you have an order by on your SQL statement - please make no assumptions about the order of rows, you cannot.

until you have order by, there is no order.

Things not clear !!!!

Vinayak Awasthi, August 16, 2006 - 5:32 am UTC

Tom,
I agree that if we order the result consistently, then our results in both the queries are same but jsut reversed. But my confusion is where we do not provide any order by clause.If you see my query3 & query4, I have not give any order by clause to the sql, then what causes oracle to return the result set differently. Should it not give the same results in both thee queries with the columns reversed.
Please guide.

August 16, 2006 - 8:36 am UTC

if you do not give an order by

YOU CANNOT HAVE ANY EXPECTATION ON THE ORDER OF ROWS.

period.

the result sets are exactly the same - with or without the order by. They are just returned with the rows in different orders.

Thanks Tom...

Vinayak Awasthi, August 16, 2006 - 10:17 am UTC

Thanks Tom,
After running the queries multiple times, I understood what you were trying to explain.
Thanks once again for your time and guidance.

getting wrong answer

Vinayak, August 21, 2006 - 9:19 am UTC

Hi Tom,
I have the following table with the set of data. This is just a short version of the actual table. The actual table has around 13 million records.

CREATE TABLE MY_TEST_TABLE(ID NUMBER,KEY NUMBER,TYPEOFRECORD VARCHAR2(20BYTE),
MYDATE DATE);

Insert into MY_TEST_TABLE(ID, KEY, TYPEOFRECORD, MYDATE)
Values(6366556, 404887, 'GP', TO_DATE('07/23/2004 00:00:00', 'MM/DD/YYYY HH24:MI:SS'));
Insert into MY_TEST_TABLE(ID, KEY, TYPEOFRECORD, MYDATE)
Values
(6366516, 404887, 'GP', TO_DATE('07/23/2004 00:00:00', 'MM/DD/YYYY HH24:MI:SS'));
Insert into MY_TEST_TABLE(ID, KEY, TYPEOFRECORD, MYDATE)
Values
(6366565, 404887, 'GP', TO_DATE('07/23/2004 00:00:00', 'MM/DD/YYYY HH24:MI:SS'));
Insert into MY_TEST_TABLE(ID, KEY, TYPEOFRECORD, MYDATE)
Values
(6366568, 404887, 'GP', TO_DATE('07/23/2004 00:00:00', 'MM/DD/YYYY HH24:MI:SS'));
Insert into MY_TEST_TABLE(ID, KEY, TYPEOFRECORD, MYDATE)
Values
(7076940, 404887, 'CE', TO_DATE('11/04/2004 00:00:00', 'MM/DD/YYYY HH24:MI:SS'));
Insert into MY_TEST_TABLE(ID, KEY, TYPEOFRECORD, MYDATE)
Values
(18197564, 404887, 'CE', TO_DATE('08/29/2005 00:00:00', 'MM/DD/YYYY HH24:MI:SS'));
Insert into MY_TEST_TABLE(ID, KEY, TYPEOFRECORD, MYDATE)
Values
(17561339, 404887, 'CE', TO_DATE('05/05/2006 00:00:00', 'MM/DD/YYYY HH24:MI:SS'));
Insert into MY_TEST_TABLE(ID, KEY, TYPEOFRECORD, MYDATE)
Values
(18381063, 404887, 'CE', TO_DATE('05/19/2006 00:00:00', 'MM/DD/YYYY HH24:MI:SS'));
Insert into MY_TEST_TABLE(ID, KEY, TYPEOFRECORD, MYDATE)
Values
(18381260, 404887, 'CE', TO_DATE('06/09/2006 00:00:00', 'MM/DD/YYYY HH24:MI:SS'));
Insert into MY_TEST_TABLE(ID, KEY, TYPEOFRECORD, MYDATE)
Values
(18386869, 404887, 'CE', TO_DATE('06/10/2006 00:00:00', 'MM/DD/YYYY HH24:MI:SS'));
Insert into MY_TEST_TABLE(ID, KEY, TYPEOFRECORD, MYDATE)
Values
(18895620, 404887, 'CE', TO_DATE('06/10/2006 00:00:00', 'MM/DD/YYYY HH24:MI:SS'));
Insert into MY_TEST_TABLE(ID, KEY, TYPEOFRECORD, MYDATE)
Values
(17769950, 404887, 'CE', TO_DATE('05/06/2006 00:00:00', 'MM/DD/YYYY HH24:MI:SS'));
Insert into MY_TEST_TABLE(ID, KEY, TYPEOFRECORD, MYDATE)
Values
(18096803, 404887, 'CE', TO_DATE('05/19/2006 00:00:00', 'MM/DD/YYYY HH24:MI:SS'));
Insert into MY_TEST_TABLE(ID, KEY, TYPEOFRECORD, MYDATE)
Values
(18381262, 404887, 'CE', TO_DATE('06/09/2006 00:00:00', 'MM/DD/YYYY HH24:MI:SS'));
Insert into MY_TEST_TABLE(ID, KEY, TYPEOFRECORD, MYDATE)
Values
(18381270, 404887, 'CE', TO_DATE('06/09/2006 00:00:00', 'MM/DD/YYYY HH24:MI:SS'));
COMMIT;

My requirement is to get the highest ID value, TYPEOFRECORD value for a given key value. So I wrote this query:

select KEY, max(ID) over (partition by KEY) myid, TYPEOFRECORD
from MY_TEST_TABLE;

but this query returns as many number of records as there are records for a particular KEY value.

But if I write this query it gives the correct result:

select tt1.key,tt1.id,tt1.MYDATE,tt1.TYPEOFRECORD
from MY_TEST_TABLE tt1 where (tt1.KEY,tt1.ID) in
(select tt.KEY, max(tt.ID) myid
from MY_TEST_TABLE tt
group by tt.key)

What wrong I am doing in the analytic query as I wanted to get the result through analytics and not use the traditional GROUP BY clause.

Thanks
Vinayak

August 27, 2006 - 9:25 am UTC

analytics are useful when you DO NOT WANT TO AGGREGATE.

see
</code> https://www.oracle.com/technetwork/issue-archive/2012/12-sep/o52asktom-1735913.html <code>
the "PLAYING WITH AGGREGATION" section.

You do in fact want to aggregate.

Help required on one scenario related to total promotion calculation

Hardik Panchal, September 30, 2006 - 7:00 am UTC

Hi Tom,

I have to calculate promotion in percentage (if % off given, keep the value as it is, if amount off given, convert it to percentage) per customer and item id in one base table and then need to fill summary table based on the base table.

For example,

CUSTOMR: 01
ITEM: 123456
BASE-SDV: 100

Abbreviations used:

PKG: Package Number
GRP: Group Number
EVE: Event Number
SEQ: Sequence Number
Base-SDV: synonym to Item Price. SDV = Store Door Value

Case 1 (Same package):

Promotion data defined:

PKG GRP EVE %off Amt.off SEQ
01 01 01 0 10 01
01 01 02 10 0 02

Calculation required:

For Same Package and Group-Customer-Product, First Event will be applied to its Base-SDV.(means 100 -10 =90). After that Second Event will be applied to Base-SDV which is deducted from first Event.[means 90-(10% of 90)=81]

Data to fill in Base-Table:

PKG GRP EVE %off Amt.off Discount SEQ
01 01 01 0 10 10% 01
01 01 02 10 0 9% 02 *

* Although 10% discount is defined, actually 9% applied

Data to fill in Summary Table:

Customer ITEM TotalPromo
01 123456 19%

Case 2 (Different Package):

PKG GRP EVE %off Amt.off SEQ
01 01 01 0 10 01
02 01 02 10 0 02

Calculation required:

For different Package and Group-Customer-Product ,Event in 1st package will be applied to its Base-SDV.(means 100 -10 =90). After that, event of 2nd package will be applied to original item Base-SDV (without deducting any previous promotion discount) [means 90 - (10% of 100) = 80]

Data to fill in Base-Table:

PKG GRP EVE %off Amt.off Discount SEQ
01 01 01 0 10 10% 01
02 01 01 10 0 10% 02

Data to fill in Summary Table:

Customer ITEM TotalPromo
01 123456 20%

Quick reply is appreciated !

Thanks in Advance :-)

Hardik Panchal

September 30, 2006 - 8:11 am UTC

no create
no inserts
no look

don't know if it can be done, did not really look

Not getting your comments

Hardik Panchal, September 30, 2006 - 9:12 am UTC

Do you mean to say it doesn't look possible or you require more information like table structures , insert stmts.

Thanks

October 01, 2006 - 12:28 am UTC

correct, without a table create and insert intos - I'm not going to be able to test a sql solution and therefore could not look at it.

analytical Functions

Maulesh Jani, September 30, 2006 - 9:30 am UTC

Hi ,
Soory for last mail.
LOGIC :
This table has base data for customer's Products promotion information . Now on the base of this table I have to create replicate table shows the infor in cumulative percentage of promotion .
Like
cust item event price percent-promotion
01 01 01 100 10
01 01 02 100 5

requested output table will contain one more column

cust item event price percent-promotion cumu-promo
01 01 01 100 10 10
01 01 02 100 5 15

I just want to know that for such scenario does analytical functions can be used ? If yes then pl.can give me any example
Thanks and Pl. guide
Regards
Maulesh Jani

October 01, 2006 - 12:30 am UTC

sum(percent-promotition) over (partition by YOUR_GROUP order by event)

I think your group might be cust, item - but not sure as example is "not very example-ish"

Details Information Provided

Hardik Panchal, September 30, 2006 - 10:22 am UTC

I have following table:

CREATE TABLE TEMP_PROMO_CALC_2
(
CTRY_CODE CHAR(3 BYTE) NOT NULL,
CO_CODE CHAR(3 BYTE) NOT NULL,
CUST_NBR NUMBER NOT NULL,
ITEM_ID CHAR(20 BYTE),
EVENT_NBR NUMBER NOT NULL,
PROMO_SEQ_NBR NUMBER(2),
EVENT_ALLOW_PCT NUMBER(5,4),
EVENT_ALLOW_AMT NUMBER,
BASE_PRICE NUMBER,
NEW_PRICE NUMBER,
PERCENT_DIC NUMBER
)

Having following data in this table:

INSERT INTO TEMP_PROMO_CALC_2 ( CTRY_CODE, CO_CODE, CUST_NBR, ITEM_ID, EVENT_NBR, PROMO_SEQ_NBR,
EVENT_ALLOW_PCT, EVENT_ALLOW_AMT, BASE_PRICE, NEW_PRICE ) VALUES (
'138', '002', 1, '100', 1, 1, 4, 0, 100, NULL);

INSERT INTO TEMP_PROMO_CALC_2 ( CTRY_CODE, CO_CODE, CUST_NBR, ITEM_ID, EVENT_NBR, PROMO_SEQ_NBR,
EVENT_ALLOW_PCT, EVENT_ALLOW_AMT, BASE_PRICE, NEW_PRICE ) VALUES (
'138', '002', 1, '100', 5, 2, 0, 20, 100, NULL);

COMMIT;

Here, EVENT_ALLOW_PCT is the % discount, while EVENT_ALLOW_AMT is the discount amount in $.

I want following things:
- First convert all the promotions in % form
- Apply the promotion in the seq. mentioned by PROMO_SEQ_NBR

In above example,

- if 4% (Event Seq 1, so this will be applied first) given on a 100 $ price, NEW_PRICE field value after applying this promotion will be 96 $. The PERCENT_DIC field value will be 4% for PROMO_SEQ_NBR -> 1.

- Then 20$ amount off promotion is given. This 20$ amount off promotion needs to be converted into % off i.e. 20$ will be deducted from the 96 $ (amount calculated after applying promotion in order of PROMO_SEQ_NBR, like 1st seq. is 4% percent off and then 2nd sequence is 20$ amount off discount). In this example, the PERCENT_DIC for this 2nd seq. will be 20.83 % [calculated as (20*100)/96] and NEW_PRICE filed value will be 76 $

Desrired Result:

CUST_NBR ITEM_ID EVENT_NBR PROMO_SEQ_NBR EVENT_ALLOW_PCT
--------- -------- --------- ------------- ----------------
1 100 1 1 4
1 100 5 2 0

EVENT_ALLOW_AMT BASE_PRICE NEW_PRICE PERCENT_DIC
---------------- ---------- --------- -----------
0 100 96 4
20 100 76 20.83

I believe this can be achieved with the analytic functions !

Regards,

Hardik Panchal

October 01, 2006 - 12:41 am UTC

ok, so, what happens with other rows, what are the groups here, is event_nbr there just to 'confuse' (is it used for anything in the example or just an extra bit)

start with this I guess


ops$tkyte%ORA10GR2> select cust_nbr, item_id, promo_seq_nbr,
  2         event_allow_pct, event_allow_amt,
  3             sum(pr * (100-event_allow_pct)/100 - event_allow_amt) over (partition by cust_nbr,it
em_id order
  4             by promo_seq_nbr) new_pr
  5    from (
  6  select cust_nbr, item_id, promo_seq_nbr,
  7         event_allow_pct, event_allow_amt,
  8         case when row_number() over (partition by cust_nbr, item_id
  9             order by promo_seq_nbr) = 1 then base_price else 0 end pr
 10    from temp_promo_calc_2
 11         )
 12  /

  CUST_NBR ITE PROMO_SEQ_NBR EVENT_ALLOW_PCT EVENT_ALLOW_AMT     NEW_PR
---------- --- ------------- --------------- --------------- ----------
         1 100             1               4               0         96
         1 100             2               0              20         76

Analytical Question

Hardik Panchal, October 01, 2006 - 2:31 am UTC

Hi TOM,
Thanks for your help .Yes you are right that event_nbr is not usefull in calculation. Your query helped me , but when i tried it on other data then it return me wrong info, i m trying to update your query for desired output,but still not get it .
Here is insert stmt for the base table data :
INSERT INTO TEMP_PROMO_CALC_2 ( CTRY_CODE, CO_CODE, CUST_NBR, ITEM_ID, EVENT_NBR, PROMO_SEQ_NBR,
EVENT_ALLOW_PCT, EVENT_ALLOW_AMT, BASE_PRICE, NEW_PRICE, PERCENT_DIC ) VALUES (
'138', '002', 1, '100 ', 1, 1, 4, 0, 100, NULL, NULL);
INSERT INTO TEMP_PROMO_CALC_2 ( CTRY_CODE, CO_CODE, CUST_NBR, ITEM_ID, EVENT_NBR, PROMO_SEQ_NBR,
EVENT_ALLOW_PCT, EVENT_ALLOW_AMT, BASE_PRICE, NEW_PRICE, PERCENT_DIC ) VALUES (
'138', '002', 1, '100 ', 5, 2, 0, 20, 100, NULL, NULL);
INSERT INTO TEMP_PROMO_CALC_2 ( CTRY_CODE, CO_CODE, CUST_NBR, ITEM_ID, EVENT_NBR, PROMO_SEQ_NBR,
EVENT_ALLOW_PCT, EVENT_ALLOW_AMT, BASE_PRICE, NEW_PRICE, PERCENT_DIC ) VALUES (
'138', '002', 1, '100 ', 3, 3, 5, 0, 100, NULL, NULL);
INSERT INTO TEMP_PROMO_CALC_2 ( CTRY_CODE, CO_CODE, CUST_NBR, ITEM_ID, EVENT_NBR, PROMO_SEQ_NBR,
EVENT_ALLOW_PCT, EVENT_ALLOW_AMT, BASE_PRICE, NEW_PRICE, PERCENT_DIC ) VALUES (
'138', '002', 2, '200 ', 4, 1, 0, 10, 100, NULL, NULL);
INSERT INTO TEMP_PROMO_CALC_2 ( CTRY_CODE, CO_CODE, CUST_NBR, ITEM_ID, EVENT_NBR, PROMO_SEQ_NBR,
EVENT_ALLOW_PCT, EVENT_ALLOW_AMT, BASE_PRICE, NEW_PRICE, PERCENT_DIC ) VALUES (
'138', '002', 2, '200 ', 5, 2, 5, 0, 100, NULL, NULL);
COMMIT;

after it when I applies ur query then it doesnt show me the correct output for New_pr . (means in this case when customer having 3 Promotion on same Product and when customer having promotion in different order of Amount_off and Promo_off) ,
. This query works well in the case when %off is first and amount off second .

October 01, 2006 - 3:06 am UTC

"ur"????? what is that.

you best supply a really good detailed specification. eg: explain precisely, in such detail that someone could write code from your specification - even if they never saw your data before and didn't really know your data

eg: that is what we have here. so, specify it out in really good detail like you were going to hire a consultant to write code from your specification.

Over?

Anderson Haertel Rodrigues, October 03, 2006 - 6:12 pm UTC

Hello All,
Oracle 9.2.0.5.0

My doubt as (please):

CREATE GLOBAL TEMPORARY TABLE ETMPTESTE
(
CdPerfilAgrupamento integer,
CdFuncAgrupamento integer,
CdOperacao integer,
UO char(1),
NatVinc char(1),
RelTrab char(1),
Regiao char(1),
RegTrab char(1),
SitPrev char(1),
Carreira char(1),
Cargo char(1),
Funcao char(1)
)
ON COMMIT PRESERVE ROWS;

insert into ETMPTESTE
(
CdPerfilAgrupamento,
CdFuncAgrupamento,
CdOperacao,
UO,
NatVinc,
RelTrab,
Regiao,
RegTrab,
SitPrev,
Carreira,
Cargo,
Funcao
)
values
(
1,85,1,'N','E','T','E','N','N','N','N','N'
);

insert into ETMPTESTE
(
CdPerfilAgrupamento,
CdFuncAgrupamento,
CdOperacao,
UO,
NatVinc,
RelTrab,
Regiao,
RegTrab,
SitPrev,
Carreira,
Cargo,
Funcao
)
values
(
2,85,1,'E','N','E','N','E','N','N','E','N'
);

insert into ETMPTESTE
(
CdPerfilAgrupamento,
CdFuncAgrupamento,
CdOperacao,
UO,
NatVinc,
RelTrab,
Regiao,
RegTrab,
SitPrev,
Carreira,
Cargo,
Funcao
)
values
(
3,85,1,'T','T','T','T','T','T','T','T','T'
);

insert into ETMPTESTE
(
CdPerfilAgrupamento,
CdFuncAgrupamento,
CdOperacao,
UO,
NatVinc,
RelTrab,
Regiao,
RegTrab,
SitPrev,
Carreira,
Cargo,
Funcao
)
values
(
4,85,1,'E','E','E','E','E','E','E','E','E'
);

My Query:
select FA,
Op,
UOAbrangencia,
NatVincAbrangencia,
RelTrabAbrangencia,
RegiaoAbrangencia,
RegTrabAbrangencia,
SitPrevAbrangencia,
CEFAbrangencia,
CCOAbrangencia,
FUCAbrangencia
from
(
select FA,
Op,
CASE WHEN UO = 'T' AND UOLead = 'T' THEN 'T'
WHEN UO = 'T' AND UOLead = 'E' THEN 'T'
WHEN UO = 'T' AND UOLead = 'N' THEN 'T'
WHEN UO = 'E' AND UOLead = 'T' THEN 'T'
WHEN UO = 'E' AND UOLead = 'E' THEN 'E'
WHEN UO = 'E' AND UOLead = 'N' THEN 'E'
WHEN UO = 'N' AND UOLead = 'T' THEN 'T'
WHEN UO = 'N' AND UOLead = 'E' THEN 'E'
WHEN UO = 'N' AND UOLead = 'N' THEN 'N'
END UOAbrangencia,
CASE WHEN NatVinc = 'T' AND NatVincLead = 'T' THEN 'T'
WHEN NatVinc = 'T' AND NatVincLead = 'E' THEN 'T'
WHEN NatVinc = 'T' AND NatVincLead = 'N' THEN 'T'
WHEN NatVinc = 'E' AND NatVincLead = 'T' THEN 'T'
WHEN NatVinc = 'E' AND NatVincLead = 'E' THEN 'E'
WHEN NatVinc = 'E' AND NatVincLead = 'N' THEN 'E'
WHEN NatVinc = 'N' AND NatVincLead = 'T' THEN 'T'
WHEN NatVinc = 'N' AND NatVincLead = 'E' THEN 'E'
WHEN NatVinc = 'N' AND NatVincLead = 'N' THEN 'N'
END NatVincAbrangencia,
CASE WHEN RelTrab = 'T' AND RelTrabLead = 'T' THEN 'T'
WHEN RelTrab = 'T' AND RelTrabLead = 'E' THEN 'T'
WHEN RelTrab = 'T' AND RelTrabLead = 'N' THEN 'T'
WHEN RelTrab = 'E' AND RelTrabLead = 'T' THEN 'T'
WHEN RelTrab = 'E' AND RelTrabLead = 'E' THEN 'E'
WHEN RelTrab = 'E' AND RelTrabLead = 'N' THEN 'E'
WHEN RelTrab = 'N' AND RelTrabLead = 'T' THEN 'T'
WHEN RelTrab = 'N' AND RelTrabLead = 'E' THEN 'E'
WHEN RelTrab = 'N' AND RelTrabLead = 'N' THEN 'N'
END RelTrabAbrangencia,
CASE WHEN Regiao = 'T' AND RegiaoLead = 'T' THEN 'T'
WHEN Regiao = 'T' AND RegiaoLead = 'E' THEN 'T'
WHEN Regiao = 'T' AND RegiaoLead = 'N' THEN 'T'
WHEN Regiao = 'E' AND RegiaoLead = 'T' THEN 'T'
WHEN Regiao = 'E' AND RegiaoLead = 'E' THEN 'E'
WHEN Regiao = 'E' AND RegiaoLead = 'N' THEN 'E'
WHEN Regiao = 'N' AND RegiaoLead = 'T' THEN 'T'
WHEN Regiao = 'N' AND RegiaoLead = 'E' THEN 'E'
WHEN Regiao = 'N' AND RegiaoLead = 'N' THEN 'N'
END RegiaoAbrangencia,
CASE WHEN RegTrab = 'T' AND RegTrabLead = 'T' THEN 'T'
WHEN RegTrab = 'T' AND RegTrabLead = 'E' THEN 'T'
WHEN RegTrab = 'T' AND RegTrabLead = 'N' THEN 'T'
WHEN RegTrab = 'E' AND RegTrabLead = 'T' THEN 'T'
WHEN RegTrab = 'E' AND RegTrabLead = 'E' THEN 'E'
WHEN RegTrab = 'E' AND RegTrabLead = 'N' THEN 'E'
WHEN RegTrab = 'N' AND RegTrabLead = 'T' THEN 'T'
WHEN RegTrab = 'N' AND RegTrabLead = 'E' THEN 'E'
WHEN RegTrab = 'N' AND RegTrabLead = 'N' THEN 'N'
END RegTrabAbrangencia,
CASE WHEN SitPrev = 'T' AND SitPrevLead = 'T' THEN 'T'
WHEN SitPrev = 'T' AND SitPrevLead = 'E' THEN 'T'
WHEN SitPrev = 'T' AND SitPrevLead = 'N' THEN 'T'
WHEN SitPrev = 'E' AND SitPrevLead = 'T' THEN 'T'
WHEN SitPrev = 'E' AND SitPrevLead = 'E' THEN 'E'
WHEN SitPrev = 'E' AND SitPrevLead = 'N' THEN 'E'
WHEN SitPrev = 'N' AND SitPrevLead = 'T' THEN 'T'
WHEN SitPrev = 'N' AND SitPrevLead = 'E' THEN 'E'
WHEN SitPrev = 'N' AND SitPrevLead = 'N' THEN 'N'
END SitPrevAbrangencia,
CASE WHEN Carreira = 'T' AND CarreiraLead = 'T' THEN 'T'
WHEN Carreira = 'T' AND CarreiraLead = 'E' THEN 'T'
WHEN Carreira = 'T' AND CarreiraLead = 'N' THEN 'T'
WHEN Carreira = 'E' AND CarreiraLead = 'T' THEN 'T'
WHEN Carreira = 'E' AND CarreiraLead = 'E' THEN 'E'
WHEN Carreira = 'E' AND CarreiraLead = 'N' THEN 'E'
WHEN Carreira = 'N' AND CarreiraLead = 'T' THEN 'T'
WHEN Carreira = 'N' AND CarreiraLead = 'E' THEN 'E'
WHEN Carreira = 'N' AND CarreiraLead = 'N' THEN 'N'
END CEFAbrangencia,
CASE WHEN Cargo = 'T' AND CargoLead = 'T' THEN 'T'
WHEN Cargo = 'T' AND CargoLead = 'E' THEN 'T'
WHEN Cargo = 'T' AND CargoLead = 'N' THEN 'T'
WHEN Cargo = 'E' AND CargoLead = 'T' THEN 'T'
WHEN Cargo = 'E' AND CargoLead = 'E' THEN 'E'
WHEN Cargo = 'E' AND CargoLead = 'N' THEN 'E'
WHEN Cargo = 'N' AND CargoLead = 'T' THEN 'T'
WHEN Cargo = 'N' AND CargoLead = 'E' THEN 'E'
WHEN Cargo = 'N' AND CargoLead = 'N' THEN 'N'
END CCOAbrangencia,
CASE WHEN Funcao = 'T' AND FuncaoLead = 'T' THEN 'T'
WHEN Funcao = 'T' AND FuncaoLead = 'E' THEN 'T'
WHEN Funcao = 'T' AND FuncaoLead = 'N' THEN 'T'
WHEN Funcao = 'E' AND FuncaoLead = 'T' THEN 'T'
WHEN Funcao = 'E' AND FuncaoLead = 'E' THEN 'E'
WHEN Funcao = 'E' AND FuncaoLead = 'N' THEN 'E'
WHEN Funcao = 'N' AND FuncaoLead = 'T' THEN 'T'
WHEN Funcao = 'N' AND FuncaoLead = 'E' THEN 'E'
WHEN Funcao = 'N' AND FuncaoLead = 'N' THEN 'N'
END FUCAbrangencia
from
( select CdFuncAgrupamento FA,
CdOperacao Op,
UO,
LEAD(UO) over (order by CdFuncAgrupamento, cdOperacao ) UOLead,
NatVinc,
LEAD(NatVinc) over (order by CdFuncAgrupamento, cdOperacao ) NatVincLead,
RelTrab,
LEAD(RelTrab) over (order by CdFuncAgrupamento, cdOperacao ) RelTrabLead,
Regiao,
LEAD(Regiao) over (order by CdFuncAgrupamento, cdOperacao ) RegiaoLead,
RegTrab,
LEAD(RegTrab) over (order by CdFuncAgrupamento, cdOperacao ) RegTrabLead,
SitPrev,
LEAD(SitPrev) over (order by CdFuncAgrupamento, cdOperacao ) SitPrevLead,
Carreira,
LEAD(Carreira) over (order by CdFuncAgrupamento, cdOperacao ) CarreiraLead,
Cargo,
LEAD(Cargo) over (order by CdFuncAgrupamento, cdOperacao ) CargoLead,
Funcao,
LEAD(Funcao) over (order by CdFuncAgrupamento, cdOperacao ) FuncaoLead
from ETMPTESTE
)
)
where UOAbrangencia is not null
group by FA,
Op,
UOAbrangencia,
NatVincAbrangencia,
RelTrabAbrangencia,
RegiaoAbrangencia,
RegTrabAbrangencia,
SitPrevAbrangencia,
CEFAbrangencia,
CCOAbrangencia,
FUCAbrangencia
/

This result as:
FA OP U N R R R S C C F
--- -- - - - - - - - - -
85 1 E E T E E N N E N
85 1 T T T T T T T T T

But, desired:

FA OP U N R R R S C C F
--- -- - - - - - - - - -
85 1 T T T T T T T T T

Thanks All

October 03, 2006 - 7:28 pm UTC

"sorry"????

I have no idea why you expected what you expected.

or - WHAT YOU EXPECTED at all.

I see lots of code that quite simply does not work and no explaination of why you expected what you think should have come out.

and frankly, i don't know what to do about it anymore. People thing other people should be mentally linked or something. That we should be able to look at "code that doesn't work" and "this is what I expected" from abismally small test cases that hardly exercise the boundary conditions or anything...........

it does get discouraging.

please, just pretend you are trying to explain this to your mom, use simple terms, present it simply, yet COMPLETELY. I have *no clue* why you expected that row and not the two that are correctly returned...

Over?

Anderson Haertel Rodrigues, October 04, 2006 - 8:45 am UTC

Tom, Sorry!

My english for write, is very bad! ;-)

But,

CREATE GLOBAL TEMPORARY TABLE ETMPTESTE
(
CdPerfilAgrupamento integer,
CdFuncAgrupamento integer,
CdOperacao integer,
UO char(1),
NatVinc char(1),
RelTrab char(1),
Regiao char(1),
RegTrab char(1),
SitPrev char(1),
Carreira char(1),
Cargo char(1),
Funcao char(1)
)
ON COMMIT PRESERVE ROWS;

insert into ETMPTESTE
(CdPerfilAgrupamento, CdFuncAgrupamento,CdOperacao,
UO, NatVinc, RelTrab, Regiao, RegTrab, SitPrev,
Carreira, Cargo, Funcao)
values
(1,85,1,'N','E','T','E','N','N','N','N','N');

insert into ETMPTESTE
(CdPerfilAgrupamento, CdFuncAgrupamento, CdOperacao,
UO, NatVinc, RelTrab, Regiao, RegTrab, SitPrev,
Carreira, Cargo, Funcao)
values
(2,85,1,'E','N','E','N','E','N','N','E','N');

insert into ETMPTESTE
(CdPerfilAgrupamento, CdFuncAgrupamento, CdOperacao,
UO, NatVinc, RelTrab, Regiao, RegTrab, SitPrev,
Carreira, Cargo, Funcao )
values
(3,85,1,'T','T','T','T','T','T','T','T','T');

insert into ETMPTESTE
(CdPerfilAgrupamento, CdFuncAgrupamento, CdOperacao,
UO, NatVinc, RelTrab, Regiao, RegTrab, SitPrev,
Carreira, Cargo, Funcao )
values
(4,85,1,'E','E','E','E','E','E','E','E','E');

sigrharq@SIGRHDES>select CDPERFILAGRUPAMENTO, CDFUNCAGRUPAMENTO, CDOPERACAO, UO from etmpteste;

CDPERFILAGRUPAMENTO CDFUNCAGRUPAMENTO CDOPERACAO U
------------------- ------------------ ------------------ -
1 85 1 N
2 85 1 E
3 85 1 T
4 85 1 E

But, my desired result as:

CDFUNCAGRUPAMENTO CDOPERACAO UO
85 1 T

without CDPERFILAGRUPAMENTO, where:

IF UO = 'T' AND Lead = 'T'
Column := 'T'
elsif UO = 'T' AND Lead = 'E'
Column := 'T'
elsif UO = 'T' AND Lead = 'N'
Column := 'T'
elsif UO = 'E' AND Lead = 'T'
Column := 'T'
elsif UO = 'E' AND Lead = 'E'
Column := 'E'
elsif UO = 'E' AND Lead = 'N'
Column := 'E
elsif UO = 'N' AND Lead = 'T'
Column := 'T'
elsif UO = 'N' AND Lead = 'E'
Column := 'E'
elsif UO = 'N' AND Lead = 'N'
Column := 'N'

my query:

select FA,
Op,
UOAbrangencia
from
(
select FA,
Op,
CASE WHEN UO = 'T' AND UOLead = 'T' THEN 'T'
WHEN UO = 'T' AND UOLead = 'E' THEN 'T'
WHEN UO = 'T' AND UOLead = 'N' THEN 'T'
WHEN UO = 'E' AND UOLead = 'T' THEN 'T'
WHEN UO = 'E' AND UOLead = 'E' THEN 'E'
WHEN UO = 'E' AND UOLead = 'N' THEN 'E'
WHEN UO = 'N' AND UOLead = 'T' THEN 'T'
WHEN UO = 'N' AND UOLead = 'E' THEN 'E'
WHEN UO = 'N' AND UOLead = 'N' THEN 'N'
END UOAbrangencia
from
( select CdFuncAgrupamento FA,
CdOperacao Op,
UO,
LEAD(UO) over (order by CdFuncAgrupamento, cdOperacao ) UOLead
from ETMPTESTE
order by CdOperacao, CdFuncAgrupamento
)
)
where UOAbrangencia is not null
group by FA,
Op,
UOAbrangencia
/

My result:

FA OP U
--- -- -
85 1 E
85 1 T

Thanks!!!

October 04, 2006 - 5:06 pm UTC

no, see I need to know the "logic here", reading a failed implementation doesn't help.

no more select's, just tell us WHY there is just one row, what this row represents, how you arrived at it, what is the LOGIC

Over?

Anderson Haertel Rodrigues, October 06, 2006 - 8:51 am UTC

Hello Tom!

Thanks for your answers and patience.

My query now it´s Ok!

Thanks all!

Top-n query

Anu, October 10, 2006 - 11:10 am UTC

Hi Tom,

I have to find the most recent 1000 unique values from a set of data. To make it easy here, I've created a small table with sample data. 

create table ANALYTICS_QUESTION
(
  PK_COL NUMBER(2) not null,
  COL1   VARCHAR2(10),
  COL2   VARCHAR2(10),
  A_DATE DATE
);

insert into ANALYTICS_QUESTION (PK_COL, COL1, COL2, A_DATE)
values (1, 'alpha', 'alpha-val', to_date('09-10-2006 17:49:30', 'dd-mm-yyyy hh24:mi:ss'));
insert into ANALYTICS_QUESTION (PK_COL, COL1, COL2, A_DATE)
values (2, 'alpha', 'alpha-val', to_date('09-10-2006 17:49:33', 'dd-mm-yyyy hh24:mi:ss'));
insert into ANALYTICS_QUESTION (PK_COL, COL1, COL2, A_DATE)
values (3, 'alpha', 'alpha-val', to_date('29-09-2006 17:49:36', 'dd-mm-yyyy hh24:mi:ss'));
insert into ANALYTICS_QUESTION (PK_COL, COL1, COL2, A_DATE)
values (4, 'beta', 'beta-val', to_date('04-10-2006 17:49:52', 'dd-mm-yyyy hh24:mi:ss'));
insert into ANALYTICS_QUESTION (PK_COL, COL1, COL2, A_DATE)
values (5, 'gamma', 'gamma-val', to_date('07-10-2006 17:50:06', 'dd-mm-yyyy hh24:mi:ss'));
insert into ANALYTICS_QUESTION (PK_COL, COL1, COL2, A_DATE)
values (6, 'gamma', 'gamma-val', to_date('07-10-2006 17:50:09', 'dd-mm-yyyy hh24:mi:ss'));

The table now has:
SQL> select * from analytics_question;

PK_COL COL1       COL2       A_DATE
------ ---------- ---------- -----------
     1 alpha      alpha-val  10/9/2006 5
     2 alpha      alpha-val  10/9/2006 5
     3 alpha      alpha-val  9/29/2006 5
     4 beta       beta-val   10/4/2006 5
     5 gamma      gamma-val  10/7/2006 5
     6 gamma      gamma-val  10/7/2006 5

The output I need is (consider top 2 rows for this example):
COL1  COL2     
----- ---------
alpha alpha-val
gamma gamma-val

If I had to do this in a spreadsheet, I would remove duplicates in column col1 while retaining the rows that have the most recent date in column a_date. Next, I would sort this by date and pick the top 2 rows. (For a given value in co11, col2 always has the same value.)

Since it involves top-n, I considered using analytic functions. To do it the traditional way, I could use:

SQL> select *
  2  from   (select *
  3          from   (select col1,
  4                         col2,
  5                         max(a_date) as a_date
  6                  from   analytics_question
  7                  group  by col1,
  8                            col2) tab
  9          order  by a_date desc)
 10  where  rownum <= 2;

COL1       COL2       A_DATE
---------- ---------- -----------
alpha      alpha-val  10/9/2006 5
gamma      gamma-val  10/7/2006 5

In reality, the "analytics_question" would be a inline view over a table that has over 600,000 rows. Does not perform well.

I've attempted to use row_number() and rank() functions as below, but I wonder if there is any way to avoid the group by clause.

SQL> select *
  2  from   (select col1,
  3                 col2,
  4                 max(a_date),
  5                 row_number() over(order by max(a_date) desc) as rn
  6          from   analytics_question
  7          group  by col1,
  8                    col2)
  9  where  rn <= 2;

COL1       COL2       MAX(A_DATE)         RN
---------- ---------- ----------- ----------
alpha      alpha-val  10/9/2006 5          1
gamma      gamma-val  10/7/2006 5          2

Is there a way to simplify this further? Otherwise, its back to tuning.

Thanks and regards,
Anu

October 10, 2006 - 7:51 pm UTC

600,000 rows is pretty small - can you define "does not perform well"?

Your Help --

Reader, October 10, 2006 - 5:44 pm UTC

create table testa (a1 date,h_flg varchar2(1),b_flg varchar2(1),amt number(6,2))

insert into testa values( sysdate,'B',null,25.00);
insert into testa values( sysdate,null,'H',55.00);
insert into testa values( sysdate,null,'H',35.00);

How can I get the output in one line --

160 70 90

Thanks
insert into testa values( sysdate,'B',null,45.00);
commit;
select distinct sum(amt) over(partition by nvl(h_flg,0)) h_amt,
sum(amt) over(partition by nvl(b_flg,0)) b_amt,
sum(amt) over() tot_amt from testa

H_AMT B_AMT B_AMT
---------- ---------- ----------
160 70 70
160 90 90

2 rows selected.

How can I get the output in one line --

160 70 90

October 10, 2006 - 8:24 pm UTC

umm, you have a table with just three rows in it? your answer is very dependent on there being precisely three rows - is this the "real problem" or so overly simplified as to not be relevant?

Help Required on Promotion Calculation

Hardik Panchal, October 12, 2006 - 9:40 am UTC

Hi Tom,

I have following table:

CREATE TABLE PROMOTION_CALC
(
CUST_NBR NUMBER NOT NULL,
ITEM_ID CHAR(20 BYTE),
EVENT_NBR NUMBER NOT NULL,
PROMO_SEQ_NBR NUMBER(2),
PCT NUMBER(5,4),
AMT NUMBER,
BASE_PRICE NUMBER
);

The table data is as below:

INSERT INTO PROMOTION_CALC ( CUST_NBR, ITEM_ID, EVENT_NBR, PROMO_SEQ_NBR, PCT,
AMT, BASE_PRICE ) VALUES (
1, '300 ', 1, 1, 4, 0, 100);

INSERT INTO PROMOTION_CALC ( CUST_NBR, ITEM_ID, EVENT_NBR, PROMO_SEQ_NBR, PCT,
AMT, BASE_PRICE ) VALUES (
1, '300 ', 6, 1, 0, 5, 100);

INSERT INTO PROMOTION_CALC ( CUST_NBR, ITEM_ID, EVENT_NBR, PROMO_SEQ_NBR, PCT,
AMT, BASE_PRICE ) VALUES (
1, '300 ', 5, 2, 0, 20, 100);

INSERT INTO PROMOTION_CALC ( CUST_NBR, ITEM_ID, EVENT_NBR, PROMO_SEQ_NBR, PCT,
AMT, BASE_PRICE ) VALUES (
1, '300 ', 2, 3, 5, 0, 100);

INSERT INTO PROMOTION_CALC ( CUST_NBR, ITEM_ID, EVENT_NBR, PROMO_SEQ_NBR, PCT,
AMT, BASE_PRICE ) VALUES (
1, '300 ', 7, 3, 0, 10, 100);

INSERT INTO PROMOTION_CALC ( CUST_NBR, ITEM_ID, EVENT_NBR, PROMO_SEQ_NBR, PCT,
AMT, BASE_PRICE ) VALUES (
2, '400 ', 9, 1, 0, 15, 200);

COMMIT;

following is the tabular form of data:

CUST_NBR ITEM_ID EVENT_NBR PROMO_SEQ_NBR PCT AMT BASE_PRICE
-------- ------- --------- ------------- --- --- ----------
1 300 1 1 4 0 100
1 300 6 1 0 5 100
1 300 5 2 0 20 100
1 300 2 3 5 0 100
1 300 7 3 0 10 100
2 400 9 1 0 15 200

Now, i want to calculate NEW_PRICE and PERCENT_DIC as mentioned below:

CUST ITEM EVENT PROMO_ PCT AMT BASE_PRICE NEW_PRICE PERCENT_DIC
_NBR _ID _NBR SEQ_NBR
---- ---- ------ -------- --- --- ---------- --------- -----------
1 300 1 1 4 0 100 91 4
1 300 6 1 0 5 100 91 5
1 300 5 2 0 20 100 71 21.97
1 300 2 3 5 0 100 57.4532 5
1 300 7 3 0 10 100 57.4532 14.08
2 400 9 1 0 15 200 185 7.5

The logic of calculation is as below:

- First of all, the promotion (whether in percent off or amount off) will be converted in to percent off
- The summation of percent off (group by cust_nbr, item_nbr, promo_seq_nbr) will be applied
- The percent off calculated will be applied on the base_price for the least promo_seq_nbr per cust_nbr, item_nbr combination. For the next set of promo_seq_nbr per cust_nbr, item_nbr combination, the percent off calculated will be applied on the new_price (amount calculated with previous promo_seq_nbr), and not on the base_price.

In the above example,

- Row 1 and Row 2 will be applied to base_price: 100$ individually (as both having same promo_seq_nbr: 1)
i.e.
Row 1: Directly 4 percent off given
Row 2: Amount Off 5$ (on base_price: 100$) will be converted to percent off, which will result into 5%
- The summation of percent off: 9% (4% from row 1 + 5% from row 2) will be applied per cust_nbr, item_nbr, promo_seq_nbr (cust_nbr: 1, item_id: 300, promo_seq_nbr:1 in this case), which will result into new_price 91 $. Note here that the new_price column will be calculated as 91 in both the rows (row 1 and row 2) as a result of combined promotion of 9% applied on base_price.
- Now, for subsequent promo_seq_nbr for cust_nbr, item_nbr combination, the converted Percent off will be applied on new_price and not on the item base_price. i.e. in row 3, the amount off 20$ promtion will first converted in to percent off (applied on 91$ and not on 100$), resulting into 21.97%.
- In a same way, 3rd promo_seq_nbr (again having two rows for cust_nbr, item_nbr combination with same promo_seq_nbr) will be applied (as a summation of row 4 and row 5) on a new_price (which is 71$ after applying 2nd promo_seq_nbr).

The event_nbr field is not used in the promtion calculation, but is kept just because it is required for display purpose only in the business.

Thanks a lot in advance !

Regards,

Hardik Panchal

bulk analytic update

raajesh, October 12, 2006 - 3:32 pm UTC

Hi Tom

I would like to know if I can acheive my problem listed below using anayltic functions.

I have a table like this

Dept ID Dept Name Unique ID
1 A1
2 A1
3 A1
4 A2
5 A2
6 A2
7 A3
8 A4
9 A4
10 A4

Now I want my unique ID field to be generated like this

Dept ID Dept Name Unique ID
1 A1 123
2 A1 123
3 A1 123
4 A2 456
5 A2 456
6 A2 456
7 A3 124
8 A4 125
9 A4 125
10 A4 125

Also, the unique ID needs to be generated via an Oracle Sequence. The numbers given are just examples. i.e. unique ID takes values based on the Dept Name partition.

How to achieve this by an Update statement on the table? Please suggest a solution.

Raajesh

October 13, 2006 - 6:55 am UTC

no create, no inserts, no lookie :)

not promising it can be done, just stating "I do not look at the problem if you expect me to create a table, insert your data into it, create sequences, develop a working solution"

I only do the last bit, if and when possible

to raajesh

Michel Cadot, October 13, 2006 - 8:17 am UTC

I don't see the need of an Oracle sequence.
You can try this one:

update dept a
set unqid = (select val
from (select rownum val, deptname
from (select distinct deptname from dept)
) b
where b.deptname=a.deptname)
/

I'm not sure analytics are faster in this case but you can try (assuming deptid is a unique key):

update dept a
set unqid = (select val
from (select deptid,
first_value (deptid)
over (partition by deptname order by deptid) val
from dept) b
where b.deptid=a.deptid)
/

Regards
Michel

October 13, 2006 - 8:24 am UTC

if you really wanted to use "rownum", you best use order by in the select distinct, else there is no assurance that rownum would be assigned deterministically.

Michel Cadot, October 13, 2006 - 8:45 am UTC

Yes, I agree (about rownum and order) but I did not care about what the unique number is (as raajesh said: The numbers given are just examples) as long as it is different for each deptname.
Anyway, the second query gives a number that fully depends on the data as the department (name) is associated to the first deptid for each name (and it could be any other one). (This to explain to other readers and not to you, Tom, as I know you perfectly understand my queries.)

October 13, 2006 - 2:28 pm UTC

problem is that the calls to the subquery are NOT deterministic, the same rownum could in theory be assigned to different rows in different invocations.

One more analytics question

Serge Shmygelsky, October 13, 2006 - 8:55 am UTC

Hello Tom,

I have a table with the following data:

DN_NUM LAC CELL_ID DURATION
-------------------- -------------------- ---------- ----------
503111011 Kiev 044 10
503111011 Kiev 044 10
503111011 Kiev 044 10
503111011 Kiev region 044 10
503111011 Kiev region 044 10
503111011 Kiev region 044 10
503111011 Kiev region 044 10
503111011 Odesa 048 10
503111011 Odesa region 048 10
503111011 Odesa region 048 10
503111011 Odesa region 048 10
503111011 Odesa region 048 10
503111011 Odesa region 048 10

and I need for every phone number (there is only one for simplicity) to find cell_id with maximum number of calls (rows actually) and within this cell_id find lac with maximum number of calls, e.g. output should look like the following:

DN_NUM CELL_ID LAC COUNT_BY_CELL_ID COUNT_BY_REGION
503111011 044 Kiev region 7 4

I we just do 'GROUP BY cell_id, lac' then we'll have incorrect answer.
I've spent 2 hours playing with rank and rownum and cannot find solution. Is it possible to get it within one query?

Thanks in advance.

P.S. The real table is huge. So I'd rather not use any joins

October 13, 2006 - 2:29 pm UTC

no table create
no inserts into table
NO LOOK

to Serge Shmygelsky

Michel Cadot, October 13, 2006 - 10:09 am UTC

Hi Serge,

Do you understand "no create, no inserts, no lookie"?

If so, provide the data if you want an answer.

Regards
Michel

Analytical Calculation Problem

Hardik Panchal, October 14, 2006 - 9:30 am UTC

Hi Tom,

I am posting one problem scenario, hoping an excellent solution from you as always :-)

I have following table:

CREATE TABLE PROMOTION_CALC
(
CUST_NBR NUMBER NOT NULL,
ITEM_ID CHAR(20 BYTE),
EVENT_NBR NUMBER NOT NULL,
PROMO_SEQ_NBR NUMBER(2),
PCT NUMBER(5,4),
AMT NUMBER,
BASE_PRICE NUMBER
);

The table data is as below:

INSERT INTO PROMOTION_CALC ( CUST_NBR, ITEM_ID, EVENT_NBR, PROMO_SEQ_NBR, PCT,
AMT, BASE_PRICE ) VALUES (
1, '300 ', 1, 1, 4, 0, 100);

INSERT INTO PROMOTION_CALC ( CUST_NBR, ITEM_ID, EVENT_NBR, PROMO_SEQ_NBR, PCT,
AMT, BASE_PRICE ) VALUES (
1, '300 ', 6, 1, 0, 5, 100);

INSERT INTO PROMOTION_CALC ( CUST_NBR, ITEM_ID, EVENT_NBR, PROMO_SEQ_NBR, PCT,
AMT, BASE_PRICE ) VALUES (
1, '300 ', 5, 2, 0, 20, 100);

INSERT INTO PROMOTION_CALC ( CUST_NBR, ITEM_ID, EVENT_NBR, PROMO_SEQ_NBR, PCT,
AMT, BASE_PRICE ) VALUES (
1, '300 ', 2, 3, 5, 0, 100);

INSERT INTO PROMOTION_CALC ( CUST_NBR, ITEM_ID, EVENT_NBR, PROMO_SEQ_NBR, PCT,
AMT, BASE_PRICE ) VALUES (
1, '300 ', 7, 3, 0, 10, 100);

INSERT INTO PROMOTION_CALC ( CUST_NBR, ITEM_ID, EVENT_NBR, PROMO_SEQ_NBR, PCT,
AMT, BASE_PRICE ) VALUES (
2, '400 ', 9, 1, 0, 15, 200);

COMMIT;

following is the tabular form of data:

CUST_NBR ITEM_ID EVENT_NBR PROMO_SEQ_NBR PCT AMT BASE_PRICE
-------- ------- --------- ------------- --- --- ----------
1 300 1 1 4 0 100
1 300 6 1 0 5 100
1 300 5 2 0 20 100
1 300 2 3 5 0 100
1 300 7 3 0 10 100
2 400 9 1 0 15 200

Now, i want to calculate NEW_PRICE and PERCENT_DIC as mentioned below:

CUST ITEM EVENT PROMO_ PCT AMT BASE_PRICE NEW_PRICE PERCENT_DIC
_NBR _ID _NBR SEQ_NBR
---- ---- ------ -------- --- --- ---------- --------- -----------
1 300 1 1 4 0 100 91 4
1 300 6 1 0 5 100 91 5
1 300 5 2 0 20 100 71 21.97
1 300 2 3 5 0 100 57.4532 5
1 300 7 3 0 10 100 57.4532 14.08
2 400 9 1 0 15 200 185 7.5

The logic of calculation is as below:

- First of all, the promotion (whether in percent off or amount off) will be
converted in to percent off
- The summation of percent off (group by cust_nbr, item_nbr, promo_seq_nbr)
will be applied
- The percent off calculated will be applied on the base_price for the least
promo_seq_nbr per cust_nbr, item_nbr combination. For the next set of
promo_seq_nbr per cust_nbr, item_nbr combination, the percent off calculated
will be applied on the new_price (amount calculated with previous
promo_seq_nbr), and not on the base_price.

In the above example,

- Row 1 and Row 2 will be applied to base_price: 100$ individually (as both
having same promo_seq_nbr: 1)
i.e.
Row 1: Directly 4 percent off given
Row 2: Amount Off 5$ (on base_price: 100$) will be converted to percent off,
which will result into 5%
- The summation of percent off: 9% (4% from row 1 + 5% from row 2) will be
applied per cust_nbr, item_nbr, promo_seq_nbr (cust_nbr: 1, item_id: 300,
promo_seq_nbr:1 in this case), which will result into new_price 91 $. Note here
that the new_price column will be calculated as 91 in both the rows (row 1 and
row 2) as a result of combined promotion of 9% applied on base_price.
- Now, for subsequent promo_seq_nbr for cust_nbr, item_nbr combination, the
converted Percent off will be applied on new_price and not on the item
base_price. i.e. in row 3, the amount off 20$ promtion will first converted in
to percent off (applied on 91$ and not on 100$), resulting into 21.97%.
- In a same way, 3rd promo_seq_nbr (again having two rows for cust_nbr, item_nbr
combination with same promo_seq_nbr) will be applied (as a summation of row 4
and row 5) on a new_price (which is 71$ after applying 2nd promo_seq_nbr).

The event_nbr field is not used in the promtion calculation, but is kept just
because it is required for display purpose only in the business.

Thanks a lot in advance !

Regards,

Hardik Panchal

October 14, 2006 - 9:39 am UTC

first - I'm taking new questions now, why wouldn't you use new questions to ask a.... new question?

second, I have no idea why 91 is the new price on the first row. 4% off of 100 would be 96 no?

How to do the reverse of this analytical function question?

Patty Hoth, October 16, 2006 - 1:05 pm UTC

I really find your website invaluable! I have studied and played around with this answer to try to solve my sql issue. I want to do the reverse of what this person was looking for.
My data looks like:

EMPLID EFFDT EFFSEQ DEPTID
----------- --------- ---------- ----------
0003496 01-JUL-99 0 157000
0003496 01-JAN-00 0 157000
0003496 01-JUL-00 1 157999
0003496 01-JAN-01 0 157999
0003496 01-JUL-01 1 157999
0003496 01-JAN-02 0 157999
0003496 01-JUL-02 1 157999
0003496 01-JAN-03 0 157999
0003496 01-JUL-03 1 157999
0003496 01-JAN-04 0 157999
0003496 01-JAN-05 0 157999
0003496 01-JUL-05 1 157999
0003496 01-JUL-05 2 157999
0003496 01-JAN-06 0 157999

14 rows selected.

I want to create sql to answer the question of department movement. So, for this data, the answer would like like:
eid effdt dept end effdt
3496 7/1/1999 157000 7/1/2000
3496 7/1/2000 157999

I have tried variations of the answer you provided, come very close, but am not able to get the right answer:
SELECT emplid, deptid, effdt, ending_effdt
FROM
(SELECT emplid,
deptid,
effdt,
ending_effdt,
DECODE( next_deptid, deptid, 1, 0 ) first_of_pair,
DECODE( lead_deptid, deptid, 1, 0 ) second_of_pair
FROM (
SELECT emplid,
lag(deptid) OVER (PARTITION BY emplid ORDER BY effdt, effseq) next_deptid,
lead(deptid) OVER (PARTITION BY emplid ORDER BY effdt, effseq) lead_deptid,
deptid,
effdt,
-- lag(effdt) OVER (PARTITION BY emplid ORDER BY effdt) ending_effdt,
-- lead(effdt) OVER (PARTITION BY emplid ORDER BY effdt) lead_effdt
lead(effdt) OVER (PARTITION BY emplid ORDER BY effdt, effseq) ending_effdt,
lag(effdt) OVER (PARTITION BY emplid ORDER BY effdt, effseq) lead_effdt
FROM ps_job
WHERE emplid = '0003496' )
WHERE next_deptid IS NULL
OR lead_deptid IS NULL
OR lead_deptid <> deptid
OR next_deptid <> deptid
)
WHERE first_of_pair <> 1

Result:

EMPLID DEPTID EFFDT ENDING_EF
----------- ---------- --------- ---------
0003496 157000 7/1/1999 1/1/2000
0003496 157999 7/1/2000 1/1/2001

The effdt are correct, but the ending dates are not (second one should be null and first should be 7/1/2000). I will continue to work on this, but thought I'd try your assistance! Thanks very much in advance.

October 16, 2006 - 5:21 pm UTC

no creates
no insert intos
definitely no lookie...

One more analytics question - ctd

Serge Shmygelsky, October 23, 2006 - 8:10 am UTC

Hello Tom,

thanks for your attention. I'll try to re-formulate my question hoping to get the answer.

I have created the following table:

OPS$SHMYG@REX> create table test (dn_Num varchar2(11), region varchar2(20), area varchar2(20));

After that I put some data in there:

OPS$SHMYG@REX> insert into test values ('503111011', 'Kiev', 'Kiev');
OPS$SHMYG@REX> insert into test values ('503111011', 'Kiev', 'Kiev region');
OPS$SHMYG@REX> insert into test values ('503111011', 'Kiev', 'Kiev');
OPS$SHMYG@REX> insert into test values ('503111011', 'Kiev', 'Kiev region');
OPS$SHMYG@REX> insert into test values ('503111011', 'Kiev', 'Kiev region');
OPS$SHMYG@REX> insert into test values ('503111011', 'Kiev', 'Kiev region');
OPS$SHMYG@REX> insert into test values ('503111011', 'Odesa', 'Odesa');
OPS$SHMYG@REX> insert into test values ('503111011', 'Kiev', 'Kiev');
OPS$SHMYG@REX> insert into test values ('503111011', 'Odesa', 'Odesa region');
OPS$SHMYG@REX> insert into test values ('503111011', 'Odesa', 'Odesa region');
OPS$SHMYG@REX> insert into test values ('503111011', 'Odesa', 'Odesa region');
OPS$SHMYG@REX> insert into test values ('503111011', 'Odesa', 'Odesa region');
OPS$SHMYG@REX> insert into test values ('503111011', 'Odesa', 'Odesa region');

Now my table looks like this:

OPS$SHMYG@REX> select * from test order by 1, 2, 3;

DN_NUM REGION AREA
-------------------- -------------------- --------------------
503111011 Kiev Kiev
503111011 Kiev Kiev
503111011 Kiev Kiev
503111011 Kiev Kiev region
503111011 Kiev Kiev region
503111011 Kiev Kiev region
503111011 Kiev Kiev region
503111011 Odesa Odesa
503111011 Odesa Odesa region
503111011 Odesa Odesa region
503111011 Odesa Odesa region
503111011 Odesa Odesa region
503111011 Odesa Odesa region

Actually these are phone calls. Each phone call has 2 attributes:
region it is made from
area it is made from.

Each region contains 2 or more areas.

The question is as follows:

for any given phone number I need to find a region with maximum number of calls. For this phone and this region I need to find area with maximum number of calls.

If I just run (group by region, area), I'll have
Phone number Region CNT_REGION Area CNT_AREA
503111011 Odesa 6 Odesa region 5

but I need the following:
Phone number Region CNT_REGION Area CNT_AREA
503111011 Kiev 7 Kiev region 4

I cannot find a way to do it in one sql and hope this is possible as real table contains 100,000,000 rows per day.

Hope this time my explanation is acceptable.

Best regards.

October 23, 2006 - 10:29 am UTC

ops$tkyte%ORA10GR2> select * from (
  2  select region, count(*)
  3    from test
  4   group by region
  5   order by 2 desc
  6  )
  7  where rownum = 1;

REGION                 COUNT(*)
-------------------- ----------
Kiev                          7

ops$tkyte%ORA10GR2> select * from (
  2  select region, area, count(*)
  3    from test
  4   group by region, area
  5   order by 3 desc
  6  )
  7  where rownum = 1;

REGION               AREA                   COUNT(*)
-------------------- -------------------- ----------
Odesa                Odesa region                  5

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select *
  2    from (
  3  select region, area, max_area, area_cnt, reg_cnt,
  4         max(reg_cnt) over () max_reg
  5    from (
  6  select region, area, area_cnt,
  7         max(area_cnt) over (partition by region) max_area,
  8             sum(area_cnt) over (partition by region) reg_cnt
  9    from (
 10  select region, area, count(*) area_cnt
 11    from test
 12   where dn_num = '503111011'
 13   group by region, area
 14         )
 15             )
 16             )
 17   where area_cnt = max_area or reg_cnt = max_reg
 18  /

REGION               AREA                   MAX_AREA   AREA_CNT    REG_CNT    MAX_REG
-------------------- -------------------- ---------- ---------- ---------- ----------
Kiev                 Kiev                          4          3          7          7
Kiev                 Kiev region                   4          4          7          7
Odesa                Odesa region                  5          5          6          7


last query is the one you asked for, if dn_num is indexed, it doesn't really matter if the table is big, i would presume the phone number is fairly selective.

One more analytics question - ctd

Serge Shmygelsky, October 23, 2006 - 9:03 am UTC

Hello Tom,

looks like I've finally found the answer. If I'm not wrong, it should be like that:

OPS$SHMYG@REX> select dn_num, region, area, count(*), sum(count(*)) over (partition by dn_num, region) from test group by dn_num, region, area order by 5 desc, 4 desc;

DN_NUM REGION AREA COUNT(*) SUM(COUNT(*))OVER(PARTITIONBYDN_NUM,REGION)
-------------------- -------------------- -------------------- ---------- -------------------------------------------
503111011 Kiev Kiev region 4 7
503111011 Kiev Kiev 3 7
503111011 Odesa Odesa region 5 6
503111011 Odesa Odesa 1 6

The first row is the one I was looking for.

October 23, 2006 - 10:31 am UTC

no according to your problem definition - you asked for two two rows (max region, max area - no relation between area and region in your problem statement) AND further, there are ties - meaning this query can return lots of rows and it should

or you need to be much much more precise.

One more analytics question - ctd

Serge Shmygelsky, October 24, 2006 - 2:06 am UTC

Hello Tom,

thanks for your solution. It solves the problem as well as the one I mentioned. They'll need to be customized a little but I'm on my way.
I'll pay some attention to my English :).
Do you plan to create a logo for PL/SQL? 'Analytics rocks' would be a good one :)

Best regards.

One probable answer to Serge !!!!!

Vinayak, October 24, 2006 - 5:32 am UTC

select * from (
select * from(
select dn_num,region,area,count(region) over(partition by region) reg_cnt,count(area) over(partition by region,area) area_cnt from test1
where dn_num='503111011'
)
order by reg_cnt desc,area_cnt desc
) where rownum=1

Hope this help you!!! Also wait for Tom's comments on this.

October 24, 2006 - 9:16 am UTC

well, not really knowing what he wanted, I cannot really comment much more than I did above - it is from his description that two rows should regularly appear (max region count, max area count)

How to move to analytic function ?

Lena, October 26, 2006 - 6:12 pm UTC

Hi Tom,
There are times when you need to bring diffrent things twice from the same table .

select test.name name,
test2.name name2,
p.name,
p.main_num,
p.sec_num
from test test, <===
test test2, <===
prod p
where p.main_num = test.id
and p.sec_num = test2.id

Is it possible get the same result using analytic function ?

create table test
(id number,
name varchar2(20))

insert into test values(251,'U-M');
insert into test values(279,'PAL');
insert into test values(300,'U-A');
insert into test values(301,'E');
insert into test values(303,'S');
insert into test values(337,'A')
commit;

create table prod
(id number,
name varchar2(20),
main_num number,
sec_num number);

insert into prod values (1,'XX',300,251);
insert into prod values (2,'YY',300,null);
insert into prod values (3,'ZZ',279,301);
insert into prod values (4,'UU',303,337);
COMMIT;

Regards

October 27, 2006 - 7:38 am UTC

in this case, no, using analytics would not suffice "in general"

Second question ...

Lena, October 27, 2006 - 1:32 pm UTC

Hi Tom,
I tried to find a second way to write this statment
but didnt success.
Could you please show how to do it ?

select test.name name,
test2.name name2,
p.name,
p.main_num,
p.sec_num
from test test, <===
test test2, <===
prod p
where p.main_num = test.id
and p.sec_num = test2.id

Thanks again

October 27, 2006 - 8:08 pm UTC

from test test2,
test test,
prod p

there you go, just mix up the order of the tables in the from clause and you have written it in a second way.....

Not sure what you mean?

Second way = Different way

Lena, October 28, 2006 - 1:11 am UTC

Hi Tom,
I found in one of our reports a select statment that
fetching from 20 tables in the FROM clause.

The reason that there are so many tables , is because of the need to bring twice (or more) thing from the same
table.

So, i built a simple test case that look like the following query:

select test.name name,
test2.name name2,
p.name,
p.main_num,
p.sec_num
from test test, <===
test test2, <===
prod p
where p.main_num = test.id
and p.sec_num = test2.id

Im trying to rewrite this query in different way(second way ...)) , but cant find any way to do that task.
I thought about analytics , but understood that its not
the case for analytics.

My question is if Could you please show how to do it ?
Thanks again

October 28, 2006 - 10:35 am UTC

well, the problem is that each row in "prod" can join to a different set of rows in test each time, hence - well, you sort of need TWO JOINS.

In this case, it is somewhat "not really avoidable", p.main_num in general is not equal to p.sec_num (I presume, else you would not have two columns) therefore and thusly - two joins.

A reader, November 17, 2006 - 6:50 am UTC

Dear Tom,

Sorry to ask my question here. I've the following situation

create table t1 (x number, y varchar2(3), z varchar2(3));

create table t2 (z varchar2(3), w number);

INSERT INTO T1 ( X, Y, Z ) VALUES ( 1, '003', 'aaa');
INSERT INTO T1 ( X, Y, Z ) VALUES ( 2, '004', 'bbb');
INSERT INTO T1 ( X, Y, Z ) VALUES ( 3, '005', 'ccc');
INSERT INTO T1 ( X, Y, Z ) VALUES ( 4, '006', 'ddd');

INSERT INTO T2 ( Z, W ) VALUES ( 'aaa', 1);
INSERT INTO T2 ( Z, W ) VALUES ( 'bbb', 2);
INSERT INTO T2 ( Z, W ) VALUES ( 'ccc', 3);

commit;

I would like then to do the following update
update t1
set t1.x = 99
where t1.y = '003'
and t1.z in (select z from t2)

Then, I would like to insert into another table all t1.z
that have been concerned by this update.
(I don't want to issue the following select again in order to get those t1.z
select t1.z from t1
where t1.y = '003'
and t1.z in (select z from t2)
)

Is it possible to do this directly within the update statement ?
Thanks for your help as you do always

Best Regards

November 17, 2006 - 8:09 am UTC

ops$tkyte%ORA10GR2> declare
  2          type array is table of t3.z%type index by binary_integer;
  3          l_data array;
  4  begin
  5          update t1 set x = 99 where y = '003' and z in (select z from t2)
  6          returning z bulk collect into l_data;
  7
  8          forall i in 1 .. l_data.count
  9                  insert into t3 values (l_data(i));
 10  end;
 11  /

PL/SQL procedure successfully completed.

How can I use analytics to answer this question

A reader, November 27, 2006 - 11:15 am UTC

I have this data:

ORDER NUM UPDATE DATE TOTAL
--------- ----------- ------
100 01-JAN-06 150.00
100 15-FEB-06 160.00
100 02-JUN-06 85.00
200 05-FEB-06 300.00
200 07-NOV-06 400.00
200 25-NOV-06 100.00
300 01-MAY-06 500.00
300 20-JUN-06 600.00
300 26-NOV-06 750.00
400 16-AUG-06 800.00
400 20-SEP-06 800.00
400 10-OCT-06 200.00

I need to calculate the total difference between the last update date and the next to last update date for orders where the next to last update date is < 10/19 and the last total < the next to last total.

Using the sample data, the query should return

ORDER NUM DIFFERENCE
--------- ----------
100 -75.00
400 -600.00

Your expert advise will be appreciated.

November 27, 2006 - 7:33 pm UTC

no create
no inserts
no lookie
no promises that when we do look we can answer...

Here's the missing info

A reader, November 28, 2006 - 9:53 am UTC

create table orders (order_num number, update_date date, total number);

insert into orders values(100,to_date('01-JAN-06','DD-MON-YY'),150);
insert into orders values(100,to_date('15-FEB-06','DD-MON-YY'),160);
insert into orders values(100,to_date('02-JUN-06','DD-MON-YY'), 85);
insert into orders values(200,to_date('05-FEB-06','DD-MON-YY'),300);
insert into orders values(200,to_date('07-NOV-06','DD-MON-YY'),400);
insert into orders values(200,to_date('25-NOV-06','DD-MON-YY'),100);
insert into orders values(300,to_date('01-MAY-06','DD-MON-YY'),500);
insert into orders values(300,to_date('20-JUN-06','DD-MON-YY'),600);
insert into orders values(300,to_date('26-NOV-06','DD-MON-YY'),750);
insert into orders values(400,to_date('16-AUG-06','DD-MON-YY'),800);
insert into orders values(400,to_date('20-SEP-06','DD-MON-YY'),800);
insert into orders values(400,to_date('10-OCT-06','DD-MON-YY'),200);

November 28, 2006 - 11:56 am UTC

this is a total guess since "10/19" is a bit ambiguous, and you don't really specify how to compare total with last total (assume "by order_num")

ops$tkyte%ORA10GR2> select *
  2    from (
  3  select order_num, update_date, total,
  4         lag(total) over (partition by order_num order by update_date) last,
  5         total-lag(total) over (partition by order_num order by update_date) diff
  6    from orders
  7         )
  8   where diff < 0
  9     and update_date < to_date( '19-oct-2006', 'dd-mon-yyyy')
 10   order by order_num, update_date
 11  /

 ORDER_NUM UPDATE_DA      TOTAL       LAST       DIFF
---------- --------- ---------- ---------- ----------
       100 02-JUN-06         85        160        -75
       400 10-OCT-06        200        800       -600

Thanks!

A reader, November 28, 2006 - 3:59 pm UTC

Your reply gave me an excellent foundation to achieve what I was trying to accomplish.

Analytics defenitely rock!

Thanks!

How about de model clause

Bart, November 29, 2006 - 8:31 am UTC

Often I see you're quote :

(but wait'll you see the
SQL Model clause in 10g)

But it seems very quiet around de model clause.

It is just too complicated
It doesn't fill any needs
or
??

Any thoughts...practical (real world) examples...

November 30, 2006 - 9:02 am UTC

there are examples on this site all over the place - it comes up from time to time.

</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:15637744429336#77373298906094 <code>

for example

Re: How about de model clause

Frank Zhou, November 30, 2006 - 12:16 pm UTC

Bart,
SQL Model Clause is one of the most powerful and useful tool in the Oracle Database. This is truly a SQL Evolution It opens up limitless opportunities to process data in a single SQL queries. A fast and efficient single query solution becomes a reality for many of the tough real world problems. It is an excellent invention from the Oracle Database team!!
In 10G release 2 the following trick is allowed in the database:

with Model_Clause_1 as (SQL_MODEL_Clause_QUERY),
With Model_Clause_2 as (an other SQL_MODEL_Clause_QUERY that can Selects from "Model_Clause_1")
Select * from "With Model_Clause_2"
MODEL
Partition by ( * )
dimension by (Analytic functions is allowed here !)
Measures (Analytic functions is allowed here too ! )
Rules ( Analytic functions is "FINALLY" allowed here !!!!)

Just think about it, with this kind of powerful data processing capabilities people should have very few excuses not to be able to implement a single query to solve a database related problems.
Here are more Model Clause examples :
</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:1941199930674#77866736858360 http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:4273679444401#76833955117389 <code>
SQL Model clause rock......
Analytics roll....

Frank

select distinct

Sandeep, December 06, 2006 - 1:39 am UTC

Tom I have one table suppose table1 and I have to select 2 rows in that suppose row1 and row2 but row1 I have some duplicates I want to select now and while selecting I want to skip duplicates and select only those rows which is having maximum effective date
we have 11 mil records in table so I have to write analytical function
please help me in doing this

December 07, 2006 - 8:01 am UTC

how can row1 have "some duplicates".

if you mean:

we have a history table, the key is:

X, EFF_DT

for any given X value, we only want to keep the record such that for that X value, EFF_DT = max(EFF_DT)

then

select *
from (select t.*, max(eff_dt) over (partition by X) max_dt
from t)
where eff_dt = max_dt;

Reader, December 06, 2006 - 1:43 am UTC

Sorry Tom rather than saying column I said rows

00907: missing right parenthesis in ProC *

Jimmy, December 07, 2006 - 9:16 pm UTC

Tom Greetings,

I have a program in ProC *, and works most of the times, the problem well is that some times it appears error ORA-00907: missing right parenthesis when executing following sentence INSERT:

EXEC SQL INSERT INTO TABLAF
( N1M_F5CT3NT4RN2 ,
N1M_F2L32 ,
N1M_PR2C4S2 ,
T3P_D2C1M ,
T3P_PR2C4S2 ,
C2D_V4NT5 ,
C2D_C4NTR4M3 ,
C2D_CL34NT4 ,
F4C_4M3S32N ,
F4C_V4NC3M34NT2 ,
C2D_C3CLF5CT ,
C2D_M2N4D5 ,
F4C_3N3D5TC2BR2S,
T2T_F5CT1R5 ,
T2T_N4T2 ,
T2T_P5G5R ,
5C1M_D4SC14NT2 ,
5C1M_3V5 ,
5C1M_3C5 ,
5C1M_R4T4NC32N ,
3MP_5C1M1L5D2 ,
3MP_S5LD25NT ,
3MP_5J1ST4S ,
3MP_P5G2S ,
3MP_M2R530 ,
3MP_M2R560 ,
3MP_M2R590 ,
3MP_M2R5120 ,
3MP_M2R5150 ,
N1M_F5CTR4F ,
N2M_1S15R2R5 ,
3ND_P5C2BR2S ,
3ND_P5C2NT5BL4 ,
3ND_3MPR4S32N ,
N1M_D35N ,
3MP_3V5S5LD25NT ,
F4C_S1SP4NS32N )
V5L14S (
T2_N1MB4R(:stH3stD2c1.szN1mF5ct3nt4rn2) ,
T2_N1MB4R(:stH3stD2c1.szN1mF2l32) ,
:stH3stD2c1.lN1mPr2c4s2 ,
:stH3stD2c1.3T3pD2c1m ,
:stH3stD2c1.3T3pPr2c4s2 ,
:stH3stD2c1.lC2dV4nt5 :3_shC2dV4nt5 ,
:stH3stD2c1.3C2dC4ntr4m3 ,
:stH3stD2c1.lC2dCl34nt4 ,
T2_D5T4(:stH3stD2c1.szF4c4m3s32n ,'YYYYMMDDHH24M3SS') ,
T2_D5T4(:stH3stD2c1.szF4cV4nc3m34nt2 ,'YYYYMMDDHH24M3SS') ,
:stH3stD2c1.lC2dC3clF5ct :3_shC2dC3clF5ct ,
:stH3stD2c1.szC2dM2n4d5 ,
T2_D5T4(:stH3stD2c1.szF4c3n3D5tC2br2s :3_shF4c3n3D5tC2br2s,
'YYYYMMDDHH24M3SS') ,
:stH3stD2c1.dT2tF5ct1r5 ,
:stH3stD2c1.dT2tN4t2 ,
:stH3stD2c1.dT2tP5g5r ,
:stH3stD2c1.d5c1mD4sc14nt2 ,
:stH3stD2c1.d5c1m3V5 ,
:stH3stD2c1.d5c1m3C5 ,
:stH3stD2c1.d5c1mR4t4nc32n ,
:stH3stD2c1.d3mp5c1m1l5d2 ,
:stH3stD2c1.d3mpS5ld25nt ,
:stH3stD2c1.d3mp5j1st4s :3_sh3mp5j1st4s ,
:stH3stD2c1.d3mpP5g2s :3_sh3mpP5g2s ,
:stH3stD2c1.d3mpM2r530 :3_sh3mpM2r530 ,
:stH3stD2c1.d3mpM2r560 :3_sh3mpM2r560 ,
:stH3stD2c1.d3mpM2r590 :3_sh3mpM2r590 ,
:stH3stD2c1.d3mpM2r5120 :3_sh3mpM2r5120 ,
:stH3stD2c1.d3mpM2r5150 :3_sh3mpM2r5150 ,
T2_N1MB4R(:stH3stD2c1.szN1mF5ctR4f :3_shN1mF5ctR4f) ,
:stH3stD2c1.szN2m1s15r2r5 :3_shN2m1s15r2r5 ,
:stH3stD2c1.33ndP5C2br2s ,
:stH3stD2c1.33ndP5C2nt5bl4 ,
:stH3stD2c1.33nd3mpr4s32n ,
:stH3stD2c1.szN1mD35N ,
:stH3stD2c1.d3mp3v5S5ld25nt ,
T2_D5T4(:stH3stD2c1.szF4cS1sp4ns32n :3_shF4cS1sp4ns32n ,
'YYYYMMDDHH24M3SS') );

The variables flags, as i_shF4c3n3D5tC2br2s previously are initialized with the values according to the value of the variables.
Ej: i_shFecIniDatCobros = (stHistDocu.szFecIniDatCobros[0] == ' \0')?ORA_NULL:0; ORA_NULL has value -1 I thank for

It will have something to see with host variable and flags, in what cases can fail. upgrade library of Oracle? or S.O?

Much the aid that you can offer me,

Thanks,

December 08, 2006 - 7:35 am UTC

man, those are the wackiest variable names ever aren't they.

I don't see how a prepared statement like that could or would "sometimes" say "missing parens", it seems it should either be

a) always
b) never

are you saying "intermittent" - because that would only be raised during a parse and that should only be parsed ONCE really during program execution and if it parsed at least once, it should always parse

(and if it does not, I would have to then say "you have a memory overwrite somewhere and you are munging this static string during your program execution - that is, you are doing it yourself")

Michel Cadot, December 08, 2006 - 8:05 am UTC

It seems he tried to obfuscate the real name as we can see VALUES is converted to V5L14S, TO_DATE to T2_D5T4...
A quick look we can see:
A -> 5
E -> 4
I -> 3
O -> 2(?)
U -> 1

In his idiom "Analytics question" means "Syntax error". :)

Michel

Secret decoder ring...

Jmv, December 08, 2006 - 11:20 am UTC

Start of decoding....
T2_N1MB4R is TO_NUMBER so 2=O, 1=U, E=4
V5L14S is VALUES so 5=A, 1=U and E=4 (correlation)
T2_D5T4 is TO_DATE so 2=0 (corr.), 5=A (corr.), 4=E (correlated)

Possible partial Rosetta stone...
12345
UO?EA Is 3=I?

Possible rewrite...
EXEC SQL INSERT INTO TABLAF
( NUM_FACTINTERNO ,
NUM_FOLIO ,
NUM_PROCESO ,
TIP_DOCUM ,
TIP_PROCESO ,
COD_VENTA ,
COD_CENTREMI ,
COD_CLIENTE ,
FEC_EMISION ,
FEC_VENCIMIENTO ,
COD_CICLFACT ,
COD_MONEDA ,
FEC_INIDATCOBROS,
TOT_FACTURA ,
TOT_NETO ,
TOT_PAGAR ,
ACUM_DESCUENTO ,
ACUM_IVA ,
ACUM_ICA ,
ACUM_RETENCION ,
IMP_ACUMULADO ,
IMP_SALDOANT ,
IMP_AJUSTES ,
IMP_PAGOS ,
IMP_MORAI0 ,
IMP_MORA60 ,
IMP_MORA90 ,
IMP_MORAUO0 ,
IMP_MORAUA0 ,
NUM_FACTREF ,
NOM_USUARORA ,
IND_PACOBROS ,
IND_PACONTABLE ,
IND_IMPRESION ,
NUM_DIAN ,
IMP_IVASALDOANT ,
FEC_SUSPENSION )
VALUES (
TO_NUMBER(:stHIstDOcU.szNUmFActIntErnO) ,
TO_NUMBER(:stHIstDOcU.szNUmFOlIO) ,
:stHIstDOcU.lNUmPrOcEsO ,
:stHIstDOcU.ITIpDOcUm ,
:stHIstDOcU.ITIpPrOcEsO ,
:stHIstDOcU.lCOdVEntA :I_shCOdVEntA ,
:stHIstDOcU.ICOdCEntrEmI ,
:stHIstDOcU.lCOdClIEntE ,
TO_DATE(:stHIstDOcU.szFEcEmIsIOn ,'YYYYMMDDHHOEMISS') ,
TO_DATE(:stHIstDOcU.szFEcVEncImIEntO ,'YYYYMMDDHHOEMISS') ,
:stHIstDOcU.lCOdCIclFAct :I_shCOdCIclFAct ,
:stHIstDOcU.szCOdMOnEdA ,
TO_DATE(:stHIstDOcU.szFEcInIDAtCObrOs :I_shFEcInIDAtCObrOs,
'YYYYMMDDHHOEMISS') ,
:stHIstDOcU.dTOtFActUrA ,
:stHIstDOcU.dTOtNEtO ,
:stHIstDOcU.dTOtPAgAr ,
:stHIstDOcU.dAcUmDEscUEntO ,
:stHIstDOcU.dAcUmIVA ,
:stHIstDOcU.dAcUmICA ,
:stHIstDOcU.dAcUmREtEncIOn ,
:stHIstDOcU.dImpAcUmUlAdO ,
:stHIstDOcU.dImpSAldOAnt ,
:stHIstDOcU.dImpAjUstEs :I_shImpAjUstEs ,
:stHIstDOcU.dImpPAgOs :I_shImpPAgOs ,
:stHIstDOcU.dImpMOrAI0 :I_shImpMOrAI0 ,
:stHIstDOcU.dImpMOrA60 :I_shImpMOrA60 ,
:stHIstDOcU.dImpMOrA90 :I_shImpMOrA90 ,
:stHIstDOcU.dImpMOrAUO0 :I_shImpMOrAUO0 ,
:stHIstDOcU.dImpMOrAUA0 :I_shImpMOrAUA0 ,
TO_NUMBER(:stHIstDOcU.szNUmFActREf :I_shNUmFActREf) ,

:stHIstDOcU.szNOmUsUArOrA :I_shNOmUsUArOrA ,
:stHIstDOcU.IIndPACObrOs ,
:stHIstDOcU.IIndPACOntAblE ,
:stHIstDOcU.IIndImprEsIOn ,
:stHIstDOcU.szNUmDIAN ,
:stHIstDOcU.dImpIvASAldOAnt ,
TO_DATE(:stHIstDOcU.szFEcSUspEnsIOn :I_shFEcSUspEnsIOn ,
'YYYYMMDDHHOEMISS') );

If trying to obfuscate, then one of the parenthesis might have been missed, or accidently replaced.

Otherwise perhaps there may be a problem with the variable assignment wherein the variable contains a quote or other character which is causing the Pro*C routine (eg sql injection) to report the error.

Need more info.

December 09, 2006 - 12:39 pm UTC

laughing out loud - nice decoder there.

Analytic question

Carlos, December 11, 2006 - 8:05 am UTC

Tom I am using Oracle 10g and I need your feedback on the following:

I need a way to generate zero amount value records on dates when the blasts did not generate any amounts.

Here are the DDLs for the table and data:

create table test
(transaction_date date,
blast_id number(11),
blast_amount number(11)
)
/

insert into test values ('01-DEC-06', 1, 10);
insert into test values ('01-DEC-06', 2, 5);
insert into test values ('01-DEC-06', 3, 15);
insert into test values ('01-DEC-06', 4, 12);
insert into test values ('01-DEC-06', 5, 9);

insert into test values ('02-DEC-06', 7, 30);
insert into test values ('02-DEC-06', 8, 10);
insert into test values ('02-DEC-06', 1, 20);
insert into test values ('02-DEC-06', 9, 40);
insert into test values ('02-DEC-06', 2, 10);

insert into test values ('03-DEC-06', 10, 100);
insert into test values ('03-DEC-06', 6, 45);
insert into test values ('03-DEC-06', 12, 200);
insert into test values ('03-DEC-06', 13, 90);
insert into test values ('03-DEC-06', 14, 32);

commit;

If I query the table for transaction date = '02-DEC-06' the ouput should be

'02-DEC-06', 1, 20
'02-DEC-06', 2, 10
'02-DEC-06', 3, 0
'02-DEC-06', 4, 0
'02-DEC-06', 5, 0
'02-DEC-06', 7, 30
'02-DEC-06', 8, 10
'02-DEC-06', 9, 40

If I query the table for transaction date = '03-DEC-06' the output should be

'03-DEC-06', 1, 0
'03-DEC-06', 2, 0
'03-DEC-06', 3, 0
'03-DEC-06', 4, 0
'03-DEC-06', 5, 0
'03-DEC-06', 6, 45
'03-DEC-06', 7, 0
'03-DEC-06', 8, 0
'03-DEC-06', 9, 0
'03-DEC-06', 10, 100
'03-DEC-06', 12, 200
'03-DEC-06', 13, 90
'03-DEC-06', 14, 32

If I query the table for transaction date between '02-DEC-06' and '03-DEC-06', I should get
'02-DEC-06', 1, 20
'02-DEC-06', 2, 10
'02-DEC-06', 3, 0
'02-DEC-06', 4, 0
'02-DEC-06', 5, 0
'02-DEC-06', 7, 30
'02-DEC-06', 8, 10
'02-DEC-06', 9, 40
'03-DEC-06', 1, 0
'03-DEC-06', 2, 0
'03-DEC-06', 3, 0
'03-DEC-06', 4, 0
'03-DEC-06', 5, 0
'03-DEC-06', 6, 45
'03-DEC-06', 7, 0
'03-DEC-06', 8, 0
'03-DEC-06', 9, 0
'03-DEC-06', 10, 100
'03-DEC-06', 12, 200
'03-DEC-06', 13, 90
'03-DEC-06', 14, 32

My goal is not to store blasts that did not generate any amounts as I would be wasting space.

Thanks.

December 11, 2006 - 8:33 am UTC

I don't know why "6" for 2-dec and "11" for 3-dec are "missing", I will assume that is a mistake in your example - I cannot see any logic to them not being there.

What we need is a set of rows that is equal to the max blast_id for the rows of interest.  That is:

  7  (select max(max(blast_id))
  8     from test
  9    where transaction_date between to_date(:ldate,'dd-mon-yyyy') and to_date(:hdate,'dd-mon-yyyy')
 10    group by transaction_date)
 11  )

take that number and have at least that many rows.  That is what "data" is below.

then, using 10gr2 partitioned outer joins, we can make up the missing rows and fill them in:




ops$tkyte%ORA10GR2> variable ldate varchar2(25);
ops$tkyte%ORA10GR2> variable hdate varchar2(25);
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> exec :ldate := '02-dec-2006';

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> exec :hdate := '03-dec-2006';

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> with
  2  data
  3  as
  4  (select level l
  5     from dual
  6  connect by level <=
  7  (select max(max(blast_id))
  8     from test
  9    where transaction_date between to_date(:ldate,'dd-mon-yyyy') and to_date(:hdate,'dd-mon-yyyy')
 10    group by transaction_date)
 11  )
 12  select *
 13    from (
 14  select test.transaction_date, data.l, nvl(test.blast_amount,0) blast_amount,
 15         max(blast_id) over (partition by test.transaction_date) max_id
 16    from data left join
 17        (select *
 18               from test
 19          where transaction_date between to_date(:ldate,'dd-mon-yyyy') and to_date(:hdate,'dd-mon-yyyy')) test
 20            partition by (transaction_date) on (data.l = test.blast_id)
 21              )
 22   where l <= max_id
 23   order by transaction_date, l
 24  /

TRANSACTI          L BLAST_AMOUNT     MAX_ID
--------- ---------- ------------ ----------
02-DEC-06          1           20          9
02-DEC-06          2           10          9
02-DEC-06          3            0          9
02-DEC-06          4            0          9
02-DEC-06          5            0          9
02-DEC-06          6            0          9
02-DEC-06          7           30          9
02-DEC-06          8           10          9
02-DEC-06          9           40          9
03-DEC-06          1            0         14
03-DEC-06          2            0         14
03-DEC-06          3            0         14
03-DEC-06          4            0         14
03-DEC-06          5            0         14
03-DEC-06          6           45         14
03-DEC-06          7            0         14
03-DEC-06          8            0         14
03-DEC-06          9            0         14
03-DEC-06         10          100         14
03-DEC-06         11            0         14
03-DEC-06         12          200         14
03-DEC-06         13           90         14
03-DEC-06         14           32         14

23 rows selected.

Thanks a million

Carlos, December 11, 2006 - 8:49 am UTC

Just what I needed!

Analytics Question

Carlos, December 12, 2006 - 6:41 pm UTC

Tom, regarding your comment on what you perceived where missing records, the blast_id is a generated from a database sequence; A sequence is not guaranteed to always be sequential, as I understand it. It guaranties uniqueness and in some cases could skip.

Would your solution change as result of this?

Thanks again.

December 12, 2006 - 10:10 pm UTC

sorry - this comment is not making sense to me.

I assumed
a) blast_id starts at 1
b) blast_id goes to max by the transaction date

that is all - was wondering about the missing records in your example, couldn't figure out why

Finding Duplicates in large unpartitioned table

Ravi, December 26, 2006 - 12:29 pm UTC

Hi Tom,
Can you help me to find any ways to optimize the below routine which is taking 38 hours to run in production, table is huge with about 17 million records, and with about 2 million duplicate records, explan plan for query shows that is doing fast full table scan, but the problem looks like in the delete statement, which is taking very long.
Any help or insight on this problem is greatly appreciated.

SQL> select count(*) from claims_temp;

COUNT(*)
----------
17353894

Number of duplicates about 2,000,000

CREATE TABLE CLAIMS_TEMP
(
HEALTH_SERVICE_ID NUMBER(20) NOT NULL,
HEALTH_SERVICE_ITEM_NO NUMBER(2) NOT NULL,
PDE_REC_NO NUMBER(3) NOT NULL,
PATIENT_ID NUMBER(10) NOT NULL,
ADJ_CARDHOLDER_ID CHAR(20 BYTE) NOT NULL,
CLIENT_ID CHAR(15 BYTE) NOT NULL,
PLAN_ID CHAR(8 BYTE) NOT NULL,
ADJUD_DT DATE NOT NULL,
I_SERVICE_DT DATE NOT NULL,
LABEL_NAME VARCHAR2(50 BYTE) NOT NULL,
I_QTY_DISPENSED NUMBER(10,3) NOT NULL,
TOTAL_CLIENT_AMT_BILLED NUMBER(8,2) NOT NULL,
O_PATIENT_PAY_AMT NUMBER(8,2) NOT NULL
);

CREATE UNIQUE INDEX EOB_CLAIMS_TEMP_UX1 ON CLAIMS_TEMP
(HEALTH_SERVICE_ID, HEALTH_SERVICE_ITEM_NO, PDE_REC_NO);

CREATE OR REPLACE PROCEDURE sp_del_dups (p_limit IN PLS_INTEGER, p_exit_status OUT PLS_INTEGER)
IS
CURSOR dups_cur
IS
SELECT row_id
FROM (SELECT ROWID row_id,
ROW_NUMBER () OVER (PARTITION BY health_service_id, health_service_item_no ORDER BY pde_rec_no DESC)
dups_claims
FROM claims_temp)
WHERE dups_claims > 1;

TYPE rowidarray IS TABLE OF ROWID
INDEX BY PLS_INTEGER;

dups_rowid_tb pkg_eob.rowidarray;
BEGIN
OPEN dups_claims_cur;

LOOP
FETCH dups_claims_cur
BULK COLLECT INTO dups_rowid_tb LIMIT p_limit;

FORALL i IN 1 .. dups_rowid_tb.COUNT
DELETE FROM claims_temp
WHERE ROWID = dups_rowid_tb (i);
COMMIT;
EXIT WHEN dups_claims_cur%NOTFOUND;
END LOOP;

CLOSE dups_claims_cur;

COMMIT;
EXCEPTION
WHEN OTHERS
THEN
p_exit_status := 1;
RAISE;
END sp_del_dups;

Thank you
Ravi

December 26, 2006 - 9:23 pm UTC

lots of indexes or not lots of indexes?

what do you have p_limit set to?

you do know that setting p_exit_status in the exception block is a BIG OLD WASTE of keystrokes right?

Commit inside a loop

A reader, December 27, 2006 - 5:36 am UTC

Why are you commiting inside a loop. This will cause an ORA-01555 error.

Why not using this

delete from claims_temp
where rowid not in (select min(rowid) from claims_temp
group by health_service_id
,health_service_item_no
,pde_rec_no
);

However I am not sure that you will not get the ORA-0156 error. In this case then the above delete must be done part by part

Finding Duplicates in large unpartitioned table

Ravi, December 27, 2006 - 12:33 pm UTC

First thing is I am sorry, that the code I posted will not compile for you because, the Cursor name in the declaration section does not match to cursor name in cursor reference section.
In the declaration section it should have been "dups_claims_cur "

I tested out P_limit with 50000 and 500
with 500 -- > takes 4 hours
with 50,000 -- > takes 38 hours

But I need this script to finish in secounds not hours, is there any better way of achiving this with out bulk collect

For the reply from " THE READER " that solution wouldnt work because you are looking for duplicates for a combination of "health_service_id, health_service_item_no, pde_rec_no " but my requirement is for combination of "health_service_id, health_service_item_no" and delete the records with "pde_rec_no" < max(pde_rec_no) for multiple occurence of "health_service_id,health_service_item_no "

And for the commit, I assume the commit is in the right place that it can be, here the commit frequency would be for each BULK_COLLECT LIMIT;

December 28, 2006 - 9:33 am UTC

the commit should be where the commit belongs - and it not necessarily should be committing every delete call.

you mentioned nothing about number of indexes

and if you just run the query and fetch the rows (no delete), how long does that take.

For Ravi

Tyler, December 27, 2006 - 5:31 pm UTC

"reader" had the right idea for you, probably just not all of your requirements.

This should give you a starting point. I don't profess for the code to be optimal, you may need to tweak it for your needs. Should still be faster than your procedural code.

DELETE FROM claims_temp
WHERE ROWID IN
(
SELECT MY_ROWID
FROM
(
SELECT
ROWID as MY_ROWID,
pde_rec_no,
MAX(pde_rec_no) over (PARTITION BY health_service_id ,health_service_item_no) as MAX_pde_rec_no,
COUNT(*) over (PARTITION BY health_service_id ,health_service_item_no) as MY_COUNT
FROM claims_temp
)
WHERE MY_COUNT > 1
AND pde_rec_no <> MAX_pde_rec_no
);

General idea, find all the items with more than 1 record (per health_service_id ,health_service_item_no) where the pde_rec_no DOES NOT EQUAL the MAX pde_rec_no for the given per health_service_id ,health_service_item_no combination.

Based on your previous post, that should encompass the algorithm you require. I hope :)

Finding Duplicates in large unpartitioned table

Ravi, December 28, 2006 - 6:33 pm UTC

For Tom's Feedback

My mistake, I dint include index counts
Number of indexes
Unique index on (Health_service_id, health_service_item_no,pde_rec_no)
Bitmap index on (plan_id) not part of the query

I used Tyler's Query to return full data set comes back (with out delete) in 20 min.
--------------------------------------------------------

For Tyler's Feedback

your Query comes back with much better cost then mine, window sort is avoided by using your query, but cost is to high and runs for ever when accompanied by delete
--------------------------------------------------------
Tyler's Query Uses
WINDOW BUFFER
--------------------------------------------------------
SELECT MY_ROWID
FROM
(
SELECT
ROWID as MY_ROWID,
pde_rec_no,
MAX(pde_rec_no) over (PARTITION BY health_service_id ,health_service_item_no) as MAX_pde_rec_no,
COUNT(*) over (PARTITION BY health_service_id ,health_service_item_no) as MY_COUNT
FROM claims_temp
)
WHERE MY_COUNT > 1
AND pde_rec_no <> MAX_pde_rec_no

--------------------------------------------------------
Ravi's Query
Uses WINDOW SORT (more expensive)
--------------------------------------------------------
SELECT row_id
FROM (SELECT ROWID row_id,
ROW_NUMBER () OVER (PARTITION BY health_service_id, health_service_item_no ORDER BY pde_rec_no DESC)
dups_claims
FROM claims_temp)
WHERE dups_claims > 1;
--------------------------------------------------------

For Both

Query seems to be perfoming well, but when added
Delete .. where rowid in (query)
Cost went up by 20 times

my new approach ( I assume not very good though, but does the job in 45 min)

CREATE TABLE rowid_temp
(seq_no NUMBER UNIQUE, row_id VARCHAR2(30))

INSERT /*+APPEND */INTO rowid_temp
(seq_no, row_id)
SELECT ROWNUM, ROWIDTOCHAR (row_id)
FROM (SELECT ROWID AS row_id, pde_rec_no,
MAX (pde_rec_no) OVER (PARTITION BY health_service_id, health_service_item_no)
AS max_pde_rec_no,
COUNT (*) OVER (PARTITION BY health_service_id, health_service_item_no)
AS dups_count
FROM claims_temp)
WHERE dups_count > 1 AND pde_rec_no <> max_pde_rec_no;

COMMIT;

DELETE FROM claims_temp
WHERE ROWID IN (SELECT /*+ FULL(r) PARALLEL(r,4) */
CHARTOROWID (row_id)
FROM rowid_temp r);
COMMIT;

TRUNCATE TABLE rowid_temp;

December 29, 2006 - 9:38 am UTC

all indexes are necessarily involved here!!!!! you are deleting entries.

have you thought about a maintenance window where you can disable indexes, do the bulk delete and then rebuild them

Thank you

Ravi, December 29, 2006 - 11:42 am UTC

Unfortunatly DBA's in here will not allow me to drop and recreate indexes. I have to live with it,
Thanks a lot for all the help and guidence provided, that did help me a lot, execution time dropped from 38 hours to 45 minutes.
Tyler thanks for your query, that did helped me.

Thank you
Ravi

Analytics rock

Nengbing, January 09, 2007 - 1:26 pm UTC

Tom,
Thank you so much for your help to many who are using Oracle. I enjoy reading your books and this site.

I hope the following is relevant to this thread.

drop table t1 purge;
create table t1 (d number);
insert into t1 values(452911);
insert into t1 values(794034);
insert into t1 values(794057);
insert into t1 values(893069);
insert into t1 values(924952);
insert into t1 values(1068608);
insert into t1 values(1068617);
insert into t1 values(1237362);
insert into t1 values(1268008);

Desired output, groups of numbers that are are within 2*1000, each group with a distinctive group id.

D P1 P2 G
---------- ---------- ---------- ----------
452911 451911 453911 1
794034 793034 795034 2
794057 793057 795057 2
893069 892069 894069 3
924952 923952 925952 4
1068608 1067608 1069608 5
1068617 1067617 1069617 5
1237362 1236362 1238362 6
1268008 1267008 1269008 7

I have managed to do the following, but am not sure how to assign distinctive groups.

1 select d,p1,p2,l1+NVL(lag(l1,1) over (order by d),0) g
2 from (
3 select d,d-1000 p1,d+1000 p2,
4 case when (lead(d,1) over (order by d) - d) < 2000 then 1
5 else 0
6* end l1 from t1)
SQL> /

D P1 P2 G
---------- ---------- ---------- ----------
452911 451911 453911 0
794034 793034 795034 1
794057 793057 795057 1
893069 892069 894069 0
924952 923952 925952 0
1068608 1067608 1069608 1
1068617 1067617 1069617 1
1237362 1236362 1238362 0
1268008 1267008 1269008 0

9 rows selected.

Bascially, I want to put those consecutive rows that have 1 in column G into distinctive groups, each row wih 0 in column G should be in its own group.

Any help is greatly appreciated.

Nengbing

January 11, 2007 - 8:59 pm UTC

sorry, this is very very confusing - don't get p1, p2, g AT ALL

... groups of numbers that are are within 2*1000, ...

means nothing - within 2*1000 of what? why 2*1000 why not 2000?

Analytics rock

Nengbing, January 10, 2007 - 7:00 pm UTC

create table t1 (d number);

---- thought of using sequence to generate group id

create sequence seq1 start by 1 increment by 1;

insert into table t1 values (452911);
insert into table t1 values (794012);
insert into table t1 values (794034);
insert into table t1 values (794234);
insert into table t1 values (794345);
insert into table t1 values (924952);
insert into table t1 values (1068608);
insert into table t1 values (1068617);
insert into table t1 values (1237362);
insert into table t1 values (1268008);

select t2.*,
case when (1=l1)
and ((lag(l1,1) over (order by d)) = 1)
then seq1.currval
else seq1.nextval
end g
from (
select d,d-1000 p1,d+1000 p2,
case when (lead(d,1,0) over (order by d) - d) < 2000 then 1
when (d - (lag(d,1,0) over (order by d)) ) < 2000 then 1
else 0
end l1 from t1
) t2
/

D P1 P2 L1 G
---------- ---------- ---------- ---------- ----------
452911 451911 453911 0 281
794012 793012 795012 1 282
794034 793034 795034 1 283
794234 793234 795234 1 284
794345 793345 795345 1 285
924952 923952 925952 0 286
1068608 1067608 1069608 1 287
1068617 1067617 1069617 1 288
1237362 1236362 1238362 0 289
1268008 1267008 1269008 1 290

10 rows selected.
---- Why column G always increment?

---- replace seq1.nextval with 100

select t2.*,
case when (1=l1)
and ((lag(l1,1) over (order by d)) = 1)
then seq1.currval
else 100
end g
from (
select d,d-1000 p1,d+1000 p2,
case when (lead(d,1,0) over (order by d) - d) < 2000 then 1
when (d - (lag(d,1,0) over (order by d)) ) < 2000 then 1
else 0
end l1 from t1
) t2
/

D P1 P2 L1 G
---------- ---------- ---------- ---------- ----------
452911 451911 453911 0 100
794012 793012 795012 1 100
794034 793034 795034 1 290
794234 793234 795234 1 290
794345 793345 795345 1 290
924952 923952 925952 0 100
1068608 1067608 1069608 1 100
1068617 1067617 1069617 1 290
1237362 1236362 1238362 0 100
1268008 1267008 1269008 1 100

10 rows selected.

---- it seems that the condition testing is working

create sequence seq2 start with 1 increment by 1;

select t2.*,
case when (1=l1)
and ((lag(l1,1) over (order by d)) = 1)
then seq1.currval
else seq2.nextval
end g

from (
select d,d-1000 p1,d+1000 p2,
case when (lead(d,1,0) over (order by d) - d) < 2000 then 1
when (d - (lag(d,1,0) over (order by d)) ) < 2000 then 1
else 0
end l1 from t1
) t2
/

D P1 P2 L1 G
---------- ---------- ---------- ---------- ----------
452911 451911 453911 0 1
794012 793012 795012 1 2
794034 793034 795034 1 290
794234 793234 795234 1 290
794345 793345 795345 1 290
924952 923952 925952 0 6
1068608 1067608 1069608 1 7
1068617 1067617 1069617 1 290
1237362 1236362 1238362 0 9
1268008 1267008 1269008 1 10

10 rows selected.
---- condition testing is OK

select t2.*,
case when (1=l1)
and ((lag(l1,1) over (order by d)) = 1)
then seq1.currval
else seq1.nextval
end g
from (
select d,d-1000 p1,d+1000 p2,
case when (lead(d,1,0) over (order by d) - d) < 2000 then 1
when (d - (lag(d,1,0) over (order by d)) ) < 2000 then 1
else 0
end l1 from t1
) t2
/

D P1 P2 L1 G
---------- ---------- ---------- ---------- ----------
452911 451911 453911 0 291
794012 793012 795012 1 292
794034 793034 795034 1 293
794234 793234 795234 1 294
794345 793345 795345 1 295
924952 923952 925952 0 296
1068608 1067608 1069608 1 297
1068617 1067617 1069617 1 298
1237362 1236362 1238362 0 299
1268008 1267008 1269008 1 300

10 rows selected.

---- How come G always increments?

analytics rock

Nengbing, January 12, 2007 - 2:38 pm UTC

Tom,

I apologize for the confusion and hope this makes it clear.

Assuming that I have a set of numbers, I'd like to put the numbers into distinctive groups within each group the difference of any number to the next one when the numbers are sorted should be less than 2000, and all numbers that differs from any number in a group by less than 2000 should be in the same group.

For example:

452911 1 ---- put 452911 in group 1
794012 2 ---- in a new group because it differs from 1st number by more than 2000
794034 2 ---- in group 2 because it differs from last number by less than 2000
794234 2 ---- same group 2
794345 2 ---- same group 2
924952 3 ---- new group 3
1068608 4 ---- new group 4
1068617 4 ---- same group 4
1237362 5 ---- new group 5
1268008 6 ---- new group 6

By the way I did find in Oracle document that says "......If any of these locations contains references to both CURRVAL and NEXTVAL, then Oracle increments the sequence and returns the same value for both CURRVAL and NEXTVAL......."

So using sequence as I intended from my last reply does not work.

Thank you so much, Tom!

To: Nengbing

Michel Cadot, January 16, 2007 - 5:01 am UTC

SQL> with
  2    step1 as (
  3      select d, 
  4             case 
  5               when d-nvl(lag(d) over(order by d),0) > 2000 
  6               then rownum
  7             end flag
  8      from t1
  9    ),
 10    step2 as (
 11      select d, max(flag) over (order by d) grp
 12      from step1
 13    )
 14  select d,
 15         dense_rank() over (order by grp) grp
 16  from step2
 17  order by d
 18  /
         D        GRP
---------- ----------
    452911          1
    794012          2
    794034          2
    794234          2
    794345          2
    924952          3
   1068608          4
   1068617          4
   1237362          5
   1268008          6

10 rows selected.

First we flag (with their rownum) the rows with a difference greater than 2000.
Then we propagate this flag to all members of the same group.
Finally we convert this flag to a rank.

Regards
Michel

To: Nengbing

Michel Cadot, January 16, 2007 - 3:40 pm UTC

Another solution, just writing what you said in the rule of a model clause:

SQL> select d, grp
  2  from (select d from t1 order by d)
  3  model
  4    dimension by (rownum rn)
  5    measures (d, cast(null as number) grp) 
  6    rules 
  7      ( grp[ANY] = case when d[cv()-1] is null then 1 -- case first line
  8                        when d[cv()]-d[cv()-1]>2000 then grp[cv()-1]+1
  9                        else grp[cv()-1]
 10                   end
 11       )
 12  order by 1
 13  /
         D        GRP
---------- ----------
    452911          1
    794012          2
    794034          2
    794234          2
    794345          2
    924952          3
   1068608          4
   1068617          4
   1237362          5
   1268008          6

10 rows selected.

Michel

Thanks, Michel

Nengbing, January 16, 2007 - 6:30 pm UTC

Very nicely done!

I have only minor modifications (to replace t1 on line 8 with "select d from t1 order by d"; and 0 on line 5 with -2001 to always have a flag for 1st row )

Thank you very much!

To: Nengbing

Michel Cadot, January 17, 2007 - 12:37 am UTC

You're right for line 5 but ordering table t1 in line 8 is useless as the rows are compared with their previous one in the order given in the lag function (that is "d") whatever is the physical order in the table.

Michel

Michel Cadot, January 17, 2007 - 2:20 am UTC

I have to correct my query with the model clause.
I used "rownum" to dimension the result set, it works here 1) because the model clause does not generate any row and 2) thanks to the current implementation. But this is not the correct way to do it: I should number my lines before, for instance with the row_number function.

SQL> select d, grp
  2  from (select d, row_number() over (order by d) rn from t1)
  3    model
  4    dimension by (rn)
  5    measures (d, cast(null as number) grp) 
  6      rules 
  7        ( grp[ANY] = case when d[cv()-1] is null then 1 -- case first line
  8                          when d[cv()]-d[cv()-1]>2000 then grp[cv()-1]+1
  9                          else grp[cv()-1]
 10                     end
 11        )
 12  order by 1
 13  /
         D        GRP
---------- ----------
    452911          1
    794012          2
    794034          2
    794234          2
    794345          2
    924952          3
   1068608          4
   1068617          4
   1237362          5
   1268008          6

10 rows selected.

Michel

Thanks, Michel

Nengbing, January 17, 2007 - 11:55 am UTC

I tend to agree with you. However, the rownum as flag may throw thing off. For example, if I insert a few more rows like:
insert into t1 values (3000);
insert into t1 values (4000);
insert into t1 values (200);
insert into t1 values (400);

the query returns:

D GRP
---------- ----------
200 1
400 1
3000 1
4000 1
452911 1
794012 1
794034 1
794234 1
794345 1
924952 1
1068608 1
1068617 1
1237362 1
1268008 1

Because step1 returns:
D FLAG
---------- ----------
200 13
400
3000 11
4000
452911 1
794012 2
794034
794234
794345
924952 6
1068608 7
1068617
1237362 9
1268008 10

Thanks again, Michel, becuase of contributions from Oracle users like you, this site is even more helpful!

Thanks Nengbing

Michel Cadot, January 17, 2007 - 3:00 pm UTC

Oh yes, rownum is wrong, row_number should be used instead.

SQL> with
  2    step1 as (
  3      select d, 
  4             case 
  5               when d-nvl(lag(d) over(order by d),-2001) > 2000 
  6               then row_number() over(order by d)
  7             end flag
  8      from t1
  9    ),
 10    step2 as (
 11      select d, max(flag) over (order by d) grp
 12      from step1
 13    )
 14  select d,
 15         dense_rank() over (order by grp) grp
 16  from step2
 17  order by d
 18  /
         D        GRP
---------- ----------
       200          1
       400          1
      3000          2
      4000          2
    452911          3
    794012          4
    794034          4
    794234          4
    794345          4
    924952          5
   1068608          6
   1068617          6
   1237362          7
   1268008          8

14 rows selected.

Thanks to correct that.

Michel

analytics to solve credit / debit allocation

RVH, January 24, 2007 - 12:44 am UTC

Tom,

You mentioned in a response to Dave Thompson on March 04, 2004, that the problem described by him in this thread can not be solved using analytical functions. Just to let you know - the following does the job:

SQL> select * from pay_m;

PAY_ID PAYMENT
---------- ----------
1 50
2 25
3 50
4 50

SQL>
SQL>
SQL> select * from prem;

PREM_ID PREM_PAYMENT
---------- ------------
1 100
2 50
3 50
4 50

SQL>

The following query will allocate the actual payments (identified through PAY_ID) to the 'premiums' (identified through PREM_ID). I do also include the credit/debit amount remaining after the payment has been allocated.

SQL> select vprem.prem_id,
2 vpay.pay_id,
3 greatest(vprem.sum_prem_payment - vpay.sum_payment, 0) remaining_debit,
4 greatest(vpay.sum_payment - vprem.sum_prem_payment, 0) available_credit
5 from (select pay_id,
6 payment,
7 sum(payment) over (order by pay_id) - payment sum_prev_payment,
8 sum(payment) over (order by pay_id) sum_payment
9 from pay_m) vpay,
10 (select prem_id,
11 prem_payment,
12 sum(prem_payment) over (order by prem_id) - prem_payment sum_prev_prem_payment,
13 sum(prem_payment) over (order by prem_id) sum_prem_payment
14 from prem) vprem
15 where ( vpay.sum_payment > vprem.sum_prev_prem_payment
16 and vpay.sum_payment <= vprem.sum_prem_payment)
17 or ( vpay.sum_prev_payment > vprem.sum_prev_prem_payment
18 and vpay.sum_prev_payment < vprem.sum_prem_payment)
19 order by 1 asc, 2 asc
20 /

PREM_ID PAY_ID REMAINING_DEBIT AVAILABLE_CREDIT
---------- ---------- --------------- ----------------
1 1 50 0
1 2 25 0
1 3 0 25
2 3 25 0
2 4 0 25
3 4 25 0

6 rows selected.

Thanks for this very useful web site. I am learning a lot by browsing it.

Query - correction

RVH, January 24, 2007 - 7:37 pm UTC

Tom,

When I tested the query more thouroughly that I gave in my previous email I realised that it has a flaw under certain boundary conditions. I underline the change in the 'where' clause that fixes the problem. It is working fine now ... Cheers

select vprem.prem_id,
vpay.pay_id,
greatest(vprem.sum_prem_payment - vpay.sum_payment, 0) remaining_debit,
greatest(vpay.sum_payment - vprem.sum_prem_payment, 0) available_credit
from (select pay_id,
sum(payment) over (order by pay_id) - payment sum_prev_payment,
sum(payment) over (order by pay_id) sum_payment
from pay_m) vpay,
(select prem_id,
sum(prem_payment) over (order by prem_id) - prem_payment sum_prev_prem_payment,
sum(prem_payment) over (order by prem_id) sum_prem_payment
from prem) vprem
where greatest (vpay.sum_prev_payment, vprem.sum_prev_prem_payment)
< least (vpay.sum_payment, vprem.sum_prem_payment)
order by 1, 2
/

Any single query for this scenario

Subramanian T S, January 25, 2007 - 10:03 am UTC

Hi Tom,

I have a lookup table

company brand area ndays
XX {null} {null} 3
XX BB {null} 10
XX BB 1 4
XX BB 2 8

I want a query to retrieve ndays. The logic is as follows:

First search by company, brand and area
If match found return ndays.
Else search by company and brand
If match found return ndays
Else search by company
If match found return ndays
Else return 0

This can be done simple PL/SQL. I am looking for a single query may be using some built-in functions..

Thanks in advance :)

Cheers !
Subbu.

To: Subramanian T S

Michel Cadot, January 29, 2007 - 6:20 am UTC

One way to do it is (with your data):

SQL> var comp varchar2(10)
SQL> var brand varchar2(10)
SQL> var area number
SQL> begin 
  2    :comp := 'XX';
  3    :brand := 'BB';
  4    :area := 1;
  5  end;
  6  /

PL/SQL procedure successfully completed.

SQL> with 
  2    data as (
  3      select t.company, t.brand, t.area, t.ndays,
  4             row_number () over 
  5               (order by t.company, t.brand nulls last, t.area nulls last) nb
  6      from t
  7      where t.company = :comp
  8        and ( t.brand = :brand or t.brand is null )
  9        and ( t.area = :area or t.area is null ) 
 10    )
 11  select company, brand, area, nvl(ndays,0) ndays
 12  from (select 1 nb from dual) dual, data
 13  where data.nb (+) = dual.nb
 14  /
COMPANY    BRAND            AREA      NDAYS
---------- ---------- ---------- ----------
XX         BB                  1          4

1 row selected.

SQL> exec :area := 3

PL/SQL procedure successfully completed.

SQL> /
COMPANY    BRAND            AREA      NDAYS
---------- ---------- ---------- ----------
XX         BB                            10

1 row selected.

SQL> exec :brand := 'CC'

PL/SQL procedure successfully completed.

SQL> /
COMPANY    BRAND            AREA      NDAYS
---------- ---------- ---------- ----------
XX                                        3

1 row selected.

SQL> exec :comp := 'YY'

PL/SQL procedure successfully completed.

SQL> /
COMPANY    BRAND            AREA      NDAYS
---------- ---------- ---------- ----------
                                          0

1 row selected.

The first subquery ("data") searches all lines that match your criteria and orders them according to your algorithm. The main query output the first row and 0 if there is none.

Regards
Michel

Excellent !!!

A reader, January 30, 2007 - 6:04 am UTC

Thank you Michel! Excellent one..

Cheers !
Subbu.

excellent information.

kde, February 07, 2007 - 2:28 pm UTC

Tom,
I am having problems with the following requirement.
Would you suggest using analytics for this?

My data in the source looks like this.

1,abc,500
1,abc,200
1,abc,100
1,abc,150
2,aaa,200
2,aaa,150
2,aaa,50
3,asa,100
3,asa,30
3,asa,60

I need to generate

0,1,abc,500
1,1,abc,300(i.e.,500-200)
2,1,abc,200(i.e.,300-100)
3,1,abc,50(i.e.,200-150)
0,2,aaa,200
1,2,aaa,50(i.e.,200-150)
2,2,aaa,0(i.e.,50-50)
0,3,asa,100
1,3,asa,70(i.e.,100-30)
2,3,asa,10(i.e.,70-60)

I tried using analytics. Here is the query that I have so far
select
A.COL1 col1
,A.TXT1 txt1
,A.COL2 col2
,sum(A.COL2) as moving_sum
,lead(SUM(A.COL2),1) OVER (PARTITION BY a.col1, a.txt1 ORDER BY A.COL2 DESC) as next_row -- ROWS 1 PRECEDING)
FROM t1 a
group by A.COL1,A.TXT1,A.COL2
ORDER BY A.COL1 ASC,A.TXT1 ASC,A.COL2 DESC

--
I tried using the result of the above query but couldn't process the results.

I would really appreciate any suggestions/help.

thanks
kde

February 07, 2007 - 6:54 pm UTC

there is no explanation of how to get from input to output "requirements" wise. You don't explain what your outputs are - we just have a "picture" and we are supposed to derive the requirements from that - that is a flawed approach.

tell us what the outputs actually mean.
and give us create table and inserts to work with.

To kde ...

Gabe, February 07, 2007 - 8:20 pm UTC

The processing order is quite specific and cannot be derived from your input. To that end I have added a "sequence" column to your data set.

create table z ( id number, nm varchar2(5), val number, seq  number );

insert into z values (1,'abc',500,1);
insert into z values (1,'abc',200,2);
insert into z values (1,'abc',100,3);
insert into z values (1,'abc',150,4);
insert into z values (2,'aaa',200,1);
insert into z values (2,'aaa',150,2);
insert into z values (2,'aaa', 50,3);
insert into z values (3,'asa',100,1);
insert into z values (3,'asa', 30,2);
insert into z values (3,'asa', 60,3);

20:08:28 session_139> select id,nm,seq
20:08:30   2        ,first_value(val) over (partition by id order by seq)
20:08:30   3        - sum(case when seq=1 then 0 else val end)
20:08:30   4            over (partition by id order by seq) balance
20:08:30   5  from  z
20:08:30   6  order by 1,3
20:08:30   7  ;

        ID NM           SEQ    BALANCE
---------- ----- ---------- ----------
         1 abc            1        500
         1 abc            2        300
         1 abc            3        200
         1 abc            4         50
         2 aaa            1        200
         2 aaa            2         50
         2 aaa            3          0
         3 asa            1        100
         3 asa            2         70
         3 asa            3         10

10 rows selected.

apologize

a user, February 07, 2007 - 10:44 pm UTC

<quote>
there is no explanation of how to get from input to output "requirements" wise. You don't explain what your outputs are - we just have a "picture" and we are supposed to derive the requirements from that - that is a flawed approach.
<quote>
Tom,
I apologize for asking questions with incomplete information. I should have given you what Gabe had created.

regards

Using analytics in subqueries

sharvu, February 08, 2007 - 3:15 pm UTC

Tom,

I need your suggestion:

create table a(id number,src number,dt date);
There are 50 Mill records in this table that actually has 50 more columns.
I would need something like:
select count(*) from a t1
where src=999 and dt
= (select max(dt) from a t2 where t2.src=t1.src and t2.id=t1.id)

The query runs forever almost. Is there an effective way to do this?

Thanks,

February 08, 2007 - 4:25 pm UTC

select count(*)
from (select dt, max(dt) over (partition by id) max_dt
from a
where src = 999
)
where dt = max_dt;

Can this be done through Analytical Function?

A reader, February 10, 2007 - 2:19 am UTC

Hi Tom,
I have set of very large tables (ranging from around 40 Million rows to 10 Million rows). I have created a test table just to set up an example what I want to achieve:-
create table test (start_date date, end_date date, amnt number(10));
insert into test values ('01-JAN-2000','01-JAN-2001',100);
insert into test values ('01-JAN-2001','01-JAN-2002',200);
insert into test values ('01-JAN-2002','01-JAN-2003',200);
insert into test values ('01-JAN-2003','01-JAN-2004',500);
insert into test values ('18-JAN-2004','01-JAN-2004',1000);
SQL> select * from test;

START_DAT END_DATE AMNT
--------- --------- ----------
01-JAN-00 01-JAN-01 100
01-JAN-01 01-JAN-02 200
01-JAN-02 01-JAN-03 200
01-JAN-03 01-JAN-04 500
18-JAN-04 01-JAN-04 1000

select * FROM test
2 WHERE (start_date, end_date) IN (SELECT max(start_date), max(END_DATE) FROM test
3 where add_months(start_date,12)-END_DATE = 0);

START_DAT END_DATE AMNT
--------- --------- ----------
01-JAN-03 01-JAN-04 500

i.e., I want to travel backwards and get the amount/record where difference of dates is 1 year.
Regards,

February 12, 2007 - 9:51 am UTC

you want to select specific rows - for that you need a where clause.

analytics are useful to look "across rows", not for getting "specific rows" so much.

another way for you would be

select * 
  from 
(
select * 
  from test
 where months_between(start_date,end_date) = 12
 order by start_date DESC
)
where rownum = 1;

To: A Reader

Michel Cadot, February 12, 2007 - 2:41 am UTC

SQL> select start_date, end_date, amnt
  2  from ( select start_date, end_date, amnt,
  3                max(start_date) over () max_start,
  4                max(end_date) over () max_end
  5         from test
  6         where add_months(start_date,12)-END_DATE = 0
  7        )
  8  where start_date = max_start and end_date = max_end
  9  /
START_DATE  END_DATE          AMNT
----------- ----------- ----------
01-JAN-2003 01-JAN-2004        500

Regards
Michel

Is Analytic Query an answer for this?

A Reader, March 01, 2007 - 12:11 am UTC

Tom,
I really need some help with this. Please Help!!!!

I have a table that looks like this. 
CREATE TABLE T5
   ( COL1 VARCHAR2(26) NOT NULL, 
 COL2 NUMBER(10) NOT NULL, 
 COL3 NUMBER(10) NOT NULL, 
 COL4 VARCHAR2(4) NOT NULL, 
 COL5 VARCHAR2(10)
   )

Insert into T5 values ('2007-01-25 08:22:16',101,1,'A','ABC');
Insert into T5 values ('2007-01-25 08:22:16',101,1,'L','XYZ');
Insert into T5 values ('2007-01-25 08:22:16',101,2,'A','ABC');
Insert into T5 values ('2007-01-25 08:22:16',101,2,'L','XYZ');
Insert into T5 values ('2007-01-25 08:22:16',101,3,'A','ABC');
Insert into T5 values ('2007-01-25 08:22:16',101,3,'L','XYZ');

COL1   COL2 COL3 COL4 COL5
2007-01-25 08:22:16 101 1 A ABC
2007-01-25 08:22:16 101 1 L XYZ
2007-01-25 08:22:16 101 2 A ABC
2007-01-25 08:22:16 101 2 L XYZ
2007-01-25 08:22:16 101 3 A ABC
2007-01-25 08:22:16 101 3 L XYZ

I need the output like this.

COL1   COL2 COL3 A_GRP L_GRP
2007-01-25 08:22:16 101 1 ABC XYZ
2007-01-25 08:22:16 101 2 ABC XYZ
2007-01-25 08:22:16 101 3 ABC XYZ

i.e.
For each group of COL1, COL2, COL3 do the following.
For each ~COL5 where COL4='A'~ put the value of ~COL5 where COL4='L'~ right next to it.
I did try the pivot query,

select 
      col1, col2, col3, 
      max(decode(col4,'A',COL5,null)) a,
      max(decode(col4,'L',COL5,null)) b
from t5
group by col1, col2, col3;

COL1   COL2 COL3 A B
2007-01-25 08:22:16 101 2 ABC XYZ
2007-01-25 08:22:16 101 1 ABC XYZ
2007-01-25 08:22:16 101 3 ABC XYZ

But this would not work once I add more columns to the query.
I was thinking that this could be easily done with Analytic Query but I have not been able to figure it out yet.

Is there another way?

thanks for your help

March 01, 2007 - 8:30 am UTC

why doesn't it work - it is a typical pivot, works for any number of "columns"

tell us what isn't working first.

OK

Kumar, March 05, 2007 - 6:15 am UTC

Hi Tom,
I have a query which gives the price difference at three levels namely
Product,Pack( A pack is group of similar products) and market(all packs).
DDL,DML and query is below.

SQL> create table pack(ccode varchar2(30),pack_code varchar2(30),prod_code number,
2 curr_price number,prev_price number,trans_date date)
3 /

Table created.

SQL> insert into pack values('UK','PK1',100,100.25,140.90,sysdate-6);

SQL> insert into pack values('UK','PK1',101,10.25,40.90,sysdate-6);

SQL> insert into pack values('UK','PK1',100,55,45,SYSDATE-6);

SQL> insert into pack values('UK','PK2',200,22.75,20.75,SYSDATE-5);

SQL> insert into pack values('UK','PK2',200,30.25,27.25,SYSDATE-5);

SQL> insert into pack values('UK','PK2',200,102.50,90.50,SYSDATE-5);

SQL> commit
2 /

Commit complete.

SQL> select * from pack
2 /

CCODE PACK_CODE PROD_CODE CURR_PRICE PREV_PRICE TRANS_DAT
------------------------------ ------------------------------ ---------- ---------- ---------- -----
UK PK1 100 100.25 140.9 27-FEB-07
UK PK1 101 10.25 40.9 27-FEB-07
UK PK1 100 55 45 27-FEB-07
UK PK2 200 22.75 20.75 28-FEB-07
UK PK2 200 30.25 27.25 28-FEB-07
UK PK2 200 102.5 90.5 28-FEB-07

6 rows selected.

When I issue the below query the computed columns namely prod_prc_diff and pack_prc_diff
show the same values.

SQL> col ccode for a5

SQL> col pack_code for a5

SQL> select p.*,sum(abs(curr_price-prev_price)) over(partition by pack_code,prod_code,trans_date) as
prod_prc_diff,sum(abs(curr_price-prev_price)) over(partition by pack_code,trans_date)
as pack_prc_diff from pack p;

CCODE PACK_ PROD_CODE CURR_PRICE PREV_PRICE TRANS_DAT PROD_PRC_DIFF PACK_PRC_DIFF
----- ----- ---------- ---------- ---------- --------- ------------- -------------
UK PK1 100 100.25 140.9 27-FEB-07 40.65 40.65
UK PK1 101 10.25 40.9 27-FEB-07 30.65 30.65
UK PK1 100 55 45 27-FEB-07 10 10
UK PK2 200 22.75 20.75 28-FEB-07 2 2
UK PK2 200 30.25 27.25 28-FEB-07 3 3
UK PK2 200 102.5 90.5 28-FEB-07 12 12

6 rows selected.

But I want the pack_prc_diff column to show two rows as

pack_prc_diff
=============
80.3
17

I also need one more column as below which would be sum of 80.3+17

mkt_prc_diff
============
97.3

Please let me know how to arrive at this.
Thanks Tom.

March 05, 2007 - 1:17 pm UTC

I'm confused, why do our outputs differ given the same inputs

ops$tkyte%ORA10GR2> select p.*,sum(abs(curr_price-prev_price)) over(partition by pack_code,prod_code,trans_date) as
  2  prod_prc_diff,sum(abs(curr_price-prev_price)) over(partition by pack_code,trans_date)
  3  as pack_prc_diff from pack p;

CCODE PACK_  PROD_CODE CURR_PRICE PREV_PRICE TRANS_DAT PROD_PRC_DIFF PACK_PRC_DIFF
----- ----- ---------- ---------- ---------- --------- ------------- -------------
UK    PK1          100     100.25      140.9 27-FEB-07         50.65          81.3
UK    PK1          100         55         45 27-FEB-07         50.65          81.3
UK    PK1          101      10.25       40.9 27-FEB-07         30.65          81.3
UK    PK2          200      102.5       90.5 28-FEB-07            17            17
UK    PK2          200      22.75      20.75 28-FEB-07            17            17
UK    PK2          200      30.25      27.25 28-FEB-07            17            17

Is Analytic Query an answer for this?

A reader, March 05, 2007 - 10:05 am UTC

why doesn't it work - it is a typical pivot, works for any number of "columns"

tell us what isn't working first.

Pivot would work if I add more columns to the max(decode()) list. I want to add more columns to the 'group by' list.

drop table t5;

CREATE TABLE T5
(
  COL1  VARCHAR2(26 BYTE)                       NOT NULL,
  COL2  NUMBER(10)                              NOT NULL,
  COL3  NUMBER(10)                              NOT NULL,
  COL4  VARCHAR2(4 BYTE)                        NOT NULL,
  COL5  VARCHAR2(10 BYTE),
  COL6  NUMBER(10),
  COL7  VARCHAR2(10 BYTE)
)
/


Insert into T5
   (COL1, COL2, COL3, COL4, COL5, 
    COL6, COL7)
 Values
   ('2007-01-25 08:22:16', 101, 1, 'A', 'ABC', 
    1, NULL);
Insert into T5
   (COL1, COL2, COL3, COL4, COL5, 
    COL6, COL7)
 Values
   ('2007-01-25 08:22:16', 101, 1, 'L', 'XYZ', 
    2, 'BB');
Insert into T5
   (COL1, COL2, COL3, COL4, COL5, 
    COL6, COL7)
 Values
   ('2007-01-25 08:22:16', 101, 2, 'A', 'ABC', 
    3, NULL);
Insert into T5
   (COL1, COL2, COL3, COL4, COL5, 
    COL6, COL7)
 Values
   ('2007-01-25 08:22:16', 101, 2, 'L', 'XYZ', 
    4, 'DD');
Insert into T5
   (COL1, COL2, COL3, COL4, COL5, 
    COL6, COL7)
 Values
   ('2007-01-25 08:22:16', 101, 3, 'A', 'ABC', 
    5, NULL);
Insert into T5
   (COL1, COL2, COL3, COL4, COL5, 
    COL6, COL7)
 Values
   ('2007-01-25 08:22:16', 101, 3, 'L', 'XYZ', 
    6, 'FF');
COMMIT;

SQL> select 
  2        col1, col2, col3, col6, col7,
  3        max(decode(col4,'A',COL5,null)) a,
  4        max(decode(col4,'L',COL5,null)) b
  5  from T5
  6  group by col1, col2, col3, col6, col7
  7  order by col1, col2, col3
  8  ;

COL1                 COL2  COL3       COL6  COL7       A       B
------------------- ----- ------- ---------- ---------- ------- -------
2007-01-25 08:22:16   101    1          1               ABC
2007-01-25 08:22:16   101    1          2  BB                XYZ
2007-01-25 08:22:16   101    2          3                 ABC
2007-01-25 08:22:16   101    2          4  DD                XYZ
2007-01-25 08:22:16   101    3          5                 ABC
2007-01-25 08:22:16   101    3          6  FF                XYZ

I do realize that the Pivot Query above is working exactly as it is supposed to.
But I need the output to be like

COL1                 COL2  COL3       COL6  COL7       A       B
------------------- ----- ------- ---------- ---------- ------- -------
2007-01-25 08:22:16   101    1          2  BB        ABC     XYZ
2007-01-25 08:22:16   101    2          4  DD        ABC     XYZ
2007-01-25 08:22:16   101    3          6  FF        ABC     XYZ

And that's why I thought that Analutic Query could be the answer.

thanks

March 05, 2007 - 2:07 pm UTC

and you would have to tell us the LOGIC BY WHICH YOUR OUTPUT IS COMING FROM

did you want the max(col6), and the max(col7) - eg; you did not mean to group by them.


ops$tkyte%ORA10GR2> select
  2        col1, col2, col3, max(col6), max(col7),
  3        max(decode(col4,'A',COL5,null)) a,
  4        max(decode(col4,'L',COL5,null)) b
  5  from T5
  6  group by col1, col2, col3
  7  order by col1, col2, col3
  8  ;

COL1                             COL2       COL3  MAX(COL6) MAX(COL7)  A          B
-------------------------- ---------- ---------- ---------- ---------- ---------- ----------
2007-01-25 08:22:16               101          1          2 BB         ABC        XYZ
2007-01-25 08:22:16               101          2          4 DD         ABC        XYZ
2007-01-25 08:22:16               101          3          6 FF         ABC        XYZ

Logic you asked for!!!

A Reader, March 05, 2007 - 3:27 pm UTC

...LOGIC BY WHICH YOUR OUTPUT IS COMING FROM...

I guess that would help. haha. Here it is...

For the table like T5 which has n (where n is defined) number of keys,
For every row with COL4 as L (call it X),
there may be multiple rows with COL4 as A (call it Y).
Among these matching rows there is a common key (which is a part of n keys above).

Data from Y contains values in multiple columns which are need to be JOINed with columns from X in order to create a ~complete~ row for the purpose of the process.

for example, consider the table below

4KEYS 1key COL4 COL5
----- ---- ---- ---------
P 1 L L_VALUES
Q 1 A A_VALUES
R 1 A A_VALUES

From the table above, create TWO rows (because there are TWO 'A' type lines) that look like the following.

4KEYS 1key L A
----- ---- --------- ---------
P 1 L_VALUES A_VALUES
Q 1 L_VALUES A_VALUES

Would you use analytics for that or just a simple self join?

thanks

March 05, 2007 - 4:24 pm UTC

I did not get the 'four keys' bit?

how did you know to keep "Q" but discard "R"

we are missing something here.

Logic (Updated!!!)

A Reader, March 05, 2007 - 3:31 pm UTC

I think the 4KEYS column would have to go...

1key L A
---- --------- ---------
1 L_VALUES A_VALUES
1 L_VALUES A_VALUES

4KEYS would have to go...

A Reader, March 05, 2007 - 4:28 pm UTC

I think the 4KEYS column would have to go...

1key L A
---- --------- ---------
1 L_VALUES A_VALUES
1 L_VALUES A_VALUES

I was wrong!!!

reader, March 06, 2007 - 9:32 am UTC

Actually I was wrong when I gave this table as an example.

4KEYS 1key COL4 COL5
----- ---- ---- ---------
P 1 L L_VALUES
Q 1 A A_VALUES
R 1 A A_VALUES

THE CORRECT EXAMPLE WOULD BE

4KEYS 1key COL4 COL5
----- ---- ---- ---------
P 1 L L_VALUES
P 1 A A_VALUES
P 1 A A_VALUES

i.e. the 4KEYS would remain same for that rowset.

And from this table I would have to create TWO rows (because there are TWO 'A' type lines) that look like the following.

4KEYS 1key L A
----- ---- --------- ---------
P 1 L_VALUES A_VALUES
P 1 L_VALUES A_VALUES

Sorry for the confusion there...
thanks

March 06, 2007 - 11:11 am UTC

Now you entirely, utterly and totally have lost me.

start over - use much more TEXT with WORDS than "pictures of inputs and outputs"

I fail to see how you get from your input to your output.

Across Rows

A reader, March 08, 2007 - 9:41 pm UTC

Hi Tom,

In relation to data set below:-

create table test (start_date date, end_date date, amnt number(10));
insert into test values ('01-JAN-2000','01-JAN-2001',100);
insert into test values ('01-JAN-2001','01-JAN-2002',200);
insert into test values ('01-JAN-2002','01-JAN-2003',200);
insert into test values ('01-JAN-2003','01-JAN-2004',500);
insert into test values ('18-JAN-2004','01-JAN2004',1000);
SQL> select * from test;

START_DAT END_DATE AMNT
--------- --------- ----------
01-JAN-00 01-JAN-01 100
01-JAN-01 01-JAN-02 200
01-JAN-02 01-JAN-03 200
01-JAN-03 01-JAN-04 500
18-JAN-04 01-JAN-04 1000
How to get a record by comparing end dates of two adjacent records (by traversing backwords) where end dates changed/different i.e., I want to get record (second last) with amount 500 as by comparing its end date with the previous one, the date is different.
Regards,

March 09, 2007 - 10:45 am UTC

ops$tkyte%ORA10GR2> select *
  2    from (
  3  select start_date,
  4         end_date,
  5         lead(start_date) over (order by start_date) next_start_date,
  6         amnt
  7    from test
  8         )
  9   where end_date <> next_start_date
 10  /

START_DAT END_DATE  NEXT_STAR       AMNT
--------- --------- --------- ----------
01-JAN-03 01-JAN-04 18-JAN-04        500

Thanks a lot!

A reader, March 09, 2007 - 5:43 pm UTC

Variation

Juan Velez, March 10, 2007 - 4:15 pm UTC

This is related to the topic, so I would like to ask:

I have a table with effective-dated data, this is, effective/termination dates

CREATE TABLE EFFECTIVEDATED(ID NUMBER NOT NULL, EFFECTIVEDAY DATE NOT NULL, TERMINATIONDAY NOT NULL, SOMEVALUE NUMBER);

INSERT INTO EFFECTIVEDATED VALUES(1, TO_DATE('20070301','YYYYMMDD'), TO_DATE('20070601','YYYYMMDD'), 10);
INSERT INTO EFFECTIVEDATED VALUES(1, TO_DATE('20070901','YYYYMMDD'), TO_DATE('20080601','YYYYMMDD'), 8);
INSERT INTO EFFECTIVEDATED VALUES(1, TO_DATE('20080601','YYYYMMDD'), TO_DATE('99991231','YYYYMMDD'), 12);

The data in this example has gaps if we consider the range 20070101 and 20090101. I would like to get those missing segments as part of a query whose "somevalue" column would be 0, this is, given start/end dates 20070101/20090101 and the above data, a query should return the following records (assume that the query looks for ID=1)

ID EFFECTIVEDAY TERMINATIONDAY SOMEVALUE
-- ------------ -------------- ---------
1  20070101     20070301        0
1  20070301     20070601       10
1  20070601     20070901        0
1  20070901     20080601        8
1  20080601     20090101       12

I have accomplished this using PL/SQL but I know it can be done using analytics, I just do not know how, and that's what I am hoping you can help me with

Thanks

March 12, 2007 - 5:38 pm UTC

you know, by doing that - you've utterly handicapped the optimizer. You should have used NULL, not 99991231

start with this... it can be solved using this set:

ops$tkyte%ORA10GR2> with data
  2  as
  3  (select 1 l from dual union all select 2 l from dual)
  4  select *
  5    from data d, (select id, effectiveday edt, terminationday tdt, somevalue val,
  6                         lag( terminationday ) over (order by effectiveday ) last_tdt,
  7                         lead( effectiveday ) over (order by effectiveday ) next_edt,
  8                         to_date(:x,'yyyymmdd') x,
  9                         to_date(:y,'yyyymmdd') y
 10                    from effectivedated
 11                                   where id = :z
 12                                   ) e
 13   order by edt, l
 14  /

         L         ID EDT       TDT              VAL LAST_TDT  NEXT_EDT  X         Y
---------- ---------- --------- --------- ---------- --------- --------- --------- ---------
         1          1 01-MAR-07 01-JUN-07         10           01-SEP-07 01-JAN-07 01-JAN-09
         2          1 01-MAR-07 01-JUN-07         10           01-SEP-07 01-JAN-07 01-JAN-09
         1          1 01-SEP-07 01-JUN-08          8 01-JUN-07 01-JUN-08 01-JAN-07 01-JAN-09
         2          1 01-SEP-07 01-JUN-08          8 01-JUN-07 01-JUN-08 01-JAN-07 01-JAN-09
         1          1 01-JUN-08 31-DEC-99         12 01-JUN-08           01-JAN-07 01-JAN-09
         2          1 01-JUN-08 31-DEC-99         12 01-JUN-08           01-JAN-07 01-JAN-09

6 rows selected.

Analytics to the rescue

Livio, March 12, 2007 - 6:05 am UTC

I hope my problem can be considered a followup of this thread. Otherwise, I'll try to put a question at a later time.

I have a table like this:

create table eub.test (date_time date, validity number);

insert into test (date_time, validity)
values (to_date('08-01-2007', 'dd-mm-yyyy'), 27334);

insert into test (date_time, validity)
values (to_date('08-01-2007', 'dd-mm-yyyy'), 27335);

insert into test (date_time, validity)
values (to_date('08-01-2007', 'dd-mm-yyyy'), 27336);

insert into test (date_time, validity)
values (to_date('09-01-2007', 'dd-mm-yyyy'), 27334);

insert into test (date_time, validity)
values (to_date('09-01-2007', 'dd-mm-yyyy'), 27335);

insert into test (date_time, validity)
values (to_date('09-01-2007', 'dd-mm-yyyy'), 27340);

insert into test (date_time, validity)
values (to_date('10-01-2007', 'dd-mm-yyyy'), 27334);

insert into test (date_time, validity)
values (to_date('10-01-2007', 'dd-mm-yyyy'), 27335);

insert into test (date_time, validity)
values (to_date('10-01-2007', 'dd-mm-yyyy'), 27336);

commit;

I'd like to make a report like the following:
date_time validity group
---------- -------- -----
08-01-2007 27334 0
08-01-2007 27335 0
08-01-2007 27336 0
09-01-2007 27334 1
09-01-2007 27335 1
09-01-2007 27340 1
10-01-2007 27334 0
10-01-2007 27335 0
10-01-2007 27336 0

Said in other words, dates with the very same validities belong to the same group (or partition.)

Is it possible to figure out a solution using analytics, without writing down procedural code?

Thank you very much in advance.

March 12, 2007 - 8:01 pm UTC

you'll need to be a tad more precise in your definition of a "group"

what logic did you follow to put the first 3 and last 3 together and what happens if there was a group with more "validities" that covered that group and all......

define, precisely, how to identify a group - in a manner a programmer could produce code from.

Analytics to the rescue

Livio, March 13, 2007 - 4:36 am UTC

Tom,
I should have used the word "set" instead of "group". In fact, I put the first 3 and the last three together because
the set of values s1 = {27334, 27335, 27336} associated to the date 08-01-2007 is the same set of values associated to the date 10-01-2007.
The set s2 = {27334, 27335, 27340} associated to the date 09-01-2007 is not equal to s1, so dates 09-01-2007 belong to a different set.
If I insert the following rows:

insert into test (date_time, validity)
values (to_date('11-01-2007', 'dd-mm-yyyy'), 27334);

insert into test (date_time, validity)
values (to_date('11-01-2007', 'dd-mm-yyyy'), 27335);

commit;

I would add a third set s3 = {27334, 27335} that, even if subset of s1 and s2, is not equal to s1, s2. Rows with date 11-01-2007 should be marked as belonging to a third set, say 2.
date_time validity group
---------- -------- -----
08-01-2007 27334 0
08-01-2007 27335 0
08-01-2007 27336 0
09-01-2007 27334 1
09-01-2007 27335 1
09-01-2007 27340 1
10-01-2007 27334 0
10-01-2007 27335 0
10-01-2007 27336 0
11-01-2007 27334 2
11-01-2007 27335 2

To sum up: set equality is the criterion to group rows of table test together. Given two dates d1 and d2, let us call s1 and s2 the set of validities associated to d1 and d2, respectively. d1 and d2 belong to the same set (group, partition) if and only if set s1 equals to set s2.

I'd like to solve the above problem without writing down a procedure going through table test to identify all dates with the same set of validities. I thought that an analytic approach might help.

Thank you

March 13, 2007 - 11:24 am UTC

doubt you would ever come up with an efficient SQL based method for this.

SQL Model and Analytics to the rescue

Frank Zhou, March 13, 2007 - 2:43 pm UTC

Hi Livio,

Here is a SQL model clause solution for you.
I was able to put the identical sets into the same group
(but the group number is not exactly the same as yours)

Frank

SQL> SELECT date_time ,
2 SUBSTR(fin_str,
3 INSTR(fin_str, ',', 1, LEVEL ) + 1,
4 INSTR(fin_str, ',', 1, LEVEL+1) -
5 INSTR(fin_str, ',', 1, LEVEL) -1 ) validity,
6 grp
7 FROM (
8 SELECT date_time , ','||fin_str||',' as fin_str, dense_rank( ) over (order by fin_str) -1 grp
9 FROM
10 (
11 SELECT date_time, validity, validity_str, fin_str
12 FROM test
13 MODEL
14 PARTITION BY (date_time)
15 DIMENSION BY ( row_number() OVER (PARTITION BY date_time ORDER BY validity) rn )
16 MEASURES
17 (
18 CAST(NULL AS VARCHAR2(3255)) validity_str, validity, CAST(NULL AS VARCHAR2(3255)) fin_str
19 )
20 RULES
21 (
22 validity_str[ANY] ORDER BY rn =
23 CASE WHEN validity[cv() - 1 ] IS NULL
24 THEN TO_CHAR(validity[cv()])
25 ELSE validity_str[cv()-1]||','|| TO_CHAR(validity[cv()])
26 END,
27 fin_str[ANY] ORDER BY rn =
28 CASE WHEN validity[cv() + 1 ] IS NULL
29 THEN validity_str[CV()]
30 END
31 )
32 )
33 WHERE fin_str IS NOT NULL
34 )
35 CONNECT BY PRIOR date_time = date_time
36 AND INSTR (fin_str, ',', 1, LEVEL+1) > 0
37 AND PRIOR dbms_random.string ('p', 10) IS NOT NULL
38 order by date_time, validity;

DATE_TIME VALIDITY GRP
--------- ------------------ ----------
08-JAN-07 27334 1
08-JAN-07 27335 1
08-JAN-07 27336 1
09-JAN-07 27334 2
09-JAN-07 27335 2
09-JAN-07 27340 2
10-JAN-07 27334 1
10-JAN-07 27335 1
10-JAN-07 27336 1
11-JAN-07 27334 0
11-JAN-07 27335 0

11 rows selected.

SQL> spool off;

sys_connect_by_path and Analytics to the rescue

Frank Zhou, March 13, 2007 - 3:19 pm UTC

In addition to my 10G SQL Model Clause solution above,
Here is a 9I pure SQL solution.

Frank

SQL> SELECT date_time ,
2 SUBSTR(fin_str,
3 INSTR(fin_str, ',', 1, LEVEL ) + 1,
4 INSTR(fin_str, ',', 1, LEVEL+1) -
5 INSTR(fin_str, ',', 1, LEVEL) -1 ) validity,
6 grp
7 FROM
8 (SELECT date_time,fin_str||',' as fin_str,
9 dense_rank( ) over (order by fin_str) -1 grp
10 FROM
11 (SELECT date_time, validity, fin_str,
12 MAX(LENGTH(fin_str) - LENGTH(REPLACE(fin_str, ',', '')))
13 OVER (PARTITION BY date_time) max_path,
14 LENGTH(fin_str)-LENGTH(REPLACE(fin_str, ',', '')) str_len
15 FROM (SELECT date_time, validity,
16 sys_connect_by_path(validity, ',') fin_str
17 from test
18 connect by prior date_time = date_time
19 and prior validity < validity
20 )
21 )
22 WHERE max_path = str_len
23 )
24 CONNECT BY PRIOR date_time = date_time
25 AND INSTR (fin_str, ',', 1, LEVEL+1) > 0
26 AND PRIOR dbms_random.string ('p', 10) IS NOT NULL
27 order by date_time, validity;

DATE_TIME VALIDITY GRP
--------- ------------------ ----------
08-JAN-07 27334 1
08-JAN-07 27335 1
08-JAN-07 27336 1
09-JAN-07 27334 2
09-JAN-07 27335 2
09-JAN-07 27340 2
10-JAN-07 27334 1
10-JAN-07 27335 1
10-JAN-07 27336 1
11-JAN-07 27334 0
11-JAN-07 27335 0

11 rows selected.

SQL> spool off;

Easier Analytics to the rescue

Michel Cadot, March 13, 2007 - 3:34 pm UTC

Using the well-known stragg function defined at:
http://asktom.oracle.com/pls/asktom/f?p=100:11:2073682287744121::::P11_QUESTION_ID:2196162600402

SQL> with
  2    data as (
  3      select date_time, validity,
  4             stragg(validity) over (partition by date_time) same_date
  5      from (select date_time, validity from test order by validity, date_time)
  6    )
  7  select date_time, validity, 
  8         dense_rank() over (order by same_date) "GROUP"
  9  from data
 10  order by 1, 2
 11  /
DATE_TIME    VALIDITY      GROUP
---------- ---------- ----------
01/08/2007      27334          3
01/08/2007      27335          3
01/08/2007      27336          3
01/09/2007      27334          1
01/09/2007      27335          1
01/09/2007      27340          1
01/10/2007      27334          3
01/10/2007      27335          3
01/10/2007      27336          3
01/11/2007      27334          2
01/11/2007      27335          2

Regards
Michel

March 13, 2007 - 3:54 pm UTC

you'd need to make sure to use a stragg that SORTS for this to work reliably (and probably "distincts" too)

but yes, why didn't I think of a simple pivot....

so perhaps something even as simple as:

ops$tkyte%ORA10GR2> select date_time,
  2         v1, v2, v3, v4, v5, v6, v7, v8
  3    from (
  4  select date_time,
  5         max(decode(rn,1,validity)) v1,
  6         max(decode(rn,2,validity)) v2,
  7         max(decode(rn,3,validity)) v3,
  8         max(decode(rn,4,validity)) v4,
  9         max(decode(rn,5,validity)) v5,
 10         max(decode(rn,6,validity)) v6,
 11         max(decode(rn,7,validity)) v7,
 12         max(decode(rn,8,validity)) v8
 13    from (
 14  select date_time, row_number() over (partition by date_time order by validity) rn, validity
 15    from test
 16         )
 17   group by date_time
 18         )
 19   order by v1, v2, v3, v4, v5, v6, v7, v8
 20  /

DATE_TIME         V1         V2         V3         V4         V5         V6         V7         V8
--------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
08-JAN-07      27334      27335      27336
10-JAN-07      27334      27335      27336
09-JAN-07      27334      27335      27340

would be useful - you can partition by v1, v2, .... vN

(I know, 8 columns = limit, you can raise limit. stragg = limit as well though, varchar2(4000))

RE: Analytics to the rescue

Frank Zhou, March 13, 2007 - 4:19 pm UTC

Hi Tom,

My two SQl solutions above took care of the "Sorting" ( The 9i pure sql solution also handles the "distinct"). For the sql model clause solution to handle the "Distinct"

The following sql pattern can be used :

http://www.jlcomp.demon.co.uk/faq/Exclude_duplicate.html
Thanks,

Frank

Easier Analytics to the rescue (2)

Michel Cadot, March 13, 2007 - 4:19 pm UTC

Tom,

For your first remark, this is why I sorted the result set before applying stragg but of course it works only if we not PARALLEL_ENABLE the function.

For pivot, I thought about it but the problem is when you "group by date_time" to build the set of values per date you lost "validity" which is required in the final result so joining this with the original test seems unavoidable and we have 2 test table scans instead of 1 with stragg.

Regards
Michel

March 13, 2007 - 8:53 pm UTC

not sure that we can 100% rely on the sort for the aggregate to work.... it might, it might not. there is nothing saying it will (there is a version of stragg that saves the inputs, sorts them and THEN concatenates them...)

group by date_time - gives you a set of rows you can now PARTITION BY v1, v2, v..... on

Easier Analytics to the rescue (3)

Michel Cadot, March 14, 2007 - 3:03 am UTC

You mean grouping, ranking, degrouping; something like:

SQL> with 
  2    data as (
  3      select date_time, validity,
  4             row_number() over (partition by date_time order by validity) rn
  5      from test
  6    ),
  7    sets as (
  8      select date_time, 
  9             max(decode(rn,1,validity)) v1,
 10             max(decode(rn,2,validity)) v2,
 11             max(decode(rn,3,validity)) v3,
 12             max(decode(rn,4,validity)) v4,
 13             max(decode(rn,5,validity)) v5,
 14             max(decode(rn,6,validity)) v6,
 15             max(decode(rn,7,validity)) v7,
 16             max(decode(rn,8,validity)) v8,
 17             max(decode(rn,9,validity)) v9
 18      from data
 19      group by date_time
 20    ),
 21    groups as (
 22      select date_time, v1, v2, v3, v4, v5, v6, v7, v8, v9,
 23             dense_rank() over (order by v1, v2, v3, v4, v5, v6, v7, v8, v9) grp
 24      from sets
 25    ),
 26    lines as ( select rownum line from dual connect by level <= 9 )
 27  select date_time, 
 28         decode(line,1,v1,2,v2,3,v3,4,v4,5,v5,6,v6,7,v7,8,v8,9,v9) validity,
 29         grp "GROUP"
 30  from groups, lines
 31  where decode(line,1,v1,2,v2,3,v3,4,v4,5,v5,6,v6,7,v7,8,v8,9,v9)
 32          is not null
 33  order by 1, 2
 34  /
DATE_TIME    VALIDITY      GROUP
---------- ---------- ----------
01/08/2007      27334          1
01/08/2007      27335          1
01/08/2007      27336          1
01/09/2007      27334          2
01/09/2007      27335          2
01/09/2007      27340          2
01/10/2007      27334          1
01/10/2007      27335          1
01/10/2007      27336          1
01/11/2007      27334          3
01/11/2007      27335          3

I noticed that you seldomly use "with" clause and prefer inline views. Is this just your way of writing SQL or is there any underlying reason?

Regards
Michel

March 14, 2007 - 7:52 am UTC

i would not bother ungrouping again, once the data is inline like that (v1....vn) it is better than good enough :)

it is my way of writing SQL. I typically start with simple query "Q" - wrap it in parentheses, put the next layer on it.

rather than go "down" the page - my queries tend to explode up and down the page - add a bit on top, add a bit on the bottom...

Analytics to the rescue

Livio, March 14, 2007 - 7:59 am UTC

Hi folks,

thank you all for your help.

Eventually, I could figure out a solution based on the fact that a set of integers s = {s1, s2, ... sn} can be thought of as the number 2^s1 + 2^s2 + ...+ 2^sn.

Here's my solution:

select date_time, validity, dns_rnk, sum(power(2, dns_rnk)) over (partition by date_time) set_id from (
select date_time, validity, dense_rank() over (order by validity) - 1  dns_rnk
from test
)
order by day;

DATE_TIME VALIDITY DNS_RNK SET_ID
--------- ---------- ---------- ----------
08-GEN-07 27334 0 7
08-GEN-07 27335 1 7
08-GEN-07 27336 2 7
09-GEN-07 27334 0 11
09-GEN-07 27335 1 11
09-GEN-07 27340 3 11
10-GEN-07 27334 0 7
10-GEN-07 27335 1 7
10-GEN-07 27336 2 7
11-GEN-07 27334 0 3
11-GEN-07 27335 1 3

I use dense_rank() in the inner query to "normalize" validities and avoid a numeric overflow error while computing the sum of the powers of 2.
Yet, should I have a large number of validities for each date, the value of set_id can become very large (something like 1,563E+112 in my original table) making two different sets look "the same". Anyway, I think this is just a matter of formatting the output.

I will compare the proposed solutions with the one I devised
and apply the best that suits my case.

Thanks again

Analytics to the rescue

Michel Cadot, March 14, 2007 - 8:56 am UTC

Hi Livio,

Nicely thought.

Regards
Michel

To Livio ... constraints matter

Gabe, March 14, 2007 - 11:35 pm UTC

<quote>a set of integers s = {s1, s2, ... sn} can be thought of as the number 2^s1 + 2^s2 + ...+ 2^sn</quote>

Your table definition does not have a unique constraint; the potential need to "distinct" has been mentioned and you have not clarified if duplicates can exist (and how it would impact the definition of "set equality").

If they do, your approach will give you false grouping since, for any n, Power(2,n) + power(2,n) = power(2,n+1) ... and it is just one of the combinations that would give you false positives.

Re: To Livio ... constraints matter

Livio, March 15, 2007 - 8:07 am UTC

Hi Gabe;

thanks for your remark.

You're right: I did not clarified about the possibility of having duplicates values or not. Actually, I don't have to cope with duplicates. My observation should have been written more precisely as

<quote>a set of distinct integers s = {s1, s2, ... sn} can be thought of as the number 2^s1 + 2^s2 + ...+
2^sn</quote>

In fact, columns date_time and validity make up the PK of table test.

Regards
Livio

Analytics to the rescue... continue

Michel Cadot, March 15, 2007 - 8:46 am UTC

Livio,

You can change you big numbers to smaller applying the rank function on the set_id:

SQL> with 
  2    data as (
  3      select date_time, validity, 
  4             dense_rank() over (order by validity) - 1 val_rk
  5      from test
  6    ),
  7    groups as (
  8      select date_time, validity,
  9             sum(power(2,val_rk)) over (partition by date_time) set_id
 10      from data
 11    )
 12  select date_time, validity,
 13         dense_rank () over (order by set_id) - 1 "GROUP"
 14  from groups
 15  order by 1, 2
 16  /
DATE_TIME    VALIDITY      GROUP
---------- ---------- ----------
01/08/2007      27334          1
01/08/2007      27335          1
01/08/2007      27336          1
01/09/2007      27334          2
01/09/2007      27335          2
01/09/2007      27340          2
01/10/2007      27334          1
01/10/2007      27335          1
01/10/2007      27336          1
01/11/2007      27334          0
01/11/2007      27335          0

Regards
Michel

Re: Analytics to the rescue... continue

Livio, March 16, 2007 - 5:33 am UTC

Hi Michel;

you're right. One more analytic makes the output more intelligible.

Thanks for your suggestion

Best Regards

Livio

find the ranges

vincenzo, March 23, 2007 - 1:18 pm UTC

Hi Tom,
I have a big table with about 500 million rows:

  
drop table big_table;
create table big_table 
(serial_number number,
product_code char(10),
constraint pk_bt primary key (serial_number)
);


insert into big_table values (1,'product a');
insert into big_table values (2,'product a');
insert into big_table values (3,'product a');
insert into big_table values (6,'product a');
insert into big_table values (7,'product a');
insert into big_table values (10,'product b');
insert into big_table values (11,'product b');
insert into big_table values (12,'product b');
insert into big_table values (13,'product a');


SQL> select * from big_table;

SERIAL_NUMBER PRODUCT_CO
------------- ----------
            1 product a
            2 product a
            3 product a
            6 product a
            7 product a
           10 product b
           11 product b
           12 product b
           13 product a



I need to aggregate the serial_number by ranges and product_code.
The output should look like this:

FIRST_SERIAL LAST_SERIAL PRODUCT_CODE
------------- ------------- ----------
            1             3 product a   
            6             7 product a   
           10            12 product b
           13            13 product a

How can be done using analytic function ?

Thanks

March 26, 2007 - 7:04 am UTC

ops$tkyte%ORA10GR2> select product_code, min(serial_number), max(serial_number)
  2    from (
  3  select product_code, serial_number,
  4         max(grp) over (partition by product_code order by serial_number) maxgrp
  5    from (
  6  select product_code, serial_number,
  7         case when lag(serial_number)
  8                   over (partition by product_code
  9                         order by serial_number) <> serial_number-1
 10                   or
 11                   row_number()
 12                   over (partition by product_code
 13                         order by serial_number) =1
 14              then row_number()
 15                   over (partition by product_code
 16                         order by serial_number)
 17              end grp
 18    from big_table
 19         )
 20         )
 21   group by product_code, maxgrp
 22   order by product_code, maxgrp
 23  /

PRODUCT_CO MIN(SERIAL_NUMBER) MAX(SERIAL_NUMBER)
---------- ------------------ ------------------
product a                   1                  3
product a                   6                  7
product a                  13                 13
product b                  10                 12

4 rows selected.

To vincenzo

Michel Cadot, March 24, 2007 - 5:50 pm UTC

With the assumption that your serial_number is strictly positive (otherwise change the -1 in the nvl function):

SQL> with 
  2    data as (
  3      select serial_number, product_code,
  4             case 
  5             when nvl(lag(serial_number) 
  6                    over (partition by product_code order by serial_number),-1)
  7                  != serial_number-1
  8               then serial_number
  9             end flag
 10      from big_table
 11    ),
 12    grouping as (
 13      select serial_number, product_code,
 14             max(flag) over (partition by product_code order by serial_number) grp
 15      from data
 16    )
 17  select min(serial_number) first_serial,
 18         max(serial_number) last_serial,
 19         max(product_code) product_code
 20  from grouping
 21  group by grp
 22  order by 1
 23  /
FIRST_SERIAL LAST_SERIAL PRODUCT_CO
------------ ----------- ----------
           1           3 product a
           6           7 product a
          10          12 product b
          13          13 product a

4 rows selected.

Regards
Michel

Dates with overlaps

Sudha Bhagavatula, March 26, 2007 - 9:28 am UTC

I have a question regarding lag and lead.

create table t_mbr_enrol ( subr_id varchar2(15),
dep_nbr varchar2(2),
eff_date date,
term_date date);

insert into t_mbr_enrol values('1001','0', TO_DATE('19990101','YYYYMMDD'), TO_DATE('19991231','YYYYMMDD'));
insert into t_mbr_enrol values('1001','0', TO_DATE('20020101','YYYYMMDD'), TO_DATE('20021231','YYYYMMDD'));
insert into t_mbr_enrol values('1001','0', TO_DATE('20030101','YYYYMMDD'), TO_DATE('20031231','YYYYMMDD'));
insert into t_mbr_enrol values('1001','0', TO_DATE('20040101','YYYYMMDD'), TO_DATE('20041231','YYYYMMDD'));
insert into t_mbr_enrol values('1001','0', TO_DATE('20050101','YYYYMMDD'), TO_DATE('20051231','YYYYMMDD'));
insert into t_mbr_enrol values('1001','0', TO_DATE('20060101','YYYYMMDD'), TO_DATE('20061231','YYYYMMDD'));
insert into t_mbr_enrol values('1001','0', TO_DATE('20070101','YYYYMMDD'), TO_DATE('99991231','YYYYMMDD'));
insert into t_mbr_enrol values('1001','1', TO_DATE('20040701','YYYYMMDD'), TO_DATE('20041231','YYYYMMDD'));
insert into t_mbr_enrol values('1001','1', TO_DATE('20050101','YYYYMMDD'), TO_DATE('20051231','YYYYMMDD'));
insert into t_mbr_enrol values('1001','1', TO_DATE('20060101','YYYYMMDD'), TO_DATE('20061231','YYYYMMDD'));
insert into t_mbr_enrol values('1001','1', TO_DATE('20070101','YYYYMMDD'), TO_DATE('99991231','YYYYMMDD'));
insert into t_mbr_enrol values('1001','2', TO_DATE('20050801','YYYYMMDD'), TO_DATE('99991231','YYYYMMDD'));
insert into t_mbr_enrol values('1001','3', TO_DATE('20070101','YYYYMMDD'), TO_DATE('99991231','YYYYMMDD'));

create table t_mdcr_elig ( subr_id varchar2(15),
dep_nbr varchar2(2),
eff_date date);

insert into t_mdcr_elig values('1001','0',TO_DATE('20040301','YYYYMMDD'));
insert into t_mdcr_elig values('1001','1',TO_DATE('20060801','YYYYMMDD'));

select * from t_mbr_enrol;

SUBR_ID DEP_NBR EFF_DATE TERM_DATE
--------------- ------- ----------- -----------
1001 0 01/01/1999 12/31/1999
1001 0 01/01/2002 12/31/2002
1001 0 01/01/2003 12/31/2003
1001 0 01/01/2004 12/31/2004
1001 0 01/01/2005 12/31/2005
1001 0 01/01/2006 12/31/2006
1001 0 01/01/2007 12/31/9999
1001 1 07/01/2004 12/31/2004
1001 1 01/01/2005 12/31/2005
1001 1 01/01/2006 12/31/2006
1001 1 01/01/2007 12/31/9999
1001 2 08/01/2005 12/31/9999
1001 3 01/01/2007 12/31/9999

13 rows selected

select * from t_mdcr_elig;

SUBR_ID DEP_NBR EFF_DATE
--------------- ------- -----------
1001 0 03/01/2004
1001 1 08/01/2006

My final output should be like this:

SUBR_ID DEP_NBR EFF_DATE TERM_DATE
--------------- ------- ----------- -----------
1001 0 01/01/1999 12/31/1999
1001 0 01/01/2002 12/31/2002
1001 0 01/01/2003 12/31/2003
1001 0 01/01/2004 02/29/2004
1001 0 03/01/2004 12/31/2004
1001 0 01/01/2005 12/31/2005
1001 0 01/01/2006 07/31/2006
1001 0 08/01/2006 12/31/2006
1001 0 01/01/2007 12/31/9999
1001 1 07/01/2004 12/31/2004
1001 1 01/01/2005 12/31/2005
1001 1 01/01/2006 07/31/2006
1001 1 08/01/2006 12/31/2006
1001 1 01/01/2007 12/31/9999
1001 2 08/01/2005 07/31/2006
1001 2 08/01/2006 12/31/9999
1001 3 01/01/2007 12/31/9999

Is there a way of doing this in a single sql query with analytics ?

Thanks a lot
Sudha

March 26, 2007 - 11:12 am UTC

how's about you explain the logic behind your output - so we can understand what we are looking at instead of having to try to understand what you might have been thinking...

Analytics

Sudha Bhagavatula, March 26, 2007 - 12:44 pm UTC

The table t_mbr_enrol contains data for a member's enrollment into a medical plan.

The subscriber id is common to all members of a the same contract. A contract can have only one primary subscriber which is ascertained by the dep_nbr (dependent number) = 0. The rest of them are dependents of this primary subscriber. This table contains no overlaps and the date boundaries for each subr_id, dep_nbr are set.

The second table t_mdcr_elig contains the dat when each member became eligible for medicare, when they torn 65 years of age or due to some medical condition.

When this happens, the dates in enrollment need to be sliced further to reflect medicare eligibility.

So in the example given earlier, the final output is like this:

SUBR_ID DEP_NBR EFF_DATE TERM_DATE
--------------- ------- ----------- -----------
1001 0 01/01/1999 12/31/1999 -- no change
1001 0 01/01/2002 12/31/2002 -- no change
1001 0 01/01/2003 12/31/2003 -- no change
1001 0 01/01/2004 02/29/2004 -- last date before medicare for dep nbr = 0
1001 0 03/01/2004 12/31/2004 -- start of medicare eligibility for subscriber
1001 0 01/01/2005 12/31/2005 -- no change
1001 0 01/01/2006 07/31/2006 -- last date before medicare eligibilty of a dependent (dep_nbr = 1)
1001 0 08/01/2006 12/31/2006 -- medicare eligiblity starts for dep nbr = 1
1001 0 01/01/2007 12/31/9999 -- no change
1001 1 07/01/2004 12/31/2004 -- no change
1001 1 01/01/2005 12/31/2005 -- no change
1001 1 01/01/2006 07/31/2006 -- last date before medicare for dep nbr = 1
1001 1 08/01/2006 12/31/2006 -- medicare eligibility for dep nbr = 1
1001 1 01/01/2007 12/31/9999 -- no change
1001 2 08/01/2005 07/31/2006 -- dates split for medicare eligibilty for dep nbr = 1
1001 2 08/01/2006 12/31/9999 -- dates split for medicare eligibilty for dep nbr = 1
1001 3 01/01/2007 12/31/9999 -- no change since this member became active after the splits

I need to split dates for medicare eligibilty.

Thanks
Sudha

March 26, 2007 - 2:11 pm UTC

that does not really explain the logic at all.

MODEL clause ?

Gary, March 27, 2007 - 3:24 am UTC

The logic appears to be...

DECLARE
  CURSOR c_1 is 
   select subr_id, dep_nbr, eff_date, term_date
   from t_mbr_enrol
   order by 1,2,3;
  cursor c_2 
    (p_subr in varchar2, p_start in date, p_end in date) is 
  select eff_date - 1 eff_date
  from t_mdcr_elig
  where subr_id = p_subr
  and eff_date between p_start and p_end
  order by eff_date;
  v_out_line varchar2(100);
BEGIN
  FOR c_r1 in c_1 LOOP
    FOR c_r2 in c_2 
      (c_r1.subr_id, c_r1.eff_date, c_r1.term_date) LOOP
      dbms_output.put_line(c_r1.subr_id||':'||c_r1.dep_nbr||
        ' From '||c_r1.eff_date||' to '||c_r2.eff_date);
      c_r1.eff_date := c_r2.eff_date+1;
    END LOOP;
    dbms_output.put_line(c_r1.subr_id||':'||c_r1.dep_nbr||
     ' From '||c_r1.eff_date||' to '||c_r1.term_date);
  END LOOP;
END;
/

The following SQL is close, but it needs to generate an extra row whenever the split date is present. If they are on 10G, the MODEL clause would be an option

select t1.subr_id, t1.dep_nbr, 
       t1.eff_date, t1.term_date, t2.eff_date split_date
from t_mbr_enrol t1 left outer join t_mdcr_elig t2
      on (t1.subr_id = t2.subr_id
     and t2.eff_date between t1.eff_date and t1.term_date)
order by t1.subr_id, t1.dep_nbr, t1.eff_date, t2.eff_date;

how to get this select

A reader, March 27, 2007 - 5:02 am UTC

Thanks Tom for this web site,

I have the following question

create table t (name varchar2(50), ident varchar2(13))

insert into t values ('Jean','12456');

insert into t values ('Jean','12457');

insert into t values ('Jean','12458');

insert into t values ('Paul','67850');

insert into t values ('Paul','67851');

insert into t values ('Remy','4879');

commit;

I would like to issue a select that will give me:

Jean 12456 -- only once for Jean whatever the ident
Paul 67850 -- only once for Paul whatever the ident
Remy 4879 -- only once for Remy whatever the ident

Thanks a lot

March 27, 2007 - 9:35 am UTC

select name, min(ident) from t group by name;

I have found

A reader, March 27, 2007 - 5:47 am UTC

Tom,

I have found the solution to get my desired select

select * from (
select name, ident, row_number() over (partition by name order by ident) seq
from t
)
where seq = 1

Thanks

March 27, 2007 - 9:35 am UTC

use the infinitely more straightforward query right above

this is an inappropriate use of analytics.

A reader, March 27, 2007 - 10:58 am UTC

and it is more performant.

Thanks very much Tom

Thanks Gary

Sudha Bhagavatula, March 28, 2007 - 9:50 am UTC

Thanks Gary for the input. I managed to get an extra row by doing a union instead of an outer join.

pagination

shay, March 29, 2007 - 3:44 am UTC

hi tom,

further to pagination result set i have another question:
i have this statement
select *
from (
select
id,type_id,
row_number() over
(order by sortby) rn
from wscportal_mpower.wsc where father_id=1382)
where rn between 1 and 16
order by rn
/

and it returns :

ID TYPE_ID RN
---------- ---------- ----------
1087809 0 1
29359 0 2
113 0 3
1064393 0 4
447340 0 5
462072 0 6
1031179 0 7
458014 0 8
1014710 1 9
464113 0 10
1014711 1 11
1066580 1 12
1032922 0 13
827134 0 14
838371 0 15
864671 0 16

16 rows selected.

i would like to ask also on the type_id and if i find 2 rows with type_id=1 i would like to stop the resultset and get the answer. in other words to stop at row 11 (include)?!
is it possible?

thanks a lot

March 30, 2007 - 12:02 pm UTC

no table create
no insert intos
no look

Help with queries

Jayadevan, March 30, 2007 - 6:30 am UTC

Tom,
Could you please help with the following issues?
1)
There is a table x with 2 columns

SQL> desc x
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 CUS_ID                                             NUMBER
 CLMAMT                                             NUMBER

SQL> select * from x;

CUS_ID CLMAMT
---------- ----------
1 13008
3 10500
2 9306
6 4500
4 3002
5 3000
7 3000

7 rows selected.

We have to classify the customers into A,B and C depending on whether the sum total of their claims come to 50%,90% and 10 %. If we do it manually, this is what we will get if we do it in excel

CusID Claim amt  Running total  Running total as % total   ABC class Description
1 13008    13008   28.08532688   A Top customers adding to 50% of total claims
3 10500    23508   50.75567838   A Top customers adding to 50% of total claims
2 9306    32814   70.84808705   B Top customers adding to 90% of total claims
6 4500    37314   80.56395198   B Top customers adding to 90% of total claims
4 3002    40316   87.04551343   B Top customers adding to 90% of total claims
5 3000    43316   93.52275671   C Rest
7 3000    46316   100    C Rest

How can we get this using a query?

There is something wrong with the way I am using a simple sum and over.
It is clubbing the claim amount with the same values and adding it in the first instance itself.

SELECT cus_id, clmamt,
SUM(clmamt)
OVER (ORDER BY clmamt desc ) x
FROM x
1 13008 13008
3 10500 23508
2 9306 32814
6 4500 37314
4 3002 40316
5 3000 46316
7 3000 46316

2) I have a table with 3 columns
tariff, start_date, end_date
There is one record with values
1000 01-Jan-2006 31-Jan-2006

I need to insert into another table records such as
1000 01-Jan-2006
1000 02-Jan-2006
1000 03-Jan-2006
.....
to
1000 31-Jan-2006

i.e. for each date starting with teh start date upto the end date in this table,
insert a record in the target table. Can I do this with one query?

March 30, 2007 - 1:41 pm UTC

1) order by something "distinct" - else - neither of the claimamts "come first", they arrive "together"

order by clmamt desc, rowid

2) yes...

ops$tkyte%ORA10GR2> select * from t;

         X Y         Z
---------- --------- ---------
      1000 01-JAN-07 10-JAN-07
      2000 15-JAN-07 22-JAN-07

ops$tkyte%ORA10GR2> with data
  2  as
  3  (select level l from dual connect by level <= (select max(z-y+1) from t)
  4  )
  5  select t.*,
  6         t.y+data.l-1
  7    from data, t
  8   where data.l <= t.z-t.y+1
  9   order by x, t.y+data.l-1
 10  /

         X Y         Z         T.Y+DATA.
---------- --------- --------- ---------
      1000 01-JAN-07 10-JAN-07 01-JAN-07
      1000 01-JAN-07 10-JAN-07 02-JAN-07
      1000 01-JAN-07 10-JAN-07 03-JAN-07
      1000 01-JAN-07 10-JAN-07 04-JAN-07
      1000 01-JAN-07 10-JAN-07 05-JAN-07
      1000 01-JAN-07 10-JAN-07 06-JAN-07
      1000 01-JAN-07 10-JAN-07 07-JAN-07
      1000 01-JAN-07 10-JAN-07 08-JAN-07
      1000 01-JAN-07 10-JAN-07 09-JAN-07
      1000 01-JAN-07 10-JAN-07 10-JAN-07
      2000 15-JAN-07 22-JAN-07 15-JAN-07
      2000 15-JAN-07 22-JAN-07 16-JAN-07
      2000 15-JAN-07 22-JAN-07 17-JAN-07
      2000 15-JAN-07 22-JAN-07 18-JAN-07
      2000 15-JAN-07 22-JAN-07 19-JAN-07
      2000 15-JAN-07 22-JAN-07 20-JAN-07
      2000 15-JAN-07 22-JAN-07 21-JAN-07
      2000 15-JAN-07 22-JAN-07 22-JAN-07

18 rows selected.

Great

Jayadevan, March 31, 2007 - 3:27 am UTC

Tom,
Thanks a lot for the replies. I am able to do the ABC classification by adding a case statement. I have learned more about Oracle from asktom than from books or training sessions.
Well, some Indians say - 'Cricket is our religion and Tendulkar is our God'. With India crashing out of the world cup, may be Indian Oracle developers can switch to 'Oracle is our religion and Tom is our God :)'.

about running %

wawan, April 02, 2007 - 4:46 am UTC

dear Tom,

create table SCORING
(Name varchar2(5),
Amount number(10,2));

insert into SCORING values('AA',200);
insert into SCORING values('BB',400);
insert into SCORING values('CC',-200);
insert into SCORING values('DD',-300);
insert into SCORING values('EE',400);
insert into SCORING values('FF',500);

commit;

compute sum of Amount on report
break on report

select name,Amount ,
ratio_to_report(amount) over() prctg
from SCORING;

NAME AMOUNT PRCTG
----- ---------- ----------
AA 200 .2
BB 400 .4
CC -200 -.2
DD -300 -.3
EE 400 .4
FF 500 .5
----------
sum 1000

how to make the result is like this ?
the % calculated from the sum of postive amount only
(1500 ---> 200/1500, 400/1500, 500/1500)

NAME AMOUNT PRCTG
----- ---------- ---------- ----- -------
AA 200 .2 200 .1333
BB 400 .4 400 .2666
CC -200 -.2
DD -300 -.3
EE 400 .4 400 .2666
FF 500 .5 500 .3333
---------- -----
sum 1000 1500

I add two last column-result (off course) by typing.

regards

April 03, 2007 - 8:35 pm UTC

ops$tkyte%ORA10GR2> select name,Amount ,
  2  ratio_to_report(case when amount>0 then amount end) over() prctg
  3  from SCORING;

NAME                               AMOUNT      PRCTG
------------------------------ ---------- ----------
AA                                    200 .133333333
BB                                    400 .266666667
CC                                   -200
DD                                   -300
EE                                    400 .266666667
FF                                    500 .333333333

6 rows selected.

To wawan

Michel Cadot, April 02, 2007 - 10:59 am UTC

SQL> select name, amount, 
  2         ratio_to_report(case when amount >= 0 then amount end) over() prctg 
  3  from SCORING; 
NAME      AMOUNT      PRCTG
----- ---------- ----------
AA           200 .133333333
BB           400 .266666667
CC          -200
DD          -300
EE           400 .266666667
FF           500 .333333333

6 rows selected.

Regards
Michel

Thanks

wawan, April 02, 2007 - 9:06 pm UTC

ok. thanks for your query.

To wawan: to get your result

Michel CADOT, April 04, 2007 - 4:30 am UTC

SQL> select case when grouping(name) = 1 then 'Sum' else name end name, 
  2         sum(amount1) amount1, prctg1, sum(amount2) amount2, prctg2 
  3  from (
  4  select name, amount amount1, 
  5         ratio_to_report(amount) over() prctg1, 
  6         case when amount >= 0 then amount end amount2,
  7         ratio_to_report(case when amount >= 0 then amount end) over() prctg2 
  8  from SCORING
  9  )
 10  group by grouping sets ((),(name,prctg1,prctg2))
 11  order by 1
 12  /
NAME     AMOUNT1     PRCTG1    AMOUNT2     PRCTG2
----- ---------- ---------- ---------- ----------
AA           200         .2        200 .133333333
BB           400         .4        400 .266666667
CC          -200        -.2
DD          -300        -.3
EE           400         .4        400 .266666667
FF           500         .5        500 .333333333
Sum         1000                  1500

7 rows selected.

Regards
Michel

Stumped on Analytics!!

Saurabh, April 05, 2007 - 6:43 am UTC

Hi Tom,

This is in response to review "Stumped on Analytics" posted by "Dave Thompson" on 'March 04, 2004'.

I know its too late(ages late) to reply but the case presented was interesting. Besides, I had similar(more complex) situation in one of my projects. Unfortunately, I had to do it on Teradata and Procedural Code was not allowed as a standard (by DBA's). So I gave a shot at Analytics. Finally objective was achieved and performance was worth effort spent but the resulting SQL was too complex to understand and maintain.

For presented case, I think below SQL will do. cusum's are used to achieve looping effect. Analytics rocks!! lot of procedural code can be eliminated using analytics.

select p.pay_id, p.payment, q.prem_payment, q.prem_id, 
   p.cusum, q.cusum,
--   p.prev_cusum, q.prev_cusum,
   case when p.cusum <= q.cusum then p.payment else p.cusum - q.cusum end amt_applied
from
(
select 
pay_id,
payment,
sum(payment) over(order by pay_id rows between unbounded preceding and current row) cusum,
sum(payment) over(order by pay_id rows between unbounded preceding and 1 preceding ) prev_cusum
from pay_m pay
) p , 
(
select prem_id, prem_payment, 
sum(prem_payment) over(order by prem_id  rows between unbounded preceding and current row) cusum,
sum(prem_payment) over(order by prem_id  rows between unbounded preceding and 1 preceding ) prev_cusum
from prem ) q
where nvl(p.prev_cusum,0) < q.cusum
and p.cusum  >=  nvl(q.prev_cusum,0)


    PAY_ID    PAYMENT PREM_PAYMENT    PREM_ID   
---------- ---------- ------------ ---------- 
         1         50          100          1   
         2         25          100          1   
         3         50          100          1   
         3         50           50          2   
         4         50           50          2   
         4         50           50          3

If a grouped output is required, user-defined aggregation function, such as yours STRAGG, can be used.

regards
saurabh

looking at two different prior rows at same time

Kevin Meade, April 09, 2007 - 1:56 pm UTC

--
-- this continues to be cool stuff but I have a variation on analytics I have not seen yet
-- I have read through the analytic post for hours now
--
-- guys in the office had a problem and wanted an all analytic solution
-- essentially they want to look at two different prior rows at the same time
-- (at least I think that is the best way to describe it)
--
--

drop table temp1
/

create table temp1
(
claim_id number
,transtype varchar2(3)
,dollartype varchar2(3)
,transdate date
,amount number
)
/

insert into temp1 values (1,'TEK','USD',to_date('01','dd'),1100);
insert into temp1 values (1,'REV','USD',to_date('02','dd'),1200);
insert into temp1 values (1,'TEK','USD',to_date('04','dd'),1400);
insert into temp1 values (1,'TEK','USD',to_date('05','dd'),1500);
insert into temp1 values (1,'REV','USD',to_date('06','dd'),1600);
insert into temp1 values (1,'REV','USD',to_date('07','dd'),1700);
insert into temp1 values (1,'TEK','USD',to_date('08','dd'),1800);
insert into temp1 values (1,'TEK','USD',to_date('09','dd'),1900);
insert into temp1 values (1,'TEK','USD',to_date('10','dd'),1950);
insert into temp1 values (1,'REV','USD',to_date('11','dd'),2000);

commit
/

select * from temp1 order by transdate
/

/*

CLAIM_ID TRA DOL TRANSDATE AMOUNT
---------- --- --- --------- ----------
1 TEK USD 01-APR-07 1100
1 REV USD 02-APR-07 1200
1 TEK USD 04-APR-07 1400
1 TEK USD 05-APR-07 1500
1 REV USD 06-APR-07 1600
1 REV USD 07-APR-07 1700
1 TEK USD 08-APR-07 1800
1 TEK USD 09-APR-07 1900
1 TEK USD 10-APR-07 1950
1 REV USD 11-APR-07 2000

10 rows selected.
*/

--
-- by claim, for each row, get the amount from the prior REV row
-- get the amount from the prior TEK row
--
-- using scalars it looks like this
--

select a.*
,(
select amount
from temp1 b
where b.claim_id = a.claim_id
and b.transtype = 'TEK'
and b.transdate =
(
select max(c.transdate)
from temp1 c
where c.claim_id = a.claim_id
and c.transdate < a.transdate
and c.transtype = 'TEK'
)
) prior_TEK
,(
select amount
from temp1 b
where b.claim_id = a.claim_id
and b.transtype = 'REV'
and b.transdate =
(
select max(c.transdate)
from temp1 c
where c.claim_id = a.claim_id
and c.transdate < a.transdate
and c.transtype = 'REV'
)
) prior_REV
from temp1 a
order by transdate
/

/*
CLAIM_ID TRA DOL TRANSDATE AMOUNT PRIOR_TEK PRIOR_REV
---------- --- --- --------- ---------- ---------- ----------
1 TEK USD 01-APR-07 1100
1 REV USD 02-APR-07 1200 1100
1 TEK USD 04-APR-07 1400 1100 1200
1 TEK USD 05-APR-07 1500 1400 1200
1 REV USD 06-APR-07 1600 1500 1200
1 REV USD 07-APR-07 1700 1500 1600
1 TEK USD 08-APR-07 1800 1500 1700
1 TEK USD 09-APR-07 1900 1800 1700
1 TEK USD 10-APR-07 1950 1900 1700
1 REV USD 11-APR-07 2000 1950 1700

10 rows selected.
*/

--
-- but they wanted a one pass analytic solution
-- I racked my brain for a couple of days and couldn't figure out how to do this with analytics
-- in one pass. What I need is something like this: (FICTITIOUS WHERE CLAUSE IN THE ANALYTIC)
-- maybe I am just being dense
--

/*
select a.*
,lag(amount) over (partition by claim_id order by transdate WHERE TRANSTYPE = 'REV')
,lag(amount) over (partition by claim_id order by transdate WHERE TRANSTYPE = 'TEK')
from temp1 a
order by transdate
/
*/

--
-- But of course this WHERE clause is not part of the anlaytic syntax so...
-- I gave them the scalars and said if they wanted a one pass analytic solution then
-- they would need someone smarter than me cause I couldn't figure it out
-- to whit they said "what about asking TOM :)", so here I am
--

-- Kevin from CT USA

April 09, 2007 - 2:05 pm UTC

first thing that popped into my head was....

ops$tkyte%ORA10GR2> select claim_id, transtype, dollartype, transdate, amount,
  2         lag(last_tek) over (order by transdate) real_last_tek,
  3         lag(last_rev) over (order by transdate) real_last_rev
  4    from (
  5  select claim_id, transtype, dollartype, transdate, amount,
  6         last_value(case when transtype='TEK' then amount end ignore nulls) over (order by transdate) last_tek,
  7         last_value(case when transtype='REV' then amount end ignore nulls) over (order by transdate) last_rev
  8    from temp1
  9         )
 10   order by transdate
 11  /

  CLAIM_ID TRA DOL TRANSDATE     AMOUNT REAL_LAST_TEK REAL_LAST_REV
---------- --- --- --------- ---------- ------------- -------------
         1 TEK USD 01-APR-07       1100
         1 REV USD 02-APR-07       1200          1100
         1 TEK USD 04-APR-07       1400          1100          1200
         1 TEK USD 05-APR-07       1500          1400          1200
         1 REV USD 06-APR-07       1600          1500          1200
         1 REV USD 07-APR-07       1700          1500          1600
         1 TEK USD 08-APR-07       1800          1500          1700
         1 TEK USD 09-APR-07       1900          1800          1700
         1 TEK USD 10-APR-07       1950          1900          1700
         1 REV USD 11-APR-07       2000          1950          1700

10 rows selected.

carry down on 9i

Scott, April 09, 2007 - 11:41 pm UTC

G'day Tom,

I've been trying to translate your "carry down" example to my seemingly simple problem, but I cannot seem to get it right.

Given

create table swtest (a number, b varchar2(2)); 
insert into swtest values (3,'51'); 
insert into swtest values (4,null); 
insert into swtest values (6,'E9'); 

9i> select * from swtest order by a 

         A B 
---------- -- 
         3 51 
         4 
         6 E9

I want a column C that would return the most recent value of B, eg:

         A B  MI 
---------- -- -- 
         3 51 51 
         4    51 
         6 E9 E9

Something like this sounds kinda right in theory, but doesn't return 51 for the 2nd row.

  1   select a, b,last_value(b) over (order by a rows between unbounded preceding and current row ) 
  2   from swtest 
  3*  order by a 
9i> / 

         A B  LA 
---------- -- -- 
         3 51 51 
         4 
         6 E9 E9

My colleague successfully used

last_value(b IGNORE NULLS) over (order by a range between unbounded preceding and current row )

But I'm on 9i.
Nulls first or any other "order by" fiddling doesn't have an affect on the windowing function.

Any help?

April 10, 2007 - 10:28 am UTC

ops$tkyte%ORA9IR2> select a, b,
  2         substr(max( case when b is not null then to_char(a,'fm0000000000' ) || b end ) over ( order by a ),11) maxb
  3    from swtest
  4  /

         A B  MAX
---------- -- ---
         3 51 51
         4    51
         6 E9 E9

Saurabh, April 10, 2007 - 12:39 am UTC

select a, b, 
max(decode(a,pm, b,null)) over(order by a rows between unbounded preceding and current row)
from (
select a, b,
max(decode(b,null,null,a)) over(order by a rows between unbounded preceding and current row) pm
from swtest 
) x
order by a

         A B  MA
---------- -- --
         3 51 51
         4    51
         5    51
         6 E9 E9

close...

Scott, April 10, 2007 - 1:26 am UTC

Ahh, but if you add

insert into swtest values (7,'60'); 
insert into swtest values (8,NULL);

It's erroneous to my problem. That's relying on the ordering the character set.

I have managed this:

select a, first_value(b) over (partition by r) b 
from (
   select a, b, max(r) over (order by a) r 
   from (
      select a, b, case when b is not null then row_number() over (order by a)  end r
      from swtest
   )
)
order by a;

But I was hoping for something with less inline views, as of course it's ultimately part of a larger query.

Most elegant... that's going in the 9i notebook.

Scott, April 10, 2007 - 7:56 pm UTC

thanks muchly again

Kevin, April 11, 2007 - 4:42 pm UTC

thanks very much for your timely response. I'll look it over in depth.

I am amazed every time I ask you something, how easily you seem to grasp my problem, and how hard it is for me to grasp your solution. hmm.. all of a sudden I am thinking Jethro Tull: you are the wise man and I feel "thick as a brick".

Kevin

analytic function in union query

wawan, April 12, 2007 - 12:49 am UTC

dear Tom , dear All

as a subject ,

create table SCORING1
(Name varchar2(5),
Amount number(10,2));

create table SCORING2
(Name varchar2(5),
Amount number(10,2));

insert into SCORING1 values('AA',200);
insert into SCORING1 values('BB',400);
insert into SCORING1 values('CC',-200);
insert into SCORING2 values('DD',-300);
insert into SCORING2 values('EE',400);
insert into SCORING2 values('FF',500);

commit;

compute sum of Amount on report
break on report

select name,Amount ,
ratio_to_report(amount) over() prctg
from SCORING1
/

NAME AMOUNT PRCTG
----- ---------- ----------
AA 200 .5
BB 400 1
CC -200 -.5
----------
sum 400

select name,Amount ,
ratio_to_report(amount) over() prctg
from SCORING2
/

NAME AMOUNT PRCTG
----- ---------- ----------
DD -300 -.5
EE 400 .666666667
FF 500 .833333333
----------
sum 600

select name,Amount ,
ratio_to_report(amount) over() prctg
from SCORING1
union
select name,Amount ,
ratio_to_report(amount) over() prctg
from SCORING2
/

NAME AMOUNT PRCTG
----- ---------- ----------
AA 200 .5
BB 400 1
CC -200 -.5
DD -300 -.5
EE 400 .666666667
FF 500 .833333333
----------
sum 1000

what is the correct query so the % is
calculated from 1000 for each rows ?

April 13, 2007 - 11:41 am UTC

ops$tkyte%ORA9IR2> compute sum of amount on report
ops$tkyte%ORA9IR2> break on report
ops$tkyte%ORA9IR2> select name, amount, ratio_to_report(amount) over () prctg
  2    from (select * from scoring1 UNION ALL select * from scoring2)
  3  /

NAME                               AMOUNT      PRCTG
------------------------------ ---------- ----------
AA                                    200         .2
BB                                    400         .4
CC                                   -200        -.2
DD                                   -300        -.3
EE                                    400         .4
FF                                    500         .5
                               ----------
sum                                  1000

6 rows selected.

User defined analytic function

Miki, April 16, 2007 - 3:55 pm UTC

Dear Tom,

I have a difficult query to identify rows that member of a continuous constant number (for exapmle 0 traffic for 7 days).
I solved this problem with using couple of analytic functions and subquerys. The idea was to separate the non-0 and the 0 rows and assign a virtual partition number to each row...But I found this method too difficult to understand and genererated a lots of code.
I rethinked my solution and I tried to write a user defined analytic function.
CREATE TABLE "T"
( "TRAFFIC_DATE" DATE,
"MC_ID" NUMBER(*,0),
"TRAFFIC" NUMBER(*,0)
);
insert into T (TRAFFIC_DATE, MC_ID, TRAFFIC)
values (to_date('04-04-2007', 'dd-mm-yyyy'), 1, 0);
insert into T (TRAFFIC_DATE, MC_ID, TRAFFIC)
values (to_date('05-04-2007', 'dd-mm-yyyy'), 1, 0);
insert into T (TRAFFIC_DATE, MC_ID, TRAFFIC)
values (to_date('06-04-2007', 'dd-mm-yyyy'), 1, 100);
insert into T (TRAFFIC_DATE, MC_ID, TRAFFIC)
values (to_date('07-04-2007', 'dd-mm-yyyy'), 1, 0);
insert into T (TRAFFIC_DATE, MC_ID, TRAFFIC)
values (to_date('08-04-2007', 'dd-mm-yyyy'), 1, 0);
insert into T (TRAFFIC_DATE, MC_ID, TRAFFIC)
values (to_date('09-04-2007', 'dd-mm-yyyy'), 1, 0);
insert into T (TRAFFIC_DATE, MC_ID, TRAFFIC)
values (to_date('10-04-2007', 'dd-mm-yyyy'), 1, 0);
insert into T (TRAFFIC_DATE, MC_ID, TRAFFIC)
values (to_date('11-04-2007', 'dd-mm-yyyy'), 1, 0);
insert into T (TRAFFIC_DATE, MC_ID, TRAFFIC)
values (to_date('12-04-2007', 'dd-mm-yyyy'), 1, 0);
insert into T (TRAFFIC_DATE, MC_ID, TRAFFIC)
values (to_date('13-04-2007', 'dd-mm-yyyy'), 1, 0);
insert into T (TRAFFIC_DATE, MC_ID, TRAFFIC)
values (to_date('14-04-2007', 'dd-mm-yyyy'), 1, 0);
insert into T (TRAFFIC_DATE, MC_ID, TRAFFIC)
values (to_date('15-04-2007', 'dd-mm-yyyy'), 1, 3);
insert into T (TRAFFIC_DATE, MC_ID, TRAFFIC)
values (to_date('04-04-2007', 'dd-mm-yyyy'), 2, 2);
insert into T (TRAFFIC_DATE, MC_ID, TRAFFIC)
values (to_date('05-04-2007', 'dd-mm-yyyy'), 2, 0);
insert into T (TRAFFIC_DATE, MC_ID, TRAFFIC)
values (to_date('06-04-2007', 'dd-mm-yyyy'), 2, 0);
insert into T (TRAFFIC_DATE, MC_ID, TRAFFIC)
values (to_date('07-04-2007', 'dd-mm-yyyy'), 2, 0);
insert into T (TRAFFIC_DATE, MC_ID, TRAFFIC)
values (to_date('08-04-2007', 'dd-mm-yyyy'), 2, 100);
insert into T (TRAFFIC_DATE, MC_ID, TRAFFIC)
values (to_date('09-04-2007', 'dd-mm-yyyy'), 2, 100);
insert into T (TRAFFIC_DATE, MC_ID, TRAFFIC)
values (to_date('10-04-2007', 'dd-mm-yyyy'), 2, 0);
insert into T (TRAFFIC_DATE, MC_ID, TRAFFIC)
values (to_date('11-04-2007', 'dd-mm-yyyy'), 2, 0);
insert into T (TRAFFIC_DATE, MC_ID, TRAFFIC)
values (to_date('12-04-2007', 'dd-mm-yyyy'), 2, 0);
insert into T (TRAFFIC_DATE, MC_ID, TRAFFIC)
values (to_date('13-04-2007', 'dd-mm-yyyy'), 2, 0);
insert into T (TRAFFIC_DATE, MC_ID, TRAFFIC)
values (to_date('14-04-2007', 'dd-mm-yyyy'), 2, 0);
insert into T (TRAFFIC_DATE, MC_ID, TRAFFIC)
values (to_date('15-04-2007', 'dd-mm-yyyy'), 2, 0);
insert into T (TRAFFIC_DATE, MC_ID, TRAFFIC)
values (to_date('16-04-2007', 'dd-mm-yyyy'), 2, 2);
commit;
My desired output:
SELECT a.*
,f_normal_day_fg(a.traffic) over(PARTITION BY a.mc_id ORDER BY a.traffic_date ROWS BETWEEN 6 preceding AND 6 following)
FROM t a;
TRAFFIC_DATE MC_ID TRAFFIC F_NORMAL_DAY_FG(A.TRAFFIC)OVER
2007. 04. 04. 1 0 1
2007. 04. 05. 1 0 1
2007. 04. 06. 1 100 1
2007. 04. 07. 1 0 0
2007. 04. 08. 1 0 0
2007. 04. 09. 1 0 0
2007. 04. 10. 1 0 0
2007. 04. 11. 1 0 0
2007. 04. 12. 1 0 0
2007. 04. 13. 1 0 0
2007. 04. 14. 1 0 0
2007. 04. 15. 1 3 1
2007. 04. 04. 2 2 1
2007. 04. 05. 2 0 1
2007. 04. 06. 2 0 1
2007. 04. 07. 2 0 1
2007. 04. 08. 2 100 1
2007. 04. 09. 2 100 1
2007. 04. 10. 2 0 1
2007. 04. 11. 2 0 1
2007. 04. 12. 2 0 1
2007. 04. 13. 2 0 1
2007. 04. 14. 2 0 1
2007. 04. 15. 2 0 1
2007. 04. 16. 2 2 1

My function:
create or replace type F_Normal_Day as object
(
NemNullaForgDB NUMBER,
cnt NUMBER,
static function
ODCIAggregateInitialize(sctx IN OUT F_Normal_Day )
return number,

member function
ODCIAggregateIterate(self IN OUT F_Normal_Day ,
value IN NUMBER )
return number,

member function
ODCIAggregateTerminate(self IN F_Normal_Day,
returnValue OUT NUMBER,
flags IN NUMBER)
return NUMBER,

member function
ODCIAggregateMerge(self IN OUT F_Normal_Day,
ctx2 IN F_Normal_Day)
return NUMBER
)
;
create or replace type body F_Normal_Day
is

static function ODCIAggregateInitialize(sctx IN OUT F_Normal_Day)
return number
is
begin
sctx := F_Normal_Day( 0,0 );
return ODCIConst.Success;
end;

member function ODCIAggregateIterate(self IN OUT F_Normal_Day,
value IN NUMBER )
return number
is
begin
IF VALUE=0 THEN
self.NemNullaForgDB:=self.NemNullaForgDB+1;
ELSE
self.NemNullaForgDB:=0;
END IF;
self.cnt:=greatest(SELF.cnt,SELF.NemNullaForgDB);

RETURN ODCIConst.Success;
end;

member function ODCIAggregateTerminate(self IN F_Normal_Day,
returnValue OUT NUMBER,
flags IN NUMBER)
return number
is
BEGIN

IF SELF.cnt>=7 THEN
returnValue := 0;
ELSE
returnValue := 1;
END IF;

-- returnValue:=SELF.cnt;
-- returnValue:=self.NemNullaForgDB;
return ODCIConst.Success;
end;

member function ODCIAggregateMerge(self IN OUT F_Normal_Day,
ctx2 IN F_Normal_Day)
return number
is
BEGIN
return ODCIConst.Success;
NULL;
end;

end;

CREATE OR REPLACE FUNCTION f_normal_day_fg (input NUMBER) RETURN NUMBER
PARALLEL_ENABLE AGGREGATE USING f_normal_day;

My question is: how can I (if it is possible) to assign the 7 number as parameter to the F_Normal_Day object? I mean, I like to have an analytic funtion like lag() that have 2 parameter. I can not add another parameter to f_normal_day_fg, because it raises an error...

Thanks in advance,
Miki

April 17, 2007 - 9:39 am UTC

aggregate functions - which you are building - take precisely one argument.

You could do something like:

SELECT a.*
,f_normal_day_fg(a.traffic || ',7' ) over(PARTITION BY a.m

perhaps and substr it out - or use an application context and call dbms_session.set_context to the value prior to calling the function from SQL.

calculate % of sum

wawan, April 17, 2007 - 6:14 am UTC

drop table scoring1;
drop table scoring2;

create table SCORING1
(Name varchar2(5),
Location number(2),
Amount number(10,2));

create table SCORING2
(Name varchar2(5),
Location number(2),
Amount number(10,2));

insert into SCORING1 values('AA',1,200);
insert into SCORING1 values('AA',2,300);
insert into SCORING1 values('AA',3,-400);
insert into SCORING1 values('CC',3,500);
insert into SCORING1 values('CC',1,200);
insert into SCORING1 values('CC',2,-200);

insert into SCORING1 values('AA',1,200);
insert into SCORING1 values('AA',2,300);
insert into SCORING1 values('AA',3,-400);
insert into SCORING1 values('CC',3,500);
insert into SCORING1 values('CC',1,200);
insert into SCORING1 values('CC',2,-200);
commit;

col prctg1 format 99.99
col prctg2 format 99.99

compute sum of AAA on name
compute sum of BBB on name
compute sum of prctg1 on name
compute sum of prctg2 on name

break on name skip 1

select A.name,A.location,sum(A.amount) AAA,
case when sum(A.amount)>0 then sum(A.amount) end BBB,
ratio_to_report(case when sum(A.amount)>0 then sum(A.amount) end)
over(partition by A.name) prctg2
from
(
select name,location,amount
from SCORING1
union all
select name,location,amount
from SCORING2
) A
group by A.name, A.location
/

NAME LOCATION AAA BBB PRCTG2
----- ---------- ---------- ---------- ------
AA 1 400 400 .40
2 600 600 .60
3 -800
***** ---------- ---------- ------
sum 200 1000 1.00

CC 1 400 400 .29
2 -400
3 1000 1000 .71
***** ---------- ---------- ------
sum 1000 1400 1.00

how add a new column , which calculate
PRCTG2 * sum of each name : .40* 200 and .60*200 for AA
and .29*1000 and .71*1000 for CC ?

like this :
NAME LOCATION AAA BBB PRCTG2 NEW_COLUMN
----- ---------- ---------- ---------- ------ ----------
AA 1 400 400 .40 80
2 600 600 .60 120
3 -800
***** ---------- ---------- ------ ----------
sum 200 1000 1.00 200

CC 1 400 400 .29 290
2 -400
3 1000 1000 .71 710
***** ---------- ---------- ------ ----------
sum 1000 1400 1.00 1000

its too complicated for me :)

thanks before

April 17, 2007 - 10:13 am UTC

(please discover the CODE button!!! very hard to read)

...
PRCTG2 * sum of each name : .40* 200 and .60*200 for AA
......

where does 200 come from?

calculate query question

wawan, April 17, 2007 - 9:04 pm UTC

col prctg1 format 99.99
col prctg2 format 99.99

compute sum of AAA on name
compute sum of BBB on name
compute sum of prctg1 on name
compute sum of prctg2 on name

break on name skip 1

select A.name,A.location,sum(A.amount) AAA,
case when sum(A.amount)>0 then sum(A.amount) end BBB,
ratio_to_report(case when sum(A.amount)>0 then sum(A.amount) end)
over(partition by A.name) prctg2
from
(
select name,location,amount
from SCORING1
union all
select name,location,amount
from SCORING2
) A
group by A.name, A.location
/

NAME LOCATION AAA BBB PRCTG2
----- ---------- ---------- ---------- ------
AA 1 400 400 .40
2 600 600 .60
3 -800
***** ---------- ---------- ------
sum 200 1000 1.00

CC 1 400 400 .29
2 -400
3 1000 1000 .71
***** ---------- ---------- ------
sum 1000 1400 1.00

how add a new column , which calculate
PRCTG2 * sum of each name : .40* 200 and .60*200 for AA
and .29*1000 and .71*1000 for CC ?

like this :
NAME LOCATION AAA BBB PRCTG2 NEW_COLUMN
----- ---------- ---------- ---------- ------ ----------
AA 1 400 400 .40 80
2 600 600 .60 120
3 -800
***** ---------- ---------- ------ ----------
sum 200 1000 1.00 200

CC 1 400 400 .29 290
2 -400
3 1000 1000 .71 710
***** ---------- ---------- ------ ----------
sum 1000 1400 1.00 1000

its too complicated for me :)

thanks before

the 200 came from sum of amount of each name
sum AA is 200
sum CC is 1000

April 18, 2007 - 11:56 am UTC

ops$tkyte%ORA10GR2> select name,
  2         location,
  3             aaa,
  4             bbb,
  5             prctg2,
  6             prctg2*sum(aaa) over (partition by name) sum_aaa
  7    from (
  8  select A.name,
  9         A.location,
 10             sum(A.amount) AAA,
 11         case when sum(A.amount)>0 then sum(A.amount) end BBB,
 12         ratio_to_report(case when sum(A.amount)>0 then sum(A.amount) end)
 13                 over(partition by A.name) prctg2
 14  from
 15  (
 16  select name,location,amount
 17  from SCORING1
 18  union all
 19  select name,location,amount
 20  from SCORING2
 21  ) A
 22  group by A.name, A.location
 23         )
 24  /

NAME                             LOCATION        AAA        BBB     PRCTG2    SUM_AAA
------------------------------ ---------- ---------- ---------- ---------- ----------
AA                                      1        400        400         .4         80
AA                                      2        600        600         .6        120
AA                                      3       -800
CC                                      1        400        400 .285714286 285.714286
CC                                      2       -400
CC                                      3       1000       1000 .714285714 714.285714

6 rows selected.

asking query

wawan, April 23, 2007 - 2:18 am UTC

Tom,


drop table scoring1;
drop table scoring2;
drop table scoring3;

create table SCORING1
(Name  varchar2(5),
Location number(2),
Amount number(10,2));

create table SCORING2
(Name  varchar2(5),
Location number(2),
Amount number(10,2));

create table SCORING3
(Name  varchar2(5),
Location number(2),
Amount number(10,2));

insert into SCORING1 values('AA',1,-500);
insert into SCORING1 values('AA',2,300);
insert into SCORING1 values('AA',3,400);
insert into SCORING1 values('CC',3,500);
insert into SCORING1 values('CC',1,200);
insert into SCORING1 values('CC',2,-600);

insert into SCORING1 values('AA',1,200);
insert into SCORING1 values('AA',2,500);
insert into SCORING1 values('AA',3,400);
insert into SCORING1 values('CC',3,500);
insert into SCORING1 values('CC',1,400);
insert into SCORING1 values('CC',2,-200);


insert into SCORING3 values('AA',4,500);
insert into SCORING3 values('AA',5,500);
insert into SCORING3 values('CC',6,600);
insert into SCORING3 values('CC',7,300);
insert into SCORING3 values('CC',8,300);
commit;

col prctg1 format 99.99
col prctg2 format 99.99

compute sum of AAA on name
compute sum of BBB on name
compute sum of prctg1 on name
compute sum of prctg2 on name

break on name skip 1

select name,
        location,
            aaa,
            bbb,
            prctg2,
            prctg2*sum(aaa) over (partition by name) sum_aaaXprctg2
   from (
 select A.name,
        A.location,
            sum(A.amount) AAA,
        case when sum(A.amount)>0 then sum(A.amount) end BBB,
        ratio_to_report(case when sum(A.amount)>0 then sum(A.amount) end)
                over(partition by A.name) prctg2
 from
 (
 select name,location,amount
 from SCORING1
 union all
 select name,location,amount
 from SCORING2
 ) A
 group by A.name, A.location
        )
/

select name,location,ratio_to_report(amount)
 over(partition by name) prctg3 from scoring3
/

Commit complete.


NAME    LOCATION        AAA        BBB PRCTG2 SUM_AAAXPRCTG2
----- ---------- ---------- ---------- ------ --------------
AA             1       -300
               2        800        800    .50            650
               3        800        800    .50            650
*****            ---------- ---------- ------
sum                    1300       1600   1.00

CC             1        600        600    .38            300
               2       -800
               3       1000       1000    .63            500
*****            ---------- ---------- ------
sum                     800       1600   1.00


6 rows selected.


NAME    LOCATION     PRCTG3
----- ---------- ----------
AA             4         .5
               5         .5

CC             6         .5
               7        .25
               8        .25


how to make a query with result like this ?

NAME  LOCATION    AAA     BBB PRCTG2 SUM_AAA  new_col prctg3
----- -------- ------ ------- ------ -------  ------- ------
AA           1   -300
             2    800     800    .50     650
             3    800     800    .50     650
             4                                 325     .5
             5                                 325     .5
*****          ------ ------- ------ -------   ------
sum              1300    1600   1.00


CC           1    600     600    .38     300
             2   -800
             3   1000    1000    .63     500
             6                                  250    .5
             7                                  125    .25
             8                                  125    .25
*****          ------ ------- ------
sum               800    1600   1.00


the new column is get from amount in location 3 * prctg3,
for the location 4,5 and so on.

regards

Analytics

Vamsi Krishna, April 27, 2007 - 2:31 am UTC

Tom you are the BEST.

I am exploring the world of Oracle with your invaluable support.

Thanks TOM Keep going........

Using SQL to convert nulls to the value

ht, May 01, 2007 - 12:37 am UTC

Hello Tom,
I've searched your site and am not sure if this is the correct place to ask this question so my apologies up front. I've been stuck on this concept for some time. All I want to do is to use sql (as opposed to pl/sql) to display (and not store) a row's previous value if it is null.

For instance, in the snippet below, I provide a script that adds the emp.week. column. I then populate the column using pl/sql. Then, I query the table for empno=7902.

The resulting output is close to what I want but instead of:

40 7902 8092
41
42 7902 8094
43
44 7902 8096
45
46 7902 8098

I would like:
40 7902 8092
41 7902 8092
42 7902 8094
43 7902 8094
44 7902 8096
45 7902 8096
46 7902 8098

The idea is I would like to save "space" in my database by not storing rows that, when checked, were the previous value (which could also be < the previous value). But I would like to provide a report like the one above to compare empno commissions. The actual output is just a sample, I'm really storing the growth of extents over time in multiple databases.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options

Connected.
desc emp
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 EMPNO                                     NOT NULL NUMBER(4)
 ENAME                                              VARCHAR2(10)
 JOB                                                VARCHAR2(9)
 MGR                                                NUMBER(4)
 HIREDATE                                           DATE
 SAL                                                NUMBER(7,2)
 COMM                                               NUMBER(7,2)
 DEPTNO                                             NUMBER(2)


alter table emp disable constraint pk_emp;

Table altered.


alter table emp add week number;

Table altered.


declare
  2  x       number;
  3  y       number;
  4  z       number;
  5  begin
  6  for i in 1 .. 52 loop
  7          for j in (select distinct empno from emp) loop
  8                  -- every other week for some empnos
  9                  y:=mod(j.empno,2);
 10                  x:=mod(i,2);
 11                  if y=0 then
 12                          if x=0 then
 13                                  insert into emp (empno,comm,week)values(j.empno,j.empno+150+i,i);
 14                          end if;
 15                  end if;
 16                  -- every week for some empnos
 17                  z:=mod(j.empno,2);
 18                  if z!=0 then
 19                          insert into emp (empno,comm,week)values(j.empno,j.empno-175-i,i);
 20                  end if;
 21          end loop;
 22  end loop;
 23  end;
 24  /

PL/SQL procedure successfully completed.


--select week,empno,comm from emp where week is not null order by week,empno;

/*
select x.week,x.empno,sum(comm) commission4week from emp x where week is not null
group by x.week,x.empno
order by x.week,x.empno;
*/

select distinct e.week,xyz.empno,xyz.commission4week from emp e ,
  2  (select x.week,x.empno,sum(comm) commission4week from emp x where week is not null
  3  and x.empno=7902
  4  group by x.week,x.empno) xyz
  5  where e.week=xyz.week(+)
  6  order by week,empno;

      WEEK      EMPNO COMMISSION4WEEK
---------- ---------- ---------------
         1
         2       7902            8054
         3
         4       7902            8056
         5
         6       7902            8058
         7
         8       7902            8060
         9
        10       7902            8062
        11
        12       7902            8064
        13
        14       7902            8066
        15
        16       7902            8068
        17
        18       7902            8070
        19
        20       7902            8072
        21
        22       7902            8074
        23
        24       7902            8076
        25
        26       7902            8078
        27
        28       7902            8080
        29
        30       7902            8082
        31
        32       7902            8084
        33
        34       7902            8086
        35
        36       7902            8088
        37
        38       7902            8090
        39
        40       7902            8092
        41
        42       7902            8094
        43
        44       7902            8096
        45
        46       7902            8098
        47
        48       7902            8100
        49
        50       7902            8102
        51
        52       7902            8104


53 rows selected.


desc emp
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 EMPNO                                              NUMBER(4)
 ENAME                                              VARCHAR2(10)
 JOB                                                VARCHAR2(9)
 MGR                                                NUMBER(4)
 HIREDATE                                           DATE
 SAL                                                NUMBER(7,2)
 COMM                                               NUMBER(7,2)
 DEPTNO                                             NUMBER(2)
 WEEK                                               NUMBER


delete emp where week is not null;

468 rows deleted.


alter table emp drop column week;

Table altered.


desc emp
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 EMPNO                                              NUMBER(4)
 ENAME                                              VARCHAR2(10)
 JOB                                                VARCHAR2(9)
 MGR                                                NUMBER(4)
 HIREDATE                                           DATE
 SAL                                                NUMBER(7,2)
 COMM                                               NUMBER(7,2)
 DEPTNO                                             NUMBER(2)


alter table emp enable constraint pk_emp;

Table altered.

exit;

May 01, 2007 - 9:52 am UTC

last_value analytic function with the new in 10g option of "ignore nulls" is what you want:

scott%ORA10GR2> select empno, comm, last_value( comm ignore nulls ) over (order by empno) carry_down
  2  from emp
  3  order by empno;

     EMPNO       COMM CARRY_DOWN
---------- ---------- ----------
      7369
      7499        300        300
      7521        500        500
      7566                   500
      7654       1400       1400
      7698                  1400
      7782                  1400
      7788                  1400
      7839                  1400
      7844          0          0
      7876                     0
      7900                     0
      7902                     0
      7934                     0

14 rows selected.

Works great!

ht, May 01, 2007 - 1:42 pm UTC

Tom,
It's amazing that we can be stuck on a problem for a few days, "give up", submit it to you, go to bed, and wake up with a solution that not only avoids pl/sql but is quicker than the flawwed sql I had settled on.
This solution will save me gigabytes of storage since I don't have to store a value if it hasn't changed since the last check. Also, it saves storage because I don't have to create a table that would have periodically stored my summarized data (using slower pl/sql) so queries would run acceptably.
Thank you.

reader

A reader, May 08, 2007 - 4:12 pm UTC

Using analytical function, is it possible match two adjacent rows and print just the column and values that has changed during update

For example, if I use CDC (Change Data Capture), it prints out the before and after image of a row that has been updated. Ex

UU 123 456 789 101 102
UN 123 555 789 101 102

In the above example UU is the before imaage and UN is the after image. The row has four columns

This row has been updated. But only one column-value has been change i-e column2 value changed from 456 to 555

Is there a way to scan the subscriber view to identify , the column and vlaue that has been changed and print it

May 11, 2007 - 8:36 am UTC

do you have

a) a 'primary key'
b) a flag that says "this is before, this is after"

if so,

select pk, 
       decode( old_c1, new_c1, cast( null as <datatype> ), new_c1 ) c1,
       ....
  from (
select pk, 
       max( decode( flag, 'before', c1 ) ) old_c1,
       max( decode( flag, 'after',  c1 ) ) new_c1,
       ....
  from t
 group by pk
       )

Avoiding a full table scan wgeb using outer joins?

ht, May 10, 2007 - 5:45 am UTC

Tom,
Is it possible to avoid the full table scan occuring below? After implementing your carry_over technique above, it takes over 5 hours to run this query for a specific instance using my "production data".
I believe the predicate causing the issue is:
WHERE e.batch_id = xyz.batch_id(+)

Is there a way to get the output desired and prevent a full table scan?

Thanks again.

SQL*Plus: Release 10.2.0.1.0 - Production on Thu May 10 02:35:41 2007

Copyright (c) 1982, 2005, Oracle.  All rights reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options

Connected.
SELECT DISTINCT
  2          e.batch_id,
  3          sum(xyz.total)size_mb,
  4          last_value(sum(xyz.total) ignore nulls) over(ORDER BY e.batch_id) carry_over
  5  FROM
  6          testeextent e,
  7          (
  8          SELECT
  9                   x.batch_id,
 10                  SUM(x.size_mb) total
 11          FROM
 12                  testeextent x
 13          WHERE
 14                  x.instance_id =7383
 15          GROUP BY
 16                  x.batch_id
 17          )xyz
 18  WHERE e.batch_id = xyz.batch_id(+)
 19  group by e.batch_id
 20  ORDER BY e.batch_id;

  BATCH_ID    SIZE_MB CARRY_OVER
---------- ---------- ----------
         1          4          4
         2         12         12
         3                    12
         4                    12
         5         40         40
         6         60         60
         7                    60
         8         68         68

8 rows selected.

Elapsed: 00:00:00.02

Execution Plan
----------------------------------------------------------
Plan hash value: 137322112

--------------------------------------------------------------------------------
----------

| Id  | Operation             | Name             | Rows  | Bytes | Cost (%CPU)|
Time     |

--------------------------------------------------------------------------------
----------

|   0 | SELECT STATEMENT      |                  |     5 |    85 |    17  (12)|
00:00:01 |

|   1 |  WINDOW BUFFER        |                  |     5 |    85 |    17  (12)|
00:00:01 |

|   2 |   SORT GROUP BY       |                  |     5 |    85 |    17  (12)|
00:00:01 |

|*  3 |    HASH JOIN OUTER    |                  |     8 |   136 |    16   (7)|
00:00:01 |

|   4 |     TABLE ACCESS FULL | TESTEEXTENT      |     8 |    16 |    14   (0)|
00:00:01 |

|   5 |     VIEW              |                  |     5 |    75 |     1   (0)|
00:00:01 |

|   6 |      HASH GROUP BY    |                  |     5 |    35 |     1   (0)|
00:00:01 |

|*  7 |       INDEX RANGE SCAN| TESTEEXTENT_IDX1 |     5 |    35 |     1   (0)|
00:00:01 |

--------------------------------------------------------------------------------
----------


Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("E"."BATCH_ID"="XYZ"."BATCH_ID"(+))
   7 - access("X"."INSTANCE_ID"=7383)
       filter("X"."INSTANCE_ID"=7383)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         23  consistent gets
          0  physical reads
          0  redo size
        783  bytes sent via SQL*Net to client
        469  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
          8  rows processed

May 11, 2007 - 10:38 am UTC

umm, think about this please..

how would anything short of a full scan be even remotely efficient here?

  5  FROM
  6          testeextent e,
  7          (
  8          SELECT
  9                   x.batch_id,
 10                  SUM(x.size_mb) total
 11          FROM
 12                  testeextent x
 13          WHERE
 14                  x.instance_id =7383
 15          GROUP BY
 16                  x.batch_id
 17          )xyz
 18  WHERE e.batch_id = xyz.batch_id(+)
 19  group by e.batch_id
 20  ORDER BY e.batch_id;

you are getting - by definition - every thing from testeextent - E is retrieved in FULL

tell me, what do you think should happen in this query?

Analytic question

A reader, May 11, 2007 - 9:49 am UTC

Hi Tom,

Thanks for the book expert one on one I've bought. I read carrefuly Chapter 12 : Analytic Functions and may be I've found an error in the explanation (page 557)
----------------------------------
Here, if we look at CLARK again, since we understand........ These are the values of CLARK's salary and the rows preceeding (isn't following instead?)
-----------------
Anyway, I have a question

CREATE TABLE T1
(
TYP_TAR VARCHAR2(3),
FAG_SLC VARCHAR2(1),
TYP_CAT VARCHAR2(2),
PCT_FEE NUMBER(6,4),
PCT_RED NUMBER(6,4),
AMT_MAX_CND_FEE NUMBER(38,17),
AMT_MIN_CND_FEE NUMBER(38,17),
DAT_START_CND DATE,
DAT_END_CND DATE,
SLC_LWR NUMBER(38,17),
SLC_UPR NUMBER(38,17),
TYP_XTR_FAC VARCHAR2(3),
XTR_FAC_VAL VARCHAR2(250),
FAG_STD VARCHAR2(1),
FAG_DONE VARCHAR2(1),
PRC_CTT_BKA_VAL VARCHAR2(1),
AMT_MAX_TAR_FEE NUMBER(38,17),
FAG_APE VARCHAR2(1),
IDE NUMBER(38),
TPER_TYP_PDI VARCHAR2(2)
)
INSERT INTO t1
(typ_tar, fag_slc, typ_cat, pct_fee, pct_red, amt_max_cnd_fee,
amt_min_cnd_fee,
dat_start_cnd,
dat_end_cnd,
slc_lwr, slc_upr, typ_xtr_fac, xtr_fac_val, fag_std, fag_done,
prc_ctt_bka_val, amt_max_tar_fee, fag_ape, ide, tper_typ_pdi
)
VALUES ('02', 'N', '01', 1, NULL, 999999,
40,
TO_DATE ('01/01/1400 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'),
TO_DATE ('01/01/2900 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'),
999999, 25000, '02', 'Br/PB', 'Y', 'N',
NULL, 999999, 'Y', 1, '00'
);

INSERT INTO t1
(typ_tar, fag_slc, typ_cat, pct_fee, pct_red, amt_max_cnd_fee,
amt_min_cnd_fee,
dat_start_cnd,
dat_end_cnd,
slc_lwr, slc_upr, typ_xtr_fac, xtr_fac_val, fag_std, fag_done,
prc_ctt_bka_val, amt_max_tar_fee, fag_ape, ide, tper_typ_pdi
)
VALUES ('02', 'N', '01', 14.5, NULL, 999999,
40,
TO_DATE ('01/01/1400 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'),
TO_DATE ('01/01/2900 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'),
999999, 25000, '02', 'Br/PB', 'Y', 'N',
NULL, 999999, 'Y', 1, '00'
);

INSERT INTO t1
(typ_tar, fag_slc, typ_cat, pct_fee, pct_red, amt_max_cnd_fee,
amt_min_cnd_fee,
dat_start_cnd,
dat_end_cnd,
slc_lwr, slc_upr, typ_xtr_fac, xtr_fac_val, fag_std, fag_done,
prc_ctt_bka_val, amt_max_tar_fee, fag_ape, ide, tper_typ_pdi
)
VALUES ('02', 'N', '01', 1, NULL, 999999,
40,
TO_DATE ('01/01/1400 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'),
TO_DATE ('01/01/2900 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'),
25000, 999999, '02', 'Br/PB', 'Y', 'N',
NULL, 999999, 'Y', 2, '00'
);

INSERT INTO t1
(typ_tar, fag_slc, typ_cat, pct_fee, pct_red, amt_max_cnd_fee,
amt_min_cnd_fee,
dat_start_cnd,
dat_end_cnd,
slc_lwr, slc_upr, typ_xtr_fac, xtr_fac_val, fag_std, fag_done,
prc_ctt_bka_val, amt_max_tar_fee, fag_ape, ide, tper_typ_pdi
)
VALUES ('02', 'N', '02', 0.5, NULL, 999999,
30,
TO_DATE ('01/01/1400 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'),
TO_DATE ('01/01/2900 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'),
25000, 999999, '02', 'HB', 'Y', 'N',
NULL, 999999, 'Y', 8, '00'
);

INSERT INTO t1
(typ_tar, fag_slc, typ_cat, pct_fee, pct_red, amt_max_cnd_fee,
amt_min_cnd_fee,
dat_start_cnd,
dat_end_cnd,
slc_lwr, slc_upr, typ_xtr_fac, xtr_fac_val, fag_std, fag_done,
prc_ctt_bka_val, amt_max_tar_fee, fag_ape, ide, tper_typ_pdi
)
VALUES ('01', 'N', '08', 0.0011, NULL, 999999,
40,
TO_DATE ('01/01/1400 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'),
TO_DATE ('01/01/2900 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'),
999999, 999999, '###', '######', 'Y', 'N',
NULL, 999999, 'Y', 20, '00'
);
INSERT INTO t1
(typ_tar, fag_slc, typ_cat, pct_fee, pct_red, amt_max_cnd_fee,
amt_min_cnd_fee,
dat_start_cnd,
dat_end_cnd,
slc_lwr, slc_upr, typ_xtr_fac, xtr_fac_val, fag_std, fag_done,
prc_ctt_bka_val, amt_max_tar_fee, fag_ape, ide, tper_typ_pdi
)
VALUES ('01', 'N', '08', 0.0012, NULL, 999999,
40,
TO_DATE ('01/01/1400 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'),
TO_DATE ('01/01/2900 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'),
999999, 999999, '###', '######', 'Y', 'N',
NULL, 999999, 'Y', 20, '00'
);

INSERT INTO t1
(typ_tar, fag_slc, typ_cat, pct_fee, pct_red, amt_max_cnd_fee,
amt_min_cnd_fee,
dat_start_cnd,
dat_end_cnd,
slc_lwr, slc_upr, typ_xtr_fac, xtr_fac_val, fag_std, fag_done,
prc_ctt_bka_val, amt_max_tar_fee, fag_ape, ide, tper_typ_pdi
)
VALUES ('01', 'N', '08', 0.0009, NULL, 999999,
40,
TO_DATE ('01/01/1400 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'),
TO_DATE ('01/01/2900 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'),
999999, 999999, '###', '######', 'Y', 'Y',
NULL, 999999, 'Y', 20, '00'
);

I would like to select several rows plus the min(pct_fee) as in the below select

select ide
,prc_ctt_bka_val
,amt_min_cnd_fee
,fag_slc
,dat_start_cnd
,dat_end_cnd
,fag_ape
,typ_tar
,typ_cat
,slc_lwr
,slc_upr
,typ_xtr_fac
,xtr_fac_val
,min(pct_fee) over( partition by typ_tar, typ_cat, slc_lwr, slc_upr, typ_xtr_fac,xtr_fac_val) min_pct_fee
from t1
where fag_done = 'N'

My problem is that I an getting two rows for ide 1 and 20. I would like to get only one row when there exist more than one ide. Distinct is not what I want

Is it possible to do this using analytics ?

Thanks

May 11, 2007 - 11:38 am UTC

i don't understand what you are looking for.

if you want one row by some key - you are looking for GROUP BY, not analytics.

Maybe a summary table can be used to prevent the full table scan?

ht, May 11, 2007 - 12:01 pm UTC

Tom,
Thanks for the info. I was hoping there was a way to avoid the full table scan the outer join seems to be causing. The query takes over 5 hours so I was hoping I could somehow tune the query.

I'm going to test retrieving distinct batch_ids into another table to see if that speeds up the query. That will probably not help though. Please let me know if you know of a more efficient method to retrieve this data.

Thanks again.

ht

May 11, 2007 - 12:07 pm UTC

no, look at your query

the outer join has NOTHING AT ALL to do with the full scan

you need each and ever row from that table, all of them, each and every single one.

the outer join is NOT causing the full scan.

Your question mandates a full scan.

why do you think, how do you think a full scan can be avoided?

could a summary table that grouped up the table by batch_id be used? sure, but now you have to create it, maintain it.

are you sure you have the right QUERY here? do you understand what this query you are showing us does?

remove the outer join, just query testeevent itself once (b). Now, you have

select
from table
group by col

please, explain how you would "avoid a full scan"

Avoid scanning 832k recs?

ht, May 11, 2007 - 2:21 pm UTC

Tom,
You are absolutely correct. I should not have focused on the full table scan but the actual query elapsed time. I guess my question should be "Is there any other way (besides inserting batch_ids into another table when a batch_id is created) to improve the query elapsed time below?".

select count(*) from testextent2;

  COUNT(*)
----------
    823205

1 row selected.

Elapsed: 00:00:00.66
select count(*) from testbatches2;

  COUNT(*)
----------
        51

1 row selected.

Elapsed: 00:00:00.01

set autotrace on;
SELECT DISTINCT
  2          e.batch_id,
  3          sum(xyz.total)size_mb,
  4          last_value(sum(xyz.total) ignore nulls) over(ORDER BY e.batch_id) carry_over
  5  FROM
  6          testbatches2 e,
  7          (
  8          SELECT
  9                   x.batch_id,
 10                  SUM(x.size_mb) total
 11          FROM
 12                  testextent2 x
 13          WHERE
 14                  x.instance_id =7383
 15          GROUP BY
 16                  x.batch_id
 17          )xyz
 18  WHERE e.batch_id = xyz.batch_id(+)
 19  group by e.batch_id
 20  ORDER BY e.batch_id;

  BATCH_ID    SIZE_MB CARRY_OVER
---------- ---------- ----------
       800      78545      78545
       820      47232      47232
       840      34587      34587
       860      40894      40894
       880      28310      28310
       900      22759      22759
       920      36445      36445
       921      26179      26179
       940      29027      29027
       941      36768      36768
       960      35733      35733
       980      35377      35377
      1000      25364      25364
      1020      39610      39610
      1040      31137      31137
      1041      35676      35676
      1060      34423      34423
      1061      29731      29731
      1080      26837      26837
      1100      34240      34240
      1120      40619      40619
      1140      18641      18641
      1160      45161      45161
      1180      23378      23378
      1200      17084      17084
      1220      19251      19251
      1240      25816      25816
      1260      17200      17200
      1280      20338      20338
      1300                 20338
      1320      36871      36871
      1340      33504      33504
      1360      33514      33514
      1380      47218      47218
      1400      28894      28894
      1420      34509      34509
      1440      36182      36182
      1460      30872      30872
      1480      34677      34677
      1500      30393      30393
      1520       5841       5841
      1540      10515      10515
      1560      10644      10644
      1561       2503       2503
      1562      23232      23232
      1580      25329      25329
      1581      45908      45908
      1600      36838      36838
      1620      43275      43275
      1640      42380      42380
      1660      37154      37154

51 rows selected.

<b>Elapsed: 00:00:00.06</b>

Execution Plan
----------------------------------------------------------
Plan hash value: 575510713

--------------------------------------------------------------------------------
----------

| Id  | Operation             | Name             | Rows  | Bytes | Cost (%CPU)|
Time     |

--------------------------------------------------------------------------------
----------

|   0 | SELECT STATEMENT      |                  |    51 |  1479 |    53   (4)|
00:00:01 |

|   1 |  WINDOW BUFFER        |                  |    51 |  1479 |    53   (4)|
00:00:01 |

|   2 |   SORT GROUP BY       |                  |    51 |  1479 |    53   (4)|
00:00:01 |

|*  3 |    HASH JOIN OUTER    |                  |    51 |  1479 |    52   (2)|
00:00:01 |

<b>|   4 |     TABLE ACCESS FULL | TESTBATCHES2     |    51 |   663 |     3   (0)|
00:00:01 |</b>

|   5 |     VIEW              |                  |    51 |   816 |    48   (0)|
00:00:01 |

|   6 |      HASH GROUP BY    |                  |    51 |   408 |    48   (0)|
00:00:01 |

|*  7 |       INDEX RANGE SCAN| TESTEXTENT2_IDX1 |  5488 | 43904 |    48   (0)|
00:00:01 |

--------------------------------------------------------------------------------
----------


Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("E"."BATCH_ID"="XYZ"."BATCH_ID"(+))
   7 - access("X"."INSTANCE_ID"=7383)
       filter("X"."INSTANCE_ID"=7383)

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
         28  recursive calls
          0  db block gets
        123  consistent gets
          1  physical reads
          0  redo size
       2055  bytes sent via SQL*Net to client
        502  bytes received via SQL*Net from client
          5  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
         51  rows processed

set autotrace off;
--------------------------------------------------------------
--------------------------------------------------------------
-- test #1
--------------------------------------------------------------

select count(*) from testextent2;

  COUNT(*)
----------
    823205

1 row selected.

Elapsed: 00:00:00.53

set autotrace on;
SELECT DISTINCT
  2          e.batch_id,
  3          sum(xyz.total)size_mb,
  4          last_value(sum(xyz.total) ignore nulls) over(ORDER BY e.batch_id) carry_over
  5  FROM
  6          testextent2 e,
  7          (
  8          SELECT
  9                   x.batch_id,
 10                  SUM(x.size_mb) total
 11          FROM
 12                  testextent2 x
 13          WHERE
 14                  x.instance_id =7383
 15          GROUP BY
 16                  x.batch_id
 17          )xyz
 18  WHERE e.batch_id = xyz.batch_id(+)
 19  group by e.batch_id
 20  ORDER BY e.batch_id;

  BATCH_ID    SIZE_MB CARRY_OVER
---------- ---------- ----------
       800 6.2573E+10 6.2573E+10
       820   52474752   52474752
       840   40017159   40017159
       860   13290550   13290550
       880    7219050    7219050
       900    4005584    4005584
       920    7216110    7216110
       921    3926850    3926850
       940    9259613    9259613
       941    8493408    8493408
       960    8790318    8790318
       980   35978409   35978409
      1000    7380924    7380924
      1020    9466790    9466790
      1040    6351948    6351948
      1041    7741692    7741692
      1060    8261520    8261520
      1061   10584236   10584236
      1080    3918202    3918202
      1100   25816960   25816960
      1120   19294025   19294025
      1140    4063738    4063738
      1160   25606287   25606287
      1180   27819820   27819820
      1200   11958800   11958800
      1220   17961183   17961183
      1240    7512456    7512456
      1260    8514000    8514000
      1280    4738754    4738754
      1300               4738754
      1320   11872462   11872462
      1340    8778048    8778048
      1360    8646612    8646612
      1380   69032716   69032716
      1400    8148108    8148108
      1420    7453944    7453944
      1440   10963146   10963146
      1460   75018960   75018960
      1480   12206304   12206304
      1500    6230565    6230565
      1520    4485888    4485888
      1540   10262640   10262640
      1560    2182020    2182020
      1561    1056266    1056266
      1562   15542208   15542208
      1580   46605360   46605360
      1581   17307316   17307316
      1600   16282396   16282396
      1620   58377975   58377975
      1640   14027780   14027780
      1660    7802340    7802340

51 rows selected.

<b>Elapsed: 00:00:03.90</b>

Execution Plan
----------------------------------------------------------
Plan hash value: 698401733

--------------------------------------------------------------------------------
-------------------

| Id  | Operation              | Name             | Rows  | Bytes |TempSpc| Cost
 (%CPU)| Time     |

--------------------------------------------------------------------------------
-------------------

|   0 | SELECT STATEMENT       |                  |    51 |   969 |       |  547
1  (12)| 00:01:06 |

|   1 |  WINDOW BUFFER         |                  |    51 |   969 |       |  547
1  (12)| 00:01:06 |

|   2 |   SORT GROUP BY NOSORT |                  |    51 |   969 |       |  547
1  (12)| 00:01:06 |

|   3 |    MERGE JOIN OUTER    |                  |    39M|   712M|       |  547
1  (12)| 00:01:06 |

|   4 |     SORT JOIN          |                  |   823K|  2411K|    18M|  490
8   (3)| 00:00:59 |

<b>|   5 |      TABLE ACCESS FULL | TESTEXTENT2      |   823K|  2411K|       |  262
8   (2)| 00:00:32 |</b>

|*  6 |     SORT JOIN          |                  |    51 |   816 |       |    4
9   (3)| 00:00:01 |

|   7 |      VIEW              |                  |    51 |   816 |       |    4
8   (0)| 00:00:01 |

|   8 |       HASH GROUP BY    |                  |    51 |   408 |       |    4
8   (0)| 00:00:01 |

|*  9 |        INDEX RANGE SCAN| TESTEXTENT2_IDX1 |  5488 | 43904 |       |    4
8   (0)| 00:00:01 |

--------------------------------------------------------------------------------
-------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   6 - access("E"."BATCH_ID"="XYZ"."BATCH_ID"(+))
       filter("E"."BATCH_ID"="XYZ"."BATCH_ID"(+))
   9 - access("X"."INSTANCE_ID"=7383)
       filter("X"."INSTANCE_ID"=7383)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      13427  consistent gets
          0  physical reads
          0  redo size
       2162  bytes sent via SQL*Net to client
        502  bytes received via SQL*Net from client
          5  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
         51  rows processed

set autotrace off;
--------------------------------------------------------------
exit;
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options

May 11, 2007 - 3:41 pm UTC

what is unacceptable here?

Takes 5 hours.

ht, May 11, 2007 - 3:57 pm UTC

Tom,
The scan of 800k+ recs in the second example causes the superset of data 5 hours to return. I'm just wondering if there is a method to avoid the 800k+ rec scan or if I should just create a table that stores the batch_ids upon batch_id creation.

May 11, 2007 - 4:33 pm UTC

if it takes 5 hours to full scan 800k records, you have a serious problem.

How about a TKPROF of that there query with wait events enabled so we can see what it is doing, how it is doing it, how long it is waiting for what.

Analytics to generate multiple types fo date dimensions

David, May 11, 2007 - 6:40 pm UTC

Tom,
I'm trying to get the query that joins
my CONFIG table with ALL_OBJECTS and
produces as many records per each config item as
NUM_RECORDS field indicate:

--Generate dynamic date dimension
create table CONFIG
(
NAME VARCHAR2(50),
NUM_RECORDS NUMBER
);

insert into CONFIG
values ('TEST_2_rec',2);

insert into CONFIG
values ('TEST_3_rec',3);

insert into CONFIG
values ('TEST_5_rec',5);

e.g. there should be 3 groups of records returned,
one group with 2 records, another with 3 and lst one with 5.

I only came up with lame sl as this
one which apparently doesn't work:

col name format a12
SELECT rownum as rnum,name,NUM_RECORDS
from ALL_OBJECTS JOIN CONFIG ON (1=1)
where rownum<= NUM_RECORDS
;

RNUM NAME NUM_RECORDS
---------- ------------ -----------
1 TEST_2_rec 2
2 TEST_2_rec 2
3 TEST_3_rec 3
4 TEST_5_rec 5
5 TEST_5_rec 5

I tried to produce this with analytics,
but results are too scary to post :)

RNUM NAME NUM_RECORDS
---------- ------------ -----------
1 TEST_2_rec 2
2 TEST_2_rec 2

3 TEST_3_rec 3
4 TEST_3_rec 3
5 TEST_3_rec 3

6 TEST_5_rec 5
7 TEST_5_rec 5
8 TEST_5_rec 5
9 TEST_5_rec 5
10 TEST_5_rec 5

Any help or advice would be appreciated :)

Thank you for your time
David

May 14, 2007 - 12:42 pm UTC

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:249696800346831435

To David

martina, May 12, 2007 - 1:35 am UTC

Hi, is that what you want?

INFO/INFO_PROD> r
  1  select name,count(*)
  2  from ( select c.*
  3         from
  4         mpa_config c
  5         ,
  6         (select level l from dual
  7           connect by level < 10 -- Here you need the maximum num_records
  8         ) d
  9        where c.num_records >= d.l
 10       )
 11* group by name

NAME                                                 COUNT(*)
-------------------------------------------------- ----------
TEST_2_rec                                                  2
TEST_3_rec                                                  3
TEST_5_rec                                                  5

INFO/INFO_PROD>

is that what you want?

martina

A reader, May 12, 2007 - 9:27 am UTC

HI Tom,
Back to the question above

CREATE TABLE T1
(
TYP_TAR VARCHAR2(3),
FAG_SLC VARCHAR2(1),
TYP_CAT VARCHAR2(2),
PCT_FEE NUMBER(6,4),
PCT_RED NUMBER(6,4),
AMT_MAX_CND_FEE NUMBER(38,17),
AMT_MIN_CND_FEE NUMBER(38,17),
DAT_START_CND DATE,
DAT_END_CND DATE,
SLC_LWR NUMBER(38,17),
SLC_UPR NUMBER(38,17),
TYP_XTR_FAC VARCHAR2(3),
XTR_FAC_VAL VARCHAR2(250),
FAG_STD VARCHAR2(1),
FAG_DONE VARCHAR2(1),
PRC_CTT_BKA_VAL VARCHAR2(1),
AMT_MAX_TAR_FEE NUMBER(38,17),
FAG_APE VARCHAR2(1),
IDE NUMBER(38),
TPER_TYP_PDI VARCHAR2(2)
)

../..

One of my colleague is doing the following very non performant code:

cursor c1 is
select ,prc_ctt_bka_val
,amt_min_cnd_fee
,fag_slc
,dat_start_cnd
,dat_end_cnd
,pct_fee,fag_ape
,typ_tar
,typ_cat
,slc_lwr
,slc_upr
,typ_xtr_fac
,xtr_fac_val
from t1
where fag_done = 'N' ;

cursor c2 is
select ,prc_ctt_bka_val
,amt_min_cnd_fee
,fag_slc
,dat_start_cnd
,dat_end_cnd
,pct_fee,fag_ape
,typ_tar
,typ_cat
,slc_lwr
,slc_upr
,typ_xtr_fac
,xtr_fac_val
from t1
where typ_tar = p_typ_tar
and typ_cat = p_typ_cat
and slc_lwr = p_slc_lwr
and slc_upr = p_slc_upr
and typ_xtr_fac = p_typ_xtr_fac
and xtr_fac_val = p_xtr_fac_val
and fag_done = 'N'
and pct_fee = (select min(pct_fee) from t1
where typ_tar = p_typ_tar
and typ_cat = p_typ_cat
and slc_lwr = p_slc_lwr
and slc_upr = p_slc_upr
and typ_xtr_fac = p_typ_xtr_fac
and xtr_fac_val = p_xtr_fac_val
and fag_done = 'N' );

r2 c2%rowtype;

BEGIN
for r1 in c1 LOOP
open c2(r1.typ_tar,r1.typ_cat,r1.slc_lwr,r1.typ_xtr_fac,xtr_fac_val);
fetch c2 into r2;
if c2%notfound then
insert into table t2 values comming from cursor r1;
else
insert into table t2 values comming from cursor r2;
end if;
close c2;
END;

I would like to replace the above code by this one
BEGIN
insert into table t2
select ,prc_ctt_bka_val
,amt_min_cnd_fee
,fag_slc
,dat_start_cnd
,dat_end_cnd
,pct_fee,fag_ape
,typ_tar
,typ_cat
,slc_lwr
,slc_upr
,typ_xtr_fac
,xtr_fac_val
,min(pct_fee) over( PARTITION BY typ_tar, typ_cat, slc_lwr, slc_upr, typ_xtr_fac,xtr_fac_val) min_pct_fee
from t1
where fag_done = 'N' ;

END;

1. knowing that table t1 has no any unique constraint (PK or UK) implemented how can I ensure that the above analytic select gives only one row(the one with the min(pct_fee) when there is duplicate (typ_tar ,typ_cat,slc_lwr,slc_upr ,typ_xtr_fac ,xtr_fac_val )

2. Could you please advise something else if you see that both we are wrong

Thanks very much

May 14, 2007 - 12:59 pm UTC

if you want a single recrod by x,y,z,..... you use group by (aggregation), not analytics.

analytics specifically do not "squish out" records, you seem to want one observation by key - hence, you want aggregations.

something like:

ops$tkyte%ORA10GR2> select id,
  2         min(fnm) KEEP (dense_rank first order by rowid) fnm,
  3         min(lnm) KEEP (dense_rank first order by rowid) lnm
  4    from test
  5   group by id;

        ID FNM                            LNM
---------- ------------------------------ ------------------------------
         1 Tom                            Jones
         2 Sue                            Snue
         3 Robert                         Smith

but you group by your big list of columns, not just id

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:228182900346230020

martina, May 13, 2007 - 1:57 am UTC

hi,

i think the 2 cursors you show insert all rows of t1 into t2 fetching them in a complicated way. if you only wanted the c2 values c2 would be sufficient?

This might be what you are looking for:

insert into t2
select .....
from
( select d.*
        ,row_number() over (partition by  typ_tar, typ_cat, slc_lwr, slc_upr, typ_xtr_fac,xtr_fac_val
             order by pct_fee) rd
              from t1 d ) z
WHERE RD = 1
;

if it is not please clarify WHAT you want to achieve, not HOW you want to achieve something.

regards, martina

Thank you

David, May 14, 2007 - 3:34 pm UTC

Thank you very much to Tom and Martina
:)

A reader, May 15, 2007 - 5:46 am UTC

Thanks Tom, Thanks Marina,

What I would like is

select a,
b,
c,
d,
x,
y,
z,
min(r)
from t
group by
x,
y,
z

This will not compile because I am also selecting a, b, c and d

If there is a possibility to select all what I want using group by (x,y,z) then I will use group by.

Since I've not succeed to do it, I am using analytic function which allows me to have select like this

select a,
b,
c,
d,
x,
y,
z,
min(r) over (partition by x,y,z)
from t

May 15, 2007 - 4:35 pm UTC

answer is already here:

ops$tkyte%ORA10GR2> select id,
  2         min(fnm) KEEP (dense_rank first order by rowid) fnm,
  3         min(lnm) KEEP (dense_rank first order by rowid) lnm
  4    from test
  5   group by id;

id = x,y,z
you'll min(a) keep (dense....) a

and just min(r)

Finding Previous Item within range

Jonty, May 15, 2007 - 5:51 pm UTC

Hi,

I want to find out last Item No used in Previous year for each item - catch is - if same Item as current then take which ever is before that item.

I have given code below, which create sample table TB_TRANS, insert st. to populate data, TB_TRANS_OUT output table and how results should look.

Scenario #
1. for Tran_no 1 record - item is 10390 - latest item used in last year was also 10390, which is same, go beyond that and get 2nd last which is 17365.
2. for Tran_no 2 record - same as 1
3. for Tran_no 3 record - item is 17365- latest item used in last year was 10390, previous item no will be 10390.

I hope this will give good idea what I am looking for.

I tried using Analytical Function but somehow I am not getting same results. I can get last item in previous year but scenario 1 I am not able to get it.

Any help will be highly appreciated.

CREATE TABLE TB_TRANS
(TRAN_DATE DATE,
TRAN_NO NUMBER,
CUST_NO NUMBER,
ITEM_NO NUMBER,
QTY NUMBER)
/

INSERT INTO TB_TRANS ( TRAN_DATE, TRAN_NO, CUST_NO, ITEM_NO, QTY ) VALUES ( 
 TO_Date( '07/29/2006 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), 6, 101, 15448, 40); 
INSERT INTO TB_TRANS ( TRAN_DATE, TRAN_NO, CUST_NO, ITEM_NO, QTY ) VALUES ( 
 TO_Date( '07/04/2006 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), 5, 101, 10390, 30); 
INSERT INTO TB_TRANS ( TRAN_DATE, TRAN_NO, CUST_NO, ITEM_NO, QTY ) VALUES ( 
 TO_Date( '07/01/2006 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), 4, 101, 17365, 20); 
INSERT INTO TB_TRANS ( TRAN_DATE, TRAN_NO, CUST_NO, ITEM_NO, QTY ) VALUES ( 
 TO_Date( '07/01/2006 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), 3, 101, 17365, 30); 
INSERT INTO TB_TRANS ( TRAN_DATE, TRAN_NO, CUST_NO, ITEM_NO, QTY ) VALUES ( 
 TO_Date( '05/20/2006 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), 2, 101, 10390, 30); 
INSERT INTO TB_TRANS ( TRAN_DATE, TRAN_NO, CUST_NO, ITEM_NO, QTY ) VALUES ( 
 TO_Date( '03/04/2006 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), 1, 101, 10390, 30); 
INSERT INTO TB_TRANS ( TRAN_DATE, TRAN_NO, CUST_NO, ITEM_NO, QTY ) VALUES ( 
 TO_Date( '12/24/2005 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), 1000, 101, 10390, 30); 
INSERT INTO TB_TRANS ( TRAN_DATE, TRAN_NO, CUST_NO, ITEM_NO, QTY ) VALUES ( 
 TO_Date( '12/14/2005 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), 999, 101, 17365, 30); 
INSERT INTO TB_TRANS ( TRAN_DATE, TRAN_NO, CUST_NO, ITEM_NO, QTY ) VALUES ( 
 TO_Date( '12/04/2005 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), 998, 101, 10390, 30); 
INSERT INTO TB_TRANS ( TRAN_DATE, TRAN_NO, CUST_NO, ITEM_NO, QTY ) VALUES ( 
 TO_Date( '11/04/2005 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), 997, 101, 15448, 30); 
COMMIT;

SELECT * FROM TB_TRANS
ORDER BY tran_date, TRaN_NO
/

CREATE TABLE TB_TRANS_OUT
(TRAN_DATE DATE,
TRAN_NO NUMBER,
CUST_NO NUMBER,
ITEM_NO NUMBER,
QTY NUMBER,
PREV_ITEM_NO NUMBER
)
/

/* EXPECTED RESULT SET */

TRAN_DATE            TRAN_NO CUST_NO ITEM_NO QTY 
11/4/2005            997     101     15448   30  
12/4/2005            998     101     10390   30  
12/14/2005           999     101     17365   30  
12/24/2005           1000    101     10390   30  
3/4/2006             1       101     10390   30  
5/20/2006            2       101     10390   30  
7/1/2006             3       101     17365   30  
7/1/2006             4       101     17365   20  
7/4/2006             5       101     10390   30  
7/29/2006            6       101     15448   40  


SELECT * FROM TB_TRANS_OUT
ORDER BY tran_date, TRaN_NO
/

TRAN_DATE            TRAN_NO CUST_NO ITEM_NO QTY PREV_ITEM_NO 
11/4/2005            997     101     15448   30               
12/4/2005            998     101     10390   30               
12/14/2005           999     101     17365   30               
12/24/2005           1000    101     10390   30               
3/4/2006             1       101     10390   30  17365        
5/20/2006            2       101     10390   30  17365        
7/1/2006             3       101     17365   30  10390        
7/1/2006             4       101     17365   20  10390        
7/4/2006             5       101     10390   30  17365        
7/29/2006            6       101     15448   40  10390

Thanks a lot in advance.

Jonty

May 16, 2007 - 10:18 am UTC

give us more details - will this always be for JUST two years

are the two years INPUTS into the query itself?

analytics will probably not be used here.

To: Jonty

Narendra, May 16, 2007 - 6:08 am UTC

Following is something I could think of:
Assumptions:
1. Your year starts on 1st January and ends on 31st December.
2. CUST_NO remains same for all the rows. (If it changes the query will change a bit).

SQL> select a.tran_date, a.tran_no, a.cust_no, a.item_no, a.qty,
  2  ( select b.item_no from ( select item_no, tran_date from tb_trans order by tran_date desc) b
  3    where b.item_no <> a.item_no and b.tran_date < trunc(a.tran_date, 'YYYY')
  4    and rownum = 1 ) prev_item_no
  5  from tb_trans a
  6  ORDER BY tran_date, TRaN_NO ;

TRAN_DATE      TRAN_NO    CUST_NO    ITEM_NO        QTY PREV_ITEM_NO
----------- ---------- ---------- ---------- ---------- ------------
04-Nov-05          997        101      15448         30 
04-Dec-05          998        101      10390         30 
14-Dec-05          999        101      17365         30 
24-Dec-05         1000        101      10390         30 
04-Mar-06            1        101      10390         30        17365
20-May-06            2        101      10390         30        17365
01-Jul-06            3        101      17365         30        10390
01-Jul-06            4        101      17365         20        10390
04-Jul-06            5        101      10390         30        17365
29-Jul-06            6        101      15448         40        10390

10 rows selected

But I sincerely feel there might be a better approach.

Follw-up

Jonty, May 16, 2007 - 11:33 am UTC

Hi,

Thanks for looking into it.

Unfortunately this will not be a good solution.

1. I have to do it on High Volume data (1 to 300 mil records), there will be many more calculations on same record. I want to avoid using table twice in the query. I am more thinking to use Analycic Function.
2. Years will be varying but to make it easier lets assume Year is fix.
3. Customer No. - there will be many more customers. for each custom and item I have to do this.
4.

May 17, 2007 - 10:46 am UTC

2) no, how about you give the real requirement.

I don't see the opportunity for analytics really here. You would partition by year, but you want last years stuff in this year - analytics work within a partition - not across partitions

how does customer fit into all of this?? you just mentioned "customer and item"

please be amazingly precise in your specification and if customer is part of the equation, make sure your example includes at least two customers to demonstrate with!

Richard, May 16, 2007 - 4:13 pm UTC

In our data warehouse environment, we are receiving updated sales orders on a nightly basis from a 3rd party system via ODBC.
The process is prebuilt and we cannot modify. The only solutions we have available are to rebuild the entire interface process, or add additional processing on top of the delivered functionality.
Management decided we are better off adding customized processes on top of the delivered product.
We can only add new interfaces for additional data needed from the 3rd party app beyond the scope of the delivered interfaces.

SQL*Plus: Release 10.1.0.2.0 - Production on Wed May 16 14:25:12 2007

Copyright (c) 1982, 2004, Oracle.  All rights reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.1.0.4.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options

SQL> drop table historical_txns;

Table dropped.

SQL> create table historical_txns (
  2    order_date date not null,
  3    order_no   integer not null,
  4    order_line integer not null,
  5    txn_date   date not null,
  6    blocked_flag varchar2(1) not null,
  7    constraint pk_historical_txns
  8      primary key (order_date, order_no, order_line, txn_date));

Table created.

SQL> drop table order_status_hist;

Table dropped.

SQL> create table order_status_hist (
  2    order_date date not null,
  3    order_no   integer not null,
  4    order_line integer not null,
  5    effective_dt date not null,
  6    expired_dt   date default to_date('99991231','yyyymmdd') not null,
  7    constraint ck_eff_exp_dates
  8      check (effective_dt <= expired_dt),
  9    constraint pk_order_status_hist
 10      primary key (order_date, order_no, order_line, effective_dt));

Table created.

SQL> alter session set nls_date_format='yyyymmdd';

Session altered.

SQL> insert into historical_txns values (to_date('20070201'),100,1,to_date('20070201'),'Y');

1 row created.

SQL> insert into historical_txns values (to_date('20070201'),100,1,to_date('20070203'),'N');

1 row created.

SQL> insert into historical_txns values (to_date('20070201'),100,1,to_date('20070210'),'Y');

1 row created.

SQL> insert into historical_txns values (to_date('20070201'),100,1,to_date('20070211'),'Y');

1 row created.

SQL> commit;

Commit complete.

SQL> select * from historical_txns;

ORDER_DA   ORDER_NO ORDER_LINE TXN_DATE B
-------- ---------- ---------- -------- -
20070201        100          1 20070201 Y
20070201        100          1 20070203 N
20070201        100          1 20070210 Y
20070201        100          1 20070211 Y

ASSUMPTIONS:A given order line may be blocked and released multiple times, depending on item availability, customer credit, etc.
In the order_status_hist table, only one record for a given order line may be "active" at any given time. For example,
given the above data, the following is the expected output:

SQL> select * from order_status_hist;

ORDER_DA   ORDER_NO ORDER_LINE EFFECTIV EXPIRED_
-------- ---------- ---------- -------- --------
20070201        100          1 20070201 20070203
20070201        100          1 20070210 99991231

QUESTION:
To do the initial load of data from historical_txns into order_status_hist we need an analytic SQL or pl/sql procedure?
Note that the transaction dated 11-Feb-2007 did not change the flag.
Going forward, as inserts and updates are processed on the ORDERS table which contains a BLOCKED column with Y or N values, we want to capture changes to the BLOCKED column via a trigger.
If the updated record is blocked and the order was previously not blocked, a new effective date record is added to the order_status_hist table.
If the updated record unblocks a previously blocked order, the currently active record is given the current date as the expired_date.
If the BLOCKED field did not change, do nothing to the history table.
The following trigger should perform the required ongoing maintenance, correct?

CREATE OR REPLACE TRIGGER order_blocked_status_trg
  BEFORE DELETE or INSERT OR UPDATE OF blocked
  ON ORDERS
  FOR EACH ROW
BEGIN
  IF INSERTING
     and :new.blocked='Y'
  THEN
    -- when adding new records that are blocked, also add
    -- records to the blocked status table for tracking
    INSERT INTO order_status_hist (order_date, order_no, order_line, effective_date)
    VALUES (:new.order_date, :new.order_no, :new.order_line, trunc(SYSDATE));
  ELSIF UPDATING
        AND :OLD.blocked <> :NEW.blocked
  THEN
    -- if the line was previously blocked and is now unblocked,
    -- expire the currently blocked status record
    UPDATE order_status_hist
       SET expired_dt = trunc(sysdate)
     where order_date = :old.order_date
       and order_no = :old.order_no
       and order_line = :old.order_line
       and :old.blocked = 1
       and trunc(sysdate) between effective_dt and expired_dt;
    -- If the line was previously unblocked and is now blocked,
    -- create a blocked status record.
    if :new.blocked=1 then
    INSERT INTO order_status_hist (order_date, order_no, order_line, effective_date)
    values (:new.order_date, :new.order_no, :new.order_line, trunc(sysdate));
    end if;
  ELSIF DELETING
  then
    -- if the record being deleted has blocked status records,
    -- also delete these blocked status records for data integrity
    delete from order_status_hist
     where order_date = :old.order_date
       and order_no = :old.order_no
       and order_line = :old.order_line;
  END IF;
END order_blocked_status_trg;
/

May 17, 2007 - 10:55 am UTC

please, I'm begging you to not use that expired date in the year 9999 - please, use null. you'll whack out the optimizers ability to estimate card= values.

I would not use a trigger - hate triggers.

If I can give you a query (using analytics) from the one table to provide you with the desired output, is that sufficient?

Richard, May 17, 2007 - 12:27 pm UTC

Analytic sql would be the ideal solution for me. I can update the process to use nvl(expired_dt,sysdate) instead of 9999/12/31 easily enough.

TO Richard

Kevin, May 17, 2007 - 12:54 pm UTC

> I can update the process to use nvl(expired_dt,sysdate) instead of 9999/12/31 easily enough.

Don't do this:
SYSDATE <= NVL(expired_dt,sysdate)

Instead, use:
(EXPIRED_DT IS NULL OR SYSDATE <= EXPIRED_DT)

Yes, we can see the equivalence, and yes, yours is more "succinct". But the NVL() on EXPIRED_DT will prevent the CBO from using expired_dt column stats when estimating cardinities. In fact, regardless of distribution on EXPIRED_DT, the CBO will use the default 5% selectivity estimate for your version of the constraint.

GROUP BY order date

A reader, May 21, 2007 - 1:05 pm UTC

tom, I have a query and built out of 3- 4 sub queries.

SELECT date,product_id,product_structure, MAX (DECODE (side, 0, cnt, 0)) askprice, MAX (DECODE (side, 1, cnt, 0)) price
FROM (SELECT TRUNC(date) date,product_id,product_structure, side, COUNT (*) cnt
FROM (SELECT order_id, product_id, side, last_user, date,
NVL (LAG (price) OVER (PARTITION BY order_id ORDER BY date ASC), NULL ) previous_price, price,
NVL (LAG (status) OVER (PARTITION BY order_id ORDER BY date ASC),NULL ) previous_status, status,
NVL (LEAD (status) OVER (PARTITION BY order_id ORDER BY date ASC),NULL ) next_status
FROM order o,user u
where date >= TO_DATE ('15-Mar-2007','DD-Mon-YYYY') AND date <= TO_DATE ('16-May-2007','DD-Mon-YYYY')
AND last_user = u.code ) o, products p
WHERE 1 = 1
AND product_id = p.pid(+)
and status = 2
and ( nvl(previous_status,0) <>2 or price <> nvl(previous_price,0))
GROUP BY TRUNC(date),product_id, product_structure,side
)
GROUP BY date,product_id, product_structure

the problem is, if I take off the date from the group by clause i get total askprice + price = 1656
and if i put the date clause (as above ) i get
askprice + price = 1510

looking at the query , do you see this happening ?

May 21, 2007 - 1:27 pm UTC

I don't understand. If you take date off of the group by - the query no longer FUNCTIONS.

nvl( x, null ) is 'redundant'

GROUP BY DATE

A reader, May 21, 2007 - 1:31 pm UTC

SELECT product_id,product_structure, MAX (DECODE (side, 0, cnt, 0)) askprice, MAX (DECODE (side, 1, cnt, 0)) price
FROM (SELECT product_id,product_structure, side, COUNT (*) cnt
FROM (SELECT order_id, product_id, side, last_user, date,
NVL (LAG (price) OVER (PARTITION BY order_id ORDER BY date ASC), NULL ) previous_price, price,
NVL (LAG (status) OVER (PARTITION BY order_id ORDER BY date ASC),NULL ) previous_status, status,
NVL (LEAD (status) OVER (PARTITION BY order_id ORDER BY date ASC),NULL ) next_status
FROM order o,user u
where date >= TO_DATE ('15-Mar-2007','DD-Mon-YYYY') AND date <= TO_DATE ('16-May-2007','DD-Mon-YYYY')
AND last_user = u.code ) o, products p
WHERE 1 = 1
AND product_id = p.pid(+)
and status = 2
and ( nvl(previous_status,0) <>2 or price <> nvl(previous_price,0))
GROUP BY product_id, product_structure,side
)
GROUP BY product_id, product_structure

but now, if you sum askprice + price is not same as before
(
even though we have
where date >= TO_DATE ('15-Mar-2007','DD-Mon-YYYY') AND date <= TO_DATE ('16-May-2007','DD-Mon-YYYY')
AND last_user = u.code ) o, products p
)

May 21, 2007 - 1:55 pm UTC

I'm not understanding it - I don't see any outputs, so I'm not sure what you mean really.

the CODE button is really good for things CODE related (fixed width font)

GROUP BY DATE

A reader, May 21, 2007 - 1:32 pm UTC

by the way , where do you see nvl(x,null) ?

May 21, 2007 - 1:56 pm UTC

NVL (LAG (price) OVER (PARTITION BY order_id ORDER BY date ASC), NULL ) previous_price,

x = lag(price) over (partion by order_id order by date asc)

nvl( x, null ) previous_price

GROUP BY DATE

A reader, May 21, 2007 - 2:17 pm UTC

code 1:

SELECT  <b>date</b>,product_id,product_structure, MAX (DECODE (side, 0, cnt, 0)) askprice, MAX (DECODE (side, 1, cnt, 0)) price
    FROM  (SELECT  <b>TRUNC(date)</b> date,product_id,product_structure, side, COUNT (*) cnt
            FROM  (SELECT order_id, product_id, side, last_user, date,
                      NVL (LAG (price) OVER (PARTITION BY order_id ORDER BY date ASC), NULL ) previous_price, price,
                      NVL (LAG (status) OVER (PARTITION BY order_id ORDER BY date ASC),NULL ) previous_status, status,
                      NVL (LEAD (status) OVER (PARTITION BY order_id ORDER BY date ASC),NULL ) next_status
                  FROM  order o,user u
                  where  date >= TO_DATE ('15-Mar-2007','DD-Mon-YYYY') AND  date <= TO_DATE ('16-May-2007','DD-Mon-YYYY')
                  AND  last_user = u.code ) o, products p
            WHERE  1 = 1
            AND    product_id = p.pid(+)
            and    status = 2
            and  ( nvl(previous_status,0) <>2 or  price <> nvl(previous_price,0))
            GROUP BY  <b>TRUNC(date)</b>,product_id, product_structure,side
            )
GROUP BY <b>date</b>,product_id, product_structure

code 2:

SELECT product_id,product_structure, MAX (DECODE (side, 0, cnt, 0)) askprice, MAX (DECODE (side, 1, cnt, 0)) price
  FROM (SELECT product_id,product_structure, side, COUNT (*) cnt
        FROM (SELECT order_id, product_id, side, last_user, date,
              NVL (LAG (price) OVER (PARTITION BY order_id ORDER BY date ASC), NULL ) previous_price, price,
              NVL (LAG (status) OVER (PARTITION BY order_id ORDER BY date ASC),NULL ) previous_status, status,
              NVL (LEAD (status) OVER (PARTITION BY order_id ORDER BY date ASC),NULL ) next_status
            FROM order o,user u
            where date >= TO_DATE ('15-Mar-2007','DD-Mon-YYYY') AND date <= TO_DATE ('16-May-2007','DD-Mon-YYYY')
            AND last_user = u.code ) o, products p
        WHERE 1 = 1
        AND  product_id = p.pid(+)
        and  status = 2
        and ( nvl(previous_status,0) <>2 or price <> nvl(previous_price,0))
        GROUP BY product_id, product_structure,side
        )
GROUP BY product_id, product_structure

now in the above two queries, in outer query one has date in the group by clause and the 2nd does not, and both has same where clause.

Q: DO YOU see any reason why , if i sum up the counts of askpirce + price , counts will be different ?

May 22, 2007 - 9:18 am UTC

it would be so cool to see the outputs you say are different just to, well, sort of see what it is you are comparing.

GROUP BY DATE

A reader, May 21, 2007 - 2:18 pm UTC

code 1:

SELECT date,product_id,product_structure, MAX (DECODE (side, 0, cnt, 0)) askprice, MAX (DECODE (side, 1, cnt, 0)) price
FROM (SELECT TRUNC(date) date,product_id,product_structure, side, COUNT (*) cnt
FROM (SELECT order_id, product_id, side, last_user, date,
NVL (LAG (price) OVER (PARTITION BY order_id ORDER BY date ASC), NULL ) previous_price, price,
NVL (LAG (status) OVER (PARTITION BY order_id ORDER BY date ASC),NULL ) previous_status, status,
NVL (LEAD (status) OVER (PARTITION BY order_id ORDER BY date ASC),NULL ) next_status
FROM order o,user u
where date >= TO_DATE ('15-Mar-2007','DD-Mon-YYYY') AND date <= TO_DATE ('16-May-2007','DD-Mon-YYYY')
AND last_user = u.code ) o, products p
WHERE 1 = 1
AND product_id = p.pid(+)
and status = 2
and ( nvl(previous_status,0) <>2 or price <> nvl(previous_price,0))
GROUP BY TRUNC(date),product_id, product_structure,side
)
GROUP BY date,product_id, product_structure

code 2:

SELECT product_id,product_structure, MAX (DECODE (side, 0, cnt, 0)) askprice, MAX (DECODE (side, 1, cnt, 0)) price
FROM (SELECT product_id,product_structure, side, COUNT (*) cnt
FROM (SELECT order_id, product_id, side, last_user, date,
NVL (LAG (price) OVER (PARTITION BY order_id ORDER BY date ASC), NULL ) previous_price, price,
NVL (LAG (status) OVER (PARTITION BY order_id ORDER BY date ASC),NULL ) previous_status, status,
NVL (LEAD (status) OVER (PARTITION BY order_id ORDER BY date ASC),NULL ) next_status
FROM order o,user u
where date >= TO_DATE ('15-Mar-2007','DD-Mon-YYYY') AND date <= TO_DATE ('16-May-2007','DD-Mon-YYYY')
AND last_user = u.code ) o, products p
WHERE 1 = 1
AND product_id = p.pid(+)
and status = 2
and ( nvl(previous_status,0) <>2 or price <> nvl(previous_price,0))
GROUP BY product_id, product_structure,side
)
GROUP BY product_id, product_structure

now in the above two queries, in outer query one has date in the group by clause and the 2nd does not, and both has same where clause.

Q: DO YOU see any reason why , if i sum up the counts of askpirce + price , counts will be different ?

GROUP BY DATE -- worked ....

A reader, May 21, 2007 - 5:15 pm UTC

SELECT date,product_id,product_structure, MAX (DECODE (side, 0, cnt, 0)) askprice, MAX (DECODE (side, 1, cnt, 0)) price
FROM (SELECT TRUNC(date) product_id,product_structure, side, COUNT (*) cnt
FROM (SELECT order_id, product_id, side, last_user, date,
NVL (LAG (price) OVER (PARTITION BY order_id ORDER BY date ASC), NULL ) previous_price, price,
NVL (LAG (status) OVER (PARTITION BY order_id ORDER BY date ASC),NULL ) previous_status, status,
NVL (LEAD (status) OVER (PARTITION BY order_id ORDER BY date ASC),NULL ) next_status
FROM order o,user u
where date >= TO_DATE ('15-Mar-2007','DD-Mon-YYYY') AND date <= TO_DATE ('16-May-2007','DD-Mon-YYYY')
AND last_user = u.code ) o, products p
WHERE 1 = 1
AND product_id = p.pid(+)
and status = 2
and ( nvl(previous_status,0) <>2 or price <> nvl(previous_price,0))
GROUP BY TRUNC(date),product_id, product_structure,side
)
GROUP BY date,product_id, product_structure

did not work..

but after changing the place of date column it produced same results...

SELECT product_id,product_structure,date, MAX (DECODE (side, 0, cnt, 0)) askprice, MAX (DECODE (side, 1, cnt, 0)) price
FROM (SELECT product_id,product_structure, side,TRUNC(date) COUNT (*) cnt
FROM (SELECT order_id, product_id, side, last_user, date,
NVL (LAG (price) OVER (PARTITION BY order_id ORDER BY date ASC), NULL ) previous_price, price,
NVL (LAG (status) OVER (PARTITION BY order_id ORDER BY date ASC),NULL ) previous_status, status,
NVL (LEAD (status) OVER (PARTITION BY order_id ORDER BY date ASC),NULL ) next_status
FROM order o,user u
where date >= TO_DATE ('15-Mar-2007','DD-Mon-YYYY') AND date <= TO_DATE ('16-May-2007','DD-Mon-YYYY')
AND last_user = u.code ) o, products p
WHERE 1 = 1
AND product_id = p.pid(+)
and status = 2
and ( nvl(previous_status,0) <>2 or price <> nvl(previous_price,0))
GROUP BY product_id, product_structure,side,TRUNC(date)
)
GROUP BY product_id, product_structure,date

Analytics and PLSQL

A reader, May 23, 2007 - 8:13 am UTC

Hi Tom,

I have the following plsql code in a Procedure

Execute immediate 'insert into t1 (aa,
bb,
cc,
dd)
select a,
b,
c,
min (d) over (partition by a,b)
from t
where fag_done = ''N''
';

The trace file when "tkprofed" gives for this select a ration parse/execute = 1 which needs to be corrected.

Could you please help me to correct this high ration of parse count to execute count

Thanks a lot

Any Analytical funtion of HELP

Hitesh, June 14, 2007 - 2:48 am UTC

Hi Tom,

I am trying to calculate %variation of the no of records inserted past hr with the no of records inserted past week.

Is there any analytical funtion to do that?

Select 'T' "Table Name", A.Hrly_Cnt "Past Hour Statistics", B.Weekly_Cnt "Past Week Statistics", Ceil((B.Weekly_Cnt-A.Hrly_Cnt)* 100 /(B.Weekly_Cnt)) "%Variation"
from
(
Select /*+Parallel(X,4) */ Count(*) Hrly_cnt
from
t X
Where
Insert_date >= Sysdate-1/24 And Insert_date < Sysdate) A,
(
Select /*+Parallel(Y,4) */ Ceil(Count(*)/7/24) Weekly_cnt
from
t Y
Where
Insert_date >= Sysdate-7-1/24 And Insert_date < Sysdate-1/24) B
/

Thanks
Hitesh

June 14, 2007 - 7:24 am UTC

no analytics needed, but a single pass would more than do this

select count( case when insert_date between sysdate-1/24 and sysdate then 1 end),
    count( case when insert_date between sysdate-7-1/24 and sysdate then 1 end)
  from t
where insert_date between sysdate-7-1/24 and sysdate

Can we avoid multi pass of table T1

Ritesh, June 16, 2007 - 4:52 am UTC

Hi Tom,

Can we aviod the multi-pass of Table T1 in the below mentioned SQL. T1 is the parent table based on which ceratin records gets invalid status and they are inserted in table T. We need to know the %of Invalid activity based on the table T1.

Select
'T' "Table Name",
Cnt "Base Count",
Invalid_Match_Cnt "Invalid Match Count",
Invalid_Match_Cnt*100/Cnt "%Invalid"
from
(Select /*+Parallel(X 16) */ Count(*) Cnt
from
T1 X
WHERE Insert_date Between Sysdate-4/24 AND Sysdate-3/24) A,
(Select /*+ Parallel(X 16) Parallel(Y 16) Use_Hash(X Y) */ Count(*) Invalid_Match_Cnt
from
T X, T1 Y
WHERE
X.Insert_date Between Sysdate-4/24 AND Sysdate AND
Y.Insert_date Between Sysdate-4/24 AND Sysdate-3/24 AND
X.tag = Y.tag) B
/

Thanks
Ritesh

Last_Value vs First_Value - Please explain

Jay, July 19, 2007 - 10:56 am UTC

Quote:

"it didn't work for me. I had to change it to first_value(master_record) over (partition by sub_record order by moddate desc)
Is there a reason for that?

Followup December 16, 2003 - 2pm US/Eastern:
doh, default window clause is current row and unbounded preceding
i would have needed a window clause that looks forwards rather then backwards (reason #1 why I
should always set up a test case instead of just answering on the fly)
your solution of reversing the data works just fine. "
---------------------
Hi tom,

I had the same doubt. I read your explanation but couldn't fully comprehend what you were saying.

Why would,

Last_Value() over (partition by .. order by ... )
be different from..
First_Value() over (partition by .. order by ... desc) ??

The later case provides the correct result. The first one doesn't. I'm confused. Thanks for your time..

Jay

July 19, 2007 - 12:20 pm UTC

give us the for example where the latter is correct and the former is not - and then we'll explain what happened.

NULLS would definitely affect that (unless you use nulls first/nulls last to control their placement)

Thanks..

Jay, July 19, 2007 - 1:41 pm UTC

-----------------------------case 1 query

with 
population as
(
 select 123 as test_id, 'Alpha' as test1, 1 as test2 from dual
 union all
 select 123 as test_id, 'Beta' as test1, 2 as test2 from dual
 union all
 select 123 as test_id, 'Gamma' as test1, 3 as test2 from dual       
 union all
 select 123 as test_id, 'Delta' as test1, 4 as test2 from dual
 union all 
 select 123 as test_id, 'Pi' as test1, 5 as test2 from dual
)     


select test_id,
       test1, 
       test2,
       first_value(test1) over (partition by test_id
                                    order by test2 desc) as "Correct" 
  from population p

-----------------------------case 2 query

with 
population as
(
 select 123 as test_id, 'Alpha' as test1, 1 as test2 from dual
 union all
 select 123 as test_id, 'Beta' as test1, 2 as test2 from dual
 union all
 select 123 as test_id, 'Gamma' as test1, 3 as test2 from dual       
 union all
 select 123 as test_id, 'Delta' as test1, 4 as test2 from dual
 union all 
 select 123 as test_id, 'Pi' as test1, 5 as test2 from dual
)     
select test_id,
       test1, 
       test2, 
       last_value(test1) over (partition by test_id
                                   order by test2) as "Incorrect.. why??"
  from population p

case 1 is correct and returns 'pi' which is expected.
case 2 should return 'pi' as well since I am ordering it by test2 ascending and am partitioning it by the ID.

Could you please explain me as to why this is happening?

Thanks for your time as always!

July 19, 2007 - 2:53 pm UTC

oh, ok - easy, the default window is:

default window clause is current row and unbounded preceding

so, first_value/last_value works from the current record BACKWARDS, up to the top of the result set.

first_value will always be - the first record.
last_value (in this case) will always be - the CURRENT RECORD

 14  select test_id,
 15         test1,
 16         test2,
 17         first_value(test1) over (partition by test_id
 18                                      order by test2 desc) as "Correct" ,
 19         last_value(test1) over (partition by test_id
 20                                     order by test2) as "Incorrect.. why??"
 21    from population p
 22    order by test2
 23  /

   TEST_ID TEST1      TEST2 Corre Incor
---------- ----- ---------- ----- -----
       123 Alpha          1 Pi    Alpha
       123 Beta           2 Pi    Beta
       123 Gamma          3 Pi    Gamma
       123 Delta          4 Pi    Delta
       123 Pi             5 Pi    Pi

here, when processing row 1 - last value has to be alpha, because if you look at the current record and all of the preceding ones, the last value of test1 is alpha.

when you go to row 2, the last value has to be beta - because when you look at the last row in the current window (which is from the current row and all preceding), the last value is Beta - and so on...

 14  select test_id,
 15         test1,
 16         test2,
 17         first_value(test1) over (partition by test_id
 18                                      order by test2 desc) as "Correct" ,
 19         last_value(test1) over (partition by test_id
 20                                     order by test2<b>
 21                                     range between current row and unbounded following ) as "Incorrect.. why??"</b>
 22    from population p
 23    order by test2
 24  /

   TEST_ID TEST1      TEST2 Corre Incor
---------- ----- ---------- ----- -----
       123 Alpha          1 Pi    Pi
       123 Beta           2 Pi    Pi
       123 Gamma          3 Pi    Pi
       123 Delta          4 Pi    Pi
       123 Pi             5 Pi    Pi

Here, we changed the window so that last value would look at the current row and everything AFTER it, so now, Pi is the answer because no matter what row we look at - using the current row and unbounded following - the last value is the last row in the window and the last row is always the SAME in that window.

Thank you very much..

Jay, July 19, 2007 - 3:03 pm UTC

Hello Tom,

Thanks a lot for explaining the concept. It makes complete sense now. Thank you so much!!

Jay

last_value

Sokrates, July 20, 2007 - 1:55 am UTC

now, even me, I got it !

LAST_VALUE, unbounded frustration

Duke Ganote, August 24, 2007 - 11:19 am UTC

That's why I never use LAST_VALUE. FIRST_VALUE lets me intuitively use the default window clause (current row and unbounded preceding).

FIRST_VALUE behaves as FIFO (first inline, first out).

The default clauses turn LAST_VALUE into LINF (last inline, never found).

Analytics, configurable date window

Phil, September 02, 2007 - 7:54 am UTC

Hello Tom
I struggle with Analytics, have you finished writing a book that explains it from the start?
I need to produce a report based on time and split the data into chunks. Initially each minute but the user needs to be able to select the time "window" for example 5 minutes.
Is there any easy way of doing this using analytics please?
here are 2 hours of random data I've been working with...

create table tom
(when date)
/
insert into tom(select sysdate+((dbms_random.value)-.5)/14 from all_objects)
/

September 05, 2007 - 1:27 pm UTC

I did,

years ago - expert one on one Oracle - the chapter on Analytics. Covers it from start to finish.

Also, the Data Warehousing guide does a really good job!

you need to define your problem a little better - normally to split things into 1 or N minutes slices - you do NOT need analytics at all. eg: one minute slices, just group by:

to_char( dt, 'yyyymmddhhmm' )

everything in the same minute will be together - you need to be a little more precise as to what you mean here though before I can say anything.

Can LAG do what I need? Version 9.2.0.6.0

Peter, September 13, 2007 - 7:16 am UTC

Dear Tom,

Thanks for the site - I have learned many things and use Analytics a lot now - very performant in many cases. If you can spare the time to look at my problem I will be most grateful.

I have a table like:

SQL> r
  1  create table lag_test (
  2  claim varchar2 (10) not null,
  3  event varchar2 (10),
  4  period number (6,0) not null,
  5  amount number (10,2) not null
  6* )

Table created.

With this data:

SQL> insert into lag_test values ('1', 'A', 200701, 50);

1 row created.

SQL> insert into lag_test values ('1', null, 200702, 75);

1 row created.

SQL> insert into lag_test values ('1', 'A', 200703, 100);

1 row created.

SQL> insert into lag_test values ('1', 'A', 200704, 1000);

1 row created.

SQL> commit;

Commit complete.

And I am using the following query to compute movements in amount:

SQL> edit
Wrote file afiedt.buf
  1  select
  2  event,
  3  period,
  4  amount - lag (amount, 1, 0) over (partition by claim order by claim, period) amt
  5  from
  6* lag_test
SQL> /

EVENT          PERIOD        AMT
---------- ---------- ----------
A              200701         50
               200702         25
A              200703         25
A              200704        900

But I was expecting the following output:

EVENT          PERIOD        AMT
---------- ---------- ----------
A              200701         50
A              200702        -50
               200702         75
A              200703        100
               200703        -75
A              200704        900

Because at period 200702 "A" has gone away and been replaced by "null". And at period 200703 "null" has gone away and been replaced by "A".

I have scoured Chapter 12 in your book but maybe my understanding of how LAG works is flawed - or is there some way I can get my desired output.

Would really appreciate any help you can give me, and apologise if this is off topic.

Kind Regards,

Peter

September 15, 2007 - 7:27 pm UTC

I can only tell you if your desired output is possible if.....

You state in a factual, requirementy sort of what what it is. Saying "I expect 42 to be displayed" is insufficient.

You have 4 rows, I fail to see how a query against a simple 4 row table would return more than 4 in this case.

'A' has nothing to do with this query - you partition by claim (always '1') you order by claim and period - other than you selecting event, it is not part of the picture here.

Thanks for the tip above

Phil, September 15, 2007 - 5:57 pm UTC

Hi Tom Re:

Hello Tom
I struggle with Analytics, have you finished writing a book that explains it from the start?
I need to produce a report based on time and split the data into chunks. Initially each minute but
the user needs to be able to select the time "window" for example 5 minutes.
Is there any easy way of doing this using analytics please?
here are 2 hours of random data I've been working with...

create table tom
(when date)
/
insert into tom(select sysdate+((dbms_random.value)-.5)/14 from all_objects)
/

OK, understand the to_char(t.when, 'yyyymmddhhmm') which makes perfect sense but is this an efficient way of doing a search for a max frequency in a certain window? I thought analytics would offer me some performance advanatages. I'm confident you'll improve on the following to give me a top frequency per period based on the above seed data...

select five_sec_count, five_sec_period, seconds secfrom, seconds + 5 secto
from (select count(*) five_sec_count,
to_char(t.when, 'yyyymmddhhmm') five_sec_period,
case
when to_char(when, 'ss') between 1 and 5 then
5
when to_char(when, 'ss') between 6 and 10 then
10
when to_char(when, 'ss') between 11 and 15 then
15
when to_char(when, 'ss') between 16 and 20 then
20
when to_char(when, 'ss') between 21 and 25 then
25
when to_char(when, 'ss') between 26 and 30 then
30
when to_char(when, 'ss') between 31 and 35 then
35
when to_char(when, 'ss') between 36 and 40 then
40
when to_char(when, 'ss') between 41 and 45 then
45
when to_char(when, 'ss') between 46 and 50 then
50
when to_char(when, 'ss') between 51 and 55 then
55
else
60
end seconds
from tom t
group by to_char(t.when, 'yyyymmddhhmm'),
case
when to_char(when, 'ss') between 1 and 5 then
5
when to_char(when, 'ss') between 6 and 10 then
10
when to_char(when, 'ss') between 11 and 15 then
15
when to_char(when, 'ss') between 16 and 20 then
20
when to_char(when, 'ss') between 21 and 25 then
25
when to_char(when, 'ss') between 26 and 30 then
30
when to_char(when, 'ss') between 31 and 35 then
35
when to_char(when, 'ss') between 36 and 40 then
40
when to_char(when, 'ss') between 41 and 45 then
45
when to_char(when, 'ss') between 46 and 50 then
50
when to_char(when, 'ss') between 51 and 55 then
55
else
60
end
order by 1 desc)
where rownum = 1

Thanks for everything you've provided over the years.

September 15, 2007 - 10:03 pm UTC

well, the case isn't needed - a simple divide and trunc could do that - otherwise - looks fine.

Can LAG do what I need? Version 9.2.0.6.0

Peter, September 19, 2007 - 11:35 am UTC

>>I can only tell you if your desired output is possible if.....

>>You state in a factual, requirementy sort of what what it is. Saying "I expect 42 to be displayed" is insufficient.

Sorry for being unclear Tom. My example was cut down too much and didn't really represent what I was trying to do.

I have found the answer in your article

https://asktom.oracle.com/Misc/oramag/on-uniqueness-space-and-numbers.html
where you mention partitioned "sparse" outer joins. Unfortunately we are not yet on 10g, and my cartesian product will be extremely large.

Many thanks anyway for your response.

Kind Regards,

Peter

Combine row data for date event

Robert, September 28, 2007 - 4:57 pm UTC

Hello Tom,

I have a small module that tracks employee activity within a department. However, the tracker registers the activity as a single event (marked by 'A' - active or 'I' - inactive). Chronologically, for the same employee/department, an event of type 'A' can be either followed by a 'I' or by nothing (in which case the emp is still active).

I need to get the activity as a period, where the start date/time is when the emp became active at a particular time and the end date/time is when the emp became inactive. There's a way to accomplish this using table joins but I'm pretty sure there's a (simpler) way to accomplish it using analytics. How can I do this?

Here's the scripts (Oracle 10g).

create table emp_activity
(
      dept_id number,
      emp_id number,
      dt date,
      activity char(1)
);

One set of sample data:

insert into emp_activity(dept_id,emp_id,dt,activity) 
values(15, 973, to_Date('23-AUG-07 12.00.00 AM', 'DD-MON-RR HH12.MI.SS AM'), 'A');
insert into emp_activity(dept_id,emp_id,dt,activity) 
values(15, 973, to_Date('12-APR-07 12.00.00 AM', 'DD-MON-RR HH12.MI.SS AM'), 'I');
insert into emp_activity(dept_id,emp_id,dt,activity) 
values(15, 973, to_Date('02-OCT-06 12.00.00 AM', 'DD-MON-RR HH12.MI.SS AM'), 'A');

the result for this would be

dept | emp | start_date | end_date
----------------------------------------
15 | 973 | 23-AUG-07 | null
15 | 973 | 02-OCT-06 | 12-APR-07

Another set of data:

insert into emp_activity(dept_id,emp_id,dt,activity) 
values(10, 100, to_Date('16-AUG-01 12.00.00 AM', 'DD-MON-RR HH12.MI.SS AM'), 'A');
insert into emp_activity(dept_id,emp_id,dt,activity) 
values(10, 100, to_Date('25-APR-03 12.00.00 AM', 'DD-MON-RR HH12.MI.SS AM'), 'I');
insert into emp_activity(dept_id,emp_id,dt,activity) 
values(10, 100, to_Date('30-JUL-07 12.00.00 AM', 'DD-MON-RR HH12.MI.SS AM'), 'A');
insert into emp_activity(dept_id,emp_id,dt,activity) 
values(10, 100, to_Date('22-AUG-07 11.08.54 AM', 'DD-MON-RR HH12.MI.SS AM'), 'I');
insert into emp_activity(dept_id,emp_id,dt,activity) 
values(10, 100, to_Date('22-AUG-07 11.35.00 AM', 'DD-MON-RR HH12.MI.SS AM'), 'A');
insert into emp_activity(dept_id,emp_id,dt,activity) 
values(10, 100, to_Date('22-AUG-07 12.00.00 AM', 'DD-MON-RR HH12.MI.SS PM'), 'I');

in this case, the result should be

dept | emp | start_date | end_date
----------------------------------------
10 | 100 | 22-AUG-07 | 22-AUG-07
10 | 100 | 30-JUL-07 | 22-AUG-07
10 | 100 | 16-AUG-01 | 25-APR-03

I left out the time fields in the results, for clarity. Thank you.

October 03, 2007 - 1:10 pm UTC

your last output looks wrong to me, this is what I think it should be:

ops$tkyte%ORA10GR2> select *
  2    from (
  3  select dept_id, emp_id,
  4         dt, lead(dt) over (partition by dept_id, emp_id order by dt) last_dt,
  5         activity, lead(activity) over (partition by dept_id, emp_id order by dt) last_activity
  6    from emp_activity
  7         )
  8   where activity = 'A'
  9   order by dept_id, emp_id, dt
 10  /

   DEPT_ID     EMP_ID DT        LAST_DT   A L
---------- ---------- --------- --------- - -
        10        100 16-AUG-01 25-APR-03 A I
        10        100 30-JUL-07 22-AUG-07 A I
        10        100 22-AUG-07           A
        15        973 02-OCT-06 12-APR-07 A I
        15        973 23-AUG-07           A

Re: Combine row data for date event

Robert, October 02, 2007 - 8:54 am UTC

I actually solved the problem myself after playing with lag and lead!


select dept_id, emp_id, start_date, end_date
from (
    select dept_id, emp_id, dt start_date
      ,lag(dt) over(partition by dept_id, emp_id order by dt desc) end_date
      ,activity
    from emp_activity
) where activity='A'
order by dept_id, emp_id, start_date desc

window clause on time

Ryan, October 10, 2007 - 1:31 pm UTC

This might be on here already, but I couldn't find it. I think I need an analytic function with a window clause to do this:

mytimestamp is a timestamp field
mymetric is a number field

select max(mytimestamp) as time, sum(mymetric) as bytes
from metrictable
where mymetric > ( select max(mymetric) from metrictable) - (1/96)

I want to get this rolled up by 15 minute chunks. do i use a group by, analytic function?

So it is essentially a set of buckets or chunks

first 15 minute period
second 15 minute period

There "where" clause is just me trying to get just the last 15 minutes.

analytic functions

RAJ, November 08, 2007 - 4:16 pm UTC

Hi Tom,

I have the following scenario and given the exepected output as well.

I am trying to achieve with the below said query.

SELECT DISTINCT a.emp_id, CASE WHEN a.operation$ IN ('UU', 'UN') THEN 'U'
ELSE a.operation$ END operation,
CASE
WHEN a.operation$ = 'I' THEN NULL
ELSE FIRST_VALUE (a.amount) OVER (PARTITION BY a.emp_id ORDER BY rsid$)
END prev_amount,
LAST_VALUE (a.amount) OVER (PARTITION BY a.emp_id ORDER BY rsid$)
AS curr_amount,
CASE WHEN a.operation$ = 'I'
THEN NULL ELSE FIRST_VALUE (a.TIMESTAMP$) OVER (PARTITION BY a.emp_id ORDER BY a.rsid$ ASC)
END prev_timestamp,
LAST_VALUE (a.TIMESTAMP$) OVER (PARTITION BY a.emp_id ORDER BY a.rsid$ ASC)
AS curr_timestamp
FROM DEPT a
ORDER BY emp_id

Result -- 3 rows but expected is 1 row. Below I have given the expected output as well. I am not sure where I am missing in the above said query..

Here is the scenario..

create table dept (
OPERATION$ CHAR(2)
,CSCN$ NUMBER
,COMMIT_TIMESTAMP$ DATE
,RSID$ NUMBER
,USERNAME$ VARCHAR2(30)
,TIMESTAMP$ DATE
,SOURCE_COLMAP$ RAW(128)
,TARGET_COLMAP$ RAW(128)
,EMP_ID NUMBER
,TIMESTAMP DATE
,AMOUNT NUMBER
)

INSERT INTO
(OPERATION$,CSCN$,COMMIT_TIMESTAMP$,RSID$,USERNAME$,TIMESTAMP$,SOURCE_COLMAP$,TARGET_COLMAP$,EMP_ID,TIMESTAMP,AMOUNT)
VALUES
('UU ',1234,SYSDATE,40264,'TEST',SYSDATE,'08','08',104,SYSDATE,90000);
INSERT INTO
(OPERATION$,CSCN$,COMMIT_TIMESTAMP$,RSID$,USERNAME$,TIMESTAMP$,SOURCE_COLMAP$,TARGET_COLMAP$,EMP_ID,TIMESTAMP,AMOUNT)
VALUES
('UN ',1234,SYSDATE,40265,'TEST',SYSDATE,'08','08',104,SYSDATE,100000);
INSERT INTO
(OPERATION$,CSCN$,COMMIT_TIMESTAMP$,RSID$,USERNAME$,TIMESTAMP$,SOURCE_COLMAP$,TARGET_COLMAP$,EMP_ID,TIMESTAMP,AMOUNT)
VALUES
('UU ',1234,SYSDATE,40270,'TEST',SYSDATE,'08','08',104,SYSDATE,100000);
INSERT INTO
(OPERATION$,CSCN$,COMMIT_TIMESTAMP$,RSID$,USERNAME$,TIMESTAMP$,SOURCE_COLMAP$,TARGET_COLMAP$,EMP_ID,TIMESTAMP,AMOUNT)
VALUES
('UN ',1234,SYSDATE,40271,'TEST',SYSDATE,'08','08',104,SYSDATE,30000);

UU,UN -- 'U'

Expected result set

EMP_ID, OPERATION$, PREV_AMOUNT, PREV_TIMESTAMP, CURRENT_AMOUNT, CURR_TIMESTAMP

104 , U 90000 SYSDATE , 30000 SYSDATE

November 09, 2007 - 12:10 pm UTC

funny, your example doesn't work AT ALL. operation$ is 2 characters, you have 3. there is no table name. sigh.

Funnier still, And no clue why you expect a SINGLE ROW - when there is clearly three distinct rows.

ops$tkyte%ORA10GR2> SELECT DISTINCT a.emp_id, CASE  WHEN a.operation$ IN ('UU', 'UN') THEN 'U'
  2                     ELSE a.operation$  END operation,
  3                  CASE
  4                     WHEN a.operation$ = 'I' THEN NULL
  5                     ELSE FIRST_VALUE (a.amount) OVER (PARTITION BY a.emp_id ORDER BY rsid$)
  6                  END prev_amount,
  7                  LAST_VALUE (a.amount) OVER (PARTITION BY a.emp_id ORDER BY rsid$)
  8                                                                 AS curr_amount,
  9                  CASE  WHEN a.operation$ = 'I'
 10                        THEN NULL ELSE FIRST_VALUE (a.TIMESTAMP$) OVER (PARTITION BY a.emp_id ORDER BY a.rsid$ ASC)
 11                  END prev_timestamp,
 12                  LAST_VALUE (a.TIMESTAMP$) OVER (PARTITION BY a.emp_id ORDER BY a.rsid$ ASC)
 13                                                              AS curr_timestamp
 14             FROM DEPT a
 15         ORDER BY emp_id
 16  /

    EMP_ID OP PREV_AMOUNT CURR_AMOUNT PREV_TIME CURR_TIME
---------- -- ----------- ----------- --------- ---------
       104 U        90000       30000 09-NOV-07 09-NOV-07
       104 U        90000       90000 09-NOV-07 09-NOV-07
       104 U        90000      100000 09-NOV-07 09-NOV-07

It appears you are misunderstanding how the analytic functions work in this case - so presenting your "answer" without presenting "your question" (phrased in text) is not useful.

how do you carry lag to more than one row?

ee, December 05, 2007 - 6:11 pm UTC

How can you carry forward lag analytic results to multiple rows. Given table t based on view of two cartesianed tables that uses anlaytics dataset is as follows:
eg.
CREATE TABLE T(
cert_number varchar2(10),
inspection_date DATE,
risk_assign_date DATE,
risk_number NUMBER,
derived_risk NUMBER
);
INSERT INTO T (cert_number,inspection_date,risk_assign_date,risk_number,derived_risk ) VALUES(1,TO_DATE('11-dec-2006'),NULL,NULL,NULL);
INSERT INTO T (cert_number,inspection_date,risk_assign_date,risk_number,derived_risk ) VALUES(1,TO_DATE('12-dec-2006'),NULL,NULL,NULL);
INSERT INTO T (cert_number,inspection_date,risk_assign_date,risk_number,derived_risk ) VALUES(1,TO_DATE('01-jan-2007'),TO_DATE('06-jan-2007'),3,3);
INSERT INTO T (cert_number,inspection_date,risk_assign_date,risk_number,derived_risk )
VALUES(1,TO_DATE('25-jan-2007'),NULL,NULL,3);
INSERT INTO T (cert_number,inspection_date,risk_assign_date,risk_number,derived_risk )
VALUES(1,TO_DATE('27-jan-2007'),TO_DATE('27-jan-2007'),8,8);
INSERT INTO T (cert_number,inspection_date,risk_assign_date,risk_number,derived_risk )
VALUES(1,TO_DATE('13-feb-2007'),NULL,NULL,3);
INSERT INTO T (cert_number,inspection_date,risk_assign_date,risk_number,derived_risk )
VALUES(1,TO_DATE('17-feb-2007'),TO_DATE('17-feb-2007'),2,2);
INSERT INTO T (cert_number,inspection_date,risk_assign_date,risk_number,derived_risk )
VALUES(1,TO_DATE('18-apr-2007'),NULL,NULL,2);
INSERT INTO T (cert_number,inspection_date,risk_assign_date,risk_number,derived_risk )
VALUES(1,TO_DATE('23-may-2007'),NULL,NULL,NULL);
INSERT INTO T (cert_number,inspection_date,risk_assign_date,risk_number,derived_risk )
VALUES(1,TO_DATE('17-aug-2007'),NULL,NULL,NULL);
INSERT INTO T (cert_number,inspection_date,risk_assign_date,risk_number,derived_risk )
VALUES(1,TO_DATE('22-aug-2007'),TO_DATE('22-aug-2007'),1,1);
Analytic code ...DECODE (risk_number, NULL,first_value(risk_number) over(ORDER BY cert_id_number, inspectiondate ASC ROWS 1 preceding ),risk_number)derived_risk...within the view, correctly determines the risk number for the preceding rows where the risk number is null, but it leaves several rows without a derived risk_number value. How can I carry the lag derived_risk value of 2 forward to more than just one row so that the rows that include inspection dates of 5/23/2007 and 8/23/07 also get a derived_risk value of 2 assigned to the row.
The desired output would look like:
cert_ inspection_ risk_assign_ risk_ derived_
number date date number risk
====== ========== ===== ====== ========
1 12/11/2006 null null null
1 12/12/2006 null null null
1 1/1/2007 1/6/2007 3 3
1 1/25/2007 null null 3
1 1/27/2007 1/27/2007 8 8
1 2/13/2007 null null 3
1 2/17/2007 2/17/2007 2 2
1 4/18/2007 null null 2
1 5/23/2007 null null 2
1 8/17/2007 null null 2
1 8/22/2007 8/22/2007 1 1

December 10, 2007 - 9:52 am UTC

ops$tkyte%ORA10GR2> select t.*,
  2         last_value( risk_number ignore nulls ) over (order by cert_number, inspection_date ) dr
  3    from t;

CERT_NUMBE INSPECTIO RISK_ASSI RISK_NUMBER DERIVED_RISK         DR
---------- --------- --------- ----------- ------------ ----------
1          11-DEC-06
1          12-DEC-06
1          01-JAN-07 06-JAN-07           3            3          3
1          25-JAN-07                                  3          3
1          27-JAN-07 27-JAN-07           8            8          8
1          13-FEB-07                                  3          8
1          17-FEB-07 17-FEB-07           2            2          2
1          18-APR-07                                  2          2
1          23-MAY-07                                             2
1          17-AUG-07                                             2
1          22-AUG-07 22-AUG-07           1            1          1

11 rows selected.

I don't know why your example has

null, null, 3,3,8,3 <<<===???? why 3 - i think it should be 8.

how do you carry lag values forward to more than one row?

ee, December 06, 2007 - 8:55 am UTC

How can you carry forward lag analytic results to multiple rows. Derived risk value carries forward the value of previous recent risk number and only changes when new risk assign date changes with a new risk number. Given table t based on view of two cartesianed tables that uses anlaytics dataset is as follows:
CREATE TABLE T(
cert_number varchar2(10),
inspection_date DATE,
risk_assign_date DATE,
risk_number NUMBER,
derived_risk NUMBER
);
INSERT INTO T (cert_number,inspection_date,risk_assign_date,risk_number,derived_risk )
VALUES(1,TO_DATE('11-dec-2006'),NULL,NULL,NULL);
INSERT INTO T (cert_number,inspection_date,risk_assign_date,risk_number,derived_risk )
VALUES(1,TO_DATE('12-dec-2006'),NULL,NULL,NULL);
INSERT INTO T (cert_number,inspection_date,risk_assign_date,risk_number,derived_risk )
VALUES(1,TO_DATE('01-jan-2007'),TO_DATE('06-jan-2007'),3,3);
INSERT INTO T (cert_number,inspection_date,risk_assign_date,risk_number,derived_risk )
VALUES(1,TO_DATE('25-jan-2007'),NULL,NULL,3);
INSERT INTO T (cert_number,inspection_date,risk_assign_date,risk_number,derived_risk )
VALUES(1,TO_DATE('27-jan-2007'),TO_DATE('27-jan-2007'),8,8);
INSERT INTO T (cert_number,inspection_date,risk_assign_date,risk_number,derived_risk )
VALUES(1,TO_DATE('13-feb-2007'),NULL,NULL,8);
INSERT INTO T (cert_number,inspection_date,risk_assign_date,risk_number,derived_risk )
VALUES(1,TO_DATE('17-feb-2007'),TO_DATE('17-feb-2007'),2,2);
INSERT INTO T (cert_number,inspection_date,risk_assign_date,risk_number,derived_risk )
VALUES(1,TO_DATE('18-apr-2007'),NULL,NULL,2);
INSERT INTO T (cert_number,inspection_date,risk_assign_date,risk_number,derived_risk )
VALUES(1,TO_DATE('23-may-2007'),NULL,NULL,NULL);
INSERT INTO T (cert_number,inspection_date,risk_assign_date,risk_number,derived_risk )
VALUES(1,TO_DATE('17-aug-2007'),NULL,NULL,NULL);
INSERT INTO T (cert_number,inspection_date,risk_assign_date,risk_number,derived_risk )
VALUES(1,TO_DATE('22-aug-2007'),TO_DATE('22-aug-2007'),1,1);
Analytic code ...DECODE (risk_number, NULL,first_value(risk_number) over(ORDER BY cert_id_number,
inspectiondate ASC ROWS 1 preceding ),risk_number)derived_risk...within the view, correctly
determines the risk number for the preceding rows where the risk number is null, but it leaves
several rows without a derived risk_number value. How can I carry the lag derived_risk value of 2
forward to more than just one row so that the rows that include inspection dates of 5/23/2007 and
8/23/07 also get a derived_risk value of 2 assigned to the row.
The desired output would look like:
cert_ inspection_ risk_assign_ risk_ derived_
number date date number risk
====== ========== ===== ====== ========
1 12/11/2006 null null null
1 12/12/2006 null null null
1 1/1/2007 1/6/2007 3 3
1 1/25/2007 null null 3
1 1/27/2007 1/27/2007 8 8
1 2/13/2007 null null 8
1 2/17/2007 2/17/2007 2 2
1 4/18/2007 null null 2
1 5/23/2007 null null 2
1 8/17/2007 null null 2
1 8/22/2007 8/22/2007 1 1

10g only?

Jay, December 10, 2007 - 10:05 am UTC

Hi Tom,
Thanks! The questioner corrected his output in the following statement. I had a question, is this 'ignore nulls' a 10g thing? Because, it does not seem to be working for me and I use 9i. I get a ORA-00907 Missing right parenthesis when I use the 'ignore nulls' statement.
Thanks for your help!
Jay

December 10, 2007 - 10:16 am UTC

ignore nulls is new with 10g - yes.

before that you can use this "carry down" technique:

ops$tkyte%ORA9IR2> select t.*,
  2         last_value( risk_number ignore nulls ) over (order by cert_number, inspection_date ) dr
  3    from t;
       last_value( risk_number ignore nulls ) over (order by cert_number, inspection_date ) dr
                               *
ERROR at line 2:
ORA-00907: missing right parenthesis


ops$tkyte%ORA9IR2>
ops$tkyte%ORA9IR2> select cert_number,inspection_date,risk_assign_date,risk_number,derived_risk,
  2         to_number( substr( max(data) over (order by cert_number, inspection_date), 11 ) ) dr
  3    from (
  4  select t.*,
  5         case when risk_number is not null
  6                  then to_char(row_number() over
  7                                       (order by cert_number, inspection_date),'fm0000000000') ||
  8                                   risk_number
  9                   end data
 10    from t
 11         ) x
 12  /

CERT_NUMBE INSPECTIO RISK_ASSI RISK_NUMBER DERIVED_RISK         DR
---------- --------- --------- ----------- ------------ ----------
1          11-DEC-06
1          12-DEC-06
1          01-JAN-07 06-JAN-07           3            3          3
1          25-JAN-07                                  3          3
1          27-JAN-07 27-JAN-07           8            8          8
1          13-FEB-07                                  3          8
1          17-FEB-07 17-FEB-07           2            2          2
1          18-APR-07                                  2          2
1          23-MAY-07                                             2
1          17-AUG-07                                             2
1          22-AUG-07 22-AUG-07           1            1          1

11 rows selected.

WOW!

Jay, December 10, 2007 - 10:24 am UTC

Thanks Tom. You are simply amazing. I get to learn so much from your site!
Thanks a ton and have a wonderful day.
Jay

carrying lag value forward to more than one row

ee, December 11, 2007 - 12:40 pm UTC

Tom ,as always you find the answers to difficult questions and provide succinct code from which we can follow and learn. I struggled for several days on finding a solution to carry lag values forward to multiple rows and you have delivered once again. As I stated to you once at an RMOUG meeting, I thank the maker above for you insight and willingness to share your talents and knowledge with others.

Thanks again.

carry multiple lag values forward to multple rows

ee, December 13, 2007 - 12:00 pm UTC

TOM: I am still trying to grasp the capabilities of using analytics and have a follow up question:
as you stated: your code.....
ops$tkyte%ORA9IR2>
1 select cert_number,inspection_date,risk_assign_date,risk_number,derived_risk,
2 to_number( substr( max(data) over (order by cert_number, inspection_date), 11 ) ) dr
3 from (
4 select t.*,
5 case when risk_number is not null
6 then to_char(row_number() over
7 (order by cert_number, inspection_date),'fm0000000000')
||
8 risk_number
9 end data
10 from t
11 ) x
12 /
RETURNS this output:
cert_ inspection_ risk_assign_ risk_ derived_
number date date number risk
====== ========== ===== ====== ========
1 12/11/2006 null null null
1 12/12/2006 null null null
1 1/6/2007 1/6/2007 3 3
1 1/25/2007 null null 3
1 1/27/2007 1/27/2007 8 8
1 2/13/2007 null null 8
1 2/17/2007 2/17/2007 2 2
1 4/18/2007 null null 2
1 5/23/2007 null null 2
1 8/17/2007 null null 2
1 8/22/2007 8/22/2007 1 1

But, how would code the query to return this output?
The desired output would look like:
cert_ inspection_ risk_assign_ risk_ derived_ derived_
number date date number risk risk_date
====== ========== ===== ====== ======== =========
1 12/11/2006 null null null null
1 12/12/2006 null null null null
1 1/6/2007 1/6/2007 3 3 1/6/2007
1 1/25/2007 null null 3 1/6/2007
1 1/27/2007 1/27/2007 8 8 1/27/2007
1 2/13/2007 null null 8 1/27/2007
1 2/17/2007 2/17/2007 2 2 2/17/2007
1 4/18/2007 null null 2 2/17/2007
1 5/23/2007 null null 2 2/17/2007
1 8/17/2007 null null 2 2/17/2007
1 8/22/2007 8/22/2007 1 1 8/22/2007

Thank you so very much for your help !!!

December 13, 2007 - 12:37 pm UTC

using the same technique on risk_assign_date

Look at what I did bit by bit here.... This is a technique applicable to anything...

carry down bit by bit...

ops$tkyte%ORA10GR2> select x, y
  2    from t
  3   order by x
  4  /

X                    Y
-------------------- --------------------
10-dec-2007 00:00:00
11-dec-2007 00:00:00 13-dec-2007 12:39:40
12-dec-2007 00:00:00
13-dec-2007 00:00:00 12-dec-2007 12:39:40
14-dec-2007 00:00:00

we start with that, we want to carry down Y. Now, Y is not 'sortable' right now - the 13th comes before the 12th. In general, this will be true - that the attribute we want is not sortable directly... We need something to add to Y to make it "sortable" - row_number will help:

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select x, y,
  2         case when y is not null then row_number() over (order by x) end rn
  3    from t
  4   order by x
  5  /

X                    Y                            RN
-------------------- -------------------- ----------
10-dec-2007 00:00:00
11-dec-2007 00:00:00 13-dec-2007 12:39:40          2
12-dec-2007 00:00:00
13-dec-2007 00:00:00 12-dec-2007 12:39:40          4
14-dec-2007 00:00:00

we want row_number in a field whenever Y is not null (for the MAX() trick later!) We need it to sort - and it would - it is a number, but we need to piggy back onto this RN field our Y value - so we need a string:

o

ps$tkyte%ORA10GR2> select x, y,
  2         case when y is not null then to_char(row_number() over (order by x),'fm000000') end rn
  3    from t
  4   order by x
  5  /

X                    Y                    RN
-------------------- -------------------- -------
10-dec-2007 00:00:00
11-dec-2007 00:00:00 13-dec-2007 12:39:40 000002
12-dec-2007 00:00:00
13-dec-2007 00:00:00 12-dec-2007 12:39:40 000004
14-dec-2007 00:00:00

Now, we need to add our value to this RN field - it is a date, we convert date into something neutral for our string:

ops$tkyte%ORA10GR2> select x, y,
  2         case when y is not null then to_char(row_number() over (order by x),'fm000000') end ||
  3             to_char(y,'yyyymmddhh24miss') rn
  4    from t
  5   order by x
  6  /

X                    Y                    RN
-------------------- -------------------- ---------------------
10-dec-2007 00:00:00
11-dec-2007 00:00:00 13-dec-2007 12:39:40 00000220071213123940
12-dec-2007 00:00:00
13-dec-2007 00:00:00 12-dec-2007 12:39:40 00000420071212123940
14-dec-2007 00:00:00

Now, we can apply the MAX trick to 'carry' down:

ops$tkyte%ORA10GR2> select x, y, MAX(rn) over (order by x) max_rn
  2    from (
  3  select x, y,
  4         case when y is not null then to_char(row_number() over (order by x),'fm000000') end ||
  5             to_char(y,'yyyymmddhh24miss') rn
  6    from t
  7         )
  8   order by x
  9  /

X                    Y                    MAX_RN
-------------------- -------------------- ---------------------
10-dec-2007 00:00:00
11-dec-2007 00:00:00 13-dec-2007 12:39:40 00000220071213123940
12-dec-2007 00:00:00                      00000220071213123940
13-dec-2007 00:00:00 12-dec-2007 12:39:40 00000420071212123940
14-dec-2007 00:00:00                      00000420071212123940

substr gets out bit out again:

ops$tkyte%ORA10GR2> select x, y, substr(MAX(rn) over (order by x),7) max_rn
  2    from (
  3  select x, y,
  4         case when y is not null then to_char(row_number() over (order by x),'fm000000') end ||
  5             to_char(y,'yyyymmddhh24miss') rn
  6    from t
  7         )
  8   order by x
  9  /

X                    Y                    MAX_RN
-------------------- -------------------- ---------------
10-dec-2007 00:00:00
11-dec-2007 00:00:00 13-dec-2007 12:39:40 20071213123940
12-dec-2007 00:00:00                      20071213123940
13-dec-2007 00:00:00 12-dec-2007 12:39:40 20071212123940
14-dec-2007 00:00:00                      20071212123940

we are almost done - a to_date and viola'

ops$tkyte%ORA10GR2> select x, y, to_date( substr(MAX(rn) over (order by x),7),'yyyymmddhh24miss') max_rn
  2    from (
  3  select x, y,
  4         case when y is not null then to_char(row_number() over (order by x),'fm000000') end ||
  5             to_char(y,'yyyymmddhh24miss') rn
  6    from t
  7         )
  8   order by x
  9  /

X                    Y                    MAX_RN
-------------------- -------------------- --------------------
10-dec-2007 00:00:00
11-dec-2007 00:00:00 13-dec-2007 12:39:40 13-dec-2007 12:39:40
12-dec-2007 00:00:00                      13-dec-2007 12:39:40
13-dec-2007 00:00:00 12-dec-2007 12:39:40 12-dec-2007 12:39:40
14-dec-2007 00:00:00                      12-dec-2007 12:39:40

so, apply that technique to your data - I purposely am not giving the final final solution :) I want you to code it.

carry lag values forward to multple rows

ee, December 13, 2007 - 4:06 pm UTC

Thank you sooooo very much. I studied your example and was successful in applying it to my data situation.

PS.
During the daytime, I pour through your books and your askTom website and I carefully study the wonderful examples.... and then at night, I place your books under my pillow hoping that through osmosis, all of the wonderfull contents that is within the pages will be absorbed and understood when I awake. It is much better than counting sheep.

Single query to replace 'Union All'

Matthew, January 08, 2008 - 7:44 am UTC

Given the following script (v10.2):

DROP TABLE issue_category;
DROP TABLE category_type;
DROP TABLE issue_class;
DROP TABLE fd_scorecard;
DROP TABLE class_type;
DROP TABLE fd;

CREATE TABLE fd(
fd_id VARCHAR2(20),
fd_name VARCHAR2(100));

CREATE TABLE class_type(
cl_id NUMBER(6),
description VARCHAR2(100));

CREATE TABLE fd_scorecard(
fd_id VARCHAR2(20),
scorecard_id NUMBER(10),
date_of_meeting DATE,
status VARCHAR2(100));

CREATE TABLE category_type(
ct_id NUMBER(6),
class_type_id NUMBER(6),
description VARCHAR2(100),
weighting NUMBER(5,2));

CREATE TABLE issue_class(
fd_id VARCHAR2(20),
scorecard_id NUMBER(10),
class_type_id NUMBER(6),
ic_name VARCHAR2(100),
num_grade NUMBER(1),
weighting NUMBER(5,2));

CREATE TABLE issue_category(
fd_id VARCHAR2(20),
scorecard_id NUMBER(10),
category_type_id NUMBER(10),
class_type_id NUMBER(6),
num_grade NUMBER(1),
weighting NUMBER(5,2),
date_created DATE,
created_by VARCHAR2(100),
notes VARCHAR2(4000));

/* Create the data */
DECLARE
TYPE class_type_tab IS TABLE OF class_type%ROWTYPE;
t_class_type class_type_tab := class_type_tab();

TYPE category_type_tab IS TABLE OF category_type%ROWTYPE;
t_category_type category_type_tab := category_type_tab();

indx PLS_INTEGER;

PROCEDURE ins(
p_id class_type.cl_id%TYPE,
p_desc class_type.description%TYPE) IS
BEGIN
indx := t_class_type.NEXT(indx);

t_class_type(indx).cl_id := p_id;
t_class_type(indx).description := p_desc;

END ins;

PROCEDURE ins(
p_class_type_id category_type.class_type_id%TYPE,
p_id category_type.ct_id%TYPE,
p_desc category_type.description%TYPE,
p_weighting category_type.weighting%TYPE) IS
BEGIN
indx := t_category_type.NEXT(indx);

t_category_type(indx).ct_id := p_id;
t_category_type(indx).class_type_id := p_class_type_id;
t_category_type(indx).description := p_desc;
t_category_type(indx).weighting := p_weighting;

END ins;

BEGIN
INSERT INTO fd(
fd_id,
fd_name)
VALUES('FD000001', 'FD000001');

indx := 0;

/* Class_Type data */

t_class_type.EXTEND(12);

ins(1,'Class Type 1');
ins(2,'Class Type 2');
ins(3,'Class Type 3');
ins(4,'Class Type 4');
ins(5,'Class Type 5');
ins(6,'Class Type 6');
ins(7,'Class Type 7');
ins(8,'Class Type 8');
ins(9,'Class Type 9');
ins(10,'Class Type 10');
ins(11,'Class Type 11');
ins(12,'Class Type 12');

FORALL i IN 1..t_class_type.COUNT
INSERT INTO class_type
VALUES t_class_type(i);

INSERT INTO fd_scorecard(
fd_id,
scorecard_id,
date_of_meeting,
status)
SELECT fd_id,
ROWNUM Scorecard_ID,
SYSDATE Meeting_Date,
CASE WHEN MOD(ROWNUM,2) = 0 THEN 'Draft'
ELSE 'Final'
END Status
FROM fd;

INSERT INTO issue_class(
fd_id,
scorecard_id,
class_type_id,
ic_name,
num_grade,
weighting)
SELECT fs.fd_id,
fs.scorecard_id,
cl.cl_id Class_Type_ID,
'Issue Class Name '||TO_CHAR(ROWNUM) Ic_Name,
CASE MOD(ROWNUM, 5)
WHEN 0 THEN 5
WHEN 1 THEN 4
WHEN 2 THEN 3
WHEN 3 THEN 2
ELSE 1 END Num_Grade,
ROUND(100 / ROWNUM, 2) Weighting
FROM fd_scorecard fs,
class_type cl;

/* Category_Type data */

t_category_type.EXTEND(56);

indx := 0;

ins(1,1,'Category Type 11',150);
ins(1,2,'Category Type 12',30);
ins(1,3,'Category Type 13',60);
ins(1,4,'Category Type 14',40);
ins(1,5,'Category Type 15',10);
ins(1,6,'Category Type 16',5);
ins(1,7,'Category Type 17',5);

ins(2,1,'Category Type 21',150);
ins(2,2,'Category Type 22',25);
ins(2,3,'Category Type 23',20);
ins(2,4,'Category Type 24',10);
ins(2,5,'Category Type 25',10);
ins(2,6,'Category Type 26',10);

ins(3,1,'Category Type 31',50);
ins(3,2,'Category Type 32',25);
ins(3,3,'Category Type 33',25);

ins(4,1,'Category Type 41',25);
ins(4,2,'Category Type 42',75);
ins(4,3,'Category Type 43',50);

ins(5,1,'Category Type 51',30);
ins(5,2,'Category Type 52',30);
ins(5,3,'Category Type 53',30);
ins(5,4,'Category Type 54',10);

ins(6,1,'Category Type 61',50);
ins(6,2,'Category Type 62',25);
ins(6,3,'Category Type 63',50);
ins(6,4,'Category Type 64',50);
ins(6,5,'Category Type 65',25);
ins(6,6,'Category Type 66',25);
ins(6,7,'Category Type 67',25);

ins(7,1,'Category Type 71',100);
ins(7,2,'Category Type 72',50);
ins(7,3,'Category Type 73',20);
ins(7,4,'Category Type 74',5);
ins(7,5,'Category Type 75',25);
ins(7,6,'Category Type 76',20);
ins(7,7,'Category Type 77',25);
ins(7,8,'Category Type 78',40);
ins(7,9,'Category Type 79',10);
ins(7,10,'Category Type 710',5);

ins(8,1,'Category Type 81',25);
ins(8,2,'Category Type 82',25);

ins(9,1,'Category Type 91',50);
ins(9,2,'Category Type 92',25);
ins(9,3,'Category Type 93',50);
ins(9,4,'Category Type 94',25);
ins(9,5,'Category Type 95',25);
ins(9,6,'Category Type 96',25);

ins(10,1,'Category Type 101',100);
ins(10,2,'Category Type 102',100);
ins(10,3,'Category Type 103',75);
ins(10,4,'Category Type 104',25);

ins(11,1,'Category Type 111',30);
ins(11,2,'Category Type 112',30);
ins(11,3,'Category Type 113',20);
ins(11,4,'Category Type 114',20);

FORALL i IN 1..t_category_type.COUNT
INSERT INTO category_type
VALUES t_category_type(i);

INSERT INTO issue_category(
fd_id,
scorecard_id,
category_type_id,
class_type_id,
num_grade,
weighting,
date_created,
created_by,
notes)
SELECT ic.fd_id,
ic.scorecard_id,
ct.ct_id Category_Type_ID,
ct.class_type_id Class_Type_ID,
CASE MOD(ROWNUM, 5)
WHEN 0 THEN 5
WHEN 1 THEN 4
WHEN 2 THEN 3
WHEN 3 THEN 2
ELSE 1 END Num_Grade,
DBMS_RANDOM.VALUE(1,999) Weighting,
SYSDATE Date_Created,
USER Created_By,
NULL Notes
FROM issue_class ic,
category_type ct
WHERE ic.class_type_id = ct.class_type_id;

COMMIT;

END;

/* Constraints */
ALTER TABLE fd ADD(
CONSTRAINT fd_pk
PRIMARY KEY
(fd_id));

ALTER TABLE class_type ADD(
CONSTRAINT class_type_pk
PRIMARY KEY
(cl_id));

ALTER TABLE category_type ADD(
CONSTRAINT category_type_pk
PRIMARY KEY
(ct_id, class_type_id));

ALTER TABLE issue_class ADD(
CONSTRAINT issue_class_pk
PRIMARY KEY
(fd_id, scorecard_id, class_type_id));

ALTER TABLE issue_category ADD(
CONSTRAINT issue_class_category_pk
PRIMARY KEY
(fd_id, scorecard_id, category_type_id, class_type_id));

ALTER TABLE fd_scorecard ADD(
CONSTRAINT fd_scorecard_pk
PRIMARY KEY
(fd_id, scorecard_id));

ALTER TABLE fd_scorecard ADD(
CONSTRAINT fd_scorecard_fd_fk
FOREIGN KEY (fd_id)
REFERENCES fd(fd_id));

ALTER TABLE issue_class ADD(
CONSTRAINT issue_class_fd_scorecard_fk
FOREIGN KEY (fd_id,scorecard_id)
REFERENCES fd_scorecard(fd_id,scorecard_id));

ALTER TABLE issue_class ADD(
CONSTRAINT issue_class_class_type_fk
FOREIGN KEY (class_type_id)
REFERENCES class_type(cl_id));

ALTER TABLE issue_class ADD(
CHECK (num_grade BETWEEN 1 AND 5));

ALTER TABLE category_type ADD(
CONSTRAINT category_type_class_type_fk
FOREIGN KEY (class_type_id)
REFERENCES class_type(cl_id));

ALTER TABLE issue_category ADD(
CONSTRAINT issue_category_category_typ_fk
FOREIGN KEY (category_type_id, class_type_id)
REFERENCES category_type(ct_id, class_type_id));

the following query produces the required results (i.e. what in my Cobol days I'd have called File Header, (Batch Header, Details, Batch Trailer) times n, File Trailer):

SELECT 0 Class_ID,
f.fd_name Class_Description,
0 Class_Type_ID,
NULL Catg_ID,
fs.status Catg_Description,
TO_NUMBER(NULL) Num_Grade,
TO_NUMBER(NULL) Weighting,
fs.date_of_meeting Date_Created,
NULL Created_By
FROM fd f,
fd_scorecard fs
WHERE fs.fd_id = f.fd_id
AND f.fd_id = 'FD000001'
UNION ALL
SELECT cl_id Class_ID,
cl.description Class_Description,
0 Class_Type_ID,
NULL Catg_ID,
NULL Catg_Description,
TO_NUMBER(NULL) Num_Grade,
TO_NUMBER(NULL) Weighting,
TO_DATE(NULL) Date_Created,
NULL Created_By
FROM class_type cl
UNION ALL
SELECT cl.cl_id Class_ID,
NULL Class_Description,
ic.class_type_id,
CASE ic.class_type_id
WHEN 1 THEN 'A'
WHEN 2 THEN 'B'
WHEN 3 THEN 'C'
WHEN 4 THEN 'D'
WHEN 5 THEN 'E'
WHEN 6 THEN 'F'
WHEN 7 THEN 'G'
WHEN 8 THEN 'H'
WHEN 9 THEN 'I'
WHEN 10 THEN 'J'
WHEN 11 THEN 'K'
WHEN 12 THEN 'L'
END||
TO_CHAR(ct.ct_id) Catg_ID,
CASE GROUPING (ct.ct_id)
WHEN 0 THEN ct.description
ELSE
CASE GROUPING(ic.class_type_id)
WHEN 0 THEN 'Grade'
ELSE 'Overall Grade And Conclusion'
END
END Catg_Description,
ROUND(SUM(ic.num_grade * ic.weighting) / SUM(ic.weighting)) Num_Grade,
ROUND(AVG(ct.weighting)) Weighting,
ic.date_created,
ic.created_by
FROM fd f,
issue_class icl,
class_type cl,
category_type ct,
issue_category ic
WHERE f.fd_id = icl.fd_id
AND icl.class_type_id = cl.cl_id
AND ct.class_type_id = cl.cl_id
AND ic.class_type_id = icl.class_type_id
AND ic.scorecard_id = icl.scorecard_id
AND ic.fd_id = icl.fd_id
AND ic.category_type_id = ct.ct_id
AND f.fd_id = 'FD000001'
GROUP BY GROUPING SETS((ic.class_type_id),(cl.cl_id,cl.description, ic.class_type_id, ct.ct_id, ct.description, ic.date_created, ic.created_by),())
ORDER BY class_id, class_type_id, catg_id;

Can this be done with a single select?

Single query to replace 'Union All'

Matthew, January 08, 2008 - 9:14 am UTC

Apologies, the query I gave you was leaving the 'batch trailer' rows till the end. This one is OK:

SELECT 0 Class_ID,
f.fd_name Class_Description,
0 Class_Type_ID,
0 Ct_ID,
NULL Catg_ID,
fs.status Catg_Description,
0 Batch_Trailer,
TO_NUMBER(NULL) Num_Grade,
TO_NUMBER(NULL) Weighting,
fs.date_of_meeting Date_Created,
NULL Created_By
FROM fd f,
fd_scorecard fs
WHERE fs.fd_id = f.fd_id
AND f.fd_id = :p_fd_id
UNION ALL
SELECT cl_id Class_ID,
cl.description Class_Description,
0 Class_Type_ID,
0 Ct_ID,
NULL Catg_ID,
NULL Catg_Description,
0 Batch_Trailer,
TO_NUMBER(NULL) Num_Grade,
TO_NUMBER(NULL) Weighting,
TO_DATE(NULL) Date_Created,
NULL Created_By
FROM class_type cl
UNION ALL
SELECT NVL(cl.cl_id, ic.class_type_id) Class_ID,
NULL Class_Description,
ic.class_type_id,
NVL(ct.ct_id,0) Ct_ID,
CASE ic.class_type_id
WHEN 1 THEN 'A'
WHEN 2 THEN 'B'
WHEN 3 THEN 'C'
WHEN 4 THEN 'D'
WHEN 5 THEN 'E'
WHEN 6 THEN 'F'
WHEN 7 THEN 'G'
WHEN 8 THEN 'H'
WHEN 9 THEN 'I'
WHEN 10 THEN 'J'
WHEN 11 THEN 'K'
WHEN 12 THEN 'L'
END||
TO_CHAR(ct.ct_id) Catg_ID,
CASE GROUPING (ct.ct_id)
WHEN 0 THEN ct.description
ELSE
CASE GROUPING(ic.class_type_id)
WHEN 0 THEN 'Grade'
ELSE 'Overall Grade And Conclusion'
END
END Catg_Description,
GROUPING_ID(ct.description) Batch_Trailer,
ROUND(SUM(ic.num_grade * ic.weighting) / SUM(ic.weighting)) Num_Grade,
ROUND(AVG(ct.weighting)) Weighting,
ic.date_created,
ic.created_by
FROM fd f,
issue_class icl,
class_type cl,
category_type ct,
issue_category ic
WHERE f.fd_id = icl.fd_id
AND icl.class_type_id = cl.cl_id
AND ct.class_type_id = cl.cl_id
AND ic.class_type_id = icl.class_type_id
AND ic.scorecard_id = icl.scorecard_id
AND ic.fd_id = icl.fd_id
AND ic.category_type_id = ct.ct_id
AND f.fd_id = 'FD000001'
GROUP BY GROUPING SETS((ic.class_type_id),(cl.cl_id,cl.description, ic.class_type_id, ct.ct_id,
ct.description, ic.date_created, ic.created_by),())
ORDER BY class_id, class_type_id, batch_trailer, ct_id;

January 08, 2008 - 9:49 am UTC

I sort of doubt it based on the from lists and the group by grouping sets.

SELECT
  FROM fd f,
       fd_scorecard fs
 UNION ALL
SELECT
  FROM class_type cl
 UNION ALL
SELECT
  FROM fd f,
       issue_class icl,
       class_type cl,
       category_type ct,
       issue_category ic

if the union alls were subsets of eachother - maybe, but here you have fd+fd_scorecard - but score card isn't present in the last query - and fd is 1:M with fd_scorecard it would tend to "add rows" to the last query if we added it (assuming an OUTER JOIN was there in case the relationship is 1:M OPTIONAL - each fd row could have 0, 1 or MORE entries in fd_scorecard)

Single query to replace 'Union All'

Matthew, January 08, 2008 - 9:57 am UTC

I won't waste any more time on it then. Many thanks.

Using analytics for getting hourly intervals

Thiru, January 14, 2008 - 10:45 am UTC

Tom,

What is a good way using analytics for getting the result for:

create table t1 ( id number, sid number,qty number,stime date)
insert into t1 values(1,10,100,to_date('01-15-2008 12:00','MM-DD-YYYY HH24:MI'))
insert into t1 values(1,10,200,to_date('01-15-2008 12:05','MM-DD-YYYY HH24:MI'))
insert into t1 values(1,10,200,to_date('01-15-2008 13:05','MM-DD-YYYY HH24:MI'))
insert into t1 values(1,10,200,to_date('01-15-2008 13:10','MM-DD-YYYY HH24:MI'))
insert into t1 values(1,10,200,to_date('01-15-2008 14:10','MM-DD-YYYY HH24:MI'))

the output for sum (qty) needs to be grouped by id and sid and hourly time with the top of the hour being shown.

ID SID TOP_OF_HOUR TOTAL_QTY
1 10 01-15-2008 12:00 300
1 10 01-15-2008 13:00 400
1 10 01-15-2008 14:00 200

Thanks for the time

January 14, 2008 - 3:43 pm UTC

you don't want analytics, you just want to trunc the time to the HOUR level and group by id, sid, date.... sum(qty) by that.

code..

Jay, January 14, 2008 - 5:45 pm UTC

Tom refers to a code like this..

select id,
       sid, 
       trunc(stime,'hh'),
       sum(qty)
  from t1 
group by id, sid, trunc(stime,'hh')

Thanks!

Selecting columns that do not change

Thiru, January 15, 2008 - 8:51 am UTC

Thanks for the input. If I have let's say 15 columns in the above t1 table and all the columns except the one that is being grouped on are the same, is there a way I can avoid putting all the columns in the group by clause?

if columns are c1....c15 and all columns from c4 to c15 are constant values for the related groups, how do I avoid doing the following?

select c1,c2,sum(c3),....c15
group by c1,c2,c4....c15

Only c1 and c2 are the actual grouping columns

Thanks for the time again.

January 15, 2008 - 1:02 pm UTC

you only have to type it once

and it is mandatory to be typed

You will not avoid the following(the above) but feel free to hide it in a view by all means.

do a max...

Jay, January 15, 2008 - 9:04 am UTC

select c1, c2, sum(c3), max(c4), max(c5)...,max(c15)
group by c1, c2

Thanks,
Jay

January 15, 2008 - 1:10 pm UTC

do not do that.

the goal in programming is not to write obfuscated code for the ease of the current human typing.

the goal is to generate correct code that is maintainable, understandable and correct.

That query would be so misleading as to the intent, the purpose. I would look at it later and say "what sort of goofy person did this, this is so wrong, don't they understand this data model - we'll have to review all of their code now, they don't know what this data is"

Please don't do something like this to avoid a few keystrokes once. This is as simple as CUT AND PASTE

select a, b, c, d, .... z, sum(xx) 
  from t
group by <yank that select list and paste it here>

max worked.

Thiru, January 15, 2008 - 10:45 am UTC

Thanks. That did it. Is max efficient to min in this case because the other column is doing a sum().

January 15, 2008 - 2:59 pm UTC

stop it, please don't be lazy.

Your group by key is your group by key.

Your aggregates are your aggregates.

Please do not confuse the two. You will confuse someone down the road or they'll be convinced you had no idea what you were doing.

I'll never, in a billion years never, be able to understand the fear of typing.

Jay, January 15, 2008 - 11:02 am UTC

I don't understand your question. Both Max or Min would work in this case. No pros or cons... it's the same banana!

Thanks!
Jay

January 15, 2008 - 3:23 pm UTC

yeah, they are both WRONG and INAPPROPRIATE to use.

And that they "asked" implies they don't understand what it is actually doing and that is even worse.

Thiru, January 15, 2008 - 11:33 am UTC

Though the max function did the trick, dont you feel that when working with huge sets of data, the function would be an expensive operation as all these other constant value columns are not going to be indexed. I would like to just pick up one value without using the aggregate function call or do a group by all the 15 constant value columns.

Hope I am making myself clear here. It's like this:

c1 c2 c3 c4 c5 c6
1 2 1 1 1 1
1 5 1 1 1 1
2 10 2 2 2 2
2 20 2 2 2 2

As you see : select c1,sum(c2),max(c3),max(c4),max(c5),max(c6) from t1 group by c1

this has got a) the grouping operation b) the max operation for each of the other columns

would like to avoid this max to get the result:

c1 c2 c3 c4 c5 c6

1 7 1 1 1 1
2 30 2 2 2 2

Thanks again.

January 15, 2008 - 3:34 pm UTC

hey - just use GROUP BY !!!!! (come on, this is a trivial CUT AND PASTE people)

think about this - using MAX() is harder for you to code. *harder*

It is not only wrong
It is *harder*

In your attempt to shave a millisecond of your time, you add many seconds to your time

<b>
select c1, c2, c3, c4, c5, c6, sum(x) 
from t
group by </b> c1, c2, c3, c4, c5, c6

the code in bold - what you have to type. the stuff not in bold, the stuff you have to copy and paste.

Now, do it with max and you are - typing MORE

heck, way more - because you probably have to alias the columns

select max(c1) c1, max(c2) c2, .......

all in a misguided attempt to "type less"...

and yeah, you don't want to max all over the place for performance reasons.

Jay, January 15, 2008 - 1:06 pm UTC

Did you test run the query using the max function? Are you having any performance issues? Tom might talk to you more about this. But, I feel that you should evaluate your query and measure the performance before making a 'generic' comment without any basis.

Now, if you really "hate" doing 'group by', one way to do this via analytics would be something like this.


select distinct c1,
                sum_c2,
                c3,
                c4
  from

       (select c1,
               c2,
               sum(c2) over (partition by c1) as sum_c2,
               c3,
               c4
          from t1)

But, again, I don't understand why you would do this!

Thanks,
Jay

January 15, 2008 - 3:37 pm UTC

it will take a little more cpu to max lots of columns than to group by...

this was a quick and dirty 10g test:

SELECT OWNER, OBJECT_NAME, SUBOBJECT_NAME, OBJECT_ID, OBJECT_TYPE, CREATED,
  LAST_DDL_TIME, TIMESTAMP, STATUS, TEMPORARY, GENERATED, SECONDARY,
  SUM(DATA_OBJECT_ID)
FROM
 T GROUP BY OWNER, OBJECT_NAME, SUBOBJECT_NAME, OBJECT_ID, OBJECT_TYPE,
  CREATED, LAST_DDL_TIME, TIMESTAMP, STATUS, TEMPORARY, GENERATED, SECONDARY

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute     10      0.00       0.00          0          0          0           0
Fetch     5000      7.68      13.12       4800       6900          0      499010
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     5011      7.69      13.13       4800       6900          0      499010
********************************************************************************
SELECT MAX(OWNER), MAX( OBJECT_NAME), MAX( SUBOBJECT_NAME), OBJECT_ID,
  MAX(OBJECT_TYPE), MAX( CREATED), MAX( LAST_DDL_TIME), MAX( TIMESTAMP), MAX(
  STATUS), MAX(TEMPORARY), MAX( GENERATED), MAX( SECONDARY),
  SUM(DATA_OBJECT_ID)
FROM
 T GROUP BY OBJECT_ID

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.02          0          1          0           0
Execute     10      0.00       0.00          0          0          0           0
Fetch     5000      9.27      16.05       7800       6900          0      499010
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     5011      9.29      16.08       7800       6901          0      499010

but the right answer is to group by that which you want to group by.

Thanks Tom

Jay, January 15, 2008 - 1:29 pm UTC

Quote --

"... the goal is to generate correct code that is maintainable, understandable and correct......"

Tom, thanks for the comment. I get your point!

Jay

Conditional sorting

Reader, January 22, 2008 - 7:03 pm UTC

Sample data
create table t
(
num_col number,
var_col varchar2(3)
);

insert into t values(1,'abc');
insert into t values(1,'def');
insert into t values(1,'pqr');

insert into t values(2,'cde');
insert into t values(2,'def');
insert into t values(2,'rst');
insert into t values(2,'xyz');

insert into t values(3,'bcd');
insert into t values(3,'def');
insert into t values(3,'pqr');

Expected Output

1 abc <== This num_col group comes first as abc is the least value among all var_col values in the table
1 def
1 pqr
3 bcd <== This num_col group comes next as bcd will be the next sorted value after abc among all var_col values in the table
3 def
3 pqr
2 cde <== This num_col group comes next as bcd will be the next sorted value after abc among all var_col values in the table
2 def
2 rst
2 xyz

Is it possible to achieve the above expected output in a single SQL query, maybe using Analytics?

January 22, 2008 - 7:13 pm UTC

ops$tkyte%ORA10GR2> select num_col, var_col, min(var_col) over (partition by num_col) min_var_col
  2    from t
  3   order by 3
  4  /

   NUM_COL VAR MIN
---------- --- ---
         1 pqr abc
         1 abc abc
         1 def abc
         3 bcd bcd
         3 def bcd
         3 pqr bcd
         2 rst cde
         2 xyz cde
         2 cde cde
         2 def cde

10 rows selected.

Fantastic

Reader, January 22, 2008 - 7:22 pm UTC

s devarshi, January 29, 2008 - 8:59 am UTC

i have table t.
create table t( x number(3),y varchar2(3),z number(3));
the rows are

insert into t values (1,'a',5);
insert into t values (1,'a',10);
insert into t values (1,'b',25);
insert into t values (1,'b',12);
insert into t values (1,'c',5);
insert into t values (1,'c',5);
insert into t values (2,'a',5);
insert into t values (2,'a',15);
insert into t values (2,'c',5);
insert into t values (2,'c',15);
insert into t values (3,'a',5);
insert into t values (3,'a',5);
insert into t values (3,'b',5);
insert into t values (4,'a',5);
insert into t values (4,'a',35);
insert into t values (5,'a',55);
insert into t values (5,'a',5);
insert into t values (5,'b',55);
insert into t values (6,'b',35);
insert into t values (6,'c',35);

i need an output like this
x y z
- - -
5 a 60
5 b 55
5 tot 115
6 b 35
6 c 35
6 tot 70
4 a 40
4 tot 40

the sorting is on z (tot) ,grouped by x and y .
i can get the output

x,y1,y2,y3,tot
format using decode but not this.

January 30, 2008 - 8:59 am UTC

the sorting is on X,Y, and then SUM(z)

you missed X,Y in your "specification", you must order by X,Y and then sum(z). You cannot sort by Z as that would cause X,Y to not be "grouped" in the result set of course.

ops$tkyte%ORA10GR2> select x, decode( grouping(y), 1, 'tot', y ), sum(z)
  2    from t
  3   group by grouping sets ((x,y),(x))
  4   order by x, y, sum(z)
  5  /

         X DEC     SUM(Z)
---------- --- ----------
         1 a           15
         1 b           37
         1 c           10
         1 tot         62
         2 a           20
         2 c           20
         2 tot         40
         3 a           10
         3 b            5
         3 tot         15
         4 a           40
         4 tot         40
         5 a           60
         5 b           55
         5 tot        115
         6 b           35
         6 c           35
         6 tot         70

18 rows selected.

devarshi, January 31, 2008 - 12:58 am UTC

superb

analytics and 8.1.7

devarshi, February 05, 2008 - 12:59 am UTC

i have two more queries.
1. how to do it in 8.1.7 ?
2. can analytic function be used here ?
thanks

February 05, 2008 - 8:13 am UTC

ops$tkyte@ORA817DEV> select decode( grouping(x), 1, 'tot', x ) x,
  2             decode( grouping(y), 1, 'tot', y ),
  3             sum(z),
  4             grouping(x) gx, grouping(y) gy
  5    from t
  6   group by rollup(x,y)
  7  /

X   DEC     SUM(Z)         GX         GY
--- --- ---------- ---------- ----------
1   a           15          0          0
1   b           37          0          0
1   c           10          0          0
1   tot         62          0          1
2   a           20          0          0
2   c           20          0          0
2   tot         40          0          1
3   a           10          0          0
3   b            5          0          0
3   tot         15          0          1
4   a           40          0          0
4   tot         40          0          1
5   a           60          0          0
5   b           55          0          0
5   tot        115          0          1
6   b           35          0          0
6   c           35          0          0
6   tot         70          0          1
tot tot        342          1          1

19 rows selected.

you can use a where clause to get rid of the tot/tot row if you want.

analytics (the over () functions) would not be appropriate, they do not "make up" rows like grouping sets, rollup and cube do.

Grouped Pagination

reader, February 07, 2008 - 8:26 am UTC

Hi Tom,
I have a view like this(Scott EMP table).
select min(job) over (partition by deptno) mnjb, deptno dnm , e.* from emp e
order by mnjb,deptno;

This view returns an ordered resultset. On this ordered resultset i want to generate a sequential number against each distinct DEPTNO, which I will use for pagination.

The expected output is:

1 ANALYST 20 7369 SMITH CLERK 7902 12/17/1980 00:00:00 800 20
1 ANALYST 20 7876 ADAMS CLERK 7788 01/12/1983 00:00:00 1100 20
1 ANALYST 20 7788 SCOTT ANALYST 7566 12/09/1982 00:00:00 3000 20
1 ANALYST 20 7902 FORD ANALYST 7566 12/03/1981 00:00:00 3000 20
1 ANALYST 20 7566 JONES MANAGER 7839 04/02/1981 00:00:00 2975 20
2 CLERK 10 7782 CLARK MANAGER 7839 06/09/1981 00:00:00 2450 10
2 CLERK 10 7839 KING PRESIDENT 11/17/1981 00:00:00 5000 10
2 CLERK 10 7934 MILLER CLERK 7782 01/23/1982 00:00:00 1300 10
3 CLERK 30 7499 ALLEN SALESMAN 7698 02/20/1981 00:00:00 1600 300 30
3 CLERK 30 7698 BLAKE MANAGER 7839 05/01/1981 00:00:00 2850 30
3 CLERK 30 7654 MARTIN SALESMAN 7698 09/28/1981 00:00:00 1250 1400 30
3 CLERK 30 7900 JAMES CLERK 7698 12/03/1981 00:00:00 950 30
3 CLERK 30 7844 TURNER SALESMAN 7698 09/08/1981 00:00:00 1500 0 30
3 CLERK 30 7521 WARD SALESMAN 7698 02/22/1981 00:00:00 1250 500 30

Please refer the first column. The sequence changes based on the ordered distinct DEPTNO as received from the view.

Is it possible to get the sequences using Analytics or any other method?

February 07, 2008 - 9:07 am UTC

ops$tkyte%ORA10GR2> create or replace view v
  2  as
  3  select dense_rank() over (order by mnjb, deptno) rnk,
  4         x.*
  5    from (
  6  select min(job) over (partition by deptno) mnjb,
  7         deptno dnm ,
  8             e.*
  9    from emp e
 10         ) x
 11   order by mnjb,deptno;

View created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select * from v;

       RNK MNJB             DNM      EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- --------- ---------- ---------- ---------- --------- ---------- --------- ---------- ---------- ----------
         1 ANALYST           20       7876 ADAMS      CLERK           7788 12-JAN-83       1100                    20
         1 ANALYST           20       7566 JONES      MANAGER         7839 02-APR-81       2975                    20
         1 ANALYST           20       7902 FORD       ANALYST         7566 03-DEC-81       3000                    20
         1 ANALYST           20       7788 SCOTT      ANALYST         7566 09-DEC-82   20182.53                    20
         1 ANALYST           20       7369 SMITH      CLERK           7902 17-DEC-80        800                    20
         2 CLERK             10        123 HelloWorld                                                              10
         2 CLERK             10       7839 KING       PRESIDENT            17-NOV-81       5000                    10
         2 CLERK             10       7782 CLARK      MANAGER         7839 09-JUN-81       2450                    10
         2 CLERK             10       7934 MILLER     CLERK           7782 23-JAN-82       1300                    10
         3 CLERK             30       7499 ALLEN      SALESMAN        7698 20-FEB-81       1600        300         30
         3 CLERK             30       7521 WARD       SALESMAN        7698 22-FEB-81       1250        500         30
         3 CLERK             30       7900 JAMES      CLERK           7698 03-DEC-81        950                    30
         3 CLERK             30       7698 BLAKE      MANAGER         7839 01-MAY-81       2850                    30
         3 CLERK             30       7654 MARTIN     SALESMAN        7698 28-SEP-81       1250       1400         30
         3 CLERK             30       7844 TURNER     SALESMAN        7698 08-SEP-81       1500          0         30

15 rows selected.

Fantastic...superb....stunning

reader, February 07, 2008 - 9:42 am UTC

When will I be able to think like you?...just thinking out loud....

Ordering Rows by specific values

Humble, February 19, 2008 - 10:49 am UTC

Tom I have the below SQL
SELECT LPAD(NVL(pm.item,'0'),8,'0') item,
'00000000000' supplier,
TO_CHAR(LPAD(NVL(pm.dept,0),3,0)) dept,
TO_CHAR(LPAD(NVL(pm.class,0),4,'0')) class,
RPAD('R',1,'0') dummy,
TO_CHAR(LPAD((ROUND(NVL(pm.new_price,0),2) * 100),7,'0')) new_price, --izp.unit_retail,
'0000000' unit_cost,
' ' tax_category,
RPAD(NVL(pm.item_short_desc,' '),18,' ') item_short_desc,
TO_CHAR(LPAD(NVL(pm.old_multi_units,0),3,'0')) old_multi_units,
TO_CHAR(LPAD((ROUND(NVL(pm.old_multi_unit_retail,0),2) * 100),7,'0')) old_multi_unit_retail,
NVL(RPAD(DECODE(pm.tran_type,1,'A',10,'M',13,'M',21,'D'),1,' '),' ') tran_type
FROM nex_pos_mods pm
WHERE pm.store = TO_NUMBER(p_StoreNo)
AND tran_type IN (1,10,13,21)
GROUP BY pm.item,
'00000000000',
pm.dept,
pm.class,
pm.new_price,
item_short_desc,
pm.old_multi_units,
pm.old_multi_unit_retail,
'0000000',
pm.tran_type;

I would like to order the results by specific tran_types.
I would like all 'D' values first,
all 'A' values second
all 'M' values third

what could I do using SQL analytics?

February 19, 2008 - 5:03 pm UTC

why does everyone say "how can I do this with analytics" when faced with any "i cannot figure out how to write this in sql"?

I know analytics are pretty amazing, but they are not magic.

Analytics do not sort result sets.
Analytics do not aggregate
Analytics do not do many things.

this is a very simple order by requirement:

order by decode( tran_type, 'D', 1, 'A', 2, 'M' 3 ),

Most popular price

Thomas, February 25, 2008 - 11:36 am UTC

Hi Tom,

I have a table containing articles and a table containing
the current price for each article in each store.

For each article, I'd like to know the most popular price,
that is, the price used in the most number of markets. In
case of a tie (several prices being the most popular for an
article), I'd like to have the lowest of these most popular
prices.

Here is some simple test data:

CREATE TABLE article (article INTEGER PRIMARY KEY);
CREATE TABLE price (article INTEGER REFERENCES article,store INTEGER,price NUMBER);

INSERT INTO article VALUES(1);
INSERT INTO article VALUES(2);
INSERT INTO price VALUES(1,1,4.5);
INSERT INTO price VALUES(1,2,4.5);
INSERT INTO price VALUES(1,3,3);
INSERT INTO price VALUES(1,4,5);
INSERT INTO price VALUES(2,1,6);
INSERT INTO price VALUES(2,2,6.5);
INSERT INTO price VALUES(2,3,6);
INSERT INTO price VALUES(2,4,6.5);
INSERT INTO price VALUES(2,5,6);
INSERT INTO price VALUES(2,6,6.5);
INSERT INTO price VALUES(2,6,5);

I've tried it like this (without analytics) for an
individual article, which works fine:

SELECT * FROM (SELECT price FROM price p WHERE p.article=1
GROUP BY p.price ORDER BY COUNT(*) DESC,price)
WHERE ROWNUM=1

However, when I try to use this as a subquery like this,
Oracle 10gR2 won't let me:

SELECT a.article,(SELECT * FROM (SELECT price FROM price p
WHERE p.article=a.article GROUP BY p.price
ORDER BY COUNT(*) DESC,price) WHERE ROWNUM=1)
FROM article a;

Oracle tells me it doesn't know the alias a. The alias a
seems to be present in the first level of the subquery, but
not in the second level, i.e., where I want to use it.

Questions:

1. Why does Oracle tell me it doesn't know about a? To me,
this looks like a valid query. Is there anything I can do to
allow me access to a from the inner level of the subquery?

2. Can this be done better using a subquery with analytics?

February 25, 2008 - 2:06 pm UTC

1) the correlation name goes one level down. period.

2)

ops$tkyte@ORA920> select article, price
  2    from (
  3  select article, price,
  4             row_number() over (partition by article order by cnt desc, price ) rn
  5    from (
  6  select article, price,
  7             count(*) over (partition by article, price) cnt
  8    from price
  9             )
 10             )
 11   where rn = 1
 12   order by article
 13  /

   ARTICLE      PRICE
---------- ----------
         1        4.5
         2          6

analytics

A reader, February 26, 2008 - 6:08 pm UTC

Tom:

If I have a table

Mag_no, contract, stage, create_date
1 ABC 5 1/1/2008
1 ABC 5 1/15/2008
1 ABC 6 2/2/2008
1 ABC 6 3/3/3008
...............

How do I have a query to display the last create date record with highest stage

1 ABC 6 3/3/3008

February 27, 2008 - 2:26 am UTC

The lack of specificity here is huge....

apparently "stage" is not relevant, you want to collapse over that.

but what about mag_no, contract....

and where is the create table, the insert intos

Partition is detached from Order by?

Marat, April 12, 2008 - 4:47 am UTC

Hi Tom,
lets see a table

CREATE TABLE TTT
(
  F1  VARCHAR2(10),
  F2  NUMBER
);

Insert into TTT (F1, F2) Values ('1', 1);
Insert into TTT (F2) Values (2);
Insert into TTT (F2) Values (3);
Insert into TTT (F1, F2)Values ('2', 4);
Insert into TTT (F2) Values(5);
Insert into TTT (F2) Values(6);
Insert into TTT (F2) Values(7);



select ttt.*,
       row_number() over ( partition by f1 order by f2) rn,
       first_value(f2) over ( partition by f1 order by f2)fv
  from ttt
 order by f2;     


F1 F2 RN FV
1 1 1 1
  2 1 2
  3 2 2
2 4 1 4
  5 3 2
  6 4 2
  7 5 2

Note the values in RN - they continue from the first partition (f1=1), and FIRST_VALUE points the second row, where nulls in F1 begin.

So, how to partition same values to make them based on Order by?

Actually, I need the result like

F1 F2 RN FV
1 1 1 1
1 2 1 2
1 3 2 2
2 4 1 4
2 5 1 5
2 6 2 5
2 7 3 5

that is, I need null values to be filled with previous non-null value.
Thank you.

April 13, 2008 - 8:32 am UTC

... Note the values in RN - they continue from the first partition (f1=1), and FIRST_VALUE points the second row, where nulls in F1 begin. ...

no, they don't

F1    F2    RN    FV
1      1    1    1      <<<<= partition 1

2      4    1    4      <<<<= partition 2

       2    1    2      <<<<= partition 3
       3    2    2
       5    3    2
       6    4    2
       7    5    2

Now, since ROWS IN A TABLE HAVE NO ORDER WHATSOEVER, I have to presume you want to carry down the last non-null value of F1 after sorting the entire set by F2 - F2 is what "sequences" or "orders" this data - right?

So, your requirement is simply stated as "carry down the last non-null value of F1 after sorting all data by F2"

ops$tkyte%ORA10GR2> select last_value(f1 ignore nulls) over (order by f2) f1_new,
  2         f2
  3    from ttt
  4   order by f2
  5  /

F1_NEW             F2
---------- ----------
1                   1
1                   2
1                   3
2                   4
2                   5
2                   6
2                   7

7 rows selected.

that works in 10g, in earlier releases, assuming f2 is NOT NULL and POSITIVE - the following would cut it:

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select substr( max( case when f1 is not null then to_char(f2,'fm0000000') || f1 end ) over (order by f2), 8, 10 ) f1_new,
  2         f2
  3    from ttt
  4   order by f2
  5  /

F1_NEW             F2
---------- ----------
1                   1
1                   2
1                   3
2                   4
2                   5
2                   6
2                   7

7 rows selected.

How it works?

A reader, April 14, 2008 - 3:30 am UTC

Excellent! Thank you, Tom! It works fine.

But... I still can't understand why it works NOT like this:

F1 F2 RN FV
1 1 1 1 => Partition 1
2 1 2 => Partition 2
3 2 2
2 4 1 4 => Partition 3
5 3 2 => Partition 4
6 4 2
7 5 2

April 16, 2008 - 1:50 pm UTC

umm, because F1 has three values in your table

f1 = 1
f1 = 2
f1 is null

therefore, if you partition by f1, you will have three partitions.

Is PARTITION BY works before ORDER BY?

Marat, April 15, 2008 - 5:25 am UTC

Well, now I realize that PARTITION BY is working before ORDER BY, isn't it?

April 16, 2008 - 2:42 pm UTC

the data is partitioned by your partition by

THEN, the data in that partition is sorted.

THEN, the window is "set up" (range or row window), the default window is typically "current row and all preceding rows" with an order by.

THEN the function is applied.

Query

Tim, April 17, 2008 - 2:12 pm UTC

N1 N2
1 345
2 645
3 378
4 95
5 2557
6 95
7 111
8 8756
9 40

I need to write SQL for below algorithm:

As you see N1 is sequential. So, algorithm starts from end (MAX N1).

1. Get curent N1 and N2 (lets keep it as N1_1 nd N2_1)
2. Go to previous record and get new N1 and N2 (lets keep it as N1_2 nd N2_2)
3. if N2_2 < N2_1 the return this record else go to previous record and start from step 1

Regarding above data, query must return;

N1 N2
1 345 -> return this (345 is less than 645)
2 645
3 378
4 95 -> return this (95 is less than 2557)
5 2557
6 95
7 111 -> return this (111 is less than 8756)
8 8756
9 40

Could you pls help to write this query?

April 17, 2008 - 4:33 pm UTC

no create table
no insert into table
no look

this is trivial with lag or lead - read up on them, you'll probably be able to figure it out. lag lets you look "back N records" in a result set. Lead - look forward N records...

create table

Tim, April 18, 2008 - 12:05 am UTC

Sorry Tom.

Pls see below for table creation.

create table test (N1 int, N2 int);

insert into test values (1,345);
insert into test values (2,645);
insert into test values (3,378);
insert into test values (4,95);
insert into test values (5,2557);
insert into test values (6,95);
insert into test values (7,111);
insert into test values (8,8756);
insert into test values (9,40);

Thanks;

April 18, 2008 - 8:31 am UTC

did you even try, sigh.... the answer was given, hundreds of examples exist on this site...

did you notice your logic doesn't match your picture?

1. Get curent N1 and N2 (lets keep it as N1_1 nd N2_1)
2. Go to previous record and get new N1 and N2 (lets keep it as N1_2 nd N2_2)
3. if N2_2 < N2_1 the return this record else go to previous record and start from step 1

Regarding above data, query must return;

N1    N2
1    345 -> return this (345 is less than 645)
2    645
3    378
4    95 -> return this (95 is less than 2557)
5    2557
6    95
7    111 -> return this (111 is less than 8756)
8    8756
9    40

1. Get curent N1 and N2 (lets keep it as N1_1 nd N2_1)
2. Go to previous record and get new N1 and N2 (lets keep it as N1_2 nd N2_2)
3. if N2_2 < N2_1 the return this record else go to previous record and start from step 1

1) assume current = N1=7, N2_1=111
2) previous record is N1=6, N2_2=95
3) n2_2 < n2_1, but you wrote "111 is less than 8756"

Ok, play with lag and lead:

ops$tkyte%ORA10GR2> select n1, n2,
  2         lead(n2) over (order by n1) next_n2,
  3         lag(n2) over (order by n1) prior_n2
  4    from test
  5  /

        N1         N2    NEXT_N2   PRIOR_N2
---------- ---------- ---------- ----------
         1        345        645
         2        645        378        345
         3        378         95        645
         4         95       2557        378
         5       2557         95         95
         6         95        111       2557
         7        111       8756         95
         8       8756         40        111
         9         40                  8756

9 rows selected.

and tell us what you really meant - if you cannot get it yourself given this example.

keen relationship

Ken, April 23, 2008 - 6:29 pm UTC

If Ids are available then the query writing gets easy but what could be used if there is only name matching pattens?
For instance if combination is that the Second name of Son will be first name of his father, the third name of the son will be first name of this grand father and sons, forth name will be name of this great grand father. This rules apply on all names, thats why we have 4 names fields other than family name.

I have a table with following fields
CREATE TABLE KEEN_RELATIONSHIP
( "TESTID" NUMBER,
"AGE" NUMBER(3,0),
"FIRSTNAME" VARCHAR2(25 BYTE),
"SECONDNAME" VARCHAR2(25 BYTE),
"THIRDNAME" VARCHAR2(25 BYTE),
"FORTHNAME" VARCHAR2(25 BYTE),
"FAMILYNAME" VARCHAR2(25 BYTE));

And have 39 following records in this table
-- INSERTING into KEEN_RELATIONSHIP
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (961527492,47,'Mishal','Mohd','Ali','Abdullah','Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (961562389,47,'Waleed','Abdul Aziz','Abdul Mohsin',null,'Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (962101220,42,'Khalid','Mohd','Ali','Abdullah','Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (964338909,30,'Faisal','Mohd','Ali','Abdullah','Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (961746095,44,'Abdullah','Mohd','Ali','Abdullah','Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (960785299,56,'Suliman','Salah','Suliman',null,'Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (960986913,55,'Abdul Aziz','Salah','Suliman',null,'Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (962749770,38,'Ali','Mohd','Ali','Abdullah','Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (962579046,39,'Maman','Mohd','Ali','Abdullah','Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (961747704,45,'Bader','Abdul Aziz','Abdul Mohsin',null,'Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (962145270,43,'Khalid','Mohd','Ibrahim','Al Awada','Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (960714745,66,'Ibrahim','Mohd','Ibrahim','Al Awada','Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (960792751,57,'Mohd','Ali','Abdul Mohsin','Mohd','Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (960111460,82,'Ali','Abdul Mohsin','Mohd',null,'Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (961886076,44,'Abdul Latif','Ali','Abdul Mohsin','Mohd','Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (964355027,33,'Abdul Aziz','Ali','Abdul Mohsin','Mohd','Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (963164687,36,'Abdul Mohsin','Ali','Abdul Mohsin','Mohd','Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (960519063,72,'Baderia','Ali',null,null,'Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (963027715,36,'Shekha','Ibrahim','Abdul Rehman','Salah','Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (960985516,52,'Noora','Abdul Aziz','Abdul Mohsin',null,'Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (961282916,50,'Fozia','Abdul Aziz','Abdul Mohsin',null,'Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (962082670,64,'Naja','Rashid','Abdul Rehman',null,'Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (961971436,45,'Mana','Ali','Salah',null,'Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (962094798,82,'Munira','Abdul Mohsin','Mohd',null,'Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (964127289,35,'Alia','Mohd','Ali','Abdullah','Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (961377460,48,'Nawal','Abdul Aziz','Abdul Mohsin',null,'Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (967763537,23,'Noora','Abdullah','Mohd','Ali Abdullah','Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (967775024,23,'Shekha','Bader','Abdul Aziz','Abdul Mohsin','Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (962214823,41,'Khlood','Mohd','Ali','Abdullah','Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (960958623,54,'Baderia','Ali','Abdul Mohsin','Mohd','Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (961775619,62,'Wasam','Ali','Abdullah',null,'Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (961541876,51,'Maryaam','Mohd','Ibrahim','Al Awada','Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (962343452,61,'Husa','Al Salah','Al Mansoor',null,'Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (966099168,26,'Afnan','Ibrahim','Mohd','Ibrahim Al Awada','Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (961669035,45,'Mana','Ali','Abdul Mohsin','Mohd','Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (961076755,51,'Shekha','Ali','Abdul Mohsin','Mohd','Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (960791757,91,'Lulu','Ibrahim','Al Awada',null,'Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (966364107,30,'Ghzlan','Mohd','Ali','Abdul Mohsin Mohd','Jaan');
Insert into KEEN_RELATIONSHIP (TESTID,AGE,FIRSTNAME,SECONDNAME,THIRDNAME,FORTHNAME,FAMILYNAME) values (963587153,73,'Latifa','Abdul Rehman','Salah',null,'Jaan');

April 28, 2008 - 9:37 am UTC

... If Ids are available then the query writing gets easy ...

no it doesn't, in fact, it can get easier to get wrong - but we digress.

there is no question here as far as I can see - at all.

and if the rule were "my name is derived from this rule", I would have a hierarchy - NOT a flat table like this.

Meaning, I have my name, my father has his name and so on - and if you want to see my full name, we retrieve the data on the fly.

A reader, May 20, 2008 - 7:03 pm UTC

create table fas157_mref_inp_load_status_b
(Load_id integer,
Load_Status char(1),
month_id integer,
updated_dt date default sysdate,
updated_by varchar2(4000) default sys_context('USERENV','OS_USER'));

alter table fas157_mref_inp_load_status add constraint fas157_mref_inp_load_status_pk
primary key(Load_id);

create table fas157_mref_out_load_status_b
(Load_id integer,
Load_Status char(1),
month_id integer,
updated_dt date default sysdate,
updated_by varchar2(4000) default sys_context('USERENV','OS_USER'));

alter table fas157_mref_out_load_status add constraint fas157_mref_out_load_status_pk
primary key(Load_id);

insert into FAS157_MREF_INP_LOAD_STATUS_B (LOAD_ID, LOAD_STATUS, MONTH_ID, UPDATED_DT, UPDATED_BY)
values (2, null, 200804, to_date('16-05-2008 15:46:52', 'dd-mm-yyyy hh24:mi:ss'), 'mamarti');
insert into FAS157_MREF_INP_LOAD_STATUS_B (LOAD_ID, LOAD_STATUS, MONTH_ID, UPDATED_DT, UPDATED_BY)
values (3, 'P', 200804, to_date('09-05-2008 15:58:44', 'dd-mm-yyyy hh24:mi:ss'), 'mamarti');
insert into FAS157_MREF_INP_LOAD_STATUS_B (LOAD_ID, LOAD_STATUS, MONTH_ID, UPDATED_DT, UPDATED_BY)
values (1, 'P', 200804, to_date('09-05-2008 15:12:21', 'dd-mm-yyyy hh24:mi:ss'), 'mamarti');
insert into FAS157_MREF_INP_LOAD_STATUS_B (LOAD_ID, LOAD_STATUS, MONTH_ID, UPDATED_DT, UPDATED_BY)
values (4, 'p', 200804, to_date('20-05-2008 09:12:21', 'dd-mm-yyyy hh24:mi:ss'), null);
insert into FAS157_MREF_INP_LOAD_STATUS_B (LOAD_ID, LOAD_STATUS, MONTH_ID, UPDATED_DT, UPDATED_BY)
values (5, 'P', 200804, to_date('20-05-2008 09:11:21', 'dd-mm-yyyy hh24:mi:ss'), null);
insert into FAS157_MREF_INP_LOAD_STATUS_B (LOAD_ID, LOAD_STATUS, MONTH_ID, UPDATED_DT, UPDATED_BY)
values (6, 'P', 200804, to_date('09-05-2008 15:59:44', 'dd-mm-yyyy hh24:mi:ss'), null);
commit;

insert into FAS157_MREF_OUT_LOAD_STATUS_B (LOAD_ID, LOAD_STATUS, MONTH_ID, UPDATED_DT, UPDATED_BY)
values (86, 'P', 200804, to_date('14-05-2008 09:56:51', 'dd-mm-yyyy hh24:mi:ss'), 'mamarti');
insert into FAS157_MREF_OUT_LOAD_STATUS_B (LOAD_ID, LOAD_STATUS, MONTH_ID, UPDATED_DT, UPDATED_BY)
values (87, 'P', 200804, to_date('20-05-2008 11:56:51', 'dd-mm-yyyy hh24:mi:ss'), null);
insert into FAS157_MREF_OUT_LOAD_STATUS_B (LOAD_ID, LOAD_STATUS, MONTH_ID, UPDATED_DT, UPDATED_BY)
values (88, 'P', 200804, to_date('19-05-2008 11:56:51', 'dd-mm-yyyy hh24:mi:ss'), null);
commit;

DESC fas157_mref_INP_load_status_b
Name Null? Type
----------------------------------------- -------- ----------------------------
LOAD_ID NUMBER(38)
LOAD_STATUS CHAR(1 CHAR)
MONTH_ID NUMBER(38)
UPDATED_DT DATE
UPDATED_BY VARCHAR2(4000 CHAR)

desc FAS157_MREF_OUT_LOAD_STATUS_B
Name Null? Type
----------------------------------------- -------- ----------------------------
LOAD_ID NUMBER(38)
LOAD_STATUS CHAR(1 CHAR)
MONTH_ID NUMBER(38)
UPDATED_DT DATE
UPDATED_BY VARCHAR2(4000 CHAR)

For all load_ids in output table(FAS157_MREF_OUT_LOAD_STATUS_B
) with load_status='P', I would like to get the load_id in input table (FAS157_MREF_INP_LOAD_STATUS_B) whose updated_dt is nearest to and less than the updated_dt in output table. Can we do this in a single SQL ?

thanks
Christian

May 20, 2008 - 10:00 pm UTC

ops$tkyte%ORA10GR2> select out.*,
  2         (select max( load_id ) keep ( dense_rank first order by inp.updated_dt desc )
  3                from FAS157_MREF_INP_LOAD_STATUS_B inp
  4                   where inp.updated_dt <= out.updated_dt ) that_load_id
  5    from FAS157_MREF_OUT_LOAD_STATUS_B out
  6   where out.load_status = 'P'
  7  /

   LOAD_ID L   MONTH_ID UPDATED_DT           UPDATED_BY THAT_LOAD_ID
---------- - ---------- -------------------- ---------- ------------
        86 P     200804 14-may-2008 09:56:51 mamarti               6
        87 P     200804 20-may-2008 11:56:51                       4
        88 P     200804 19-may-2008 11:56:51                       2

A reader, May 20, 2008 - 7:06 pm UTC

Correction above

alter table fas157_mref_inp_load_status_b add constraint fas157_mref_inp_load_status_pk
primary key(Load_id);

alter table fas157_mref_out_load_status_b add constraint fas157_mref_out_load_status_pk
primary key(Load_id);

Reader, May 21, 2008 - 12:59 pm UTC

create table test
(id varchar2(10),
dt date,
amt number);

insert into test
values
('ABC', to_date('01/02/2008','mm/dd/yyyy'),20);

insert into test
values
('ABC', to_date('01/02/2008','mm/dd/yyyy'),40);

insert into test
values
('ABC', to_date('01/03/2008','mm/dd/yyyy'),0);

insert into test
values
('ABC', to_date('01/04/2008','mm/dd/yyyy'),50);

insert into test
values
('ABC', to_date('01/05/2008','mm/dd/yyyy'),0);

insert into test
values
('ABC', to_date('01/06/2008','mm/dd/yyyy'),20);

insert into test
values
('ABC', to_date('01/07/2008','mm/dd/yyyy'),0);

--

insert into test
values
('PQR', to_date('01/02/2008','mm/dd/yyyy'),30);

insert into test
values
('PQR', to_date('01/03/2008','mm/dd/yyyy'),0);

insert into test
values
('PQR', to_date('01/04/2008','mm/dd/yyyy'),80);

insert into test
values
('PQR', to_date('01/05/2008','mm/dd/yyyy'),0);

insert into test
values
('PQR', to_date('01/06/2008','mm/dd/yyyy'),10);

insert into test
values
('PQR', to_date('01/07/2008','mm/dd/yyyy'),0);

insert into test
values
('PQR', to_date('01/08/2008','mm/dd/yyyy'),10);

insert into test
values
('PQR', to_date('01/09/2008','mm/dd/yyyy'),0);

insert into test
values
('PQR', to_date('01/10/2008','mm/dd/yyyy'),30);

select * from test;

ID DT AMT
---------- --------- ----------
ABC 02-JAN-08 20
ABC 02-JAN-08 40
ABC 03-JAN-08 0
ABC 04-JAN-08 50
ABC 05-JAN-08 0
ABC 06-JAN-08 20
ABC 07-JAN-08 0
PQR 02-JAN-08 30
PQR 03-JAN-08 0
PQR 04-JAN-08 80
PQR 05-JAN-08 0
PQR 06-JAN-08 10
PQR 07-JAN-08 0
PQR 08-JAN-08 10
PQR 09-JAN-08 0
PQR 10-JAN-08 30

16 rows selected.

I need to find sum for the last three days for each id
For example
if today is 1/4/2008, I need to find the sum of 1/2/2008, 1/3/2008 and 1/4/2008
if today is 1/5/2008, I need to find the sum of 1/5/2008, 1/4/2008 and 1/3/2008

Can you please help me how to do this by using analytical functions?

May 21, 2008 - 3:14 pm UTC

that would not involve analytics at all. It is a rather simple "where" and group by.

ops$tkyte%ORA10GR2> variable dt varchar2(20)
ops$tkyte%ORA10GR2> exec :dt := '1/4/2008'

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> select id, min(dt), max(dt), sum(amt)
  2    from test where dt between to_date( :dt, 'mm/dd/yyyy')-2 and to_date( :dt, 'mm/dd/yyyy')
  3   group by id;

ID         MIN(DT)   MAX(DT)     SUM(AMT)
---------- --------- --------- ----------
PQR        02-JAN-08 04-JAN-08        110
ABC        02-JAN-08 04-JAN-08        110

ops$tkyte%ORA10GR2> exec :dt := '1/5/2008'

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> /

ID         MIN(DT)   MAX(DT)     SUM(AMT)
---------- --------- --------- ----------
PQR        03-JAN-08 05-JAN-08         80
ABC        03-JAN-08 05-JAN-08         50

A reader, May 21, 2008 - 1:46 pm UTC

thanks

Christian

Reader, May 21, 2008 - 3:56 pm UTC

Tom,
Sorry missed to mention this in my previous question.
We have data since 2004. I need get the sum for all dates. I need to create a view for this, so that I can get data for any date, whenever, I query.

If the date is 05/07/2004, I should sum the data for 05/05/2004, 05/06/2004 and 05/07/2004.

If the date is 05/06/2004, I should sum the data for 05/04/2004, 05/05/2004 and 05/06/2004 and so on. I am not sure if this is called as Moving sum. Please advice.

Reader, May 21, 2008 - 4:28 pm UTC

with regards to the above question -
If I do dt-2, it will take weekends and holidays also. In the test table, I do not store data for weekends and holidays.

May 21, 2008 - 5:21 pm UTC

ok, so that is different - it would/could be an analytic with a range window - but, there is no such thing as "a holiday" - what do you need to have happen here and what do you have existing that will help make it happen.

Vaibhav, May 22, 2008 - 12:39 am UTC

hi tom,

I had to write a query that would give me the number fo errors that occured for a particular error code for the past 3 hrs but split over an interval of 15 mins...
eg: if it is 12 o clock now, then i would want number fo errors coourred between 9 to 9.15, 9.15 to 9.30 ... 11.45 to 12 for each error code...i went through your site and could write the following query

SELECT *
FROM (SELECT *
FROM (SELECT me.error_code,
COUNT (CASE when me.error_timestamp BETWEEN (p_error_date - 3/24) AND (p_error_date - 3/24) + NUMTODSINTERVAL(15, 'MINUTE') THEN 1 ELSE NULL END) in1,
COUNT (CASE when me.error_timestamp BETWEEN (p_error_date - 3/24) + NUMTODSINTERVAL(15, 'MINUTE') AND (p_error_date - 3/24) + NUMTODSINTERVAL(30, 'MINUTE') THEN 1 ELSE NULL END) in2,
COUNT (CASE when me.error_timestamp BETWEEN (p_error_date - 3/24) + NUMTODSINTERVAL(30, 'MINUTE') AND (p_error_date - 3/24) + NUMTODSINTERVAL(45, 'MINUTE') THEN 1 ELSE NULL END) in3,
COUNT (CASE when me.error_timestamp BETWEEN (p_error_date - 3/24) + NUMTODSINTERVAL(45, 'MINUTE') AND (p_error_date - 3/24) + NUMTODSINTERVAL(60, 'MINUTE') THEN 1 ELSE NULL END) in4,
COUNT (CASE when me.error_timestamp BETWEEN (p_error_date - 3/24) + NUMTODSINTERVAL(60, 'MINUTE') AND (p_error_date - 3/24) + NUMTODSINTERVAL(75, 'MINUTE') THEN 1 ELSE NULL END) in5,
COUNT (CASE when me.error_timestamp BETWEEN (p_error_date - 3/24) + NUMTODSINTERVAL(75, 'MINUTE') AND (p_error_date - 3/24) + NUMTODSINTERVAL(90, 'MINUTE' THEN 1 ELSE NULL END) in6,
COUNT (CASE when me.error_timestamp BETWEEN (p_error_date - 3/24) + NUMTODSINTERVAL(90, 'MINUTE') AND (p_error_date - 3/24) + NUMTODSINTERVAL(105, 'MINUTE') THEN 1 ELSE NULL END) in7,
COUNT (CASE when me.error_timestamp BETWEEN (p_error_date - 3/24) + NUMTODSINTERVAL(105, 'MINUTE') AND (p_error_date - 3/24) + NUMTODSINTERVAL(120, 'MINUTE') THEN 1 ELSE NULL END) in8,
COUNT (CASE when me.error_timestamp BETWEEN (p_error_date - 3/24) + NUMTODSINTERVAL(120, 'MINUTE') AND (p_error_date - 3/24) + NUMTODSINTERVAL(135, 'MINUTE') THEN 1 ELSE NULL END) in9,
COUNT (CASE when me.error_timestamp BETWEEN (p_error_date - 3/24) + NUMTODSINTERVAL(135, 'MINUTE') AND (p_error_date - 3/24) + NUMTODSINTERVAL(150, 'MINUTE') THEN 1 ELSE NULL END) in10,
COUNT (CASE when me.error_timestamp BETWEEN (p_error_date - 3/24) + NUMTODSINTERVAL(150, 'MINUTE') AND (p_error_date - 3/24) + NUMTODSINTERVAL(165, 'MINUTE') THEN 1 ELSE NULL END) in11,
COUNT (CASE when me.error_timestamp BETWEEN (p_error_date - 3/24) + NUMTODSINTERVAL(165, 'MINUTE') AND (p_error_date - 3/24) + NUMTODSINTERVAL(180, 'MINUTE') THEN 1 ELSE NULL END) in12,
SUM (COUNT(1)) OVER (PARTITION BY error_code ORDER BY error_code) total_count
FROM mhs_errors me, mhs_transaction_status mts
WHERE me.transaction_id = mts.transaction_id
AND me.error_timestamp BETWEEN p_error_date - 3/24 AND p_error_date
AND status_type = p_status_type
GROUP BY me.error_code)
ORDER BY total_count desc)
WHERE ROWNUM <= NVL(p_batchsize, 10)

i actually used a function that was called 12 times for each error code and then i went through your queries and came to kow about analytic functions and only then i could write the above query.

i want to know if there ia a better way to write this query?
please help

May 22, 2008 - 7:10 am UTC

I have no idea what to do here. No create tables, no inserts...

I don't know why you have so many columns for such a simple return - seems like it would have three columns in the output:

time (rounded to 15 minute intervals)
error code
count(*)

grouped by time, error code

no analytics.

Given that is what I think - you must not be presenting your requirements.... sorry.

Can try this

Ayan, May 22, 2008 - 3:39 am UTC

You can write a view as follows:

select sysdate,
(trunc(sysdate,'hh')-3/24)+r*15/(24*60) start_dt,
(trunc(sysdate,'hh')-3/24)+(r+1)*15/(24*60) end_dt
from(
select sysdate,rownum r
from user_objects
where rownum<13
)

and then compare your data as

error_timestamp >= start_dt and error_timestamp<end_dt

and then do the grouping.

I dont see use of analytical function here, but Tom is the best one to answer

Reader, May 22, 2008 - 10:01 am UTC

create table test
(id varchar2(10),
dt date,
amt number);

insert into test
values
('ABC', to_date('01/10/2008','mm/dd/yyyy'),0);

insert into test
values
('ABC', to_date('01/11/2008','mm/dd/yyyy'),0);

insert into test
values
('ABC', to_date('01/14/2008','mm/dd/yyyy'),0);

insert into test
values
('ABC', to_date('01/15/2008','mm/dd/yyyy'),0);

insert into test
values
('ABC', to_date('01/16/2008','mm/dd/yyyy'),20);

insert into test
values
('ABC', to_date('01/18/2008','mm/dd/yyyy'),0);

insert into test
values
('ABC', to_date('01/22/2008','mm/dd/yyyy'),0);

insert into test
values
('PQR', to_date('01/11/2008','mm/dd/yyyy'),30);

insert into test
values
('PQR', to_date('01/14/2008','mm/dd/yyyy'),0);

insert into test
values
('PQR', to_date('01/15/2008','mm/dd/yyyy'),80);

insert into test
values
('PQR', to_date('01/16/2008','mm/dd/yyyy'),0);

insert into test
values
('PQR', to_date('01/18/2008','mm/dd/yyyy'),10);

insert into test
values
('PQR', to_date('01/22/2008','mm/dd/yyyy'),0);

commit;

01/15/2008 is tuesday
01/14/2008 is monday
01/11/2008 is friday

I have zero for amount on all the above days for id = ABC. I do not store data for weekends and holidays in the test table.

When I do the 3-day sum using windowing option, I get 1E-34 instead of zero. I am not sure, why I am getting 1E-34. Please advice.

select dt
,id
,sum(amt) as sum_daily
,sum(sum(amt)) OVER (PARTITION BY id order by dt ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as SUM_3d
from test
group by dt, id
order by dt;

DT ID SUM_DAILY SUM_3D
--------- ---------- ---------- ----------
11-JAN-08 ABC 0 0
11-JAN-08 PQR 30 30
14-JAN-08 ABC 0 0
14-JAN-08 PQR 0 30
15-JAN-08 PQR 80 110
15-JAN-08 ABC 0 1E-34
16-JAN-08 ABC 20 20
16-JAN-08 PQR 0 80
18-JAN-08 PQR 10 90
18-JAN-08 ABC 0 20
22-JAN-08 PQR 0 10
22-JAN-08 ABC 0 20

12 rows selected.

May 23, 2008 - 8:01 am UTC

I cannot reproduce.

any other information here - like platform, precise version

Vaibhav, May 23, 2008 - 12:52 am UTC

Hi
Thanks for that quick reply

I apologise for not providing any create and insert scripts.
The reason i am using PARTITION BY is that i want the total count as well for that particular error code
Kindly see below the kind of output that i am lookin out for.
Also, Ayan mentioned using TRUNC(errror_date, 'HH')
i cannot use this because it will round the error timestamp. i dont want that
what i want is if user has passed the error date as 2008/22/05 12:35:56, then
i should give him the error count as

ERROR_CODE IN1 IN2 IN3 IN4 IN5 IN6 IN7 IN8 IN9 IN10 IN11 IN12 TOTAL_COUNT

err1 0 0 0 0 0 0 0 0 0 2 0 5 7
err2 0 0 0 0 0 0 0 0 0 0 0 5 5
err3 0 0 0 0 0 0 0 0 0 0 1 5 6
err4 0 0 0 0 0 0 0 1 0 0 0 8 9

here IN1 = 09:35:56 to 09:50:56
IN2 = 09:50:56 to 10:05:56
IN3 = 10:05:56 to 10:20:56
and so on...the last interval would be
IN12 = 12:20:56 to 12:35:56
the last column TOTAL_COUNT would give me the total count for that error code for that 3 hours ie 09:35:56 to 12:35:56

Here are the scripts:

CREATE TABLE mhs_errors
(
error_id NUMBER(10),
transaction_id VARCHAR2(36) NOT NULL,
error_code VARCHAR2(50) NOT NULL,
error_severity VARCHAR2(10),
error_timestamp TIMESTAMP,
error_text VARCHAR2(700)
);

ALTER TABLE mhs_errors
ADD CONSTRAINT pk_mhs_errors PRIMARY KEY(error_id);

Insert into MHS_ERRORS Values (2, 'tran1', 'err1', 'fatal', systimestamp, 'dummy error text');
Insert into MHS_ERRORS Values (3, 'tran2', 'err1', 'fatal', systimestamp, 'dummy error text');
Insert into MHS_ERRORS Values (4, 'tran3', 'err1', 'fatal', systimestamp, 'dummy error text');
Insert into MHS_ERRORS Values (5, 'tran4', 'err1', 'fatal', systimestamp, 'dummy error text');
Insert into MHS_ERRORS Values (6, 'tran5', 'err1', 'fatal', systimestamp, 'dummy error text');
Insert into MHS_ERRORS Values (7, 'tran6', 'err2', 'fatal', systimestamp, 'dummy error text');
Insert into MHS_ERRORS Values (8, 'tran7', 'err2', 'fatal', systimestamp, 'dummy error text');
Insert into MHS_ERRORS Values (9, 'tran8', 'err2', 'fatal', systimestamp, 'dummy error text');
Insert into MHS_ERRORS Values (10, 'tran9', 'err2', 'fatal', systimestamp, 'dummy error text');
Insert into MHS_ERRORS Values (11, 'tran10', 'err2', 'fatal', systimestamp, 'dummy error text');
Insert into MHS_ERRORS Values (12, 'tran11', 'err3', 'fatal', systimestamp, 'dummy error text');
Insert into MHS_ERRORS Values (13, 'tran12', 'err3', 'fatal', systimestamp, 'dummy error text');
Insert into MHS_ERRORS Values (14, 'tran13', 'err3', 'fatal', systimestamp, 'dummy error text');
Insert into MHS_ERRORS Values (15, 'tran14', 'err3', 'fatal', systimestamp, 'dummy error text');
Insert into MHS_ERRORS Values (16, 'tran15', 'err3', 'fatal', systimestamp, 'dummy error text');
Insert into MHS_ERRORS Values (17, 'tran16', 'err4', 'fatal', systimestamp, 'dummy error text');
Insert into MHS_ERRORS Values (18, 'tran17', 'err4', 'fatal', systimestamp, 'dummy error text');
Insert into MHS_ERRORS Values (19, 'tran18', 'err4', 'fatal', systimestamp, 'dummy error text');
Insert into MHS_ERRORS Values (20, 'tran19', 'err4', 'fatal', systimestamp, 'dummy error text');
Insert into MHS_ERRORS Values (21, 'tran20', 'err4', 'fatal', systimestamp, 'dummy error text');

COMMIT;

SELECT *
FROM (SELECT *
FROM (SELECT me.error_code,
COUNT (CASE when me.error_timestamp BETWEEN (&&p_error_date - 3/24) AND (&&p_error_date - 3/24) + NUMTODSINTERVAL(15, 'MINUTE')
THEN 1 ELSE NULL END) in1,
COUNT (CASE when me.error_timestamp BETWEEN (&&p_error_date - 3/24) + NUMTODSINTERVAL(15, 'MINUTE') AND (&&p_error_date - 3/24) + NUMTODSINTERVAL(30, 'MINUTE')
THEN 1 ELSE NULL END) in2,
COUNT (CASE when me.error_timestamp BETWEEN (&&p_error_date - 3/24) + NUMTODSINTERVAL(30, 'MINUTE') AND (&&p_error_date - 3/24) + NUMTODSINTERVAL(45, 'MINUTE')
THEN 1 ELSE NULL END) in3,
COUNT (CASE when me.error_timestamp BETWEEN (&&p_error_date - 3/24) + NUMTODSINTERVAL(45, 'MINUTE') AND (&&p_error_date - 3/24) + NUMTODSINTERVAL(60, 'MINUTE')
THEN 1 ELSE NULL END) in4,
COUNT (CASE when me.error_timestamp BETWEEN (&&p_error_date - 3/24) + NUMTODSINTERVAL(60, 'MINUTE') AND (&&p_error_date - 3/24) + NUMTODSINTERVAL(75, 'MINUTE')
THEN 1 ELSE NULL END) in5,
COUNT (CASE when me.error_timestamp BETWEEN (&&p_error_date - 3/24) + NUMTODSINTERVAL(75, 'MINUTE') AND (&&p_error_date - 3/24) + NUMTODSINTERVAL(90, 'MINUTE')
THEN 1 ELSE NULL END) in6,
COUNT (CASE when me.error_timestamp BETWEEN (&&p_error_date - 3/24) + NUMTODSINTERVAL(90, 'MINUTE') AND (&&p_error_date - 3/24) + NUMTODSINTERVAL(105, 'MINUTE')
THEN 1 ELSE NULL END) in7,
COUNT (CASE when me.error_timestamp BETWEEN (&&p_error_date - 3/24) + NUMTODSINTERVAL(105, 'MINUTE') AND (&&p_error_date - 3/24) + NUMTODSINTERVAL(120, 'MINUTE')
THEN 1 ELSE NULL END) in8,
COUNT (CASE when me.error_timestamp BETWEEN (&&p_error_date - 3/24) + NUMTODSINTERVAL(120, 'MINUTE') AND (&&p_error_date - 3/24) + NUMTODSINTERVAL(135, 'MINUTE')
THEN 1 ELSE NULL END) in9,
COUNT (CASE when me.error_timestamp BETWEEN (&&p_error_date - 3/24) + NUMTODSINTERVAL(135, 'MINUTE') AND (&&p_error_date - 3/24) + NUMTODSINTERVAL(150, 'MINUTE')
THEN 1 ELSE NULL END) in10,
COUNT (CASE when me.error_timestamp BETWEEN (&&p_error_date - 3/24) + NUMTODSINTERVAL(150, 'MINUTE') AND (&&p_error_date - 3/24) + NUMTODSINTERVAL(165, 'MINUTE')
THEN 1 ELSE NULL END) in11,
COUNT (CASE when me.error_timestamp BETWEEN (&&p_error_date - 3/24) + NUMTODSINTERVAL(165, 'MINUTE') AND (&&p_error_date - 3/24) + NUMTODSINTERVAL(180, 'MINUTE')
THEN 1 ELSE NULL END) in12,
SUM (COUNT(1)) OVER (PARTITION BY error_code ORDER BY error_code) total_count
FROM mhs_errors me
WHERE me.error_timestamp BETWEEN &&p_error_date - 3/24 AND &&p_error_date
GROUP BY me.error_code)
ORDER BY total_count desc)
WHERE ROWNUM <= NVL(:p_batchsize, 10)

I get the following output after executing the above query:

ERROR_CODE IN1 IN2 IN3 IN4 IN5 IN6 IN7 IN8 IN9 IN10 IN11 IN12 TOTAL_COUNT

err1 0 0 0 0 0 0 0 0 0 0 0 5 5
err2 0 0 0 0 0 0 0 0 0 0 0 5 5
err3 0 0 0 0 0 0 0 0 0 0 0 5 5
err4 0 0 0 0 0 0 0 0 0 0 0 5 5

Kindly guide me how to write this in a better way
thanks in advance

May 23, 2008 - 9:02 am UTC

I see nothing wrong with your existing query.

You need a function / column you want in the output - your counts are correct as is.

You are done. (well, use a bind variable in real life, no &&p_error_date of course and always - ALWAYS use to-date on a string to convert to a date - using the right date format as well)

Vaibhav, May 23, 2008 - 12:44 pm UTC

Hi

Thanks for that piece of advice...

i will make sure i am using to date function henceforth...

now i will tell you what the actual buisness scenario would be...

on the screen, user has an option to select the time interval...

i.e. last 3 hours, last 6 hours, last 12 hours, last 24 hours

the only difference is, for 3 hours the error interval is 15 mins i.e for each error code we get 4 error counts per hour making it 12 error count for 3 hours (12 col + 1 total count = 13 cols in o/p)

for 6 hours, interval is again 15 min, therefore for each error code we have 6*4=24 error count (24 cols in o/p)

for 12 hours, interval is 30 min, therefore for each error code we have 12*2=24 error count(24 cols in o/p)

for 12 hours, interval is 1 hour, therefore for each error code we have 24*1=24 error count(24 cols in o/p)

what i have done is used if condition rewriting the same query 4 times with different time comparisons...

is there a way by which i can write just one select query and still get the desired output...

kindly help me with these..

thanks a million again for all the help that you provide to people...

and ya, i have been reading throughout that you have written a book on analytics...

where can i purchase it from ????

let me know about this

May 23, 2008 - 6:19 pm UTC

... is there a way by which i can write just one select query and still get the
desired output...
...

a query has a fixed number of columns - unless you want to include nulls in the other queries, no. the single query would have a SINGLE set of columns

Vaibhav, May 24, 2008 - 1:52 am UTC

Thanks...

I was just wondering if there is any way by which we can filter out number of columns dynamically...its not possible as i dont want null values in those undesired columns...thanks for confirming this...

TOM...i asked you about your book Expert One-on-One

i surfed some sites like amazon.com and all...

but I am from India...how do i purchase the book here

May 24, 2008 - 7:00 pm UTC

for books: take ISBN to any reputable bookstore and say "I want this book"

expert one on one is out of print, effective oracle by design and expert oracle database architecture are not.

a sql query has a FIXED NUMBER OF COLUMNS, period.

Vaibhav, May 25, 2008 - 2:27 am UTC

thanks...

i will grab a copy of the available books asap...

Min And Max of date ranges

Shivdeep Modi, May 27, 2008 - 8:16 am UTC

Hi,

I've got something like

create table tsreport_date ( report_date date);

insert into tsreport_date values('09-MAY-2008 17:52:22');
insert into tsreport_date values('10-MAY-2008 10:00:03');
insert into tsreport_date values('11-MAY-2008 10:00:01');
insert into tsreport_date values('12-MAY-2008 10:00:06');
insert into tsreport_date values('13-MAY-2008 10:00:01');
insert into tsreport_date values('14-MAY-2008 10:00:02');
insert into tsreport_date values('15-MAY-2008 10:00:05');
insert into tsreport_date values('16-MAY-2008 10:00:03');
insert into tsreport_date values('17-MAY-2008 10:00:01');
insert into tsreport_date values('18-MAY-2008 10:00:04');
insert into tsreport_date values('19-MAY-2008 10:00:02');
insert into tsreport_date values('20-MAY-2008 10:00:04');
insert into tsreport_date values('21-MAY-2008 10:00:04');
insert into tsreport_date values('22-MAY-2008 10:00:04');
insert into tsreport_date values('23-MAY-2008 10:00:02');
insert into tsreport_date values('23-MAY-2008 14:46:27');
insert into tsreport_date values('24-MAY-2008 10:00:02');
insert into tsreport_date values('25-MAY-2008 10:00:03');
insert into tsreport_date values('26-MAY-2008 10:00:01');
insert into tsreport_date values('27-MAY-2008 10:00:03');
insert into tsreport_date values('01-JUN-2008 11:03:16');
insert into tsreport_date values('02-JUN-2008 11:03:21');
insert into tsreport_date values('03-JUN-2008 11:03:23');
insert into tsreport_date values('26-JUN-2008 11:05:01');
insert into tsreport_date values('01-JUL-2008 11:05:10');
insert into tsreport_date values('02-JUL-2008 11:16:25');
insert into tsreport_date values('03-JUL-2008 11:16:27');

select to_char(report_date,'MON-YYYY'), min(report_date), max(report_date)
from tsreport_date
group by to_char(report_Date,'MON-YYYY')
/

TO_CHAR( MIN(REPORT_DATE) MAX(REPORT_DATE)
-------- -------------------- --------------------
JUL-2008 01-JUL-2008 11:05:10 03-JUL-2008 11:16:27
MAY-2008 09-MAY-2008 17:52:22 27-MAY-2008 10:00:03
JUN-2008 01-JUN-2008 11:03:16 26-JUN-2008 11:05:01

3 rows selected.

To get the min and max dates of a month, is it possible to get the result using analytics?
If yes which one of them would be the best?

Regards,
Shivdeep

May 27, 2008 - 8:32 am UTC

ops$tkyte%ORA10GR2> select to_char(trunc(report_date,'mm'),'MON-YYYY'), min(report_date), max(report_date)
  2    from tsreport_date
  3   group by trunc(report_Date,'mm')
  4   order by trunc(report_date,'mm')
  5  /

TO_CHAR( MIN(REPORT_DATE)     MAX(REPORT_DATE)
-------- -------------------- --------------------
MAY-2008 09-may-2008 17:52:22 27-may-2008 10:00:03
JUN-2008 01-jun-2008 11:03:16 26-jun-2008 11:05:01
JUL-2008 01-jul-2008 11:05:10 03-jul-2008 11:16:27

that would be the more efficient way - do not convert a date into a string for operations on it - use TRUNC whenever possible.

Analytics would not be appropriate here - you want to aggregate - you only want three records output - analytics do not "aggregate", aggregation does.

trunc(date,'mm') is a lot less cpu intensive than to_char(date,'mon-yyyy') - does the same thing and is SORTABLE (mon-yyyy isn't sortable at all, you'd have to to_date that string to make it a date again in order to sort!!!)

use to_char on a date only for formatting in a report - not as a way to truncate.

Min And Max of date ranges

Shivdeep Modi, May 27, 2008 - 11:30 am UTC

Thanks for the advise to use trunc. I was not aware I was not using the best options using to_char.

Regards,
Shivdeep

Vaibhav, May 28, 2008 - 12:17 am UTC

Hey Tom,

I found both the books,

Expert one on one oracle and Effective orcale by design

they will reach in 10 days

i m very excited

Karteek, May 28, 2008 - 9:51 am UTC

Tom,

col1 is a grouping column and col2 is actually the result of row_num() analytic function on inner query (not shown), and col3 is some thing like a flag that tells about col2.

create table test(col1 varchar2(2), col2 number(5), col3 number(5));
insert into test values('A', 1, NULL);
insert into test values('A', 2, 2);
insert into test values('A', 3, 3);
insert into test values('A', 4, 4);
insert into test values('A', 5, NULL);
insert into test values('A', 6, 6);
insert into test values('A', 7, NULL);
insert into test values('A', 8, 8);
insert into test values('A', 9, 9);

A 1
A 2 2
A 3 3
A 4 4
A 5
A 6 6
A 7
A 8 8
A 9 9

Output should be like...
A 2
A 8

Condition here is, I need to consider only the first occurence of the sequence with atleast 2 consecutive numbers followed(2,3,4 - 1st, 8,9 - 2nd). Hope I am clear!

Thanks Tom!

May 28, 2008 - 10:11 am UTC

ops$tkyte%ORA10GR2> select *
  2    from (
  3  select col1, col2, col3,
  4         lag(col3) over (partition by col1 order by col2) last_col3,
  5         lead(col3) over (partition by col1 order by col2) next_col3
  6    from test
  7         )
  8   where last_col3 is null and next_col3 = col3+1
  9  /

CO       COL2       COL3  LAST_COL3  NEXT_COL3
-- ---------- ---------- ---------- ----------
A           2          2                     3
A           8          8                     9

Analytics

Shivdeep Modi, June 09, 2008 - 12:34 pm UTC

Hi,

I need to formulate drop statement for the package/package bodies.
I've got something like:
select owner,object_name,object_type
from dba_objects
where object_type like 'PACKAGE%'
and owner = 'NCLDBA'
order by object_name
/

Owner           Object Name                Object Type
--------------- -------------------------- ---------------
NCLDBA          TEST                       PACKAGE BODY
NCLDBA          TEST                       PACKAGE
NCLDBA          TEST2                      PACKAGE
NCLDBA          TEST3                      PACKAGE BODY

4 rows selected.

Now I want to formulate drop statements for the above.
The result set should be:
drop package test;
drop package test2;
drop package body test3;

I think this can be done using analytic functions. However I get the same rank for package/package body combination. What is the mistake here?

select 'drop '||object_type||' '||owner||'.'||object_name||';' Statement,
       object_name,
       object_type,
       rank() over (partition by object_name order by object_name) as rank
  from dba_objects
 where object_type like 'PACKAGE%'
   and owner = 'NCLDBA'
 order by object_name
/

STATEMENT                        Object Name  Object Type    RANK
-------------------------------- ------------ ------------- -----
drop PACKAGE BODY NCLDBA.TEST;   TEST         PACKAGE BODY      1
drop PACKAGE NCLDBA.TEST;        TEST         PACKAGE           1
drop PACKAGE NCLDBA.TEST2;       TEST2        PACKAGE           1
drop PACKAGE BODY NCLDBA.TEST3;  TEST3        PACKAGE BODY      1

June 09, 2008 - 2:19 pm UTC

ops$tkyte%ORA10GR2> select * from (
  2  select 'drop '||object_type||' '||owner||'.'||object_name||';' Statement,
  3         object_name,
  4         object_type,
  5         rank() over (partition by object_name order by object_type) as rank
  6    from dba_objects
  7   where object_type like 'PACKAGE%'
  8     and owner = user
  9     ) where rank = 1
 10   order by object_name
 11  /

STATEMENT                           OBJECT_NAME                    OBJECT_TYPE               RANK
----------------------------------- ------------------------------ ------------------- ----------
drop PACKAGE OPS$TKYTE.TEST;        TEST                           PACKAGE                      1
drop PACKAGE OPS$TKYTE.TEST2;       TEST2                          PACKAGE                      1
drop PACKAGE BODY OPS$TKYTE.TEST3;  TEST3                          PACKAGE BODY                 1

In an analytic, if what you order by isn't UNIQUE within the partition - neither row is "first", they are consider the same, a tie - so order by object type to make it unique.

Analytics

Shivdeep Modi, June 10, 2008 - 4:31 am UTC

Thanks for the solution.

Regards,
Shivdeep

Is there a way to do this using Analytics ?

Rajesh Srivastava, September 17, 2008 - 11:55 pm UTC

Hi Tom,

I have a Table with the following structure

Table : TAB1
---------------------------------------------------------
Client HQ-Country Engagement-Country Revenue
---------------------------------------------------------
C1 US UK 100
C1 US US 200
C1 US Japan 50

C2 US India 100
C2 US Japan 50
-------------------------------------------------------

What we want to do is to select all records for those clients where NO record exists with HQ_COUNTRY = ENGAGEMENT_COUNTRY. So the three records of C1 should not be selected and the two records for C2 should be selected.

I can think of :

select * from TAB1
where client_id NOT IN
(select distinct client_id from TAB1
where HQ_COUNTRY = ENGAGEMENT_COUNTRY)

but it looks clunky and have a feeling will perform badly
(TAB1 is large and is a complex view)

Since the query is about checking a condition in a partition, i think Analytics with its PARTION clause should have a magic bullet - but i can't come up with it ...

Any pointers would be much appreciated

September 18, 2008 - 7:58 am UTC

there are no silver bullets.

this can be done with analytics, but if tab1 is a large complex view that takes a long time to execute - well - it'll still take a long time to execute

select * 
  from (select tab1.*, 
               count( case when hq_country = engagement_country then 1 end)
               over (partition by client_id) cnt
          from tab1 )
 where cnt = 0;

Above logic did not work for me..

Ashish, September 25, 2008 - 11:10 am UTC

Hi Tom,
Thankyou for your time! above query did not work for me please help!

SELECT pc_no,av_cd,jcd,rga,rgm,
from (
select pc_no,av_cd,substr(jct,1,5)||'00' jcd,rga,rgm,
count (case when (nvl(rga,0)>0 OR nvl(rgm,0)>0) then 1 end)
over (partition by av_cd, pc_no, SUBSTR(jct,1,5)||'00') cnt
from ca_cd ca
where av_cd = 'T2'
)
where cnt=1
and avs = '0000422'

September 25, 2008 - 3:38 pm UTC

well, short of giving you another random piece of logic - how about you actually set down and tell us exactly what it is you are trying to do??????????

Seriously - look at this for a second and ask yourself "how could any reasonable person read what I wrote and know what I need, I didn't tell them what I needed - I only told them that the answer to someone else's question does not work for me"

Basically, all you have done is ruled out one solution from an infinite set of all solutions to all problems. We could be here a while throwing out logic/ algorithms and never get to what you need.

So, it will go a lot better if.... you stated precisely what you are trying to do

and remember - create table, insert into - a test case - would be necessary if you want sql that actually compiles.

Lag/Lead Windowing...

J, September 25, 2008 - 2:02 pm UTC

Hello Tom,

Here's a situation that we find ourselves up against. We're hopelessly stuck and am hoping you can advise us on how best to proceed (thanks!)

The setup:
==========

create table jt
(cust_cd varchar2(20),
visit_id1 number(20),
visit_id2 number(8,2),
visit_dt date,
c_flag varchar2(1),
l_flag varchar2(1));

insert into jt values ('P1',1,60961,to_date('11/27/2007','MM/DD/YYYY'),'Y','N');
insert into jt values ('P1',2,60964.02,to_date('11/30/2007','MM/DD/YYYY'),'N','Y');
insert into jt values ('P1',3,61055,to_date('02/29/2008','MM/DD/YYYY'),'Y','Y');
insert into jt values ('P1',4,61055.01,to_date('02/29/2008','MM/DD/YYYY'),'N','Y');
insert into jt values ('P1',5,61072,to_date('03/03/2008','MM/DD/YYYY'),'Y','N');
insert into jt values ('P1',6,61100,to_date('04/14/2008','MM/DD/YYYY'),'Y','Y');
insert into jt values ('P1',7,61121,to_date('05/15/2008','MM/DD/YYYY'),'Y','N');
insert into jt values ('P1',8,61163,to_date('06/16/2008','MM/DD/YYYY'),'Y','Y');
insert into jt values ('P1',9,61163.01,to_date('06/16/2008','MM/DD/YYYY'),'N','Y');
insert into jt values ('P1',10,61163.02,to_date('06/16/2008','MM/DD/YYYY'),'N','Y');
insert into jt values ('P2',11,61105,to_date('03/18/2008','MM/DD/YYYY'),'N','Y');
insert into jt values ('P2',12,62100,to_date('03/25/2008','MM/DD/YYYY'),'Y','N');

commit;

========
Here's the objective: For each Customer's "C" visit (C_Flag = 'Y'), find the *nearest* "L" visit(s) (L_Flag = 'Y') by Visit_Dt - backwards two weeks or forwards two weeks in time and associate the Visit_Id1's.
========

So the "C" visits are the pivot points. The sorting order is: Cust_Cd, Visit_Dt, Visit_Id2. Just to make it more complicated, in the sample data, Visit_Id1 # 8 is a C-Visit. It's also an L-Visit. And... Visits 9 and 10 are on the same day as #9. The number of associated L-Visits for a given C-Visit is three.

The resulting set should look like:

Cust ID C-Visit ID L-Visit ID1 L-Visit ID2 L-Visit ID3
------- ---------- ----------- ----------- -----------
P1 1 2 (null) (null)
P1 3 3 4 (null)
P1 5 3 4 (null)
P1 6 6 (null) (null)
P1 7 (null) (null) (null)
P1 8 8 9 10
P2 12 11 (null) (null)

Can this be answered through some nifty analytics?
Three of us have been wrestling with this all week long and can't figure out an approach - Any assistance or advice would be most appreciated.

Best Regards,
- J

September 25, 2008 - 4:35 pm UTC

here was my though process on this:

ops$tkyte%ORA10GR2> select cust_cd, visit_id1, visit_id2, visit_dt, c_flag, l_flag,
  2         decode( l_flag, 'Y', visit_dt ) curr_l,
  3         last_value( case when l_flag = 'Y' then visit_dt end ignore nulls) over ( partition by cust_cd order by visit_dt ) last_l,
  4         last_value( case when l_flag = 'Y' then visit_dt end ignore nulls) over ( partition by cust_cd order by visit_dt DESC ) next_l
  5    from jt;

CUST_CD               VISIT_ID1  VISIT_ID2 VISIT_DT  C L CURR_L    LAST_L    NEXT_L
-------------------- ---------- ---------- --------- - - --------- --------- ---------
P1                            1      60961 27-NOV-07 Y N                     30-NOV-07
P1                            2   60964.02 30-NOV-07 N Y 30-NOV-07 30-NOV-07 30-NOV-07
P1                            3      61055 29-FEB-08 Y Y 29-FEB-08 29-FEB-08 29-FEB-08
P1                            4   61055.01 29-FEB-08 N Y 29-FEB-08 29-FEB-08 29-FEB-08
P1                            5      61072 03-MAR-08 Y N           29-FEB-08 14-APR-08
P1                            6      61100 14-APR-08 Y Y 14-APR-08 14-APR-08 14-APR-08
P1                            7      61121 15-MAY-08 Y N           14-APR-08 16-JUN-08
P1                            8      61163 16-JUN-08 Y Y 16-JUN-08 16-JUN-08 16-JUN-08
P1                            9   61163.01 16-JUN-08 N Y 16-JUN-08 16-JUN-08 16-JUN-08
P1                           10   61163.02 16-JUN-08 N Y 16-JUN-08 16-JUN-08 16-JUN-08
P2                           11      61105 18-MAR-08 N Y 18-MAR-08 18-MAR-08 18-MAR-08
P2                           12      62100 25-MAR-08 Y N           18-MAR-08

12 rows selected.

Get the current row L-flag date if it exists, get the immediate prior one and get the immediate next one - one of those three (if there) will be the l-date of interest....

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select cust_cd, visit_id1, visit_id2, visit_dt, curr_l, last_l, next_l,
  2         case when curr_l is not null then curr_l
  3                  when last_l is not null and next_l is null and visit_dt-last_l <= 14 then last_l
  4                          when last_l is null and next_l is not null and next_l-visit_dt <= 14 then next_l
  5                          else decode( least( (visit_dt-last_l), (next_l-visit_dt), 14),
  6                                       (visit_dt-last_l), last_l,
  7                                                   (next_l-visit_dt), next_l )
  8                  end interesting_l
  9    from (
 10  select cust_cd, visit_id1, visit_id2, visit_dt, c_flag, l_flag,
 11         decode( l_flag, 'Y', visit_dt ) curr_l,
 12         last_value( case when l_flag = 'Y' then visit_dt end ignore nulls) over ( partition by cust_cd order by visit_dt ) last_l,
 13         last_value( case when l_flag = 'Y' then visit_dt end ignore nulls) over ( partition by cust_cd order by visit_dt DESC ) next_l
 14    from jt
 15         )
 16   where c_flag = 'Y'
 17  /

CUST_CD               VISIT_ID1  VISIT_ID2 VISIT_DT  CURR_L    LAST_L    NEXT_L    INTERESTI
-------------------- ---------- ---------- --------- --------- --------- --------- ---------
P1                            1      60961 27-NOV-07                     30-NOV-07 30-NOV-07
P1                            3      61055 29-FEB-08 29-FEB-08 29-FEB-08 29-FEB-08 29-FEB-08
P1                            5      61072 03-MAR-08           29-FEB-08 14-APR-08 29-FEB-08
P1                            6      61100 14-APR-08 14-APR-08 14-APR-08 14-APR-08 14-APR-08
P1                            7      61121 15-MAY-08           14-APR-08 16-JUN-08
P1                            8      61163 16-JUN-08 16-JUN-08 16-JUN-08 16-JUN-08 16-JUN-08
P2                           12      62100 25-MAR-08           18-MAR-08           18-MAR-08

7 rows selected.

So, we can take those three dates and for the C records - find the interesting l date....

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> with A
  2  as
  3  (
  4  select cust_cd, visit_id1, visit_id2, visit_dt, curr_l, last_l, next_l,
  5         case when curr_l is not null then curr_l
  6                  when last_l is not null and next_l is null and visit_dt-last_l <= 14 then last_l
  7                          when last_l is null and next_l is not null and next_l-visit_dt <= 14 then next_l
  8                          else decode( least( (visit_dt-last_l), (next_l-visit_dt), 14),
  9                                       (visit_dt-last_l), last_l,
 10                                                   (next_l-visit_dt), next_l )
 11                  end interesting_l
 12    from (
 13  select cust_cd, visit_id1, visit_id2, visit_dt, c_flag, l_flag,
 14         decode( l_flag, 'Y', visit_dt ) curr_l,
 15         last_value( case when l_flag = 'Y' then visit_dt end ignore nulls) over ( partition by cust_cd order by visit_dt ) last_l,
 16         last_value( case when l_flag = 'Y' then visit_dt end ignore nulls) over ( partition by cust_cd order by visit_dt DESC ) next_l
 17    from jt
 18         )
 19   where c_flag = 'Y'
 20  ),
 21  B
 22  as
 23  (select * from jt where l_flag = 'Y' )
 24  select a.cust_cd, a.visit_dt, a.visit_id1, b.visit_id1
 25    from A left join B
 26      on (a.cust_cd = b.cust_cd and a.interesting_l = b.visit_dt)
 27   order by a.cust_cd, a.visit_dt, a.visit_id1
 28  /

CUST_CD              VISIT_DT   VISIT_ID1  VISIT_ID1
-------------------- --------- ---------- ----------
P1                   27-NOV-07          1          2
P1                   29-FEB-08          3          3
P1                   29-FEB-08          3          4
P1                   03-MAR-08          5          3
P1                   03-MAR-08          5          4
P1                   14-APR-08          6          6
P1                   15-MAY-08          7
P1                   16-JUN-08          8          8
P1                   16-JUN-08          8          9
P1                   16-JUN-08          8         10
P2                   25-MAR-08         12         11

11 rows selected.

and then outer join back to pick up the records of interest.... You could pivot this if you want - but basically this is your set of data.

RE: Lag/Lead Windowing

J, September 25, 2008 - 6:25 pm UTC

Thank you Tom!

This is a *completely* different approach from what we were trying. Will dissect this to try to tease the nuances of your "how to apply analytics" mindset.

All the best,
- J

Analytics

Ashish, October 07, 2008 - 6:05 am UTC

Tom,
First of all I apology for not providing 100% input to you in order to help me resolve the problem.

Here I go...Code is:

SELECT pc_no,av_cd,jcd,rga,rgm,
from (
select pc_no,av_cd,substr(jct,1,5)||'00' jcd,rga,rgm,
count (case when (nvl(rga,0)>0 OR nvl(rgm,0)>0) then 1 end)
over (partition by av_cd, pc_no, SUBSTR(jct,1,5)||'00') cnt
from ca_cd ca
where av_cd = 'T2'
)
where cnt=1
and avs = '0000424'

If I run above query I get below output:

pc_no av_cd jcd rga rgm
0000424 T2 7166500 0 0
0000424 T2 7166500 6 6
0000424 T2 7166500 0 0

While, I only want (rga>0 or rgm>0) data i.e. only one row out of above 3 rows from above table.

Please help!

Thanks,

October 08, 2008 - 9:13 pm UTC

add a where clause?

where cnt = 1 and (rga >0 or rgm>0)

what about your book on Analytical functions

Vijay'S, April 15, 2009 - 6:49 am UTC

Dear Tom,
it's long you said that you are thinking of writing a book on analytical functions, have you started on it already or is it going to be announced soon. I know I am too optimistic here but reading your few more posts on analytics I can't wait more to have your book on this subject to understand them better.

Please make this happen it will be great help for everyone as your other books have helped a lot.

Regards,
Vijay'S

can this be done in analytics -or single sql

A reader, July 28, 2009 - 9:40 am UTC

Hi Tom,

I have a table like below and the sample data as given below.

drop table T ;
create table T ( ordernumber number,stockid number, side char(1),quantity number,price number) ;

sample data :

insert into t values (1, 1, 'S', 100, 50.00) ;
insert into t values (2, 5, 'B', 25, 12.50) ;
insert into t values (3, 1, 'S', 100, 50.00) ;
insert into t values (4, 1, 'B', 150, 51.00) ;
insert into t values (5, 1, 'B', 50, 49.00) ;
insert into t values (6, 8, 'B', 100, 16.75) ;
insert into t values (7, 5, 'S', 50, 12.25) ;
insert into t values (8, 8, 'S', 100, 17.00) ;

my sql should be able to give me a output for all the possible records where i can see a stock to somone who want to buy..the ordernunber is a system generated number and that is typical a order.the sell and buy match off should have 2 restrictions

the sell price must be >= buy price.
and the stock_number should be same for sell and buy records.

here "sell" records are the one for which side = 'S' and buy records are the one with side = 'B' ...so the expected output of the sql should be

stockId sell_order buy_order sell_qty buy_quty
1 1 4 100 100
1 3 4 50 50
5 7 2 25 25

in case of order #7 the remaining qty to sell should be updated as part of this process ...so the table should be updated that now for order number 7 there are still 25 stocks and they can be considereed for the next run..

same is true for order for #3.

so far i have taken this approach - but not able to break it...

column diff format 99999
select 'T'
, buy.ordernumber buy_ord
,sell.ordernumber sell_ord
,sell.stockid
,sell.quantity sell_qty
-- ,sell.tot_sell_qty
-- ,buy.tot_buy_qty
,buy.quantity buy_qty
,sell.price sell_price
,buy.price buy_price
,lag(sell.quantity - buy.quantity) over ( order by buy.price desc) diff
from
(select t.* , sum(quantity) over (partition by side,price) tot_sell_qty from t where side ='S' and quantity >0 and price > 0
order by ordernumber,stockid,price ) sell,
(select t.* ,sum(quantity) over (partition by side,price) tot_buy_qty from t where side ='B' and quantity >0 and price > 0
order by ordernumber,stockid,price ) buy
where sell.stockid = buy.stockid and buy.price >= sell.price ;

i would really appreciate any help on this.

Is analytics better in this case?

Joe, July 29, 2009 - 7:50 pm UTC

Tom,
I have become a huge fan of analytics and I find myself trying to use them whenever possible, even if there may be an easier or better to get the result. So I figured I better get your opinion as to whether the fictional example below is better with or without the analytics.

declare
  s1 varchar2(30);
  s2 varchar2(30);

  function get_first_index_loop(t in varchar2) return varchar2 is
    result varchar2(30) := null;
  begin
    for c in (select 1               priority,
                     constraint_name index_name
                from user_constraints
               where table_name = t
                 and constraint_type = 'P'
              union
              select 2 priority,
                     index_name
                from user_indexes
               where table_name = t
               order by priority)
    loop
      result := c.index_name;
      exit;
    end loop;
    return result;
  end;

  function get_first_index_analytics(t in varchar2) return varchar2 is
    result varchar2(30) := null;
  begin
    select distinct first_value(index_name) over(order by priority)
      into result
      from (select 1 priority,
                   constraint_name index_name
              from user_constraints
             where constraint_type = 'P'
               and table_name = t
            union
            select 2 priority,
                   index_name
              from user_indexes
             where table_name = t
             order by priority);
  
    return result;
  
  exception
    when no_data_found then
      return null;
  end;

begin
  for t in (select table_name
              from user_tables
             order by table_name)
  loop
    s1 := get_first_index_loop(t.table_name);
    s2 := get_first_index_analytics(t.table_name);
  end loop;
end;

I know that "if it can be done in SQL then do it, else use as little pl/sql as possible". But in the above example there are other factors like the exception handler that could affect performance.

Is one way better than the other (peformance, etc) and if not, could you share you opinion as to which way you prefer?

Thanks,
Joe

August 03, 2009 - 5:10 pm UTC

I would not use procedural code for this - no. You shouldn't have any code at all - forget comparing "procedural code - no analytics" vs "procedural code with analytics"

There should never have been a function - ever.

ops$tkyte%ORA10GR2> select table_name,
  2             nvl(  ( select constraint_name index_name
  3                   from user_constraints
  4                  where constraint_type = 'P'
  5                    and table_name = t.table_name
  6                    and rownum = 1 ),
  7                ( select index_name
  8                    from user_indexes
  9                   where table_name = t.table_name
 10                     and rownum = 1 ) ) iname
 11    from user_tables t
 12   order by table_name
 13  /

TABLE_NAME                     INAME
------------------------------ ------------------------------
C
CASE                           CASE_SEARCH1
DEPT                           DEPT_PK
DOCUMENTS                      T_IOT_PK

How to give group id for the particular group of rows

Suji, January 27, 2010 - 3:20 pm UTC

Ex:
CREATE TABLE TAB_A(A_ID NUMBER, A_DT DATE, A_TTL VARCHAR2(20));

BEGIN
INSERT INTO TAB_A VALUES(1,TO_DATE('01/01/2009','MM/DD/YYYY'),'ABC');
INSERT INTO TAB_A VALUES(2,TO_DATE('01/01/2009','MM/DD/YYYY'),'ABC');
INSERT INTO TAB_A VALUES(3,TO_DATE('01/01/2009','MM/DD/YYYY'),'ABCD');
INSERT INTO TAB_A VALUES(4,TO_DATE('01/01/2009','MM/DD/YYYY'),'ABCD');
INSERT INTO TAB_A VALUES(5,TO_DATE('02/01/2009','MM/DD/YYYY'),'ABC');
INSERT INTO TAB_A VALUES(6,TO_DATE('02/01/2009','MM/DD/YYYY'),'ABCDE');
INSERT INTO TAB_A VALUES(6,TO_DATE('03/01/2009','MM/DD/YYYY'),'ABCDEF');
INSERT INTO TAB_A VALUES(7,TO_DATE('03/01/2009','MM/DD/YYYY'),'ABCDEF');
INSERT INTO TAB_A VALUES(8,TO_DATE('03/01/2009','MM/DD/YYYY'),'ABCDEF');
END;

SELECT DISTINCT A.A_ID, A.A_DT, A.A_TTL
FROM TAB_A A
WHERE (A.A_DT, A.A_TTL) IN
(SELECT A.A_DT, A.A_TTL
FROM TAB_A A
GROUP BY A.A_DT, A.A_TTL
HAVING COUNT(DISTINCT A.A_ID) > 1)
ORDER BY A.A_ID

Answer is

A_ID A_DT A_TTL
1 1/1/2009 ABC
2 1/1/2009 ABC
3 1/1/2009 ABCD
4 1/1/2009 ABCD
6 3/1/2009 ABCDEF
7 3/1/2009 ABCDEF
8 3/1/2009 ABCDEF

I just want 4th group which should generate unique number for each group of A_DT and A_TTL columns like

all the 1/1/2009 and ABC should be displayed 1
all the 1/1/2009 and ABC should be 2
all the 3/1/2009 and ABCDEF should be 3
etc..........
How do I generate query

Thanks in advance

January 29, 2010 - 3:23 pm UTC

ops$tkyte%ORA11GR2> select a.*, dense_rank() over (order by a_ttl)
  2  FROM   TAB_A A
  3  WHERE   (A.A_DT, A.A_TTL) IN
  4  (SELECT A.A_DT, A.A_TTL
  5  FROM   TAB_A A
  6  GROUP BY A.A_DT, A.A_TTL
  7  HAVING COUNT(DISTINCT A.A_ID) > 1)
  8  ORDER BY A.A_ID
  9  /

      A_ID A_DT      A_TTL                DENSE_RANK()OVER(ORDERBYA_TTL)
---------- --------- -------------------- ------------------------------
         1 01-JAN-09 ABC                                               1
         2 01-JAN-09 ABC                                               1
         3 01-JAN-09 ABCD                                              2
         4 01-JAN-09 ABCD                                              2
         6 01-MAR-09 ABCDEF                                            3
         7 01-MAR-09 ABCDEF                                            3
         8 01-MAR-09 ABCDEF                                            3

7 rows selected.

Excelent

suji, February 01, 2010 - 12:59 pm UTC

My Problem is solved. Excellent. Thank you very much.
Thanks

A reader, February 11, 2010 - 5:58 am UTC

> create table test (id number, token varchar2(10));
> insert into test values (1,'ABC');
> insert into test values (2,'XYZ');
> insert into test values (3,'DEF');
> insert into test values (3,'DEF');
> insert into test values (4,'GHI');
> insert into test values (5,'RST');
> insert into test values (6,'JKL');
> insert into test values (6,'JKL');
> insert into test values (7,'MNO');
> insert into test values (8,'PQR');
> insert into test values (9,'DEF');
> insert into test values (10,'RST');
> SELECT * FROM TEST;

ID TOKEN
---------- ----------
1 ABC
2 XYZ
3 DEF
3 DEF
4 GHI
5 RST
6 JKL
6 JKL
7 MNO
8 PQR
9 DEF
10 RST

I want a query that will return all duplicate Token values unless the duplicates all belong to the same Id. For each duplicate returned the Id should also be displayed. Based on the above data the query should return:

ID TOKEN
--- ------
10 RST
5 RST
3 DEF
9 DEF

February 16, 2010 - 9:43 am UTC

  1  select distinct *
  2    from (
  3  select id, token, count(distinct id) over (partition by token) cnt
  4    from test
  5         )
  6*  where cnt > 1
ops$tkyte%ORA10GR2> /

        ID TOKEN             CNT
---------- ---------- ----------
        10 RST                 2
         9 DEF                 2
         5 RST                 2
         3 DEF                 2

Great work Tom

Dominic, February 25, 2010 - 6:27 am UTC

Tom,

First of all thanks for all the work you've done for the Oracle development community.

create table test1
(object_no number,
 abi_code  varchar2(3),
 priority_over varchar2(3))
/
 
insert into test1 values(1,'G07','Z99');
insert into test1 values(1,'J01','A22');
insert into test1 values(1,'J01','G06');
insert into test1 values(1,'J01','G07');
insert into test1 values(1,'J01','Z99');
insert into test1 values(1,'J01','P01');
insert into test1 values(2,'P01','Z99');
 
commit;

SQL> select *
  2  from test1
  3  /
 
 OBJECT_NO ABI PRI
---------- --- ---
         1 G07 Z99
         1 J01 A22
         1 J01 G06
         1 J01 G07
         1 J01 Z99
         1 J01 P01
         2 P01 Z99
 
7 rows selected.

I need to flag those rows where the abi_code exists in the priority_over column for another abi_code for that object. The end output I would be after would be 

 OBJECT_NO ABI
---------- ---
         1 J01
         2 P01

which I can get with 

select distinct object_no, abi_code
from test1 t
where abi_code not in (select priority_over
                         from test1
                        where object_no = t.object_no)

but this is 2 full table scans so I was wondering if there was an analytic function I could use to flag those rows that I want to exclude

I tried 

select object_no, 
       abi_code, 
       priority_over,
       first_value(priority_over) 
         over (partition by object_no
                   order by decode(priority_over,
                                   abi_code,1,2)) match
  from test1

but that failed miserably.

Can I do this with analytics or am I stuck with the correlated subquery

Thanks in advance

Dom

March 01, 2010 - 10:15 am UTC

even with the analytics - you would have to make two passes -

pass1: identify
pass2: update

because you cannot update the result of an analytic directly.

You are not stuck with a correlated subquery, you could use not in (select object_no, priority_code ...), but I would suggest this instead:

ops$tkyte%ORA10GR2> exec dbms_stats.set_table_stats( user, 'TEST1', numrows=> 1000000, numblks => 10000);

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> set autotrace traceonly explain
ops$tkyte%ORA10GR2> select object_no, abi_code
  2    from test1 t1
  3   where NOT EXISTS
  4     (select null
  5        from test1 t2
  6       where t2.object_no = t1.object_no
  7         and t2.priority_over = t1.abi_code)
  8  /

Execution Plan
----------------------------------------------------------
Plan hash value: 2822440888

------------------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |   999K|    30M|       |  1981   (4)| 00:00:15 |
|*  1 |  HASH JOIN ANTI    |       |   999K|    30M|    26M|  1981   (4)| 00:00:15 |
|   2 |   TABLE ACCESS FULL| TEST1 |  1000K|    15M|       |   709   (3)| 00:00:06 |
|   3 |   TABLE ACCESS FULL| TEST1 |  1000K|    15M|       |   711   (4)| 00:00:06 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T2"."OBJECT_NO"="T1"."OBJECT_NO" AND
              "T2"."PRIORITY_OVER"="T1"."ABI_CODE")

ops$tkyte%ORA10GR2> set autotrace off

looking for that nice juicy hash anti-join.

Thanks Tom

Dominic, March 01, 2010 - 11:40 am UTC

Excellent as usual.

The actual problem was complicated by the fact that test1 is actually an inline view which initially caused me problems, but when I build the view in a with statement, that anti hash join comes up lovely.

  1  with il as (
  2  select hi.object_no,
  3         hi.item_description,
  4         hfm.abi_code,
  5         himp.priority_over
  6    from obj_hitem hi,
  7         hp_freetext_map hfm,
  8         hp_item_map_priority himp
  9   where instr(upper(hi.item_description),hfm.rule) > 0
 10     and himp.abi_code = hfm.abi_code)
 11  select il.object_no,
 12         il.item_description,
 13         il.abi_code,
 14         il.priority_over
 15    from il
 16   where not exists (select null
 17                       from il il2
 18                      where il2.object_no = il.object_no
 19*                       and il2.priority_over = il.abi_code)
SQL> set autotrace traceonly explain
SQL> /
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=21 Card=376 Bytes=60
          912)
 
   1    2     RECURSIVE EXECUTION OF 'SYS_LE_2_0'
   2    0   TEMP TABLE TRANSFORMATION
   3    2     HASH JOIN (ANTI) (Cost=21 Card=376 Bytes=60912)
   4    3       VIEW (Cost=4 Card=388 Bytes=56648)
   5    4         TABLE ACCESS (FULL) OF 'SYS_TEMP_0FD9D6602_77604101'
           (Cost=4 Card=388 Bytes=5432)
 
   6    3       VIEW (Cost=4 Card=388 Bytes=6208)
   7    6         TABLE ACCESS (FULL) OF 'SYS_TEMP_0FD9D6602_77604101'
           (Cost=4 Card=388 Bytes=5432)

Once more you allow me to build a reputation of competence based on your hard work. It's solutions like this that are allowing me to turn 14000 rows of slow by slow pl/sql processing into one 3500 line insert statement :-)

Seriously though, I plug your website in every interview I attend, or site I work on. Who knows, if management finally learn that there is such a thing as good SQL, they might start hiring good developers to write it. Won't hold my breath.

Thanks again for all your work, just don't retire before I do.

Dom

CezarN, March 01, 2010 - 2:57 pm UTC

@Dominic:
I thought about analytics as a solution to your question:

select object_no, abi_code, priority_over
from (select object_no,
cod,
src,
count(distinct src) over(partition by cod) as csrc,
count(distinct object_no) over(partition by cod) as cobj,
abi_code,
priority_over
from (select object_no,
case d.src
when 1 then abi_code
else priority_over
end as cod,
d.src,
abi_code,
priority_over
from test1
join (select 1 as src from dual
union all
select 2 as src from dual) d
on 1 = 1))
where csrc = 2 and cobj > 1
order by cod, src, object_no;

@CezarN

Dominic, March 02, 2010 - 3:53 am UTC

Nice way of looking at the problem, I'm pretty impressed.

Unfortunately, as it is at the moment, it relies on the existence of the row where abi_code = 'J01' and priority_over = 'P01', but I tweaked it a bit and got this

select distinct object_no, abi_code
from (
select object_no,
cod,
src,
count(distinct src) over(partition by object_no,
cod) as csrc,
abi_code,
priority_over
from (select object_no,
case d.src
when 1 then abi_code
else priority_over
end as cod,
d.src,
abi_code,
priority_over
from test1
join (select 1 as src from dual
union all
select 2 as src from dual) d
on 1 = 1))
where csrc = 1
and src = 1

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 SORT (UNIQUE)
2 1 VIEW
3 2 WINDOW (SORT)
4 3 NESTED LOOPS
5 4 VIEW
6 5 UNION-ALL
7 6 TABLE ACCESS (FULL) OF 'DUAL'
8 6 TABLE ACCESS (FULL) OF 'DUAL'
9 4 TABLE ACCESS (FULL) OF 'TEST1'

I'll have to do some benchmarking to see which method performs best, and also which one can be more easily merged into the main query but it's always nice to have several options.

Thanks

Dom

March 02, 2010 - 8:28 am UTC

I'd lay odds on a nice juicy pair of full scans and a hash operation.

Always bet on Tom

Dominic, March 02, 2010 - 10:52 am UTC

SQL> get get_stats
  1  declare
  2    l_start number;
  3  begin
  4    insert into run_stats
  5    select 'before',
  6           stats.*
  7      from stats;
  8    l_start := dbms_utility.get_time;
  9    for i in 1 .. 100
 10    loop
 11      for x in (with il as (select hi.object_no,
 12                                   hi.item_description,
 13                                   hfm.abi_code,
 14                                   himp.priority_over
 15                              from obj_hitem hi,
 16                                   hp_freetext_map hfm,
 17                                   hp_item_map_priority himp
 18                             where instr(upper(hi.item_description),
 19                                         hfm.rule(+)) > 0
 20                               and himp.abi_code(+) = hfm.abi_code)
 21                select il.object_no,
 22                       il.item_description,
 23                       il.abi_code
 24                  from il
 25                 where not exists (select null
 26                                     from il il2
 27                                    where il2.object_no = il.object_no
 28                                      and il2.priority_over = il.abi_code))
 29      loop
 30        null;
 31      end loop;
 32    end loop;
 33    dbms_output.put_line( (dbms_utility.get_time-l_start) || ' hsecs' );
 34    insert into run_stats select 'after 1', stats.* from stats;
 35    l_start := dbms_utility.get_time;
 36    for i in 1 .. 100
 37    loop
 38      for x in (with test1 as (select hi.object_no,
 39                                   hi.item_description,
 40                                   hfm.abi_code,
 41                                   himp.priority_over
 42                              from obj_hitem hi,
 43                                   hp_freetext_map hfm,
 44                                   hp_item_map_priority himp
 45                             where instr(upper(hi.item_description),
 46                                         hfm.rule(+)) > 0
 47                               and himp.abi_code(+) = hfm.abi_code)
 48                select distinct object_no,
 49                       item_description,
 50                       abi_code
 51                  from (select object_no,
 52                               item_description,
 53                               cod,
 54                               src,
 55                               count(distinct src) over(partition by object_no,
 56                                                                     cod) as csrc,
 57                               abi_code,
 58                               priority_over
 59                          from (select object_no,
 60                                       item_description,
 61                                       case d.src
 62                                         when 1 then abi_code
 63                                         else priority_over
 64                                       end as cod,
 65                                       d.src,
 66                                       abi_code,
 67                                       priority_over
 68                                  from test1
 69                                  join (select 1 as src from dual
 70                                        union all
 71                                        select 2 as src from dual) d
 72                                    on 1 = 1))
 73                 where csrc = 1
 74                   and src = 1)
 75      loop
 76        null;
 77      end loop;
 78    end loop;
 79    dbms_output.put_line( (dbms_utility.get_time-l_start) || ' hsecs' );
 80    insert into run_stats select 'after 2', stats.* from stats;
 81* end;
 82  /
1240 hsecs
1480 hsecs

SQL> get extract_stats
  1  select a.name, b.value-a.value run1, c.value-b.value run2,
  2         ( (c.value-b.value)-(b.value-a.value)) diff
  3    from run_stats a, run_stats b, run_stats c
  4   where a.name = b.name
  5     and b.name = c.name
  6     and a.runid = 'before'
  7     and b.runid = 'after 1'
  8     and c.runid = 'after 2'
  9     and (c.value-a.value) > 0
 10     and (c.value-b.value) <> (b.value-a.value)
 11*  order by abs( (c.value-b.value)-(b.value-a.value))
SQL> /
 
NAME                                           RUN1       RUN2       DIFF
---------------------------------------- ---------- ---------- ----------
LATCH.FOB s.o list latch                          1          0         -1
LATCH.lgwr LWN SCN                                4          5          1
LATCH.session timer                               4          5          1
STAT...calls to kcmgcs                            6          7          1
LATCH.mostly latch-free SCN                       4          5          1
LATCH.active checkpoint queue latch               8         10          2
LATCH.channel operations parent latch           708        710          2
LATCH.Consistent RBA                              4          1         -3
LATCH.redo writing                               25         22         -3
STAT...active txn count during cleanout           0          4          4
STAT...cleanout - number of ktugct calls          0          4          4
STAT...consistent gets - examination              0          4          4
LATCH.messages                                   35         40          5
LATCH.simulator hash latch                     4372       4348        -24
LATCH.multiblock read objects                  9554       9502        -52
LATCH.sort extent pool                          700        800        100
STAT...parse count (total)                      106          6       -100
STAT...sorts (disk)                               0        100        100
STAT...opened cursors cumulative                106          6       -100
STAT...execute count                            205        105       -100
LATCH.session idle bit                          102          1       -101
LATCH.simulator lru latch                       561        672        111
STAT...db block gets                            759        637       -122
STAT...prefetched blocks                      26031      25893       -138
STAT...free buffer requested                  30972      30775       -197
LATCH.dml lock allocation                       202          4       -198
STAT...calls to kcmgas                          200          0       -200
STAT...sorts (memory)                             4        204        200
LATCH.session allocation                        204          3       -201
STAT...session uga memory max                     0        216        216
LATCH.checkpoint queue latch                   1536       1920        384
LATCH.undo global data                         3807       3408       -399
LATCH.loader state object freelist              200        600        400
STAT...consistent changes                       400          0       -400
LATCH.redo allocation                           455         47       -408
STAT...redo entries                             445         31       -414
STAT...calls to get snapshot scn: kcmgss      20607      20105       -502
STAT...enqueue releases                         708        202       -506
STAT...enqueue requests                         709        202       -507
LATCH.enqueues                                 4727       4021       -706
LATCH.cache buffers lru chain                   900        100       -800
STAT...db block changes                         870         53       -817
LATCH.enqueue hash chains                      5028       4026      -1002
LATCH.shared pool                              1428        233      -1195
LATCH.library cache pin                        1501        305      -1196
LATCH.library cache pin allocation             1284         88      -1196
LATCH.row cache objects                        9630       8254      -1376
LATCH.row cache enqueue latch                  9630       8244      -1386
STAT...buffer is not pinned count             81716      80316      -1400
STAT...table scan blocks gotten               81700      80300      -1400
STAT...no work - consistent read gets         81708      80308      -1400
STAT...consistent gets                       101917     100320      -1597
STAT...session logical reads                 102676     100957      -1719
LATCH.cache buffers chains                   243285     241440      -1845
LATCH.library cache                            3893        497      -3396
STAT...physical writes                          800       4900       4100
STAT...physical writes direct                   800       4900       4100
STAT...physical writes non checkpoint           800       4900       4100
STAT...physical reads                         30858      35669       4811
STAT...physical reads direct                      0       5000       5000
STAT...redo size                              82956      29868     -53088
STAT...recursive calls                        66331       7731     -58600
STAT...table scan rows gotten               8852600    8717200    -135400
STAT...sorts (rows)                            3084     201984     198900
 
64 rows selected.
 

I'm guessing it's the sort that does for the analytic version?

Dom

March 02, 2010 - 12:47 pm UTC

I'd use get_CPU_time instead of just get time.

hash (anti/non-anti) joins are just brutally efficient - one of the reasons we started using hash operations for many more things in 10g (over sort operations)

@Dominic

CezarN, March 03, 2010 - 12:22 am UTC

Your where condition:

where csrc = 1
and src = 1

means than we have to have only ONE src and that src must be abi_code, right?

So, we can reduce the amount of sort analytics implies by work arounding that COUNT(DISTINCT src). So I changed the coding set for src from (1,2) in (0,1). This means that a SUM of ZERO on src means that we have only ONE src and that src must be abi_code. No more DISTICT:

select distinct object_no, abi_code
from (
select object_no,
cod,
src,
sum(src) over(partition by object_no, cod) as csrc,
abi_code,
priority_over
from (select object_no,
case d.src
when 0 then abi_code
else priority_over
end as cod,
d.src,
abi_code,
priority_over
from test1
join (select 0 as src from dual
union all
select 1 as src from dual) d
on 1 = 1)
)
where csrc = 0

Hope this does better then with DISTINCT.

@CezarN

Dominic, March 03, 2010 - 5:30 am UTC

Actually you can have more than 1 src as shown below, but fortunately because we are partitioning on cod, multiple values are still picked up. Having said that, I do like the sum method more, seems sharper to me somehow.

SQL> select *
  2  from test1
  3  /
 
 OBJECT_NO ABI PRI
---------- --- ---
         1 G07 Z99
         1 J01 A22
         1 J01 G06
         1 J01 G07
         1 J01 Z99
         2 P01 Z99
 
6 rows selected.
 
SQL> insert into test1
  2  values(1,'H01','Z99')
  3  /
 
1 row created.
 
SQL> commit;
 
Commit complete.
 
select distinct object_no, 
       abi_code
  from (select object_no,
               cod,
               src,
               sum(src) over(partition by object_no, cod) as csrc,
               abi_code,
               priority_over
          from (select object_no,
                       case d.src
                         when 0 then abi_code
                         else priority_over
                       end as cod,
                       d.src,
                       abi_code,
                       priority_over
                  from test1
                  join (select 0 as src from dual
                        union all
                        select 1 as src from dual) d
                    on 1 = 1)
)
where csrc = 0

SQL> /
 
 OBJECT_NO ABI
---------- ---
         1 H01
         1 J01
         2 P01

The run times don't change much to be honest (sorry Tom, should have mentioned it before, I'm on version 9.2.0.7.0 so no get_cpu_time) and the stats seem to be the same, so the anti hash join looks to be the winner over analytics in this case. 

(looks like that's another set of pre-conceptions for the bin) :-) 

Thanks

Dom

Reader, June 10, 2010 - 12:17 pm UTC

Tom,
In what cases should we use analytics and in what cases group by should be used.

Ranking can be done without using rank function. Can you tell which one is better? Ranking using rank analytic function or just group by and get rownumber

thanks

June 10, 2010 - 2:49 pm UTC

that is like asking "when should I join and when should I group by"

Show me an example of an analytic (that does not ever aggregate, they do not by definition) that can be replaced by group by (that always aggregates - in general returns FEWER rows than a non-group by) and vice versa

when you do that, I'll let you know the answer.

Use rank for example, do ranking with group by and ranking with analytics and then we can have a discussion.

Analytical Query

NS, June 22, 2010 - 3:49 pm UTC

I’ll really appreciate your help to solve a particular scenario using analytical query.
I'm working on Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit.

Following are the create and insert statements:
create table ntest(a number, pdt varchar2(10),sec_pdt varchar2(10),htime number);

insert into ntest values(1,'RE','ST',1000);
insert into ntest values(2,'FF','RE',1100);
insert into ntest values(3,'RE','NR',1400);
insert into ntest values(4,'NR','FF',1450);
insert into ntest values(5,'RE','NR',1700);
insert into ntest values(6,'NR','FF',2100);
insert into ntest values(7,'TT','FA',2105);
insert into ntest values(8,'MM','FJ',2107);
insert into ntest values(9,'RE','TT',2115);
insert into ntest values(10,'RE','FF',2500);
insert into ntest values(11,'MN','RT',2510);
insert into ntest values(12,'MN','NT',2600);
insert into ntest values(13,'ZZ','FF',3000);
insert into ntest values(14,'ZN','FF',3100);

commit;

I want to select column a for those values where pdt_cd in that row matches with the pdt_cd in another row which is within the range of +150 or -150 of htime.
eg : if pdt_cd is 'NS' in row of htime 1000 and there is another row with htime between 850 and 1150 with same pdt_cd, then I want to select that a value.

similar is the case with sec_pdt column.
I'm able to tackle this situation by writing following query :
select a from
(
select a,
count(*) OVER(Partition by pdt
order by htime RANGE BETWEEN 150 PRECEDING AND 150 FOLLOWING) pdtcnt,
count(*) OVER(Partition by sec_pdt
order by htime RANGE BETWEEN 150 PRECEDING AND 150 FOLLOWING) secpdtcnt
from ntest
) n
where pdtcnt > 1
or secpdtcnt > 1;

I want to add 2 more OR conditions to this query
1. I want to select column a for those values where pdt_cd in that row matches with the sec_pdt_cd in another row which is within the range of +150 or -150 of htime

2. I want to select column a for those values where sec_pdt_cd in that row matches with the pdt_cd in another row which is within the range of +150 or -150 of htime

I'm not able to figure out how to add these 2 conditions in the above query or is there any better way to do this in single query?

June 24, 2010 - 6:18 am UTC

it would have been interesting to have at least one observation such that:

where pdt_cd in that row matches
with the sec_pdt_cd in another row which is within the range of +150 or -150 of
htime

in the example wouldn't it?

I added:

insert into ntest values(42,'RE','NR',1800);

to create one. Then, if you add:

count(*) over (partition by pdt, sec_pdt order by htime range between 150 preceding and 150 following)

you just want to keep where that is greater than one.

and isn't:

I want to select column a for those values where sec_pdt_cd in that row
matches with the pdt_cd in another row which is within the range of +150 or
-150 of htime

the same as the prior - if pdt_cd has a sec_pdt_cd within +-150 - then that same thing is true in reverse already.

Analytics Question

NS, June 24, 2010 - 2:53 pm UTC

When I said
"where pdt_cd in that row matches
with the sec_pdt_cd in another row which is within the range of +150 or -150 of htime " I meant that pdt_cd in first row matches with the sec_pdt_cd in other rows, like 2,3 .... which is within the range of +150 or -150 of first row.

In my example RE(pdt_cd) from row 1 matches with RE(sec_pdt_cd) of row 2 and is within +150 or -150 of 1000(htime of row 1)

The other scenario is reverse
NR(sec_pdt_cd) from row 3 matches with NR(pdt_cd) of row 4 which is within +150 or -150 of 1400(htime of row 3)

From someone help I'm able to come up to the following query which brings the output I'm looking for.

with t as
(select a, pdt combined_pdt, htime,pdt,sec_pdt
from ntest
union
select a, sec_pdt combined_pdt, htime,pdt,sec_pdt
from ntest)
select distinct a,pdt,sec_pdt,htime
from (select a,
combined_pdt,
pdt,
sec_pdt,
htime,
count(*) over(partition by combined_pdt order by htime range between 150 preceding and 150 following) pdt_cnt
from t)
where pdt_cnt > 1
and combined_pdt is not null
order by htime;

Here is the result.
A PDT SEC_PDT HTIME
1 RE ST 1000
2 FF RE 1100
3 RE NR 1400
4 NR FF 1450
5 RE NR 1700
42 RE NR 1800
7 TT FA 2105
9 RE TT 2115
11 MN RT 2510
12 MN NT 2600
13 ZZ FF 3000
14 ZN FF 3100

Thanks.

reader

A reader, July 12, 2010 - 12:03 pm UTC

On Toad Grid output,
break on column

does not work

Is there a way to simulate that function with analyticals so that I can just execute a sql query without "break on table_name" at the top and get the same result

Thanks

July 19, 2010 - 10:53 am UTC

ops$tkyte%ORA10GR2> select deptno, ename from scott.emp order by deptno, ename;

    DEPTNO ENAME
---------- ----------
        10 CLARK
        10 KING
        10 MILLER
        20 ADAMS
        20 FORD
        20 JONES
        20 SCOTT
        20 SMITH
        30 ALLEN
        30 BLAKE
        30 JAMES
        30 MARTIN
        30 TURNER
        30 WARD

14 rows selected.

ops$tkyte%ORA10GR2> break on deptno
ops$tkyte%ORA10GR2> select deptno, ename from scott.emp order by deptno, ename;

    DEPTNO ENAME
---------- ----------
        10 CLARK
           KING
           MILLER
        20 ADAMS
           FORD
           JONES
           SCOTT
           SMITH
        30 ALLEN
           BLAKE
           JAMES
           MARTIN
           TURNER
           WARD

14 rows selected.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select decode( row_number() over (partition by deptno order by ename), 1, deptno ) new_deptno,
  2         ename
  3    from scott.emp
  4    order by deptno, ename
  5  /

NEW_DEPTNO ENAME
---------- ----------
        10 CLARK
           KING
           MILLER
        20 ADAMS
           FORD
           JONES
           SCOTT
           SMITH
        30 ALLEN
           BLAKE
           JAMES
           MARTIN
           TURNER
           WARD

14 rows selected.

it will not be "as efficient" or course, takes extra work to compute row_number

Analytics Stumper

Kevin Kirkpatrick, August 27, 2010 - 4:55 pm UTC

Hey Tom,

I'm presently stumped on this analytics problem - and am about to bite the bullet and code it out in some very ugly PL/SQL. But before throwing in the towel, I figured I'd let a pro take a stab.

2 Tables, DEBTS and ASSETS, defined as

WITH DEBTS AS
(
SELECT 1 TRANSACTION_ID, 'X' ATYPE, 4 UNITS FROM DUAL UNION ALL
SELECT 2 TRANSACTION_ID, 'X' ATYPE, 5 UNITS FROM DUAL UNION ALL
SELECT 3 TRANSACTION_ID, 'Y' ATYPE, 2 UNITS FROM DUAL UNION ALL
SELECT 4 TRANSACTION_ID, 'Z' ATYPE, 3 UNITS FROM DUAL
) ,
ASSETS AS
(SELECT 'A' ASSET_ID, 'X' ATYPE, 3 UNITS FROM DUAL UNION ALL
SELECT 'B' ASSET_ID, 'X' ATYPE, 3 UNITS FROM DUAL UNION ALL
SELECT 'C' ASSET_ID, 'X' ATYPE, 4 UNITS FROM DUAL UNION ALL
SELECT 'D' ASSET_ID, 'X' ATYPE, 2 UNITS FROM DUAL UNION ALL
SELECT 'E' ASSET_ID, 'Y' ATYPE, 3 UNITS FROM DUAL UNION ALL
SELECT 'F' ASSET_ID, 'Z' ATYPE, 2 UNITS FROM DUAL)
-- insert query here.

/*
DEBTS can be interpretted as: "Transaction <TRANSACTION_ID> must deduct <UNITS> units from
assets of type <ATYPE>."
ASSETS can be interpretted as: "Asset <ASSET_ID> is of type <ATYPE> and has <UNITS> available"

Desired results: Show the required series of steps to properly deduct DEBTS units from ASSETS units of the same ASSET TYPE.
Rules:

1) Process DEBTS in order of TRANSACTION_ID.
2) Debt units must be removed from ASSETS with a matching asset type, and must be
deducted from assets in order of ASSET_ID*.
3) Uncovered Debts of a given transaction are not associate with any ASSET, but
are available in the result set for error handling.

So, for TRANSACTION_ID 1, we need to remove 4 units from ATYPE = 'X'. We first remove 3 units
from ASSET_ID A, then 1 more unit for ASSET_ID B. Transaction 2 must remove an additional 5
units from ATYPE='X'. It will take the remaining 2 units from ASSET_ID B, and then 3 more units
from ASSET_ID C. No units will be taken from ASSET_ID D.

The total activity for ATYPE=X would thus appear as follows in the query results:

TRANSACTION_ID UNITS_REMOVED ASSET_ID
1 3 A
1 1 B
2 2 B
2 3 C

Following this logic, transaction ID 3 is a new asset type, Y, and would thus have the following:

TRANSACTION_ID UNITS_REMOVED ASSET_ID
3 2 E

The special case is if there are insufficient UNITS in ASSETS to cover a DEBT. In that case
the remaining units need to be left as <null> ASSET_ID. So Transaction ID 4 would give:

TRANSACTION_ID UNITS_REMOVED ASSET_ID
4 2 F
4 1 <null>

Anyway, to recap: can you substitute for <insert query here> such that this result set would be obtained:

WITH DEBTS AS
(
SELECT 1 TRANSACTION_ID, 'X' ATYPE, 4 UNITS FROM DUAL UNION ALL
SELECT 2 TRANSACTION_ID, 'X' ATYPE, 5 UNITS FROM DUAL UNION ALL
SELECT 3 TRANSACTION_ID, 'Y' ATYPE, 2 UNITS FROM DUAL UNION ALL
SELECT 4 TRANSACTION_ID, 'Z' ATYPE, 3 UNITS FROM DUAL ) ,
ASSETS AS
(SELECT 'A' ASSET_ID, 'X' ATYPE, 3 UNITS FROM DUAL UNION ALL
SELECT 'B' ASSET_ID, 'X' ATYPE, 3 UNITS FROM DUAL UNION ALL
SELECT 'C' ASSET_ID, 'X' ATYPE, 4 UNITS FROM DUAL UNION ALL
SELECT 'D' ASSET_ID, 'X' ATYPE, 2 UNITS FROM DUAL UNION ALL
SELECT 'E' ASSET_ID, 'Y' ATYPE, 3 UNITS FROM DUAL UNION ALL
SELECT 'F' ASSET_ID, 'Z' ATYPE, 2 UNITS FROM DUAL)
-- insert query here.

TRANSACTION_ID UNITS_REMOVED ASSET_ID
1 3 A
1 1 B
2 2 B
2 3 C
3 2 E
4 2 F
4 1 <null>

Or do I have to bite the bullet and cram out some PL/SQL? (the main reason I want to avoid that is performance - I've got to process hundreds of thousands of DEBTS from millions of ASSETS, and need it to happen quick, e.g. PARALLEL FTS of ASSETS, PARALLEL FTS of DEBTS, hash join, and crunch).

Solved

Kevin Kirkpatrick, August 30, 2010 - 10:30 am UTC

Tom,

Feel free to ignore the above - I've come up with the following solution (spoiler alert - to those who like to work through a neatly packaged SQL puzzler, I'd put this as a prime example...)

WITH
DEBTS AS (
SELECT 1 TRANSACTION_ID, 'X' ATYPE, 4 UNITS FROM DUAL UNION ALL
SELECT 2 TRANSACTION_ID, 'X' ATYPE, 5 UNITS FROM DUAL UNION ALL
SELECT 3 TRANSACTION_ID, 'Y' ATYPE, 2 UNITS FROM DUAL UNION ALL
SELECT 4 TRANSACTION_ID, 'Z' ATYPE, 3 UNITS FROM DUAL )
,ASSETS AS (
SELECT 'A' ASSET_ID, 'X' ATYPE, 3 UNITS FROM DUAL UNION ALL
SELECT 'B' ASSET_ID, 'X' ATYPE, 3 UNITS FROM DUAL UNION ALL
SELECT 'C' ASSET_ID, 'X' ATYPE, 4 UNITS FROM DUAL UNION ALL
SELECT 'D' ASSET_ID, 'X' ATYPE, 2 UNITS FROM DUAL UNION ALL
SELECT 'E' ASSET_ID, 'Y' ATYPE, 3 UNITS FROM DUAL UNION ALL
SELECT 'F' ASSET_ID, 'Z' ATYPE, 2 UNITS FROM DUAL)
-- Solution:
-- Asset ranges define the number of units each asset_id will contribute
-- to a given asset type. Define into existence the "null" asset, which
-- effectively contributes infinite units to each asset type.
,ASSET_RANGES AS (
SELECT ASSET_ID, ATYPE,
NVL(LAG(RUNNING_UNITS) OVER (PARTITION BY ATYPE ORDER BY ASSET_ID),0)+1 MIN_UNIT,
RUNNING_UNITS MAX_UNIT
FROM
(SELECT SUM(UNITS) OVER (PARTITION BY ATYPE ORDER BY ASSET_ID) RUNNING_UNITS,
A.*
FROM ASSETS A)
UNION ALL
SELECT NULL ASSET_ID, ATYPE, SUM(UNITS)+1 MIN_UNIT, 1000000000000 MAX_UNIT
FROM ASSETS
GROUP BY ATYPE)
-- Debt ranges define the number of units each transaction_id will deduct
-- from a given asset type.
,DEBT_RANGES AS (
SELECT TRANSACTION_ID, ATYPE,
NVL(LAG(RUNNING_UNITS) OVER (PARTITION BY ATYPE ORDER BY TRANSACTION_ID),0)+1 MIN_UNIT,
RUNNING_UNITS MAX_UNIT
FROM
(SELECT SUM(UNITS) OVER (PARTITION BY ATYPE ORDER BY TRANSACTION_ID) RUNNING_UNITS,
D.*
FROM DEBTS D))
-- Using the ranges defined above, I get my desired "HASH JOIN AND CRUNCH" performance
SELECT TRANSACTION_ID,
ASSET_ID,
LEAST(A.MAX_UNIT, D.MAX_UNIT) - GREATEST(A.MIN_UNIT,D.MIN_UNIT) +1
FROM DEBT_RANGES D, ASSET_RANGES A
WHERE D.ATYPE = A.ATYPE AND D.MIN_UNIT <= A.MAX_UNIT AND D.MAX_UNIT >= A.MIN_UNIT
ORDER BY TRANSACTION_ID, ASSET_ID

analytics

A reader, October 15, 2010 - 9:09 am UTC

Hi Tom,
you always say that the analytical functions *opens up NEW * path .

Quote from your book :

"This new set of functionality holds some exciting possibilities. It opens up a whole new way of looking
at the data. It will remove a lot of procedural code and complex (or inefficient) queries that would have
taken a long time to develop, to achieve the same result".

Waht is the new path ? what is the 'mechanism' behind that ? how it happens? with out doing 'self join' how it apply the analytcal function and calculate AVG (for eample)?

Thanks

October 15, 2010 - 9:25 am UTC

not sure how to answer this.

close your eyes, envision a table of data. Now, envision a sliding range of data in that big table of data - a range of rows. Analytics has access to the rows in that range and can reference them. The "new path" is "new code" that was added to the database kernel to open up access to that range - what we call a "window" and added steps to plans like "winder buffer sort" and the like.

A reader, October 15, 2010 - 9:32 am UTC

Many thanks Tom,

"The "new path" is "new code" that was added to the database kernel"

Yes, understood now - One more doubt is that -Oracle engine has to access the FULL table first before applying ANY analytical function (assume we are applying analytics on a TABLE) - correct ?

October 15, 2010 - 10:04 am UTC

... Oracle engine has to access the
FULL table first before applying ANY analytical function ...

of course the oracle engine has access to the full table - that is because we "own" the table, we manage all of the data.

However, it is not a REQUIREMENT to access the full table before applying ANY analytical function - it would depend on the schema (indexing), the query, the partitioning clause and so on.

Sometimes the entire query is "executed" before the first rows are returned and sometimes the first rows may be returned immediately without hitting the entire table. "it depends" - for all queries "it depends". It is not a requirement.

Slow analytic function

lalu, October 17, 2010 - 4:17 am UTC

HiTom,

I have a table with contents as below:

id notes
1 abcdefghijklmnopqrstuvwxyz

I want the output as below

id line notes
1 1 abcd (varchar of 4 characters)
1 2 efgh
1 3 ijkl
.....
....
1 7 yz

I use the below query to get the data:

SELECT DISTINCT A.ID, LEVEL LINE, SUBSTR(A.T, (LEVEL - 1) * 4 + 1, 4) NOTES
FROM (SELECT ID, CONTENTS T FROM MY_TABLE) A
CONNECT BY LENGTH(SUBSTR(A.T, (LEVEL - 1) * 4 + 1, (LEVEL - 1) * 4)) >= 1
ORDER BY ID, LINE

In my production system the table has got 50000 rows and the notes field is 1000char.
I want the data in chunk of 80 chars(notes) in the output.

Tried the above query by replacing 4 with 80.The query runs for 3/4 hours but still no output.
Is there any way to rewrite the query to get the data faster?

Thanks.
lalu.

October 25, 2010 - 8:17 am UTC

ops$tkyte%ORA10GR2> create table t
  2  as
  3  select rownum id,
  4         rpad( object_name, 30, 'x' ) || rpad( object_name, 30, 'x' ) || rpad( object_name, 30, 'x' ) || rpad( object_name, 30, 'x' ) ||
  5         rpad( object_name, 30, 'x' ) || rpad( object_name, 30, 'x' ) || rpad( object_name, 30, 'x' ) || rpad( object_name, 30, 'x' ) ||
  6         rpad( object_name, 30, 'x' ) || rpad( object_name, 30, 'x' ) || rpad( object_name, 30, 'x' ) || rpad( object_name, 30, 'x' ) ||
  7         rpad( object_name, 30, 'x' ) || rpad( object_name, 30, 'x' ) || rpad( object_name, 30, 'x' ) || rpad( object_name, 30, 'x' ) ||
  8         rpad( object_name, 30, 'x' ) || rpad( object_name, 30, 'x' ) || rpad( object_name, 30, 'x' ) || rpad( object_name, 30, 'x' ) ||
  9         rpad( object_name, 30, 'x' ) || rpad( object_name, 30, 'x' ) || rpad( object_name, 30, 'x' ) || rpad( object_name, 30, 'x' ) ||
 10         rpad( object_name, 30, 'x' ) || rpad( object_name, 30, 'x' ) || rpad( object_name, 30, 'x' ) || rpad( object_name, 30, 'x' ) ||
 11         rpad( object_name, 30, 'x' ) || rpad( object_name, 30, 'x' ) || rpad( object_name, 30, 'x' ) || rpad( object_name, 30, 'x' ) ||
 12             rpad('y',81,'y') data
 13    from all_objects
 14   where rownum <= 5000
 15  /

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> set autotrace traceonly
ops$tkyte%ORA10GR2> set timing on
ops$tkyte%ORA10GR2> select id, to_number( substr( column_value, 1, 4 ) ) id2, substr( column_value, 5 ) new_data
  2    from t, table( cast( multiset( select to_char( rownum, 'fm0009' ) || substr( data, 1+(rownum-1)*80, 80 )
  3                                     from dual
  4                                                                  connect by level <= ceil(length(data)/80) ) as sys.odciVarchar2List ) )
  5  /

70000 rows selected.

Elapsed: 00:00:01.29

Execution Plan
----------------------------------------------------------
Plan hash value: 442444467

----------------------------------------------------------------------------
| Id  | Operation                           | Name | Rows  | Bytes | Cost  |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |      |    50M|    25G| 59040 |
|   1 |  NESTED LOOPS                       |      |    50M|    25G| 59040 |
|   2 |   TABLE ACCESS FULL                 | T    |  6130 |  3202K|    84 |
|   3 |   COLLECTION ITERATOR SUBQUERY FETCH|      |       |       |       |
|   4 |    COUNT                            |      |       |       |       |
|*  5 |     CONNECT BY WITHOUT FILTERING    |      |       |       |       |
|   6 |      FAST DUAL                      |      |     1 |       |     2 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - filter(LEVEL<=CEIL(LENGTH(:B1)/80))

Note
-----
   - cpu costing is off (consider enabling it)
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
         68  recursive calls
          0  db block gets
       4903  consistent gets
        834  physical reads
          0  redo size
    6447972  bytes sent via SQL*Net to client
      51726  bytes received via SQL*Net from client
       4668  SQL*Net roundtrips to/from client
       5000  sorts (memory)
          0  sorts (disk)
      70000  rows processed

ops$tkyte%ORA10GR2> set autotrace off

seems to work fairly well.

Tried but got error

A reader, October 25, 2010 - 9:18 am UTC

Hi TOm,

I tried your example.
Create table T,then
select id, to_number( substr( column_value, 1, 4 ) ) id2, substr( column_value, ) new_data
from t, table( cast( multiset( select to_char( rownum, 'fm0009' ) || substr( data,
1+(rownum-1)*80, 80 ) from dual connect by level <= ceil(length(data)/80) )as sys.odciVarchar2List ) )

returns:
select id, to_number( substr( column_value, 1, 4 ) ) id2, substr( column_value, ) new_data
*
ERROR at line 1:
ORA-00936: missing expression

Working

lalu, October 25, 2010 - 9:33 am UTC

Hi Tom,

Thanks a lot.
I was missing a very small char from the script.
Its running now.Will check the performance.

Thanks.

Achieve this with analytics or pivot query?

A reader, January 20, 2011 - 3:35 pm UTC

Hi

I have following data:

phone state state_mod_date service_id
-------- ------- ---------------- ------------
9087624 online 2010/02/22 1
9087624 online 2010/02/22 2
9087624 online 2010/02/23 6
9087624 offline 2010/02/23 2
9087624 offline 2010/02/24 6
9200921 online 2010/01/25 1
9200921 online 2010/03/12 3
9200921 online 2010/04/21 7
9200921 offline 2010/09/11 3
9760293 online 2010/01/02 1

I need to transform it to

phone service_id start_date end_date
-------- ----------- ----------- ----------
9087624 1 2010/02/22
9087624 2 2010/02/22 2010/02/23
9087624 6 2010/02/23 2010/02/24
9200921 1 2010/01/25
9200921 3 2010/03/12 2010/09/11
9200921 7 2010/04/21
9760293 1 2010/01/02

I have been doing this with PL/SQL but the amount of data is increasing in the last few months and the PL/SQL code is running slower and slower.

I have been looking the analytic function or pivot query but cant find a way of doing it, can you please share some lights?

I am running 10.2.0.3

Thanks a lot

Pedro

January 24, 2011 - 6:58 am UTC

you would need to

a) supply create tables
b) insert into tables (so we have data)

and most importantly:

c) explain your requirement, sometimes a picture is worth a thousand words - but we need an explanation of your inputs and outputs here and the processing that needs to take place.

It looks like just getting the min(state_mod_date) and max(state_mod_date) (where max<>min) by phone and service_id, it is not clear what the state column is used for

A reader, January 24, 2011 - 11:39 am UTC

use group by as below...

select phone, service_id, min(case when state='online' then state_mod_date END) as start_date , max(case when state='offline' then state_mod_date END) as end_date
from data
Group by phone, service_id

Achieve this with analytics or pivot query?

Ravi Kumar Pilla, January 27, 2011 - 9:00 am UTC

Dear Tom,
I'm a silent reader of your site and learned a lot and appreciate your work.
I don't know whether it is the best way, but its working for me. Correct me if any other way which works faster for the person who asked.

SELECT PHONE,SERVICEID,MAX(DECODE(status,'ONLINE',START_date)) startdate,
MAX(decode(status,'OFFLINE',START_DATE)) ENDDATE FROM PHONE GROUP BY PHONE,
SERVICEID

February 01, 2011 - 3:16 pm UTC

as I said above:

It looks like just getting the min(state_mod_date) and max(state_mod_date) (where max<>min) by phone and service_id, it is not clear what the state column is used for

I cannot validate your query as I don't know what the question is

Combine dates

Bob C, January 28, 2011 - 3:12 pm UTC

I have a similar challenge as above, except I want to combine rows to show continuous date range when rate plan and customer are the same. Records are not supposed to overlap, so that does not need to be handled.

drop table t1;
create table t1 ( key_col char(1), d_begin date, d_end date, cust char(10), rate_plan char(10) );

insert into t1 values ('a',TO_DATE('01/01/2011 00:00:00', 'MM/DD/YYYY HH24:MI:SS'),TO_DATE('01/02/2011 23:59:59', 'MM/DD/YYYY HH24:MI:SS'), 'c1', 'r1');
insert into t1 values ('a',TO_DATE('01/03/2011 00:00:00', 'MM/DD/YYYY HH24:MI:SS'),TO_DATE('01/04/2011 23:59:59', 'MM/DD/YYYY HH24:MI:SS'), 'c1', 'r1');
insert into t1 values ('a',TO_DATE('01/05/2011 01:00:00', 'MM/DD/YYYY HH24:MI:SS'),TO_DATE('01/06/2011 23:59:59', 'MM/DD/YYYY HH24:MI:SS'), 'c1', 'r1');
insert into t1 values ('a',TO_DATE('01/07/2011 00:00:00', 'MM/DD/YYYY HH24:MI:SS'),TO_DATE('01/08/2011 23:59:59', 'MM/DD/YYYY HH24:MI:SS'), 'c1', 'r1');
insert into t1 values ('a',TO_DATE('01/09/2011 00:00:00', 'MM/DD/YYYY HH24:MI:SS'),TO_DATE('01/10/2011 23:59:59', 'MM/DD/YYYY HH24:MI:SS'), 'c1', 'r1');
insert into t1 values ('a',TO_DATE('01/11/2011 00:00:00', 'MM/DD/YYYY HH24:MI:SS'),TO_DATE('01/12/2011 23:59:59', 'MM/DD/YYYY HH24:MI:SS'), 'c1', 'r2');
insert into t1 values ('b',TO_DATE('01/01/2011 00:00:00', 'MM/DD/YYYY HH24:MI:SS'),TO_DATE('01/02/2011 23:59:59', 'MM/DD/YYYY HH24:MI:SS'), 'c1', 'r1');
insert into t1 values ('b',TO_DATE('01/03/2011 00:00:00', 'MM/DD/YYYY HH24:MI:SS'),TO_DATE('01/04/2011 23:59:59', 'MM/DD/YYYY HH24:MI:SS'), 'c1', 'r1');
insert into t1 values ('b',TO_DATE('01/05/2011 01:00:00', 'MM/DD/YYYY HH24:MI:SS'),TO_DATE('01/06/2011 23:59:59', 'MM/DD/YYYY HH24:MI:SS'), 'c1', 'r1');
insert into t1 values ('b',TO_DATE('01/07/2011 00:00:00', 'MM/DD/YYYY HH24:MI:SS'),TO_DATE('01/08/2011 23:59:59', 'MM/DD/YYYY HH24:MI:SS'), 'c1', 'r1');
insert into t1 values ('b',TO_DATE('01/09/2011 00:00:00', 'MM/DD/YYYY HH24:MI:SS'),TO_DATE('01/10/2011 23:59:59', 'MM/DD/YYYY HH24:MI:SS'), 'c1', 'r2');
commit;

Would like to get results:

a, 01/01/2011 00:00:00, 01/04/2011 23:59:59, c1, r1
a, 01/05/2011 01:00:00, 01/10/2011 23:59:59, c1, r1
a, 01/11/2011 00:00:00, 01/12/2011 23:59:59, c1, r2
b, 01/01/2011 00:00:00, 01/04/2011 23:59:59, c1, r1
b, 01/05/2011 01:00:00, 01/10/2011 23:59:59, c1, r1
b, 01/11/2011 00:00:00, 01/12/2011 23:59:59, c1, r2

following query is close, but does not account for the gap between rows 2 and 3

select distinct key_col, cust, rate_plan,
first_value(d_begin) over (partition by key_col, cust, rate_plan order by d_begin rows unbounded preceding) as min_eff,
last_value(d_end) over (partition by key_col, cust, rate_plan order by d_begin ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as max_term
from t1
order by 1,2,4

KEY_COL CUST RATE_PLAN MIN_EFF MAX_TERM
a c1 r1 1/1/2011 1/10/2011 11:59:59 PM
a c1 r2 1/11/2011 1/12/2011 11:59:59 PM
b c1 r1 1/1/2011 1/8/2011 11:59:59 PM
b c1 r2 1/9/2011 1/10/2011 11:59:59 PM

February 01, 2011 - 3:31 pm UTC

I don't understand your output

a, 01/01/2011 00:00:00, 01/04/2011 23:59:59, c1, r1
a, 01/05/2011 01:00:00, 01/10/2011 23:59:59, c1, r1
a, 01/11/2011 00:00:00, 01/12/2011 23:59:59, c1, r2

I see why the third row is there, but I don't get why the first two are not merged.

I don't see any "gap" there.

@Bob C re:combine dates

Stew Ashton, January 29, 2011 - 5:04 am UTC

Hi Bob,

Here's a solution based on a contribution by Steve from Pleasanton CA:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1594948200346140013#1596748700346101469

select key_col, cust, rate_plan, min(d_begin) d_begin, max(d_end) d_end from (
  select key_col, cust, rate_plan, d_begin, d_end,
  sum(new_start) over(partition by key_col,cust,rate_plan order by d_begin) grp from (
    select key_col, cust, rate_plan, d_begin, d_end,
    case when 
      lag(d_end + 1/24/60/60) over(partition by key_col,cust,rate_plan order by d_begin)
      >= d_begin
      then 0 else 1 end new_start 
    from t1
  )
)
group by key_col, cust, rate_plan, grp
order by key_col, cust, rate_plan, d_begin;

The innermost query sets NEW_START to 0 when there is no date gap with the previous record, and to 1 when you have to start afresh. The middle query sets GRP as a running total, so all records within the same continuous date range will belong to the same GRP. The outer query uses GROUP BY to get one record per date range.

Perfect answer thanks

A reader, February 01, 2011 - 10:24 am UTC

Works great, thanks.

Your Feb. 1 followup

Stew Ashton, February 01, 2011 - 4:35 pm UTC

Tom, you said 'I don't see any "gap" there.'

The second row starts at 1 A.M., not midnight. I didn't see that at first either. Or did you mean something else?

February 01, 2011 - 6:30 pm UTC

indeed, the gap was too subtle for my eyes :) I missed it entirely. After putting my glasses on:

ops$tkyte%ORA11GR2> select key_col, cust, rate_plan, min(d_begin), max(d_end)
  2    from (
  3  select key_col, cust, rate_plan, d_begin, d_end, last_d_end,
  4         last_value(rn IGNORE NULLS) over (partition by key_col, cust, rate_plan order by d_end) grp
  5    from (
  6  select key_col, cust, rate_plan, d_begin, d_end,
  7         lag(d_end) over (partition by key_col, cust, rate_plan order by d_end) last_d_end,
  8         case when lag(d_end) over (partition by key_col, cust, rate_plan order by d_end) = d_begin-1/24/60/60 then null
  9                  else row_number() over (partition by key_col, cust, rate_plan order by d_end )
 10                  end rn
 11    from t1
 12         )
 13             )
 14   group by key_col, cust, rate_plan, grp
 15   order by key_col, cust, rate_plan
 16  /

K CUST       RATE_PLAN  MIN(D_BEGIN)         MAX(D_END)
- ---------- ---------- -------------------- --------------------
a c1         r1         01-jan-2011 00:00:00 04-jan-2011 23:59:59
a c1         r1         05-jan-2011 01:00:00 10-jan-2011 23:59:59
a c1         r2         11-jan-2011 00:00:00 12-jan-2011 23:59:59
b c1         r1         01-jan-2011 00:00:00 04-jan-2011 23:59:59
b c1         r1         05-jan-2011 01:00:00 08-jan-2011 23:59:59
b c1         r2         09-jan-2011 00:00:00 10-jan-2011 23:59:59

6 rows selected.

for Pedro from Spain

CezarN, February 02, 2011 - 3:36 am UTC

/*--Here are create/insert statements you have to supply whenever you ask something
--drop table t1;
--create table t1 (phone varchar2(10), state varchar2(10), state_mod_date date, service_id number(2));

insert into t1 (phone, state, state_mod_date, service_id) values ('9087624', 'online', to_date('2010/02/22','yyyy/mm/dd'), 1);
insert into t1 (phone, state, state_mod_date, service_id) values ('9087624', 'online', to_date('2010/02/22','yyyy/mm/dd'), 2);
insert into t1 (phone, state, state_mod_date, service_id) values ('9087624', 'online', to_date('2010/02/23','yyyy/mm/dd'), 6);
insert into t1 (phone, state, state_mod_date, service_id) values ('9087624', 'offline', to_date('2010/02/23','yyyy/mm/dd'), 2);
insert into t1 (phone, state, state_mod_date, service_id) values ('9087624', 'offline', to_date('2010/02/24','yyyy/mm/dd'), 6);

insert into t1 (phone, state, state_mod_date, service_id) values ('9200921', 'online', to_date('2010/01/25','yyyy/mm/dd'), 1);
insert into t1 (phone, state, state_mod_date, service_id) values ('9200921', 'online', to_date('2010/03/12','yyyy/mm/dd'), 3);
insert into t1 (phone, state, state_mod_date, service_id) values ('9200921', 'online', to_date('2010/04/21','yyyy/mm/dd'), 7);
insert into t1 (phone, state, state_mod_date, service_id) values ('9200921', 'offline', to_date('2010/09/11','yyyy/mm/dd'), 3);

insert into t1 (phone, state, state_mod_date, service_id) values ('9760293', 'online', to_date('2010/01/02','yyyy/mm/dd'), 1);
commit;

select * from t1;
*/
--Hope this is what you want

select phone, service_id, start_date, end_date
from (
select phone, service_id, start_date, last_value(end_date) over (partition by phone, service_id) as end_date
from (
select
phone,
service_id,
case when state='online' then state_mod_date else null end as start_date,
case when state='offline' then state_mod_date else null end as end_date,
state
from t1
order by phone, service_id, start_date
)
)
where start_date is not null
order by phone, service_id

to_date function

Nitesh Jyotirmay, April 14, 2011 - 6:16 am UTC

SQL> ALTER SESSION SET NLS_DATE_FORMAT = 'DD:MM:YYYY HH24:MI:SS';

Session altered.

SQL> SELECT SYSDATE,to_date('13:14:00', 'HH24:MI:SS') FROM DUAL;

SYSDATE             to_date('13:14:00',
------------------- -------------------
14:04:2011 11:17:43 01:04:2011 13:14:00

why the to date function gives the output like that.

i am using the Oracle 9i

April 14, 2011 - 10:00 am UTC

because you told us to.

you are running a query that selected two dates out. When you select a date in sqlplus, it converts it to a string for display. It uses your NLS date format for that.

If you want to control what the STRING representation of a date looks like - you would have to TO_CHAR it with a format - the format being the format you want.

TO_DATE with only TIME elements in the string

Kim Berg Hansen, April 14, 2011 - 10:23 am UTC

Hi, Tom

A followup to the review from Nitesh above.
Nitesh did this:

SQL> SELECT SYSDATE,to_date('13:14:00', 'HH24:MI:SS') FROM DUAL;

SYSDATE             to_date('13:14:00',
------------------- -------------------
14:04:2011 11:17:43 01:04:2011 13:14:00

That is, he used TO_DATE on a string that only contains TIME elements. Since he did not specify any day, month or year, Oracle will have to "make up" something. Just as if you use TO_DATE only with DD-MM-YYYY elements, Oracle "makes up" that this must be midnight 00:00:00 hour.

The curious question is why Oracle picks the "first day of the current month"? It could have just as well have picked "today" or "January first Year 1"?

(Maybe Nitesh should consider using an INTERVAL datatype if he is only concerned with a time period? :-)

Anyway - just curious :-)

April 14, 2011 - 10:27 am UTC

because we documented it to be that way.

If you do not specify a year, the year is the current year.
If you do not specify a month, the month is the current month.
If you do not specify a day, the day is 1.

Thanks Tom

Kim Berg Hansen, April 14, 2011 - 10:33 am UTC

Thanks - I just couldn't find that bit in the documentation when I searched :-)

April 14, 2011 - 11:22 am UTC

SQL Language Reference

http://docs.oracle.com/docs/cd/E11882_01/server.112/e17118/sql_elements001.htm#sthref154

<quote>
The default date values are determined as follows:

The year is the current year, as returned by SYSDATE.

The month is the current month, as returned by SYSDATE.

The day is 01 (the first day of the month).

The hour, minute, and second are all 0.
</quote>

OK Great :-)

Kim Berg Hansen, April 14, 2011 - 11:42 am UTC

I was looking in the documentation of TO_DATE and in the "date formats" and those parts of the documentation. It did not occur to me to look at the datatype documentation :-)

Thanks for your patience...

April 14, 2011 - 5:25 pm UTC

I'll admit, it is a bit odd to be there - since it is really to_date related more so than data type related.

SQL query needs to be modified

A reader, April 17, 2011 - 1:02 pm UTC

Sql) query needs to be modified according to parameters for date range providedQuestion: the below query is for sales compensation based on the datre range i have to modify it so automatically it will change the date, can anyone please tell me how to go ab out it

--Use follwoing query for total tab of excel to make s` ure for current three month change OSR DATE FOR THAT
select decode(to_char(osr_date),
'12/10/2010 00:00:00', '12/2010',
'12/22/2010 00:00:00', '12/2010',
'01/04/2011 00:00:00', '12/2010',
'01/10/2011 00:00:00', '01/2011',
'01/22/2011 00:00:00', '01/2011',
'02/04/2011 00:00:00', '01/2011',
'02/10/2011 00:00:00', '02/2011',
'02/22/2011 00:00:00', '02/2011',
'03/04/2011 00:00:00', '02/2011',
'03/10/2011 00:00:00', '03/2011',
'03/22/2011 00:00:00', '03/2011',
'04/04/2011 00:00:00', '03/2011',
'1') osr_run,
lst_nm,sum(less_usage) "LESS_USAGE"
from ( select a.LST_NM , round(sum(comp_amt),2)less_usage,
decode (to_char(trunc(a.end_dt)),
'12/10/2010 00:00:00', '1',
'12/22/2010 00:00:00', '1',
'01/04/2011 00:00:00', '1',
'01/10/2011 00:00:00', '2',
'01/22/2011 00:00:00', '2',
'02/04/2011 00:00:00', '2',
'02/10/2011 00:00:00', '3',
'02/22/2011 00:00:00', '3',
'03/04/2011 00:00:00', '3',
'03/10/2011 00:00:00', '4',
'03/22/2011 00:00:00', '4',
'04/04/2011 00:00:00', '4',
to_char(a.end_dt)) osr_run,
to_char(trunc(a.end_dt))osr_date
from amk_arm_comp_2011 a
where lst_nm <>'House Account - CRM '
and a.end_dt between to_date('12/10/2010','mm/dd/yyyy') and to_date('04/04/2011','mm/dd/yyyy') --update each month by 1. 4 cycle month range for ARMS.
and a.sub_id <>2002003611
group by a.LST_NM, decode (to_char(trunc(a.end_dt)),
'12/10/2010 00:00:00', '1',
'12/22/2010 00:00:00', '1',
'01/04/2011 00:00:00', '1',
'01/10/2011 00:00:00', '2',
'01/22/2011 00:00:00', '2',
'02/04/2011 00:00:00', '2',
'02/10/2011 00:00:00', '3',
'02/22/2011 00:00:00', '3',
'03/04/2011 00:00:00', '3',
'03/10/2011 00:00:00', '4',
'03/22/2011 00:00:00', '4',
'04/04/2011 00:00:00', '4',
to_char(a.end_dt)),
to_char(trunc(a.end_dt))
)
group by decode(to_char(osr_date),
'12/10/2010 00:00:00', '12/2010',
'12/22/2010 00:00:00', '12/2010',
'01/04/2011 00:00:00', '12/2010',
'01/10/2011 00:00:00', '01/2011',
'01/22/2011 00:00:00', '01/2011',
'02/04/2011 00:00:00', '01/2011',
'02/10/2011 00:00:00', '02/2011',
'02/22/2011 00:00:00', '02/2011',
'03/04/2011 00:00:00', '02/2011',
'03/10/2011 00:00:00', '03/2011',
'03/22/2011 00:00:00', '03/2011',
'04/04/2011 00:00:00', '03/2011',
'1'), lst_nm
order by lst_nm, osr_run desc;

April 18, 2011 - 10:27 am UTC

You'll need to be a tad more clear on this

"i have to modify it so automatically it will change the date"

I presume you mean you'd like what are now hard coded dates to be replaced with some sort of function that figures out what the dates should be. BUT in order to do that - you would sort of need to fill us in on WHAT THE RULES ARE - what the LOGIC is.

I'm looking at what appear to be a bunch of random dates to me. Consider:

ops$tkyte%ORA11GR2> select dt, lag(dt) over (order by dt) ,
  2         dt-lag(dt) over (order by dt)
  3  from
  4  (select to_date( '12/10/2010 00:00:00', 'mm/dd/yyyy hh24:mi:ss') dt from dual union all
  5  select to_date( '12/22/2010 00:00:00', 'mm/dd/yyyy hh24:mi:ss') dt from dual union all
  6  select to_date( '01/04/2011 00:00:00', 'mm/dd/yyyy hh24:mi:ss') dt from dual union all
  7  select to_date( '01/10/2011 00:00:00', 'mm/dd/yyyy hh24:mi:ss') dt from dual union all
  8  select to_date( '01/22/2011 00:00:00', 'mm/dd/yyyy hh24:mi:ss') dt from dual union all
  9  select to_date( '02/04/2011 00:00:00', 'mm/dd/yyyy hh24:mi:ss') dt from dual union all
 10  select to_date( '02/10/2011 00:00:00', 'mm/dd/yyyy hh24:mi:ss') dt from dual union all
 11  select to_date( '02/22/2011 00:00:00', 'mm/dd/yyyy hh24:mi:ss') dt from dual union all
 12  select to_date( '03/04/2011 00:00:00', 'mm/dd/yyyy hh24:mi:ss') dt from dual union all
 13  select to_date( '03/10/2011 00:00:00', 'mm/dd/yyyy hh24:mi:ss') dt from dual union all
 14  select to_date( '03/22/2011 00:00:00', 'mm/dd/yyyy hh24:mi:ss') dt from dual union all
 15  select to_date( '04/04/2011 00:00:00', 'mm/dd/yyyy hh24:mi:ss') dt from dual)
 16  /

DT        LAG(DT)OV DT-LAG(DT)OVER(ORDERBYDT)
--------- --------- -------------------------
10-DEC-10
22-DEC-10 10-DEC-10                        12
04-JAN-11 22-DEC-10                        13
10-JAN-11 04-JAN-11                         6
22-JAN-11 10-JAN-11                        12
04-FEB-11 22-JAN-11                        13
10-FEB-11 04-FEB-11                         6
22-FEB-11 10-FEB-11                        12
04-MAR-11 22-FEB-11                        10
10-MAR-11 04-MAR-11                         6
22-MAR-11 10-MAR-11                        12
04-APR-11 22-MAR-11                        13

12 rows selected.

what is the "logic" behind these dates that are 13, 12, sometimes 6, other times 10 days apart???!?!?!?!?

I will say - DO NOT DO NOT DO NOT code this:

(to_char(trunc(a.end_dt)),
'12/10/2010 00:00:00', '1',...

DO code DO code DO code this:

(to_char(trunc(a.end_dt)),
to_date('12/10/2010 00:00:00','mm/dd/yyyy hh24:mi:ss'), '1', ....

NEVER NEVER NEVER rely on DEFAULTS.

Also - do not DO NOT do not code this:

decode (to_char(trunc(a.end_dt)),
'12/10/2010 00:00:00', '1',

you are once again relying on IMPLICIT CONVERSIONS - do not do that - ALWAYS (repeat ALWAYS) use an explicit format!@!!!!!!!!!!!!!!!!

a reader, April 18, 2011 - 1:19 pm UTC

The dates on the 10th, 22nd and 4th (of the following month) are one considered "cycle".. so
12/10/2010, 12/22/2010 & 04/04/2011 fall under Dec-10 cycle. 01/10/2011, 01/22/2011 &
02/04/2011 fall under Jan-11 cycle... & so on. These cycle dates are being inserted from another table.

What I'm looking to do is get rid of the hard-coded values from the 2 sets of the decode.
So I want to run this SQl from a given data range say - 12/05/2010 to 04/05/2011, it should pick up all "cycle" dates and name
them with the right month in order to figure our commissions for the sales rep.

Believe me, I could not agree more with you on the implicit conversions. But this is what we are dealing with for now.

April 18, 2011 - 1:51 pm UTC

keep going, explain the number system and dates in these too:

                                            '12/10/2010 00:00:00',    '12/2010',
                                            '12/22/2010 00:00:00',    '12/2010',
                                            '01/04/2011 00:00:00',    '12/2010',
                                            '01/10/2011 00:00:00',    '01/2011',
                                            '01/22/2011 00:00:00',    '01/2011',
                                            '02/04/2011 00:00:00',    '01/2011',
                                            '02/10/2011 00:00:00',    '02/2011',
                                            '02/22/2011 00:00:00',    '02/2011',
                                            '03/04/2011 00:00:00',    '02/2011',
                                            '03/10/2011 00:00:00',    '03/2011',
                                            '03/22/2011 00:00:00',    '03/2011',
                                            '04/04/2011 00:00:00',    '03/2011',
                                             '1') osr_run,
       lst_nm,sum(less_usage) "LESS_USAGE"
from ( select a.LST_NM  , round(sum(comp_amt),2)less_usage,
                         decode (to_char(trunc(a.end_dt)),
                                            '12/10/2010 00:00:00',    '1',
                                            '12/22/2010 00:00:00',    '1',
                                            '01/04/2011 00:00:00',    '1',
                                            '01/10/2011 00:00:00',    '2',
                                            '01/22/2011 00:00:00',    '2',
                                            '02/04/2011 00:00:00',    '2',
                                            '02/10/2011 00:00:00',    '3',
                                            '02/22/2011 00:00:00',    '3',
                                            '03/04/2011 00:00:00',    '3',
                                            '03/10/2011 00:00:00',    '4',
                                            '03/22/2011 00:00:00',    '4',
                                            '04/04/2011 00:00:00',    '4',
                                             to_char(a.end_dt)) osr_run,

o is the first week of a month always to be for the last month?

o are the numbers always 111,222,333,444 in order from first date to last date or are they assigned based on the date or what?

Believe me, I could not agree more with you on the implicit conversions. But
this is what we are dealing with for now.

you are obviously in the midst of fixing things, this should have been the first fix!

A reader, April 27, 2011 - 12:53 pm UTC

Yes. The first week of a month is always to be for the last month & the numbers are in order (111,222,333) from first date to last date.

The result of the above query produces results in this fashion -
OSR_RUN LST_NM COMP_AMT
Dec-10 Jennings 495129.66
Jan-11 Jennings 484153.76
Feb-11 Jennings 472022.35
Mar-11 Jennings 481284.15
Dec-10 Parker 512221.55
Jan-11 Parker 519161.53
Feb-11 Parker 518672.03
Mar-11 Parker 518291.82
Dec-10 Scott 473346.21
Jan-11 Scott 478488.66
Feb-11 Scott 483854.59
Mar-11 Scott 480323.2
Dec-10 Spencer 501227.09
Jan-11 Spencer 512199.9
Feb-11 Spencer 511897.21
Mar-11 Spencer 506145.24

April 27, 2011 - 1:55 pm UTC

so, to be clear, the first three lowest dates are assigned one (even if they are not in the same time period - eg, if the first three dates we hit are:

22-dec, 04-jan, 10-jan

they will be assigned one?

A reader, April 28, 2011 - 9:54 am UTC

The first three dates we should be hitting are -

10-dec, 22-dec, 04-jan. These will be assigned 1.
10-jan, 22-jan, 04-feb. These will be assigned 2.
10-feb, 22-feb, 04-mar. These will be assigned 3.

April 28, 2011 - 10:20 am UTC

so, you are saying there is an implied constraint here that your input date ranges will ALWAYS be on the tenth of some month.

Ok, what if the date range I put in is

10-JUN-2010 through 04-DEC-2010

does june get one?

In any case - use whichever function below that works for you:

ops$tkyte%ORA11GR2> variable start_date varchar2(20)
ops$tkyte%ORA11GR2> variable end_date varchar2(20)
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> exec :start_date := '10-dec-2010'

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> exec :end_date := '04-apr-2011'

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> with four as
  2  (select decode(level,1,4,2,22,10) dy from dual connect by level <= 3),
  3  twelve as
  4  (select level mm from dual connect by level <= 12),
  5  one_year as
  6  (select to_date( dy||'/'||mm, 'dd/mm' ) dt from four, twelve),
  7  days as
  8  (select decode( l, 1, dt, add_months(dt,-12) ) dt
  9     from one_year, (select level l from dual connect by level <= 2)
 10  )
 11  select dt,
 12         to_char( dt-5, 'mm/yyyy' ),
 13             to_char( add_months(dt-5,1), 'fmmm' ) first,
 14             to_char( add_months(dt-5,1), 'fmmm' )-to_char( add_months(to_date(:start_date, 'dd-mon-yyyy')-5,1),'fmmm') + 1
 15             second
 16    from days
 17   where dt between to_date( :start_date, 'dd-mon-yyyy' ) and to_date( :end_date, 'dd-mon-yyyy' )
 18   order by dt;

DT        TO_CHAR FI     SECOND
--------- ------- -- ----------
10-DEC-10 12/2010 1           1
22-DEC-10 12/2010 1           1
04-JAN-11 12/2010 1           1
10-JAN-11 01/2011 2           2
22-JAN-11 01/2011 2           2
04-FEB-11 01/2011 2           2
10-FEB-11 02/2011 3           3
22-FEB-11 02/2011 3           3
04-MAR-11 02/2011 3           3
10-MAR-11 03/2011 4           4
22-MAR-11 03/2011 4           4
04-APR-11 03/2011 4           4

12 rows selected.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> exec :start_date := '10-feb-2011'

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> exec :end_date := '04-jul-2011'

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> /

DT        TO_CHAR FI     SECOND
--------- ------- -- ----------
10-FEB-11 02/2011 3           1
22-FEB-11 02/2011 3           1
04-MAR-11 02/2011 3           1
10-MAR-11 03/2011 4           2
22-MAR-11 03/2011 4           2
04-APR-11 03/2011 4           2
10-APR-11 04/2011 5           3
22-APR-11 04/2011 5           3
04-MAY-11 04/2011 5           3
10-MAY-11 05/2011 6           4
22-MAY-11 05/2011 6           4
04-JUN-11 05/2011 6           4
10-JUN-11 06/2011 7           5
22-JUN-11 06/2011 7           5
04-JUL-11 06/2011 7           5

15 rows selected.

Divide into 3 groups

A reader, May 03, 2011 - 2:59 pm UTC

Table has following data

TEAM member
team1 2
team2 5
team3 3
team4 1
team5 10
team6 4
team7 38
...

How to divide these teams relative evenly into 3 groups?

May 04, 2011 - 1:45 pm UTC

read about ntile() or just simply add a column "mod(rownum,3) grp" to your select list.

I would have given you an example, but you didn't give us a create table/insert intos to work with!

Divide into 3 groups

A reader, May 05, 2011 - 8:49 am UTC

Tom,

I have tried with mod(rownum,3).  But the result is not that even.

create table t (team varchar2(8), member_num number, group_no number);

insert into t values ('team1',3, null); 
insert into t values ('team2',5, null);
insert into t values ('team3',8, null);
insert into t values ('team4',9, null);
insert into t values ('team5',2, null);
insert into t values ('team6',7, null);
insert into t values ('team7',4, null);
insert into t values ('team8',3, null);
insert into t values ('team9',3, null);
insert into t values ('team10',25, null);
insert into t values ('team11',33, null);

commit;

update t
set t.group_no = (select r.rg from (select REPLACE(mod(rownum,3),0,3) rg, team 
                                     from t order by member) r 
where t.team = r.team
);

SQL > select group_no, sum(member) from t group by group_no;

  GROUP_NO SUM(MEMBER)
---------- -----------
         1          41
         2          43
         3          18

Thanks.

May 06, 2011 - 10:06 am UTC

When you said to "How to divide these teams relative evenly into 3 groups?
", what immediately pops into mine?

Well, to break them into three groups of equal sizes....

I see 11 teams. I would put

1,4,7,10 into a group of 4
2,5,8,11 into a group of 4
3,6,9 into a group of 3

there I have three groups of equal sizes.

You did not say anything like:

given that member_num column, can you split the values in the above list up into 3 groups such that the sum of member_num within each group is about the same magnitude.

Which is what you are apparently asking to do.

That is a 'bin fitting' type of problem and does not lend itself easily to SQL based solutions.

see
http://asktom.oracle.com/pls/asktom/asktom.search?p_string=%22bin+fitting%22

for past discussions on this

Led/lag or connect by prior..

Jayadevan, June 08, 2011 - 3:55 am UTC

Hello Tom,
I have the following scenario -

  CREATE TABLE "HR"."ACTIVITY" 
   ( "ACTIVITY_ID" NUMBER(*,0), 
 "PERSON_ID" NUMBER(*,0), 
 "START_DATE" DATE, 
 "END_DATE" DATE, 
 "ACTIVITY_NAME" VARCHAR2(10 BYTE)
   )

Sample data -

 Insert into HR.ACTIVITY (ACTIVITY_ID,PERSON_ID,START_DATE,END_DATE,ACTIVITY_NAME) values (1,1,to_date('08-JUN-11','DD-MON-RR'),to_date('18-JUN-11','DD-MON-RR'),'LEAVE');
Insert into HR.ACTIVITY (ACTIVITY_ID,PERSON_ID,START_DATE,END_DATE,ACTIVITY_NAME) values (2,1,to_date('06-JUN-11','DD-MON-RR'),to_date('07-JUN-11','DD-MON-RR'),'LEAVE');
Insert into HR.ACTIVITY (ACTIVITY_ID,PERSON_ID,START_DATE,END_DATE,ACTIVITY_NAME) values (3,1,to_date('01-JUN-11','DD-MON-RR'),to_date('04-JUN-11','DD-MON-RR'),'LEAVE');

select * from activity order by start_date;

ACTIVITY_ID PERSON_ID START_DAT END_DATE ACTIVITY_N
----------- ---------- --------- --------- ----------
3 1 01-JUN-11 04-JUN-11 LEAVE
2 1 06-JUN-11 07-JUN-11 LEAVE
1 1 08-JUN-11 18-JUN-11 LEAVE

I will get one date as input - say sysdate. Now I have to get the activity a person is assigned for that date (i.e. start date <= sysdate and end_date>= sysdate) and walk back through all the data till there is a blank date, and walk forward till there is a blank date, and display all those records.
For example, if user enters '10-JUN-2011' as input, Activityid 1 is the 'current' activity. Then I step back to the previous activity (activity id 2, with end date as 07. I step back again- now the end date of previous activity is 04-June. This implaies a break. So I stop there and stop going back. Similarly, I have to go forward. Ihope the situation is clear. There is another break condition - if previous activity is not the same as current, that implies stop. But that should be easy enough if I can figure out the firts case. Any suggestions?

June 08, 2011 - 10:42 am UTC

are you doing this for a single person or for a set of people or for all people in the table.

A more interesting test case would have more than one person in it to ensure the answer actually works too.

can a person have more than one activity assigned at the same time?

do the start/end times overlap?

this implies a break"

why? and what does that mean? how did you know that was a 'break'.

There is another break condition - if previous activity is not the same as current, that implies stop.

if that implied a stop, then we would have stopped on the first record???? it had activity 2, which is not 1.

Aswers

Jayadevan, June 08, 2011 - 11:43 pm UTC

Thanks for the reply.

1) are you doing this for a single person or for a set of people or for all people in the table.

This can happen for a number of persons. The maximum number of persons is, as of now 6000. I am sure it can go up. So we could select upto 6000 persons. Since we have to maintain the data for these persons for a few years, the number of records is already in millions. We already have a temp table which will store the 'filtered' persons- filtered on various conditions user will choose- all persons from a location, or all persons with some designation.... and that will be used for filrst-level filtering. Now we are iterating through a loop to get the 'break' and as number of persons/activities go up, iterations go up - no very scalable, I feel.

A more interesting test case would have more than one person in it to ensure the answer actually works too.

2)can a person have more than one activity assigned at the same time?
No

3) do the start/end times overlap?
Not to the minute/second level. But end time can be today 10 AM and next start time can be tonight 11 or something like that

this implies a break"

why? and what does that mean? how did you know that was a 'break'.
If a person was not assigned any activity, we do not have to display what he was doing before that day.

There is another break condition - if previous activity is not the same as current, that implies stop.

4)if that implied a stop, then we would have stopped on the first record???? it had activity 2, which is not 1.

I mean the type of activity- which is 'LEAVE' is all the cases here. If the activity was, say , 'TRAINING', yes, we would not have taken the record with Activity id 2 (i.e. take only activity 1. Activity 2's name is different, so we do not take it, and do not traverse further.
I hope it is clear.
The column activity_id is just a column I created as PK. It is the activity_name that I need to check for 'is it different or same'.

June 09, 2011 - 9:56 am UTC

provide me, as requested, a better test case.

Give me an employee that has a date based break.
Give me another on that has an activity based break.
Give me another that has no breaks - all data would be returned.
Give me one that has both a date based break and an activity based break.

Make your test case actually test all of the different cases

SQL Queries

Raghu, June 10, 2011 - 5:08 am UTC

Hi Tom,
I am a Oracle developer and in my work I rarely get requirements to write complex SQL Queries, where I can make use of analytic functions and other features.
So I was wondering if you had any set of questions (On EMP/DEPT tables), which would test our query writing ability.
Or what queries you might ask in an interview.
There are daily PL/SQL quiz hapening, wish we had SQL Quiz too.
Thanks for all the help, I am a big fan of yours.

Activities and breaks

Jayadevan, June 11, 2011 - 1:05 am UTC

Hello Tom,
Apologies for not providing enough sample data...

CREATE TABLE "ACTIVITY" ("ACTIVITY_ID" NUMBER(*,0), "PERSON_ID" NUMBER(*,0), "START_DATE" DATE, "END_DATE" DATE, "ACTIVITY_NAME" VARCHAR2(10))

Insert into ACTIVITY (ACTIVITY_ID,PERSON_ID,START_DATE,END_DATE,ACTIVITY_NAME) values (4,1,to_date('19-JUN-11','DD-MON-RR'),to_date('18-JUN-11','DD-MON-RR'),'TRAINING');
Insert into ACTIVITY (ACTIVITY_ID,PERSON_ID,START_DATE,END_DATE,ACTIVITY_NAME) values (2,1,to_date('05-JUN-11','DD-MON-RR'),to_date('07-JUN-11','DD-MON-RR'),'LEAVE');
Insert into ACTIVITY (ACTIVITY_ID,PERSON_ID,START_DATE,END_DATE,ACTIVITY_NAME) values (3,1,to_date('01-JUN-11','DD-MON-RR'),to_date('04-JUN-11','DD-MON-RR'),'LEAVE');
Insert into ACTIVITY (ACTIVITY_ID,PERSON_ID,START_DATE,END_DATE,ACTIVITY_NAME) values (5,1,to_date('27-MAY-11','DD-MON-RR'),to_date('29-MAY-11','DD-MON-RR'),'LEAVE');
Insert into ACTIVITY (ACTIVITY_ID,PERSON_ID,START_DATE,END_DATE,ACTIVITY_NAME) values (1,1,to_date('08-JUN-11','DD-MON-RR'),to_date('18-JUN-11','DD-MON-RR'),'LEAVE');
Insert into ACTIVITY (ACTIVITY_ID,PERSON_ID,START_DATE,END_DATE,ACTIVITY_NAME) values (14,2,to_date('01-JUN-11','DD-MON-RR'),to_date('02-JUN-11','DD-MON-RR'),'GROUND');
Insert into ACTIVITY (ACTIVITY_ID,PERSON_ID,START_DATE,END_DATE,ACTIVITY_NAME) values (12,2,to_date('03-JUN-11','DD-MON-RR'),to_date('06-JUN-11','DD-MON-RR'),'GROUND');
Insert into ACTIVITY (ACTIVITY_ID,PERSON_ID,START_DATE,END_DATE,ACTIVITY_NAME) values (13,2,to_date('09-JUN-11','DD-MON-RR'),to_date('10-JUN-11','DD-MON-RR'),'GROUND');
Insert into ACTIVITY (ACTIVITY_ID,PERSON_ID,START_DATE,END_DATE,ACTIVITY_NAME) values (15,2,to_date('11-JUN-11','DD-MON-RR'),to_date('12-JUN-11','DD-MON-RR'),'GROUND');
Insert into ACTIVITY (ACTIVITY_ID,PERSON_ID,START_DATE,END_DATE,ACTIVITY_NAME) values (11,2,to_date('07-JUN-11','DD-MON-RR'),to_date('08-JUN-11','DD-MON-RR'),'GROUND');
Insert into ACTIVITY (ACTIVITY_ID,PERSON_ID,START_DATE,END_DATE,ACTIVITY_NAME) values (20,3,to_date('07-JUN-11','DD-MON-RR'),to_date('09-JUN-11','DD-MON-RR'),'LEAVE');
Insert into ACTIVITY (ACTIVITY_ID,PERSON_ID,START_DATE,END_DATE,ACTIVITY_NAME) values (21,3,to_date('13-JUN-11','DD-MON-RR'),to_date('19-JUN-11','DD-MON-RR'),'TRAINING');

select * from activity order by person_id, start_date, end_date ;

          5          1 27-MAY-11 29-MAY-11 LEAVE
          3          1 01-JUN-11 04-JUN-11 LEAVE
          2          1 05-JUN-11 07-JUN-11 LEAVE
          1          1 08-JUN-11 18-JUN-11 LEAVE
          4          1 19-JUN-11 18-JUN-11 TRAINING
         14          2 01-JUN-11 02-JUN-11 GROUND
         12          2 03-JUN-11 06-JUN-11 GROUND
         11          2 07-JUN-11 08-JUN-11 GROUND
         13          2 09-JUN-11 10-JUN-11 GROUND
         15          2 11-JUN-11 12-JUN-11 GROUND
         20          3 07-JUN-11 09-JUN-11 LEAVE
         21          3 13-JUN-11 19-JUN-11 TRAINING

The input filter is '07-JUN-11'.
For pers_id = 1 , I have to go back till activity_id 3. The start_date of the previous activity (id = 5) activity is not the same as the end_date of activity_id 5. So we stop there. This is a date-based break. While going forward, I have to go till activity_id 1. The activity_name for activity_id 4 is 'TRAINING', which is not the same as the ACTIVITY_NAME for person_id for the date '07-JUN-11'. This is a change of activity-based break.
For pers_id 3, there is an activity and date based_break on activity_id 21 and we stop there (actually, if one break condition is met, we do not have to check for the other condition)

For pers_id 2, I have to fetch all the records - going back from '07-JUN-11' and forward from '07-JUN-11', there is no 'break' as far as one activity's end_date and the next activity's start_dates are concerned. The activity names are also same. In the end, the output I have to provide is

PERSON_ID         START_DAT     END_DATE          ACTIVITY_NAME
----------      ---------      ---------          ----------
    1             01-JUN-11     18-JUN-11         LEAVE
    2             01-JUN-11     12-JUN-11         GROUND
    3             07-JUN-11     09-JUN-11         LEAVE

Correction

Jayadevan, June 11, 2011 - 1:14 am UTC

A correction in the sentence
"The start_date of the previous activity (id = 5) activity is not the same as the end_date of activity_id 5. So we stop there. "
It should be
"The end_date of the previous activity (id = 5) is not the same as the start_date of activity_id 3. It is not the previous date as compared to the start_date of activity 3 either So we stop there. ". In effect, there is some gap between the assignments- there is one or more days when the employee is not assigned any activity. That is a break

@Jayadevan: Generic Analytic Solution

Brendan, June 11, 2011 - 8:06 am UTC

The requirement is expressed rather procedurally in terms of going forward and back from a given record, then stopping at the break points. If you want a solution by analytics, I would view it rather as a simple extension of a general class of problems where you want to group records primarily by contiguity of ranges (usually date ranges). I wondered whether it might not be interesting to present the class generically, propose a generic analytic solution, then apply it to your specific case.

I would describe the class for your problem as follows. Consider the fields in a record set to divide into the following categories:

key                     - partition by fields
range_start, range_end  - range fields
attr_brk                - break fields
attr_oth                - any other fields

The problem is then to obtain for each record a group_start, group_end pair that are the range_start and range_end values for the records that respectively start and end the break group of the current record. The records are to be ordered by range_start, range_end within the partitioning key, and a new break group starts when, between successive records, either there is a gap between range_end and range_start fields, or any of the attr_brk fields change value. In this case no overlaps are allowed in the ranges within a key.

It's worth noting that if you drop the break fields overlapping ranges can easily be accommodated, making a related second class that can be solved by similar methods.

I have written a kind of pseudo-SQL solution for both classes, and have also implemented them for your particular table, with my own test sets. I could post these if there is any interest, and TK has no objection. However, I will just state the approach used for now to keep it brief.

1. Within an inline view, use Lag and Lead functions with CASE expressions to set group_start and group_end values on the respective start and end records of the break groups, leaving other values null.
2. Select all the original fields from the inline view, as well as the new fields within First_Value, Last_Value functions with the IGNORE NULLS option
3. The ouptut from step 2 solves the problems as defined, but if necessary, can be used within another inline view to restrict the output to certain groups only (eg a 'current' group)

Correction

Brendan, June 12, 2011 - 12:45 pm UTC

The second class actually needs an additional prior step. I've put an article on Scribd on this.

Query

Jayadevan, June 13, 2011 - 9:09 am UTC

Hello,
If you do have a working query, please do share that. I hope some amount of filtering/elimination will happen in the first query that gets executed, since the table in question has millions of records. I think Tom too would be intersted in seeing that query. I considered a simple lead/lag inner query with filters applied in the outer query. But that would mean a full scan of this huge table, and hence I thought I should seek Tom's help.
Regards,
Jayadevan

Code as requested

Brendan, June 14, 2011 - 2:33 am UTC

Hi Jayadevan

Here is the code:

SELECT /* NO_OVERLAP */
       person_id, start_date, end_date, activity_name, activity_id id,
       Last_Value (group_start IGNORE NULLS) OVER (PARTITION BY person_id ORDER BY start_date) group_start,
       First_Value (group_end IGNORE NULLS) OVER (PARTITION BY person_id ORDER BY start_date RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) group_end
  FROM (
SELECT person_id, start_date, end_date, activity_name, activity_id,
       CASE WHEN (start_date > Nvl (Lag (end_date) OVER (PARTITION BY person_id ORDER BY start_date), start_date-1)) OR 
                 (activity_name != Lag (activity_name) OVER (PARTITION BY person_id ORDER BY start_date)) THEN start_date END group_start,
       CASE WHEN (Nvl (Lead (start_date) OVER (PARTITION BY person_id ORDER BY start_date), end_date+1) > end_date) OR
                 (activity_name != Lead (activity_name) OVER (PARTITION BY person_id ORDER BY start_date)) THEN end_date END  group_end
  FROM activity
 WHERE person_id IN (1, 2)
)
 ORDER BY person_id, start_date

The shape of the data matters as much as the size - is it wide and shallow or deep and narrow for example, in terms of persons and time. I've only tested on 30 records. For wide and shallow, it may be efficient. However, if you have a very deep data set but with only a few records in the 'current' group you might be better using a more procedural approach. Assuming you've indexed start and end times, you might 'anchor' from the current records, then execute indexed selects to pass from one record to the adjacent until you hit a break. You could also use the method mentioned in the recent thread on '3 second...' where a recursive subquery approach is mentioned, you'd have to put it in a union to go both ways I think - I'd have tried that myself but have not had time to do it. Alternatively, within the analytic approach, would it help to do an 'Exists current record' subquery? I don't know your data so I can't say.

re: Activities and breaks

Stew Ashton, June 14, 2011 - 11:13 am UTC

This seems like a job for procedural logic. Analytics let you look forward and backward in the data your are selecting, but they don't let you look backward in the result set you are creating. Brendan used the MODEL clause in another thread and I think it does the trick here. The MODEL clause carries forward the start date as long as there is not a "break", then the GROUP BY clause gives us one record with the latest end date:

> select person_id, start_date, max(end_date) end_date, activity_name from (
  SELECT person_id, start_date, end_date, activity_name
  FROM activity
  MODEL
    PARTITION BY (person_id)
    DIMENSION BY (Row_Number() OVER (PARTITION BY person_id ORDER BY start_date) rn)
    MEASURES (start_date, end_date, activity_name)
    RULES (
      start_date[rn>1] order by rn = 
        case when end_date[cv()-1] >= start_date[cv()]-1
        and activity_name[cv()-1] = activity_name[cv()] 
        then start_date[cv()-1]
        else start_date[cv()] end
  )
) 
where to_date('07-JUN-11', 'DD-MON-RR') between start_date and end_date
group by person_id, start_date, activity_name
order by 1,2,4
/
PERSON_ID  START_DATE   END_DATE   ACTIVITY_NAME
---------- ------------ ---------- -------------
1          01-JUN-11    18-JUN-11  LEAVE
2          01-JUN-11    12-JUN-11  GROUND
3          07-JUN-11    09-JUN-11  LEAVE

Recursive Solution

Brendan, June 14, 2011 - 5:56 pm UTC

Some analytics sneaked in though, basic testing done...

WITH rsq_f (person_id, start_date, end_date, activity_name, activity_id) AS (
SELECT person_id, start_date, end_date, activity_name, activity_id
  FROM activity
 WHERE '&TODAY' BETWEEN start_date AND Nvl(end_date, '&TODAY')
   AND person_id IN (1, 2)
 UNION ALL
SELECT  act.person_id, act.start_date, act.end_date, act.activity_name, act.activity_id 
  FROM rsq_f
  JOIN activity act
    ON act.start_date      = rsq_f.end_date
   AND act.person_id       = rsq_f.person_id
   AND act.activity_name   = rsq_f.activity_name
   AND act.person_id IN (1, 2)
), rsq_b (person_id, start_date, end_date, activity_name, activity_id) AS (
SELECT person_id, start_date, end_date, activity_name, activity_id
  FROM activity
 WHERE '&TODAY' BETWEEN start_date AND Nvl(end_date, '&TODAY')
   AND person_id IN (1, 2)
 UNION ALL
SELECT  act.person_id, act.start_date, act.end_date, act.activity_name, act.activity_id 
  FROM rsq_b
  JOIN activity act
    ON act.end_date         = rsq_b.start_date
   AND act.person_id        = rsq_b.person_id
   AND act.activity_name    = rsq_b.activity_name
)
SELECT /* RSQ '&TODAY' */ person_id, start_date, end_date, activity_name, activity_id,
        Min (start_date) OVER (PARTITION BY person_id) grp_start, Max (end_date) OVER (PARTITION BY person_id) grp_end
  FROM (
SELECT  person_id, start_date, end_date, activity_name, activity_id
  FROM rsq_b
UNION
SELECT person_id, start_date, end_date, activity_name, activity_id
  FROM rsq_f
)
ORDER BY person_id, start_date
/

 PERSON_ID START_DAT END_DATE  ACTIVITY_N ACTIVITY_ID GRP_START GRP_END
---------- --------- --------- ---------- ----------- --------- ---------
         1 08-JUN-11 09-JUN-11 LEAVE                4 08-JUN-11 14-JUN-11
           09-JUN-11 14-JUN-11 LEAVE                5 08-JUN-11 14-JUN-11

         2 09-JUN-11 14-JUN-11 TRAINING            11 09-JUN-11 14-JUN-11

-------------------------------------------------------------------------------------------------------------
| Id  | Operation                                     | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                              |             |       |       |     2 (100)|          |
|   1 |  WINDOW SORT                                  |             |     8 |   408 |     2 (100)| 00:00:01 |
|   2 |   VIEW                                        |             |     8 |   408 |     2 (100)| 00:00:01 |
|   3 |    SORT UNIQUE                                |             |     8 |   408 |     2 (100)| 00:00:01 |
|   4 |     UNION-ALL                                 |             |       |       |            |          |
|   5 |      VIEW                                     |             |     4 |   204 |     0   (0)|          |
|   6 |       UNION ALL (RECURSIVE WITH) BREADTH FIRST|             |       |       |            |          |
|   7 |        INLIST ITERATOR                        |             |       |       |            |          |
|*  8 |         TABLE ACCESS BY INDEX ROWID           | ACTIVITY    |     2 |   102 |     0   (0)|          |
|*  9 |          INDEX RANGE SCAN                     | ACTIVITY_U1 |     1 |       |     0   (0)|          |
|  10 |        NESTED LOOPS                           |             |       |       |            |          |
|  11 |         NESTED LOOPS                          |             |     2 |   160 |     0   (0)|          |
|  12 |          RECURSIVE WITH PUMP                  |             |       |       |            |          |
|* 13 |          INDEX UNIQUE SCAN                    | ACTIVITY_U2 |     1 |       |     0   (0)|          |
|* 14 |         TABLE ACCESS BY INDEX ROWID           | ACTIVITY    |     1 |    51 |     0   (0)|          |
|  15 |      VIEW                                     |             |     4 |   204 |     0   (0)|          |
|  16 |       UNION ALL (RECURSIVE WITH) BREADTH FIRST|             |       |       |            |          |
|  17 |        INLIST ITERATOR                        |             |       |       |            |          |
|* 18 |         TABLE ACCESS BY INDEX ROWID           | ACTIVITY    |     2 |   102 |     0   (0)|          |
|* 19 |          INDEX RANGE SCAN                     | ACTIVITY_U1 |     1 |       |     0   (0)|          |
|  20 |        NESTED LOOPS                           |             |       |       |            |          |
|  21 |         NESTED LOOPS                          |             |     2 |   160 |     0   (0)|          |
|  22 |          RECURSIVE WITH PUMP                  |             |       |       |            |          |
|* 23 |          INDEX UNIQUE SCAN                    | ACTIVITY_U1 |     1 |       |     0   (0)|          |
|* 24 |         TABLE ACCESS BY INDEX ROWID           | ACTIVITY    |     1 |    51 |     0   (0)|          |
-------------------------------------------------------------------------------------------------------------

Recursion Refined

Brendan, June 15, 2011 - 1:57 am UTC

This version might be more efficient, it does half the table accesses...

DEFINE TODAY='11-JUN-2011'
PROMPT Recursive Subquery (OR) for '&TODAY'
WITH   act AS (
SELECT person_id, start_date, end_date, activity_name, activity_id
  FROM activity
 WHERE '&TODAY' BETWEEN start_date AND Nvl(end_date, '&TODAY')
   AND person_id IN (1, 2)
),     rsq (person_id, start_date, end_date, activity_name, activity_id, go_back) AS (
SELECT person_id, start_date, end_date, activity_name, activity_id, go_back
  FROM (
SELECT person_id, start_date, end_date, activity_name, activity_id, 'N' go_back
  FROM act
 UNION ALL
SELECT person_id, start_date, end_date, activity_name, activity_id, 'Y'
  FROM act
)
 UNION ALL
SELECT  act.person_id, act.start_date, act.end_date, act.activity_name, act.activity_id, CASE WHEN act.start_date = rsq.end_date THEN 'N' ELSE 'Y' END
  FROM rsq
  JOIN activity act
    ON ((act.start_date     = rsq.end_date AND go_back = 'N') OR
        (act.end_date        = rsq.start_date AND go_back = 'Y'))
   AND act.person_id       = rsq.person_id
   AND act.activity_name   = rsq.activity_name
   AND act.person_id IN (1, 2)
)
SELECT /* RSQ '&TODAY' */ person_id, start_date, end_date, activity_name, activity_id,
        Min (start_date) OVER (PARTITION BY person_id) grp_start, Max (end_date) OVER (PARTITION BY person_id) grp_end
  FROM (
SELECT  person_id, start_date, end_date, activity_name, activity_id
  FROM rsq
UNION
SELECT person_id, start_date, end_date, activity_name, activity_id
  FROM rsq
)
ORDER BY person_id, start_date
/
------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |                           |       |       |    14 (100)|          |
|   1 |  TEMP TABLE TRANSFORMATION                 |                           |       |       |            |          |
|   2 |   LOAD AS SELECT                           |                           |       |       |            |          |
|   3 |    INLIST ITERATOR                         |                           |       |       |            |          |
|*  4 |     TABLE ACCESS BY INDEX ROWID            | ACTIVITY                  |     2 |   102 |     0   (0)|          |
|*  5 |      INDEX RANGE SCAN                      | ACTIVITY_U1               |     1 |       |     0   (0)|          |
|   6 |   LOAD AS SELECT                           |                           |       |       |            |          |
|   7 |    UNION ALL (RECURSIVE WITH) BREADTH FIRST|                           |       |       |            |          |
|   8 |     VIEW                                   |                           |     4 |   216 |     4   (0)| 00:00:01 |
|   9 |      UNION-ALL                             |                           |       |       |            |          |
|  10 |       VIEW                                 |                           |     2 |   102 |     2   (0)| 00:00:01 |
|  11 |        TABLE ACCESS FULL                   | SYS_TEMP_0FD9D661A_BC16D2 |     2 |   102 |     2   (0)| 00:00:01 |
|  12 |       VIEW                                 |                           |     2 |   102 |     2   (0)| 00:00:01 |
|  13 |        TABLE ACCESS FULL                   | SYS_TEMP_0FD9D661A_BC16D2 |     2 |   102 |     2   (0)| 00:00:01 |
|  14 |     NESTED LOOPS                           |                           |       |       |            |          |
|  15 |      NESTED LOOPS                          |                           |     1 |    92 |     4   (0)| 00:00:01 |
|  16 |       RECURSIVE WITH PUMP                  |                           |       |       |            |          |
|* 17 |       INDEX RANGE SCAN                     | ACTIVITY_U1               |     2 |       |     0   (0)|          |
|* 18 |      TABLE ACCESS BY INDEX ROWID           | ACTIVITY                  |     1 |    51 |     0   (0)|          |
|  19 |   WINDOW SORT                              |                           |    10 |   510 |     6  (34)| 00:00:01 |
|  20 |    VIEW                                    |                           |    10 |   510 |     6  (34)| 00:00:01 |
|  21 |     SORT UNIQUE                            |                           |    10 |   510 |     6  (67)| 00:00:01 |
|  22 |      UNION-ALL                             |                           |       |       |            |          |
|  23 |       VIEW                                 |                           |     5 |   255 |     2   (0)| 00:00:01 |
|  24 |        TABLE ACCESS FULL                   | SYS_TEMP_0FD9D661B_BC16D2 |     5 |   270 |     2   (0)| 00:00:01 |
|  25 |       VIEW                                 |                           |     5 |   255 |     2   (0)| 00:00:01 |
|  26 |        TABLE ACCESS FULL                   | SYS_TEMP_0FD9D661B_BC16D2 |     5 |   270 |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------------------
Output...
Recursive Subquery (OR) for '11-JUN-2011'
old   4:  WHERE '&TODAY' BETWEEN start_date AND Nvl(end_date, '&TODAY')
new   4:  WHERE '11-JUN-2011' BETWEEN start_date AND Nvl(end_date, '11-JUN-2011')
old  25: SELECT /* RSQ '&TODAY' */ person_id, start_date, end_date, activity_name, activity_id,
new  25: SELECT /* RSQ '11-JUN-2011' */ person_id, start_date, end_date, activity_name, activity_id,

 PERSON_ID START_DAT END_DATE  ACTIVITY_N ACTIVITY_ID GRP_START GRP_END                                                 
---------- --------- --------- ---------- ----------- --------- ---------                                               
         1 08-JUN-11 09-JUN-11 LEAVE                4 08-JUN-11 14-JUN-11                                               
           09-JUN-11 14-JUN-11 LEAVE                5 08-JUN-11 14-JUN-11                                               
                                                                                                                        
         2 09-JUN-11 14-JUN-11 TRAINING            11 09-JUN-11 14-JUN-11

Analytical question

John, June 15, 2011 - 7:24 pm UTC

have a table data
col1 col2
1 2
5 6
8 9
2 3
10 12
3 20
6 25
want to display data as
col1 col2
1 2
2 3
3 20
5 6
6 25
8 9
10 12

Thanks in advance

June 17, 2011 - 1:25 pm UTC

if I had such tables, I could show you how to do that.

But I don't, so I cannot.

If you provide me with a create table and a bunch of inserts, I could

no create
no inserts
no look

I was wrong about "Activities and breaks"

Stew Ashton, June 16, 2011 - 8:58 am UTC

I thought analytics couldn't do this, but I was wrong. In this solution,
Q1 adds a "break" column that is set to 1 when a break occurs.
Q2 does a running total on the "break" column; this assigns each set of lines a rank that increments at each break.
Finally, the main select gets the earliest start date and latest end date for each rank.

This solution is based on http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:1594948200346140013#1596748700346101469 By the way, in my tests the MODEL-based solution was somewhat faster :)

with q1 as (
  select person_id, start_date, end_date, activity_name
  , case when nvl(
    lag(activity_name)
     over (partition by person_id order by start_date, activity_name)
      , activity_name
    ) = activity_name 
    and nvl(
      lag(end_date)
       over (partition by person_id order by start_date, activity_name)
      , start_date-1
    ) >= start_date-1  
    then 0 else 1 end as break
  from activity
), q2 as (
  select person_id, start_date, end_date, activity_name
  , sum(break)
   over(partition by person_id order by start_date, activity_name)
   + 1 as rank
  from q1
)
select person_id, min(start_date) start_date, max(end_date) end_date, activity_name
from q2
group by person_id, rank, activity_name
having to_date('07-JUN-11', 'DD-MON-RR') between min(start_date) and max(end_date)
order by 1, 2
/
PERSON_ID  START_DATE           END_DATE             ACTIVITY_NAME
---------- -------------------- -------------------- -------------
1          2011/06/01 00:00:00  2011/06/18 00:00:00  LEAVE
2          2011/06/01 00:00:00  2011/06/12 00:00:00  GROUND
3          2011/06/07 00:00:00  2011/06/09 00:00:00  LEAVE

Analytics

John, June 17, 2011 - 9:56 pm UTC

Please see the following
create table t_hierarchy( col1 number(10), col2 number(10));
insert into t_hierarchy values(1,80);
insert into t_hierarchy values(60,10);
insert into t_hierarchy values(80,60);
insert into t_hierarchy values(65,200);
insert into t_hierarchy values(10,100);
insert into t_hierarchy values(200,55);
insert into t_hierarchy values(101,500);

The data display should as follows
col1 col2
---- ----
1 80
80 60
60 10
10 100
65 200
200 55
101 500

Thanks in advance

June 20, 2011 - 9:44 am UTC

neat, I can do that by:

select 1 oc, t.* from t where col1 = 1 and col2 = 60
UNION ALL
select 2 oc, t.* from t where col1 = 80 and col = 60
UNION ALL
...
order by 1;

there you go - now, if you would care to explain the logic behind your requested output and ALL of the rules governing your data (I can probably see what you are trying to do here) - such as

what describes the parent child relation?
can there be more than one child of a given parent?
if yes, what then?
is there only one root - only one "master parent"?
if not, what then?
can there be loops?
etc etc etc

In other words, write a SPECIFICATION.

I completely fail to understand the questions with:

a) here is my data
b) here is my output

now please write a query. Without providing ANY OTHER INFORMATION.

If your data always conforms to the rules I've made up in my head, then this might work (and it also might be overkill for you - you don't give me ANYTHING to work with)

ops$tkyte%ORA11GR2> select *
  2    from t_hierarchy t1
  3  start with rowid in (select rowid from t_hierarchy t2 where not exists (select null from t_hierarchy t3 where t3.col2 = t2.col1))
  4  connect by prior t1.col2 = t1.col1
  5  /

      COL1       COL2
---------- ----------
         1         80
        80         60
        60         10
        10        100
        65        200
       200         55
       101        500

7 rows selected.

About the t_hierarchy question

Stew Ashton, June 20, 2011 - 12:52 pm UTC

Tom, I see that the "root" values are returned in ascending order. Is this guaranteed, or must one add the "order siblings by" clause?

June 20, 2011 - 1:55 pm UTC

hierarchical queries will return the children of children of children .... in order

but the children in a level will not come out in any order in particular.

You don't need an order siblings by in this case because (in the the example and one of my assumptions) there is ONLY ONE sibling per level - there is nothing to sort at that level

Quick followup

Stew Ashton, June 20, 2011 - 2:08 pm UTC

There are 3 "root" values: 1, 65 and 101. Those are the "siblings" I'm talking about. They happen to come back in ascending order, but can I count on that without an "order siblings by"?

select *
from t_hierarchy t1
start with rowid in (
  select rowid from T_HIERARCHY T2 
  where not exists (select null from T_HIERARCHY T3 where T3.COL2 = T2.COL1)
)
connect by prior T1.COL2 = T1.COL1
order siblings by T1.COL1 desc
COL1                   COL2                   
---------------------- ---------------------- 
101                    500                    
65                     200                    
200                    55                     
1                      80                     
80                     60                     
60                     10                     
10                     100

June 20, 2011 - 3:11 pm UTC

ah, I see.

yes, they would have to be ordered to get them ordered.

else they could come back in any order.

am jealous

venkata, July 26, 2011 - 7:39 am UTC

Tom,

i don't know anything about model, dimension by, measures, partition by clauses. can you give me some pointers where i can learn about these and i stumbled upon the following url which i didn't understand at all please help.

http://www.sqlsnippets.com/en/topic-11663.html

July 28, 2011 - 6:30 pm UTC

http://docs.oracle.com/docs/cd/E11882_01/server.112/e16579/sqlmodel.htm#g1042989

Activity dates

Jayadevan, July 29, 2011 - 5:06 am UTC

Hi Stew,
I got back to working on the activity/date query. The second query you provided -
with q1 as (
select person_id, start_date, end_date, activity_name
, case when nvl(
lag(activity_name)
over (partition by person_id order by start_date, activity_name)
, activity_name
) = activity_name
.......
works great.
Thank you.

help

Venkata, August 05, 2011 - 6:29 am UTC

Hi,

to learn about data modelling i have created a table and inserted data as shown below.

create table xtest(cust_id number(10), sale_val number(20,4), asoflastday date)

insert into xtest select trunc(dbms_random.value*100000,0) c,trunc(dbms_random.value*100000000,4) amt,last_day(add_months(trunc(sysdate,'yyyy'),rownum-1)) last
from all_objects where rownum < 13

now when you do
select * from xtest
you will get data which is different from what i have

what i need is

sales as on feb = (febsale+Marsale)/2

sales as on Mar = (Marsale+Aprsale)/2

hope you are getting what i am trying to say :)

i need your help in solving this using only data modelling let me know if you can use analytics also

August 05, 2011 - 8:23 am UTC

how can you solve a question like that with "data modeling".

Data modeling is the art of coming up with your schema. It doesn't involve actually answering any business questions?

No, I don't know what you are trying to say.

with analytics

Venkata, August 05, 2011 - 3:00 pm UTC

can you please get me a solution with analytics please

August 05, 2011 - 3:05 pm UTC

ops$tkyte%ORA11GR2> select cust_id, sale_val, asoflastday, (sale_val+lead(sale_val) over (order by asoflastday))/2 what_you_asked_for
  2    from xtest
  3   order by asoflastday
  4  /

   CUST_ID   SALE_VAL ASOFLASTD WHAT_YOU_ASKED_FOR
---------- ---------- --------- ------------------
     50333 59710311.4 31-JAN-11         44289384.5
     43525 28868457.6 28-FEB-11         40566135.3
      2827   52263813 31-MAR-11         72036300.6
     44790 91808788.3 30-APR-11         82621668.8
     10819 73434549.3 31-MAY-11         67047508.9
        35 60660468.5 30-JUN-11         31365013.1
     93257 2069557.62 31-JUL-11         17826510.8
     90286   33583464 31-AUG-11         28697602.1
     59528 23811740.1 30-SEP-11         18998234.4
     88679 14184728.7 31-OCT-11         8633988.97
     48826 3083249.24 30-NOV-11         17567744.3
     55996 32052239.3 31-DEC-11

12 rows selected.

seems to be what you asked for - but hard to say since it was pretty darn vague.

little to tweak

Venkata, August 05, 2011 - 3:39 pm UTC

Tom,

first thank you the pretty reply, i will eloberate you in detail. here i was waiting for your reply so that i can explain it a little more

you gave this

CUST_ID SALE_VAL ASOFLASTD WHAT_YOU_ASKED_FOR
---------- ---------- --------- ------------------
50333 59710311.4 31-JAN-11 44289384.5
43525 28868457.6 28-FEB-11 40566135.3
2827 52263813 31-MAR-11 72036300.6
44790 91808788.3 30-APR-11 82621668.8
10819 73434549.3 31-MAY-11 67047508.9
35 60660468.5 30-JUN-11 31365013.1
93257 2069557.62 31-JUL-11 17826510.8
90286 33583464 31-AUG-11 28697602.1
59528 23811740.1 30-SEP-11 18998234.4
88679 14184728.7 31-OCT-11 8633988.97
48826 3083249.24 30-NOV-11 17567744.3
55996 32052239.3 31-DEC-11

i need something like this

CUST_ID SALE_VAL ASOFLASTD WHAT_YOU_ASKED_FOR
---------- ---------- --------- ------------------
50333 59710311.4 31-JAN-11
43525 28868457.6 28-FEB-11
---------------------------------------------
88578769.0/2 <----- sale for feb
------^ here (2) is processing month number
---------------------------------------------
88578769.0 28-feb-11
2827 52263813 31-MAR-11
---------------------------------------------
140842582/3 <------- avg sale for 31-Mar-2011
------^ here 3 is processing month number

the result should be like this for other months also. Will that be possible.

August 06, 2011 - 8:01 am UTC

. here i was waiting for your reply so that i can explain it a little more

why - why OH WHY would you do something like that? Just to waste both of our time? Think about this please.

When you ask a question, you are asking for something from someone - that something is "their time". Their time is valuable to them. Do not waste it by asking incomplete, vague questions (in short - by asking the WRONG question the first N times around). Rather - use some of your precious time to precisely and concisely ask what you really meant to ask the first time - get it right the first time around.

There is nothing, repeat nothing, more frustrating then getting a response "ok, that was cool - but what I *really* meant to ask was....."

I cannot read your stuff - you did not use the code button - therefore, your text is using a proportional font - it is not readable.

Additionally, it makes no sense.

I *think*, I'm GUESSING, that you want a running total - divided by the number of observations - in short, an average.

You do not want it divided by the month number (what if the months didn't run from 1..12 or there where more than 12 or a month was missing etc...). You want an average plain and simple.

Now, since you used random numbers.... I cannot reproduce your example exactly - but here:

ops$tkyte%ORA11GR2> select cust_id, sale_val, asoflastday, avg(sale_val) over (order by asoflastday) what_you_PROBABLY_asked_for
  2    from xtest
  3   order by asoflastday
  4  /

   CUST_ID   SALE_VAL ASOFLASTD WHAT_YOU_PROBABLY_ASKED_FOR
---------- ---------- --------- ---------------------------
     44944 58494945.3 31-JAN-11                  58494945.3
     72912 47251551.8 28-FEB-11                  52873248.6
     17580 99228681.8 31-MAR-11                  68325059.6
     56824 76260729.8 30-APR-11                  70308977.2
     24469 11107307.6 31-MAY-11                  58468643.3
     20531 10186073.3 30-JUN-11                  50421548.3
     14909 42877408.5 31-JUL-11                    49343814
     11919 32108787.5 31-AUG-11                  47189435.7
     79820 4212567.56 30-SEP-11                  42414228.1
     28397 53316727.6 31-OCT-11                  43504478.1
     27438 96206453.3 30-NOV-11                  48295566.7
     22665 65783689.2 31-DEC-11                  49752910.3

12 rows selected.

Analytics

A reader, August 08, 2011 - 2:37 pm UTC

Hi Tom,

I have a question.

create table demo_1(col1 number, col2 varchar2(10));

insert into demo_1 values(1,'P');
insert into demo_1 values(2,'P');
insert into demo_1 values(3,'P');
insert into demo_1 values(4,'P');
insert into demo_1 values(5,'P');

create table demo(col1 number, col2 number, col3 varchar2(10));

insert into demo values(1,1,'A');
insert into demo values(2,1,'A');
insert into demo values(3,1,'A');
insert into demo values(4,2,'C');
insert into demo values(5,2,'B');
insert into demo values(6,3,'B');
insert into demo values(7,3,'B');
insert into demo values(8,4,'B');
insert into demo values(9,4,'C');
insert into demo values(10,5,'X');

We have to join the two tables demo and demo_1 where demo_1.col1 = demo.col2

The query should select the items based on follwoing criteria:

Select the value If there are multiple col2 values, they must exist in all the rows.
Select the value If there are only one col2 value.

The result will be

1 A
3 B
5 X

The col2=2 and 4 will be discarded as it has multiple col3 values in demo table.

Can we achieve this using Analytic function?

August 13, 2011 - 3:36 pm UTC

ops$tkyte%ORA11GR2> select t1.col1, count( distinct t2.col3), max(t2.col3)
  2    from demo_1 t1, demo t2
  3   where t1.col1 = t2.col2
  4   group by t1.col1
  5  having count(distinct t2.col3) = 1
  6  /

      COL1 COUNT(DISTINCTT2.COL3) MAX(T2.COL
---------- ---------------------- ----------
         1                      1 A
         5                      1 X
         3                      1 B

no analytics this time, just old fashioned aggregates.

consider using oracle forum

chris227, August 11, 2011 - 8:57 am UTC

Im my opinion oracle forums are a good place for those kind of questions.
Anyway, you may try this:

with a as (
select 1 a,'P' b from dual
union all
select 2,'P' from dual
union all
select 3,'P' from dual
union all
select 4,'P' from dual
union all
select 5,'P' from dual
),
b as(
select 1 a,1 b,'A' c from dual
union all
select 2,1,'A' from dual
union all
select 3,1,'A' from dual
union all
select 4,2,'C' from dual
union all
select 5,2,'B' from dual
union all
select 6,3,'B' from dual
union all
select 7,3,'B' from dual
union all
select 8,4,'B' from dual
union all
select 9,4,'C' from dual
union all
select 10,5,'X' from dual
)

select distinct * from (
select b.b,b.c, count(distinct b.c) over (partition by b.b) num from a,b
where
a.a=b.b)
where num = 1

pre-Analytics

A reader, August 11, 2011 - 4:35 pm UTC

I've not been "fortunate" to use Oracle pre-Analytics, just wonder what you could do WITHOUT analytics for this;

select *
from   (select t1.*,
               row_number() over (partition by t1.val order by t2.priority desc) rn
        from   t1, t2
        where  t2.id = t1.t2_id
       )
where  rn = 1

The above selects all rows from t1 (with a FK to t2) and returns the t1 row with the highest priority.

What would be a good equivalent without Analytics?

Just been having a chat about the good old days with colleagues and we agree Analytics "rock", but this would make work more challenging :)

answer to pre-analytics

chris227, August 12, 2011 - 3:40 am UTC

consider subqueries.

slightly modified example from question before:

with t2 as (
select 1 id,5 priority from dual
union all
select 2,1 from dual
union all
select 3,2 from dual
union all
select 4,3 from dual
union all
select 5,4 from dual
),
t1 as(
select 1 val,1 t2_id,'A' c from dual
union all
select 2,1,'A' from dual
union all
select 3,1,'A' from dual
union all
select 4,2,'C' from dual
union all
select 5,2,'B' from dual
union all
select 6,3,'B' from dual
union all
select 7,3,'B' from dual
union all
select 8,4,'B' from dual
union all
select 9,4,'C' from dual
union all
select 10,5,'X' from dual
),
j1 as (
select *
        from   t1, t2
        where  t2.id = t1.t2_id
)

select val, t2_id, c
from   j1
where
priority >= (
    select max(priority) from j1 j2
    where
    j1.val = j2.val
    )


       VAL      T2_ID C
---------- ---------- -
         1          1 A
         2          1 A
         3          1 A
         4          2 C
         5          2 B
         6          3 B
         7          3 B
         8          4 B
         9          4 C
        10          5 X

10 rows selected.

Same with but now old school

select val, t2_id, c
from   (select val, t2_id, c,
               row_number() over (partition by j1.val order by j1.priority desc) rn
        from j1
       )
where  rn = 1


       VAL      T2_ID C
---------- ---------- -
         1          1 A
         2          1 A
         3          1 A
         4          2 C
         5          2 B
         6          3 B
         7          3 B
         8          4 B
         9          4 C
        10          5 X

10 rows selected.

answer to pre-analytics 2

chris227, August 12, 2011 - 3:52 am UTC

sorry, slip of the pen, statements should be swapped.

Notice that the window of the analytic corrsponds to the predicate of the where clause of the subquery

One question

A reader, August 18, 2011 - 3:16 pm UTC

How to select the rows which are equal to number of number of master table.

Lets say we have three different flags in rows for a employee id.

In my second table I have object_id assigned to that flag. I will only select those flags which are equal to number of rows in first table.

table_1:

emp_id flag
1      XX
1      YY
1      ZZ
2      PP
2      XX
3      RR

table_2

emp_id   object_id
1        10
1        10
1        10
1        20
1        20
1        45
1        45
1        45
2        50
2        50
2        45
3        25
3        45

So mz expected output should be
emp_1 : 10, 45(since these occur 3 times equal to rows in table_1) We discard 20 as it occurs only 2 times

emp_2 : 50(since these occur 2 times equal to rows in table_1) We discard 45 as it occurs only 1 times

emp_3 : 25, 45(since these occur 1 times equal to rows in table_1)

Could you please suggest a solution.

Oracle 10G R1 .. Estimated rows in table_1 : 1 million
table_2 : 500 MB

August 30, 2011 - 12:54 pm UTC

no creates
no inserts
no look

if you want a query, provide create tables and inserts.

21 days sql question

Muhammet, August 26, 2011 - 4:41 pm UTC

Hi,
Think a table with patients and inspect date.
The business rule says that there can be no more than one inspection within 21 days. But table has dirty data.
They want a report with patient and inspection date with cleaning data. They say for every 21 day starting
with inspection date,take the first one in the report.

Sample
Patient Inspection_Date
------ --------------
Ahmet 01/01/2011 ==> DD/MM/YYYY
Ahmet 15/01/2011
Ahmet 01/02/2011
Ahmet 15/02/2011
Ahmet 01/06/2011
Ahmet 15/06/2011
Ahmet 01/07/2011

So the report will be

Ahmet 01/01/2011
Ahmet 02/02/2011
Ahmet 01/06/2011
Ahmet 01/07/2011

create table T (patient varchar2(20),inspection_date date);
insert into T values('Ahmet',to_date('01/01/2011','DD/MM/YYYY');
insert into T values('Ahmet',to_date('15/01/2011','DD/MM/YYYY');
insert into T values('Ahmet',to_date('01/02/2011','DD/MM/YYYY');
insert into T values('Ahmet',to_date('15/02/2011','DD/MM/YYYY');
insert into T values('Ahmet',to_date('01/06/2011','DD/MM/YYYY');
insert into T values('Ahmet',to_date('15/06/2011','DD/MM/YYYY');
insert into T values('Ahmet',to_date('01/07/2011','DD/MM/YYYY');

Thanks

re: 21 days sql question

Stew Ashton, August 29, 2011 - 11:41 am UTC

Hello Muhammet,

This solution may not be to everyone's taste, but the only alternative I know is worse:

SELECT distinct patient, keep_date
FROM t
MODEL
  PARTITION BY (patient)
  DIMENSION BY (Row_Number() OVER (PARTITION BY patient ORDER BY inspection_date) rn)
  MEASURES (inspection_date, inspection_date keep_date)
  RULES (
    keep_date[rn>1] ORDER BY rn = 
      CASE WHEN inspection_date[cv()] < keep_date[cv()-1]+21
      then keep_date[cv()-1]
      ELSE inspection_date[cv()] END
)
order by 1, 2;

PATIENT              KEEP_DATE                 
-------------------- ------------------------- 
Ahmet                01-JAN-11 00.00.00        
Ahmet                01-FEB-11 00.00.00        
Ahmet                01-JUN-11 00.00.00        
Ahmet                01-JUL-11 00.00.00

Re: One Question: Tables for Analytics question

A reader, August 30, 2011 - 4:29 pm UTC

Hi Tom,

Please find the create and inserts:

create table table_1(emp_id number, flag varchar2(10));

insert into table_1 values(1,'XX');
insert into table_1 values(1,'YY');
insert into table_1 values(1,'ZZ');
insert into table_1 values(2,'PP');
insert into table_1 values(2,'XX');
insert into table_1 values(3,'RR');

create table table_2(emp_id number, object_id varchar2(10));

insert into table_2 values(1,'10');
insert into table_2 values(1,'10');
insert into table_2 values(1,'10');
insert into table_2 values(1,'20');
insert into table_2 values(1,'20');
insert into table_2 values(1,'45');
insert into table_2 values(1,'45');
insert into table_2 values(1,'45');
insert into table_2 values(2,'50');
insert into table_2 values(2,'50');
insert into table_2 values(2,'45');
insert into table_2 values(3,'25');
insert into table_2 values(3,'45');

Thanks for the time.

To A Reader

Michel Cadot, August 31, 2011 - 2:21 am UTC

You have no need of analytic functions for this:

SQL> select b.emp_id, b.object_id
  2  from (select emp_id, count(*) nb from table_1 group by emp_id) a,
  3       (select emp_id, object_id, count(*) nb from table_2 group by emp_id, object_id) b
  4  where a.emp_id = b.emp_id and a.nb = b.nb
  5  order by 1, 2
  6  /
    EMP_ID OBJECT_ID
---------- ----------
         1 10
         1 45
         2 50
         3 25
         3 45

5 rows selected.

Regards
Michel

To Muhammet

Michel Cadot, August 31, 2011 - 6:54 am UTC

With one more row in your table, here's a query with no model clause:

SQL> select patient, inspection_date
  2  from t
  3  order by 1, 2
  4  /
PATIENT              INSPECTION_
-------------------- -----------
Ahmet                01-JAN-2011
Ahmet                15-JAN-2011
Ahmet                01-FEB-2011
Ahmet                10-FEB-2011
Ahmet                15-FEB-2011
Ahmet                01-JUN-2011
Ahmet                15-JUN-2011
Ahmet                01-JUL-2011

8 rows selected.

SQL> with
  2    data as (
  3      select patient, inspection_date,
  4             min(inspection_date)
  5               over (partition by patient order by inspection_date
  6                     range between 22 following and unbounded following)
  7              next_21
  8      from t a
  9      ),
 10    compute (patient, inspection_date, next_21) as (
 11      select patient, min(inspection_date), min(next_21)
 12      from data
 13      group by patient
 14      union all
 15      select data.patient, data.inspection_date, data.next_21
 16      from data, compute c
 17      where data.patient = c.patient
 18        and data.inspection_date = c.next_21
 19    )
 20  select patient, inspection_date
 21  from compute
 22  order by 1, 2
 23  /
PATIENT              INSPECTION_
-------------------- -----------
Ahmet                01-JAN-2011
Ahmet                01-FEB-2011
Ahmet                01-JUN-2011
Ahmet                01-JUL-2011

4 rows selected.

Regards
Michel

Analytic Question

A reader, August 31, 2011 - 11:04 am UTC

Is it possible to re write something like this?

SELECT
CASE
WHEN a.status IN (1,2,3,5)
AND a.pm_id = b.pm_id
AND c.ac_id =1
AND b.ho_id=1
THEN MIN(a.col2) keep (dense_rank FIRST
ORDER BY (
CASE
WHEN a.status IN (1,2,3,5)
AND a.pm_id = b.pm_id
AND c.ac_id =1
AND b.ho_id=1
THEN 0
ELSE 1
END), a.col2) over (partition BY a.col1)
ELSE NULL
END pp1_account_id,
CASE
WHEN a.status IN (1,2,3,5)
AND a.pm_id = b.pm_id
AND c.ac_id =1
AND b.ho_id=1
THEN MIN(b.col3) keep (dense_rank FIRST
ORDER BY (
CASE
WHEN a.status IN (1,2,3,5)
AND a.pm_id = b.pm_id
AND c.ac_id =1
AND b.ho_id=1
THEN 0
ELSE 1
END), b.col3) over (partition BY a.col1)
ELSE NULL
FROM table_a a,
table_b b,
table_c c
WHERE a.col1 = b.col1(+)
AND b.col1 = c.col1
AND c.col1 = a.col1

August 31, 2011 - 2:11 pm UTC

maybe - I don't have the specification for the question being asked, I don't have the schema and constraints, I don't know of the query itself as stated is correct.

just for readability I'd be using an inline view at the very least - so I didn't have to keep reproducing

a.status IN (1,2,3,5)
AND a.pm_id = b.pm_id
AND c.ac_id =1
AND b.ho_id=1

over and over again

@Michel re:Muhammet's question

Stew Ashton, August 31, 2011 - 2:10 pm UTC

Hi Michel,

Yes, that was the alternative I was thinking of, although I used the old-fashioned hierarchical query syntax. Amusing that we probably live near each other but meet only on a server in Austin :)

Restriction on analytic function result on fly

Sonal, September 01, 2011 - 1:44 pm UTC

Hi Tom,

Is it possible to restric query result based on cummulative data while processing.
i.e. table has following data

empno sal
10 100
11 200
12 300
13 400

by using analytic function, we can get

empno sal cummulative sal
10 100 100
11 200 300
12 300 600
13 400 1000

select empno, sal, sum(sal) over (order by empno) cummulative_sal from emp;

now if i want that give employee where cummulative salary first reach 500. so I will do

select * from (
select a.*, rank() over (order by cummulative_sal) rnk from (
select empno, sal, sum(sal) over (order by empno) cummulative_sal from emp) a
where cummulative_sal >= 500
) where rnk = 1

If I want this in dept wise or some other grouping then I will add partition clause. So here I have used 2 times from clause and get the result. I am wondering, if any function which tells me to stop processing more rows when my cummulative reach to that limit. if inside query is returning 100K rows and then if i am only interested in first 3 then why should I process another rows. I think performance wise there is no benefits but just to know is there any way for doing this?

Regards
Sonal

September 01, 2011 - 2:36 pm UTC

analytics is not able to do this (yet), no

A reader, September 03, 2011 - 6:12 pm UTC

If you are on 11gR2 you can do with recursive with clause as below I assume you want to reset the total when >=500 to 0

with t(deptno, empno, sal) as
(
select 1, 10, 100 from dual union all
select 1, 11, 200 from dual union all
select 1, 12, 300 from dual union all
select 1, 13, 400 from dual

),
data as
(
select t.*,row_number()over(partition by deptno order by empno) as rno,
sum(sal)over(partition by deptno order by empno) as Run_Tot
from t

)
,
rec (rno, deptno, empno, sal, Run_Tot, Total, flg) as
(
select rno, deptno, empno, sal, Run_Tot, sal, case when sal >=500 Then 1 ELSE 0 END
from data
where rno=1

union all

select d.rno, d.deptno, d.empno, d.sal, d.Run_Tot, case when r.Total + d.sal >=500 Then d.sal ELSE r.Total + d.sal END,
case when r.Total + d.sal >=500 Then 1 ELSE 0 END

from rec r, data d
where r.rno+1=d.rno and r.deptno=d.deptno

)
select * from rec
where flg=1

hth...

Thanks

@Sonal on "Restriction on analytic function result on fly"

Stew Ashton, September 04, 2011 - 4:01 am UTC

Hi Sonal,

You can remove a FROM in your query if you say

SELECT empno, sal, cum_sal FROM (
  SELECT empno, sal,
  sum(sal) OVER (ORDER BY empno) cum_sal,
  nvl(sum(sal) OVER (
    ORDER BY empno ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
  ),0) prev_cum_sal
  FROM emp
) WHERE cum_sal >=500 AND prev_cum_sal < 500;

You could get down to one FROM with the MODEL clause

SELECT empno, sal, cum_sal FROM emp
MODEL
RETURN UPDATED ROWS
DIMENSION BY (
  sum(sal) OVER (ORDER BY empno) cum_sal,
  nvl(sum(sal) OVER (
    ORDER BY empno ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
  ),0) prev_cum_sal
)
MEASURES (empno, sal)
RULES (
  empno[cum_sal>=500, prev_cum_sal<500] = empno[cv(),cv()]
);

Neither would do what you are looking for, but the execution plans have fewer steps. (The MODEL clause is just for fun.)

21 days sql question

CezarN, November 01, 2011 - 7:17 am UTC

I have modified Michel Cadot's solution in order not to use recursive CTE which work on 11g R2 and above (waiting for this "above" :D):

select patient, inspection_date from
(
select patient, inspection_date,
min(inspection_date)
over (partition by patient order by inspection_date
range between 22 following and unbounded following)
next_21,
row_number() over (partition by patient order by inspection_date) as rn
from t a
)
connect by prior next_21 = inspection_date
start with rn = 1
;

PATIENT INSPECTION_DATE
-------------------- ---------------
Ahmet 01/01/2011
Ahmet 01/02/2011
Ahmet 01/06/2011
Ahmet 01/07/2011

Regards,
Cezar

Counting partners who appeared together with each other

Ramis, November 27, 2011 - 8:58 am UTC

Hi Tom

I am posting here becuase I think it can perhaps be solved by analytics.

I have a table t

create table T
(player varchar2(40),
for_team varchar2(10),
Vs_team varchar2(10),
matchid number,
points number)
/


INSERT INTO T VALUES ('John','A','B',1,2);
INSERT INTO T VALUES ('Fulton','A','B',1,10);
INSERT INTO T VALUES ('Sarah','A','B',1,9);
INSERT INTO T VALUES ('Peter','B','A',1,7);
INSERT INTO T VALUES ('Carlos','B','A',1,9);
INSERT INTO T VALUES ('Jose','B','A',1,6);
INSERT INTO T VALUES ('Joe','A','B',2,8);
INSERT INTO T VALUES ('Peter','A','B',2,9);
INSERT INTO T VALUES ('Carlos','A','B',2,1);
INSERT INTO T VALUES ('Rubben','B','A',2,10);
INSERT INTO T VALUES ('John','B','A',2,0);
INSERT INTO T VALUES ('Fulton','B','A',2,1);
INSERT INTO T VALUES ('Marcelo','A','B',3,7);
INSERT INTO T VALUES ('Daniela','A','B',3,1);
INSERT INTO T VALUES ('John','A','B',3,2);
INSERT INTO T VALUES ('Jose','B','A',3,5);
INSERT INTO T VALUES ('Abrao','B','A',3,3);
INSERT INTO T VALUES ('Carlos','B','A',3,10);


Select * from t
order by matchid, for_team
/

Player For_team vs_team  matchid   Points
John       A       B       1       2
Fulton     A       B       1       10
Sarah      A       B       1       9
Carlos     B       A       1       9
Peter      B       A       1       7
Jose       B       A       1       6
Peter      A       B       2       9
Carlos     A       B       2       1
Joe        A       B       2       8
Fulton     B       A       2       1
John       B       A       2       0
Rubben     B       A       2       10
Daniela    A       B       3       1
John       A       B       3       2
Marcelo    A       B       3       7
Jose       B       A       3       5
Abrao      B       A       3       3
Carlos     B       A       3       10

Note: Each player can appear for more than one team in different matches. For exmaple, John appeared for Team A in matchid = 1 and then in matchid = 2 he appeared for team B. So same could for other players

Requirment:
I want for each player, the sum of total number of matches and points (which is easy using SUM) but along with total number of different teammates he played with in all the matches he appeared [for his team(s)] and also the total number opposition players and lastly the total number of different players he played with in all the matches he appeared in.

Just to clarify some terms, incase any doubt:
Here teammates = all players who appeared (for_team) in a match from the same team for whom the respective player also appeared in the same match.

opposition players = all players who appeared for VS_team played by each player.
total different players = all unique players who appeared for for_team or vs_team in the matches in which each player appeared.

Here is my desired outout:
Player    total Matches    Sum(points)   Different teammates    Different           total 
                                         player played with     opposition players  different                                                                                           players
John      3                4             5                      5                   10
Fulton    2                11            3                      4                   6  
Sarah     1                9             2                      3                   5
Peter     2                16            3                      4                   7
Carlos    3                20            4                      6                   10
Jose      2                11            3                      5                   8
Joe       1                8             2                      3                   5
Rubben    1                10            2                      3                   5
Marcelo   1                7             2                      3                   5
Daniela   1                1             2                      3                   5
Abrao     1                3             2                      3                   5

I want one simple query and shortest query to achieve about output since in my actual table data is huge.

thanks in advance.

regards
Asim

November 29, 2011 - 7:09 am UTC

I don't think analytics are appropriate, but an aggregation would be:

ops$tkyte%ORA11GR2> select t1_player,
  2         count(case when t1_player = t2_player then matchid end) "total matches",
  3         sum(case when t1_player = t2_player then points end) "total points",
  4         count(distinct case when t1_for_team = t2_for_team and t1_player <> t2_player then t2_player end) "diff teammates",
  5         count(distinct case when t1_for_team <>t2_for_team then t2_player end) "diff opp"
  6    from (
  7  select t1.player t1_player, t1.for_team t1_for_team, t2.player t2_player, t2.for_team t2_for_team, t1.matchid, t1.points
  8    from t t1, t t2
  9   where t1.matchid = t2.matchid
 10         )
 11   group by t1_player
 12   order by t1_player
 13  /

T1_PLAYER  total matches total points diff teammates   diff opp
---------- ------------- ------------ -------------- ----------
Abrao                  1            3              2          3
Carlos                 3           20              4          6
Daniela                1            1              2          3
Fulton                 2           11              3          4
Joe                    1            8              2          3
John                   3            4              5          5
Jose                   2           11              3          5
Marcelo                1            7              2          3
Peter                  2           16              3          4
Rubben                 1           10              2          3
Sarah                  1            9              2          3

11 rows selected.

we cartesian product each matchid group - that'll join every play with every other player - including themselves - by matchid.

then - if we count the rows where the players are the same - that is the number of matches that player played in.

if we sum the points where the players are the same - that is the sum of their points.

if we count the number of distinct player names where the teams are the same (but the player name is different) that is the total number of teammates.

if we count the number of distinct player names where the teams are different - that is the total number of opposing players...

Now, if this table is "HUGE", remember that this will make it temporarily 16 times "more huge".

I put airquotes on "HUGE" because some people thing a few million rows is "HUGE" - but it really isn't.

You do not want a single index to be used - you want two full scans and a nice big hash join.

thanks alot

Ramis, December 01, 2011 - 8:34 am UTC

Dear Tom

This is really so amazing. Thank you so much. I didn't expect that it will be so simple. On my actual table its takes just 11 seconds to give me output for 2600+ players and 3000+ matches. I appreciate your time and help. thanks again.

regards
Ramis

Requirement to capture boundry values

Rajeshwaran, Jeyabal, December 07, 2011 - 8:06 pm UTC

Tom:

I have a requirement to calculate the below values ( where the table 'T' in application is external table to read from text files)

1) ST_START_ID should be calculated as value of x when y = 'ST'
2) ST_END_ID should be calculated as value of x-1 when y ='CLM'
3) CLM_START_ID should be calculated as value of x when y ='CLM'
4) CLM_END_ID should be calculated as value of x-1 when y ='SE' or y ='CLM'

drop table t purge;
create table t(x number,y varchar2(10));
insert into t values(1,'ISA');
insert into t values(2,'ST');
insert into t values(3,'D1');
insert into t values(4,'D2');
insert into t values(5,'CLM1');
insert into t values(6,'X1');
insert into t values(7,'X2');
insert into t values(8,'X3');
insert into t values(9,'SE');
insert into t values(10,'ST');
insert into t values(11,'D1');
insert into t values(12,'D2');
insert into t values(13,'D3');
insert into t values(14,'CLM1');
insert into t values(15,'X1');
insert into t values(16,'CLM2');
insert into t values(17,'X1');
insert into t values(18,'X2');
insert into t values(19,'CLM3');
insert into t values(20,'X1');
insert into t values(21,'SE');
insert into t values(22,'IEA');
commit;

I need the output like this

ST_START_ID  ST_END_ID  CLM_START_ID CLM_END_ID
2    4   5    8
10    13   14    15 
10    13   16    18
10    13   19    20

December 08, 2011 - 12:37 pm UTC

umm, this doesn't make sense at all.

the pivoting is unexplained and doesn't make sense.

You wrote:

1) ST_START_ID should be calculated as value of x when y = 'ST'
2) ST_END_ID should be calculated as value of x-1 when y ='CLM'
3) CLM_START_ID should be calculated as value of x when y ='CLM'
4) CLM_END_ID should be calculated as value of x-1 when y ='SE' or y ='CLM'

I see nothing where y = 'CLM'

going further on, assuming you meant "y like'CLM%'", how did you know to "go on" when you hit CLM1 with x=5. you used it twice for st_end_id and clm_start_id, why didn't you use it a third time for clm_end_id

also, why did you use it twice, if that is always going to be true, you don't really need the st_end_id do you - it is rather obvious what it is - you don't need to compute it in the result set.

and what happens if in between an 'ST' record and the next 'ST' record - there are no clm records.

and you best explain those two last lines of output - they make no sense to me.

Ask yourself - honestly and truly ask yourself - have you given enough information in the form of a specification here that anyone could write code from it without ambiguity. I could code up something that gets your result set - that is easy enough, but I have no idea if it is what you really meant. I won't take my time until you give us a lot more detail.

Requirement to capture boundry values

Rajeshwaran, Jeyabal, December 07, 2011 - 8:08 pm UTC

Tom, Sorry to miss this in previous post, that the size of this file mapped to External table 'T' will be around 54MB to 80MB.

December 08, 2011 - 12:37 pm UTC

so it is very small, ok, post more details from above please...

Requirement to capture boundry values

Rajeshwaran, Jeyabal, December 08, 2011 - 1:07 pm UTC

Tom:

Here is the more details

1) With in ST and SE ( that is where x between 2 and 9) there will be multiple CLM% segments. (atleast there will be one CLM available between ST and SE)
2) I need to output the boundry for each CLM segments with ST and SE combinations.

a) So i have one CLM% between x >=2 and x<=9 so I need the one output like this.

 ST_START_ID  ST_END_ID  CLM_START_ID CLM_END_ID
 2             4            5             8

b) Now i have theree CLM% between x >=10 and x<=21 so I need the three output like this one for each CLM% values.

 
 ST_START_ID  ST_END_ID  CLM_START_ID CLM_END_ID
 10             13            14             15 
 10             13            16             18
 10             13            19             20

December 09, 2011 - 4:00 pm UTC

this is as close as I can get you - YOU will take it from here.

ops$tkyte%ORA11GR2> select *
  2    from (
  3  select x, y, grp,
  4             max(st_x) over (partition by grp) new_st_x,
  5             max(se_x) over (partition by grp) new_se_x
  6    from (
  7  select x, y,
  8         count( case when y = 'ST' then 'ST' end ) over (order by x) grp,
  9             case when y = 'ST' then x end st_x,
 10             case when y = 'SE' then x end se_x
 11    from t
 12         )
 13             )
 14   where y like 'CLM_'
 15   order by x
 16  /

         X Y                 GRP   NEW_ST_X   NEW_SE_X
---------- ---------- ---------- ---------- ----------
         5 CLM1                1          2          9
        14 CLM1                2         10         21
        16 CLM2                2         10         21
        19 CLM3                2         10         21

your description leaves something to be - well, I'll leave it at that.

Anyway - what you have there is every CLM_ record "joined" with its prior ST record and post SE record. That should be everything you need.

If you need the prior CLM_ record "joined" to each existing CLM_ record - please use LAG()

Requirement to capture boundry values

Rajeshwaran, Jeyabal, December 10, 2011 - 2:41 pm UTC

Thank Tom.

I think i got it.

rajesh@ORA11GR2> select st_x as st_start_id,
  2        st_end_id - 1 as st_end_id,
  3        min(x) as clm_start_id,
  4        max(x) as clm_end_id
  5  from (
  6  select x,y,st_x,st_grp,last_value(clm_grp ignore nulls) over(order by x) as clm_grp,
  7         min(case when y like 'CLM%' then x end) over(partition by st_grp order by x) as st_end_id
  8  from (
  9  select x,y,
 10       case when y like 'CLM%' then row_number() over(order by x)
 11       when y like 'SE' then 1 end as clm_grp,
 12       last_value(case when y ='ST' then x end ignore nulls) over(order by x) as st_x,
 13      count(case when y='ST' then 'ST' end) over(order by x) as st_grp
 14  from t
 15      )
 16  )
 17  where clm_grp > 1
 18  group by st_x,st_end_id,clm_grp
 19  order by clm_grp
 20  /

ST_START_ID  ST_END_ID CLM_START_ID CLM_END_ID
----------- ---------- ------------ ----------
          2          4            5          8
         10         13           14         15
         10         13           16         18
         10         13           19         20

Elapsed: 00:00:00.11
rajesh@ORA11GR2>
rajesh@ORA11GR2>

Alternate approach

Matt McPeak, December 14, 2011 - 2:56 pm UTC

Another way to do it:

select * from
(
select t.*, max(case when y = 'ST' THEN x ELSE null end) over ( partition by null order by x range between unbounded preceding and current row ) st_start_id,
max(case when y = 'CLM1' THEN x-1 ELSE null end) over ( partition by null order by x range between unbounded preceding and current row ) st_end_id,
max(case when y like 'CLM%' THEN x ELSE null end) over ( partition by null order by x range between unbounded preceding and current row ) clm_start_id,
min(case when y like 'CLM%' or y IN ('SE','IEA') THEN x-1 ELSE null end) over ( partition by null order by x range between 1 following and unbounded following) clm_end_id
from t
)
where y like 'CLM%'

Analytic function

A reader, January 13, 2012 - 9:55 pm UTC

Hello Tom,

I have a table TEST which contain one column id.
And it has values from 99 to 1 in desc manner.

ID
99
98
..
..
1

Please help me getting a output as below

id1 id2 id3 id4 id5 id6 id7
99 98 97 96 95 94 93
86 87 88 89 90 91 92
85 84 83 .. ..

Thanks.
lalu.

January 17, 2012 - 11:23 am UTC

And it has values from 99 to 1 in desc manner.

No, I'm sorry, it does not.

I might well have 1 to 99 in it, but they are not in any "order". there is no order to any set of rows unless and until you add an order by.

your output doesn't make sense to me.

you go 99 down to 93
then from 86 up to 92
then from 85 down to ....

do you want to go up, or down, or did you really mean to interleave it

ANY WHY DOESN'T ANYONE SEEM TO SAY STUFF LIKE THIS ANYMORE. where are the specifications, where is the explanation of what you really mean? pictograms like this don't tell the whole story.

pick the one you want

ops$tkyte%ORA11GR2> with data(r)
  2  as
  3  (select 1 r from dual
  4   union all
  5   select r+1 from data where r < 99
  6  )
  7  select max(decode(m,6,r)) id1,
  8         max(decode(m,5,r)) id2,
  9         max(decode(m,4,r)) id3,
 10         max(decode(m,3,r)) id4,
 11         max(decode(m,2,r)) id5,
 12         max(decode(m,1,r)) id6,
 13         max(decode(m,0,r)) id7
 14    from (
 15  select r, trunc((r+7-1.99)/7) d, mod(r-2+7, 7) m
 16    from data
 17         )
 18   group by d
 19   order by d desc;

       ID1        ID2        ID3        ID4        ID5        ID6        ID7
---------- ---------- ---------- ---------- ---------- ---------- ----------
        99         98         97         96         95         94         93
        92         91         90         89         88         87         86
        85         84         83         82         81         80         79
        78         77         76         75         74         73         72
        71         70         69         68         67         66         65
        64         63         62         61         60         59         58
        57         56         55         54         53         52         51
        50         49         48         47         46         45         44
        43         42         41         40         39         38         37
        36         35         34         33         32         31         30
        29         28         27         26         25         24         23
        22         21         20         19         18         17         16
        15         14         13         12         11         10          9
         8          7          6          5          4          3          2
         1

15 rows selected.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> with data(r)
  2  as
  3  (select 1 r from dual
  4   union all
  5   select r+1 from data where r < 99
  6  )
  7  select max(decode(m,0,r)) id1,
  8         max(decode(m,1,r)) id2,
  9         max(decode(m,2,r)) id3,
 10         max(decode(m,3,r)) id4,
 11         max(decode(m,4,r)) id5,
 12         max(decode(m,5,r)) id6,
 13         max(decode(m,6,r)) id7
 14    from (
 15  select r, trunc((r+7-1.99)/7) d, mod(r-2+7, 7) m
 16    from data
 17         )
 18   group by d
 19   order by d desc
 20  /

       ID1        ID2        ID3        ID4        ID5        ID6        ID7
---------- ---------- ---------- ---------- ---------- ---------- ----------
        93         94         95         96         97         98         99
        86         87         88         89         90         91         92
        79         80         81         82         83         84         85
        72         73         74         75         76         77         78
        65         66         67         68         69         70         71
        58         59         60         61         62         63         64
        51         52         53         54         55         56         57
        44         45         46         47         48         49         50
        37         38         39         40         41         42         43
        30         31         32         33         34         35         36
        23         24         25         26         27         28         29
        16         17         18         19         20         21         22
         9         10         11         12         13         14         15
         2          3          4          5          6          7          8
                                                                           1

15 rows selected.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> with data(r)
  2  as
  3  (select 1 r from dual
  4   union all
  5   select r+1 from data where r < 99
  6  )
  7  select max(decode(m,decode( mod(d,2), 1, 0, 6) ,r)) id1,
  8         max(decode(m,decode( mod(d,2), 1, 1, 5) ,r)) id2,
  9         max(decode(m,decode( mod(d,2), 1, 2, 4) ,r)) id3,
 10         max(decode(m,decode( mod(d,2), 1, 3, 3) ,r)) id4,
 11         max(decode(m,decode( mod(d,2), 1, 4, 2) ,r)) id5,
 12         max(decode(m,decode( mod(d,2), 1, 5, 1) ,r)) id6,
 13         max(decode(m,decode( mod(d,2), 1, 6, 0) ,r)) id7
 14    from (
 15  select r, trunc((r+7-1.99)/7) d, mod(r-2+7, 7) m
 16    from data
 17         )
 18   group by d
 19   order by d desc
 20  /

       ID1        ID2        ID3        ID4        ID5        ID6        ID7
---------- ---------- ---------- ---------- ---------- ---------- ----------
        99         98         97         96         95         94         93
        86         87         88         89         90         91         92
        85         84         83         82         81         80         79
        72         73         74         75         76         77         78
        71         70         69         68         67         66         65
        58         59         60         61         62         63         64
        57         56         55         54         53         52         51
        44         45         46         47         48         49         50
        43         42         41         40         39         38         37
        30         31         32         33         34         35         36
        29         28         27         26         25         24         23
        16         17         18         19         20         21         22
        15         14         13         12         11         10          9
         2          3          4          5          6          7          8
         1

15 rows selected.

ops$tkyte%ORA11GR2>

Ouch

Chuck, January 17, 2012 - 1:28 pm UTC

To me #3 wins the "Least Likely to be Useful" award hands down.

That that 'with' clause passes the compiler, let alone producing more than one row, simply makes my head hurt.

SQL fun at it's best. Thanks.

Analytic and Tom both rocks.

A reader, January 18, 2012 - 7:32 am UTC

Thanks Tom.
I could not explain my requirement, but you have provided me the output.

January 18, 2012 - 7:44 am UTC

what did analytics have to do with anything? We didn't use any in this particular problem.

Alternative approach using pivot

AndyP, January 30, 2012 - 5:10 am UTC

Tom has of course provided the exact answer to the question and it probably isn't actually worth pursuing for its own sake, but I just wondered how to achieve the same results using the pivot syntax, which might be of interest to someone I suppose

In doing so I found that the solution provided relies on knowing the starting value of the dataset being used, so I've made it work for any highest number, not just 99

This method seems to require more work generating the data in the right format, but then the presentation of it using pivot is straightforward

(I too found the with clause a surprising formulation, so I used a more explicit one)


col rowval noprint

prompt This one has each row descending
with 
data as(select r from (select level r from dual connect by level <=25) order by r desc),
datasets as(select r dataval,trunc((rownum-0.1)/7) rowval,mod((rownum-1),7) colval from data)
select * from datasets pivot(max(dataval) for colval in(0 as id1,1 as id2,2 as id3,3 as id4,4 as id5,5 as id6,6 as id7))
order by rowval
/

prompt This one has each row ascending
with
data as(select r from (select level r from dual connect by level <=25) order by r desc),
datasets as(select r dataval,trunc((rownum-0.1)/7) rowval,mod((rownum-1),7) colval from data)
select * from datasets pivot(max(dataval) for colval in(6 as id1,5 as id2,4 as id3,3 as id4,2 as id5,1 as id6,0 as id7))
order by rowval
/

prompt This one has each row ascending too, achieved by reversing the data rather than altering the pivot column sequence
with
data as(select r from (select level r from dual connect by level <=25) order by r desc),
datasets as(select r dataval,trunc((rownum-0.1)/7) rowval,6-mod((rownum-1),7) colval from data)
select * from datasets pivot(max(dataval) for colval in(0 as id1,1 as id2,2 as id3,3 as id4,4 as id5,5 as id6,6 as id7))
order by rowval
/

prompt This one has rows alternately descending and ascending, achieved by ordering the data like that before pivoting
with
data as
(select r from (select level r from dual connect by level <=25) order by r desc),
datasets as
(select r dataval,trunc((rownum-0.1)/7) rowval,decode(mod(trunc((rownum-0.1)/7),2),0,mod((rownum-1),7),6-mod((rownum-1),7)) colval from data)
select * from datasets pivot(max(dataval) for colval in(0 as id1,1 as id2,2 as id3,3 as id4,4 as id5,5 as id6,6 as id7))
order by rowval
/

SQL > @pivdown
This one has each row descending

       ID1        ID2        ID3        ID4        ID5        ID6        ID7
---------- ---------- ---------- ---------- ---------- ---------- ----------
        25         24         23         22         21         20         19
        18         17         16         15         14         13         12
        11         10          9          8          7          6          5
         4          3          2          1

This one has each row ascending

       ID1        ID2        ID3        ID4        ID5        ID6        ID7
---------- ---------- ---------- ---------- ---------- ---------- ----------
        19         20         21         22         23         24         25
        12         13         14         15         16         17         18
         5          6          7          8          9         10         11
                                          1          2          3          4

This one has each row ascending too, achieved by reversing the data rather than altering the pivot column sequence

       ID1        ID2        ID3        ID4        ID5        ID6        ID7
---------- ---------- ---------- ---------- ---------- ---------- ----------
        19         20         21         22         23         24         25
        12         13         14         15         16         17         18
         5          6          7          8          9         10         11
                                          1          2          3          4

This one has rows alternately descending and ascending, achieved by ordering the data like that before pivoting

       ID1        ID2        ID3        ID4        ID5        ID6        ID7
---------- ---------- ---------- ---------- ---------- ---------- ----------
        25         24         23         22         21         20         19
        12         13         14         15         16         17         18
        11         10          9          8          7          6          5
                                          1          2          3          4

January 31, 2012 - 5:48 pm UTC

;)

thanks

RANGE BETWEEN UNBOUNDED PRECEDING confusion

biswaranjan., April 05, 2012 - 6:29 am UTC

Hi Tom,
I am confused for the out put of the below query .
emp table has the below 14 columns.

7369 SMITH CLERK 7902 17-DEC-80 800 20
7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30
7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30
7566 JONES MANAGER 7839 02-APR-81 2975 20
7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30
7698 BLAKE MANAGER 7839 01-MAY-81 2850 30
7782 CLARK MANAGER 7839 09-JUN-81 2450 10
7788 SCOTT ANALYST 7566 19-APR-87 3000 20
7839 KING PRESIDENT 17-NOV-81 5000 10
7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30
7876 ADAMS CLERK 7788 23-MAY-87 1100 20
7900 JAMES CLERK 7698 03-DEC-81 950 30
7902 FORD ANALYST 7566 03-DEC-81 3001 20
7934 MILLER CLERK 7782 23-JAN-82 1300 10

.........
SELECT deptno, empno, sal,
Count(*) OVER (PARTITION BY deptno ORDER BY sal RANGE
BETWEEN UNBOUNDED PRECEDING AND (sal/2) PRECEDING) CNT_LT_HALF,
COUNT(*) OVER (PARTITION BY deptno ORDER BY sal RANGE
BETWEEN (sal/2) FOLLOWING AND UNBOUNDED FOLLOWING) CNT_MT_HALF
FROM emp
WHERE deptno IN (20, 30)
ORDER BY deptno, sal

DEPTNO EMPNO SAL CNT_LT_HALF CNT_MT_HALF
------- ------ ----- ----------- -----------
20 7369 800 0 3
20 7876 1100 0 3
20 7566 2975 2 0
20 7788 3000 2 0
20 7902 3000 2 0
30 7900 950 0 3
30 7521 1250 0 1
30 7654 1250 0 1
30 7844 1500 0 1
30 7499 1600 0 1
30 7698 2850 3 0

11 rows selected.

can you plz explain the out put of the CNT_LT_HALF and CNT_MT_HALF
columns . I was thinking myself and found no confusion for the out put of cnt_lt_half column but confused for the cnt_mt_half column's .

could you plz explain the logic.

thanks and regards,
Biswaranjan.

April 06, 2012 - 10:17 am UTC

does this make more sense:

ops$tkyte%ORA11GR2> break on deptno skip 1
ops$tkyte%ORA11GR2> SELECT deptno, empno, sal,
  2         Count(*) OVER (PARTITION BY deptno ORDER BY sal RANGE
  3           BETWEEN UNBOUNDED PRECEDING AND (sal/2) PRECEDING) cnt1,
  4         first_value(empno) OVER (PARTITION BY deptno ORDER BY sal RANGE
  5           BETWEEN UNBOUNDED PRECEDING AND (sal/2) PRECEDING) fv1,
  6         last_value(empno) OVER (PARTITION BY deptno ORDER BY sal RANGE
  7           BETWEEN UNBOUNDED PRECEDING AND (sal/2) PRECEDING) lv1,
  8         COUNT(*) OVER (PARTITION BY deptno ORDER BY sal RANGE
  9           BETWEEN (sal/2) FOLLOWING AND UNBOUNDED FOLLOWING) cnt2,
 10         first_value(empno) OVER (PARTITION BY deptno ORDER BY sal RANGE
 11           BETWEEN (sal/2) FOLLOWING AND UNBOUNDED FOLLOWING) fv2,
 12         last_value(empno) OVER (PARTITION BY deptno ORDER BY sal RANGE
 13           BETWEEN (sal/2) FOLLOWING AND UNBOUNDED FOLLOWING) lv2
 14  FROM emp
 15  WHERE deptno IN (20, 30)
 16  ORDER BY deptno, sal
 17  /

DEPTNO EMPNO   SAL  CNT1   FV1   LV1  CNT2   FV2   LV2
------ ----- ----- ----- ----- ----- ----- ----- -----
    20  7369   800     0                 3  7566  7902
        7876  1100     0                 3  7566  7902
        7566  2975     2  7369  7876     0
        7788  3000     2  7369  7876     0
        7902  3000     2  7369  7876     0

    30  7900   950     0                 3  7844  7698
        7521  1250     0                 1  7698  7698
        7654  1250     0                 1  7698  7698
        7844  1500     0                 1  7698  7698
        7499  1600     0                 1  7698  7698
        7698  2850     3  7900  7654     0


11 rows selected.

see what the empno ranges are now - you can see exactly which records are included in the range...

A reader, April 06, 2012 - 12:24 pm UTC

Dear Sir,

the range for preceding is from unlimited lower bound to sal/2 and for following is from sal/2 to unlimited upper bound.

My above understanding match with the above output for preceding clause but not for following

for eg: for record deptno=20 and empno=7369 the range will be 400 to unlimited.

can you please clarify this?

Thanks in Advance

April 08, 2012 - 5:44 pm UTC

does this help:

ops$tkyte%ORA11GR2> SELECT deptno, empno, sal,
  2         'btwn 1st and ' || (sal-sal/2) txt1,
  3         Count(*) OVER (PARTITION BY deptno ORDER BY sal RANGE
  4           BETWEEN UNBOUNDED PRECEDING AND (sal/2) PRECEDING) cnt1,
  5         first_value(empno) OVER (PARTITION BY deptno ORDER BY sal RANGE
  6           BETWEEN UNBOUNDED PRECEDING AND (sal/2) PRECEDING) fv1,
  7         last_value(empno) OVER (PARTITION BY deptno ORDER BY sal RANGE
  8           BETWEEN UNBOUNDED PRECEDING AND (sal/2) PRECEDING) lv1,
  9             'btwn ' || (sal+sal/2) || ' and last' txt2,
 10         COUNT(*) OVER (PARTITION BY deptno ORDER BY sal RANGE
 11           BETWEEN (sal/2) FOLLOWING AND UNBOUNDED FOLLOWING) cnt2,
 12         first_value(empno) OVER (PARTITION BY deptno ORDER BY sal RANGE
 13           BETWEEN (sal/2) FOLLOWING AND UNBOUNDED FOLLOWING) fv2,
 14         last_value(empno) OVER (PARTITION BY deptno ORDER BY sal RANGE
 15           BETWEEN (sal/2) FOLLOWING AND UNBOUNDED FOLLOWING) lv2
 16  FROM emp
 17  WHERE deptno IN (20, 30)
 18  ORDER BY deptno, sal
 19  /

DEPTNO EMPNO   SAL TXT1               CNT1   FV1   LV1 TXT2                CNT2   FV2   LV2
------ ----- ----- ----------------- ----- ----- ----- ------------------ ----- ----- -----
    20  7369   800 btwn 1st and 400      0             btwn 1200 and last     3  7566  7902
        7876  1100 btwn 1st and 550      0             btwn 1650 and last     3  7566  7902
        7566  2975 btwn 1st and 1487     2  7369  7876 btwn 4462.5 and la     0
                   .5                                  st

        7788  3000 btwn 1st and 1500     2  7369  7876 btwn 4500 and last     0
        7902  3000 btwn 1st and 1500     2  7369  7876 btwn 4500 and last     0

    30  7900   950 btwn 1st and 475      0             btwn 1425 and last     3  7844  7698
        7521  1250 btwn 1st and 625      0             btwn 1875 and last     1  7698  7698
        7654  1250 btwn 1st and 625      0             btwn 1875 and last     1  7698  7698
        7844  1500 btwn 1st and 750      0             btwn 2250 and last     1  7698  7698
        7499  1600 btwn 1st and 800      0             btwn 2400 and last     1  7698  7698
        7698  2850 btwn 1st and 1425     3  7900  7654 btwn 4275 and last     0


11 rows selected.

unbound following

Biswaranjan, April 07, 2012 - 12:28 am UTC

Hi Tom,
First of all thanks for your quick reply.
I finally got it after thinking deeply your query's output.

@reader

you are thinking ,the way i was thinking so same confusionn you faced.

for your eg:
for record deptno=20 and emp=7369 the range will be from current sal 400 ahead to
unlimited.
so the lower limit will be 800+800/2=1200
and the upper limit will be unlimited. so obviously the count for that record will be
3.

thanks again tom.
regards,
Biswaranjan

A reader, April 07, 2012 - 4:20 pm UTC

@Biswaranjan - Thanks I understand that from output for following but the same does not work for preceding may be i miss something

for eg:

when processing deptno=20 and empno=7876 for preceding, the range will be from unbounded preceding to 800(previous record salary) + 800/2=1200 upper limit
so the count in this case should be 1 rather than 0?

is it like that for preceding the range is from unlimited TO sal/2 of previous record and for following it
is from sal*1.5 TO unlimited following?

@Tom can you please clarify?

Thanks in Advance

range window

biswaranjan, April 09, 2012 - 1:06 am UTC

@reader
for preceding the range should be
unbound prededing to sal-sal/2 and for followning
the range should be sal+sal/2.

you got wrong output coz for preceding you have added sal/2 with sal where you should have subtracted fro sal.

thanks,
Biswaranjan.

range window with example

biswaranjan, April 09, 2012 - 1:15 am UTC

@reader
hope below example will clarify your doubt completely.\

SELECT deptno, empno, sal,sal-sal/4,sal+sal/4,
Count(*) OVER (PARTITION BY deptno ORDER BY sal RANGE
BETWEEN UNBOUNDED PRECEDING AND (sal/4) PRECEDING) CNT_LT_HALF,
COUNT(*) OVER (PARTITION BY deptno ORDER BY sal RANGE
BETWEEN (sal/4) FOLLOWING AND UNBOUNDED FOLLOWING) CNT_MT_HALF
FROM emp
WHERE deptno IN (20, 30)
ORDER BY deptno, sal;
Deptno empno sal (sa-sa/4)(sa+sa/4)pre_CNT follow_CNT
20 7369 800 600 1000 0 4
20 7876 1100 825 1375 1 3
20 7566 2975 2231.25 3718.75 2 0
20 7788 3000 2250 3750 2 0
20 7902 3000 2250 3750 2 0
30 7900 950 712.5 1187.5 0 5
30 7521 1250 937.5 1562.5 0 2
30 7654 1250 937.5 1562.5 0 2
30 7844 1500 1125 1875 1 1
30 7499 1600 1200 2000 1 1
30 7698 2850 2137.5 3562.5 5 0

pre_count column is less than (sa-sa/4) and following_count column is more than (sa+sa/4).

thanks,
Biswaranjan.

A reader, April 09, 2012 - 10:53 am UTC

Thank you very much Biswaranjan!!

analytic performance

Biswaranjan, April 18, 2012 - 4:04 am UTC

Hi Tom,

hope you are doing fine :).

can you please tell which below query is good for performance.

select ename,deptno,(
select max(sal) from emp e1
where emp.deptno = e1.deptno ) max_sal_deptwise
from emp
order by deptno;

select ename,deptno,max(sal) over(partition by deptno) max_sal_deptwise from emp;

thanks,
Biswaranjan

April 18, 2012 - 8:01 am UTC

In almost every case - the analytic would be superior to the scalar subquery in general.

an exception to that might be if you wanted to get the first row from the query as fast as possible - the scalar subquery might have a slight advantage.

ops$tkyte%ORA11GR2> set linesize 1000
ops$tkyte%ORA11GR2> set echo on
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> /*
ops$tkyte%ORA11GR2> drop table t;
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> create table t
ops$tkyte%ORA11GR2> as
ops$tkyte%ORA11GR2> select object_name ename, object_id empno,
ops$tkyte%ORA11GR2>        mod(object_id,1000) deptno, mod(object_id,1000) sal,
ops$tkyte%ORA11GR2>            a.*
ops$tkyte%ORA11GR2>   from all_objects a
ops$tkyte%ORA11GR2> /
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> create index t_idx on t(deptno,sal);
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> exec dbms_stats.gather_table_stats( user, 'T' );
ops$tkyte%ORA11GR2> */
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> variable n number
ops$tkyte%ORA11GR2> set autotrace traceonly
ops$tkyte%ORA11GR2> exec  :n := dbms_utility.get_cpu_time;

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> select ename,deptno,(
  2  select max(sal) from t e1
  3  where emp.deptno = e1.deptno ) max_sal_deptwise
  4  from t  emp
  5  order by deptno;

72889 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 4042721663

----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       | 72889 |  2064K|       |   978   (1)| 00:00:12 |
|   1 |  SORT AGGREGATE              |       |     1 |     8 |       |            |          |
|   2 |   FIRST ROW                  |       |     1 |     8 |       |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN (MIN/MAX)| T_IDX |     1 |     8 |       |     2   (0)| 00:00:01 |
|   4 |  SORT ORDER BY               |       | 72889 |  2064K|  2872K|   978   (1)| 00:00:12 |
|   5 |   TABLE ACCESS FULL          | T     | 72889 |  2064K|       |   395   (1)| 00:00:05 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("E1"."DEPTNO"=:B1)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      17441  consistent gets
          0  physical reads
          0  redo size
    2767962  bytes sent via SQL*Net to client
      53869  bytes received via SQL*Net from client
       4861  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
      72889  rows processed

ops$tkyte%ORA11GR2> exec dbms_output.put_line( (dbms_utility.get_cpu_time-:n) || ' hsecs' );
37 hsecs

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> exec  :n := dbms_utility.get_cpu_time;

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> select ename,deptno,max(sal) over(partition by deptno) max_sal_deptwise from t;

72889 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2335850315

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 72889 |  2348K|       |  1049   (1)| 00:00:13 |
|   1 |  WINDOW SORT       |      | 72889 |  2348K|  3160K|  1049   (1)| 00:00:13 |
|   2 |   TABLE ACCESS FULL| T    | 72889 |  2348K|       |   395   (1)| 00:00:05 |
-----------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1422  consistent gets
          0  physical reads
          0  redo size
    2767962  bytes sent via SQL*Net to client
      53869  bytes received via SQL*Net from client
       4861  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
      72889  rows processed

ops$tkyte%ORA11GR2> exec dbms_output.put_line( (dbms_utility.get_cpu_time-:n) || ' hsecs' );
32 hsecs

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> set autotrace off

Thanks.

Biswaranjan, April 18, 2012 - 11:34 pm UTC

Hi Tom,
Thanks for a nice and simple explanation.

simply you rock!

one little request to you can you please tell which book is good for performance tuning for developer as a beginer to performance tuning.(sorry if I am not supposed to post these
requests).

thanks,
Biswaranjan.

April 19, 2012 - 5:44 am UTC

you can post these requests - but i always say the same thing

I don't know of a single book - or even a set of books - that can teach a developer the 15 steps to performance. The optimizer is that check list.

My approach to performance has always been:

a) understand how it (the database) works
b) know as much as you can about your options
c) understand those options work in general
d) put a, b, c together logically to say "these are probably the best ways"
e) benchmark them.

Just like I did above.

and then with experience - like 5-10-15 years - you'll be able to recognize general patterns/problems and be able to solve them without even thinking about it.

answer says it all!

Ravi B, April 19, 2012 - 3:50 pm UTC

WOW nice

Biswaranjan, April 19, 2012 - 11:36 pm UTC

Thanks a lot.

I will follow your vital points mentioned.

A log way to go for me(3.5 years in oracle :)).
I will try to do performance issue myself if not able then surely post in this site.

thanks again,
Biswaranjan

scnhealthcheck.sql" script for the Oracle version 9.2.0.1.0 .

Menaka, May 16, 2012 - 6:54 am UTC

Hi Tom

We're searching scnhealthcheck.sql script for the Oracle version 9.2.0.1.0 .

Could you please help me download this?
Thanks

May 17, 2012 - 2:16 am UTC

please utilize support for something like this.

Grouping moving set

Sanji, May 24, 2012 - 1:07 pm UTC

Tom,

I'm trying to write a query that'd return 1 record for a given set from a set of entries coming into the system.

CREATE TABLE EQP_TEST (SEQ_NUM INT, JTYPE CHAR(1),EQP_ID INT, EQP_AMT INT, DATE_TIME TIME);
INSERT INTO EQP_TEST VALUES(1,'A',10,10,CAST ('10:00:00' AS TIME));
INSERT INTO EQP_TEST VALUES(2,'A',10,10,CAST ('12:00:00' AS TIME));
INSERT INTO EQP_TEST VALUES(3,'A',10,10,CAST ('12:30:00' AS TIME));
INSERT INTO EQP_TEST VALUES(4,'B',10,20,CAST ('12:35:00' AS TIME));
INSERT INTO EQP_TEST VALUES(5,'B',10,20,CAST ('13:00:00' AS TIME));
INSERT INTO EQP_TEST VALUES(6,'C',20,40,CAST ('13:30:00' AS TIME));
INSERT INTO EQP_TEST VALUES(7,'C',20,40,CAST ('14:00:00' AS TIME));
INSERT INTO EQP_TEST VALUES(8,'A',10,10,CAST ('14:30:00' AS TIME));
INSERT INTO EQP_TEST VALUES(9,'C',20,40,CAST ('15:00:00' AS TIME));

What i need is
SEQ_NUM JTYPE EQP_ID EQP_AMT DATE_TIME
3 A 10 10 12:30:00
5 B 10 20 13:00:00
7 C 20 40 14:00:00
8 A 10 10 14:30:00
9 C 20 40 15:00:00

I know the max(date_time) over (partition by jtype rows unbounded following) would also include seq_num 8 for JTYPE 'A', which is what i do not want.
I am unable to come up with grouping sequence numbers 1,2,3 as 1 group, 4 and 5 as another and so on.

Any help is much appreciated.

Thanks

May 24, 2012 - 3:10 pm UTC

I'm sorry but did you mean to ask "askbill" or someone else this question?

CREATE TABLE EQP_TEST (SEQ_NUM INT, JTYPE CHAR(1),EQP_ID INT, EQP_AMT INT,
DATE_TIME TIME);

doesn't really compute in Oracle - and any answer I gave you probably wouldn't work in whatever else you are actually using.

Grouping moving set

Sanji, May 24, 2012 - 4:10 pm UTC

My apologies. I should have known better.
Could you take out time to explain how this requirement can be achieved in Oracle.

CREATE TABLE EQP_TEST (SEQ_NUM NUMBER, JTYPE CHAR(1), EQP_ID NUMBER, EQP_AMT NUMBER, DATE_TIME DATE);

INSERT INTO EQP_TEST VALUES(1,'A',10,10, TO_DATE('2012-05-01:10:00:00','YYYY-MM-DD:HH24:MI:SS'));
INSERT INTO EQP_TEST VALUES(2,'A',10,10, TO_DATE('2012-05-01:10:30:00','YYYY-MM-DD:HH24:MI:SS'));
INSERT INTO EQP_TEST VALUES(3,'A',10,10, TO_DATE('2012-05-01:11:30:00','YYYY-MM-DD:HH24:MI:SS'));
INSERT INTO EQP_TEST VALUES(4,'B',10,20, TO_DATE('2012-05-01:12:30:00','YYYY-MM-DD:HH24:MI:SS'));
INSERT INTO EQP_TEST VALUES(5,'B',10,20, TO_DATE('2012-05-01:13:00:00','YYYY-MM-DD:HH24:MI:SS'));
INSERT INTO EQP_TEST VALUES(6,'C',20,40, TO_DATE('2012-05-01:14:00:00','YYYY-MM-DD:HH24:MI:SS'));
INSERT INTO EQP_TEST VALUES(7,'C',20,40, TO_DATE('2012-05-01:14:30:00','YYYY-MM-DD:HH24:MI:SS'));
INSERT INTO EQP_TEST VALUES(8,'A',10,10, TO_DATE('2012-05-01:15:30:00','YYYY-MM-DD:HH24:MI:SS'));
INSERT INTO EQP_TEST VALUES(9,'D',20,40, TO_DATE('2012-05-01:15:45:00','YYYY-MM-DD:HH24:MI:SS'));

ALTER SESSION SET NLS_DATE_FORMAT = 'HH24:MI:SS';

SELECT * FROM EQP_TEST;

SEQ_NUM J EQP_ID EQP_AMT DATE_TIM
---------- - ---------- ---------- --------
1 A 10 10 10:00:00
2 A 10 10 10:30:00
3 A 10 10 11:30:00
4 B 10 20 12:30:00
5 B 10 20 13:00:00
6 C 20 40 14:00:00
7 C 20 40 14:30:00
8 A 10 10 15:30:00
9 D 20 40 15:45:00

I'm looking for
3 A 10 10 11:30:00
5 B 10 20 13:00:00
7 C 20 40 14:30:00
8 A 10 10 15:30:00
9 D 20 40 15:45:00

Appreciate you input.
Thanks
Sanji

May 25, 2012 - 8:51 am UTC

I made the assumption you wanted to keep the last eqp_* column in each group


ops$tkyte%ORA11GR2> select *
  2    from (
  3  select seq_num,
  4         jtype,
  5         eqp_id,
  6         eqp_amt,
  7             decode( lead(jtype) over (order by date_time), jtype, null, '<=' ) end_of_grp,
  8         date_time
  9    from eqp_test
 10         )
 11   where end_of_grp is not null
 12   order by date_time
 13  /

   SEQ_NUM J     EQP_ID    EQP_AMT EN DATE_TIM
---------- - ---------- ---------- -- --------
         3 A         10         10 <= 11:30:00
         5 B         10         20 <= 13:00:00
         7 C         20         40 <= 14:30:00
         8 A         10         10 <= 15:30:00
         9 D         20         40 <= 15:45:00

ops$tkyte%ORA11GR2>

RE: Grouping moving set

Duke Ganote, May 24, 2012 - 8:39 pm UTC

Tom has name for this technique, although it escapes my memory at the moment. Just use the LEAD analytic function:

select e.*
from (
select e.seq_num
, e.jtype
, e.eqp_id
, e.eqp_amt
, e.date_time
, case LEAD(Jtype) OVER (order by seq_num)
when Jtype then NULL
else seq_num
end as jagged
from eqp_test e
) e
where jagged IS NOT NULL
order by seq_num
/
SEQ_NUM J EQP_ID EQP_AMT DATE_TIM JAGGED
------- - ---------- ---------- -------- ----------
3 A 10 10 11:30:00 3
5 B 10 20 13:00:00 5
7 C 20 40 14:30:00 7
8 A 10 10 15:30:00 8
9 D 20 40 15:45:00 9

Grouping moving set

Sanji, May 25, 2012 - 9:26 am UTC

Brilliant... Really appreciate all your help Tom/ Duke.

Thanks much...
Sanji

Query

A reader, May 26, 2012 - 10:42 am UTC

Hi Tom,

I need to get the PK_REF_COL column, it represent the value of the PK column, and this should be derived based on the column PARENT_FK and DIR.

For each record must find the record where the value of the column DIR is different from the value of the current record.The order of the records it is indicated by the ORDER_NO column.

PK PARENT_FK ORDER_NO DIR PK_REF_COL

10 100 1 N 12
11 100 2 N 12
12 100 3 S 13
13 100 4 N 14
14 100 5 S 15
15 100 6 N NULL

16 200 1 N 20
17 200 2 N 20
18 200 3 N 20
19 200 4 N 20
20 200 5 S 21
21 200 6 N NULL

22 300 1 S 23
23 300 2 N NULL
24 300 3 N NULL
25 300 4 N NULL

query for PK_REF_COL for reader

Biswaranjan, May 26, 2012 - 9:36 pm UTC

Hi,

@reader

############Input

create table table2(PK number,PARENT_FK number,ORDER_NO number,DIR char(1));

insert into table2 values(10 , 100 , 1, 'N');
insert into table2 values(11 , 100 , 2 , 'N' );
insert into table2 values(12 , 100 , 3 , 'S' );
insert into table2 values(13 , 100 , 4 , 'N' );
insert into table2 values(14 , 100 , 5 , 'S' );
insert into table2 values(15 , 100 ,6 , 'N' );

insert into table2 values(16 , 200 , 1 , 'N' );
insert into table2 values(17 , 200 , 2 , 'N' );
insert into table2 values(18 , 200 , 3 , 'N' );
insert into table2 values(19 , 200 , 4 , 'N' );
insert into table2 values(20 , 200 , 5 , 'S' );
insert into table2 values(21 , 200 ,6 , 'N' );

insert into table2 values(22 , 300 , 1 , 'S' );
insert into table2 values(23 , 300 , 2 , 'N' );
insert into table2 values(24 , 300 , 3 , 'N' );
insert into table2 values(25 , 300 , 4 , 'N' );

############ sql query

SELECT pk,parent_fk,order_no,dir
,decode(x,null,pk+to_number(last_value(X ignore nulls) OVER(partition by parent_fk ORDER BY ROWNUM)-order_no),to_char(pk+1)) PK_REF_COL
FROM(
SELECT table2.*
,case when lead(DIR) over (PARTITION BY PARENT_FK order by ORDER_NO desc)=dir
or lag(DIR) over (PARTITION BY PARENT_FK order by ORDER_NO desc)=dir
then null
when lag(dir) over (PARTITION BY PARENT_FK order by ORDER_NO desc) is null
then null
else order_no
end X
FROM table2) ask_all order by parent_fk,order_no;

########################output/result
PK PARENT_FK ORDER_NO D PK_REF_COL
---------- ---------- ---------- - ----------
10 100 1 N 12
11 100 2 N 12
12 100 3 S 13
13 100 4 N 14
14 100 5 S 15
15 100 6 N
16 200 1 N 20
17 200 2 N 20
18 200 3 N 20
19 200 4 N 20
20 200 5 S 21
21 200 6 N
22 300 1 S 23
23 300 2 N
24 300 3 N
25 300 4 N

16 rows selected.

@reader
Please always provide the create table and insert queries so
Tom or others can make use of them without wasting time for
creating and inserting by their own.
If you are new to this asktom then ok but please make a habit to do that. :) .

Thanks & Regards,
Biswaranjan.

continuation to last posted result.

Biswaranjan, May 26, 2012 - 10:00 pm UTC

Hi,
@reader

I just removed the to_number things from my above posted query .
#############

SELECT pk,parent_fk,order_no,dir
,decode(x,null,pk+last_value(X ignore nulls) OVER(partition by parent_fk ORDER BY ROWNUM)-order_no,pk+1) PK_REF_COL
FROM(
SELECT table2.*
,case when lead(DIR) over (PARTITION BY PARENT_FK order by ORDER_NO desc)=dir
or lag(DIR) over (PARTITION BY PARENT_FK order by ORDER_NO desc)=dir
then null
when lag(dir) over (PARTITION BY PARENT_FK order by ORDER_NO desc) is null
then null
else order_no
end X
FROM table2) order by parent_fk,order_no;

#########
regards,
Biswaranjan

query

A reader, May 26, 2012 - 10:11 pm UTC

Thanks, I will..

Biswaranjan, May 27, 2012 - 7:38 am UTC

I was just checking with different input for my above posted query for pk_ref_col result.
I missed the logic slightly.

test cases with diff inputs########
SQL> select * from table2;

        PK  PARENT_FK   ORDER_NO D
---------- ---------- ---------- -
        10        100          1 N
        11        100          2 N
        12        100          3 S
        13        100          4 N
        14        100          5 N
        15        100          6 S
        16        200          1 N
        17        200          2 N
        18        200          3 N
        19        200          4 N
        20        200          5 S
        21        200          6 N
        22        300          1 S
        23        300          2 N
        24        300          3 S
        25        300          4 N

16 rows selected.

SQL> SELECT pk,parent_fk,order_no,dir
  2  ,decode(x,null,pk+last_value(X ignore nulls) OVER(partition by parent_fk ORDER BY 
  3  ROWNUM)-order_no,pk+1) PK_REF_COL
  4  FROM(
  5       SELECT table2.*
  6              ,case when lead(DIR) over (PARTITION BY PARENT_FK order by ORDER_NO desc)=dir 
  7                    or lag(DIR) over (PARTITION BY PARENT_FK order by ORDER_NO desc)=dir
  8                        then null
  9                    when lag(dir) over (PARTITION BY PARENT_FK order by ORDER_NO desc) is null 
 10                        then null
 11                    else order_no 
 12                    end   X 
 13        FROM table2)  order by parent_fk,order_no;

        PK  PARENT_FK   ORDER_NO D PK_REF_COL
---------- ---------- ---------- - ----------
        10        100          1 N         12
        11        100          2 N         12
        12        100          3 S         13
        13        100          4 N
        14        100          5 N
        15        100          6 S
        16        200          1 N         20
        17        200          2 N         20
        18        200          3 N         20
        19        200          4 N         20
        20        200          5 S         21
        21        200          6 N
        22        300          1 S         23
        23        300          2 N         24
        24        300          3 S         25
        25        300          4 N

16 rows selected.

SO above result was not giving the expected result.

SO I rewrite the query ,with many inputs I tried and found no error.

@reader Please ignore the last posted query . Make use of following query.

###############
SQL> SELECT pk,parent_fk,order_no,dir
  2  ,case when x is null 
  3       then  pk+last_value(X ignore nulls) OVER(partition by parent_fk ORDER BY order_no desc)-order_no
  4       when s =1
  5       then null
  6       else 
  7       pk+1
  8       end pk_ref_col
  9  FROM(
 10       SELECT table2.*
 11              ,row_number() over (PARTITION BY PARENT_FK order by ORDER_NO desc) s
 12              ,case when lead(DIR) over (PARTITION BY PARENT_FK order by ORDER_NO desc)=dir 
 13                    or lag(DIR) over (PARTITION BY PARENT_FK order by ORDER_NO desc)=dir
 14                    then null
 15                    else order_no 
 16                    end   X 
 17        FROM table2)  order by parent_fk,order_no;

        PK  PARENT_FK   ORDER_NO D PK_REF_COL
---------- ---------- ---------- - ----------
        10        100          1 N         12
        11        100          2 N         12
        12        100          3 S         13
        13        100          4 N         15
        14        100          5 N         15
        15        100          6 S
        16        200          1 N         20
        17        200          2 N         20
        18        200          3 N         20
        19        200          4 N         20
        20        200          5 S         21
        21        200          6 N
        22        300          1 S         23
        23        300          2 N         24
        24        300          3 S         25
        25        300          4 N

16 rows selected.


Thanks & Regards,
Biswaranjan.

query

A reader, May 27, 2012 - 1:22 pm UTC

Hi Biswaranjan,

The pk column is not a consecutive number is random, but it is unique.

query

A reader, May 27, 2012 - 1:24 pm UTC

Biswaranjan,

I mean in the example it is , but it could be randomly.

query for PK_REF_COL for reader for consecutive or non consecutive pk

Biswaranjan, May 27, 2012 - 3:59 pm UTC

SQL> select * from table1;

        PK  PARENT_FK   ORDER_NO D
---------- ---------- ---------- -
        10        100          1 N
        11        100          2 N
        13        100          3 S
        14        100          4 N
        15        100          5 S
        16        100          6 N
        16        200          1 N
        17        200          2 N
        18        200          3 N
        19        200          4 N
        20        200          5 S
        21        200          6 N
        22        300          1 S
        24        300          2 N
        25        300          3 N
        28        300          4 N

16 rows selected.

SQL> SELECT pk,parent_fk,order_no,dir
  2      ,case when x is null 
  3           then last_value(X ignore nulls) OVER(partition by parent_fk ORDER BY order_no desc) 
  4           when s =1
  5           then null
  6           else 
  7           lag(pk) over(partition by parent_fk order by order_no desc)
  8           end pk_ref_col
  9      FROM(
 10          SELECT table1.*
 11                 ,row_number() over (PARTITION BY PARENT_FK order by ORDER_NO desc) s
 12                 ,case when lead(DIR) over (PARTITION BY PARENT_FK order by ORDER_NO desc)=dir 
 13                       or lag(DIR) over (PARTITION BY PARENT_FK order by ORDER_NO desc)=dir
 14                       then null
 15                       else pk
 16                       end   X 
 17           FROM table1)  order by parent_fk,order_no;

        PK  PARENT_FK   ORDER_NO D PK_REF_COL
---------- ---------- ---------- - ----------
        10        100          1 N         13
        11        100          2 N         13
        13        100          3 S         14
        14        100          4 N         15
        15        100          5 S         16
        16        100          6 N
        16        200          1 N         20
        17        200          2 N         20
        18        200          3 N         20
        19        200          4 N         20
        20        200          5 S         21
        21        200          6 N
        22        300          1 S         24
        24        300          2 N
        25        300          3 N
        28        300          4 N

16 rows selected.

SQL> 

@reader you can make user of above query , Please check with 
various input and let us know if it works or not(I tested and worked fine).

Thanks & Regards,
Biswaranja.

results that never changed over certain period of time

Ashrf, June 18, 2012 - 5:32 am UTC

create table emp_salaries (emp_id number, r_year number(4), r_month number(2), r_code number, r_value number);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2010, 8, 102, 10187.097);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2010, 8, 221, 516.129);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2010, 8, 597, 6000);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2010, 8, 598, 3783.871);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2010, 9, 102, 840);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2010, 9, 221, 46.667);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2010, 10, 102, 900);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2010, 10, 221, 50);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2010, 11, 102, 900);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2010, 11, 221, 50);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2010, 12, 102, 900);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2010, 12, 221, 50);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2011, 1, 102, 900);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2011, 1, 221, 50);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2011, 2, 102, 900);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2011, 2, 221, 50);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2011, 3, 102, 900);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2011, 3, 221, 50);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2011, 4, 102, 900);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2011, 4, 221, 50);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2011, 5, 102, 900);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2011, 5, 199, 918.333);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2011, 5, 221, 50);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2011, 6, 102, 900);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2011, 6, 221, 50);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2011, 6, 599, 918.333);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2011, 7, 102, 900);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2011, 7, 221, 50);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2011, 8, 102, 900);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2011, 8, 221, 50);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2011, 9, 102, 900);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2011, 9, 221, 50);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2011, 10, 102, 900);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2011, 10, 221, 50);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2011, 11, 102, 900);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2011, 11, 221, 50);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2011, 12, 102, 900);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2011, 12, 221, 50);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2012, 1, 102, 900);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2012, 1, 221, 50);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2012, 2, 102, 900);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2012, 2, 221, 50);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2012, 3, 102, 900);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2012, 3, 221, 50);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2012, 4, 102, 900);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2012, 4, 221, 50);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2012, 5, 102, 900);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2012, 5, 221, 50);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2012, 6, 102, 900);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2012, 6, 221, 50);
insert into emps_salaries
(emp_id, r_year, r_month, r_code, r_value)
values
(1000000, 2012, 6, 233, 150);
commit;
I want to write a SQL statement deciding if a salary changed over a certain period of time which is determined as a bind variables (i.e. did salary change in the period 10/2010 till 04/2011). THANKS

June 18, 2012 - 9:09 am UTC

you could have made the example a tad bit shorter.

but this is trivial

select count(distinct r_value) from table where date between :x and :y;

if count > 1, it changed.

Help/guidance writing analytical function SQL

A reader, June 28, 2012 - 8:03 am UTC

DROP TABLE SK_T;

CREATE TABLE SK_T
(ITEM                VARCHAR2(25) NOT NULL,
 START_DATE          DATE       NOT NULL,
 END_DATE            DATE,
 PRICE_TYPE          VARCHAR2(20),
 PRICE               NUMBER(10,2));

INSERT INTO SK_T
VALUES
('123456', TO_DATE('07/01/2012', 'MM/DD/RRRR'), NULL, 'REGULAR', 2.00);

INSERT INTO SK_T
VALUES
('123456', TO_DATE('08/01/2012', 'MM/DD/RRRR'), NULL, 'REGULAR', 2.5);

INSERT INTO SK_T
VALUES
('123456', TO_DATE('09/01/2012', 'MM/DD/RRRR'), NULL, 'REGULAR', 2.75);

INSERT INTO SK_T
VALUES
('123456', TO_DATE('07/15/2012', 'MM/DD/RRRR'), TO_DATE('07/22/2012', 'MM/DD/RRRR'), 'SALE', 1.75);

Is it possible to generate the following using just SQL. The Table can have 3-4 million rows.

ITEM       START_DATE      END_DATE    PRICE_TYPE    PRICE
123456     07/01/2012      07/14/2012  REGULAR       2.00
123456     07/15/2012      07/22/2012  SALE          1.75
123456     07/23/2012      07/31/2012  REGULAR       2.00
123456     08/01/2012      08/31/2012  REGULAR       2.50
123456     09/01/2012      NULL        REGULAR       1.75

In the table Regular Priced items will only have the Start Date and Sale Priced will have a Start Date and End Date.
If the Regular Priced Item's date range falls between a Sale priced item's date range then I have to generate a line for Start date and End Date and then a new row after the sale ends.

June 29, 2012 - 9:17 am UTC

explain your logic thoroughly - precisely.

and if this has to work for multiple items, by all means expand your test case to be inclusive of multiple items so one can check for edge conditions.

but most importantly - explain this to death. where did that jul-23-jul31 record magically appear from. where did the 1.75 come from for september. detail your logic.

Help/guidance writing analytical function SQL

A reader, June 28, 2012 - 8:43 am UTC

Continuation to my previous question.........

Using Lead() I can generate the End Date value for Regular Items, my problem is generating new lines for the overlapping rows

ITEM START_DATE END_DATE PRICE_TYPE PRICE
123456 07/01/2012 07/31/2012 REGULAR 2.00
123456 07/15/2012 07/22/2012 SALE 1.75
123456 07/26/2012 07/30/2012 SALE 1.70

Should be converted to
ITEM START_DATE END_DATE PRICE_TYPE PRICE
123456 07/01/2012 07/31/2012 REGULAR 2.00
123456 07/15/2012 07/22/2012 SALE 1.75
123456 07/23/2012 07/25/2012 REGULAR 2.00
123456 07/26/2012 07/30/2012 SALE 1.70
123456 07/31/2012 07/31/2012 REGULAR 2.00

June 29, 2012 - 9:22 am UTC

why do you have overlapping ranges? what if you have multiple overlapping ranges? This doesn't make sense - to store anything like that. Are you basically saying "sale over overrides regular" and do you guarantee that ONLY ONE sale will override any given regular???

Help/guidance writing analytical function SQL

A reader, June 29, 2012 - 3:46 pm UTC

The overlapping date range in the above example was a typing error. Sorry for the confusion.

We have a table that stores the regular prices of the item, It has item_id, Start Date(/Effective Date) for the price and the Price.
Example
ITEM START_DATE PRICE
123456 08/01/2012 2.50
123456 09/01/2012 2.75
The regular Price is valid until the next Start Date/Effective date.
That means from Aug 01, 2012 thru Aug 31, 2012 the Price is $2.50 and from Sep 01 the Price will be $2.75. Sep 01, 2012's price does not have an End date because there no Start Date greater than Sep 01,2012.

There is another table than has Sale/Special Prices on that item. That table has Item Id, Start date, End date and the Price.
Start date and End date is NOT NULL and there won't be an overlap in the date range for the same item.
Example
ITEM START_DATE END_DATE PRICE_TYPE PRICE
123456 08/01/2012 08/06/2012 SALE 1.50
123456 08/21/2012 08/28/2012 SALE 1.75
That means the Price on the Item is $1.50 from Aug 01, 2012 thru Aug 06, 2012 and then reverts back to the Regular Price after the Sale Period ends.

I need to combine the following two tables and generate the Price with the date Ranges that gives me the Price for the Item and it's correspoding date ranges.
Please note that after the Sale Price period ends, the Price Should revert it back to regular price
For the above example we should get the following output
ITEM START_DATE END_DATE PRICE_TYPE PRICE
123456 08/01/2012 08/06/2012 SALE 1.50
123456 08/07/2012 08/20/2012 REGULAR 2.50
123456 08/21/2012 08/28/2012 SALE 1.75
123456 08/29/2012 08/31/2012 REGULAR 2.50
123456 09/01/2012 NULL REGULAR 2.75

I am resending the test script again with more data for two items.
Also, I am attaching a SQL I was able to come up with which gives the output in the above format. I don't know whether it's efficient or not.

DROP TABLE SK_T;

CREATE TABLE SK_T
(ITEM VARCHAR2(25) NOT NULL,
START_DATE DATE NOT NULL,
END_DATE DATE,
PRICE_TYPE VARCHAR2(20),
PRICE NUMBER(10,2));

INSERT INTO SK_T
VALUES
('123456', TO_DATE('07/15/2012', 'MM/DD/RRRR'), TO_DATE('07/22/2012', 'MM/DD/RRRR'), 'SALE', 1.50);

INSERT INTO SK_T
VALUES
('123456', TO_DATE('07/23/2012', 'MM/DD/RRRR'), TO_DATE('07/26/2012', 'MM/DD/RRRR'), 'SALE', 1.25);

INSERT INTO SK_T
VALUES
('123456', TO_DATE('08/01/2012', 'MM/DD/RRRR'), TO_DATE('08/06/2012', 'MM/DD/RRRR'), 'SALE', 1.50);

INSERT INTO SK_T
VALUES
('123456', TO_DATE('08/21/2012', 'MM/DD/RRRR'), TO_DATE('08/28/2012', 'MM/DD/RRRR'), 'SALE', 1.75);

INSERT INTO SK_T
VALUES
('123456', TO_DATE('09/21/2012', 'MM/DD/RRRR'), TO_DATE('09/28/2012', 'MM/DD/RRRR'), 'SALE', 1.00);

INSERT INTO SK_T
VALUES
('123456', TO_DATE('07/01/2012', 'MM/DD/RRRR'), NULL, 'REGULAR', 2.00);

INSERT INTO SK_T
VALUES
('123456', TO_DATE('08/01/2012', 'MM/DD/RRRR'), NULL, 'REGULAR', 2.5);

INSERT INTO SK_T
VALUES
('123456', TO_DATE('09/01/2012', 'MM/DD/RRRR'), NULL, 'REGULAR', 2.75);

INSERT INTO SK_T
VALUES
('23456', TO_DATE('07/17/2012', 'MM/DD/RRRR'), TO_DATE('07/24/2012', 'MM/DD/RRRR'), 'SALE', 7.50);

INSERT INTO SK_T
VALUES
('23456', TO_DATE('09/01/2012', 'MM/DD/RRRR'), TO_DATE('09/07/2012', 'MM/DD/RRRR'), 'SALE', 8);

INSERT INTO SK_T
VALUES
('23456', TO_DATE('06/01/2012', 'MM/DD/RRRR'), NULL, 'REGULAR', 11.50);

INSERT INTO SK_T
VALUES
('23456', TO_DATE('07/25/2012', 'MM/DD/RRRR'), NULL, 'REGULAR', 10.75);

INSERT INTO SK_T
VALUES
('23456', TO_DATE('10/01/2012', 'MM/DD/RRRR'), NULL, 'REGULAR', 11.25);

COMMIT;

WITH a AS --Using this to generate Regular Price row after the Sale Ends.
(SELECT LEVEL L
FROM DUAL
CONNECT BY LEVEL <= 2),
I AS
(SELECT ITEM,
START_DATE,
(CASE WHEN PRICE_TYPE='SALE' THEN
END_DATE
WHEN END_DATE > LEAD((CASE WHEN PRICE_TYPE='SALE' THEN START_DATE END),1) OVER (PARTITION BY ITEM ORDER BY START_DATE) THEN
LEAD((CASE WHEN PRICE_TYPE='SALE' THEN START_DATE END),1) OVER (PARTITION BY ITEM ORDER BY START_DATE)-1
ELSE
END_DATE
END) END_DATE,
LAG(PRICE_TYPE,1) OVER (PARTITION BY ITEM ORDER BY START_DATE) PREV_PRICE_TYPE,
LEAD(PRICE_TYPE,1) OVER (PARTITION BY ITEM ORDER BY START_DATE) NEXT_PRICE_TYPE,
(CASE WHEN PRICE_TYPE='SALE' THEN
LAG((CASE WHEN PRICE_TYPE='REGULAR' THEN END_DATE END),1) IGNORE NULLS OVER (PARTITION BY ITEM ORDER BY START_DATE,ORD_TYPE)
ELSE
END_DATE
END) ORIG_END_DATE,
(CASE WHEN PRICE_TYPE='SALE' THEN
LAG((CASE WHEN PRICE_TYPE='REGULAR' THEN PRICE END),1) IGNORE NULLS OVER (PARTITION BY ITEM ORDER BY START_DATE, ORD_TYPE)
END) ORIG_PRICE,
PRICE_TYPE,
PRICE,
ORD_TYPE,
LEAD((CASE WHEN PRICE_TYPE='SALE' THEN START_DATE END),1) OVER (PARTITION BY ITEM ORDER BY START_DATE,ORD_TYPE) NEXT_SALE_START_DATE
FROM (SELECT ITEM,
START_DATE,
NVL((CASE WHEN END_DATE IS NOT NULL THEN
END_DATE
ELSE
(LEAD(START_DATE,1) IGNORE NULLS
OVER (PARTITION BY ITEM, PRICE_TYPE
ORDER BY START_DATE))-1
END), TO_DATE('31-DEC-5099', 'DD-MON-YYYY')) END_DATE,
PRICE_TYPE,
PRICE,
DECODE(PRICE_TYPE, 'REGULAR', 1, 2) ORD_TYPE
FROM SK_T
ORDER BY 1,2)),
items AS
(SELECT ITEM, START_DATE, END_DATE, PRICE_TYPE, PRICE, ORIG_END_DATE, PREV_PRICE_TYPE, NEXT_PRICE_TYPE, ORIG_PRICE, L,ORD_TYPE, NEXT_SALE_START_DATE
FROM A, I
WHERE (L=1
OR (PRICE_TYPE='SALE'
AND END_DATE < ORIG_END_DATE
AND ((PREV_PRICE_TYPE ='REGULAR' AND NEXT_PRICE_TYPE='REGULAR')
OR (PREV_PRICE_TYPE ='SALE' AND NEXT_PRICE_TYPE='REGULAR')
OR NEXT_SALE_START_DATE-1 != END_DATE
OR NEXT_PRICE_TYPE IS NULL)))),
qry AS
(SELECT ITEM,
(CASE WHEN L= 2 THEN
(LAST_VALUE(END_DATE) OVER (PARTITION BY ITEM ORDER BY START_DATE,ORD_TYPE, LEVEL))+1
ELSE
START_DATE
END) START_DATE,
(CASE WHEN L= 2 THEN
(CASE WHEN (NEXT_SALE_START_DATE-1) < ORIG_END_DATE THEN
NEXT_SALE_START_DATE-1
ELSE
ORIG_END_DATE
END)
ELSE
END_DATE
END) END_DATE,
(CASE WHEN L= 2 THEN
'REGULAR'
ELSE
PRICE_TYPE
END) PRICE_TYPE,
(CASE WHEN L= 2 THEN
ORIG_PRICE
ELSE
PRICE
END) PRICE
FROM items)
SELECT item, start_date, end_date, price_type, price
FROM qry
WHERE start_date <= NVL(end_date,start_date)
ORDER BY 1,2;

July 02, 2012 - 6:52 am UTC

There is another table than has Sale/Special Prices on that item.

then why the heck does your example have but one table??? why would you do that - why would you say "i have two" and then give me but one???????? How can someone develop an answer for you if you do stuff like that????? Your real life problem has nothing to do with your test case???

I doubt your query works if there are two sales in the window of a regular price, that is, if you have a regular price for the month of august, but it goes on sale on the 5th-10th and the 20th-25th. You only produce two rows of a regular price - you might need N rows.

when you run this query:

a) will you do this for an item or for N items (where N is generally what number) or all items?

b) how large do you anticipate the date ranges you have to deal with being? weeks, months, years?

c) please fix your example to model REALITY, if you have two tables - please use that in your example - it would be a waste of my time to provide a solution that doesn't work in real life.

Help/guidance writing analytical function SQL

A reader, July 02, 2012 - 8:37 am UTC

The reason I didn't give the example with two tables was because I thought it's only a UNION ALL statementbetween the two tables and didn't want to confuse you with lot of details

a) will you do this for an item or for N items (where N is generally what number) or all items?
The query is all Items in the tables(SK_REG_TEMP and SK_SALE_TEMP)
There can be items in SK_REG_TEMP and not in SK_SALE_TEMP and vice versa.

b) how large do you anticipate the date ranges you have to deal with being? weeks, months, years?
Most of the date ranges will be in weeks, but there will be few that are in months(4-6 months)

c) please fix your example to model REALITY, if you have two tables - please use that in your example - it would be a waste of my time to provide a solution that doesn't work in real life.
Please find th updated test case below.
And Sorry for the confusion.

DROP TABLE SK_REG_TEMP;
DROP TABLE SK_SALE_TEMP;

CREATE TABLE SK_REG_TEMP
(ITEM                VARCHAR2(25) NOT NULL,
 START_DATE          DATE       NOT NULL,
 PRICE               NUMBER(10,2));
 
CREATE TABLE SK_SALE_TEMP
(ITEM                VARCHAR2(25) NOT NULL,
 START_DATE          DATE       NOT NULL,
 END_DATE            DATE       NOT NULL,
 PRICE               NUMBER(10,2));
 
INSERT INTO SK_REG_TEMP
(ITEM, START_DATE, PRICE)
VALUES
('ITEM1', TO_DATE('07/01/2012', 'MM/DD/RRRR'), 2.00);

INSERT INTO SK_REG_TEMP
(ITEM, START_DATE, PRICE)
VALUES
('ITEM1', TO_DATE('08/15/2012', 'MM/DD/RRRR'), 2.50);

INSERT INTO SK_REG_TEMP
(ITEM, START_DATE, PRICE)
VALUES
('ITEM1', TO_DATE('12/15/2012', 'MM/DD/RRRR'), 2.75);

INSERT INTO SK_REG_TEMP
(ITEM, START_DATE, PRICE)
VALUES
('ITEM2', TO_DATE('07/15/2012', 'MM/DD/RRRR'), 25.00);

INSERT INTO SK_REG_TEMP
(ITEM, START_DATE, PRICE)
VALUES
('ITEM2', TO_DATE('01/15/2013', 'MM/DD/RRRR'), 24.00);

INSERT INTO SK_REG_TEMP
(ITEM, START_DATE, PRICE)
VALUES
('ITEM3', TO_DATE('07/23/2012', 'MM/DD/RRRR'), 15.00);

INSERT INTO SK_REG_TEMP
(ITEM, START_DATE, PRICE)
VALUES
('ITEM4', TO_DATE('07/12/2012', 'MM/DD/RRRR'), 31.00);

INSERT INTO SK_REG_TEMP
(ITEM, START_DATE, PRICE)
VALUES
('ITEM4', TO_DATE('01/15/2013', 'MM/DD/RRRR'), 33.00);


INSERT INTO SK_SALE_TEMP
(ITEM, START_DATE, END_DATE, PRICE)
VALUES
('ITEM1', TO_DATE('07/08/2012', 'MM/DD/RRRR'), TO_DATE('07/15/2012', 'MM/DD/RRRR'),  1.50);

INSERT INTO SK_SALE_TEMP
(ITEM, START_DATE, END_DATE, PRICE)
VALUES
('ITEM1', TO_DATE('07/23/2012', 'MM/DD/RRRR'), TO_DATE('07/30/2012', 'MM/DD/RRRR'),  1.75);

INSERT INTO SK_SALE_TEMP
(ITEM, START_DATE, END_DATE, PRICE)
VALUES
('ITEM1', TO_DATE('08/15/2012', 'MM/DD/RRRR'), TO_DATE('01/15/2013', 'MM/DD/RRRR'),  1.90);


INSERT INTO SK_SALE_TEMP
(ITEM, START_DATE, END_DATE, PRICE)
VALUES
('ITEM2', TO_DATE('07/19/2012', 'MM/DD/RRRR'), TO_DATE('07/26/2012', 'MM/DD/RRRR'),  22.00);

INSERT INTO SK_SALE_TEMP
(ITEM, START_DATE, END_DATE, PRICE)
VALUES
('ITEM2', TO_DATE('09/23/2012', 'MM/DD/RRRR'), TO_DATE('09/30/2012', 'MM/DD/RRRR'),  24.00);


INSERT INTO SK_SALE_TEMP
(ITEM, START_DATE, END_DATE, PRICE)
VALUES
('ITEM3', TO_DATE('08/14/2012', 'MM/DD/RRRR'), TO_DATE('08/21/2012', 'MM/DD/RRRR'),  12.00);

INSERT INTO SK_SALE_TEMP
(ITEM, START_DATE, END_DATE, PRICE)
VALUES
('ITEM5', TO_DATE('10/20/2012', 'MM/DD/RRRR'), TO_DATE('10/27/2012', 'MM/DD/RRRR'),  15.70);

INSERT INTO SK_SALE_TEMP
(ITEM, START_DATE, END_DATE, PRICE)
VALUES
('ITEM5', TO_DATE('12/21/2012', 'MM/DD/RRRR'), TO_DATE('02/15/2013', 'MM/DD/RRRR'),  14.70);

COMMIT;

Desired Output........

ITEM       START_DATE      END_DATE    PRICE_TYPE    PRICE   
ITEM1      07/01/2012      07/07/2012  REG           2.00  
ITEM1      07/08/2012      07/15/2012  SALE          1.50 
ITEM1      07/16/2012      07/22/2012  REG           2.00  
ITEM1      07/23/2012      07/30/2012  SALE          1.75
ITEM1      07/31/2012      08/14/2012  REG           2.00 
ITEM1      08/15/2012      01/15/2013  SALE          1.90
ITEM1      01/16/2013      NULL        REG           2.75

ITEM2      07/15/2012      07/18/2012  REG           25.00
ITEM2      07/19/2012      07/26/2012  SALE          22.00
ITEM2      07/27/2012      09/22/2012  REG           25.00
ITEM2      09/23/2012      09/30/2012  SALE          24.00
ITEM2      10/01/2012      01/14/2013  REG           25.00
ITEM2      01/15/2013      NULL        REG           24.00

ITEM3      07/23/2012      08/13/2012  REG           15.00
ITEM3      08/14/2012      08/21/2012  SALE          12.00
ITEM3      08/22/2012      NULL        REG           15.00

ITEM4      07/12/2012      01/14/2013  REG           31.00
ITEM4      01/15/2013      NULL        REG           33.00

ITEM5      10/20/2012      10/27/2012  SALE          15.70
ITEM5      12/21/2012      02/15/2013  SALE          14.70

July 02, 2012 - 12:52 pm UTC

you might consider a data model more suited to representing this data relationally....

It isn't pretty, but I'm pretty sure it works.

The concept it to turn the ranges into rows. for the regular price we have to figure out the end date - which is easy for all of the rows except the last row in the set by items. That "end date" will be the greatest of a) the last observed regular price, or b) the last observed sales price for that item.

Once we have the end date - it is easy to turn the single row into a row per day per record.

Same with sale date

then just join (by_date)

and then do a bit of grouping to get the ranges back again and a fancy case at the end to figure out whether to print out the end date or leave it null

ops$tkyte%ORA11GR2> with reg
  2  as
  3  (
  4  select item, start_date,
  5         nvl( lead(start_date-1) over (partition by item order by start_date),
  6              greatest( r.start_date+1,
  7                        (select max(s.end_date+1)
  8                           from sk_sale_temp s
  9                          where s.item = r.item)) ) end_date,
 10         price
 11    from sk_reg_temp r
 12  ),
 13  reg_by_date
 14  as
 15  (
 16  select item, start_date+column_value curr_date, start_date, end_date,
 17         price reg_price, cast( null as number ) sale_price, 'R' price_type
 18     from reg,
 19     TABLE( cast(multiset(select level-1 from dual
 20             connect by level <= end_date-start_date+1) as sys.odciNumberList) )
 21  ),
 22  sale_by_date
 23  as
 24  (select item, start_date+column_value curr_date, start_date, end_date,
 25          cast( null as number ) reg_price, price sale_price, 'S' price_type
 26     from sk_sale_temp,
 27     TABLE( cast(multiset(select level-1 from dual
 28             connect by level <= end_date-start_date+1) as sys.odciNumberList) )
 29  ),
 30  by_date
 31  as
 32  (
 33  select item, curr_date,
 34         case when max(price_type) = 'R'
 35              then max(reg_price)
 36              else max(sale_price)
 37          end the_price,
 38          max(price_type) the_price_type
 39    from (
 40  select *
 41    from reg_by_date
 42   union all
 43  select *
 44    from sale_by_date
 45         )
 46   group by item, curr_date
 47  )
 48  select item, sdate,
 49         case when the_price_type <> 'R' or edate <> last_date
 50              then edate
 51          end edate, the_price_type, the_price
 52    from (
 53  select item, min(curr_date) sdate, max(curr_date) edate, the_price, the_price_type,
 54         max( max(curr_date) ) over (partition by item) last_date
 55    from (
 56  select item, curr_date, the_price, the_price_type, count(flag) over (partition by item order by curr_date) grp
 57    from (
 58  select item, curr_date, the_price, the_price_type,
 59         case when lag(curr_date) over (partition by item order by curr_date) <> curr_date-1
 60                or lag(the_price) over (partition by item order by curr_date) <> the_price
 61                or lag(the_price_type) over (partition by item order by curr_date) <> the_price_type
 62              then 1
 63          end flag
 64    from by_date
 65         )
 66         )
 67   group by item, the_price, the_price_type, grp
 68         )
 69   order by 1, 2
 70  /

ITEM                      SDATE     EDATE     T  THE_PRICE
------------------------- --------- --------- - ----------
ITEM1                     01-JUL-12 07-JUL-12 R          2
                          08-JUL-12 15-JUL-12 S        1.5
                          16-JUL-12 22-JUL-12 R          2
                          23-JUL-12 30-JUL-12 S       1.75
                          31-JUL-12 14-AUG-12 R          2
                          15-AUG-12 15-JAN-13 S        1.9
                          16-JAN-13           R       2.75

ITEM2                     15-JUL-12 18-JUL-12 R         25
                          19-JUL-12 26-JUL-12 S         22
                          27-JUL-12 22-SEP-12 R         25
                          23-SEP-12 30-SEP-12 S         24
                          01-OCT-12 14-JAN-13 R         25
                          15-JAN-13           R         24

ITEM3                     23-JUL-12 13-AUG-12 R         15
                          14-AUG-12 21-AUG-12 S         12
                          22-AUG-12           R         15

ITEM4                     12-JUL-12 14-JAN-13 R         31
                          15-JAN-13           R         33

ITEM5                     20-OCT-12 27-OCT-12 S       15.7
                          21-DEC-12 15-FEB-13 S       14.7


20 rows selected.

Help/guidance writing analytical function SQL

A reader, July 02, 2012 - 9:14 am UTC

The following query looks like it's working for the above test data. I have two sale data in the regular price window for item ITEM1(07/08/2012 to 07/15/2012 and 07/23/2012 to 07/30/2012). I am producing two rows for every Sale row.
It looks ugly and I don't know if it's efficient.

WITH a AS --Using this to generate Regular Price row after the Sale Ends.
(SELECT LEVEL L
 FROM DUAL
 CONNECT BY LEVEL <= 2),
STG AS
(SELECT ITEM, START_DATE, TO_DATE(NULL) END_DATE, 'REGULAR' PRICE_TYPE, PRICE
 FROM SK_REG_TEMP
 UNION ALL
 SELECT ITEM, START_DATE, END_DATE, 'SALE' PRICE_TYPE, PRICE
 FROM SK_SALE_TEMP),
I AS 
(SELECT ITEM,  
        (CASE WHEN PRICE_TYPE='SALE' THEN 
                 START_DATE
              WHEN RN > 1 AND START_DATE < LAG((CASE WHEN PRICE_TYPE='SALE' THEN END_DATE +1 END),1, SYSDATE-3000) OVER (PARTITION BY ITEM ORDER BY END_DATE) THEN
                 LAG((CASE WHEN PRICE_TYPE='SALE' THEN END_DATE +1 END),1) OVER (PARTITION BY ITEM ORDER BY END_DATE)
              ELSE
                 START_DATE
         END) START_DATE,
        (CASE WHEN PRICE_TYPE='SALE' THEN 
                 END_DATE
              WHEN END_DATE > LEAD((CASE WHEN PRICE_TYPE='SALE' THEN START_DATE-1 END),1) OVER (PARTITION BY ITEM ORDER BY START_DATE) THEN
                 LEAD((CASE WHEN PRICE_TYPE='SALE' THEN START_DATE-1 END),1) OVER (PARTITION BY ITEM ORDER BY START_DATE)
              ELSE
                 END_DATE
         END) END_DATE,
        LAG(PRICE_TYPE,1) OVER (PARTITION BY  ITEM  ORDER BY START_DATE) PREV_PRICE_TYPE,
        LEAD(PRICE_TYPE,1) OVER (PARTITION BY  ITEM  ORDER BY START_DATE) NEXT_PRICE_TYPE,
        (CASE WHEN PRICE_TYPE='SALE' THEN 
                 LAG((CASE WHEN PRICE_TYPE='REGULAR' THEN END_DATE END),1) IGNORE NULLS OVER (PARTITION BY  ITEM  ORDER BY  START_DATE,ORD_TYPE)
              ELSE
                 END_DATE
         END) ORIG_END_DATE,
        (CASE WHEN PRICE_TYPE='SALE' THEN 
                 LAG((CASE WHEN PRICE_TYPE='REGULAR' THEN PRICE END),1) IGNORE NULLS OVER (PARTITION BY  ITEM  ORDER BY START_DATE, ORD_TYPE)
         END) ORIG_PRICE,
        PRICE_TYPE,
        PRICE,
        ORD_TYPE,
        LEAD((CASE WHEN PRICE_TYPE='SALE' THEN START_DATE END),1) OVER (PARTITION BY ITEM ORDER BY START_DATE,ORD_TYPE) NEXT_SALE_START_DATE
 FROM (SELECT ITEM, 
              START_DATE, 
              NVL((CASE WHEN END_DATE IS NOT NULL THEN
                       END_DATE
                    ELSE
                       (LEAD(START_DATE,1) IGNORE NULLS 
                           OVER (PARTITION BY ITEM, PRICE_TYPE 
                                 ORDER BY START_DATE))-1
               END), TO_DATE('31-DEC-5099', 'DD-MON-YYYY')) END_DATE,
              PRICE_TYPE,
              PRICE,
              DECODE(PRICE_TYPE, 'REGULAR', 1, 2) ORD_TYPE,
              ROW_NUMBER() OVER (PARTITION BY ITEM ORDER BY START_DATE) RN
       FROM STG
       ORDER BY 1,2)),
items AS
(SELECT ITEM, START_DATE, END_DATE, PRICE_TYPE, PRICE, ORIG_END_DATE, PREV_PRICE_TYPE, NEXT_PRICE_TYPE, ORIG_PRICE, L,ORD_TYPE, NEXT_SALE_START_DATE
 FROM A, I
 WHERE (L=1
        OR (PRICE_TYPE='SALE' 
             AND END_DATE < ORIG_END_DATE 
         AND ((PREV_PRICE_TYPE ='REGULAR' AND NEXT_PRICE_TYPE='REGULAR') 
                OR (PREV_PRICE_TYPE ='SALE' AND NEXT_PRICE_TYPE='REGULAR')
                OR  NEXT_SALE_START_DATE-1 != END_DATE
                OR  NEXT_PRICE_TYPE IS NULL)))),
qry AS  
(SELECT ITEM, 
        (CASE WHEN L= 2 THEN
             (LAST_VALUE(END_DATE) OVER (PARTITION BY  ITEM  ORDER BY START_DATE,ORD_TYPE, LEVEL))+1
          ELSE
             START_DATE
     END) START_DATE, 
    (CASE WHEN L= 2 THEN
             (CASE WHEN (NEXT_SALE_START_DATE-1) < ORIG_END_DATE THEN
                      NEXT_SALE_START_DATE-1
                   ELSE
                      ORIG_END_DATE
              END)
          ELSE
             END_DATE
     END) END_DATE,  
    (CASE WHEN L= 2 THEN
             'REGULAR'
          ELSE
             PRICE_TYPE
     END) PRICE_TYPE, 
    (CASE WHEN L= 2 THEN
             ORIG_PRICE
          ELSE
             PRICE
     END) PRICE 
 FROM items)
SELECT item, start_date, end_date, price_type, price
FROM qry
WHERE start_date <= NVL(end_date,start_date)
ORDER BY 1,2;

Help/guidance writing analytical function SQL

A reader, July 02, 2012 - 2:58 pm UTC

Thanks a lot for the SQL.

The SQL I wrote was not dividing the regular and sales rows by date, but trying to find the end_date+1 from the Sales row to get the next regular price row.
Your SQL is much more cleaner and easier to understand. Thanks again for the nice SQL.

Is there any difference between (CAST NULL AS NUMBER) and TO_NUMBER(NULL)?

July 03, 2012 - 8:04 am UTC

cast(null as number) and to_number(null) lead to pretty much the same effect. I just personally find cast( null as number ) to be cleaner to read.

Help/guidance writing analytical function SQL

A reader, July 02, 2012 - 3:13 pm UTC

You had mentioned earlier
"you might consider a data model more suited to representing this data relationally.... "

It's not possible for me to change the data model right now.

If you were to do the data model, how would you have done this

Requirement
1) Maintain Regular Price of the Items with effective/start date.
2) Maintain a sale price. The sale price will have a start and end date.
The sale price can be a fixed price($2.25) or a percentage off the retail(25% off retail) or fixed dollars off the retail($0.50 off the retail)

July 03, 2012 - 8:19 am UTC

The requirement can be more readily stated as:

maintain the correct price of something during a period of time.

I probably would have gone with:

item, start_date, end_date, price, description

where description was 'regular', 'sales'. Single table, non-overlapping ranges.

In short, the table would have been your report, your report would have been "select * from table order by item, start_date"

Analytics or not

Tony Fernandez, August 01, 2012 - 3:05 pm UTC

Dear Tom,

We have this query we are trying to implement using analytic functions but the cost and statistics increase dramatically, tried "with" in the select statament to "prepare data", but no gain at all. We need your help:
( data provided is very small set, tables go to millions of rows each )
Here is query:
select case when dic.din_mapping_grp_flag = 1 then dic.dist_org_id
else dic.distributor_id
end distributor_id
,case when dic.din_mapping_grp_flag = 1 then do.distributor_name
else dc.distributor_name
end distributor_name
,dic.din
,dic.active_flag
,min ( dic.dist_item_source ) dist_item_source
,max ( dic.brand ) brand
,max ( dic.description ) description

from dist_item_catalog dic
,distributor do
,distributor dc

where dic.dist_org_id = do.distributor_id
and dic.distributor_id = dc.distributor_id
and dic.active_flag = 1

group by case when dic.din_mapping_grp_flag = 1 then dic.dist_org_id
else dic.distributor_id
end,
case when dic.din_mapping_grp_flag = 1 then do.distributor_name
else dc.distributor_name
end,
dic.din,
dic.active_flag;

Table distributor is:
CREATE TABLE DISTRIBUTOR
( DISTRIBUTOR_ID VARCHAR2(13 BYTE) NOT NULL,
DISTRIBUTOR_NAME VARCHAR2(60 BYTE) NOT NULL );

SET DEFINE OFF;
Insert into GPO.DISTRIBUTOR
(DISTRIBUTOR_ID, DISTRIBUTOR_NAME)
Values
('DO-2001-00729', 'Sysco Sacramento');
Insert into GPO.DISTRIBUTOR
(DISTRIBUTOR_ID, DISTRIBUTOR_NAME)
Values
('DO-2001-00705', 'AFI Foodservice INACTIVE');
Insert into GPO.DISTRIBUTOR
(DISTRIBUTOR_ID, DISTRIBUTOR_NAME)
Values
('DO-2001-00699', 'US Foods Detroit INACTIVE');
Insert into GPO.DISTRIBUTOR
(DISTRIBUTOR_ID, DISTRIBUTOR_NAME)
Values
('DO-2001-00697', 'Hansen Sales - VEND');
Insert into GPO.DISTRIBUTOR
(DISTRIBUTOR_ID, DISTRIBUTOR_NAME)
Values
('DO-2001-00696', 'Fox River');
Insert into GPO.DISTRIBUTOR
(DISTRIBUTOR_ID, DISTRIBUTOR_NAME)
Values
('DO-2001-00703', 'DiCarlo Distributors');
Insert into GPO.DISTRIBUTOR
(DISTRIBUTOR_ID, DISTRIBUTOR_NAME)
Values
('DO-2001-00701', 'US Foods Cincinnati INACTIVE');
Insert into GPO.DISTRIBUTOR
(DISTRIBUTOR_ID, DISTRIBUTOR_NAME)
Values
('DO-2001-00700', 'US Foods Knoxville 2270 6H');
Insert into GPO.DISTRIBUTOR
(DISTRIBUTOR_ID, DISTRIBUTOR_NAME)
Values
('DO-2001-00730', 'US Foods Paris (formerly PYA ) / 5Q INACTIVE');
Insert into GPO.DISTRIBUTOR
(DISTRIBUTOR_ID, DISTRIBUTOR_NAME)
Values
('DO-2001-00718', 'Upper Lakes Foods Inc');
Insert into GPO.DISTRIBUTOR
(DISTRIBUTOR_ID, DISTRIBUTOR_NAME)
Values
('DO-2001-00719', 'Fleming INACTIVE');
Insert into GPO.DISTRIBUTOR
(DISTRIBUTOR_ID, DISTRIBUTOR_NAME)
Values
('DO-2001-00720', 'Camellia INACTIVE');
Insert into GPO.DISTRIBUTOR
(DISTRIBUTOR_ID, DISTRIBUTOR_NAME)
Values
('DO-2001-00724', 'Reinhart Foodservice Twin Cities');
Insert into GPO.DISTRIBUTOR
(DISTRIBUTOR_ID, DISTRIBUTOR_NAME)
Values
('DO-2001-00725', 'Indianhead Foodservice ');
Insert into GPO.DISTRIBUTOR
(DISTRIBUTOR_ID, DISTRIBUTOR_NAME)
Values
('DO-2001-00726', 'Martin Preferred Foods');
Insert into GPO.DISTRIBUTOR
(DISTRIBUTOR_ID, DISTRIBUTOR_NAME)
Values
('DO-2001-00727', 'Wenning and Sons INACTIVE');
Insert into GPO.DISTRIBUTOR
(DISTRIBUTOR_ID, DISTRIBUTOR_NAME)
Values
('DO-2001-00706', 'J Kings Food Service Professionals - Pro Act');
Insert into GPO.DISTRIBUTOR
(DISTRIBUTOR_ID, DISTRIBUTOR_NAME)
Values
('DO-2001-00707', 'Texoma Meat Company INACTIVE');
Insert into GPO.DISTRIBUTOR
(DISTRIBUTOR_ID, DISTRIBUTOR_NAME)
Values
('DO-2001-00710', 'Q S I Foods INACTIVE');
Insert into GPO.DISTRIBUTOR
(DISTRIBUTOR_ID, DISTRIBUTOR_NAME)
Values
('DO-2001-00711', 'Sysco Jamestown INACTIVE');
Insert into GPO.DISTRIBUTOR
(DISTRIBUTOR_ID, DISTRIBUTOR_NAME)
Values
('DO-2001-00712', 'City Meats INACTIVE');
Insert into GPO.DISTRIBUTOR
(DISTRIBUTOR_ID, DISTRIBUTOR_NAME)
Values
('DO-2001-00713', 'Merit Day Food Service INACTIVE');
Insert into GPO.DISTRIBUTOR
(DISTRIBUTOR_ID, DISTRIBUTOR_NAME)
Values
('DO-2001-00714', 'US Foods Milwaukee 248 (formerly Alliant) 3D');
Insert into GPO.DISTRIBUTOR
(DISTRIBUTOR_ID, DISTRIBUTOR_NAME)
Values
('DO-2001-00715', 'Sysco Pittsburgh');
Insert into GPO.DISTRIBUTOR
(DISTRIBUTOR_ID, DISTRIBUTOR_NAME)
Values
('DO-2002-01122', 'Sysco Hampton Roads');
Insert into GPO.DISTRIBUTOR
(DISTRIBUTOR_ID, DISTRIBUTOR_NAME)
Values
('DO-2002-01106', 'Martin Brothers');
Insert into GPO.DISTRIBUTOR
(DISTRIBUTOR_ID, DISTRIBUTOR_NAME)
Values
('DO-2003-00231', 'Dean Foods - Pet Dairy Johnson City TN (Land O Sun)');
Insert into GPO.DISTRIBUTOR
(DISTRIBUTOR_ID, DISTRIBUTOR_NAME)
Values
('DO-2003-00232', 'Dean Foods - Pet Dairy Spartanburg SC INACTIVE');
Insert into GPO.DISTRIBUTOR
(DISTRIBUTOR_ID, DISTRIBUTOR_NAME)
Values
('DO-2003-00233', 'Dean Foods - Schepps Dairy Houston INACTIVE');
Insert into GPO.DISTRIBUTOR
(DISTRIBUTOR_ID, DISTRIBUTOR_NAME)
Values
('DO-2003-00234', 'Dean Foods - Shenandoah''s Pride Dairy');
COMMIT;

and table dist_item_catalog:
CREATE TABLE DIST_ITEM_CATALOG
(
DIN VARCHAR2(20 BYTE) NOT NULL,
DIST_ITEM_SOURCE VARCHAR2(1 BYTE) NOT NULL,
DISTRIBUTOR_ID VARCHAR2(13 BYTE),
DIST_ORG_ID VARCHAR2(13 BYTE),
DIN_MAPPING_GRP_FLAG NUMBER(1),
ACTIVE_FLAG NUMBER,
DESCRIPTION VARCHAR2(100 BYTE),
BRAND VARCHAR2(40 BYTE),
);

SET DEFINE OFF;
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '17253', 1,
'D', 'PACTIV', 'TRAY FOAM SCHL 5COMP WHT 8X10');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '55659', 1,
'D', 'SUNKSTS', 'DRINK CRANBERRY CKTL 10% 4+1');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '59859', 1,
'D', 'SUBWAY', 'TURKEY BRST CKD SLI .5 OZ');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '83737', 1,
'D', 'PRIMOS', 'BOX BAKERY 10X10X4 PRINT PRIMO');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '83790', 1,
'D', 'C&M', 'LID PLAS DOME F/9" DEEP');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '100628', 1,
'D', 'SYS IMP', 'JUICE ORANGE 100% NECTAR THICK');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '101675', 1,
'D', 'SYS IMP', 'WATER LEMON NECTAR THICK');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '102020', 1,
'D', 'SYS IMP', 'MILK 2% DAIRY HONEY THICK');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '117341', 1,
'D', 'BASICAM', 'POTATO MASHED NATURE S OWN');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '127672', 1,
'D', 'SUBWAY', 'FILM PVC RL 18"X2000 W/CUTTER');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '138792', 1,
'D', 'PACKER', 'SUGAR COARSE CON AA');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '143356', 1,
'D', 'SYS CLS', 'POTATO H/BRN DEHY GLDN GRL');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '156901', 1,
'D', 'HELLMAN', 'DRESSING 1000 ISL FF PKT');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '172623', 1,
'D', 'CHFMXWL', 'PEA GREEN FRZN');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '213452', 1,
'D', 'SUBWAY', 'DOUGH COOKIE RASP CHSCAK');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '218962', 1,
'D', 'SYS CLS', 'CHICKEN WING BUFF GLZE 1&2 PCK');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '236964', 1,
'D', 'SLIKRIK', 'MIX SPICE CHILI ORGANIC');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '237018', 1,
'D', 'SLIKRIK', 'MIX SPICE RED HOT ORGANIC');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '284137', 1,
'D', 'BRILL', 'CAKE RED VELVET DBL LYR 8"');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '322376', 1,
'D', 'BRKBUSH', 'CHICKEN WING BRD BRBN WILD');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '325223', 1,
'D', 'CORNER', 'BACON SLI APLWD SMKD 13/15');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '349245', 1,
'D', 'THOMPSN', 'STEAK PORTERHOUSE SEL ANGUS');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '350514', 1,
'D', 'THOMPSN', 'BEEF ROUND INSIDE DENUDE');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '353237', 1,
'D', 'AREZZIO', 'CHEESE STRING MOZZARELLA');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '354219', 1,
'D', 'THOMPSN', 'PORK CHOP FRCHD 8-10 OZ');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '356024', 1,
'D', 'FARMLND', 'BACON CANADIAN SLI .75 OZ FRZN');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '356857', 1,
'D', 'SCHWANS', 'ICE CREAM STWBRY');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '359453', 1,
'D', 'CORNER', 'SAUCE BASIL PSTO');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '361028', 1,
'D', 'SUBWAY', 'CUP PLAS CLD 40 OZ "GRN LEAF"');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '366904', 1,
'D', 'MSGROWN', 'POTATO SWEET FRSH');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '384495', 1,
'D', 'KRSPY K', 'DOUGH BISCUIT BTRMLK DROP');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '387318', 1,
'D', 'SEABEST', 'SNAPPER FIL SKON SCAR 8/10 IDN');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '389280', 1,
'D', 'BKRSCLS', 'BATTER MUFFIN CHOC CHOC CHIP');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '392490', 1,
'D', 'COMET', 'CUP PLAS CAR 32TV OZ WHT');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '395802', 1,
'D', 'PRAIRIE', 'STRAW WRPD TRANS GNT 10.25"');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '429274', 1,
'D', 'SYS IMP', 'MILK 2% DAIRY NECTAR THICK');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '435046', 1,
'D', 'PLDRHNO', 'SAUCE BBQ');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '449280', 1,
'D', 'GILSTER', 'PASTA SPAGHETTI 10"');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '455099', 1,
'D', 'PACKER', 'PIE LEMON MERNGE');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '471922', 1,
'D', 'SUBWAY', 'BAG PLAS SNDW "GRN LEAF"');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '476224', 1,
'D', 'CABBHNP', 'STEAK FILET C\C CH EXCLUSIVE');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '477513', 1,
'D', 'TODDS', 'SAUCE TERIYAKI SWEET THICK');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '491567', 1,
'D', 'KINGCHS', 'CAKE BUNDT ASST 9"');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '496578', 1,
'D', 'BHB/NPM', 'STEAK STRIP E\E 1"TL PR FRZN');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '501957', 1,
'D', 'MARY B', 'DOUGH BISCUIT BUTTERMILK');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '504169', 1,
'D', 'AEP INC', 'LINER TRASH 38X58 1.5M BLK');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '533517', 1,
'D', 'BTRBALL', 'TURKEY BRGR SAVRY WHT RTC');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '557575', 1,
'D', 'RUFINOS', 'APTZR MUSHROOM STFD W/CRABMEAT');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '566824', 1,
'D', 'FIRECLS', 'BEEF GROUND BULK FINE 73/27');
Insert into GPO.DIST_ITEM_CATALOG
(DIN_MAPPING_GRP_FLAG, DIST_ORG_ID, DISTRIBUTOR_ID, DIN, ACTIVE_FLAG,
DIST_ITEM_SOURCE, BRAND, DESCRIPTION)
Values
(1, 'DO-2004-00404', 'DO-2001-00589', '570273', 1,
'D', 'CYCREEK', 'ALLIGATOR MEAT FARM RAISED');
COMMIT;

August 01, 2012 - 3:26 pm UTC

I do not see a single analytic function in that entire query.

????

?

Tony Fernandez, August 01, 2012 - 3:50 pm UTC

That is correct Tom, no analytic function is displayed but the ones attempted do not help in the tuning but make it more costly.

The aggregates have been added analytics like "over ( partition by case
when dic.din_mapping_grp_flag = 1 then dic.dist_org_id
else dic.distributor_id
end,
case
when dic.din_mapping_grp_flag = 1 then do.distributor_name
else dc.distributor_name
end,
dic.din,
dic.active_flag )

We need these items to "group by" or "over ( partitioned by )"

Thanks,

August 01, 2012 - 3:52 pm UTC

??? I don't understand at all.

I don't see how analytics would fit into this query at all, you appear to want to aggregate.

I'm not sure where you are trying to go with analytics, I don't see them as being useful in this query - not if your need is to AGGREGATE.

continued

Tony Fernandez, August 01, 2012 - 4:04 pm UTC

Sorry Tom,

Here is the analytic version:
select case when dic.din_mapping_grp_flag = 1 then dic.dist_org_id
else dic.distributor_id
end distributor_id
,case when dic.din_mapping_grp_flag = 1 then do.distributor_name
else dc.distributor_name
end distributor_name
,dic.din
,dic.active_flag
,min ( dic.dist_item_source ) over ( partition by case when dic.din_mapping_grp_flag = 1 then dic.dist_org_id
else dic.distributor_id
end,
dic.din,
dic.active_flag )
,max ( dic.brand ) -- analityc same as prior column directly above
,max ( dic.description ) -- idem

from dist_item_catalog dic
,distributor do
,distributor dc

where dic.dist_org_id = do.distributor_id
and dic.distributor_id = dc.distributor_id
and dic.active_flag = 1
;

Please keep in mind that analytic is only one alternative to attempt to expedite the query, any other suggestion even aside of analytics will be fine.

Bottom line is the query for an GUI transaction takes up to 20 minutes on the whole data set, we would like to talk about seconds for the GUI user.

Thanks,

August 01, 2012 - 4:10 pm UTC

but that gives the wrong answer doesn't it.

you need to aggregate, you want to GET RID of rows. analytics do not do that.

have you considered a materialized view.

Mat view

Tony Fernandez, August 01, 2012 - 4:15 pm UTC

That is correct, more rows.
Materialized view is out of the question because this being a financial system and many things happen per second that could obsolete the mat view almost instantaneously.

Bottom line is that there are many aggregates with the same group by and noticed that just dropping one aggregated column makes great gain in time. But unfortunately, all columns in query are needed.

One more attempt

Tony Fernandez, August 06, 2012 - 9:10 am UTC

Tom,

Here is query written with no analytics. Understood that analytics will not help in this case. Can you please point any other direction yet to be explored? My DBA group all agree that mat view was not solution before due to high OLTP and we want to continue with that norm.

Here is the query:
with dic2 as ( select min( dist_item_catalog_id ) dicid
,din
,decode( din_mapping_grp_flag, 1, dist_org_id, distributor_id ) distid
from dist_item_catalog
group by din
,decode( din_mapping_grp_flag, 1, dist_org_id, distributor_id )
)

select 'Available' type
,decode( dic.din_mapping_grp_flag, 1, dic.dist_org_id, dic.distributor_id ) distributor_id
,decode( dic.din_mapping_grp_flag, 1, do.distributor_name, dc.distributor_name ) distributor_name
,dic.din
,dic.active_flag
,dic.dist_item_source
,dic.brand
,dic.description
,dic.dist_category_id
,dic.category_name
,dic.dist_sub_category_id
,dic.subcategory_name
,dic.pack_size
,dic.dist_mfr_name
,dic.dist_mfr_id

from dist_item_catalog dic
,dic2
,distributor do
,distributor dc

where dic2.dicid = dic.dist_item_catalog_id
and dic.dist_org_id = do.distributor_id
and dic.distributor_id = dc.distributor_id
and dic.active_flag = 1
and not exists ( select 1
from dist_term_set_item dtsi
where dtsi.dist_item_catalog_id =
dic.dist_item_catalog_id );

August 17, 2012 - 12:20 pm UTC

do you have a plan

and are the estimated cardinalities in the plan close to being correct? use your knowledge of the data to answer that.

Book?

Nelson, August 10, 2012 - 12:08 pm UTC

So Tom... 10 years later.. did you finally get a chance to publish a book on analytics?

Plan

Tony Fernandez, August 17, 2012 - 1:47 pm UTC

Tom,

Here is my explain for two questions ago in current thread:
Execution Plan
----------------------------------------------------------

-----------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost |

-----------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1652K| 623M| 86663 |

| 1 | HASH JOIN | | 1652K| 623M| 86663 |

| 2 | TABLE ACCESS FULL | DISTRIBUTOR | 5673 | 238K| 58 |

| 3 | HASH JOIN | | 1652K| 556M| 86585 |

| 4 | TABLE ACCESS FULL | DISTRIBUTOR | 5673 | 238K| 58 |

| 5 | HASH JOIN | | 1652K| 488M| 86506 |

| 6 | VIEW | | 1652K| 20M| 17601 |

| 7 | HASH GROUP BY | | 1652K| 85M| 17601 |

| 8 | TABLE ACCESS FULL | DIST_ITEM_CATALOG | 1652K| 85M| 17418 |

| 9 | HASH JOIN RIGHT ANTI | | 1652K| 467M| 42682 |

| 10 | INDEX FAST FULL SCAN| DIST_TERM_SET_ITEM_IDX | 1056K| 6192K| 476 |

| 11 | TABLE ACCESS FULL | DIST_ITEM_CATALOG | 1652K| 458M| 17517 |

-----------------------------------------------------------------------------------

and Statistics:
Statistics
----------------------------------------------------------
107 recursive calls
0 db block gets
47967 consistent gets
5140 physical reads
0 redo size
25155 bytes sent via SQL*Net to client
507 bytes received via SQL*Net from client
15 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
209 rows processed

Cardinalities are fine.

Thanks,

August 17, 2012 - 3:37 pm UTC

no they aren't.

we estimated 1.6 MILLION ROWS as output.

you got 209

that sure doesn't add up.

can you run the query with gather_plan_statistics in sqlplus:
http://jonathanlewis.wordpress.com/2006/11/09/dbms_xplan-in-10g/

or get a tkprof and merge the actual row counts with the estimated and give us a report that shows the plan - then estimated - then actual rows.

Analytics - considering a flag in both senarios

ARC, October 02, 2012 - 3:59 am UTC

Hi Tom,
Please help me with analytics.

Table Data:
SNO FLG AMT
1 NPI 1000
2 NPI 2000
3 AFM 1400
4 AFM 1500
5 AFM 1700
6 NFM 500
7 NFM 700
8 NPI 8000
9 NPI 6000
10 OTH 9000

In the above data I have to consider NFM flag records under both NPI and AFM records. When I select NPI the amounts under NPI and NFM should sum up, When I select AFM the amounts under AFM and NFM should sum up.
I tried with below SQL, but it is working for one condition either NPI or AFM. I'm trying to achive in a single query without any filters on the FLAG.
==========
SELECT case when flg in ('NPI','NFM') then 'NPI' when flg in ('AFM','NFM') then 'AFM' else FLG end SSS,
sum(amt) amt
FROM TEST_FLG
group by case when flg in ('NPI','NFM') then 'NPI' when flg in ('AFM','NFM') then 'AFM' else FLG end
==========
My desired out put is
NPI 18200
AFM 5800
OTH 9000

Plese help me in correcting query. Thanks in advance.

-ARC

October 09, 2012 - 11:34 am UTC

no create
no inserts
no look

no promises either, i really didn't look at this

rebisco, October 10, 2012 - 2:29 am UTC

Hello ARC,

I'm not so sure if this is what you wanted but, for me it can be solved using UNION ALL and a few filters?

SQL> CREATE TABLE TEST AS
  2  SELECT 1 SNO, 'NPI' FLG, 1000 AMT FROM DUAL
  3  UNION ALL
  4  SELECT 2 SNO, 'NPI' FLG, 2000 AMT FROM DUAL
  5  UNION ALL
  6  SELECT 3 SNO, 'AFM' FLG, 1400 AMT FROM DUAL
  7  UNION ALL
  8  SELECT 4 SNO, 'AFM' FLG, 1500 AMT FROM DUAL
  9  UNION ALL
 10  SELECT 5 SNO, 'AFM' FLG, 1700 AMT FROM DUAL
 11  UNION ALL
 12  SELECT 6 SNO, 'NFM' FLG, 500 AMT FROM DUAL
 13  UNION ALL
 14  SELECT 7 SNO, 'NFM' FLG, 700 AMT FROM DUAL
 15  UNION ALL
 16  SELECT 8 SNO, 'NPI' FLG, 8000 AMT FROM DUAL
 17  UNION ALL
 18  SELECT 9 SNO, 'NPI' FLG, 6000 AMT FROM DUAL
 19  UNION ALL
 20  SELECT 10 SNO, 'OTH' FLG, 9000 AMT FROM DUAL;

Table created

SQL> SELECT 'NFM' flag, SUM(amt)amt FROM TEST WHERE flg IN ('NPI','AFM')
  2  UNION ALL
  3  SELECT 'NPI' flag, SUM(amt)amt FROM TEST WHERE flg IN ('NPI','NFM')
  4  UNION ALL
  5  SELECT 'AFM' flag, SUM(amt)amt FROM TEST WHERE flg IN ('AFM','NFM')
  6  UNION ALL
  7  SELECT 'OTH' flag, SUM(amt) amt FROM TEST WHERE flg NOT IN ('AFM','NPI','NFM');

FLAG                                    AMT
-------------------------------- ----------
NFM                                   21600
NPI                                   18200
AFM                                    5800
OTH                                    9000

Printing range with comma separated values

Arvind, December 06, 2012 - 12:01 am UTC

Dear Tom,

In this question, user wants the data to be categorised in conitnous range. But I have some what same but bit different requirement as given below.

I have table and data as given below
create table ag_t1
(col1 varchar2(5),
col2 varchar2(10)
);

insert into ag_t1 values ('01.01', 'v1');
insert into ag_t1 values ('01.02', 'v1');
insert into ag_t1 values ('01.03', 'v1');
insert into ag_t1 values ('01.04', 'v2');
insert into ag_t1 values ('01.05', 'v1');
insert into ag_t1 values ('01.06', 'v2');
insert into ag_t1 values ('02.01', 'v1');
insert into ag_t1 values ('02.02', 'v1');
insert into ag_t1 values ('02.03', 'v2');
insert into ag_t1 values ('02.04', 'v2');
insert into ag_t1 values ('02.05', 'v1');
insert into ag_t1 values ('02.06', 'v2');
insert into ag_t1 values ('08.01', 'v1');
insert into ag_t1 values ('08.02', 'v1');
insert into ag_t1 values ('08.03', 'v1');
insert into ag_t1 values ('08.04', 'v2');
insert into ag_t1 values ('08.05', 'v1');
insert into ag_t1 values ('08.06', 'v2');
I want the output as given below.
Output
v1 - 01.01 to 01.03, 01.05, 02.01 to 02.02, 02.05, 08.01 to 08.03, 08.05
v2 - 01.04, 01.06, 02.03, 02.04, 02.06, 08.04, 08.06

December 14, 2012 - 1:57 pm UTC

why isn't 2.03, 2.04 using "to" like 2.01 to 2.02 did.

define this better.

always 2 digit number "." 2 digit number?

always collapse out consecutive numbers using "to"?

Printing range with comma separated values

Rajeshwaran, Jeyabal, December 15, 2012 - 7:19 am UTC

drop table t purge;

create table t
(col1  number(5,2),
 col2  varchar2(10)
 );
 
 insert into t values (01.01, 'v1');
 insert into t values (01.02, 'v1');
 insert into t values (01.03, 'v1');
 insert into t values (01.04, 'v2');
 insert into t values (01.05, 'v1');
 insert into t values (01.06, 'v2');
 insert into t values (02.01, 'v1');
 insert into t values (02.02, 'v1');
 insert into t values (02.03, 'v2');
 insert into t values (02.04, 'v2');
 insert into t values (02.05, 'v1');
 insert into t values (02.06, 'v2');
 insert into t values (08.01, 'v1');
 insert into t values (08.02, 'v1');
 insert into t values (08.03, 'v1');
 insert into t values (08.04, 'v2');
 insert into t values (08.05, 'v1');
 insert into t values (08.06, 'v2');
 commit;

rajesh@ORA11G> select col2,min(col1) min_col1,max(col1) max_col1,
  2      case when nullif(min(col1),max(col1)) is not null
  3              then min(col1)||' to '||max(col1)
  4     else to_char(min(col1)) end reslt
  5  from (
  6  select col1,col2,max(val) over(order by col1,col2) grp
  7  from (
  8  select t.*,
  9    case when lag(col2) over(order by col1,col2) is null
 10    or lag(col2) over(order by col1,col2) <> col2 then
 11    row_number() over(order by col1,col2) end as val
 12  from t
 13       )
 14       )
 15  group by col2,grp
 16  order by col2
 17  /

COL2         MIN_COL1   MAX_COL1 RESLT
---------- ---------- ---------- --------------------
v1               1.01       1.03 1.01 to 1.03
v1               1.05       1.05 1.05
v1               2.01       2.02 2.01 to 2.02
v1               2.05       2.05 2.05
v1               8.01       8.03 8.01 to 8.03
v1               8.05       8.05 8.05
v2               1.04       1.04 1.04
v2               1.06       1.06 1.06
v2               2.03       2.04 2.03 to 2.04
v2               2.06       2.06 2.06
v2               8.04       8.04 8.04
v2               8.06       8.06 8.06

12 rows selected.

Elapsed: 00:00:00.01
rajesh@ORA11G> select col2,listagg(reslt,',') within group(order by min_col1) as listed_values
  2  from (
  3  select col2,min(col1) min_col1,max(col1) max_col1,
  4      case when nullif(min(col1),max(col1)) is not null
  5              then min(col1)||' to '||max(col1)
  6     else to_char(min(col1)) end reslt
  7  from (
  8  select col1,col2,max(val) over(order by col1,col2) grp
  9  from (
 10  select t.*,
 11    case when lag(col2) over(order by col1,col2) is null
 12    or lag(col2) over(order by col1,col2) <> col2 then
 13    row_number() over(order by col1,col2) end as val
 14  from t
 15       )
 16       )
 17  group by col2,grp
 18      )
 19  group by col2
 20  /

COL2       LISTED_VALUES
---------- ------------------------------------------------------------
v1         1.01 to 1.03,1.05,2.01 to 2.02,2.05,8.01 to 8.03,8.05
v2         1.04,1.06,2.03 to 2.04,2.06,8.04,8.06

Elapsed: 00:00:00.03
rajesh@ORA11G>

December 18, 2012 - 7:41 am UTC

you did not produce their requested output, they did not provide sufficient information to actually answer the question....

Muhammad Bilal, December 31, 2012 - 4:44 am UTC

I have the following dataset:

CREATE TABLE RoomGroups
(group_id NUMBER,
group_name VARCHAR2(30),
group_capacity NUMBER);

INSERT INTO RoomGroups VALUES(1, 'SEECS UG Block', 100);
INSERT INTO RoomGroups VALUES(2, 'IAEC Building', 70);
INSERT INTO RoomGroups VALUES(3, 'RIMMS Building', 90);
Commit;

CREATE TABLE DegreeBatches
(batch_id NUMBER, batch_name VARCHAR2(30), batch_strength NUMBER);

INSERT INTO DegreeBatches VALUES(10,'BIT-10',35);
INSERT INTO DegreeBatches VALUES(20,'BIT-11',40);
INSERT INTO DegreeBatches VALUES(30,'BSCS-2',35);
INSERT INTO DegreeBatches VALUES(40,'BSCS-3',40);
INSERT INTO DegreeBatches VALUES(50,'BEE-4',50);
INSERT INTO DegreeBatches VALUES(60,'BICSE-7',25);
INSERT INTO DegreeBatches VALUES(70,'BESE-3',30);
Commit;

I want to achieve the following through single SQL statement:
1. Get all possible sets of DegreeBatches for each group - in such a way that for each set the SUM(batch_strength) <= Capacity of that group as show below.
GroupId Batches StrengthStr StrengthTotal Capacity
1. {BIT-10, BIT-11, BICSE-7} {35, 40, 25} 100 100
1. {BSCS-2, BSCS-3, BICSE-7} {35, 40, 25} 100 100
1. {BEE-4, BSCS-3} {50, 40} 90 100
...
...
2. {BIT-10, BSCS-2} {35, 35} 70 70
2. {BSCS-3, BESE-3} {40, 30} 70 70
2. {BIT-11, BICSE-7} {40, 25} 65 70
and so on...

2. Next I want to retrieve only non-overlapping sets based on minimum value of Capacity-StrengthTotal for each group
GroupId Batches StrengthStr StrengthTotal Capacity
1. {BIT-10, BIT-11, BICSE-7} {35, 40, 25} 100 100
2. {BSCS-3, BESE-3} {40, 30} 70 70
3. {BSCS-2, BEE-4} {35, 50} 85 90

3. The final result must have all the batches in it...
Which in this case is true...

Any help will be highly appreciated...
Thanks
Bilal

How to squash/combine rows around current row

Matt, January 08, 2013 - 11:30 am UTC

Hi Tom

I am trying to figure out how to combine similar, connected rows. I think I need analytics but all my attempts are meeting failure.

drop table mtemp;
create table mtemp (startdt date,enddt date, ra number, nqc number);

insert into mtemp values ('1-JAN-2013','2-JAN-2013',1,2);
insert into mtemp values ('2-JAN-2013','3-JAN-2013',1,2);
insert into mtemp values ('3-JAN-2013','4-JAN-2013',1,2);
insert into mtemp values ('4-JAN-2013','5-JAN-2013',1000,2000);
insert into mtemp values ('5-JAN-2013','6-JAN-2013',1,2);
commit;

what I need is a report that "squashes" the first three records where ra and nqc values are the same into one row. Then the next record where the values change to 1000 and 2000 and then a third record where the values go back to their initial values.

The output I would like is:

startdt enddt ra nqc
-------- -------- ------- --------
1-1-2013 1-4-2013 1 2 <three records "squashed"
1-4-2013 1-5-2013 1000 2000
1-5-2013 1-6-2013 1 2 <back to initial vals, need as SEPARATE record

I thought this was a simple analytic function using first_value and last_value. But the fact that the partition by values "return" to something previous is the troublesome portion.

select first_value(startdt) over (partition by ra,nqc order by startdt) minstart
, last_value(enddt) over (partition by ra,nqc order by startdt) maxend
, ra
, nqc
from mtemp

But with that I get

MINSTART MAXEND RA NQC
1/1/2013 1/2/2013 1 2
1/1/2013 1/3/2013 1 2
1/1/2013 1/4/2013 1 2
1/1/2013 1/6/2013 1 2
1/4/2013 1/5/2013 1000 2000

I have also tried various windowing with RANGE BETWEEN and ROWS BETWEEN but I am not finding good examples and I am failing.

Thanks so much for your help over the years.
matt

January 14, 2013 - 11:53 am UTC

ops$tkyte%ORA11GR2> select min(startdt), max(enddt), min(ra), min(nqc)
  2    from (
  3  select startdt, enddt, ra, nqc, last_value(tag ignore nulls) over (order by startdt) grp
  4    from (
  5  select startdt, enddt, ra, nqc,
  6         case when decode( lag(ra) over (order by startdt), ra, 1,0 ) = 0
  7                       and
  8                   decode( lag(nqc) over (order by startdt), nqc, 1,0 ) = 0
  9                          then row_number() over (order by startdt)
 10                  end tag
 11    from mtemp
 12         )
 13             )
 14   group by grp
 15   order by grp
 16  /

MIN(START MAX(ENDDT    MIN(RA)   MIN(NQC)
--------- --------- ---------- ----------
01-JAN-13 04-JAN-13          1          2
04-JAN-13 05-JAN-13       1000       2000
05-JAN-13 06-JAN-13          1          2

is one approach.

the inner most query tags the beginning of each group - based on whether the last ra/nqc are the same:

ops$tkyte%ORA11GR2> select startdt, enddt, ra, nqc,
  2                   lag(ra) over (order by startdt) lrq,
  3                   lag(nqc) over (order by startdt) lnqc,
  4                   decode( lag(ra) over (order by startdt), ra, 1,0 ) dra,
  5                   decode( lag(nqc) over (order by startdt), nqc, 1,0 ) dnqc,
  6         case when decode( lag(ra) over (order by startdt), ra, 1,0 ) = 0
  7                       and
  8                   decode( lag(nqc) over (order by startdt), nqc, 1,0 ) = 0
  9                          then row_number() over (order by startdt)
 10                  end tag
 11    from mtemp
 12  /

STARTDT   ENDDT             RA        NQC        LRQ       LNQC        DRA       DNQC        TAG
--------- --------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
01-JAN-13 02-JAN-13          1          2                                0          0          1
02-JAN-13 03-JAN-13          1          2          1          2          1          1
03-JAN-13 04-JAN-13          1          2          1          2          1          1
04-JAN-13 05-JAN-13       1000       2000          1          2          0          0          4
05-JAN-13 06-JAN-13          1          2       1000       2000          0          0          5

the second level carries down this value for us:

ops$tkyte%ORA11GR2> select startdt, enddt, ra, nqc, last_value(tag ignore nulls) over (order by startdt) grp
  2    from (
  3  select startdt, enddt, ra, nqc,
  4         case when decode( lag(ra) over (order by startdt), ra, 1,0 ) = 0
  5                       and
  6                   decode( lag(nqc) over (order by startdt), nqc, 1,0 ) = 0
  7                          then row_number() over (order by startdt)
  8                  end tag
  9    from mtemp
 10         )
 11  /

STARTDT   ENDDT             RA        NQC        GRP
--------- --------- ---------- ---------- ----------
01-JAN-13 02-JAN-13          1          2          1
02-JAN-13 03-JAN-13          1          2          1
03-JAN-13 04-JAN-13          1          2          1
04-JAN-13 05-JAN-13       1000       2000          4
05-JAN-13 06-JAN-13          1          2          5

and then we just aggregate.

Question asked by Matt - bottom of page

Artieboy, January 15, 2013 - 12:33 am UTC

As you can tell I'm a novice, but would this work Oracle?

select unique startdt_new, enddt_new, ra, nqc
from
(--3
select startdt, enddt, ra, nqc, startdt_new,
last_value(enddt) over (partition by startdt_new) enddt_new from

(--2
select startdt, enddt, ra, nqc,
case when ra = lead(ra) over(partition by ra, nqc order by startdt) and nqc = lead(nqc) over(partition by ra, nqc order by startdt)
then c2 else startdt end startdt_new

from
(--1
select startdt, enddt, ra, nqc,
min(startdt) over(partition by ra, nqc) c2
from mtemp
)--1
)--2
)--3
order by startdt_new

CHEERS!

January 15, 2013 - 10:34 am UTC

explain your logic, like I did mine. reverse engineering what someone was thinking can be error prone ;)

it doesn't work in general, no

ops$tkyte%ORA11GR2> select * from mtemp order by startdt;

STARTDT   ENDDT             RA        NQC
--------- --------- ---------- ----------
01-JAN-13 02-JAN-13          1          2
02-JAN-13 03-JAN-13          1          2
03-JAN-13 04-JAN-13          1          2
04-JAN-13 05-JAN-13       1000       2000
05-JAN-13 06-JAN-13          1          2
07-JAN-13 08-JAN-13       5000       6000
09-JAN-13 10-JAN-13          1          2

7 rows selected.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> select unique startdt_new, enddt_new, ra, nqc
  2  from
  3  (--3
  4  select startdt, enddt, ra, nqc, startdt_new,
  5  last_value(enddt) over (partition by startdt_new) enddt_new from
  6  (--2
  7  select startdt, enddt, ra, nqc,
  8  case when ra = lead(ra) over(partition by ra, nqc order by startdt) and nqc =
  9  lead(nqc) over(partition by ra, nqc order by startdt)
 10  then c2 else startdt end startdt_new
 11  from
 12  (--1
 13  select startdt, enddt, ra, nqc,
 14  min(startdt) over(partition by ra, nqc) c2
 15  from mtemp
 16  )--1
 17  )--2
 18  )--3
 19  order by startdt_new
 20  /

STARTDT_N ENDDT_NEW         RA        NQC
--------- --------- ---------- ----------
01-JAN-13 06-JAN-13          1          2
04-JAN-13 05-JAN-13       1000       2000
07-JAN-13 08-JAN-13       5000       6000
09-JAN-13 10-JAN-13          1          2

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> select min(startdt), max(enddt), min(ra), min(nqc)
  2    from (
  3  select startdt, enddt, ra, nqc, last_value(tag ignore nulls) over (order by startdt) grp
  4    from (
  5  select startdt, enddt, ra, nqc,
  6         case when decode( lag(ra) over (order by startdt), ra, 1,0 ) = 0
  7                       and
  8                   decode( lag(nqc) over (order by startdt), nqc, 1,0 ) = 0
  9                          then row_number() over (order by startdt)
 10                  end tag
 11    from mtemp
 12         )
 13             )
 14   group by grp
 15   order by grp
 16  /

MIN(START MAX(ENDDT    MIN(RA)   MIN(NQC)
--------- --------- ---------- ----------
01-JAN-13 04-JAN-13          1          2
04-JAN-13 05-JAN-13       1000       2000
05-JAN-13 06-JAN-13          1          2
07-JAN-13 08-JAN-13       5000       6000
09-JAN-13 10-JAN-13          1          2

Approach to How to squash/combine rows

Matt, January 15, 2013 - 12:13 pm UTC

Tom,
I like your approach. We ended up solving a different way that appears to work, though. It relies on two "counters" using row_number(). The first creates a running counter and the other a counter for like rows - rows with the same RA and NQC. Subtract one from the other and you get a number that is unique for the grouping.

Do you see anything wrong with this:

-- first the inner query showing the counters:
SELECT startdt, enddt, ra, nqc
, ROW_NUMBER() OVER(ORDER BY startdt, enddt) grpa
, ROW_NUMBER() OVER(PARTITION BY ra, nqc ORDER BY startdt, enddt) grpb
, ROW_NUMBER() OVER(ORDER BY startdt, enddt) - ROW_NUMBER() OVER(PARTITION BY ra, nqc ORDER BY startdt, enddt) grp
FROM mtemp;

STARTDT ENDDT RA NQC GRPA GRPB GRP
--------- --------- ---------- ---------- ---------- ---------- ----------
01-JAN-13 02-JAN-13 1 2 1 1 0
02-JAN-13 03-JAN-13 1 2 2 2 0
03-JAN-13 04-JAN-13 1 2 3 3 0
04-JAN-13 05-JAN-13 1000 2000 4 1 3
05-JAN-13 06-JAN-13 1 2 5 4 1

Now we aggregate and remove the counting/grouping

SELECT MIN(startdt) startdt, MAX(enddt) enddt, ra, nqc
FROM (SELECT startdt, enddt, ra, nqc
, ROW_NUMBER() OVER(ORDER BY startdt, enddt) grpa
, ROW_NUMBER() OVER(PARTITION BY ra, nqc ORDER BY startdt, enddt) grpb
, ROW_NUMBER() OVER(ORDER BY startdt, enddt) - ROW_NUMBER() OVER(PARTITION BY ra, nqc ORDER BY startdt, enddt) grp
FROM mtemp)
GROUP BY ra, nqc, grp
ORDER BY startdt, enddt;

STARTDT ENDDT RA NQC
--------- --------- ---------- ----------
01-JAN-13 04-JAN-13 1 2
04-JAN-13 05-JAN-13 1000 2000
05-JAN-13 08-JAN-13 1 2

My colleague found this while researching this as a "Gaps and Islands" problem.

Thanks again.

January 15, 2013 - 2:33 pm UTC

interesting approach - it looks to be roughly equivalent workwise - same number of window sort operations, they both full scan - window sort - window sort - aggregate.

I'll have to file that one away in my head for future reference :)

I've used what I call the "carry down" technique (my approach above) so so so many times now - it just types itself anymore...

Re: Approach to How to squash/combine rows

Narendra, January 17, 2013 - 3:19 am UTC

Matt,

Not sure if you (or your collegues) have come across this but an Oracle ACE member has demonstrated it and also named it as "Tabibitosan method". See his thread https://forums.oracle.com/forums/thread.jspa?messageID=3989177&tstart=0#3989177

..question

A Reader, February 05, 2013 - 6:21 am UTC

Hello Tom,
  

SQL> create table t1 ( x  varchar2(31));

Table created.

SQL>insert into t1 values('zzz90000(Zh900000)xyz~0001234');
SQL>insert into t1 values('zzz90000(7i900000)xyz~0001234');
SQL>insert into t1 values('zzz90000(2i900000)xyz~0001234');
SQL>insert into t1 values('zzz80000(7i900000)xyz~0001234');
SQL>insert into t1 values('zzz70000(Ve900000)xyz~0001234');
SQL>insert into t1 values('zzz70000(2f900000)xyz~0001234');
SQL>insert into t1 values('zzz60000(Zh900000)xyz~0001234');
SQL>insert into t1 values('zzz60000(Ve900000)xyz~0001234');
SQL>insert into t1 values('zzz60000(ui900000)xyz~0001234');
SQL>insert into t1 values('zzz60000(Ui900000)xyz~0001234');

man@ora10g:rac1> select * from t1 order by 1;

X
-------------------------------
zzz60000(Ui900000)xyz~0001234
zzz60000(Ve900000)xyz~0001234
zzz60000(Zh900000)xyz~0001234
zzz60000(ui900000)xyz~0001234
zzz70000(2f900000)xyz~0001234
zzz70000(Ve900000)xyz~0001234
zzz80000(7i900000)xyz~0001234
zzz90000(2i900000)xyz~0001234
zzz90000(7i900000)xyz~0001234
zzz90000(Zh900000)xyz~0001234

10 rows selected.


man@ora10g:rac1> select max(x) from t1;

MAX(X)
-------------------------------
zzz90000(Zh900000)xyz~0001234

man@ora10g:rac1> select * from t1 where x>'zzz70000(Ve900000)xyz~0001234';

X
-------------------------------
zzz90000(Zh900000)xyz~0001234
zzz90000(7i900000)xyz~0001234
zzz90000(2i900000)xyz~0001234
zzz80000(7i900000)xyz~0001234

 
 Question.
 
 I am interested in finding the result for top 4 values  with the given info of max(x).
 
 Had x been a number data type...
 I would have written
 I know using analytics it can be done.. but for some reasons I want to use max(x) .. 
 so I will write 
 select * from t1 where x > ( select ( max(x) - 4) from t1 ) ;
 
 how to select something similar in above case ( when we have x as varchar2 and with the 10 rows as shown ) 
  select * from t1 where x> ( select ( max(x)  - ??)  from t1);

February 06, 2013 - 8:24 am UTC

you should write this as

select * from (select * from t order by x DESC) where rownum <= 4;

in both cases. with your number case - you were assuming gap free numbers, that isn't going to be true in real life.

to get the top for - use a top-n query.

http://docs.oracle.com/cd/E11882_01/server.112/e26088/pseudocolumns009.htm#sthref66


ops$tkyte%ORA11GR2> select * from (select * from t1 order by x DESC) where rownum <= 4;

X
-------------------------------
zzz90000(Zh900000)xyz~0001234
zzz90000(7i900000)xyz~0001234
zzz90000(2i900000)xyz~0001234
zzz80000(7i900000)xyz~0001234

ops$tkyte%ORA11GR2>

Help with this one

Doug, February 06, 2013 - 10:21 pm UTC

Hi Tom,

Here is my query

With tbs_name as (select tablespace_name 
                  from dba_tablespaces 
                  where tablespace_name = UPPER ('ADMIN_INDEX') )
SELECT   'free space' owner, '      ' OBJECT, file_id, block_id, blocks,
         (blocks * 8192) / (1024 * 1024) meg
    FROM dba_free_space a, Tbs_name
   WHERE a.tablespace_name = tbs_name.tablespace_name
UNION
SELECT   SUBSTR (owner, 1, 20), SUBSTR (segment_name, 1, 32), file_id,
         block_id, blocks,
         (blocks * 8192) / (1024 * 1024) meg
    FROM dba_extents a, tbs_name
   WHERE a.tablespace_name = tbs_name.tablespace_name
ORDER BY 3, 4;

I would like to take the output of this and summarise or
consolidation into contiguous space.

ie if output was

OWNER     OBJECT  FILE_ID   BLOCK_ID   BLOCKS MEG
---------------------------------------------------
USER       IDX_1      2         9        128      1
USER       IDX_2      2       137        128      1
USER       IDX_2      2       265        128      1
USER       IDX_2      2       393        128      1
free space            2       521        128      1
free space            2       649        128      1
USER       IDX_2      2       777        128      1

Then I would want to roll up block Ids 137, 265 and 393
as as contigous space for IDX_2 and also roll up the freepace to report it as total of contigous space as well

I know defrags are normally a waste of time , my goal is
is to indentify indexes I can rebuild to free up free space.
I dropped 400Gb of indexes that were not being used and the space is available within the index tablespace but I need it
for other purposes
Regards
Doug

February 07, 2013 - 6:46 am UTC

I dropped 400Gb of indexes that were not being used and the space is available within the index tablespace but I need it
for other purposes

great, so use it - why do you need to do anything to use this space? it is already 100% useable???????



ops$tkyte%ORA11GR2> with tbs_name
  2  as
  3  (select tablespace_name
  4     from dba_tablespaces
  5    where tablespace_name = UPPER ('USERS')
  6  ),
  7  space
  8  as
  9  (
 10  SELECT 'free space' owner,
 11         '      ' OBJECT,
 12             file_id,
 13             block_id,
 14             blocks
 15    FROM dba_free_space a, Tbs_name
 16   WHERE a.tablespace_name = tbs_name.tablespace_name
 17   UNION ALL
 18  SELECT SUBSTR (owner, 1, 20),
 19         SUBSTR (segment_name, 1, 32), file_id,
 20         block_id,
 21             blocks
 22    FROM dba_extents a, tbs_name
 23   WHERE a.tablespace_name = tbs_name.tablespace_name
 24  )
 25  select owner, object, file_id, min(block_id), sum(blocks), count(*) cnt
 26    from (
 27  select owner,
 28         object,
 29             file_id,
 30             block_id,
 31             blocks,
 32             last_value( grp ignore nulls ) over (partition by owner, object, file_id order by block_id ) lgrp
 33    from (
 34  select owner,
 35         object,
 36             file_id,
 37             block_id,
 38             blocks,
 39             case when nvl( lag(block_id+blocks) over (partition by owner, object, file_id order by block_id), block_id+blocks) <> block_id
 40                  then row_number() over (partition by owner, object, file_id order by block_id)
 41                  end grp
 42    from space
 43         )
 44             )
 45   group by owner, object, file_id, lgrp
 46   order by owner, object, file_id
 47  /

OWNER      OBJECT                  FILE_ID MIN(BLOCK_ID) SUM(BLOCKS)        CNT
---------- -------------------- ---------- ------------- ----------- ----------
OPS$TKYTE  FY_CAL                        4           160           8          1
OPS$TKYTE  SYS_C0046904                  4           200           8          1
OPS$TKYTE  SYS_C0046904                  4          8872           8          1
OPS$TKYTE  SYS_C0046905                  4           248           8          1
OPS$TKYTE  SYS_C0046905                  4           280           8          1
OPS$TKYTE  SYS_C0046905                  4           328           8          1
OPS$TKYTE  SYS_C0046905                  4          1232           8          1
OPS$TKYTE  SYS_C0046905                  4          1264           8          1
OPS$TKYTE  SYS_C0046905                  4          8840           8          1
OPS$TKYTE  SYS_C0046905                  4          8856           8          1
OPS$TKYTE  SYS_C0046905                  4          9088         128          1
OPS$TKYTE  SYS_C0046905                  4         16640           8          1
OPS$TKYTE  SYS_C0046905                  4         16656           8          1
OPS$TKYTE  SYS_C0046905                  4         16672           8          1
OPS$TKYTE  SYS_C0046905                  4         18944           8          1
OPS$TKYTE  SYS_C0046905                  4         18960           8          1
OPS$TKYTE  SYS_C0046905                  4         18976           8          1
OPS$TKYTE  SYS_C0046905                  4         19032           8          1
OPS$TKYTE  SYS_C0046905                  4         19048           8          1
OPS$TKYTE  SYS_C0046905                  4         19064           8          1
OPS$TKYTE  SYS_C0046907                  4           272           8          1
OPS$TKYTE  SYS_C0046907                  4           296           8          1
OPS$TKYTE  SYS_C0046907                  4           344           8          1
OPS$TKYTE  SYS_C0046907                  4          1240           8          1
OPS$TKYTE  SYS_C0046907                  4          1272           8          1
OPS$TKYTE  SYS_C0046907                  4          8832           8          1
OPS$TKYTE  SYS_C0046907                  4          8848           8          1
OPS$TKYTE  SYS_C0046907                  4          8864           8          1
OPS$TKYTE  SYS_C0046907                  4          9216         128          1
OPS$TKYTE  SYS_C0046907                  4         16648           8          1
OPS$TKYTE  SYS_C0046907                  4         16664           8          1
OPS$TKYTE  SYS_C0046907                  4         16680           8          1
OPS$TKYTE  SYS_C0046907                  4         18952           8          1
OPS$TKYTE  SYS_C0046907                  4         18984           8          1
OPS$TKYTE  SYS_C0046907                  4         19024           8          1
OPS$TKYTE  SYS_C0046907                  4         19040           8          1
OPS$TKYTE  SYS_C0046907                  4         19056           8          1
OPS$TKYTE  T                             4          8880           8          1
OPS$TKYTE  T1                            4           152           8          1
OPS$TKYTE  T2                            4           136           8          1
OPS$TKYTE  TBL_HISTORY                   4         29960           8          1
OPS$TKYTE  TDETAIL_REF_MANY              4           144           8          1
OPS$TKYTE  TDETAIL_REF_MANY              4           168          16          2
OPS$TKYTE  TDETAIL_REF_MANY              4           232          16          2
OPS$TKYTE  TDETAIL_REF_MANY              4           256           8          1
OPS$TKYTE  TDETAIL_REF_MANY              4           288           8          1
OPS$TKYTE  TDETAIL_REF_MANY              4           336           8          1
OPS$TKYTE  TDETAIL_REF_MANY              4          1184           8          1
OPS$TKYTE  TDETAIL_REF_MANY              4          1200           8          1
OPS$TKYTE  TDETAIL_REF_MANY              4          1216           8          1
OPS$TKYTE  TDETAIL_REF_MANY              4          1248           8          1
OPS$TKYTE  TDETAIL_REF_MANY              4          2176         128          1
OPS$TKYTE  TDETAIL_REF_MANY              4          6272         128          1
OPS$TKYTE  TDETAIL_REF_MANY              4          8704         128          1
OPS$TKYTE  TDETAIL_REF_MANY              4         10496         128          1
OPS$TKYTE  TDETAIL_REF_MANY              4         16384         128          1
OPS$TKYTE  TDETAIL_REF_MANY              4         16752           8          1
OPS$TKYTE  TDETAIL_REF_MANY              4         18968           8          1
OPS$TKYTE  TDETAIL_REF_MANY              4         19000          16          2
OPS$TKYTE  TDETAIL_REF_ONE               4           184           8          1
OPS$TKYTE  TDETAIL_REF_ONE               4           216          16          2
OPS$TKYTE  TDETAIL_REF_ONE               4           264           8          1
OPS$TKYTE  TDETAIL_REF_ONE               4           304          24          3
OPS$TKYTE  TDETAIL_REF_ONE               4          1176           8          1
OPS$TKYTE  TDETAIL_REF_ONE               4          1192           8          1
OPS$TKYTE  TDETAIL_REF_ONE               4          1208           8          1
OPS$TKYTE  TDETAIL_REF_ONE               4          1224           8          1
OPS$TKYTE  TDETAIL_REF_ONE               4          1256           8          1
OPS$TKYTE  TDETAIL_REF_ONE               4          2304         128          1
OPS$TKYTE  TDETAIL_REF_ONE               4          8576         128          1
OPS$TKYTE  TDETAIL_REF_ONE               4          8960         128          1
OPS$TKYTE  TDETAIL_REF_ONE               4         16256         128          1
OPS$TKYTE  TDETAIL_REF_ONE               4         16744           8          1
OPS$TKYTE  TDETAIL_REF_ONE               4         16760           8          1
OPS$TKYTE  TDETAIL_REF_ONE               4         18992           8          1
OPS$TKYTE  TDETAIL_REF_ONE               4         19016           8          1
OPS$TKYTE  TDETAIL_REF_ONE               4         27776         128          1
OPS$TKYTE  TMASTER                       4           192           8          1
OPS$TKYTE  TMASTER                       4           208           8          1
OPS$TKYTE  TMASTER                       4           352         824         13
OPS$TKYTE  TMASTER                       4          1280         896          7
OPS$TKYTE  TMASTER                       4          2432        3840         30
OPS$TKYTE  TMASTER                       4          6400        2176          3
OPS$TKYTE  TMASTER                       4          9344        1152          2
OPS$TKYTE  TMASTER                       4         10624        5632          9
OPS$TKYTE  TMASTER                       4         16512         128          1
OPS$TKYTE  TMASTER                       4         16688          56          7
OPS$TKYTE  TMASTER                       4         16768        2176          3
OPS$TKYTE  TMASTER                       4         19072        8704         19
OPS$TKYTE  TMASTER                       4         27904        2048          2
free space                               4           128           8          1
free space                               4          8888          72          9
free space                               4         29952           8          1
free space                               4         29968        1488          1

94 rows selected.

Printing range with comma separated values

Arvind, February 06, 2013 - 11:36 pm UTC

Dear Tom,

This is further information as asked by you in my earlier post. In this question, user wants the data to be categorised in conitnous range. But I have some what
same but bit different requirement as given below.

I have table and data as given below
create table ag_t1
(col1 varchar2(5),
col2 varchar2(10)
);

insert into ag_t1 values ('01.01', 'v1');
insert into ag_t1 values ('01.02', 'v1');
insert into ag_t1 values ('01.03', 'v1');
insert into ag_t1 values ('01.04', 'v2');
insert into ag_t1 values ('01.05', 'v1');
insert into ag_t1 values ('01.06', 'v2');
insert into ag_t1 values ('02.01', 'v1');
insert into ag_t1 values ('02.02', 'v1');
insert into ag_t1 values ('02.03', 'v2');
insert into ag_t1 values ('02.04', 'v2');
insert into ag_t1 values ('02.05', 'v1');
insert into ag_t1 values ('02.06', 'v2');
insert into ag_t1 values ('08.01', 'v1');
insert into ag_t1 values ('08.02', 'v1');
insert into ag_t1 values ('08.03', 'v1');
insert into ag_t1 values ('08.04', 'v2');
insert into ag_t1 values ('08.05', 'v1');
insert into ag_t1 values ('08.06', 'v2');
I want the output as given below.
Output
v1 - 01.01 to 01.03, 01.05, 02.01 to 02.02, 02.05, 08.01 to 08.03, 08.05
v2 - 01.04, 01.06, 02.03, 02.04, 02.06, 08.04, 08.06

col1 values can be in three formats
a) 01.01
b) 01.001
c) 1

February 11, 2013 - 7:58 am UTC

you answered nothing that I asked above actually???

sigh.

if there is a 01.001 why isn't it in your example, where does it fit in. Would it be "contiguous" with 01.01? what does it mean.

if there is a 1, why isn't it in your example where does it fit in. Would it be contiguous with 01?

be precise, pretend you were explaining this to your mom - be that detailed.

arvind's analytic problem.

Biswaranjan, February 07, 2013 - 1:13 pm UTC

Hi Arvind,

I just analyzed your request deeply(initially it was very difficult to understand the logic behind it but finally I
got it totally ;)).

please run the below query and let us know whether it works or not
I tested for your input and some random inputs it works fine,
but plz test it.
#############(tested in 11g)I have just modified rajesh's query to fit your output #######
select col2,listagg(reslt,',') within group(order by min_col1) as listed_values
from (
select col2,min(col1) min_col1,max(col1) max_col1,
case when nullif(min(col1),max(col1)) is not null
then min(col1)||' to '||max(col1)
else to_char(min(col1)) end reslt
from (
select col1,col2,case when col2<> 'v2' then max(val) over(order by rownum) else val end grp
from (
select ag_t1.*,
case when lag(col2) over(order by rownum) is null
or lag(col2) over(order by rownum) <> col2 or col2<>'v1' then
row_number() over(order by rownum) end as val
from ag_t1
)
)
group by col2,grp
)
group by col2;
############################
for you data I got the result as below.

v1 - 01.01 to 01.03, 01.05, 02.01 to 02.02, 02.05, 08.01 to 08.03, 08.05
v2 - 01.04, 01.06, 02.03, 02.04, 02.06, 08.04, 08.06

But I know tom can write it more beautifully(regarding performance ).

regards,
Biswaranjan.

February 11, 2013 - 8:50 am UTC

you got it totally?

where does 1 and 01.001 fit into the scheme of things? what would your code do with that?

it is not possible for you to have gotten it totally, it was not specified at a level of detail for anyone to get it right without guessing at what should happen with those values.

slightly different

A reader, February 07, 2013 - 4:22 pm UTC

Hi Tom,

I would like to reclaim the space because the system does not neet 400Gb of free space in the index tablespace.
It does however need disk space to turn on archive log mode and currently the system is at full capacity, no place to add more disks.

I don't think I explained my requirement very well sorry however I liked seeing your reply.

What I am trying to do is work out which indexes should I
rebuild to try and get the free space towards the end of a data file do I can shrink it.

Regards Doug

February 11, 2013 - 9:04 am UTC

oh, the way to do this would be to rebuild all of the indexes in that tablespace into a new tablespace and then drop the old, now empty, tablespace.

the new tablespace will be 100% contiguous (use an autoextend datafile that starts "small" and grows as you put more in there) and the old tablespace can just be dropped.

If you do it by identifying (via dba_extents) what indexes are at the "end" of your datafile - you can move the furthest out index first, shrinking the old datafile as you are growing the new one.

and if you want, just rename the new tablespace to the old name after you are done.

Arvind's question logic.

Biswaranjan., February 08, 2013 - 12:51 am UTC

for Arvinda's question logic.

('01.01', 'v1')
('01.02', 'v1')
('01.03', 'v1')
('01.04', 'v2')
('01.05', 'v1')
('01.06', 'v2')
('02.01', 'v1')
('02.02', 'v1')
('02.03', 'v2')
('02.04', 'v2')
('02.05', 'v1')
('02.06', 'v2')
('08.01', 'v1')
('08.02', 'v1')
('08.03', 'v1')
('08.04', 'v2')
('08.05', 'v1')
('08.06', 'v2')

v1- 01.01 to 01.03(coz it is continuos upto 01.03,so skipped 01.02),01.05
(though it has no v1 values above and below it we will keep it as it ),
now coming down 02.01 to 02.0(coz this v1 values starts from 02.01 and continuos upto 02.02),
02.05(though it has no v1 values above and below it we will keep it as it between commas),
now coming down 08.01 to 08.03(coz this v1 values starts from 08.01 and continuos upto 08.03),
and finally 08.05 (though it has no v1 values above and below it we will keep it as it).
so with out comment it would be as below for 'v1'
01.01 to 01.03, 01.05, 02.01 to 02.02, 02.05, 08.01 to 08.03, 08.05

And for v2 we should not see anything.place each and every value for v2 as comma separated.
so simply as below.
01.04, 01.06, 02.03, 02.04, 02.06, 08.04, 08.06

That is all the logic ,what I understood.

thanks & regards,
Biswaranjan.

:) you opened my eyes further.

Biswaranjan, February 11, 2013 - 9:28 am UTC

frankly speaking, I Just guessed that logic(as you know :)).I should not have written like "got it totally".

But I know guessing might be disastrous and wasting of time
in many case.

Thank you Tom.

By the way ,
I have been using this site ,since 3 years but never
faced any error opening this site(I mean clicking any link on your homepage).
But Now I am getting the below error.
######
ORA-01654: unable to extend index ASKTOM.ATE_QUESTION_LOG_IDX by 8192 in tablespace FLOW_19246
##############
I even tried from linux machine but samething happend.
By accessing many times I was able to open and writing now :).

Is it some issue with your site or My system problem (am confused).

Thanks agian.

February 11, 2013 - 10:12 am UTC

sorry about that, I ran out of space on apex.oracle.com and had to up my quota.

Continuation from Matt’s request on 1/15/2013

Artieboy, February 19, 2013 - 6:19 pm UTC

Hey Tom:

Thanks for letting me know my query did not work. So how about this? It’s kind of like your carry down approach but it looks for the first value instead.

First I added in some extra rows after Matt’s original records and the two records you added:

INSERT INTO mtemp VALUES('11-jan-2013', '12-jan-2013', 1, 2);
INSERT INTO mtemp VALUES('13-jan-2013', '14-jan-2013', 1, 2);
INSERT INTO mtemp VALUES('15-jan-2013', '16-jan-2013', 5000, 6000);
INSERT INTO mtemp VALUES('17-jan-2013', '18-jan-2013', 1, 2);

Query:

select min(startdt), max(enddt), ra, nqc
from

(--3
select startdt, enddt, ra, nqc,
case when c1 is null then first_value(c1 ignore nulls)
over(order by startdt rows between current row and unbounded following)-1 else c1 end grp
from

(--2
select startdt, enddt, ra, nqc, lag_ra, lead_ra,
case when (ra=lag_ra or ra=lead_ra) AND (nqc=lag_nqc or nqc=lead_nqc)
then null else startdt end c1
from

(--1
select startdt, enddt, ra,
lag(ra) over(order by startdt)lag_ra,
lead(ra) over (order by startdt) lead_ra,
nqc,
lag(nqc) over (order by startdt)lag_nqc,
lead(nqc) over (order by startdt)lead_nqc
from mtemp
)--1
)--2
)--3
group by grp, ra, nqc
order by 1

Explanation:

Where ra does not equal the ra of the next or previous record (same thing for nqc) then that is a stand alone record. For those records give me the startdt of that row, for all others return null.

Now I have all the “break points”, places where a continuous set is broken. For all null values, starting from the current row and going down, give me the first date seen of column c1 minus 1 day, This creates a grouping based on the date. Then aggregate.

In addition:

I added one more record:

INSERT INTO mtemp VALUES('30-dec-2012', '31-dec-2012', 5000, 6000);

This gives the first row of column c1 a date entry and the query still works.

Is this viable solution Oracle? Just seems easier to group records that have a date field (min(date), max(date)) with a date field.

Cheers!

February 25, 2013 - 8:27 am UTC

.. Just seems easier to group records that have a
date field (min(date), max(date)) with a date field.
...

how so? Your query has more levels of nesting - and seems more complex to me.

ops$tkyte%ORA11GR2> select startdt, enddt, ra, nqc, last_value(tag ignore nulls) over (order by 
startdt) grp
  2    from (
  3  select startdt, enddt, ra, nqc,
  4         case when decode( lag(ra) over (order by startdt), ra, 1,0 ) = 0
  5                       and
  6                   decode( lag(nqc) over (order by startdt), nqc, 1,0 ) = 0
  7                          then row_number() over (order by startdt)
  8                  end tag
  9    from mtemp
 10         )
 11  /

two layers of nesting.

I didn't prove your query works (or not) - I'm just stating that I find your approach much less obvious.

arvind's analytic problem

A reader, February 21, 2013 - 10:17 pm UTC

Dear Tom,

This is further information as asked by you in my earlier post. In this question, user wants the
data to be categorised in conitnous range. But I have some what
same but bit different requirement as given below.

I have table and data as given below
create table ag_t1
(col3 varchar2(4)
col1 varchar2(5),
col2 varchar2(10)
);

here my primary key is col3, and col1

Data set 1 -----

insert into ag_t1 values ('4000','01.01', 'v1');
insert into ag_t1 values ('4000','01.02', 'v1');
insert into ag_t1 values ('4000','01.03', 'v1');
insert into ag_t1 values ('4000','01.04', 'v2');
insert into ag_t1 values ('4000','01.05', 'v1');
insert into ag_t1 values ('4000','01.06', 'v2');
insert into ag_t1 values ('4000','02.01', 'v1');
insert into ag_t1 values ('4000','02.02', 'v1');
insert into ag_t1 values ('4000','02.03', 'v2');
insert into ag_t1 values ('4000','02.04', 'v2');
insert into ag_t1 values ('4000','02.05', 'v1');
insert into ag_t1 values ('4000','02.06', 'v2');
insert into ag_t1 values ('4000','08.01', 'v1');
insert into ag_t1 values ('4000','08.02', 'v1');
insert into ag_t1 values ('4000','08.03', 'v1');
insert into ag_t1 values ('4000','08.04', 'v2');
insert into ag_t1 values ('4000','08.05', 'v1');
insert into ag_t1 values ('4000','08.06', 'v2');

for above data set, I want the output as given below.

Output
v1 - 01.01 to 01.03, 01.05, 02.01 to 02.02, 02.05, 08.01 to 08.03, 08.05
v2 - 01.04, 01.06, 02.03, 02.04, 02.06, 08.04, 08.06

Data set 2 -----

insert into ag_t1 values ('4001','1', 'v3');
insert into ag_t1 values ('4001','2', 'v3');
insert into ag_t1 values ('4001','3', 'v3');
insert into ag_t1 values ('4001','4', 'v4');
insert into ag_t1 values ('4001','5', 'v3');
insert into ag_t1 values ('4001','6', 'v4');

for above data set, I want the output as given below.

Output
v3 - 1 to 3, 5
v4 - 4, 6

Data set 3 -----

insert into ag_t1 values ('4002','01.001', 'v1');
insert into ag_t1 values ('4002','01.002', 'v1');
insert into ag_t1 values ('4002','01.003', 'v1');
insert into ag_t1 values ('4002','01.004', 'v2');
insert into ag_t1 values ('4002','01.005', 'v1');
insert into ag_t1 values ('4002','01.006', 'v2');
insert into ag_t1 values ('4002','02.001', 'v1');
insert into ag_t1 values ('4002','02.002', 'v1');
insert into ag_t1 values ('4002','02.003', 'v2');
insert into ag_t1 values ('4002','02.004', 'v2');
insert into ag_t1 values ('4002','02.005', 'v1');
insert into ag_t1 values ('4002','02.006', 'v2');
insert into ag_t1 values ('4002','08.001', 'v1');
insert into ag_t1 values ('4002','08.002', 'v1');
insert into ag_t1 values ('4002','08.003', 'v1');
insert into ag_t1 values ('4002','08.004', 'v2');
insert into ag_t1 values ('4002','08.005', 'v1');
insert into ag_t1 values ('4002','08.006', 'v2');

for above data set, I want the output as given below.

Output
v1 - 01.001 to 01.003, 01.005, 02.001 to 02.002, 02.005, 08.001 to 08.003, 08.005
v2 - 01.004, 01.006, 02.003, 02.004, 02.006, 08.004, 08.006

February 25, 2013 - 11:11 am UTC

ops$tkyte%ORA11GR2> select c1, listagg( label, ',' ) within group (order by max_grp) str
  2    from (
  3  select c1, max_grp, to_char(min(c2),'fm00.00') || case when min(c2) <> max(c2)
  4                                                         then ' to ' || to_char(max(c2),'fm00.00')
  5                                                     end label
  6    from (
  7  select c1, c2, max(grp) over (partition by c1 order by c2) max_grp
  8    from (
  9  select c1, c2,
 10         case when nvl(lag(c2) over (partition by c1 order by c2), c2) <> c2-0.01
 11              then row_number() over (partition by c1 order by c2)
 12          end grp
 13    from (select col2 C1, to_number(col1) C2
 14            from ag_t1)
 15         )
 16         )
 17   group by c1, max_grp
 18         )
 19   group by c1
 20   order by c1
 21  /

C1         STR
---------- ------------------------------------------------------------------------
v1         01.01 to 01.03,01.05,02.01 to 02.02,02.05,08.01 to 08.03,08.05
v2         01.04,01.06,02.03 to 02.04,02.06,08.04,08.06

now you see the technique (break the query out layer by layer, running the innermost query, then the next layer and so on so you can understand what is going on)

and then you can easily do it for the other data sets.

Analytical Query

Arvind, February 28, 2013 - 3:13 am UTC

Dear Tom,

Thanks for your response to my query. I have gone thru the query. Some time, data is coming with alphabets like 01.01A, 01.01B, 01.02 etc. In that case it may not be possible to convert date to number.

Thanks & Regards
Arvind

February 28, 2013 - 8:02 am UTC

you didn't specify that.

get us a FULL specification that covers 100% of the issue and maybe we can take a look at it.

we can only deal with what we are given. even here you don't say what to do with these values, you give no example (creates, inserts) to work with.

Arvind query

Stew Ashton, February 28, 2013 - 9:51 am UTC

I'm being a bad boy. Tom said to specify more fully, so I should wait for Arvind to answer, but I'm not.

My variant is a bit "generic", but it only works if two assumptions are true:
- first, that the col1 values are formatted so that a simple ORDER BY works, without having to convert to numbers;
- and second, that any two successive col1 values are "consecutive" as long as the the col3 and col2 values are the same.

These assumptions fit the datasets given. If they are wrong, then Arvind has a lot more to specify.

One other thing, it appears that Arvind only wants " to " when there are at least 3 consecutive values.

P.S. This is another example of the "Tabibitosan method" as explained at https://forums.oracle.com/forums/thread.jspa?threadID=1005478

select col3, col2, listagg(
  min_col1 || decode(cnt, 1, null, 2, ', ' || max_col1, ' to ' || max_col1)
  , ', '
) within group(order by min_col1) col1
from (
  select col3, col2, count(*) cnt, min(col1) min_col1, max(col1) max_col1
  from (
    select col3, col2, col1,
    row_number() over(partition by col3 order by col1) -
    row_number() over(partition by col3, col2 order by col1) grp
    from ag_t1 t
  )
  group by col3, col2, grp
)
group by col3, col2
order by col3, col2;

COL3  COL2  COL1
----- ----- --------------------------------------------------------------------------
4000  v1    01.01 to 01.03, 01.05, 02.01, 02.02, 02.05, 08.01 to 08.03, 08.05
4000  v2    01.04, 01.06, 02.03, 02.04, 02.06, 08.04, 08.06
4001  v3    1 to 3, 5
4001  v4    4, 6
4002  v1    01.001 to 01.003, 01.005, 02.001, 02.002, 02.005, 08.001 to 08.003, 08.005
4002  v2    01.004, 01.006, 02.003, 02.004, 02.006, 08.004, 08.006

February 28, 2013 - 11:52 am UTC

nope, doesn't work - assumption 2 cannot be true. the want them to be consecutive if they different by a unit - 0.01 in this case.

for example, if you remove a row:

-- insert into ag_t1 values ('4000','01.04', 'v2');

then your result becomes:

ops$tkyte%ORA11GR2> select col3, col2, listagg(
  2    min_col1 || decode(cnt, 1, null, 2, ', ' || max_col1, ' to ' || max_col1)
  3    , ', '
  4  ) within group(order by min_col1) col1
  5  from (
  6    select col3, col2, count(*) cnt, min(col1) min_col1, max(col1) max_col1
  7    from (
  8      select col3, col2, col1,
  9      row_number() over(partition by col3 order by col1) -
 10      row_number() over(partition by col3, col2 order by col1) grp
 11      from ag_t1 t
 12    )
 13    group by col3, col2, grp
 14  )
 15  group by col3, col2
 16  order by col3, col2;

COL3 COL2       COL1
---- ---------- ------------------------------------------------------------------------
4000 v1         01.01 to 01.05, 02.01, 02.02, 02.05, 08.01 to 08.03, 08.05
4000 v2         01.06, 02.03, 02.04, 02.06, 08.04, 08.06

but we don't have 1.01 to 1.05 - 1.04 is missing...

they have not specified this adequately to answer - but I know that assumption #2 cannot be correct - removing a row for V2 should not affect the answer of V1.

Analytical Query

Arvind, March 03, 2013 - 11:38 pm UTC

This sample includes all cases

This is further information as asked by you in my earlier post. In this question, user wants the
data to be categorised in conitnous range. But I have some what
same but bit different requirement as given below.

I have table and data as given below
create table ag_t1
(col3 varchar2(4)
col1 varchar2(5),
col2 varchar2(10)
);

here my primary key is col3, and col1

Data set 1 -----

insert into ag_t1 values ('4000','01.01', 'v1');
insert into ag_t1 values ('4000','01.02', 'v1');
insert into ag_t1 values ('4000','01.03', 'v1');
insert into ag_t1 values ('4000','01.04', 'v2');
insert into ag_t1 values ('4000','01.05', 'v1');
insert into ag_t1 values ('4000','01.06', 'v2');
insert into ag_t1 values ('4000','02.01', 'v1');
insert into ag_t1 values ('4000','02.02', 'v1');
insert into ag_t1 values ('4000','02.03', 'v2');
insert into ag_t1 values ('4000','02.04', 'v2');
insert into ag_t1 values ('4000','02.05', 'v1');
insert into ag_t1 values ('4000','02.06', 'v2');
insert into ag_t1 values ('4000','08.01', 'v1');
insert into ag_t1 values ('4000','08.02', 'v1');
insert into ag_t1 values ('4000','08.03', 'v1');
insert into ag_t1 values ('4000','08.04', 'v2');
insert into ag_t1 values ('4000','08.05', 'v1');
insert into ag_t1 values ('4000','08.06', 'v2');

for above data set, I want the output as given below.

Output
v1 - 01.01 to 01.03, 01.05, 02.01 to 02.02, 02.05, 08.01 to 08.03, 08.05
v2 - 01.04, 01.06, 02.03, 02.04, 02.06, 08.04, 08.06

Data set 2 -----

insert into ag_t1 values ('4001','1', 'v3');
insert into ag_t1 values ('4001','2', 'v3');
insert into ag_t1 values ('4001','3', 'v3');
insert into ag_t1 values ('4001','4', 'v4');
insert into ag_t1 values ('4001','5', 'v3');
insert into ag_t1 values ('4001','6', 'v4');

for above data set, I want the output as given below.

Output
v3 - 1 to 3, 5
v4 - 4, 6

Data set 3 -----

insert into ag_t1 values ('4002','01.001', 'v1');
insert into ag_t1 values ('4002','01.002', 'v1');
insert into ag_t1 values ('4002','01.003', 'v1');
insert into ag_t1 values ('4002','01.004', 'v2');
insert into ag_t1 values ('4002','01.005', 'v1');
insert into ag_t1 values ('4002','01.006', 'v2');
insert into ag_t1 values ('4002','02.001', 'v1');
insert into ag_t1 values ('4002','02.002', 'v1');
insert into ag_t1 values ('4002','02.003', 'v2');
insert into ag_t1 values ('4002','02.004', 'v2');
insert into ag_t1 values ('4002','02.005', 'v1');
insert into ag_t1 values ('4002','02.006', 'v2');
insert into ag_t1 values ('4002','08.001', 'v1');
insert into ag_t1 values ('4002','08.002', 'v1');
insert into ag_t1 values ('4002','08.003', 'v1');
insert into ag_t1 values ('4002','08.004', 'v2');
insert into ag_t1 values ('4002','08.005', 'v1');
insert into ag_t1 values ('4002','08.006', 'v2');

for above data set, I want the output as given below.

Output
v1 - 01.001 to 01.003, 01.005, 02.001 to 02.002, 02.005, 08.001 to 08.003, 08.005
v2 - 01.004, 01.006, 02.003, 02.004, 02.006, 08.004, 08.006

Data set 4 -----

insert into ag_t1 values ('4000','01.01A', 'v1');
insert into ag_t1 values ('4000','01.01B', 'v1');
insert into ag_t1 values ('4000','01.01C', 'v1');
insert into ag_t1 values ('4000','01.02', 'v1');
insert into ag_t1 values ('4000','01.03', 'v1');
insert into ag_t1 values ('4000','01.04', 'v2');
insert into ag_t1 values ('4000','01.05', 'v1');
insert into ag_t1 values ('4000','01.06', 'v2');
insert into ag_t1 values ('4000','02.01A', 'v1');
insert into ag_t1 values ('4000','02.01B', 'v1');
insert into ag_t1 values ('4000','02.01C', 'v1');
insert into ag_t1 values ('4000','02.02', 'v1');
insert into ag_t1 values ('4000','02.03', 'v2');
insert into ag_t1 values ('4000','02.04', 'v2');
insert into ag_t1 values ('4000','02.05', 'v1');
insert into ag_t1 values ('4000','02.06', 'v2');
insert into ag_t1 values ('4000','08.01A', 'v1');
insert into ag_t1 values ('4000','08.01B', 'v1');
insert into ag_t1 values ('4000','08.01C', 'v1');
insert into ag_t1 values ('4000','08.02', 'v1');
insert into ag_t1 values ('4000','08.03', 'v1');
insert into ag_t1 values ('4000','08.04', 'v2');
insert into ag_t1 values ('4000','08.05', 'v1');
insert into ag_t1 values ('4000','08.06', 'v2');

for above data set, I want the output as given below.

Output
v1 - 01.01A to 01.01C, 01.02, 01.03, 01.05, 02.01A to 02.01C, 02.02, 02.05, 08.01A to 08.01C, 08.02, 08.03, 08.05
v2 - 01.04, 01.06, 02.03, 02.04, 02.06, 08.04, 08.06

March 04, 2013 - 4:32 pm UTC

why did you post all of the redundant stuff ...

and a create table that does not work

with inserts that fail after you fix the create table...

*
ERROR at line 1:
ORA-12899: value too large for column "OPS$TKYTE"."AG_T1"."COL1" (actual: 6,
maximum: 5)

ugh....

if we are allowed to assume:

all data will either be in the form

NN.NN where N is a digit between 0 and 9

or

NN.NNC where N is a digit between 0 and 9 and C is a character between A and Z

then, a simple variation on a theme can work - we'll convert NN.NN into a number and NN.NNC into a number by taking NN.NN + ascii-code(C)/10000

then the original query just works - albeit with a little more complex formatting of the output.

If these assumptions are not correct - FIX YOUR SPECIFICATION (actually, provide one, you never ever have...)

ops$tkyte%ORA11GR2> select c1, listagg( label, ',' ) within group (order by max_grp) str
  2    from (
  3  select c1, max_grp,
  4         to_char( trunc(min(c2),2) ,'fm00.00') ||
  5                      case when (min(c2)-trunc(min(c2),2)) <> 0
  6                                       then chr( (min(c2)-trunc(min(c2),2))*10000 )
  7                                   end ||
  8                      case when min(c2) <> max(c2)
  9                  then ' to ' || to_char( trunc(max(c2)),'fm00.00') ||
 10                            case when (max(c2)-trunc(max(c2),2)) <> 0
 11                                             then chr( (max(c2)-trunc(max(c2),2))*10000 )
 12                                         end
 13                  end label
 14    from (
 15  select c1, c2, max(grp) over (partition by c1 order by c2) max_grp
 16    from (
 17  select c1, c2,
 18         case when nvl(lag(c2) over (partition by c1 order by c2), c2) <> c2-0.0001
 19              then row_number() over (partition by c1 order by c2)
 20          end grp
 21    from (select col2 C1, to_number(substr( col1, 1, 5 ))+ nvl(ascii(substr(col1,6,1))/10000,0) C2
 22            from ag_t1)
 23         )
 24         )
 25   group by c1, max_grp
 26         )
 27   group by c1
 28   order by c1
 29  /

C1
----------
STR
-------------------------------------------------------------------------------
v1
01.01A to 01.00C,01.02,01.03,01.05,02.01A to 02.00C,02.02,02.05,08.01A to 08.00
C,08.02,08.03,08.05

v2
01.04,01.06,02.03,02.04,02.06,08.04,08.06

problem understanding statement

preet, March 20, 2013 - 4:56 pm UTC

Hi Tom,

I am going through ORACLE documentation for understanding Analytic function. Unfortunately, I couldn't get this statement "To filter the results of a query based on an analytic function, nest these functions within the parent query, and then filter the results of the nested subquery."

Could you please provide some example on this statement? I couldn't understand it.

Thank you

March 25, 2013 - 3:46 pm UTC

see my original answer, it does just this.

I cannot write:

where lead(station) over (partition by order# order by close_date) <> station

so I have to alias lead(station) over (partition by order# order by close_date) as "lead_station" and in another level of the query use

where lead_station <> station

Basically you cannot "where" on the analytics, you have to create an inline view (or with factored subquery) and then where on the results of that inline view.

Anand, March 25, 2013 - 6:57 am UTC

Hi Tom,

I have table as below :

DROP TABLE test
/
CREATE TABLE test (branch_code NUMBER,tot_cnt NUMBER ,process_id NUMBER )
/
INSERT INTO test VALUES (1,5000,99)
/
INSERT INTO test VALUES (2,2500,99)
/
INSERT INTO test VALUES (3,2500,99)
/
INSERT INTO test VALUES (4,3000,99)
/
INSERT INTO test VALUES (5,2000,99)
/
INSERT INTO test VALUES (6,1000,99)
/
INSERT INTO test VALUES (7,4000,99)
/
INSERT INTO test VALUES (8,1200,99)
/

Now i want to updated the process_id from 1 to 6 from default 99.It should divide in such a
way that it should have tot_count equally as much as possible in each of the process_id.i know it not possible
to have count exactly equal.

If i use below query it will give me count higher on some process_id and lower on some process id :

MERGE INTO test a
USING (SELECT branch_code,
ntile(6) over(order by branch_code) stream
FROM test) b
ON (a.branch_code = b.branch_code)
WHEN MATCHED THEN
UPDATE SET a.process_id = b.stream;

Result :

SELECT SUM(TOT_CNT), PROCESS_ID FROM TEST GROUP BY PROCESS_ID

SUM(TOT_CNT) PROCESS_ID
7500 1 ---> Higher count
5500 2
2000 3
1000 4 ---> Lower count
4000 5
1200 6

Anand, April 23, 2013 - 1:27 pm UTC

Hi Tom,

Waiting for your reply on above question.

April 23, 2013 - 2:56 pm UTC

search for

bin fitting

on this site (and others). there are approaches with the model clause, many demonstrated.

Anand, April 25, 2013 - 1:47 pm UTC

Hi Tom,

Thanks for you reply .Never knew about this.

You have suggested below site in one of forum :

http://www.jlcomp.demon.co.uk/faq/Bin_Fitting.html

Now my question how if i want divide the record in 15 or 20 bucket ? Do i have to modify the query and add bucket 3,4,5...20(as per example) ?

April 25, 2013 - 7:49 pm UTC

left as an exercise for the reader... we've pointed you to the "technique". Learn the technique and you'll be able to use it in thousands of places over your career ;)

Recursive Subquery Factors...

Brendan, April 26, 2013 - 2:40 pm UTC

I had a look at that link to a Model (approximate) solution and you have to rewrite the query to change the number of buckets. I implemented a similar algorithm using the v11.2 feature Recursive Subquery Factoring in which I parametrise the number of buckets. It's not quite identical as it distributes the items in batches across the set of buckets, but gave the same answers on my test problems, which involved 10,001 items of random values.

Insert test data:

INSERT INTO items
SELECT 'item' || n, DBMS_Random.Value (0, 10000) FROM  
(SELECT LEVEL n FROM DUAL CONNECT BY LEVEL <10002)

Set the variable for number of buckets:

VAR N_BUCKETS NUMBER
EXECUTE :N_BUCKETS := 3

Recursive Subquery Factor query:

WITH buckets AS (
       SELECT LEVEL bucket, :N_BUCKETS n_buckets FROM DUAL CONNECT BY LEVEL <= :N_BUCKETS
), items_desc AS (
       SELECT item_name, item_value, Row_Number () OVER (ORDER BY item_value DESC) rn
         FROM items
), rsf (bucket, item_name, item_value, bin_value, lev, bin_rank, n_buckets) AS (
SELECT b.bucket,
       i.item_name, 
       i.item_value, 
       i.item_value,
       1,
       b.n_buckets - i.rn + 1,
       b.n_buckets
  FROM buckets b
  JOIN items_desc i
    ON i.rn = b.bucket
 UNION ALL
SELECT r.bucket,
       i.item_name, 
       i.item_value, 
       r.bin_value + i.item_value,
       r.lev + 1,
       Row_Number () OVER (ORDER BY r.bin_value + i.item_value),
       r.n_buckets
  FROM rsf r
  JOIN items_desc i
    ON (i.rn - r.lev * r.n_buckets) = r.bin_rank
)
SELECT r.item_name,
       r.bucket, r.item_value, r.bin_value, r.lev, r.bin_rank
  FROM rsf r
 ORDER BY item_value DESC

Output extract for linked Model for 3 buckets, and RSF for 3 and 20 buckets:

Model 3 buckets

NAME                                VALUE BUCKET_NAME         B1         B2         B3
------------------------------ ---------- ----------- ---------- ---------- ----------
item8937                             9999           1       9999
item4315                             9999           2       9999       9999
item3367                             9999           3       9999       9999       9999
...
item5523                                1           2   16734557   16734557   16734557
item145                                 1           1   16734558   16734557   16734557
item3670                                0           2   16734558   16734557   16734557

10001 rows selected.

Elapsed: 00:03:00.79

RSF 3 buckets

ITEM_NAME                          BUCKET ITEM_VALUE  BIN_VALUE        LEV   BIN_RANK
------------------------------ ---------- ---------- ---------- ---------- ----------
item3367                                3       9999       9999          1          1
item8937                                1       9999       9999          1          3
item4315                                2       9999       9999          1          2
...
item5523                                3          1   16734557       3333          2
item145                                 2          1   16734558       3334          2
item3670                                3          0   16734557       3334          1

10001 rows selected.

Elapsed: 00:02:37.65

RSF 20 buckets

ITEM_NAME                          BUCKET ITEM_VALUE  BIN_VALUE        LEV   BIN_RANK
------------------------------ ---------- ---------- ---------- ---------- ----------
item8937                                1       9999       9999          1         20
item3367                                3       9999       9999          1         18
item4315                                2       9999       9999          1         19
...
item5523                               20          1    2510177        500          1
item145                                19          1    2510178        500          3
item3670                               20          0    2510177        501          1

10001 rows selected.

Elapsed: 00:02:25.73

...and Plain Old SQL

Brendan, April 26, 2013 - 3:53 pm UTC

On further thought, you might consider a very simple solution where you just assign items sequentially to buckets in value order. This won't give quite as good an answer but often it might be good enough and it's much simpler and faster. Here are the results for 3 buckets, showing model and rsf giving the same very good results, with the simple solution a bit worse, but much quicker:

Model 3 buckets

NAME                                VALUE BUCKET_NAME         B1         B2         B3
------------------------------ ---------- ----------- ---------- ---------- ----------
item906                              9999           1       9999
item1343                             9998           2       9999       9998
item2819                             9995           3       9999       9998       9995
...
item3158                                7           3   16701010   16701010   16701010
item814                                 5           1   16701015   16701010   16701010
item5304                                4           2   16701015   16701014   16701010

10001 rows selected.

Elapsed: 00:03:01.91
RSF 3 buckets

ITEM_NAME                          BUCKET ITEM_VALUE  BIN_VALUE        LEV   BIN_RANK
------------------------------ ---------- ---------- ---------- ---------- ----------
item906                                 1       9999       9999          1          3
item1343                                2       9998       9998          1          2
item2819                                3       9995       9995          1          1
...
item3158                                1          7   16701010       3333          2
item814                                 2          5   16701015       3334          2
item5304                                1          4   16701014       3334          1

10001 rows selected.

Elapsed: 00:02:37.84

POS 3 buckets

ITEM_NAME                          BUCKET ITEM_VALUE BUCKET_TOTAL
------------------------------ ---------- ---------- ------------
item906                                 2       9999     16704330
item1343                                3       9998     16701019
item2819                                1       9995     16697690
...
item3158                                1          7     16697690
item814                                 2          5     16704330
item5304                                3          4     16701019

10001 rows selected.

Elapsed: 00:00:03.00

Here is the simple SQL:

WITH items_desc AS (
       SELECT item_name, item_value, Mod (Row_Number () OVER (ORDER BY item_value DESC), :N_BUCKETS) + 1 bucket
         FROM items
)
SELECT item_name, bucket, item_value, Sum (item_value) OVER (PARTITION BY bucket) bucket_total
  FROM items_desc
 ORDER BY item_value DESC

Improved MODEL

Brendan, April 28, 2013 - 2:18 pm UTC

And, for MODEL aficianados, here I've taken the same underlying algorithm as in the link above, but used MODEL in a slightly different way that avoids embedding the number of buckets in the query structure, and also avoids the expensive repeated summing. I use the first :N_BINS rows to store the running totals, rather than columns. It runs in about 50s on my PC compared with around 110s for both the original MODEL and my RSF (it's a different PC from above). As a bonus it's quite a lot simpler too.

SELECT item_name, bin, item_value, CASE WHEN rn_m <= :N_BINS THEN bin_value END bin_value
  FROM items
  MODEL 
    DIMENSION BY (Row_Number() OVER (ORDER BY item_value DESC) rn)
    MEASURES (item_name, 
              item_value,
              0 bin,
              item_value bin_value,
              Row_Number() OVER (ORDER BY item_value DESC) rn_m,
              0 min_bin,
              Count(*) OVER () - :N_BINS - 1 n_iters
    )
    RULES ITERATE(10000) UNTIL (ITERATION_NUMBER >= n_iters[1]) (
      bin[rn <= :N_BINS] = CV(rn),
      min_bin[1] = Min(rn_m) KEEP (DENSE_RANK FIRST ORDER BY bin_value)[rn <= :N_BINS],
      bin[ITERATION_NUMBER + :N_BINS + 1] = min_bin[1],
      bin_value[min_bin[1]] = bin_value[CV()] + item_value[ITERATION_NUMBER + :N_BINS + 1]
    )
 ORDER BY item_value DESC

@Brendan re: Improved MODEL

Stew Ashton, April 28, 2013 - 7:57 pm UTC

Brendan,

Your last MODEL solution is brilliant. Studying it, I noticed that it doesn't quite work when the number of buckets is greater than or equal to the number of items. I suggest this slight adjustment to handle the edge case. It adds an nvl() near the end and filters out a possible extra row with a null item_name.

SELECT item_name, bin, item_value, CASE WHEN rn_m <= :N_BINS THEN bin_value END bin_value
FROM (
  select * from items
  MODEL 
    DIMENSION BY (Row_Number() OVER (ORDER BY item_value DESC) rn)
    MEASURES (item_name, 
              item_value,
              0 bin,
              item_value bin_value,
              Row_Number() OVER (ORDER BY item_value DESC) rn_m,
              0 min_bin,
              Count(*) OVER () - :N_BINS - 1 n_iters
    )
    RULES ITERATE(10000) UNTIL (ITERATION_NUMBER >= n_iters[1]) (
      bin[rn <= :N_BINS] = CV(rn),
      min_bin[1] =
        Min(rn_m) KEEP (DENSE_RANK FIRST ORDER BY bin_value)[rn <= :N_BINS],
      bin[ITERATION_NUMBER + :N_BINS + 1] = min_bin[1],
      bin_value[min_bin[1]] =
        bin_value[CV()] + nvl(item_value[ITERATION_NUMBER + :N_BINS + 1],0)
    )
)
where item_name is not null
ORDER BY item_value DESC

@Brendan: another suggestion

Stew Ashton, April 28, 2013 - 8:53 pm UTC

In the MEASURES clause, put

Row_Number() OVER (ORDER BY item_value DESC) bin,

instead of

0 bin,

then remove the first RULE. In my tests, it cuts elapsed time by about one third.

@Stew: Thanks

Brendan, April 29, 2013 - 11:16 am UTC

Thanks, Stew

You are correct on both points. I hadn't tested the more items than buckets special case, and I get a reduction from 68s to 39s by initialising bin to its desired value instead of setting it in the rules.

Bin fitting: PL/SQL seems better

Stew Ashton, April 29, 2013 - 12:16 pm UTC

Further testing indicates that my second suggestion is no help. However, I did try a pipelined table function for comparison and it runs an order or two of magnitude faster than the MODEL solution. I suspect it scales better too, assuming the MODEL solution keeps the whole table in memory.

> truncate table items
/
table ITEMS truncated.

> insert /*+ append */ into items
select level, dbms_random.value(0,10000) from dual connect by level <= 10002
/
10,002 rows inserted.

> commit
/
committed.

> create or replace package bin_fit as
cursor cur_out is select item_name, item_value, 0 num_bin from items;
type tt_out is table of cur_out%rowtype;
function do(p_numbins number) return tt_out pipelined;
end bin_fit;
/
PACKAGE BIN_FIT compiled

> create or replace package body bin_fit as
function do(p_numbins number) return tt_out pipelined is
  l_bins sys.odcinumberlist := sys.odcinumberlist();
  min_bin number := 1;
begin
  l_bins.extend(p_numbins);
  for i in 1..l_bins.count loop
    l_bins(i) := 0;
  end loop;
  for rec in (
    select item_name, item_value, 0 num_bin
    from items order by item_value desc
  ) loop
    l_bins(min_bin) := l_bins(min_bin) + rec.item_value;
    rec.num_bin := min_bin;
    pipe row (rec);
    for i in 1..l_bins.count loop
      if l_bins(i) < l_bins(min_bin) then
        min_bin := i;
      end if;
    end loop;
  end loop;
  return;
end do;
end bin_fit;
/
PACKAGE BODY BIN_FIT compiled

> select num_bin, sum(item_value) from (
  select * from table(bin_fit.do(20))
)
group by num_bin
order by num_bin
/
   NUM_BIN SUM(ITEM_VALUE)
---------- ---------------
         1     2506121.265 
         2     2506121.058 
         3     2506126.829 
         4     2506126.745 
         5     2506128.011 
         6     2506123.159 
         7     2506125.575 
         8      2506126.78 
         9     2506120.774 
        10     2506120.798 
        11     2506120.623 
        12      2506123.07 
        13     2506120.905 
        14       2506123.8 
        15     2506128.144 
        16     2506126.882 
        17     2506120.592 
        18     2506124.556 
        19      2506128.09 
        20     2506125.791 

 20 rows selected

Elapsed: 00:00:00.130

PL/SQL

Brendan, April 29, 2013 - 12:26 pm UTC

Yes, you'd expect PL/SQL to be much faster as it's an essentially procedural algorithm.

multiple record matching using self join or analytics

santosh, June 12, 2013 - 3:16 am UTC

We have requirement to find matching rows (3 or 4 columns match ) from within same table .There may be a match of 1 row to another and also ,1 to many and also, many to many .

Do we have an efficient way to do this if data in scope could be in range of millions.

June 18, 2013 - 2:50 pm UTC

the database is brutally efficient at doing joins. Just join.

Anand, June 20, 2013 - 11:47 am UTC

Hi Tom,

I have a table like below :

create table test (limit_id NUMBER,stream NUMBER)
/
insert into test values (123,1)
/
insert into test values (123,2)
/
insert into test values (456,3)
/
insert into test values (678,2)
/

Now i want to use analytical function to get the such limit_id which is having 2 different stream.

June 20, 2013 - 2:52 pm UTC

no analytics, just good old aggregates:

ops$tkyte%ORA11GR2> select limit_id from test group by limit_id having count(distinct stream) = 2;

  LIMIT_ID
----------
       123

Anand, June 20, 2013 - 4:06 pm UTC

Thanks tom....

nth_value

Rajeshwaran, June 21, 2013 - 1:26 pm UTC

Tom:

What additional information I need to provide nth_value function to get the output as 2975 rather than 3000? I know I can easily achieve results using dense_rank but would like to know how can this be handled using nth_value.

rajesh@ORA11G> select empno,deptno,sal,
  2    nth_value(sal,2) over(partition by
  3    deptno order by sal desc) nth_sal
  4  from emp
  5  where deptno = 20
  6  order by deptno,sal desc ;

     EMPNO     DEPTNO        SAL    NTH_SAL
---------- ---------- ---------- ----------
      7788         20       3000       3000
      7902         20       3000       3000
      7566         20       2975       3000
      7876         20       1100       3000
      7369         20        800       3000

July 01, 2013 - 4:50 pm UTC

nth value gives you the nth value in a window - that is 3,000 in this case. that is simply what this function does. It isn't looking for the nth DISTINCT value, just the nth value after ordering and partitioning.

Need a help on formulating report query

kumar, July 10, 2013 - 8:01 am UTC

Hi Tom,
I need to generate weekly(as of Monday 0:01 AM) and monthly(as of 1st Date of month)
report for the following set of data which is a modification history of a product.
I need to produce a count of product for every latest status(max of seq_id within/group by prod_id) of product for every week or month.
Lets take an example for weekly report:-
Need to generate report for a given period - start date : 27-May-2013; end date : 09-Jul-2013
we need to generate report as of week of the year(week will be considered as of Monday 0:01 am).
week number for the given dates are 22/2013(Mon - 27-05-2013 00:01 am), 23/2013.......28/2013.
here is the query for every weekly report but i want your help in combining alltogether in a single query.

CREATE TABLE PRODUCT_TEST
(
 PROD_SEQ_ID NUMBER PRIMARY KEY,
 PROD_ID NUMBER,
 PRD_CHNG_DTTM DATE,  
 PROD_STTS VARCHAR2(25)
);

Insert into PRODUCT_TEST values (1, 101, to_date('27-MAY-2013','DD-MON-YYYY'),'New');
Insert into PRODUCT_TEST values (2, 101, to_date('28-MAY-2013','DD-MON-YYYY'),'Open');
Insert into PRODUCT_TEST values (3, 101, to_date('30-MAY-2013','DD-MON-YYYY'),'Error');

Insert into PRODUCT_TEST values (4, 102, to_date('31-MAY-2013','DD-MON-YYYY'),'New');
Insert into PRODUCT_TEST values (5, 102, to_date('04-JUN-2013','DD-MON-YYYY'),'Open');
Insert into PRODUCT_TEST values (6, 102, to_date('08-JUN-2013','DD-MON-YYYY'),'Closed');

Insert into PRODUCT_TEST values (7, 103, to_date('12-JUN-2013','DD-MON-YYYY'),'New');
Insert into PRODUCT_TEST values (8, 103, to_date('19-JUN-2013','DD-MON-YYYY'),'Open');
Insert into PRODUCT_TEST values (9, 103, to_date('26-JUN-2013','DD-MON-YYYY'),'Closed');

Insert into PRODUCT_TEST values (10, 104, to_date('10-JUL-2013','DD-MON-YYYY'),'New');

COMMIT;

First we need to find all the week numbers i.e.,Monday 00:01AM so b/w 27th May and 9th Jul
we will have week 
22/2013(date will be 27/05/2013 0:01 am)
23/2013(date will be 03/06/2013 0:01 am)
24/2013(date will be 10/06/2013 0:01 am) and so on....till end date
For weekly report
For week 22/2013=======PROD_CHNG_DTTM < 27-May-2013 00:01==========
SELECT COUNT( CASE WHEN PROD_STTS = 'New'
                       THEN 1
                   END ) cnt_new,
       COUNT( CASE WHEN PROD_STTS = 'Open'
                       THEN 1
                   END ) cnt_open,
       COUNT( CASE WHEN PROD_STTS = 'Error'
                       THEN 1
                   END ) cnt_error,
       COUNT( CASE WHEN PROD_STTS = 'Closed'
                       THEN 1
                   END ) cnt_closed
FROM
(
SELECT PROD_SEQ_ID, PROD_ID, PROD_STTS
,ROW_NUMBER() OVER (PARTITION BY PROD_ID ORDER BY PROD_SEQ_ID DESC) rn
FROM PRODUCT_TEST
WHERE PRD_CHNG_DTTM < to_date('27-May-2013 00:01', 'DD-MON-YYYY hh24:mi')
)
WHERE rn = 1;

For week 23/2013========PROD_CHNG_DTTM < 03-Jun-2013 00:01==========
SELECT COUNT( CASE WHEN PROD_STTS = 'New'
                       THEN 1
                   END ) cnt_new,
       COUNT( CASE WHEN PROD_STTS = 'Open'
                       THEN 1
                   END ) cnt_open,
       COUNT( CASE WHEN PROD_STTS = 'Error'
                       THEN 1
                   END ) cnt_error,
       COUNT( CASE WHEN PROD_STTS = 'Closed'
                       THEN 1
                   END ) cnt_closed
FROM
(
SELECT PROD_SEQ_ID, PROD_ID, PROD_STTS
,ROW_NUMBER() OVER (PARTITION BY PROD_ID ORDER BY PROD_SEQ_ID DESC) rn
FROM PRODUCT_TEST
WHERE PRD_CHNG_DTTM < to_date('03-Jun-2013 00:01', 'DD-MON-YYYY hh24:mi')
)
WHERE rn = 1;
............. and so on for every week till end date and need to produce a 
weekly report like

For the period 27th May to 9th Jul--
As of    New_Count Open_Count Error_Count Closed_count
2013 FW22 1      0  0  0
2013 FW23 1      0  1  0
2013 FW24 0      0  1  1
..........................so on
Thanks in advance..

Can you please help me in formulating above query.

kumar, July 13, 2013 - 2:55 pm UTC

Need your help Tom....

help needed

kumar, July 16, 2013 - 5:11 pm UTC

Hi Tom,
please help me in formulating above requirement in combining all the queries into one....
Anxiously waiting for response..

Need your help on reporting query.

kumar, July 17, 2013 - 10:58 am UTC

Hi Tom,
I need to prepare a monthly, weekly and yearly count report from a modification history based on status.

CREATE TABLE PRODUCT_TEST
(
 PROD_SEQ_ID NUMBER PRIMARY KEY,
 PROD_ID NUMBER,
 PRD_CHNG_DTTM DATE,  
 PROD_STTS VARCHAR2(25)
);

Insert into PRODUCT_TEST values (1, 101, to_date('27-MAY-2013','DD-MON-YYYY'),'New');
Insert into PRODUCT_TEST values (2, 101, to_date('28-MAY-2013','DD-MON-YYYY'),'Open');
Insert into PRODUCT_TEST values (3, 101, to_date('30-MAY-2013','DD-MON-YYYY'),'Error');

Insert into PRODUCT_TEST values (4, 102, to_date('31-MAY-2013','DD-MON-YYYY'),'New');
Insert into PRODUCT_TEST values (5, 102, to_date('04-JUN-2013','DD-MON-YYYY'),'Open');
Insert into PRODUCT_TEST values (6, 102, to_date('08-JUN-2013','DD-MON-YYYY'),'Closed');

Insert into PRODUCT_TEST values (7, 103, to_date('12-JUN-2013','DD-MON-YYYY'),'New');
Insert into PRODUCT_TEST values (8, 103, to_date('19-JUN-2013','DD-MON-YYYY'),'Open');
Insert into PRODUCT_TEST values (9, 103, to_date('26-JUN-2013','DD-MON-YYYY'),'Closed');

Insert into PRODUCT_TEST values (10, 104, to_date('10-JUL-2013','DD-MON-YYYY'),'New');

COMMIT;
If user wants the report from date 26-May-2013 todate 09-Jul-2013 
Let's just take an weekly report example :- 
First we need to find all the week numbers i.e., all the Mondays 00:01AM so for 26th May and 9th Jul range we will have weeks as

21/2013 (date will be 20/05/2013 0:01 am)
22/2013 (date will be 27/05/2013 0:01 am)
......
28/2013 (date will be 08/07/2013 0:01 am)

Now i have to produce a trend report which will have recods as of weeks :-

For 21/2013 my query will have PROD_CHNG_DTTM < 20-May-2013 00:01 i.e., take all records from beginning till 20/05/13

SELECT COUNT( CASE WHEN PROD_STTS = 'New'
                       THEN 1
                   END ) cnt_new,
       COUNT( CASE WHEN PROD_STTS = 'Open'
                       THEN 1
                   END ) cnt_open,
       COUNT( CASE WHEN PROD_STTS = 'Error'
                       THEN 1
                   END ) cnt_error,
       COUNT( CASE WHEN PROD_STTS = 'Closed'
                       THEN 1
                   END ) cnt_closed
FROM
(
SELECT PROD_SEQ_ID, PROD_ID, PROD_STTS
,ROW_NUMBER() OVER (PARTITION BY PROD_ID ORDER BY PROD_SEQ_ID DESC) rn
FROM PRODUCT_TEST
WHERE PRD_CHNG_DTTM < to_date('27-May-2013 00:01', 'DD-MON-YYYY hh24:mi')
)
WHERE rn = 1;

For 22/2013 my query will have PROD_CHNG_DTTM < 27-May-2013 00:01 i.e., take all records from beginning till 27/05/13

SELECT COUNT( CASE WHEN PROD_STTS = 'New'
                       THEN 1
                   END ) cnt_new,
       COUNT( CASE WHEN PROD_STTS = 'Open'
                       THEN 1
                   END ) cnt_open,
       COUNT( CASE WHEN PROD_STTS = 'Error'
                       THEN 1
                   END ) cnt_error,
       COUNT( CASE WHEN PROD_STTS = 'Closed'
                       THEN 1
                   END ) cnt_closed
FROM
(
SELECT PROD_SEQ_ID, PROD_ID, PROD_STTS
,ROW_NUMBER() OVER (PARTITION BY PROD_ID ORDER BY PROD_SEQ_ID DESC) rn
FROM PRODUCT_TEST
WHERE PRD_CHNG_DTTM < to_date('27-May-2013 00:01', 'DD-MON-YYYY hh24:mi')
)
WHERE rn = 1;

One way is write queries for all the weeks requested by user's date range and use union all but it will be the slowest option i think if user demands for date range having over 100 weeks.
I am sure you must be having a faster solution for combining all the queries into one(I mean finding weeks query, all the report queries for max of record in varying todate...).
Hope I have made myself clear.
Thanks a lot in Advance

July 17, 2013 - 6:08 pm UTC

I don't really have anything to add - this is not really a followup question - maybe when I have time to take a new question.

since you seem to use

SELECT PROD_SEQ_ID, PROD_ID, PROD_STTS
,ROW_NUMBER() OVER (PARTITION BY PROD_ID ORDER BY PROD_SEQ_ID DESC) rn
FROM PRODUCT_TEST
WHERE PRD_CHNG_DTTM < to_date('27-May-2013 00:01', 'DD-MON-YYYY hh24:mi')
)

multiple times, perhaps in your union all you can

with data
as
( that select )
select .. from data ..
union all
select ... from data. ...
union all
select ... from data ..

we'll build the result once (data) and use it over and over.

Thanks a lot TOM

kumar, July 30, 2013 - 9:12 am UTC

Thank you very much Tom..
For the time being i have made my query as per your suggestion, but still want to see some good or flexible option s from your pen just for learning or enhance my analytical ability.

May be i will to try put this into new question section whenever i will be able to see new question button......

@kumar on reporting query

Stew Ashton, July 31, 2013 - 1:02 pm UTC

Hi kumar,

You may not ever read this, but I think you misled Tom. You repeated

WHERE PRD_CHNG_DTTM < to_date('27-May-2013 00:01', 'DD-MON-YYYY hh24:mi')

when you said you needed two different dates.

I think you want a weekly summary of product status. For that, you need:
- the "as of" dates you want the summary for
- the range of dates the status applies to
- join status to "as of" dates based on the date range
- then group by "as of" date.

with as_of_dates as (
  select min_as_of_date + level*7 as_of_date
  from (
    select trunc(min(prd_chng_dttm)-1/24/60/60, 'IW') min_as_of_date
    from product_test
  )
  connect by min_as_of_date + level*7 <= sysdate
), data as (
  select prd_chng_dttm,
  lead(prd_chng_dttm-1/24/60/60,1,sysdate)
    over(partition by prod_id order by prd_chng_dttm) prd_chng_dttm_end,
  CASE WHEN PROD_STTS = 'New' THEN 1 else 0 END cnt_new,
  CASE WHEN PROD_STTS = 'Open' THEN 1 else 0 END cnt_Open,
  CASE WHEN PROD_STTS = 'Error' THEN 1 else 0 END cnt_Error,
  CASE WHEN PROD_STTS = 'Closed' THEN 1 else 0 END cnt_Closed
  from product_test
)
select as_of_date, sum(cnt_new) cnt_new, sum(cnt_open) cnt_open,
sum(cnt_error) cnt_error, sum(cnt_closed) cnt_closed
from data a, as_of_dates b
where b.as_of_date between a.prd_chng_dttm and a.prd_chng_dttm_end
group by as_of_date
order by as_of_date;

AS_OF_DATE              CNT_NEW   CNT_OPEN  CNT_ERROR CNT_CLOSED
-------------------- ---------- ---------- ---------- ----------
27-MAY-13 00.00.00            1          0          0          0 
03-JUN-13 00.00.00            1          0          1          0 
10-JUN-13 00.00.00            0          0          1          1 
17-JUN-13 00.00.00            1          0          1          1 
24-JUN-13 00.00.00            0          1          1          1 
01-JUL-13 00.00.00            0          0          1          2 
08-JUL-13 00.00.00            0          0          1          2 
15-JUL-13 00.00.00            1          0          1          2 
22-JUL-13 00.00.00            1          0          1          2 
29-JUL-13 00.00.00            1          0          1          2

using partition

A, August 01, 2013 - 6:57 pm UTC

Hello Tom,
Was wondering can we use partition to display "Multiple" for Col3 is the getting repeated?

If the data is like this,

Col1 Col2 Col3
111 AA NY
222 BB LON
111 AA PAR
333 AA MUM
111 AA MUM
222 BB LON
444 XX LON

We want the output like this:

Col1 Col2 Col3
111 AA Multiple
222 BB Multiple
333 AA MUM
444 XX LON

If there are multiple values for Col3 for an combination of Col1 and Col2 then in Col3 it should display as "Multiple" else it should show the value.

Thanks

August 02, 2013 - 7:26 pm UTC

ops$tkyte%ORA11GR2> with data
  2  as
  3  (select 111 c1, 'aa' c2, 'NY' c3 from dual union all
  4   select 111 c1, 'aa' c2, 'XX' c3 from dual union all
  5   select 222 c1, 'aa' c2, 'YY' c3 from dual union all
  6   select 222 c1, 'aa' c2, 'YY' c3 from dual
  7  )
  8  select c1, c2, case when count(distinct c3) > 1 then 'multiple' else max(c3) end c3
  9    from data
 10   group by c1, c2
 11   order by c1, c2
 12  /

        C1 C2 C3
---------- -- --------
       111 aa multiple
       222 aa YY

Thanks

A, August 05, 2013 - 7:14 am UTC

Hello Tom,
Thanks for the reply. This is hard-coded to only 2 distinct rows. There are more than 500 rows in the table and any combination of these

some thing like ...

Col1 Col2 Col3
111 AA NY
222 BB LON
111 AA PAR
333 AA MUM
111 AA MUM
222 BB LON
444 XX LON
100 AQ JJ
101 BB PAR
...
...
...

How to write a query for this?

August 08, 2013 - 4:26 pm UTC

there is no hard coded limit? what are you talking about?

my query works for any number of rows.

Response

Raj B, August 05, 2013 - 12:46 pm UTC

Tom has already answered the query! All you need to do is just try it. 

SQL> select * from t3;

      COL1 COL COL
---------- --- ---
       222 BB  ZZZ
       345 MM  YYY
       111 AA  CAL
       222 BB  IAD
       222 BB  BLT
       111 AA  LON
       123 AA  MUM

7 rows selected.

SQL> select col1,col2, case when count(distinct col3) > 1 then 'MUL' else max(col3) end col3
    from t3
    group by col1, col2 order by col1, col2;

      COL1 COL COL
---------- --- ---
       111 AA  MUL
       123 AA  MUM
       222 BB  MUL
       345 MM  YYY

tough query

Bhavesh Ghodasara, August 21, 2013 - 1:27 pm UTC

Hi Tom,

I was regular at your blog before 7-8 years. and learned a lot from you at beginning of my career. and it helped a lot. Ironically I became Teradata DBA :).. Thanks for all your help and wonderful solutions.

So I am back with query again.
create table t
(
Artcl integer,
store integer,
yr integer,
week integer,
Inventory integer);

insert into t values( 1,1,2011,50,2);
insert into t values( 1,1,2011,51,-5);
insert into t values( 1,1,2011,52,4);
insert into t values( 1,1,2012,1,-5);
insert into t values( 1,1,2012,4,3);
insert into t values( 1,1,2012,7,2);
insert into t values( 1,1,2012,8,4);
insert into t values( 1,1,2012,10,-2);
insert into t values( 1,1,2012,15,7);
insert into t values( 2,1,2011,52,-5);
insert into t values( 2,1,2012,4,2);
insert into t values( 2,1,2012,6,-1);
insert into t values( 2,1,2012,50,5);
insert into t values( 2,1,2012,52,2);
insert into t values( 2,1,2013,1,-2);
insert into t values( 2,1,2013,2,-7);
insert into t values( 2,1,2013,3,-2);
insert into t values( 2,1,2013,4,4);
insert into t values( 2,1,2013,5,-2);
insert into t values( 3,2,2012,6,1);
insert into t values( 3,2,2012,50,2);
insert into t values( 3,2,2012,52,3);
insert into t values( 3,2,2013,1,-2);
insert into t values( 3,2,2013,2,-1);
insert into t values( 3,2,2013,3,-1);
insert into t values( 3,2,2013,4,5);
insert into t values( 3,2,2013,5,-2);

Now data will look like (except last column)
Artcl store year week Inventory Rolling total
1 1 2011 50 2 2
1 1 2011 51 -5 0
1 1 2011 52 4 4
1 1 2012 1 -5 0
1 1 2012 4 3 3
1 1 2012 7 2 5
1 1 2012 8 4 9
1 1 2012 10 -2 7
1 1 2012 15 7 14
2 1 2011 52 -5 0
2 1 2012 4 2 2
2 1 2012 6 -1 1
2 1 2012 50 5 6
2 1 2012 52 2 8
2 1 2013 1 -2 6
2 1 2013 2 -7 0
2 1 2013 3 -2 0
2 1 2013 4 4 4
2 1 2013 5 -2 2
3 2 2012 6 1 1
3 2 2012 50 2 3
3 2 2012 52 3 6
3 2 2013 1 -2 4
3 2 2013 2 -1 3
3 2 2013 3 -1 2
3 2 2013 4 5 7
3 2 2013 5 -2 5

I need rolling column like above report.
1) partition by artcl, store, year , week
2) rolling total of inventory with following conditions :
a) if total becomes 0 or negative, we have to consider as 0 and start all over.

I tried hard to get solution but no avail. I even think now that it will not probably possible
with simple sql and will need function/procedure for that.

Please help. As I left with no option, I came back to my childhood hero for help :)

String Diff

TCS, September 17, 2013 - 10:44 am UTC

i want to do string diff between two string like if i have two string input say : ABCDCAFGV and ABDCAF then my output will be : CGV
Basically i want to do string position match between two string and output the difference.
Can it be done througn Oracle SQL query only or we need to write UDF for that.

reply to Bhavesh's question

Ranjan., September 19, 2013 - 5:19 pm UTC

Hi Bhavesh,

For your scenario ,here is the below code which will produce the desire result.

################################
CREATE TYPE typ_obj_breaksum AS OBJECT
(
sum NUMBER,

STATIC FUNCTION ODCIAggregateInitialize (
sctx IN OUT typ_obj_breaksum
) RETURN NUMBER,

MEMBER FUNCTION ODCIAggregateIterate (
self IN OUT typ_obj_breaksum,
value IN NUMBER
) RETURN NUMBER,

MEMBER FUNCTION ODCIAggregateTerminate (
self IN typ_obj_breaksum,
retval OUT NUMBER,
flags IN NUMBER
) RETURN NUMBER,

MEMBER FUNCTION ODCIAggregateMerge (
self IN OUT typ_obj_breaksum,
ctx2 IN typ_obj_breaksum
) RETURN NUMBER
);

/

CREATE TYPE BODY typ_obj_breaksum IS

STATIC FUNCTION ODCIAggregateInitialize (
sctx IN OUT typ_obj_breaksum
) RETURN NUMBER IS
BEGIN
sctx := typ_obj_breaksum(0);
RETURN ODCIConst.Success;
End;
MEMBER FUNCTION ODCIAggregateIterate (
self IN OUT typ_obj_breaksum,
value IN NUMBER
) RETURN NUMBER IS
BEGIN
self.sum := CASE
WHEN value >= 0
OR (value < 0 AND self.sum + value > 0)
THEN self.sum + value
ELSE 0
END;
Return Odciconst.Success;
END;
MEMBER FUNCTION ODCIAggregateTerminate (
self IN typ_obj_breaksum,
retval OUT NUMBER,
flags IN NUMBER
) RETURN NUMBER IS
BEGIN
retval := self.sum;
RETURN ODCIConst.Success;
End;

MEMBER FUNCTION ODCIAggregateMerge (
self IN OUT typ_obj_breaksum,
ctx2 IN typ_obj_breaksum
) RETURN NUMBER IS
BEGIN
self.sum := CASE
WHEN self.sum + ctx2.sum > 0
THEN self.sum + ctx2.sum
ELSE 0
END;
RETURN ODCIConst.Success;
END;

End;

/

CREATE FUNCTION breaksum (input NUMBER) RETURN NUMBER
PARALLEL_ENABLE
AGGREGATE USING typ_obj_breaksum;

/

SELECT artcl , store,yr,week,inventory, BREAKSUM (inventory) over(partition by artcl order by rownum) AS rolling_total
From T
;

And Here The Output Is:
##################################
Artcl store year week Inventory Rolling total
1 1 2011 50 2 2
1 1 2011 51 -5 0
1 1 2011 52 4 4
1 1 2012 1 -5 0
1 1 2012 4 3 3
1 1 2012 7 2 5
1 1 2012 8 4 9
1 1 2012 10 -2 7
1 1 2012 15 7 14
2 1 2011 52 -5 0
2 1 2012 4 2 2
2 1 2012 6 -1 1
2 1 2012 50 5 6
2 1 2012 52 2 8
2 1 2013 1 -2 6
2 1 2013 2 -7 0
2 1 2013 3 -2 0
2 1 2013 4 4 4
2 1 2013 5 -2 2
3 2 2012 6 1 1
3 2 2012 50 2 3
3 2 2012 52 3 6
3 2 2013 1 -2 4
3 2 2013 2 -1 3
3 2 2013 3 -1 2
3 2 2013 4 5 7
3 2 2013 5 -2 5
############################

Hope that help you and from now ,you can able to create your own user defined aggregate function. :)

cont to my above post for Bhavesh

Ranjan., September 19, 2013 - 5:53 pm UTC

Hi Bhavesh,

by seeing your result,
you datas need to partition by artcl,store order by yr,week.

Because for "2 1 2013 1 -2 6"
the 6 came from the year 2012 balance ,which is 8 and added "-2" of year 2013 value.

So the query should be as below.
################33
SELECT artcl , store,yr,week,inventory, BREAKSUM (inventory) over(partition by artcl,store order by yr,week) AS rolling_total
From T ;
########################3

Enjoy :).

thanks

Bhavesh Ghodasara, September 24, 2013 - 11:50 am UTC

Thank you so much Rajan.

I really appriciate your effort.

I want to know whether it is possible to get particular output by just Query, I dont want to use any function.

cont to my last post for @Bhavesh

Ranjan, September 26, 2013 - 8:43 pm UTC

Hi Bhavesh,
For Your requirement, It is not that easy to display desire report directly using select query.

Long back I read something "creating our own aggregate function" from the below site.
http://www.oracle-developer.net/display.php?id=215

I suggest you to read that page by spending 10 minutes and
you will know why it is not that easy :).

I have only copied those logic from that site and shown in my last post.

Regards,
Ranjan.

Rich, October 29, 2013 - 3:26 pm UTC

Tomm,

Thanks for providing the "Old Carry Forward Trick." It worked well for me!

"One of my (current) favorite analytic tricks -- the old "carry forward". We mark rows such that
the preceding row was different -- subsequent dup rows would have NULLS there for grp.

Then, we use max(grp) to "carry" that number down....

Now we have something to group by -- we've divided the rows up into groups we can deal with."

selecting selective subset of rows

Ravi B, April 05, 2014 - 12:00 am UTC

Hi Tom,

Could you please help me with this problem.

CREATE TABLE TEST_VERSION(VERSION_ID NUMBER,SUBVERSION VARCHAR2(100));

insert into TEST_VERSION VALUES(1,null);
insert into TEST_VERSION VALUES(1,'Alpha');
insert into TEST_VERSION VALUES(1,'Internal Beta');
insert into TEST_VERSION VALUES(1,'Technical Preview');
insert into TEST_VERSION VALUES(1,'RTM');

insert into TEST_VERSION VALUES(2,null);
insert into TEST_VERSION VALUES(2,'Alpha');
insert into TEST_VERSION VALUES(3,'Internal Beta');
insert into TEST_VERSION VALUES(4,'Technical Preview');

SELECT * FROM TEST_VERSION;

select rules:

for given version_id:

1) if there is a SUBVERSION='RTM' row(s) in the set and also a row with SUBVERSION= NULL ignore the row with SUBVERSION= NULL and select all the other rows including RTM

2) if there is a row SUBVERSION= NULL and there are no rows with SUBVERSION='RTM' then pick all the rows including SUBVERSION=NULL

In essence, row with SUBVERSION= NULL cannot not co-exist with SUBVERSION='RTM' for the same version_id.

April 16, 2014 - 5:43 pm UTC

I think #2 should be phrased as:

2) if there are no rows with RTM, select them all.

ops$tkyte%ORA11GR2> select *
  2    from (
  3  select version_id, subversion,
  4         max( case when subversion = 'RTM' then 1 else 0 end ) over (partition by version_id ) has_rtm
  5    from test_version
  6         )
  7   where (has_rtm = 1 and subversion is not null)
  8      or (has_rtm = 0)
  9   order by version_id, subversion
 10  /

VERSION_ID SUBVERSION              HAS_RTM
---------- -------------------- ----------
         1 Alpha                         1
         1 Internal Beta                 1
         1 RTM                           1
         1 Technical Preview             1
         2 Alpha                         0
         2                               0
         3 Internal Beta                 0
         4 Technical Preview             0

8 rows selected.

selecting selective subset of rows

A reader, April 17, 2014 - 3:51 am UTC

Brilliant!!

alternatively

Igor, April 17, 2014 - 4:02 pm UTC

SQL> select * from test_version
  2  minus
  3  select version_id, decode(subversion,'RTM',null) from test_version where subversion='RTM'
  4  order by 1,2;

VERSION_ID SUBVERSION
---------- -----------------
         1 Alpha
         1 Internal Beta
         1 RTM
         1 Technical Preview
         2 Alpha
         2
         3 Internal Beta
         4 Technical Preview

8 rows selected.

April 17, 2014 - 5:40 pm UTC

depends, we know nothing about their primary keys, unique constraints and so on. don't forget that MINUS adds a "distinct" to the query. That means that in general, you cannot substitute MINUS for what I did above.

just re-run the inserts and:

ops$tkyte%ORA11GR2> select *
  2    from (
  3  select version_id, subversion,
  4         max( case when subversion = 'RTM' then 1 else 0 end ) over (partition by version_id ) has_rtm
  5    from test_version
  6         )
  7   where (has_rtm = 1 and subversion is not null)
  8      or (has_rtm = 0)
  9   order by version_id, subversion
 10  /

VERSION_ID SUBVERSION              HAS_RTM
---------- -------------------- ----------
         1 Alpha                         1
         1 Alpha                         1
         1 Internal Beta                 1
         1 Internal Beta                 1
         1 RTM                           1
         1 RTM                           1
         1 Technical Preview             1
         1 Technical Preview             1
         2 Alpha                         0
         2 Alpha                         0
         2                               0
         2                               0
         3 Internal Beta                 0
         3 Internal Beta                 0
         4 Technical Preview             0
         4 Technical Preview             0

16 rows selected.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> select * from test_version
  2  minus
  3  select version_id, decode(subversion,'RTM',null) from test_version where subversion='RTM'
  4  order by 1,2;

VERSION_ID SUBVERSION
---------- --------------------
         1 Alpha
         1 Internal Beta
         1 RTM
         1 Technical Preview
         2 Alpha
         2
         3 Internal Beta
         4 Technical Preview

8 rows selected.

and you don't need your decode, just select version_id, null - since you are only getting RTM subversions - the decode is an extra step you would not need.

a non-analytic query that would work could be:

ops$tkyte%ORA11GR2> select *
  2    from test_version a
  3   where not exists (select null
  4                       from test_version b
  5                      where a.version_id = b.version_id
  6                        and b.subversion = 'RTM'
  7                        and a.subversion is null)
  8   order by 1, 2
  9  /

but be careful trying to substitute MINUS, UNION, UNION ALL, INTERSECT, etc - they have "set based side effects" that can and will change the answer you receive - unless there are constraints in place that tell you "that side effect cannot happen"

but yes, there are a huge number of equivalent ways to get this answer...

Igor, April 17, 2014 - 9:03 pm UTC

Tom, you are right, decode was not needed in my query, thanks for pointing at this. And yes, these set operators (union, minus etc.) may produce different results, depending on the data (for example the duplicate records will be lost). Thanks!

April 17, 2014 - 10:52 pm UTC

thanks for the ideas though :) it is always good to see alternatives...

A reader, August 11, 2015 - 7:51 pm UTC

In this example, is there a way to get the values without doing the join twice. I need to get the value from table N where the MAX date (field d) is less than or equal to date in table A.

create table g(a int, b varchar2(2), d date) -- a, b and d are together unique;
insert into g values (1, 'ER', to_date('01012005','mmddyyyy');
insert into g values (1, 'ER', to_date('06012005','mmddyyyy');
insert into g values (1, 'ER', to_date('01012015','mmddyyyy');
insert into g values (2, 'PR', to_date('01012005','mmddyyyy');
insert into g values (2, 'PR', to_date('01012015','mmddyyyy');
insert into g values (2, 'PR', to_date('06012015','mmddyyyy');
insert into g values (3, 'FR', to_date('06012015','mmddyyyy');

create table n(a int, b varchar2(2), d date, e varchar2(3), f varchar2(3)) -- a, b and d are unique;
insert into n values(1, 'ER', to_date('01012005','mmddyyyy', 'FSC', 'COL');
insert into n values(1, 'ER', to_date('01012014','mmddyyyy', 'FOO', 'CLL');
insert into n values(1, 'PR', to_date('01012014','mmddyyyy', 'FOI', 'COL');
insert into n values(1, 'PR', to_date('06012014','mmddyyyy', 'FIT', 'CLL');

select g.a, g.b, g.d,
(select n.e from n where n.a = g.a and n.b = g.b and n.d = (select max(n.d) from n where n.a = g.a and n.b = g.b and n.d<=g.d) e,
(select n.f from n where n.a = g.a and n.b = g.b and n.d = (select max(n.d) from n where n.a = g.a and n.b = g.b and n.d<=g.d) f
from g

Analytics based on Logical offset

Rajeshwaran, Jeyabal, July 13, 2016 - 11:24 am UTC

set feedback off
drop table t purge;
create table t(
HYB_PROJECT_KEY    NUMBER,
SAMP_CNFG_KEY      NUMBER,
SAMPLE_RESULT_KEY  NUMBER,
RESULT_SOURCE      VARCHAR2(1),
RESULT_SOURCE_CDC1 VARCHAR2(1),
RESULT_DATE        DATE,
RESULT_VALUE_TYPE  VARCHAR2(10),
RESULT_VALUE       NUMBER,
IS_ACTIVE          NUMBER);
Insert into T (HYB_PROJECT_KEY,SAMP_CNFG_KEY,SAMPLE_RESULT_KEY,RESULT_SOURCE,RESULT_SOURCE_CDC1,RESULT_DATE,RESULT_VALUE_TYPE,RESULT_VALUE,IS_ACTIVE) 
 values (84,114,1365970,'A','A',to_date('09-JUN-2014','DD-MON-YYYY'),'TEST',7.2,1);
Insert into T (HYB_PROJECT_KEY,SAMP_CNFG_KEY,SAMPLE_RESULT_KEY,RESULT_SOURCE,RESULT_SOURCE_CDC1,RESULT_DATE,RESULT_VALUE_TYPE,RESULT_VALUE,IS_ACTIVE) 
 values (84,114,1365970,'A','A',to_date('14-JUN-2014','DD-MON-YYYY'),'TEST',8,1);
Insert into T (HYB_PROJECT_KEY,SAMP_CNFG_KEY,SAMPLE_RESULT_KEY,RESULT_SOURCE,RESULT_SOURCE_CDC1,RESULT_DATE,RESULT_VALUE_TYPE,RESULT_VALUE,IS_ACTIVE) 
 values (84,114,1365970,'H','H',to_date('01-JUN-2014','DD-MON-YYYY'),'TEST',4,1);
Insert into T (HYB_PROJECT_KEY,SAMP_CNFG_KEY,SAMPLE_RESULT_KEY,RESULT_SOURCE,RESULT_SOURCE_CDC1,RESULT_DATE,RESULT_VALUE_TYPE,RESULT_VALUE,IS_ACTIVE) 
 values (84,114,1365970,'H','H',to_date('03-JUN-2014','DD-MON-YYYY'),'TEST',5,1);
Insert into T (HYB_PROJECT_KEY,SAMP_CNFG_KEY,SAMPLE_RESULT_KEY,RESULT_SOURCE,RESULT_SOURCE_CDC1,RESULT_DATE,RESULT_VALUE_TYPE,RESULT_VALUE,IS_ACTIVE) 
 values (84,114,1365970,'H','H',to_date('06-JUN-2014','DD-MON-YYYY'),'TEST',4,1);
Insert into T (HYB_PROJECT_KEY,SAMP_CNFG_KEY,SAMPLE_RESULT_KEY,RESULT_SOURCE,RESULT_SOURCE_CDC1,RESULT_DATE,RESULT_VALUE_TYPE,RESULT_VALUE,IS_ACTIVE) 
 values (84,114,1365970,'H','H',to_date('07-JUN-2014','DD-MON-YYYY'),'TEST',3,1);
Insert into T (HYB_PROJECT_KEY,SAMP_CNFG_KEY,SAMPLE_RESULT_KEY,RESULT_SOURCE,RESULT_SOURCE_CDC1,RESULT_DATE,RESULT_VALUE_TYPE,RESULT_VALUE,IS_ACTIVE) 
 values (84,114,1365970,'H','H',to_date('15-JUN-2014','DD-MON-YYYY'),'TEST',10,1);
Insert into T (HYB_PROJECT_KEY,SAMP_CNFG_KEY,SAMPLE_RESULT_KEY,RESULT_SOURCE,RESULT_SOURCE_CDC1,RESULT_DATE,RESULT_VALUE_TYPE,RESULT_VALUE,IS_ACTIVE) 
 values (84,114,1365970,'H','H',to_date('17-JUN-2014','DD-MON-YYYY'),'TEST',0,1);
commit;
set feedback on

Team,
1) for each sample_result_key and result_date just retain only the Minimum non-zero values as result_value.
2) for each sample_result_key if the maximum result_date having the result_value = 0, then go back seven days and pick the minimum result_value and its corresponding result_source and result_source_cdc1.

I am able to come up with a query like this, but do you have any other better way of doing this?

demo@ORA11G>
demo@ORA11G> select hyb_project_key, samp_cnfg_key , sample_result_key,
  2          max(result_date) result_date,
  3          min(case when result_value > 0 then result_value end) result_value,
  4          min(result_source) keep(dense_rank first order by case when result_value > 0 then result_value end asc nulls last) result_source,
  5          min(result_source_cdc1) keep(dense_rank first order by case when result_value > 0 then result_value end asc nulls last) result_source_cdc1
  6  from (
  7  select hyb_project_key, samp_cnfg_key,
  8          sample_result_key,result_date, result_value,rnk,
  9          result_source,result_source_cdc1
 10  from (
 11  select hyb_project_key, samp_cnfg_key,
 12          sample_result_key,result_date, result_value,rnk,
 13          result_source,result_source_cdc1,nxt_value,
 14          last_value(nxt_value ignore nulls) over(partition by sample_result_key order by rnk) nxt_value2
 15  from (
 16  select hyb_project_key, samp_cnfg_key,
 17          sample_result_key,result_date,
 18          min(case when result_value > 0 then result_value end) result_value,
 19          min(result_source) keep(dense_rank first order by case when result_value > 0 then result_value end nulls last) result_source,
 20          min(result_source_cdc1) keep(dense_rank first order by case when result_value > 0 then result_value end nulls last) result_source_cdc1 ,
 21          row_number() over(partition by sample_result_key order by result_date desc) rnk,
 22          case when row_number() over(partition by sample_result_key order by result_date desc) = 1
 23                and min(case when result_value > 0 then result_value end) is null
 24                then last_value( result_date ) over( partition by sample_result_key order by result_date desc
 25                    range between current row and interval '7' day following) end nxt_value
 26  from t
 27  group by hyb_project_key, samp_cnfg_key,
 28          sample_result_key,result_date
 29  order by result_date desc
 30       )
 31       )
 32  where rnk =1 or nxt_value2 <= result_date
 33       ) t2
 34  group by hyb_project_key, samp_cnfg_key , sample_result_key ;

HYB_PROJECT_KEY SAMP_CNFG_KEY SAMPLE_RESULT_KEY RESULT_DATE RESULT_VALUE R R
--------------- ------------- ----------------- ----------- ------------ - -
             84           114           1365970 17-JUN-2014            8 A A

1 row selected.

July 13, 2016 - 1:24 pm UTC

I'm not sure I understand the requirement. Based on the example, do you mean:

"Return the maximum result date for each sample. If this has a value of zero, return the lowest (non-zero) result in the seven previous days. Also show the source and CDC that gave this minimum"

?

If so, the following does that:

select hyb_project_key, samp_cnfg_key, sample_result_key, mx,
       min(case when result_value > 0 then result_value end) res_value,
       min(result_source) keep (
         dense_rank first order by case when result_value > 0 then result_value end nulls last
       ) score,
       min(result_source_cdc1) keep (
         dense_rank first order by case when result_value > 0 then result_value end
       ) cdc
from (
  select t.*,
         max(result_date) over (partition by sample_result_key) mx
  from   t
)
where  mx - result_date <= 7
group  by hyb_project_key, samp_cnfg_key, sample_result_key, mx;

HYB_PROJECT_KEY  SAMP_CNFG_KEY  SAMPLE_RESULT_KEY  MX                    RES_VALUE  SCORE  CDC  
84               114            1,365,970          17-JUN-2014 00:00:00  8          A      A

Better is always a bit subjective. But this is far less code!

Obviously the where clause (mx - result_date <= 7) will need adjusting if there are multiple result keys in the data.

Analytics based on Logical offset

Rajeshwaran, Jeyabal, July 13, 2016 - 2:36 pm UTC

Thanks, this helps.

On "Analytics based on Logical offset"

Stew Ashton, January 09, 2017 - 11:03 am UTC

In 12c there is a MATCH_RECOGNIZE solution:

select * from t
match_recognize(
  partition by HYB_PROJECT_KEY, SAMP_CNFG_KEY, sample_result_key
  order by result_date desc
  measures a.RESULT_DATE RESULT_DATE,
    ab.RESULT_SOURCE RESULT_SOURCE, 
    ab.RESULT_SOURCE_CDC1 RESULT_SOURCE_CDC1, 
    ab.RESULT_VALUE_TYPE RESULT_VALUE_TYPE, 
    ab.RESULT_VALUE RESULT_VALUE, 
    ab.IS_ACTIVE IS_ACTIVE
  pattern (^a (b|c)*)
  subset bc = (b, c), ab = (a, b)
  define
    b as a.result_value = 0 and result_date >= a.result_date - 7
      and result_value = min(bc.result_value),
    c as a.result_value = 0 and result_date >= a.result_date - 7
);

January 10, 2017 - 4:04 am UTC

Love your work :-)

Analytics question

Breadcrumb

Question and Answer

You Asked

and Tom said...

Rating

Comments

Excellent!!!

Wow!!

Small correction

another good book on the list please go ahead on this one too

Very useful

Can we reach to the end of the group?

Almost....

Is this what you are looking for ?

Tom's Book

Is it possible to get the same result in standard edition ?

ok

Why...

Another solution

still q's on analytics

Regarding the previous post ...

OK

with analytics

OK

Stumped on Analytics

Oops - Error in previous post

Thanks.....

analytics book

OK

Will this be faster?

lead/lag on different dataset

Create table scripts

Excellent

Help On SQL

Help On SQL

analytic q

thanx!

How to compute this running total (sort of...)

This is how far I went...and no further

Yes...they are redundant

But it is redundant..

Almost there?

Follow up

Typo in previous post

Order

More Info

Hi Again

Great Stuff

Excellent

Is this solvable with ANALTICS too?

Thank you!

You said a book on analytics?

quick analytic question

Great!!!

Thank you very much

How to mimic Ora10g LAST_VALUE(... IGNORE NULLS)?

Doesn't work with PL/SQL ????????

Effect of distinct on lag

Distinct effect release

Distinct effect release

how to write this query

Finding pairs in result set

Brilliant as usual !!

PJ's query

oops

Cool....

analytic q

thanx!

SQL query

Great -

analytics problem

excellent

Limitation of Analytic Functions

Great!!!

Using analytical function, LEAD, LAG

Windowing clause and range function.

I do have Expert One on One

Great answer!

Using range windows

Re: another query using analytics