Excellent!!!
October 7, 2003 - 5am Central time zone
Reviewer: A reader
Hi Tom,
'been thinking about writing a book just about analytics' ... please make this book available
soon; I am sure it will be yet another gift from you to the Oracle world :)
Wow!!
October 7, 2003 - 7am Central time zone
Reviewer: Michael T from Dallas, Tx
This is exactly what I needed! Analytics do rock! I just
need to understand them better. If you do decide to write a
book on analytics, it would be at the top of my must have
list. Thanks again!!!
Small correction
October 7, 2003 - 7am Central time zone
Reviewer: Michael T from Dallas, Tx
After looking at it a little closer it looks like there is
one small error. The start date for the first MACH1 entry
should be the close date of the prior different station. In
this case 07/01/2003. However, by making some small changes
to your query I can get the results I want.
SELECT order#,
station,
lag(close_date) over (partition by order# order by close_date)
start_date,
close_date
FROM (SELECT order#,
station,
close_date
FROM (SELECT order#,
lag(station) over (partition by order# order by
close_date) lag_station,
lead(station) over (partition by order# order by
close_date) lead_station,
station,
close_date
FROM t)
WHERE lead_station <> station
OR lead_station is null
OR lag_station is null)
There might be an easier way to construct this query, but
it works great for me. Thanks a lot for your help!
Followup October 7, 2003 - 8am Central time zone:
sorry about that -- you are right -- when we have "a pair", we want to use lag/lead again to get
and keep the right dates.
So, we want to keep rows that are:
a) the first row in the partition "where lag_station is null"
b) the last row in the partition "where lead_station is null"
c) the first of a possible pair "where lag_station <> station"
d) the second of a possible pair "where lead_station <> station"
This query does that:
ops$tkyte@ORA920> select order#,
2 station,
3 lag_close_date,
4 close_date,
5 decode( lead_station, station, 1, 0 ) first_of_pair,
6 decode( lag_station, station, 1, 0 ) second_of_pair
7 from (
8 select order#,
9 lag(station) over (partition by order# order by close_date)
10 lag_station,
11 lead(station) over (partition by order# order by close_date)
12 lead_station,
13 station,
14 close_date,
15 lag(close_date) over (partition by order# order by close_date)
16 lag_close_date,
17 lead(close_date) over (partition by order# order by close_date)
18 lead_close_date
19 from t
20 )
21 where lag_station is null
22 or lead_station is null
23 or lead_station <> station
24 or lag_station <> station
25 /
ORDER# STATION LAG_CLOSE_ CLOSE_DATE FIRST_OF_PAIR SECOND_OF_PAIR
------ ------- ---------- ---------- ------------- --------------
12345 RECV 07/01/2003 0 0
12345 MACH1 07/01/2003 07/02/2003 1 0
12345 MACH1 07/05/2003 07/11/2003 0 1
12345 INSP1 07/11/2003 07/12/2003 0 0
12345 MACH1 07/12/2003 07/16/2003 0 0
12345 MACH2 07/16/2003 07/30/2003 0 0
12345 STOCK 07/30/2003 08/01/2003 0 0
7 rows selected.
we can see with the 1's the first/second of a pair in there. All we need to do now is "reach
forward" for the first of a pair and grab the close date from the next record:
ops$tkyte@ORA920> select order#,
2 station,
3 lag_close_date,
4 close_date
5 from (
6 select order#,
7 station,
8 lag_close_date,
9 decode( lead_station,
10 station,
11 lead(close_date) over (partition by order# order by close_date),
12 close_date ) close_date,
13 decode( lead_station, station, 1, 0 ) first_of_pair,
14 decode( lag_station, station, 1, 0 ) second_of_pair
15 from (
16 select order#,
17 lag(station) over (partition by order# order by close_date)
18 lag_station,
19 lead(station) over (partition by order# order by close_date)
20 lead_station,
21 station,
22 close_date,
23 lag(close_date) over (partition by order# order by close_date)
24 lag_close_date,
25 lead(close_date) over (partition by order# order by close_date)
26 lead_close_date
27 from t
28 )
29 where lag_station is null
30 or lead_station is null
31 or lead_station <> station
32 or lag_station <> station
33 )
34 where second_of_pair <> 1
35 /
ORDER# STATION LAG_CLOSE_ CLOSE_DATE
------ ------- ---------- ----------
12345 RECV 07/01/2003
12345 MACH1 07/01/2003 07/11/2003
12345 INSP1 07/11/2003 07/12/2003
12345 MACH1 07/12/2003 07/16/2003
12345 MACH2 07/16/2003 07/30/2003
12345 STOCK 07/30/2003 08/01/2003
6 rows selected.
and discard the second-of-pair rows.
That is another way to do it (and an insight into how I develop analytic queries -- adding extra
columns like that just to see visually what I want to do)
another good book on the list please go ahead on this one too
October 7, 2003 - 8am Central time zone
Reviewer: Vijay Sehgal from India
Best Regards,
Vijay Sehgal
Very useful
October 7, 2003 - 12pm Central time zone
Reviewer: Michael T. from Dallas, Tx
Excellent, as always!
Can we reach to the end of the group?
December 15, 2003 - 11am Central time zone
Reviewer: Steve from UK
For example, say our analytic query returns the following result:
master_record sub_record nxt_record
95845433 25860032 95118740
95118740 25860032 95837497
95837497 25860032
What I'd like to do is grab the final master_record, 95837497, and have it populated in the
final column. There could be 2, 3 or more in each group.
Followup December 15, 2003 - 3pm Central time zone:
so the nxt_record of the last record should be the master_record of that row?
then just select
nvl( lead(master_record) over (....), master_record ) nxt_record
when the lead is NULL, return the master_record of the current row
Almost....
December 15, 2003 - 5pm Central time zone
Reviewer: Steve from UK
but I didn't explain it well enough. What I'd like to see is a result set that looks like:
master_record sub_record nxt_record
95845433 25860032 95837497
95118740 25860032 95837497
95837497 25860032 95837497
The data comes from this:
table activity
cllocn moddate
25860032 18/06/2003
95118740 26/08/2003
95837497 15/12/2003
95845433 19/08/2003
table ext_dedupe
master_cllocn dupe_cllocn
25860032 95118740
25860032 95837497
25860032 95845433
My query is:
select *
  from ( select master_record, sub_record,
                lead(master_record) over (partition by sub_record
                                          order by lst_activity asc) nxt_activity
           from ( select *
                    from ( select case when dupelast_ackdate > last_ackdate then dupe_cllocn
                                       when last_ackdate > dupelast_ackdate then master_cllocn
                                       else master_cllocn
                                   end master_record,
                                  greatest(last_ackdate, dupelast_ackdate) lst_activity,
                                  case when dupelast_ackdate > last_ackdate then master_cllocn
                                       when last_ackdate > dupelast_ackdate then dupe_cllocn
                                       else dupe_cllocn
                                   end sub_record
                             from ( select master_cllocn,
                                           (select max(moddate) from activity a
                                             where a.cllocn = ed.master_cllocn) last_ackdate,
                                           dupe_cllocn,
                                           (select max(moddate) from activity a
                                             where a.cllocn = ed.dupe_cllocn) dupelast_ackdate
                                      from ext_dedupe ed ))))
Am I on the right track, or is there a simpler way to do this?
Thanks
Followup December 16, 2003 - 6am Central time zone:
can you explain in "just text" how you got from your inputs to your outputs.
it is not clear (and i didn't feel like parsing that sql to reverse engineer what it does)
Is this what you are looking for ?
December 15, 2003 - 6pm Central time zone
Reviewer: Venkat from Detroit, MI USA
select master, sub, moddate
, min(master) keep (dense_rank first order by moddate) over (partition by sub) first_in_list
, max(master) keep (dense_rank last order by moddate) over (partition by sub) last_in_list
from (select master, sub, moddate from (
select 95845433 master, 25860032 sub, to_date('19-aug-03','dd/mon/yy') moddate from dual
union all
select 95118740, 25860032, to_date('26-aug-03','dd/mon/yy') from dual union all
select 95837497, 25860032, to_date('15-dec-03','dd/mon/yy') from dual))
MASTER SUB MODDATE FIRST_IN_LIST LAST_IN_LIST
95845433 25860032 8/19/2003 95845433 95837497
95118740 25860032 8/26/2003 95845433 95837497
95837497 25860032 12/15/2003 95845433 95837497
Tom's Book
December 16, 2003 - 4am Central time zone
Reviewer: umesh from blore india
Tom
Don't announce it until you have finished the book. When you talk about a book, we can't wait until
we have it here.
An analytics book: that must be really good!
Is it possible to get the same result in standard edition ?
December 16, 2003 - 4am Central time zone
Reviewer: Ninoslav from croatia
Hi Tom,
yes, analytic functions are great. However, we can use them only in the Enterprise Edition of the database.
We have a few small customers that want only the Standard Edition.
So, is it possible in this question to get the same result without analytic functions?
It would be nice to have some kind of mapping between analytics and 'standard' queries. But that
is probably impossible...
Followup December 16, 2003 - 7am Central time zone:
Oracle 9iR2 and up -- analytics are a feature of standard edition.
there are things you can do in analytics that are quite simply NOT PRACTICAL in any sense without
them.
ok
December 16, 2003 - 8am Central time zone
Reviewer: Steve from uk
I have two tables - activity and ext_dedupe.
table activity
cllocn moddate
25860032 18/06/2003
95118740 26/08/2003
95837497 15/12/2003
95845433 19/08/2003
table ext_dedupe
master_cllocn dupe_cllocn
25860032 95118740
25860032 95837497
25860032 95845433
Ext_dedupe is a table created by a third party app which has identified duplicate records within
our database. The first column is supposed to be the master and the second the duplicate. The
idea is to mark as archived all our duplicate records with a pointer to the master.
Notwithstanding the order of the columns, what we want to do is find out which record has the most
recent activity (from the activity table) and archive off the others.
So, in this example, although the master is listed as 25860032 against the other 3, an examination
of the activity dates means I want to keep 95837497 and mark the others as archived and have a
pointer on each of them to 95837497. That's why I thought if I could get to the following result
it would make it simpler.
master_record sub_record nxt_record
95845433 25860032 95837497
95118740 25860032 95837497
95837497 25860032 95837497
Hope that makes sense!
Followup December 16, 2003 - 11am Central time zone:
oh, then nxt_record is just
last_value(master_record) over (partition by sub_record order by moddate)
Why...
December 16, 2003 - 1pm Central time zone
Reviewer: Steve from UK
it didn't work for me. I had to change it to
first_value(master_record) over (partition by sub_record order by moddate desc)
Is there a reason for that?
Followup December 16, 2003 - 2pm Central time zone:
doh, the default window is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so last_value
only sees the rows up to the current one (plus any rows tied with it on moddate).
I would have needed a window clause that looks forwards rather than backwards (reason #1 why I
should always set up a test case instead of just answering on the fly)
your solution of reversing the data works just fine.
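A minimal sketch of the forward-looking window described above, assuming Steve's three rows have already been reduced to master_record, sub_record and moddate (the WITH clause simply inlines them so the query runs on its own). Spelling out ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING lets LAST_VALUE see the whole partition instead of stopping at the current row:

```sql
-- Steve's sample rows, inlined so the query is self-contained
with t as (
  select 95845433 master_record, 25860032 sub_record,
         to_date('19/08/2003','dd/mm/yyyy') moddate from dual union all
  select 95118740, 25860032, to_date('26/08/2003','dd/mm/yyyy') from dual union all
  select 95837497, 25860032, to_date('15/12/2003','dd/mm/yyyy') from dual
)
select master_record, sub_record,
       -- widen the window to the entire partition so LAST_VALUE
       -- returns the row with the latest moddate, not the current row
       last_value(master_record) over (
         partition by sub_record
         order by moddate
         rows between unbounded preceding and unbounded following ) nxt_record
  from t
 order by moddate;
```

With this data every row should report 95837497, the same answer Steve gets from FIRST_VALUE ... ORDER BY moddate DESC.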
Another solution
December 16, 2003 - 4pm Central time zone
Reviewer: A reader
The following gives the same result ...
select cllocn master_record, nvl(master_cllocn,cllocn) sub_record
, max(cllocn) keep (dense_rank last order by moddate)
over (partition by nvl(master_cllocn,cllocn)) nxt_record
from activity, ext_dedupe where cllocn = dupe_cllocn
MASTER_RECORD SUB_RECORD NXT_RECORD
95118740 25860032 95837497
95837497 25860032 95837497
95845433 25860032 95837497
Followup December 16, 2003 - 5pm Central time zone:
yes, there are many many ways to do this.
first_value
last_value
substring of max() without keep
sure.
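The "substring of max() without keep" variant can be sketched like this, again assuming the three sample rows (inlined via WITH). It relies on to_char(moddate, 'yyyymmddhh24miss') always being exactly 14 characters, so comparing the concatenated strings orders them by date first:

```sql
with t as (
  select 95845433 cllocn, 25860032 sub_record,
         to_date('19/08/2003','dd/mm/yyyy') moddate from dual union all
  select 95118740, 25860032, to_date('26/08/2003','dd/mm/yyyy') from dual union all
  select 95837497, 25860032, to_date('15/12/2003','dd/mm/yyyy') from dual
)
select cllocn master_record, sub_record,
       -- prefix each id with a fixed-width sortable date string,
       -- take the MAX over the whole group, then strip the 14-character
       -- prefix off again: what remains is the id of the latest row
       to_number( substr( max( to_char(moddate, 'yyyymmddhh24miss') || cllocn )
                            over (partition by sub_record), 15 ) ) nxt_record
  from t;
```

If two rows share exactly the same moddate, the tie goes to whichever id sorts higher as a string, so KEEP (DENSE_RANK LAST ...) or FIRST_VALUE reads more clearly when ties matter.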

December 16, 2003 - 4pm Central time zone
Reviewer: A reader
Actually, the nvl(master_cllocn...) is required only if you need all 4 rows in the output, as
follows (there is an outer join involved). If you need only the 3 rows shown in the post above,
there is no need for the nvl's....
select cllocn master_record, nvl(master_cllocn,cllocn) sub_record
, max(cllocn) keep (dense_rank last order by moddate)
over (partition by nvl(master_cllocn,cllocn)) nxt_record
, last_value(cllocn) over (partition by nvl(master_cllocn,cllocn) order by moddate) nxt
from activity, ext_dedupe where cllocn = dupe_cllocn (+)
MASTER_RECORD SUB_RECORD NXT_RECORD
25860032 25860032 95837497
95118740 25860032 95837497
95837497 25860032 95837497
95845433 25860032 95837497
still q's on analytics
January 30, 2004 - 10am Central time zone
Reviewer: A reader from Madison, wi
Okay, so my web application logs "web transaction" statistics to a table. Each one actually amounts to
0 to many database transactions... but anyway. I need to summarize (sum, min, max, count, average)
each day's transaction times for each class (name2) and action (name3) and ultimately "archive"
this data to a history table. I am running 8.1.7 and am pretty new to analytics.
My table looks like this:
SQL> desc tran_stats
Name Null? Type
----------------------- -------- ----------------
ID NOT NULL NUMBER(9)
NAME1 VARCHAR2(100)
NAME2 VARCHAR2(100)
NAME3 VARCHAR2(100)
NAME4 VARCHAR2(100)
SEC NOT NULL NUMBER(9,3)
TS_CR NOT NULL DATE
ID NAME1 NAME2 NAME3 SEC NAME4 TS_CR
---------- ----- ------------------------- ---------- ------ ----- ---------
35947 /CM01_PersonManagement CREATE .484 15-JAN-04
35987 /CM01_PersonManagement CREATE .031 15-JAN-04
36086 /CM01_PersonManagement EDIT .312 16-JAN-04
36555 /CM01_PersonManagement CREATE .297 19-JAN-04
36623 /CM01_PersonManagement EDIT .375 19-JAN-04
36627 /CM01_PersonManagement CREATE .047 19-JAN-04
36756 /CM01_AddressManagement CREATE .375 20-JAN-04
36766 /CM01_AddressManagement CREATE .305 20-JAN-04
36757 /CM01_AddressManagement INSERT .391 20-JAN-04
37178 /CM01_PersonManagement EDIT .203 20-JAN-04
and I need output like this:
TS_CR NAME2 NAME3 M_SUM M_MIN M_MAX M_COUNT M_AVG
--------- ------------------------- ---------- ------ ------ ------ ------- ------
20-JAN-04 /CM01_AddressManagement CREATE .680 .305 .375 2 .340
20-JAN-04 /CM01_AddressManagement INSERT .391 .391 .391 1 .391
20-JAN-04 /CM01_PersonManagement EDIT .203 .203 .203 1 .203
19-JAN-04 /CM01_PersonManagement CREATE .344 .047 .297 2 .172
19-JAN-04 /CM01_PersonManagement EDIT .375 .375 .375 1 .375
16-JAN-04 /CM01_PersonManagement EDIT .312 .312 .312 1 .312
15-JAN-04 /CM01_PersonManagement CREATE .515 .031 .484 2 .258
This seems to work, but there has to be a better/cleaner/more efficient way to do this:
select distinct ts_cr, name2, name3, m_sum, m_min,m_max,m_count,m_avg
from (
select trunc(ts_cr) ts_cr,id, name2, name3, sum(sec) m_dummy
, min(sum(sec)) over(partition by name2,name3,trunc(ts_cr)) as m_min
, max(sum(sec)) over(partition by name2,name3,trunc(ts_cr)) as m_max
, round(avg(sum(sec)) over(partition by name2,name3,trunc(ts_cr)),5) as m_avg
, count(sum(sec)) over(partition by name2,name3,trunc(ts_cr)) as m_count
, sum(sum(sec)) over(partition by name2,name3,trunc(ts_cr)) as m_sum
from tran_stats group by name2, name3,trunc(ts_cr),id
)n order by 1 desc, 2, 3;
Any help or pointers would be appreciated. Thanks in advance.
Followup January 30, 2004 - 10am Central time zone:
why does there "have to be"?
what is "unclean" about this? I could make it more verbose (and perhaps more readable) but this
does exactly what you ask for?
It seems pretty "good", very "clean" and probably the most efficient method to get this result?
Regarding the previous post ...
January 30, 2004 - 11am Central time zone
Reviewer: A reader
Am I missing something or will the following do the same ..
select trunc(ts_cr) ts_cr, name2, name3,
count(*) m_count, min(sec) m_min, max(sec) m_max,
sum(sec) m_sum, avg(sec) m_avg
from tran_stats
group by trunc(ts_cr), name2, name3
order by 1 desc, 2, 3
Followup January 30, 2004 - 7pm Central time zone:
with the supplied data -- since each id happened to appear only once per "trunc(ts_cr), name2, name3"
group -- yes.
In general -- no. consider:
ops$tkyte@ORA9IR2> select distinct ts_cr, name2, name3, m_sum, m_min,m_max,m_count,m_avg
2 from ( select trunc(ts_cr) ts_cr,
3 id,
4 name2,
5 name3,
6 sum(sec) m_dummy ,
7 min(sum(sec)) over(partition by name2,name3,trunc(ts_cr)) as m_min ,
8 max(sum(sec)) over(partition by name2,name3,trunc(ts_cr)) as m_max ,
9 round(avg(sum(sec)) over(partition by name2,name3,trunc(ts_cr)),5)
as m_avg ,
10 count(sum(sec)) over(partition by name2,name3,trunc(ts_cr)) as
m_count ,
11 sum(sum(sec)) over(partition by name2,name3,trunc(ts_cr)) as m_sum
12 from tran_stats
13 group by name2, name3,trunc(ts_cr),id
14 )n
15 MINUS
16 select ts_cr, name2, name3, m_sum, m_min,m_max,m_count,m_avg
17 from (
18 select trunc(ts_cr) ts_cr, name2, name3,
19 count(*) m_count, min(sec) m_min, max(sec) m_max,
20 sum(sec) m_sum, avg(sec) m_avg
21 from tran_stats
22 group by trunc(ts_cr), name2, name3 )
23 /
no rows selected
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> insert into tran_stats
2 select 35947,'/CM01_PersonManagement','CREATE', .484 ,'15-JAN-04'
3 from all_users where rownum <= 5;
5 rows created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select distinct ts_cr, name2, name3, m_sum, m_min,m_max,m_count,m_avg
2 from ( select trunc(ts_cr) ts_cr,
3 id,
4 name2,
5 name3,
6 sum(sec) m_dummy ,
7 min(sum(sec)) over(partition by name2,name3,trunc(ts_cr)) as m_min ,
8 max(sum(sec)) over(partition by name2,name3,trunc(ts_cr)) as m_max ,
9 round(avg(sum(sec)) over(partition by name2,name3,trunc(ts_cr)),5)
as m_avg ,
10 count(sum(sec)) over(partition by name2,name3,trunc(ts_cr)) as
m_count ,
11 sum(sum(sec)) over(partition by name2,name3,trunc(ts_cr)) as m_sum
12 from tran_stats
13 group by name2, name3,trunc(ts_cr),id
14 )n
15 MINUS
16 select ts_cr, name2, name3, m_sum, m_min,m_max,m_count,m_avg
17 from (
18 select trunc(ts_cr) ts_cr, name2, name3,
19 count(*) m_count, min(sec) m_min, max(sec) m_max,
20 sum(sec) m_sum, avg(sec) m_avg
21 from tran_stats
22 group by trunc(ts_cr), name2, name3 )
23 /
TS_CR NAME2 NAME3 M_SUM M_MIN M_MAX M_COUNT M_AVG
--------- ----------------------- -------- ---------- ---------- ---------- ---------- ----------
15-JAN-04 /CM01_PersonManagement CREATE 2.935 .031 2.904 2 1.4675
add more data and it won't be the same.
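The difference comes from the analytic query computing its statistics over per-ID sums rather than over raw rows. A sketch of the same two-step logic written as nested GROUP BYs (not benchmarked; it assumes the tran_stats columns shown above and that SEC is NOT NULL, so COUNT(*) matches count(sum(sec))):

```sql
select ts_cr, name2, name3,
       sum(id_sec)           m_sum,
       min(id_sec)           m_min,
       max(id_sec)           m_max,
       count(*)              m_count,
       round(avg(id_sec), 5) m_avg
  from ( -- step 1: total seconds per id, per day, class and action
         select trunc(ts_cr) ts_cr, name2, name3, id, sum(sec) id_sec
           from tran_stats
          group by trunc(ts_cr), name2, name3, id )
 -- step 2: statistics over those per-id totals
 group by ts_cr, name2, name3
 order by 1 desc, 2, 3;
```

This form makes it explicit that duplicate IDs are summed first, which is exactly where the simpler single GROUP BY diverges once more data is added.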
OK
January 31, 2004 - 9am Central time zone
Reviewer: Siva from Bangalore,India
Dear Tom,
Can analytics be used for the following formats of the same query
sql>select ename,nvl(ename,'Name is null') from emp
sql>select ename,decode(ename,null,'Name is null',ename)
from emp
If you know other ways,Please let me know
Bye!
Followup January 31, 2004 - 10am Central time zone:
umm, why ?
with analytics
February 18, 2004 - 7am Central time zone
Reviewer: A reader
with the following data
-- ------
1 val1_1
1 val1_2
1 val1_3
2 val2_1
2 val2_2
can i produce
-- ------ --------------------
1 val1_1 val1_1,val1_2,val1_3
1 val1_2 val1_1,val1_2,val1_3
1 val1_3 val1_1,val1_2,val1_3
2 val2_1 val2_1,val2_2
2 val2_2 val2_1,val2_2
with an analytic that rocks
Followup February 18, 2004 - 8pm Central time zone:
if
select max(count(*)) from t group by id
has a reasonable maximum -- yes, but it would be a trick lag/lead thing.
I would probably join using stragg. join the details to the aggregate using inline views.
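The "join the details to the aggregate using inline views" approach can be sketched as follows. stragg is a user-defined aggregate, not a built-in, so this sketch swaps in LISTAGG (available from Oracle 11g Release 2) as a stand-in; the table and column names id and val are hypothetical, inlined via WITH:

```sql
-- hypothetical sample data matching the question
with t as (
  select 1 id, 'val1_1' val from dual union all
  select 1,    'val1_2'     from dual union all
  select 1,    'val1_3'     from dual union all
  select 2,    'val2_1'     from dual union all
  select 2,    'val2_2'     from dual
)
select d.id, d.val, a.vals
  from t d,
       -- one aggregated row per id
       ( select id, listagg(val, ',') within group (order by val) vals
           from t
          group by id ) a
 where d.id = a.id           -- attach the list back to every detail row
 order by d.id, d.val;
```

Where stragg is installed, stragg(val) can take LISTAGG's place in the inline view (without the ordering guarantee). On 11gR2 and later, LISTAGG can also be used directly as an analytic with OVER (PARTITION BY id), which avoids the join.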
OK
March 1, 2004 - 9am Central time zone
Reviewer: Siddiq from UAE
Hi Tom,
What can be the business use cases of the analytic functions
1)cume_dist
2)percentile_disc
3)percentile_cont
Where can they be of immense use?
Bye!
Followup March 1, 2004 - 10am Central time zone:
they are just statistical functions for analysis.
2 and 3 are really variations on each other (disc=discrete, cont=continuous) and would be used to
compute percentiles (like you might see on an SAT test report from back in high school).
percentile_* can be used to find a median for example :)
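A short sketch of finding a median per department, assuming the standard SCOTT.EMP demo table. PERCENTILE_CONT interpolates between the two middle values, while PERCENTILE_DISC returns an actual value from the set:

```sql
select deptno,
       -- interpolated 50th percentile
       percentile_cont(0.5) within group (order by sal) median_cont,
       -- smallest sal whose cumulative distribution reaches 0.5
       percentile_disc(0.5) within group (order by sal) median_disc
  from emp
 group by deptno
 order by deptno;
```

With the standard EMP data, department 30 (six salaries) shows the difference: the continuous median is 1375 (halfway between 1250 and 1500), while the discrete one is 1250.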
cume_dist is a variation on that. I'll cheat on an example, from the doc:
Analytic Example
The following example calculates the salary percentile for each employee in the purchasing area.
For example, 40% of clerks have salaries less than or equal to Himuro.
SELECT job_id, last_name, salary, CUME_DIST() OVER (PARTITION BY job_id ORDER BY salary) AS
cume_dist FROM employees WHERE job_id LIKE 'PU%';
JOB_ID LAST_NAME SALARY CUME_DIST
---------- ------------------------- ---------- ----------
PU_CLERK Colmenares 2500 .2
PU_CLERK Himuro 2600 .4
PU_CLERK Tobias 2800 .6
PU_CLERK Baida 2900 .8
PU_CLERK Khoo 3100 1
PU_MAN Raphaely 11000 1
Stumped on Analytics
March 4, 2004 - 9am Central time zone
Reviewer: Dave Thompson from West Yorkshire, England.
Hi Tom,
I have the following two tables:
CREATE TABLE PAY_M
(
PAY_ID NUMBER,
PAYMENT NUMBER
)
--
--
CREATE TABLE PREM
(
PREM_ID NUMBER,
PREM_PAYMENT NUMBER
)
With the following data:
INSERT INTO PREM ( PREM_ID, PREM_PAYMENT ) VALUES (
1, 100);
INSERT INTO PREM ( PREM_ID, PREM_PAYMENT ) VALUES (
2, 50);
INSERT INTO PREM ( PREM_ID, PREM_PAYMENT ) VALUES (
3, 50);
INSERT INTO PREM ( PREM_ID, PREM_PAYMENT ) VALUES (
4, 50);
COMMIT;
INSERT INTO PAY_M ( PAY_ID, PAYMENT ) VALUES (
1, 50);
INSERT INTO PAY_M ( PAY_ID, PAYMENT ) VALUES (
2, 25);
INSERT INTO PAY_M ( PAY_ID, PAYMENT ) VALUES (
3, 50);
INSERT INTO PAY_M ( PAY_ID, PAYMENT ) VALUES (
4, 50);
COMMIT;
PAY_M contains payments made against the premiums in the table prem.
Payments:
PAY_ID PAYMENT
---------- ----------
1 50
2 25
3 50
4 50
Prem:
PREM_ID PREM_PAYMENT
---------- ------------
1 100
2 50
3 50
4 50
We are trying to find which payment Ids paid each premium payment in Prem. The payments are
assigned sequentially to the premiums.
For example payments 1,2 & 3 pay off the £100 in premium 1 leaving £25. Then the remaining payment
from payment 3 & payment 4 pay off premium 2 leaving a balance of £25, and so on.
We are trying to create a query that will use the analytical functions to find all the payment IDs
that pay off the associated premium ids. We want to keep this SQL-based, as we need to process
about 30 million payments!
Thanks.
Great website, hope you enjoyed your recent visit to the UK.
Followup March 4, 2004 - 1pm Central time zone:
let me make sure I have this straight -- you want to
o sum up the first 3 records in payments
o discover they are 125 which exceeds 100
o output the fact that prem_id 1 is paid for by pay_id 1..3
o carry forward 25 from 3, discover that leftover 3+4 = 75 pays for prem_id 2
with 25 extra
while I believe (not sure) that the 10g MODEL clause might be able to do this (if you can do it in
a spreadsheet, we can use the MODEL clause to do it).....
I'm pretty certain that analytics cannot -- we would need to recursively use lag (eg: after finding
that 1,2,3 pay off 1, we'd need to -- well, it's hard to explain...)
I cannot see analytics doing this -- future rows depend on functions of the analytics from past
rows and that is just "not allowed".
I can see how to do this in a pipelined PLSQL function -- will that work for you?
Oops - Error in previous post
March 4, 2004 - 10am Central time zone
Reviewer: Dave Thompson from West Yorkshire, England
Tom,
Sorry, ignore the above tables as they are missing the joining column:
CREATE TABLE PAY_M
(
PREM_ID NUMBER,
PAY_ID NUMBER,
PAYMENT NUMBER
)
INSERT INTO PAY_M ( PREM_ID, PAY_ID, PAYMENT ) VALUES (
1, 1, 50);
INSERT INTO PAY_M ( PREM_ID, PAY_ID, PAYMENT ) VALUES (
1, 2, 25);
INSERT INTO PAY_M ( PREM_ID, PAY_ID, PAYMENT ) VALUES (
1, 3, 50);
INSERT INTO PAY_M ( PREM_ID, PAY_ID, PAYMENT ) VALUES (
1, 4, 50);
COMMIT;
CREATE TABLE PREM
(
PREM_ID NUMBER,
PAY_ID NUMBER,
PREM_PAYMENT NUMBER
)
INSERT INTO PREM ( PREM_ID, PAY_ID, PREM_PAYMENT ) VALUES (
1, 1, 100);
INSERT INTO PREM ( PREM_ID, PAY_ID, PREM_PAYMENT ) VALUES (
1, 2, 50);
INSERT INTO PREM ( PREM_ID, PAY_ID, PREM_PAYMENT ) VALUES (
1, 3, 50);
INSERT INTO PREM ( PREM_ID, PAY_ID, PREM_PAYMENT ) VALUES (
1, 4, 50);
COMMIT;
SQL> l
1 SELECT *
2* FROM PAY_M
SQL> /
PREM_ID PAY_ID PAYMENT
---------- ---------- ----------
1 1 50
1 2 25
1 3 50
1 4 50
SQL> select *
2 from prem;
PREM_ID PAY_ID PREM_PAYMENT
---------- ---------- ------------
1 1 100
1 2 50
1 3 50
1 4 50
Thanks.....
March 5, 2004 - 4am Central time zone
Reviewer: Dave Thompson from West Yorkshire, England.
Tom,
Thanks for your prompt response.
I am familiar with Pipeline functions.
I was, however, hoping we could do this as a set-based operation because of the volume of data
involved.
Thanks for your time.
analytics book
March 5, 2004 - 5am Central time zone
Reviewer: Ron Chennells from UK
Just another vote and pre order for the analytics book
OK
March 19, 2004 - 12am Central time zone
Reviewer: Gerhard from Dusseldorf,Germany
Dear Tom,
I used the following query to find the difference of salaries between employees.
SQL> select ename,sal,sal-lag(sal) over(order by sal) as diff_sal from emp;
ENAME SAL DIFF_SAL
---------- ---------- ----------
SMITH 800
JAMES 950 150
ADAMS 1100 150
WARD 1250 150
MARTIN 1250 0
MILLER 1300 50
TURNER 1500 200
ALLEN 1600 100
CLARK 2450 850
BLAKE 2850 400
JONES 2975 125
ENAME SAL DIFF_SAL
---------- ---------- ----------
SCOTT 3000 25
FORD 3000 0
KING 5000 2000
14 rows selected.
My Question is:
"What is the difference between King's salary and each of the other
employees' salaries?" Could you please help with the query?
Bye!
Followup March 19, 2004 - 8am Central time zone:
scott@ORA9IR2> select ename,sal,sal-lag(sal) over(order by sal) as diff_sal ,
2 sal-king_sal king_sal_diff
3 from (select sal king_sal from emp where ename = 'KING'),
4 emp
5 /
ENAME SAL DIFF_SAL KING_SAL_DIFF
---------- ---------- ---------- -------------
SMITH 800 -4200
JAMES 950 150 -4050
ADAMS 1100 150 -3900
WARD 1250 150 -3750
MARTIN 1250 0 -3750
MILLER 1300 50 -3700
TURNER 1500 200 -3500
ALLEN 1600 100 -3400
CLARK 2450 850 -2550
BLAKE 2850 400 -2150
JONES 2975 125 -2025
SCOTT 3000 25 -2000
FORD 3000 0 -2000
KING 5000 2000 0
14 rows selected.
Will this be faster?
March 19, 2004 - 4pm Central time zone
Reviewer: Venkat from Detroit
select ename, sal,
sal-lag(sal) over(order by sal) as diff_sal,
sal - max(case when ename='KING' then sal
else null end) over () king_sal_diff
from emp
Followup March 20, 2004 - 9am Central time zone:
when you benchmarked it and tested it to scale, what did you see? it would be interesting no?
lead/lag on different dataset
May 3, 2004 - 9pm Central time zone
Reviewer: Stalin from CA, USA
Hi Tom,
I have a similar requirement, but I'm not sure how to use lead or lag to refer to a different set of rows.
E.g. the logs table holds both login and logout information, identified by the action column.
There can be different login/logout modes: records with action in (1,2) are logins and records
with action in (3,4,5,6,7) are logouts. Now I need to find the signon and signoff times and
also the session duration in minutes.
here is some sample data of logs table :
LOG_ID LOG_CREATION_DATE USER_ID SERVICE ACTION
---------- ------------------- ---------- ---------- ----------
1 04/29/2004 10:48:36 3 5 2
3 04/29/2004 10:53:44 3 5 3
5 04/29/2004 11:11:35 3 5 1
1003 05/03/2004 15:18:53 3 5 5
1004 05/03/2004 15:19:50 8 5 1
here is a query I came up with (not exactly what I want):
select log_id signon_id, lead(log_id, 1) over (partition by account_id, user_id, mac order by
log_id) signoff_id,
user_id, log_creation_date signon_date,
lead(log_creation_date, 1) over (partition by account_id, user_id, mac order by
log_creation_date) signoff_date,
nvl(round(((lead(log_creation_date, 1)
over (partition by account_id, user_id order by
log_creation_date)-log_creation_date)*1440), 2), 0) Usage_Mins
from logs
where account_id = 'Robert'
and service = 5
order by user_id
desired output :
SIGNON_ID SIGNOFF_ID USER_ID SIGNON_DATE SIGNOFF_DATE USAGE_MINS
---------- ---------- ---------- ------------------- ------------------- ----------
1 3 3 04/29/2004 10:48:36 04/29/2004 10:53:44 5.13
5 1003 3 04/29/2004 11:11:35 05/03/2004 15:18:53 6007.3
1004 8 05/03/2004 15:19:50 0
Thanks in Advance,
Stalin
Followup May 4, 2004 - 7am Central time zone:
maybe if you supply simple create table and insert ... values ... statements for me.... this stuff
would go faster.
Your query references columns that are not in the example as well.
Create table scripts
May 4, 2004 - 1pm Central time zone
Reviewer: Stalin from CA, USA
Sorry for not giving this info in the first place.
here goes the scripts....
create table logs (log_id number, log_creation_date date, account_id varchar2(25), user_id number,
service number, action number, mac varchar2(50))
/
insert into logs values (1, to_date('04/29/2004 10:48:36','mm/dd/yyyy hh24:mi:ss'), 'Robert', 3, 5, 2, '00-00-00-00')
/
insert into logs values (3, to_date('04/29/2004 10:53:44','mm/dd/yyyy hh24:mi:ss'), 'Robert', 3, 5, 3, '00-00-00-00')
/
insert into logs values (5, to_date('04/29/2004 11:11:35','mm/dd/yyyy hh24:mi:ss'), 'Robert', 3, 5, 1, '00-00-00-00')
/
insert into logs values (1003, to_date('05/03/2004 15:18:53','mm/dd/yyyy hh24:mi:ss'), 'Robert', 3, 5, 5, '00-00-00-00')
/
insert into logs values (1004, to_date('05/03/2004 15:19:50','mm/dd/yyyy hh24:mi:ss'), 'Robert', 8, 5, 1, '00-00-00-00')
/
The reason for including mac in the partition is that users can log in from multiple PCs
without logging out; hence I partitioned on account_id, user_id and mac.
Thanks,
Stalin
Followup May 4, 2004 - 2pm Central time zone:
ops$tkyte@ORA9IR2> select a.* , round( (signoff_date-signon_date) * 24 * 60, 2 ) minutes
2 from (
3 select log_id,
4 case when action in (1,2) and lead(action) over (partition by account_id,user_id,mac
order by log_creation_date) in (3,4,5,6,7)
5 then lead(log_id) over (partition by account_id, user_id, mac order by
log_creation_date)
6 end signoff_id,
7 user_id,
8 log_creation_date signon_date,
9 case when action in (1,2) and lead(action) over (partition by account_id,user_id,mac
order by log_creation_date) in (3,4,5,6,7)
10 then lead(log_creation_date) over (partition by account_id, user_id, mac order by
log_creation_date)
11 end signoff_date,
12 action
13 from logs
14 where account_id = 'Robert'
15 and service = 5
16 order by user_id
17 ) a
18 where action in (1,2)
19 /
LOG_ID SIGNOFF_ID USER_ID SIGNON_DATE SIGNOFF_DATE ACTION MINUTES
---------- ---------- ---------- ------------------- ------------------- ---------- ----------
1 3 3 04/29/2004 10:48:36 04/29/2004 10:53:44 2 5.13
5 1003 3 04/29/2004 11:11:35 05/03/2004 15:18:53 1 6007.3
1004 8 05/03/2004 15:19:50 1
Excellent
May 4, 2004 - 3pm Central time zone
Reviewer: Stalin from CA, USA
This is exactly what i'm looking for.
Thanks so much!
Help On SQL
May 4, 2004 - 8pm Central time zone
Reviewer: VKOUL from Lacey, WA
I want to replace each null value in a column with the last preceding non-null value, e.g.
If I have records like the following
year month column_value
----- ------ --------------------
2002 06 55
2002 06 57
2002 07 NULL
2002 08 NULL
2002 09 NULL
2002 10 100
2002 11 101
I want the results as below
year month column_value
----- ------ --------------------
2002 06 55
2002 06 57
2002 07 57 ------> Repeated
2002 08 57 ------> Repeated
2002 09 57 ------> Repeated
2002 10 100
2002 11 101
Followup May 4, 2004 - 9pm Central time zone:
create table,
insert into table
much appreciated......... (so i don't spend days of my life making create tables and insert into
statements. I've added this request to all pages where you can input stuff and I'll just be asking
for it from now on in...... Not picking on you, just reminding everyone that i need a script like
I provide.....)
but..... asked and answered:
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:10286792840956
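On Oracle 10g and later, LAST_VALUE ... IGNORE NULLS can do this carry-forward directly. A sketch using the sample rows above, inlined via WITH; the column names yr, mon and col_value are hypothetical stand-ins for year, month and column_value, and col_value is added to the ORDER BY as an assumed tiebreaker, because the question does not say how the two 2002/06 rows should be ordered:

```sql
with t as (
  select 2002 yr, '06' mon, 55 col_value from dual union all
  select 2002, '06', 57   from dual union all
  select 2002, '07', null from dual union all
  select 2002, '08', null from dual union all
  select 2002, '09', null from dual union all
  select 2002, '10', 100  from dual union all
  select 2002, '11', 101  from dual
)
select yr, mon,
       -- most recent non-null value up to and including this row;
       -- non-null rows simply return their own value
       last_value(col_value ignore nulls) over (
         order by yr, mon, col_value
         rows between unbounded preceding and current row ) col_value
  from t
 order by yr, mon, col_value;
```

This should produce 55, 57, 57, 57, 57, 100, 101, matching the desired output. On 8i/9i, where IGNORE NULLS is not available, the technique in the linked thread applies.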
Help On SQL
May 4, 2004 - 11pm Central time zone
Reviewer: VKoul
Beautiful !!!
I'll keep in mind "create table etc."
Thanks
VKoul
analytic q
May 11, 2004 - 6pm Central time zone
Reviewer: A reader
Tom
Please look at the following schema and data.
---------
spool schema
set echo on
drop table host_instances;
drop table rac_instances;
drop table instance_tablespaces;
create table host_instances
(
host_name varchar2(50),
instance_name varchar2(50)
);
create table rac_instances
(
rac_name varchar2(50),
instance_name varchar2(50)
);
create table instance_tablespaces
(
instance_name varchar2(50),
tablespace_name varchar2(50),
tablespace_size number
);
-- host to instance mapping data
insert into host_instances values ( 'h1', 'i1' );
insert into host_instances values ( 'h2', 'i2' );
insert into host_instances values ( 'h3', 'i3' );
insert into host_instances values ( 'h4', 'i4' );
insert into host_instances values ( 'h5', 'i5' );
-- rac to instance mapping data
insert into rac_instances values ( 'rac1', 'i1' );
insert into rac_instances values ( 'rac1', 'i2' );
insert into rac_instances values ( 'rac2', 'i3' );
insert into rac_instances values ( 'rac2', 'i4' );
--- instance to tablespace mapping data
insert into instance_tablespaces values( 'i1', 't11', 100 );
insert into instance_tablespaces values( 'i1', 't12', 200 );
insert into instance_tablespaces values( 'i2', 't11', 100 );
insert into instance_tablespaces values( 'i2', 't12', 200 );
insert into instance_tablespaces values( 'i3', 't31', 500 );
insert into instance_tablespaces values( 'i3', 't32', 300 );
insert into instance_tablespaces values( 'i4', 't31', 500 );
insert into instance_tablespaces values( 'i4', 't32', 300 );
insert into instance_tablespaces values( 'i5', 't51', 400 );
commit;
---------
What I need is to sum up the sizes of all tablespaces of all instances
for a list of hosts. However, if two hosts in the list
belong to the same RAC, I should pick only one of those
hosts (any one of them will do).
E.g., in the above data I should pick only i1 or i2, not
both, since they both belong to the same RAC 'rac1'.
Following is the select I came up with for the above data.
Let me know if you have any comments on it.
Any alternative solutions you can think of would
also be educational for me. I have not benchmarked this
select yet. The number of hosts could reach approximately
2000. On average we can assume each host will have
one instance; some of these will be RACs.
Thank you!
-----------
scott@ora10g> set echo on
scott@ora10g> column host_name format a10
scott@ora10g> column instance_name format a10
scott@ora10g> column rac_name format a10
scott@ora10g> column row_number format 999
scott@ora10g>
scott@ora10g> select a.instance_name, sum( tablespace_size )
2 from
3 (
4 select instance_name
5 from
6 (
7 select host_name, instance_name, rac_name,
8 row_number() over
9 (
10 partition by rac_name
11 order by rac_name, instance_name
12 ) row_number
13 from
14 (
15 select hi.host_name, hi.instance_name, ri.rac_name
16 from host_instances hi, rac_instances ri
17 where hi.instance_name = ri.instance_name(+)
18 )
19 )
20 where row_number <= 1
21 ) a, instance_tablespaces e
22 where a.instance_name = e.instance_name
23 group by a.instance_name;
i1 300
i3 800
i5 400
---
Also, do you prefer the .sql file (as above) or
the spooled output of schema.sql (i.e. schema.lst)?
The above is more convenient to reproduce, but the spooled output makes for better reading in some
cases.
Followup May 11, 2004 - 9pm Central time zone:
I like the cut and paste from sqlplus, truth be told.
Sure, I have to do two vi commands and a couple of deletes to fix it up but.... I'm fairly certain
that the poster *actually ran the commands successfully!* which is most relevant to me....
Besides, I do it to you ;)
ops$tkyte@ORA9IR2> select *
2 from (
3 select h.host_name, h.instance_name, r.rac_name, sum(t.tablespace_size),
4 row_number() over (partition by r.rac_name order by h.host_name ) rn
5 from host_instances h,
6 rac_instances r,
7 instance_tablespaces t
8 where h.instance_name = r.instance_name(+)
9 and h.instance_name = t.instance_name
10 group by h.host_name, h.instance_name, r.rac_name
11 )
12 where rn = 1
13 /
HO IN RAC_N SUM(T.TABLESPACE_SIZE) RN
-- -- ----- ---------------------- ----------
h1 i1 rac1 300 1
h3 i3 rac2 800 1
h5 i5 400 1
is the first thing that popped into my head.
with just a couple hundred rows -- any of them will perform better than good enough.
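One caveat applies to partitioning by `rac_name` in both queries above. An analytic PARTITION BY puts all NULL keys into a single partition, so every non-RAC host (whose `rac_name` is NULL after the outer join) competes for the same `rn = 1`. With a single non-RAC host, as in this test data, the problem goes unnoticed. With hundreds of them, all but one would be dropped silently. A sketch of one fix is to partition by the RAC name and fall back to the instance name: `NVL(r.rac_name, h.instance_name)` in Oracle, `coalesce` below.

This runs through Python's sqlite3 with ANSI joins in place of `(+)`. It adds a hypothetical non-RAC host `h6`/`i6` (tablespace `t61`) to show the difference, and it assumes no instance has the same name as a RAC.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table host_instances (host_name text, instance_name text);
create table rac_instances (rac_name text, instance_name text);
create table instance_tablespaces (instance_name text, tablespace_name text, tablespace_size integer);
insert into host_instances values
  ('h1','i1'), ('h2','i2'), ('h3','i3'), ('h4','i4'), ('h5','i5'), ('h6','i6');
insert into rac_instances values
  ('rac1','i1'), ('rac1','i2'), ('rac2','i3'), ('rac2','i4');
insert into instance_tablespaces values
  ('i1','t11',100), ('i1','t12',200), ('i2','t11',100), ('i2','t12',200),
  ('i3','t31',500), ('i3','t32',300), ('i4','t31',500), ('i4','t32',300),
  ('i5','t51',400), ('i6','t61',250);  -- h6/i6/t61 are hypothetical additions
""")

def one_per_rac(partition_expr):
    # Same shape as the followup's query: total per host, then keep rn = 1 per partition.
    sql = f"""
    select host_name, instance_name, total
    from (select h.host_name, h.instance_name, sum(t.tablespace_size) as total,
                 row_number() over (partition by {partition_expr}
                                    order by h.host_name) as rn
          from host_instances h
          left join rac_instances r on r.instance_name = h.instance_name
          join instance_tablespaces t on t.instance_name = h.instance_name
          group by h.host_name, h.instance_name, r.rac_name)
    where rn = 1
    order by host_name
    """
    return conn.execute(sql).fetchall()

by_rac = one_per_rac("r.rac_name")                                 # all non-RAC hosts share the NULL partition
by_rac_or_inst = one_per_rac("coalesce(r.rac_name, h.instance_name)")  # each non-RAC host is its own partition
print(by_rac)
print(by_rac_or_inst)
```

With `r.rac_name` alone, `h6` disappears because it loses to `h5` in the shared NULL partition. The coalesced key keeps it.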
thanx!
May 11, 2004 - 9pm Central time zone
Reviewer: A reader
"I like the cut and paste from sqlplus truth be told."
Actually I was going to post just that, but your
example at the time of posting led me to believe
that you wanted straight SQL. Maybe you want to
fix that (not that many people seem to care anyway! :))
Thanx for the sql - it looks good and a tad simpler
than the one I wrote...
How to compute this running total (sort of...)
May 18, 2004 - 11am Central time zone
Reviewer: Kishan from USA
create table investment (
investment_id number,
asset_id number,
agreement_id number,
constraint pk_i primary key (investment_id)
)
/
create table period (
period_id number,
business_domain varchar2(10),
status_code varchar2(10),
constraint pk_p primary key (period_id)
)
/
create table entry (
entry_id number,
period_id number,
investment_id number,
constraint pk_e primary key(entry_id),
constraint fk_e_period foreign key(period_id) references period(period_id),
constraint fk_e_investment foreign key (investment_id) references investment(investment_id)
)
/
create table entry_detail(
entry_id number,
account_type varchar2(10),
amount number,
constraint pk_ed primary key(entry_id, account_type),
constraint fk_ed_entry foreign key(entry_id) references entry(entry_id)
)
/
insert into period (period_id, business_domain, status_code)
SELECT rownum AS period_id,
'BDG' AS business_domain,
'2' AS status_code
from all_objects where rownum <= 5
/
insert into investment(investment_id, asset_id, agreement_id)
select rownum+10 AS investment_id,
rownum+100 AS asset_id,
rownum+1000 AS agreement_id
from all_objects where rownum <=5
/
insert into entry(entry_id, period_id, investment_id) values (1, 1, 11)
/
insert into entry(entry_id, period_id, investment_id) values (2, 2, 11)
/
insert into entry(entry_id, period_id, investment_id) values (3, 3, 11)
/
insert into entry(entry_id, period_id, investment_id) values (4, 3, 13)
/
insert into entry(entry_id, period_id, investment_id) values (5, 4, 13)
/
insert into entry(entry_id, period_id, investment_id) values (6, 4, 14)
/
insert into entry(entry_id, period_id, investment_id) values (7, 5, 14)
/
insert into entry_detail(entry_id, account_type, amount) values(1, 'AC1', 1000 )
/
insert into entry_detail(entry_id, account_type, amount) values(1, 'AC2', -200 )
/
insert into entry_detail(entry_id, account_type, amount) values(1, 'AC3', 300 )
/
insert into entry_detail(entry_id, account_type, amount) values(2, 'AC1', 200 )
/
insert into entry_detail(entry_id, account_type, amount) values(2, 'AC4', -1000 )
/
insert into entry_detail(entry_id, account_type, amount) values(2, 'AC2', -500 )
/
insert into entry_detail(entry_id, account_type, amount) values(3, 'AC2', 2200 )
/
insert into entry_detail(entry_id, account_type, amount) values(3, 'AC1', 200 )
/
insert into entry_detail(entry_id, account_type, amount) values(4, 'AC4', -1000 )
/
insert into entry_detail(entry_id, account_type, amount) values(4, 'AC2', -500 )
/
insert into entry_detail(entry_id, account_type, amount) values(5, 'AC2', 2200 )
/
insert into entry_detail(entry_id, account_type, amount) values(6, 'AC1', 200 )
/
insert into entry_detail(entry_id, account_type, amount) values(6, 'AC4', -1000 )
/
insert into entry_detail(entry_id, account_type, amount) values(6, 'AC2', -500 )
/
insert into entry_detail(entry_id, account_type, amount) values(7, 'AC1', 2200 )
/
insert into entry_detail(entry_id, account_type, amount) values(7, 'AC3', 500 )
/
insert into entry_detail(entry_id, account_type, amount) values(7, 'AC4', 1200 )
/
scott@LDB.US.ORACLE.COM> select * from period;
PERIOD_ID BUSINESS_D STATUS_COD
---------- ---------- ----------
1 BDG 2
2 BDG 2
3 BDG 2
4 BDG 2
5 BDG 2
scott@LDB.US.ORACLE.COM> select * from investment;
INVESTMENT_ID ASSET_ID AGREEMENT_ID
------------- ---------- ------------
11 101 1001
12 102 1002
13 103 1003
14 104 1004
15 105 1005
scott@LDB.US.ORACLE.COM> select * from entry;
ENTRY_ID PERIOD_ID INVESTMENT_ID
---------- ---------- -------------
1 1 11
2 2 11
3 3 11
4 3 13
5 4 13
6 4 14
7 5 14
7 rows selected.
scott@LDB.US.ORACLE.COM> select * from entry_detail;
ENTRY_ID ACCOUNT_TY AMOUNT
---------- ---------- ----------
1 AC1 1000
1 AC2 -200
1 AC3 300
2 AC1 200
2 AC4 -1000
2 AC2 -500
3 AC2 2200
3 AC1 200
4 AC4 -1000
4 AC2 -500
5 AC2 2200
6 AC1 200
6 AC4 -1000
6 AC2 -500
7 AC1 2200
7 AC3 500
7 AC4 1200
17 rows selected.
The resultant view needed is given below.
To give an example from the result below: the first entry for investment_id 14
is from period 4. The account types entered in period 4 are AC1, AC4 and AC2. We
need these three account types in all subsequent periods. Also, in period 5 a
new account type, AC3, is added. So, if there is another period, say period_id 6, we need
information for AC1, AC2, AC3 and AC4 (that's 4 account types). If there's no entry
for any of these account_types in a subsequent period, the amount_for_period for that
period is taken to be 0.00, and the balance is sum(amount_for_period) up to and including
that period.
PERIOD_ID INVESTMENT_ID ACCOUNT_TYPE AMOUNT_FOR_PERIOD BALANCE_TILL_PERIOD
--------- ------------- ------------ ----------------- -------------------
1 11 AC1 1000 1000
1 11 AC2 -200 -200
1 11 AC3 300 300
2 11 AC1 200 1200
2 11 AC2 -500 -700
2 11 AC3 0 300
2 11 AC4 -1000 -1000
3 11 AC1 200 1400
3 11 AC2 200 -500
3 11 AC3 0 300
3 11 AC4 0 1000
4 11 AC1 0 1400
4 11 AC2 0 -500
4 11 AC3 0 300
4 11 AC4 0 1000
5 11 AC1 0 1400
5 11 AC2 0 -500
5 11 AC3 0 300
5 11 AC4 0 1000
3 13 AC4 -1000 -1000
3 13 AC2 -500 -500
4 13 AC4 0 -1000
4 13 AC2 -500 -1000
5 13 AC4 0 -1000
5 13 AC4 0 -1000
4 14 AC1 200 200
4 14 AC4 -1000 -1000
4 14 AC2 -500 -500
5 14 AC1 2200 2400
5 14 AC3 500 500
5 14 AC4 1200 200
5 14 AC2 0 -500
The blank lines in between are just for clarity. As always, grateful for all your efforts.
Regards,
Kishan.
Followup May 18, 2004 - 6pm Central time zone:
so, what does your first try look like :) at least get the join written up for the details - maybe
the running total will be obvious from that.
This is how far I went...and no further
May 19, 2004 - 10am Central time zone
Reviewer: Kishan from USA
select distinct period_id,
investment_id,
account_type,
amount_for_period,
balance_till_period
from ( select period.period_id,
entry.investment_id,
entry_detail.account_type,
(case when entry.period_id = period.period_id then entry_detail.amount else 0
end) amount_for_period,
sum(amount) over(partition by period.period_id, investment_id, account_type)
balance_till_period
from period left outer join (entry join entry_detail on (entry.entry_id =
entry_detail.entry_id)) on (entry.period_id <= period.period_id))
order by investment_id
The result looks as below:
PERIOD_ID INVESTMENT_ID ACCOUNT_TY AMOUNT_FOR_PERIOD BALANCE_TILL_PERIOD
---------- ------------- ---------- ----------------- -------------------
1 11 AC1 1000 1000
1 11 AC2 -200 -200
1 11 AC3 300 300
2 11 AC1 0 1200
2 11 AC1 200 1200
2 11 AC2 -500 -700
2 11 AC2 0 -700
2 11 AC3 0 300
2 11 AC4 -1000 -1000
3 11 AC1 0 1400
3 11 AC1 200 1400
3 11 AC2 0 1500
3 11 AC2 2200 1500
3 11 AC3 0 300
3 11 AC4 0 -1000
4 11 AC1 0 1400
4 11 AC2 0 1500
4 11 AC3 0 300
4 11 AC4 0 -1000
5 11 AC1 0 1400
5 11 AC2 0 1500
5 11 AC3 0 300
5 11 AC4 0 -1000
3 13 AC2 -500 -500
3 13 AC4 -1000 -1000
4 13 AC2 0 1700
4 13 AC2 2200 1700
4 13 AC4 0 -1000
5 13 AC2 0 1700
5 13 AC4 0 -1000
4 14 AC1 200 200
4 14 AC2 -500 -500
4 14 AC4 -1000 -1000
5 14 AC1 0 2400
5 14 AC1 2200 2400
5 14 AC2 0 -500
5 14 AC3 500 500
5 14 AC4 0 200
5 14 AC4 1200 200
First, I am sorry: my originally constructed result (by hand ;) misses a couple of rows.
Other than that, I am unable to remove the redundant rows that show up for a
particular investment and account_type within a period; the logic beats me.
Basically, I need to remove rows where the amount_for_period is 0 for an account_type only if it's a
redundant row for that set. That is, the first rows of period_id 2 and 3 are redundant, but the rows
for period 4 are not.
Could you help me out?
Regards,
Kishan.
Followup May 19, 2004 - 11am Central time zone:
are we missing some more order bys? I mean -- what if:
3 11 AC1 0 1400
3 11 AC1 200 1400
3 11 AC2 0 1500
3 11 AC2 2200 1500
3 11 AC3 0 300
3 11 AC4 0 -1000
was really:
3 11 AC1 200 1400
3 11 AC2 0 1500
3 11 AC2 2200 1500
3 11 AC3 0 300
3 11 AC4 0 -1000
3 11 AC1 0 1400
would that still be redundant? missing something here?
Yes...they are redundant
May 19, 2004 - 12pm Central time zone
Reviewer: A reader
Tom:
Yes, for that particular set, those rows are redundant, no matter what the order is.
Regards,
Kishan.
Followup May 19, 2004 - 2pm Central time zone:
ok, so what is the "key" of that result set? what can we partition the result set by.
my idea will be to use your query in an inline view and analytics on that to weed out what you
want.

May 19, 2004 - 3pm Central time zone
Reviewer: Kishan from USA
The key would be period_id, investment_id and account_type. Basically, what the result represents is
the amount and the balance-to-date for a particular account_type of an investment_id for a period.
Eg: Period 1->Investment 1->Account_Type AC1->Amount=1000->Balance=1000
If there's no activity on that investment and account_type for the next period, say Period 2, the
amount will be 0 for that period, and the balance will be previous period's balance.
Period 1->Investment 1->Account_Type AC1->Amount=1000->Balance=1000
Period 2->Investment 1->Account_Type AC1->Amount=0->Balance = 1000
But, if there's an activity on that account_type for that investment, then the amount will be the
amount for that period and balance will be the sum of previous balance and current amount. Say for
Period 2, the amount is 500, then
Period 1->Investment 1->Account_Type AC1->Amount=1000-> Balance=1000
Period 2->Investment 1->Account_Type AC1->Amount=500-> Balance=1500
And if there's a new account type entry, say AC2 and amount, say 2000 created for period 2, then
the result set will be
Period 1->Investment 1->Account_Type AC1->Amount=1000->Balance=1000
Period 2->Investment 1->Account_Type AC1->Amount=500->Balance=1500
Period 2->Investment 1->Account_Type AC2->Amount=2000->Balance=2000
There may be many investments per period and many account_types per investment. Hope I am clear....
Regards,
Kishan.
Followup May 19, 2004 - 5pm Central time zone:
so... if you have:
PERIOD_ID INVESTMENT_ID ACCOUNT_TY AMOUNT_FOR_PERIOD BALANCE_TILL_PERIOD
---------- ------------- ---------- ----------------- -------------------
1 11 AC1 1000 1000
1 11 AC2 -200 -200
1 11 AC3 300 300
2 11 AC1 0 1200
2 11 AC1 200 1200
2 11 AC2 -500 -700
2 11 AC2 0 -700
2 11 AC3 0 300
2 11 AC4 -1000 -1000
you see though, why isn't the 4th line here "redundant" then?
But it is redundant..
May 19, 2004 - 11pm Central time zone
Reviewer: Kishan from USA
Tom, I am assuming the 4th line you mention is 2->11->AC2->0->-700. Yes, it is redundant.
We need the amount and balance for every period_id, investment_id and account_type: exactly one line
per period_id, investment_id and account_type. Anything more is redundant.
The issue is that there may be no entries for a specific account_type of an investment in a particular
period. In such cases, we need to treat the amount for those periods as 0 and compute the balances
accordingly.
Regards,
Kishan
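The other option is to avoid generating redundant rows in the first place instead of weeding them out afterwards. To meet the one-row-per-key requirement above, you can build the key grid first: pair each (investment_id, account_type) with every period from the first one it appears in, outer-join the actual amounts, and take a running SUM for the balance. This is a different technique from the lead()-based filter discussed in this thread.

It is sketched here with Python's sqlite3 and ANSI SQL, using the posted test data minus the `investment` table, which the query does not need. The WITH, outer join and SUM() OVER constructs also exist in Oracle, but this exact query has not been run against the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table period (period_id integer primary key);
create table entry (entry_id integer primary key, period_id integer, investment_id integer);
create table entry_detail (entry_id integer, account_type text, amount integer,
                           primary key (entry_id, account_type));
insert into period values (1), (2), (3), (4), (5);
insert into entry values (1,1,11), (2,2,11), (3,3,11), (4,3,13), (5,4,13), (6,4,14), (7,5,14);
insert into entry_detail values
  (1,'AC1',1000), (1,'AC2',-200), (1,'AC3',300),
  (2,'AC1',200), (2,'AC4',-1000), (2,'AC2',-500),
  (3,'AC2',2200), (3,'AC1',200),
  (4,'AC4',-1000), (4,'AC2',-500),
  (5,'AC2',2200),
  (6,'AC1',200), (6,'AC4',-1000), (6,'AC2',-500),
  (7,'AC1',2200), (7,'AC3',500), (7,'AC4',1200);
""")

BALANCE_SQL = """
with amounts as (            -- actual activity per (period, investment, account_type)
  select e.period_id, e.investment_id, d.account_type, sum(d.amount) as amount
  from entry e join entry_detail d on d.entry_id = e.entry_id
  group by e.period_id, e.investment_id, d.account_type
),
grid as (                    -- every period from the key's first appearance onwards
  select p.period_id, s.investment_id, s.account_type
  from (select investment_id, account_type, min(period_id) as first_period
        from amounts group by investment_id, account_type) s
  join period p on p.period_id >= s.first_period
)
select g.period_id, g.investment_id, g.account_type,
       coalesce(a.amount, 0) as amount_for_period,
       sum(coalesce(a.amount, 0)) over (
         partition by g.investment_id, g.account_type
         order by g.period_id) as balance_till_period
from grid g
left join amounts a
  on a.period_id = g.period_id
 and a.investment_id = g.investment_id
 and a.account_type = g.account_type
order by g.investment_id, g.period_id, g.account_type
"""

rows = conn.execute(BALANCE_SQL).fetchall()
for r in rows:
    print(r)
```

Because the grid has exactly one row per key, there is nothing to de-duplicate. Periods without activity get 0 from the outer join, and the running SUM supplies the balance.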
Followup May 20, 2004 - 10am Central time zone:
so, if you partition by
PERIOD_ID INVESTMENT_ID ACCOUNT_TY BALANCE_TILL_PERIOD
order by
AMOUNT_FOR_PERIOD
select a.*, lead(amount_for_period) over (partition by .... order by ... ) nxt
from (YOUR_QUERY)
you can then
select *
from (that_query)
where nxt is NULL or (nxt is not null and amount_for_period <> 0)
if nxt is null -- last row in the partition, keep it.
if nxt is not null AND we are zero -- remove it.
Almost there?
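One detail to check in the lead() filter above is that it relies on the zero row sorting before the non-zero row within each partition. Ordering by the raw AMOUNT_FOR_PERIOD guarantees that only when the real amount is positive. Take 2/11/AC2, with amounts -500 and 0 and balance -700: the 0 row sorts last, gets nxt = NULL, and survives. Sorting zeros first, for example with ORDER BY ABS(amount_for_period), avoids this.

The sketch below runs both orderings through Python's sqlite3 against a few of the period-2 rows from Kishan's output. The table name `r` and the shortened column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table r (period_id, investment_id, account_type, amount, balance)")
conn.executemany("insert into r values (?, ?, ?, ?, ?)", [
    (2, 11, 'AC1', 0, 1200), (2, 11, 'AC1', 200, 1200),
    (2, 11, 'AC2', -500, -700), (2, 11, 'AC2', 0, -700),
    (2, 11, 'AC3', 0, 300), (2, 11, 'AC4', -1000, -1000),
])

def keep_non_redundant(order_expr):
    # Keep the last row of each partition (nxt is null) plus any non-zero row.
    sql = f"""
    select period_id, investment_id, account_type, amount, balance
    from (select r.*,
                 lead(amount) over (partition by period_id, investment_id,
                                                 account_type, balance
                                    order by {order_expr}) as nxt
          from r)
    where nxt is null or amount <> 0
    order by account_type, amount
    """
    return conn.execute(sql).fetchall()

by_amount = keep_non_redundant("amount")    # a zero row can sort after a negative amount
by_abs = keep_non_redundant("abs(amount)")  # a zero row always sorts first
print(by_amount)
print(by_abs)
```

With plain `amount` ordering, the redundant `(2, 11, 'AC2', 0, -700)` row is kept. With `abs(amount)` it is removed, while the lone zero row for AC3 stays because it is the last row in its partition.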
May 20, 2004 - 12pm Central time zone
Reviewer: Dave Thompson from UK
Hi Tom,
We have the following table of data:
CREATE TABLE DEDUP_TEST
(
ID NUMBER,
COLUMN_A VARCHAR2(10 BYTE),
COLUMN_B VARCHAR2(10 BYTE),
COLUMN_C VARCHAR2(10 BYTE),
START_DATE DATE,
END_DATE DATE
);
With:
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
1, 'A', 'B', 'C', TO_Date( '10/01/1999 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date(
'10/01/2000 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
1, 'D', 'B', 'C', TO_Date( '10/01/2001 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date(
'10/01/2002 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
1, 'A', 'B', 'C', TO_Date( '10/01/2002 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date(
'10/01/2003 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
2, 'a', 'f', 'f', TO_Date( '02/06/2004 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date(
'02/07/2004 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
2, 'A', 'B', 'B', TO_Date( '10/01/2000 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date(
'10/01/2001 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
2, 'A', 'B', 'B', TO_Date( '10/01/2001 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date(
'10/01/2003 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
2, 'A', 'B', 'B', TO_Date( '10/02/2001 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date(
'10/05/2003 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
2, 'A', 'B', 'B', TO_Date( '10/02/2005 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date(
'10/03/2005 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
2, 'A', 'B', 'B', TO_Date( '10/04/2005 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date(
'10/06/2005 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
3, 'A', 'F', 'F', TO_Date( '02/10/2004 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date(
'02/20/2004 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
COMMIT;
We are trying to sequentially de-duplicate this data.
Working down from the top of the table, we check each row against the previous one. If
they are the same, the duplicate row is marked as such, and so is the original row.
So far we have this query:
SELECT ID,
COLUMN_A,
COLUMN_B,
COLUMN_C,
START_DATE,
END_DATE,
CASE WHEN ( DUP = 'DUP' OR DUPER = 'DUP' ) THEN 'DUP' ELSE 'NOT' END LETSEE
FROM (
SELECT ID,
COLUMN_A,
COLUMN_B,
COLUMN_C,
START_DATE,
END_DATE,
DUP,
CASE WHEN COLUMN_A = NEXT_A
AND COLUMN_B = NEXT_B
AND COLUMN_C = NEXT_C THEN 'DUP' ELSE 'NOT' END DUPER
FROM (
SELECT ID,
COLUMN_A,
COLUMN_B,
COLUMN_C,
START_DATE,
END_DATE,
NEXT_A,
NEXT_B,
NEXT_C,
CASE WHEN COLUMN_A = PREV_A
AND COLUMN_B = PREV_B
AND COLUMN_C = PREV_C THEN 'DUP' ELSE 'NOT' END DUP
FROM ( SELECT ID,
COLUMN_A,
COLUMN_B,
COLUMN_C,
START_DATE,
END_DATE,
LAG (COLUMN_A, 1, 0) OVER (ORDER BY ID) AS prev_A,
LAG (COLUMN_B, 1, 0) OVER (ORDER BY ID) AS prev_B,
LAG (COLUMN_C, 1, 0) OVER (ORDER BY ID) AS prev_C,
LEAD (COLUMN_A, 1, 0) OVER (ORDER BY ID) AS next_A,
LEAD (COLUMN_B, 1, 0) OVER (ORDER BY ID) AS next_B,
LEAD (COLUMN_C, 1, 0) OVER (ORDER BY ID) AS next_C
FROM DEDUP_TEST
ORDER
BY 1, 5 ) ) )
ID COLUMN_A COLUMN_B COLUMN_C START_DAT END_DATE LET
---------- ---------- ---------- ---------- --------- --------- ---
1 A B C 01-OCT-99 01-OCT-00 NOT
1 D B C 01-OCT-01 01-OCT-02 NOT
1 A B C 01-OCT-02 01-OCT-03 NOT
2 A B B 01-OCT-00 01-OCT-01 DUP
2 A B B 01-OCT-01 01-OCT-03 DUP
2 A B B 02-OCT-01 05-OCT-03 DUP
2 a f f 06-FEB-04 07-FEB-04 NOT
2 A B B 02-OCT-05 03-OCT-05 DUP
2 A B B 04-OCT-05 06-OCT-05 DUP
3 A F F 10-FEB-04 20-FEB-04 NOT
The result set from this is almost what I am after.
However, where there are groups of duplicate rows I only want to return one row, taking the
attributes, the start_date of the first duplicated row and the end_date of the last duplicated row.
I do not want to group all the duplicates together; for example, the rows with the attributes
ID COLUMN_A COLUMN_B COLUMN_C
2 A B B
will result in two output rows:
2 A B B 01-OCT-00 01-OCT-03
2 A B B 02-OCT-05 06-OCT-05
This is the final piece I cannot work out.
Any help would be appreciated.
Thanks.
Followup May 20, 2004 - 2pm Central time zone:
what happens in your data if you had
1 A1 B1 C1 ....
1 A2 B2 C2 ....
1 A1 B1 C1 ....
that might or might not be "dup" since you just order by ID? don't we need to order by a, b, and
c?
Follow up
May 21, 2004 - 5am Central time zone
Reviewer: Dave Thompson from UK
Hi Tom,
In response to your question:
what happens in your data if you had
1 A1 B1 C1 ....
1 A2 B2 C2 ....
1 A1 B1 C1 ....
Then the first row would be classed as unique, as would the second and the third. We are only
looking at duplicates that occur sequentially.
Sequential duplicates are then turned into one row by taking the start date of the first row and
the end date of the last row in the group.
The test data should have had sequential dates:
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
1, 'A', 'B', 'C', TO_Date( '10/01/1999 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date(
'10/01/2000 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
1, 'D', 'B', 'C', TO_Date( '10/01/2001 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date(
'10/01/2002 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
1, 'A', 'B', 'C', TO_Date( '10/01/2002 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date(
'10/01/2003 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
2, 'a', 'f', 'f', TO_Date( '02/06/2009 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date(
'02/07/2010 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
2, 'A', 'B', 'B', TO_Date( '10/01/2003 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date(
'10/01/2004 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
2, 'A', 'B', 'B', TO_Date( '10/01/2005 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date(
'10/01/2006 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
2, 'A', 'B', 'B', TO_Date( '10/02/2007 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date(
'10/05/2008 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
2, 'A', 'B', 'B', TO_Date( '10/02/2011 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date(
'10/03/2012 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
2, 'A', 'B', 'B', TO_Date( '10/04/2013 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date(
'10/06/2014 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
INSERT INTO DEDUP_TEST ( ID, COLUMN_A, COLUMN_B, COLUMN_C, START_DATE,
END_DATE ) VALUES (
3, 'A', 'F', 'F', TO_Date( '02/10/2014 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date(
'02/20/2015 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'));
COMMIT;
CREATE TABLE DEDUP_TEST
(
ID NUMBER,
COLUMN_A VARCHAR2(10 BYTE),
COLUMN_B VARCHAR2(10 BYTE),
COLUMN_C VARCHAR2(10 BYTE),
START_DATE DATE,
END_DATE DATE
)
The query:
SELECT ID,
COLUMN_A,
COLUMN_B,
COLUMN_C,
START_DATE,
END_DATE,
CASE WHEN ( DUP = 'DUP' OR DUPER = 'DUP' ) THEN 'DUP' ELSE 'NOT' END LETSEE
FROM (
SELECT ID,
COLUMN_A,
COLUMN_B,
COLUMN_C,
START_DATE,
END_DATE,
DUP,
CASE WHEN COLUMN_A = NEXT_A
AND COLUMN_B = NEXT_B
AND COLUMN_C = NEXT_C THEN 'DUP' ELSE 'NOT' END DUPER
FROM (
SELECT ID,
COLUMN_A,
COLUMN_B,
COLUMN_C,
START_DATE,
END_DATE,
NEXT_A,
NEXT_B,
NEXT_C,
CASE WHEN COLUMN_A = PREV_A
AND COLUMN_B = PREV_B
AND COLUMN_C = PREV_C THEN 'DUP' ELSE 'NOT' END DUP
FROM ( SELECT ID,
COLUMN_A,
COLUMN_B,
COLUMN_C,
START_DATE,
END_DATE,
LAG (COLUMN_A, 1, 0) OVER (ORDER BY ID) AS prev_A,
LAG (COLUMN_B, 1, 0) OVER (ORDER BY ID) AS prev_B,
LAG (COLUMN_C, 1, 0) OVER (ORDER BY ID) AS prev_C,
LEAD (COLUMN_A, 1, 0) OVER (ORDER BY ID) AS next_A,
LEAD (COLUMN_B, 1, 0) OVER (ORDER BY ID) AS next_B,
LEAD (COLUMN_C, 1, 0) OVER (ORDER BY ID) AS next_C
FROM DEDUP_TEST
ORDER
BY ID, START_DATE ) ) )
Gives:
ID COLUMN_A COLUMN_B COLUMN_C START_DAT END_DATE LET
---------- ---------- ---------- ---------- --------- --------- ---
1 A B C 01-OCT-99 01-OCT-00 NOT
1 D B C 01-OCT-01 01-OCT-02 NOT
1 A B C 01-OCT-02 01-OCT-03 NOT
2 A B B 01-OCT-03 01-OCT-04 DUP
2 A B B 01-OCT-05 01-OCT-06 DUP
2 A B B 02-OCT-07 05-OCT-08 DUP
2 a f f 06-FEB-09 07-FEB-10 NOT
2 A B B 02-OCT-11 03-OCT-12 DUP
2 A B B 04-OCT-13 06-OCT-14 DUP
3 A F F 10-FEB-14 20-FEB-15 NOT
From this the sequentially duplicated rows with the attributes a, b, c will become:
2 A B C 01-OCT-03 05-OCT-08
2 A B C 02-OCT-11 06-OCT-14
Thanks.
Followup May 21, 2004 - 10am Central time zone:
define sequentially.
1 A1 B1 C1 ....
1 A2 B2 C2 ....
1 A1 B1 C1 ....
ordered by ID is the same (exact same) as:
1 A1 B1 C1 ....
1 A1 B1 C1 ....
1 A2 B2 C2 ....
and
1 A2 B2 C2 ....
1 A1 B1 C1 ....
1 A1 B1 C1 ....
and in fact, two runs of your query could return different answers given the SAME exact data. To
handle that, you must have something more to sort by.
Typo in previous post
May 21, 2004 - 5am Central time zone
Reviewer: Dave Thompson from England
Tom,
The final output should be:
From this the sequentially duplicated rows with the attributes a, b, c will
become:
2 A B B 01-OCT-03 05-OCT-08
2 A B B 02-OCT-11 06-OCT-14
Thanks.
Order
May 21, 2004 - 10am Central time zone
Reviewer: Dave Thompson from England
Hi Tom,
The order of the dataset should be on the ID and Start Date.
ID COLUMN_A COLUMN_B COLUMN_C START_DAT END_DATE LET
---------- ---------- ---------- ---------- --------- --------- ---
1 A B C 01-OCT-99 01-OCT-00 NOT
1 D B C 01-OCT-01 01-OCT-02 NOT
1 A B C 01-OCT-02 01-OCT-03 NOT
2 A B B 01-OCT-03 01-OCT-04 DUP
2 A B B 01-OCT-05 01-OCT-06 DUP
2 A B B 02-OCT-07 05-OCT-08 DUP
2 a f f 06-FEB-09 07-FEB-10 NOT
2 A B B 02-OCT-11 03-OCT-12 DUP
2 A B B 04-OCT-13 06-OCT-14 DUP
3 A F F 10-FEB-14 20-FEB-15 NOT
Thanks.
Followup May 21, 2004 - 11am Central time zone:
Ok, your example doesn't do that -- it is "non-deterministic", given the same data, it could/would
return two different answers at different times during the day!
so, i think you want one of these:
ops$tkyte@ORA9IR2> select *
2 from (
3 select id, a,b,c, start_date, end_date,
4 case when (a = lag(a) over (order by id, start_date desc) and
5 b = lag(b) over (order by id, start_date desc) and
6 c = lag(c) over (order by id, start_date desc) )
7 then row_number() over (order by id, start_date)
8 end rn
9 from v
10 )
11 where rn is null
12 /
ID A B C START_DAT END_DATE RN
---------- ---------- ---------- ---------- --------- --------- ----------
1 A B C 01-OCT-99 01-OCT-00
1 D B C 01-OCT-01 01-OCT-02
1 A B C 01-OCT-02 01-OCT-03
2 A B B 02-OCT-07 05-OCT-08
2 a f f 06-FEB-09 07-FEB-10
2 A B B 04-OCT-13 06-OCT-14
3 A F F 10-FEB-14 20-FEB-15
7 rows selected.
ops$tkyte@ORA9IR2> select *
2 from (
3 select id, a,b,c, start_date, end_date,
4 case when (a = lag(a) over (order by id, start_date) and
5 b = lag(b) over (order by id, start_date) and
6 c = lag(c) over (order by id, start_date) )
7 then row_number() over (order by id, start_date)
8 end rn
9 from v
10 )
11 where rn is null
12 /
ID A B C START_DAT END_DATE RN
---------- ---------- ---------- ---------- --------- --------- ----------
1 A B C 01-OCT-99 01-OCT-00
1 D B C 01-OCT-01 01-OCT-02
1 A B C 01-OCT-02 01-OCT-03
2 A B B 01-OCT-03 01-OCT-04
2 a f f 06-FEB-09 07-FEB-10
2 A B B 02-OCT-11 03-OCT-12
3 A F F 10-FEB-14 20-FEB-15
7 rows selected.
we just need to mark records whose preceding record is the "same" after sorting -- then nuke
them.
More Info
May 21, 2004 - 12pm Central time zone
Reviewer: Dave Thompson from England, Sunny spells with cloud today.
Hi Tom,
Thanks for the prompt reply.
I re-wrote the base query:
SELECT ID,
COLUMN_A,
COLUMN_B,
COLUMN_C,
START_DATE,
END_DATE,
CASE WHEN ( DUP = 'DUP' OR DUPER = 'DUP' ) THEN 'DUP' ELSE 'NOT' END LETSEE
FROM (
SELECT ID,
COLUMN_A,
COLUMN_B,
COLUMN_C,
START_DATE,
END_DATE,
DUP,
CASE WHEN COLUMN_A = NEXT_A
AND COLUMN_B = NEXT_B
AND COLUMN_C = NEXT_C THEN 'DUP' ELSE 'NOT' END DUPER
FROM (
SELECT ID,
COLUMN_A,
COLUMN_B,
COLUMN_C,
START_DATE,
END_DATE,
NEXT_A,
NEXT_B,
NEXT_C,
CASE WHEN COLUMN_A = PREV_A
AND COLUMN_B = PREV_B
AND COLUMN_C = PREV_C THEN 'DUP' ELSE 'NOT' END DUP
FROM ( SELECT ID,
COLUMN_A,
COLUMN_B,
COLUMN_C,
START_DATE,
END_DATE,
ROWID ROWID_R,
LAG (COLUMN_A, 1, 0) OVER (ORDER BY ID, START_DATE) AS prev_A,
LAG (COLUMN_B, 1, 0) OVER (ORDER BY ID, START_DATE) AS prev_B,
LAG (COLUMN_C, 1, 0) OVER (ORDER BY ID, START_DATE) AS prev_C,
LEAD (COLUMN_A, 1, 0) OVER (ORDER BY ID, START_DATE) AS next_A,
LEAD (COLUMN_B, 1, 0) OVER (ORDER BY ID, START_DATE) AS next_B,
LEAD (COLUMN_C, 1, 0) OVER (ORDER BY ID, START_DATE) AS next_C
FROM DEDUP_TEST
ORDER
BY ID, START_DATE ) ) )
And got:
ID COLUMN_A COLUMN_B COLUMN_C START_DAT END_DATE LET
---------- ---------- ---------- ---------- --------- --------- ---
1 A B C 01-OCT-99 01-OCT-00 NOT
1 D B C 01-OCT-01 01-OCT-02 NOT
1 A B C 01-OCT-02 01-OCT-03 NOT
2 A B B 01-OCT-03 01-OCT-04 DUP
2 A B B 01-OCT-05 01-OCT-06 DUP
2 A B B 02-OCT-07 05-OCT-08 DUP
2 a f f 06-FEB-09 07-FEB-10 NOT
2 A B B 02-OCT-11 03-OCT-12 DUP
2 A B B 04-OCT-13 06-OCT-14 DUP
3 A F F 10-FEB-14 20-FEB-15 NOT
Looking at the column LETSEE, I want to add an identifier to each row that is unique per block,
treating each run of consecutive duplicated rows as one.
For example:
ID COLUMN_A COLUMN_B COLUMN_C START_DAT END_DATE LET DUP_ID
---------- ---------- ---------- ---------- --------- --------- --- ------
1 A B C 01-OCT-99 01-OCT-00 NOT 1
1 D B C 01-OCT-01 01-OCT-02 NOT 2
1 A B C 01-OCT-02 01-OCT-03 NOT 3
2 A B B 01-OCT-03 01-OCT-04 DUP 4
2 A B B 01-OCT-05 01-OCT-06 DUP 4
2 A B B 02-OCT-07 05-OCT-08 DUP 4
2 a f f 06-FEB-09 07-FEB-10 NOT 5
2 A B B 02-OCT-11 03-OCT-12 DUP 6
2 A B B 04-OCT-13 06-OCT-14 DUP 6
3 A F F 10-FEB-14 20-FEB-15 NOT 7
Then I could partition on DUP_ID to do the analysis I need.
Any idea?
Have a nice weekend.
Thanks.
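The DUP_ID column described above is what a "gaps and islands" running total produces. Flag a row with 1 when it is the first row, or when it differs from the previous row in (ID, START_DATE) order, and with 0 otherwise. A running SUM of those flags then numbers each run consecutively. This is a slightly different trick from the carry-forward of ROW_NUMBER used later in this thread.

The sketch runs on Python's sqlite3 over the corrected test data, with ISO date strings standing in for Oracle DATEs. It treats a change of ID as the start of a new run, which is an assumption. `IS NOT` is SQLite's null-safe comparison; in Oracle, `DECODE(column_a, prev_a, 0, 1)` would be one way to get the same NULL handling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table dedup_test (id, column_a, column_b, column_c, start_date, end_date)")
conn.executemany("insert into dedup_test values (?, ?, ?, ?, ?, ?)", [
    (1, 'A', 'B', 'C', '1999-10-01', '2000-10-01'),
    (1, 'D', 'B', 'C', '2001-10-01', '2002-10-01'),
    (1, 'A', 'B', 'C', '2002-10-01', '2003-10-01'),
    (2, 'a', 'f', 'f', '2009-02-06', '2010-02-07'),
    (2, 'A', 'B', 'B', '2003-10-01', '2004-10-01'),
    (2, 'A', 'B', 'B', '2005-10-01', '2006-10-01'),
    (2, 'A', 'B', 'B', '2007-10-02', '2008-10-05'),
    (2, 'A', 'B', 'B', '2011-10-02', '2012-10-03'),
    (2, 'A', 'B', 'B', '2013-10-04', '2014-10-06'),
    (3, 'A', 'F', 'F', '2014-02-10', '2015-02-20'),
])

# new_run = 1 on the first row and wherever the row differs from its predecessor;
# the running sum of new_run is then the same for every row of one run.
DUP_ID_SQL = """
select id, column_a, column_b, column_c, start_date, end_date,
       sum(new_run) over (order by id, start_date
                          rows between unbounded preceding and current row) as dup_id
from (select d.*,
             case when row_number() over (order by id, start_date) = 1
                    or id       is not lag(id)       over (order by id, start_date)
                    or column_a is not lag(column_a) over (order by id, start_date)
                    or column_b is not lag(column_b) over (order by id, start_date)
                    or column_c is not lag(column_c) over (order by id, start_date)
                  then 1 else 0 end as new_run
      from dedup_test d)
order by id, start_date
"""

rows = conn.execute(DUP_ID_SQL).fetchall()
for r in rows:
    print(r)
```

The resulting dup_id values run 1, 2, 3, 4, 4, 4, 5, 6, 6, 7, matching the DUP_ID column in the example. Grouping by dup_id and taking MIN(start_date)/MAX(end_date) would then collapse each run.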
Followup May 21, 2004 - 1pm Central time zone:
the above query doesn't work?
Hi Again
May 21, 2004 - 2pm Central time zone
Reviewer: Dave Thompson from England
Hi Tom,
The above didn't work.
From the source query:
ID COLUMN_A COLUMN_B COLUMN_C START_DAT END_DATE LET
---------- ---------- ---------- ---------- --------- --------- ---
1 A B C 01-OCT-99 01-OCT-00 NOT
1 D B C 01-OCT-01 01-OCT-02 NOT
1 A B C 01-OCT-02 01-OCT-03 NOT
2 A B B 01-OCT-03 01-OCT-04 DUP
2 A B B 01-OCT-05 01-OCT-06 DUP
2 A B B 02-OCT-07 05-OCT-08 DUP
2 a f f 06-FEB-09 07-FEB-10 NOT
2 A B B 02-OCT-11 03-OCT-12 DUP
2 A B B 04-OCT-13 06-OCT-14 DUP
3 A F F 10-FEB-14 20-FEB-15 NOT
I want to output the following resultset:
ID COLUMN_A COLUMN_B COLUMN_C START_DAT END_DATE LET
---------- ---------- ---------- ---------- --------- --------- ---
1 A B C 01-OCT-99 01-OCT-00 NOT
1 D B C 01-OCT-01 01-OCT-02 NOT
1 A B C 01-OCT-02 01-OCT-03 NOT
2 A B B 01-OCT-03 05-OCT-08 DUP
2 a f f 06-FEB-09 07-FEB-10 NOT
2 A B B 02-OCT-11 06-OCT-14 DUP
3 A F F 10-FEB-14 20-FEB-15 NOT
On the result set from your queries, the start and end dates were incorrect.
Where duplicate rows occur one after another, we need to take the start_date of the first row
and the end_date of the last row in that block.
So from the following:
2 A B B 01-OCT-03 01-OCT-04 DUP
2 A B B 01-OCT-05 01-OCT-06 DUP
2 A B B 02-OCT-07 05-OCT-08 DUP
You would get
2 A B B 01-OCT-03 05-OCT-08 DUP
Does this make sense?
Thanks again for your input on this.
Followup May 21, 2004 - 2pm Central time zone:
ops$tkyte@ORA9IR2> select id, a,b,c, min(start_date) start_date, max(end_date) end_date
  from (
select id, a,b,c, start_date, end_date,
       max(grp) over (order by id, start_date desc) grp
  from (
select id, a,b,c, start_date, end_date,
       case when (a <> lag(a) over (order by id, start_date desc) or
                  b <> lag(b) over (order by id, start_date desc) or
                  c <> lag(c) over (order by id, start_date desc) )
            then row_number() over (order by id, start_date desc)
        end grp
  from v
       )
       )
 group by id, a,b,c,grp
 order by 1, 5
/
ID A B C START_DAT END_DATE
---------- ---------- ---------- ---------- --------- ---------
1 A B C 01-OCT-99 01-OCT-00
1 D B C 01-OCT-01 01-OCT-02
1 A B C 01-OCT-02 01-OCT-03
2 A B B 01-OCT-03 05-OCT-08
2 a f f 06-FEB-09 07-FEB-10
2 A B B 02-OCT-11 06-OCT-14
3 A F F 10-FEB-14 20-FEB-15
7 rows selected.
One of my (current) favorite analytic tricks -- the old "carry forward". We mark each row whose
preceding row was different with its row_number(); subsequent dup rows get NULL there for grp.
Then, we use max(grp) to "carry" that number down....
Now we have something to group by -- we've divided the rows up into groups we can deal with.
(note: if a,b,c allow NULLs, we'll need to accommodate that!)
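To see the "carry forward" mechanics outside the database, here is a minimal Python sketch over Dave's sample rows. It walks the rows in the query's window order (id ascending, start_date descending). Each row whose (a, b, c) differs from the previous row gets its position as a marker. Because markers only increase, the latest marker is also the running max(grp). Grouping on (id, a, b, c, marker) then yields min(start)/max(end) per run. The function name `collapse_runs`, the `SAMPLE` list and the ISO date strings (assuming RR-style years, so 99 means 1999) are illustrative assumptions, and NULLs in a, b, c are not handled.

```python
def collapse_runs(rows):
    """rows: (id, a, b, c, start, end) tuples with ISO-format date strings."""
    # Window order used by the SQL: id ascending, start_date descending.
    # Two stable sorts: first by start descending, then by id ascending.
    ordered = sorted(sorted(rows, key=lambda r: r[4], reverse=True), key=lambda r: r[0])
    groups = {}
    prev = None
    grp = None  # like SQL NULL: no marker has been seen yet
    for rn, row in enumerate(ordered, start=1):
        # lag() is NULL on the first row, so the CASE yields NULL there too.
        if prev is not None and row[1:4] != prev[1:4]:
            grp = rn  # row_number() marker on a row that starts a new run
        # Markers only increase, so the latest marker equals max(grp) so far.
        key = row[:4] + (grp,)
        lo, hi = groups.get(key, (row[4], row[5]))
        groups[key] = (min(lo, row[4]), max(hi, row[5]))  # min(start), max(end)
        prev = row
    result = [key[:4] + span for key, span in groups.items()]
    return sorted(result, key=lambda r: (r[0], r[4]))

# Hypothetical encoding of the sample data shown earlier in the thread.
SAMPLE = [
    (1, "A", "B", "C", "1999-10-01", "2000-10-01"),
    (1, "D", "B", "C", "2001-10-01", "2002-10-01"),
    (1, "A", "B", "C", "2002-10-01", "2003-10-01"),
    (2, "A", "B", "B", "2003-10-01", "2004-10-01"),
    (2, "A", "B", "B", "2005-10-01", "2006-10-01"),
    (2, "A", "B", "B", "2007-10-02", "2008-10-05"),
    (2, "a", "f", "f", "2009-02-06", "2010-02-07"),
    (2, "A", "B", "B", "2011-10-02", "2012-10-03"),
    (2, "A", "B", "B", "2013-10-04", "2014-10-06"),
    (3, "A", "F", "F", "2014-02-10", "2015-02-20"),
]
```

`collapse_runs(SAMPLE)` should give the same 7 rows as the query output above.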
Great Stuff
May 21, 2004 - 5pm Central time zone
Reviewer: Dave Thompson from England, Overnight frost expected!
Tom,
Thanks very much for that.
I'll go over it in more detail when I'm in the Office Monday but it looks great from here.
Enjoy the weekend.
Excellent
June 2, 2004 - 4am Central time zone
Reviewer: Dave Thompson from Yorkshire
Hi Tom,
This solution was spot on.
Thanks.
Any more thoughts on an Analytics book?

June 9, 2004 - 6pm Central time zone
Reviewer: Stalin from CA, US
hi tom,
I'm wondering what the SQL below would look like if the LEAD analytic function and the PARTITION BY
clause didn't exist. Is PL/SQL the only option?
This snippet is from the "lead/lag on different dataset" thread (it has the CREATE and INSERT statements):
ops$tkyte@ORA9IR2> select a.* , round( (signoff_date-signon_date) * 24 * 60, 2 ) minutes
  from (
select log_id,
       case when action in (1,2) and lead(action) over (partition by account_id,user_id,mac order by log_creation_date) in (3,4,5,6,7)
            then lead(log_id) over (partition by account_id, user_id, mac order by log_creation_date)
        end signoff_id,
       user_id,
       log_creation_date signon_date,
       case when action in (1,2) and lead(action) over (partition by account_id,user_id,mac order by log_creation_date) in (3,4,5,6,7)
            then lead(log_creation_date) over (partition by account_id, user_id, mac order by log_creation_date)
        end signoff_date,
       action
  from logs
 where account_id = 'Robert'
   and service = 5
 order by user_id
       ) a
 where action in (1,2)
/
Thanks,
Stalin
Followup June 9, 2004 - 6pm Central time zone:
you could use a non-equi self join to achieve the same, but it would be many orders of magnitude slower.
scalar subqueries could be used as well -- with the same "slower" caveat.
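A rough Python sketch makes the cost difference concrete. It does not use the original LOGS schema. Rows are hypothetical (partition_key, timestamp, value) tuples, and timestamps are assumed unique within a key. The sketch computes a LEAD-style "next value in the partition" two ways. The first mirrors the analytic approach: sort once, then look one row ahead. The second mirrors the non-equi self join: for every row, scan all rows for the smallest later timestamp with the same key. Both give the same answer, but the join-style version does O(n²) comparisons.

```python
from collections import defaultdict

def lead_analytic(rows):
    """LEAD(value) OVER (PARTITION BY key ORDER BY ts): sort once, look one row ahead."""
    by_key = defaultdict(list)
    for key, ts, value in rows:
        by_key[key].append((ts, value))
    result = {}
    for key, items in by_key.items():
        items.sort()
        for i, (ts, _) in enumerate(items):
            result[(key, ts)] = items[i + 1][1] if i + 1 < len(items) else None
    return result

def lead_self_join(rows):
    """Non-equi self join: for each row, the same-key row with the smallest later ts."""
    result = {}
    for key, ts, _ in rows:
        later = [(ts2, v2) for key2, ts2, v2 in rows if key2 == key and ts2 > ts]
        result[(key, ts)] = min(later)[1] if later else None
    return result
```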
Is this solvable with ANALYTICS too?
June 10, 2004 - 12am Central time zone
Reviewer: Peter Tran from Houston, TX USA
Hi Tom,
Can the following problem be solved using Analytics?
I have a 10-column table where 9 of the columns are dimensions and one is an attribute. I would like to
get a report of the D1/D2 combinations where ATTR1 is 1 for every combination of the other dimensions.
Furthermore, the PK consists of all the dimension columns.
The column names in the example below aren't the real ones, but I didn't want to make the example table
too wide for illustrative purposes.
D1 D2 D3 D4 D5 D6 D7 D8 D9 ATTR1
--------------------------------------------
AA AA AA AA AA AA AA AA AA 1
AA AA BB AA AA AA AA AA AA 1
AA AA AA CC AA AA AA AA AA 1
AA AA AA AA DD AA AA AA AA 1
AA AA AA AA EE AA AA AA AA 1
AA BB AA AA AA AA AA GG AA 1
AA BB AA AA AA AA AA AA AA 1
AA BB CC AA AA AA AA AA AA 0
AA BB AA DD AA AA AA AA AA 1
EE DD JJ LL MM NN OO PP QQ 1
EE DD TT LL MM NN OO PP QQ 1
I want the query to return:
D1 D2
--------
AA AA
EE DD
It would not return AA/BB, because of the record:
D1 D2 D3 D4 D5 D6 D7 D8 D9 ATTR1
--------------------------------------------
AA BB CC AA AA AA AA AA AA 0
Thanks,
-Peter
Followup June 10, 2004 - 7am Central time zone:
yes they can, but they are not needed. regular aggregates do the job. I'd give you the real query
if I had a create table/inserts to demo against. this is "pseudo code", might or might not
actually work:
select d1, d2
from t
group by d1, d2
having count(distinct attr1) = 1

June 10, 2004 - 9am Central time zone
Reviewer: Michael T. from Dallas, Tx
Peter,
I think the following may give you what you want.
SELECT d1, d2
FROM t
GROUP BY d1, d2
HAVING SUM(DECODE(attr1, 1, 0, 1)) > 0;
Tom's pseudo code will work except in the case where all rows for a D1/D2 combination have the same
ATTR1 value, but that value is not 1.
Followup June 10, 2004 - 9am Central time zone:
ahh, good eye -- i was thinking "all attribute values are the same"
but yours doesn't do it; this will:
having count( decode( attr1, 1, 1 ) ) = count(*)
count(decode(attr1,1,1)) will return a count of non-null occurrences (all of the 1's)
count(*) returns a count of all records
so a group is output only when count(decode) = count(*)
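The same test can be sketched in Python. Within each (d1, d2) group, count the rows whose attr1 is 1. That is what count(decode(attr1, 1, 1)) counts, because DECODE returns NULL otherwise and COUNT skips NULLs. Keep the group only if that count equals the total row count. A NULL attr1 (None here) is not counted, so a group containing one is excluded. Only d1, d2 and attr1 from Peter's sample are used. The function name `groups_all_ones` is hypothetical.

```python
from collections import Counter

def groups_all_ones(rows):
    """rows: (d1, d2, attr1) tuples; returns the (d1, d2) groups where every attr1 is 1."""
    total = Counter()
    ones = Counter()
    for d1, d2, attr1 in rows:
        total[(d1, d2)] += 1      # count(*)
        if attr1 == 1:            # decode(attr1, 1, 1) is non-NULL only here
            ones[(d1, d2)] += 1   # count(decode(attr1, 1, 1))
    return sorted(k for k in total if ones[k] == total[k])

# d1, d2, attr1 from the sample table above
SAMPLE = ([("AA", "AA", 1)] * 5
          + [("AA", "BB", 1), ("AA", "BB", 1), ("AA", "BB", 0), ("AA", "BB", 1)]
          + [("EE", "DD", 1)] * 2)
```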
Thank you!
June 10, 2004 - 10am Central time zone
Reviewer: Peter Tran from Houston, TX USA
Hi Tom/Michael T.,
Thank you. It's so much clearer now.
-Peter

June 10, 2004 - 10am Central time zone
Reviewer: Michael T. from Dallas, Tx
I did screw up in my previous response. The query I submitted gives the entirely wrong answer. It
should have been
SELECT d1, d2
FROM t
GROUP BY d1, d2
HAVING SUM(DECODE(attr1, 1, 0, 1)) = 0
Although I wasn't originally considering NULL values for ATTR1 (my mistake), the above query also
produces the correct answer when ATTR1 is NULL: the DECODE evaluates a NULL ATTR1 entry to 1, so any
group containing one has a non-zero sum and is excluded.
Tom, many thanks for this site. I have learned so much from it. It is a daily must read for me.
You said a book on analytics?
June 10, 2004 - 12pm Central time zone
Reviewer: Jeff from Atlanta, GA
A book by you on analytics would be a best seller I think.
Go for it.
quick analytic question
June 16, 2004 - 5pm Central time zone
Reviewer: A reader
schema creation---
---
scott@ora92> drop table t1;
Table dropped.
scott@ora92> create table t1
2 (
3 x varchar2(10),
4 y number
5 );
Table created.
scott@ora92>
scott@ora92> insert into t1 values( 'x1', 1 );
1 row created.
scott@ora92> insert into t1 values( 'x1', 2 );
1 row created.
scott@ora92> insert into t1 values( 'x1', 4 );
1 row created.
scott@ora92> insert into t1 values( 'x1', 0 );
1 row created.
scott@ora92> commit;
Commit complete.
scott@ora92> select x, y, min(y) over() min_y
2 from t1;
X Y MIN_Y
---------- ---------- ----------
x1 1 0
x1 2 0
x1 4 0
x1 0 0
scott@ora92> spool off
---
How do I get the minimum of y over all values
that are greater than 0 (if one exists)? In the above case
I should get the result as:
X Y MIN_Y
---------- ---------- ----------
x1 1 1
x1 2 1
x1 4 1
x1 0 1
Thanx for your excellent site and brilliant work!
Followup June 16, 2004 - 6pm Central time zone:
min( case when y > 0 then y end ) over ()
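Here is what that expression computes, sketched in Python. The CASE turns every non-positive y into NULL, MIN ignores NULLs, and the empty OVER () makes that single result available on every row. If no y is greater than 0, every row gets NULL (None here). The function name is hypothetical.

```python
def min_positive_over_all(ys):
    # case when y > 0 then y end: non-positive (and NULL) values become NULL
    positives = [y for y in ys if y is not None and y > 0]
    # MIN skips NULLs; if every input is NULL the result is NULL
    m = min(positives) if positives else None
    # over (): the same value is attached to every row
    return [(y, m) for y in ys]
```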
Great!!!
June 16, 2004 - 6pm Central time zone
Reviewer: A reader
Thank you very much
July 2, 2004 - 9am Central time zone
Reviewer: Gj from UK
The Oracle docs are a little light on examples, so thank you for giving us a quick start on
analytics. I can't say I understand the complex examples yet, but the simple stuff seems so easy to
understand now; I can't wait until a real problem comes along that I can apply this feature to.
How to mimic Ora10g LAST_VALUE(... IGNORE NULLS)?
July 6, 2004 - 8am Central time zone
Reviewer: Sergey from Norway
Hi Tom,
I need to 'fill the gaps' with the values from the last existing row in a table that is outer
joined to another table. The other table serves as a source of regular [time] intervals. The task
seems conceptually very simple, so I looked in the Oracle docs (which happen to be the Oracle 10g
docs) and pretty soon found exactly what I need: LAST_VALUE with IGNORE NULLS. Unfortunately, neither
Oracle 8i nor Oracle 9i accepts IGNORE NULLS. Is there any way to mimic this feature with the 'older'
analytic functions?
I tried something like ORDER BY SIGN(NVL(VALUE, 0)) in the analytic ORDER BY clause, but it does not
work (I have no clue why).
Thanks in advance
Here is the test:
DROP TABLE TD;
CREATE TABLE TD AS
(SELECT TRUNC(SYSDATE, 'DD') + ROWNUM T
FROM ALL_OBJECTS
WHERE ROWNUM <= 15
);
DROP TABLE TV;
CREATE TABLE TV AS
(SELECT
TRUNC(SYSDATE, 'DD') + ROWNUM * 3 T
,ROWNUM V
FROM ALL_OBJECTS
WHERE ROWNUM <= 5
);
SELECT
TD.T
,SIGN(NVL(TV.V, 0))
,NVL
(TV.V,
LAST_VALUE(TV.V IGNORE NULLS) -- IGNORE NULLS does not work on Ora8i, Ora9i
OVER
(
ORDER BY TD.T
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
) V
FROM TD, TV
WHERE TV.T(+) = TD.T
ORDER BY TD.T
;
ERROR at line 6:
ORA-00907: missing right parenthesis
SELECT
TD.T
,SIGN(NVL(TV.V, 0))
,NVL
(TV.V,
LAST_VALUE(TV.V)
OVER
(
ORDER BY SIGN(NVL(TV.V, 0)), TD.T -- Does not work
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
) V
FROM TD, TV
WHERE TV.T(+) = TD.T
ORDER BY TD.T
;
T SIGN(NVL(TV.V,0)) V
------------------- ------------------ ------------------
07.07.2004 00:00:00 0
08.07.2004 00:00:00 0
09.07.2004 00:00:00 1 1
10.07.2004 00:00:00 0
11.07.2004 00:00:00 0
12.07.2004 00:00:00 1 2
13.07.2004 00:00:00 0
14.07.2004 00:00:00 0
15.07.2004 00:00:00 1 3
16.07.2004 00:00:00 0
17.07.2004 00:00:00 0
18.07.2004 00:00:00 1 4
19.07.2004 00:00:00 0
20.07.2004 00:00:00 0
21.07.2004 00:00:00 1 5
Followup July 6, 2004 - 8am Central time zone:
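This is a trick I call "carry down"; we use analytics on analytics to accomplish this. We output
"marker rows" with ROW_NUMBER() on the leading edge, formatted as a fixed-width, zero-padded string
('fm000000') so the markers sort as strings in row order. Using MAX() in the outer query, we "carry
down" these marker rows -- substr(..., 7) gets rid of the six-character row_number prefix for us: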
ops$tkyte@ORA10G> select t,
       sign_v,
       v,
       substr( max(data) over (order by t), 7 ) v2
  from (
SELECT TD.T,
       SIGN(NVL(TV.V, 0)) sign_v,
       NVL(TV.V, LAST_VALUE(TV.V IGNORE NULLS) OVER ( ORDER BY TD.T )) V,
       case when tv.v is not null
            then to_char( row_number() over (order by td.t), 'fm000000' ) || tv.v
        end data
  FROM TD, TV
 WHERE TV.T(+) = TD.T
       )
 ORDER BY T
;
T SIGN_V V V2
--------- ---------- ---------- -----------------------------------------
07-JUL-04 0
08-JUL-04 0
09-JUL-04 1 1 1
10-JUL-04 0 1 1
11-JUL-04 0 1 1
12-JUL-04 1 2 2
13-JUL-04 0 2 2
14-JUL-04 0 2 2
15-JUL-04 1 3 3
16-JUL-04 0 3 3
17-JUL-04 0 3 3
18-JUL-04 1 4 4
19-JUL-04 0 4 4
20-JUL-04 0 4 4
21-JUL-04 1 5 5
15 rows selected.
So, in 9iR2 (which does not support IGNORE NULLS) this would simply be:
ops$tkyte@ORA9IR2> select t,
       sign_v,
       substr( max(data) over (order by t), 7 ) v2
  from (
SELECT TD.T,
       SIGN(NVL(TV.V, 0)) sign_v,
       case when tv.v is not null
            then to_char( row_number() over (order by td.t), 'fm000000' ) || tv.v
        end data
  FROM TD, TV
 WHERE TV.T(+) = TD.T
       )
 ORDER BY T
;
T SIGN_V V2
--------- ---------- -----------------------------------------
07-JUL-04 0
08-JUL-04 0
09-JUL-04 1 1
10-JUL-04 0 1
11-JUL-04 0 1
12-JUL-04 1 2
13-JUL-04 0 2
14-JUL-04 0 2
15-JUL-04 1 3
16-JUL-04 0 3
17-JUL-04 0 3
18-JUL-04 1 4
19-JUL-04 0 4
20-JUL-04 0 4
21-JUL-04 1 5
15 rows selected.
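The mechanics of the 9iR2 version can be sketched in Python over a list of (t, v) pairs already sorted by t, where v is None wherever the outer join found no TV row. Each non-gap row gets a marker string: its zero-padded row number concatenated with v. A running maximum over those strings, ignoring NULLs, carries the latest marker forward, and slicing off the first six characters plays the role of substr(..., 7). The padding matters because it keeps string order the same as row order, as long as the row numbers fit in six digits. The function name `carry_down` is hypothetical. As in the SQL, the carried values come back as strings.

```python
def carry_down(rows):
    """rows: (t, v) pairs sorted by t; v is None for gap rows."""
    out = []
    running_max = None
    for rn, (t, v) in enumerate(rows, start=1):
        # case when v is not null then to_char(row_number(), 'fm000000') || v end
        data = f"{rn:06d}{v}" if v is not None else None
        # max(data) over (order by t): a running max that ignores NULLs
        if data is not None and (running_max is None or data > running_max):
            running_max = data
        # substr(..., 7) drops the 6-character row-number prefix
        out.append((t, running_max[6:] if running_max is not None else None))
    return out
```

For the 15-day example, with values on days 3, 6, 9, 12 and 15, days 1 and 2 stay empty and every later day repeats the most recent value.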
Doesn't work with PL/SQL ????????
July 20, 2004 - 9am Central time zone
Reviewer: A reader
Dear Tom
Are analytics fully compatible with PL/SQL?
Please see
SQL> ed
Wrote file afiedt.buf
1 select empno,deptno,
2 count(empno) over (partition by deptno order by empno
3 rows between unbounded preceding and current row) run_count
4* from emp
SQL> /
EMPNO DEPTNO RUN_COUNT
---------- ---------- ----------
7782 10 1
7839 10 2
7934 10 3
7369 20 1
7566 20 2
7788 20 3
7876 20 4
7902 20 5
7499 30 1
7521 30 2
7654 30 3
EMPNO DEPTNO RUN_COUNT
---------- ---------- ----------
7698 30 4
7844 30 5
7900 30 6
14 rows selected.
SQL>
SQL> ed
Wrote file afiedt.buf
1 declare
2 cursor c1 is
3 select empno,deptno,
4 count(empno) over (partition by deptno order by empno
5 rows between unbounded preceding and current row) run_count
6 from emp;
7 begin
8 for rec in c1 loop
9 null;
10 end loop;
11* end;
SQL> /
end;
*
ERROR at line 11:
ORA-06550: line 5, column 72:
PL/SQL: ORA-00905: missing keyword
ORA-06550: line 3, column 1:
PL/SQL: SQL Statement ignored
SQL>
SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
PL/SQL Release 9.2.0.4.0 - Production
CORE 9.2.0.3.0 Production
TNS for 32-bit Windows: Version 9.2.0.4.0 - Production
NLSRTL Version 9.2.0.4.0 - Production
SQL>
Followup July 20, 2004 - 8pm Central time zone:
You can contact support and reference Bug 3083373, but the workaround would be to use native
dynamic sql or a view to "hide" this construct.
the problem turns out to be the word "current" (as in "current row"), which has its own meaning in plsql.
Effect of distinct on lag
July 29, 2004 - 1pm Central time zone
Reviewer: John Murphy from Vienna, VA
I am trying to use analytics to find accounts with receipts in 3 consecutive years. The analytic
code seems to work; however, when I add DISTINCT (to find each account once), I get strange
results. This is on 9.2.0.1.0.
create table jcm_test(acct_id number(10), rcpt_date date);
insert into jcm_test
values (1 , to_date('01-JAN-2000', 'dd-mon-yyyy'));
insert into jcm_test
values (1 , to_date('01-JAN-2001', 'dd-mon-yyyy'));
insert into jcm_test
values (1 , to_date('01-JAN-2003', 'dd-mon-yyyy'));
insert into jcm_test
values (1 , to_date('02-JAN-2001', 'dd-mon-yyyy'));
(select j2.*,
rcpt_year - lag_yr as year_diff,
rank_year - lag_rank as rank_diff
from (select acct_id, rcpt_year, rank_year,
lag(rcpt_year, 2) over (partition by acct_id order by rcpt_year) lag_yr,
lag(rank_year, 2) over (partition by acct_id order by rcpt_year) lag_rank
from (select acct_id,
rcpt_year,
rank() over (partition by acct_id order by j.rcpt_year) rank_year
from (select distinct acct_id, to_char(rcpt_date, 'YYYY') rcpt_year
from jcm_test) j )
) j2);
ACCT_ID RCPT RANK_YEAR LAG_ LAG_RANK YEAR_DIFF RANK_DIFF
---------- ---- ---------- ---- ---------- ---------- ----------
1 2000 1
1 2001 2
1 2003 3 2000 1 3 2
select * from
(select j2.*,
rcpt_year - lag_yr as year_diff,
rank_year - lag_rank as rank_diff
from (select acct_id, rcpt_year, rank_year,
lag(rcpt_year, 2) over (partition by acct_id order by rcpt_year) lag_yr,
lag(rank_year, 2) over (partition by acct_id order by rcpt_year) lag_rank
from (select acct_id,
rcpt_year,
rank() over (partition by acct_id order by j.rcpt_year) rank_year
from (select distinct acct_id, to_char(rcpt_date, 'YYYY') rcpt_year
from jcm_test) j )
) j2)
where year_diff = rank_diff;
no rows selected
select distinct * from
(select j2.*,
rcpt_year - lag_yr as year_diff,
rank_year - lag_rank as rank_diff
from (select acct_id, rcpt_year, rank_year,
lag(rcpt_year, 2) over (partition by acct_id order by rcpt_year) lag_yr,
lag(rank_year, 2) over (partition by acct_id order by rcpt_year) lag_rank
from (select acct_id,
rcpt_year,
rank() over (partition by acct_id order by j.rcpt_year) rank_year
from (select distinct acct_id, to_char(rcpt_date, 'YYYY') rcpt_year
from jcm_test) j )
) j2)
where year_diff = rank_diff;
ACCT_ID RCPT RANK_YEAR LAG_ LAG_RANK YEAR_DIFF RANK_DIFF
---------- ---- ---------- ---- ---------- ---------- ----------
1 2001 2 2000 1 1 1
1 2003 4 2001 2 2 2
In your book, you say that because analytics are performed last, you must push them into an inline
view. However, that doesn't seem to do the trick here. Thanks, john
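For reference, here is a Python sketch of the test John's query expresses, to show which result is expected for his data. After taking distinct years per account and ranking them, a year that sits two ranks after another and is exactly two years later marks three consecutive years. With receipts in 2000, 2001 and 2003 there is no such run, so "no rows selected" is the expected answer and the rows returned by the SELECT DISTINCT version are wrong. The function name is hypothetical.

```python
from collections import defaultdict

def accounts_with_three_consecutive_years(receipts):
    """receipts: (acct_id, year) pairs; several receipts in one year are allowed."""
    years = defaultdict(set)
    for acct, year in receipts:
        years[acct].add(year)                # select distinct acct_id, year
    hits = []
    for acct, ys in years.items():
        ordered = sorted(ys)                 # rank() over (partition by acct order by year)
        # lag(year, 2): with distinct years, a 2-year gap two ranks back means 3 in a row
        if any(ordered[i] - ordered[i - 2] == 2 for i in range(2, len(ordered))):
            hits.append(acct)
    return sorted(hits)
```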
Followup July 29, 2004 - 2pm Central time zone:
what release -- i don't see what you see.
Distinct effect release
July 29, 2004 - 3pm Central time zone
Reviewer: John Murphy from Vienna, VA
Tom, we are using the following.
Oracle9i Release 9.2.0.1.0 - Production
PL/SQL Release 9.2.0.1.0 - Production
CORE 9.2.0.1.0 Production
TNS for 32-bit Windows: Version 9.2.0.1.0 - Production
NLSRTL Version 9.2.0.1.0 - Production
I tried searching Metalink, but couldn't find any bugs.
Followup July 29, 2004 - 4pm Central time zone:
i found one (not published) that was solved in 9202 -- at least it did not reproduce there, so they
did not pursue it further.
Distinct effect release
July 29, 2004 - 4pm Central time zone
Reviewer: John Murphy from Vienna, VA
Actually, I suspect that this may be related to bug 2258035. Do you agree? Thanks, john
Followup July 29, 2004 - 4pm Central time zone:
yes, i can confirm that in 9205, it is not happening that way.
how to write this query
July 30, 2004 - 6am Central time zone
Reviewer: Teddy
Hi
using the original poster's example:
ORDER OPN STATION CLOSE_DATE
----- --- ------- ----------
12345 10 RECV 07/01/2003
12345 20 MACH1 07/02/2003
12345 25 MACH1 07/05/2003
12345 30 MACH1 07/11/2003
12345 36 INSP1 07/12/2003
12345 50 MACH1 08/16/2003
12346 90 MACH2 07/30/2003
12346 990 STOCK 07/31/2003
How do you write a query to determine that an order has passed through manufacturing operations in
several months?
In the above example,
12345 has rows in July and August but 12346 has rows in July only. How can we write a query to find
orders such as 12345?
Followup July 30, 2004 - 4pm Central time zone:
select order#, min(close_date), max(close_date)
from t
group by order#
having months_between( max(close_date), min(close_date) ) > your_threshold;
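If "several months" means "close dates in more than one calendar month" rather than "a span longer than some threshold", one alternative is to count the distinct months per order. In Oracle that would be something like count(distinct trunc(close_date, 'mm')) > 1 in the HAVING clause. Below is a Python sketch of that alternative using Teddy's sample rows. The function name and the `SAMPLE` encoding are illustrative.

```python
from collections import defaultdict
from datetime import date

def orders_spanning_months(rows):
    """rows: (order_no, close_date) pairs; returns orders with dates in 2+ calendar months."""
    months = defaultdict(set)
    for order_no, close_date in rows:
        months[order_no].add((close_date.year, close_date.month))  # like trunc(close_date, 'mm')
    return sorted(o for o, ms in months.items() if len(ms) > 1)

SAMPLE = [
    (12345, date(2003, 7, 1)), (12345, date(2003, 7, 2)), (12345, date(2003, 7, 5)),
    (12345, date(2003, 7, 11)), (12345, date(2003, 7, 12)), (12345, date(2003, 8, 16)),
    (12346, date(2003, 7, 30)), (12346, date(2003, 7, 31)),
]
```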
Finding pairs in result set
August 11, 2004 - 10am Central time zone
Reviewer: PJ
Tom,
CREATE TABLE A
(
N NUMBER,
C CHAR(1),
V VARCHAR2(20)
);
INSERT INTO A ( N, C, V ) VALUES ( 1, 'e', '1st e of 1st N');
INSERT INTO A ( N, C, V ) VALUES ( 1, 'e', '2nd e of 1st N');
INSERT INTO A ( N, C, V ) VALUES ( 1, 'e', '3rd e of 1st N');
INSERT INTO A ( N, C, V ) VALUES ( 1, 'w', '1st w of 1st N');
INSERT INTO A ( N, C, V ) VALUES ( 1, 'w', '2nd w of 1st N');
INSERT INTO A ( N, C, V ) VALUES ( 2, 'e', '1st e of 2nd N');
INSERT INTO A ( N, C, V ) VALUES ( 2, 'w', '1st w of 2nd N');
INSERT INTO A ( N, C, V ) VALUES ( 2, 'w', '2nd w of 2nd N');
commit;
So the data I have is:
select * from a;
-------------------------
N C V
1 e 1st e of 1st N
1 e 2nd e of 1st N
1 e 3rd e of 1st N
1 w 1st w of 1st N
1 w 2nd w of 1st N
2 e 1st e of 2nd N
2 w 1st w of 2nd N
2 w 2nd w of 2nd N
---------------------------------------
And the output I'm looking for is
1 e 1st e of 1st N
1 e 2nd e of 1st N
1 w 1st w of 1st N
1 w 2nd w of 1st N
2 e 1st e of 2nd N
2 w 1st w of 2nd N
So basically I need the first pairs of (e-w/w-e) for each N.
I hope I'm clear here.
Thanks as usual in advance,
Followup August 11, 2004 - 12pm Central time zone:
do you have a field that can be "sorted on" for finding "1st, 2nd" and so on.
If not, there is no such thing as "first", or "third"

August 11, 2004 - 12pm Central time zone
Reviewer: PJ
Tom,
Sorry if I was not clear.
We need to pick pairs for each N. For example, we have 5 rows with N=1, so we have to pick 4 rows,
leaving 1 UNPAIRED "e" out.
We want the data in the same order as it is in the table. We can sort it by N, C (order by N,C).
Followup August 11, 2004 - 1pm Central time zone:
ops$tkyte@ORA920> select n, c, rn, cnt2
  from (
select n, c, rn,
       min(cnt) over (partition by n) cnt2
  from (
select n, c,
       row_number() over (partition by n, c order by c) rn,
       count(*) over (partition by n, c) cnt
  from a
       )
       )
 where rn <= cnt2
/
N C RN CNT2
---------- - ---------- ----------
1 e 1 2
1 e 2 2
1 w 1 2
1 w 2 2
2 e 1 1
2 w 1 1
6 rows selected.
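The logic of that query can be sketched in Python. Number the rows within each (n, c) group, find the smallest group size per n, and keep only the rows numbered up to that size. As in the SQL, which rows count as "first" within an (n, c) group depends on the order they are numbered in. Here that is simply input order. The function name `first_pairs` is hypothetical.

```python
from collections import defaultdict

def first_pairs(rows):
    """rows: (n, c) pairs in the order they should be numbered."""
    group_size = defaultdict(int)
    numbered = []
    for n, c in rows:
        group_size[(n, c)] += 1                     # row_number() over (partition by n, c)
        numbered.append((n, c, group_size[(n, c)]))
    cnt2 = {}
    for (n, _c), cnt in group_size.items():         # count(*) over (partition by n, c)
        cnt2[n] = min(cnt2.get(n, cnt), cnt)        # min(cnt) over (partition by n)
    return [(n, c, rn) for n, c, rn in numbered if rn <= cnt2[n]]
```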
Brilliant as usual !!
August 11, 2004 - 2pm Central time zone
Reviewer: A reader
PJ's query
August 11, 2004 - 2pm Central time zone
Reviewer: Kevin from St. Louis
PJ - you can drop the column 'v' from your table, and just use this query (which I think will
answer your question using N and C alone, and generate an appropriate 'v' as it runs).
CREATE TABLE b
(
N NUMBER,
C CHAR(1)
);
INSERT INTO b ( N, C ) VALUES ( 1, 'e');
INSERT INTO b ( N, C ) VALUES ( 1, 'e');
INSERT INTO b ( N, C ) VALUES ( 1, 'e');
INSERT INTO b ( N, C ) VALUES ( 1, 'w');
INSERT INTO b ( N, C ) VALUES ( 1, 'w');
INSERT INTO b ( N, C ) VALUES ( 2, 'e');
INSERT INTO b ( N, C ) VALUES ( 2, 'w');
INSERT INTO b ( N, C ) VALUES ( 2, 'w');
COMMIT;
SELECT n,c,v1
FROM (
SELECT lag (c1) OVER (PARTITION BY n,c1 ORDER BY n,c1) c3,
lead (c1) OVER (PARTITION BY n,c1 ORDER BY n,c1)c4,
c1 ||
CASE WHEN c1 BETWEEN 10 AND 20
THEN 'th'
ELSE DECODE(MOD(c1,10),1,'st',2,'nd',3,'rd','th')
END || ' ' || c || ' of ' || c2 ||
CASE WHEN c2 BETWEEN 10 AND 20
THEN 'th'
ELSE DECODE(MOD(c2,10),1,'st',2,'nd',3,'rd','th')
END || ' N' v1,
t1.*
FROM (
SELECT b.*,
row_number() OVER (PARTITION BY n, c ORDER BY n,c) c1,
DENSE_RANK() OVER (PARTITION BY n, c ORDER BY n,c) c2
FROM b
) t1
) t2
WHERE c3 IS NOT NULL OR c4 IS NOT NULL
/
Results:
N C V1
1 e 1st e of 1st N
1 w 1st w of 1st N
1 e 2nd e of 1st N
1 w 2nd w of 1st N
2 e 1st e of 1st N
2 w 1st w of 1st N
INSERT INTO b ( N, C ) VALUES ( 1, 'w');
COMMIT;
Results:
N C V1
1 e 1st e of 1st N
1 w 1st w of 1st N
1 e 2nd e of 1st N
1 w 2nd w of 1st N
1 e 3rd e of 1st N
1 w 3rd w of 1st N
2 e 1st e of 1st N
2 w 1st w of 1st N
oops
August 11, 2004 - 2pm Central time zone
Reviewer: Kevin from St. Louis
replace
DENSE_RANK() OVER (PARTITION BY n, c ORDER BY n,c) c2
with
DENSE_RANK() OVER (ORDER BY n) c2
my bad.

August 11, 2004 - 3pm Central time zone
Reviewer: A reader
Your bad what?
toe? leg?
Cool....
August 12, 2004 - 7am Central time zone
Reviewer: PJ
analytic q
October 22, 2004 - 6pm Central time zone
Reviewer: A reader
First the schema:
scott@ORA92I> drop table t1;
Table dropped.
scott@ORA92I> create table t1( catg1 varchar2(10), catg2 varchar2(10), total number );
Table created.
scott@ORA92I>
scott@ORA92I> insert into t1( catg1, catg2, total) values( 'V1', 'T1', 5 );
1 row created.
scott@ORA92I> insert into t1( catg1, catg2, total) values( 'V1', 'T1', 6 );
1 row created.
scott@ORA92I> insert into t1( catg1, catg2, total) values( 'V1', 'T1', 9 );
1 row created.
scott@ORA92I> insert into t1( catg1, catg2, total) values( 'V2', 'T2', 10 );
1 row created.
scott@ORA92I> insert into t1( catg1, catg2, total) values( 'V3', 'T1', 11 );
1 row created.
scott@ORA92I> insert into t1( catg1, catg2, total) values( 'V4', 'T1', 1 );
1 row created.
scott@ORA92I> insert into t1( catg1, catg2, total) values( 'V5', 'T2', 2 );
1 row created.
scott@ORA92I> insert into t1( catg1, catg2, total) values( 'V6', 'T2', 3 );
1 row created.
The catg2 column can only take two values, 'T1' and 'T2'.
I want to sum the total column for each catg1, catg2 combination
and order the combinations by that sum. Then
I want to list the top 3 catg1, catg2 combinations
based on their summed total.
If there are more than 3 such combinations, then I
want to lump the remaining ones together under a catg1 value of 'Others'.
my first cut solution is:
scott@ORA92I> select catg1, catg2, sum( total_sum )
  from
  (
    select case
             when dr > 3 then
               'Others'
             when dr <= 3 then
               catg1
           end catg1,
           catg2,
           total_sum
    from
    (
      select catg1, catg2, total_sum,
             dense_rank() over( order by total_sum desc) dr
      from
      (
        select catg1, catg2, sum( total ) total_sum
        from t1
        group by catg1, catg2
      )
    )
  )
group by catg1, catg2;
CATG1 CATG2 SUM(TOTAL_SUM)
---------- ---------- --------------
V1 T1 20
V2 T2 10
V3 T1 11
Others T1 1
Others T2 5
Does it look ok or do you have any better solution?
Thank you as always.
Followup October 23, 2004 - 9am Central time zone:
you could skip a layer of inline view, but it looks fine as is.
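The same steps can be sketched in Python over the sample rows. Sum per (catg1, catg2), dense-rank the sums in descending order, relabel every combination ranked below 3 as 'Others', and re-aggregate. Because DENSE_RANK gives tied sums the same rank, ties can let more than three combinations through, both here and in the SQL. The function name and `SAMPLE` list are illustrative.

```python
from collections import defaultdict

def top3_with_others(rows):
    """rows: (catg1, catg2, total) tuples; returns {(catg1 or 'Others', catg2): sum}."""
    sums = defaultdict(int)
    for catg1, catg2, total in rows:
        sums[(catg1, catg2)] += total                        # group by catg1, catg2
    distinct_desc = sorted(set(sums.values()), reverse=True)
    dense_rank = {s: i for i, s in enumerate(distinct_desc, start=1)}
    out = defaultdict(int)
    for (catg1, catg2), s in sums.items():
        label = catg1 if dense_rank[s] <= 3 else "Others"    # case when dr > 3 then 'Others'
        out[(label, catg2)] += s                             # outer sum(total_sum)
    return dict(out)

SAMPLE = [("V1", "T1", 5), ("V1", "T1", 6), ("V1", "T1", 9), ("V2", "T2", 10),
          ("V3", "T1", 11), ("V4", "T1", 1), ("V5", "T2", 2), ("V6", "T2", 3)]
```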
thanx!
October 24, 2004 - 12pm Central time zone
Reviewer: A reader
SQL query
November 3, 2004 - 1pm Central time zone
Reviewer: Reader from USA
I have a table which stores receipts against Purchase Orders. The users want the following output:
For each of the months of Jan, Feb and March 2004, provide a count of the number of receipts that fall
in each of the following dollar-value ranges:
< $5000
Between $5000 and $9999
> $10,000
(There can be a number of receipts against one Purchase Order, so those need to be grouped
together first.)
I wrote this query using an inline view which is the UNION of 3 SQLs, one for each dollar range.
However, I am sure there is a more elegant and efficient method to do this, maybe using analytic
functions, CASE, DECODE .... I'd appreciate your help.
Thanks
Followup November 5, 2004 - 10am Central time zone:
select trunc(date_col,'mm') Month,
count( case when amt < 5000 then 1 end ) "lt 5000",
count( case when amt >= 5000 and amt < 10000 then 1 end ) "between 5/9k",
count( case when amt >= 10000 then 1 end ) "10k or more"
from t
where date_col between :a and :b
group by trunc(date_col,'mm')
single pass....
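The point of that query is that one scan of the table feeds all three counters. Here is a Python sketch of the same single pass over hypothetical (receipt_date, amount) rows. It uses half-open ranges (amt < 5000, 5000 <= amt < 10000, amt >= 10000) so that an amount such as 9999.50 still lands in a bucket. Like the query, it buckets individual rows. If receipts must first be summed per purchase order, that aggregation would come before this step. The function name is hypothetical.

```python
from collections import defaultdict
from datetime import date

def receipt_buckets(rows):
    """rows: (receipt_date, amount) pairs; returns {month_start: (lt_5k, 5k_to_10k, 10k_plus)}."""
    counts = defaultdict(lambda: [0, 0, 0])
    for receipt_date, amt in rows:
        month = receipt_date.replace(day=1)   # trunc(date_col, 'mm')
        if amt < 5000:
            counts[month][0] += 1
        elif amt < 10000:
            counts[month][1] += 1
        else:
            counts[month][2] += 1
    return {m: tuple(c) for m, c in counts.items()}
```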
Great -
November 10, 2004 - 7am Central time zone
Reviewer: syed from UK
Tom
I have a table as follows:
create table matches
( reference varchar2(9),
endname varchar2(20),
beginname varchar2(30),
DOB date,
ni varchar2(9)
)
/
insert into matches values ('A1','SMITH','BOB',to_date('1/1/1976','dd/mm/yyyy'),'AA1234567');
insert into matches values ('A1','SMITH','TOM',to_date('1/1/1970','dd/mm/yyyy'),'AA1234568');
insert into matches values ('A2','JONES','TOM',to_date('1/1/1970','dd/mm/yyyy'),'AA1234568');
insert into matches values ('A3','JONES','TOM',to_date('1/1/1971','dd/mm/yyyy'),'AA1234569');
insert into matches values ('A4','BROWN','BRAD',to_date('1/1/1961','dd/mm/yyyy'),'AA1234570');
insert into matches values ('A4','JONES','BRAD',to_date('1/1/1961','dd/mm/yyyy'),'AA1234571');
insert into matches values ('A1','SMITH','BOB',to_date('1/1/1976','dd/mm/yyyy'),'AA1234567');
insert into matches values ('A3','JACKSON','TOM',to_date('1/1/1971','dd/mm/yyyy'),'AA1234569');
insert into matches values ('A2','JACKSON','BOB',to_date('1/1/1962','dd/mm/yyyy'),'AA1234568');
insert into matches values ('A5','JACKSON','TOM',to_date('1/1/1920','dd/mm/yyyy'),'AA1234569');
commit;
SQL> select rownum,REFERENCE,ENDNAME,BEGINNAME,DOB,NI from matches;
ROWNUM REFERENCE ENDNAME BEGINNAME DOB NI
------- --------- -------- ---------- --------- ---------
1 A1 SMITH BOB 01-JAN-76 AA1234567
2 A1 SMITH TOM 01-JAN-70 AA1234568
3 A2 JONES TOM 01-JAN-70 AA1234568
4 A3 JONES TOM 01-JAN-71 AA1234569
5 A4 BROWN BRAD 01-JAN-61 AA1234570
6 A4 JONES BRAD 01-JAN-61 AA1234571
7 A1 SMITH BOB 01-JAN-76 AA1234567
8 A3 JACKSON TOM 01-JAN-71 AA1234569
9 A2 JACKSON BOB 01-JAN-62 AA1234568
10 A5 JACKSON TOM 01-JAN-20 AA1234569
I need to show duplicates where the following column values are the same.
a) REFERENCE, ENDNAME,BEGINNAME,DOB,NI
b) ENDNAME,BEGINNAME,NI
c) REFERENCE,NI
So,
rownum 1 and 7 match criteria a)
rownum 8 and 10 match criteria b)
rownum 1 and 7, rownum 3 and 9, rownum 4 and 8 match criteria c)
How can I select this data out to show number matching each criteria ?
Followup November 10, 2004 - 7am Central time zone:
"How can I select this data out to show number matching each criteria ?"
is ambiguous.
If you add columns:
count(*) over (partition by reference, endname, beginname, dob, ni ) cnt1,
count(*) over (partition by endname, beginname, ni) cnt2,
count(*) over (partition by reference,ni) cnt3
it'll give you the "dup count" by each partition -- technically showing you the "number matching
each criteria"
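Here is what those three COUNT(*) OVER (PARTITION BY ...) columns return, sketched in Python. For each row, the count is the number of rows sharing that row's values in the criterion's columns. A count above 1 marks the row as a duplicate under that criterion. The `CRITERIA` mapping and the function name are illustrative.

```python
from collections import Counter

CRITERIA = {
    "cnt1": ("reference", "endname", "beginname", "dob", "ni"),
    "cnt2": ("endname", "beginname", "ni"),
    "cnt3": ("reference", "ni"),
}

def dup_counts(rows):
    """rows: dicts keyed by column name; returns one {cntN: count} dict per row."""
    # One Counter per criterion: how many rows share each key (the partition size).
    counters = {name: Counter(tuple(r[c] for c in cols) for r in rows)
                for name, cols in CRITERIA.items()}
    # Attach each row's partition size, like count(*) over (partition by ...).
    return [{name: counters[name][tuple(r[c] for c in cols)] for name, cols in CRITERIA.items()}
            for r in rows]
```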
analytics problem
November 19, 2004 - 9am Central time zone
Reviewer: David from United Kingdom
I am new-ish to analytic functions and have hit a problem, as follows:
create table a
(accno number(8) not null,
total_paid number(7,2) not null)
/
create table b
(accno number(8) not null,
due_date date not null,
amount_due number(7,2) not null)
/
insert into a values (1, 1000);
insert into a values (2, 1500);
insert into a values (3, 2000);
insert into a values (4, 3000);
insert into b values (1, '01-oct-04', 1000);
insert into b values (1, '01-jan-05', 900);
insert into b values (1, '01-apr-05', 700);
insert into b values (2, '01-oct-04', 1000);
insert into b values (2, '01-jan-05', 900);
insert into b values (2, '01-apr-05', 700);
insert into b values (3, '01-oct-04', 1000);
insert into b values (3, '01-jan-05', 900);
insert into b values (3, '01-apr-05', 700);
insert into b values (4, '01-oct-04', 1000);
insert into b values (4, '01-jan-05', 900);
insert into b values (4, '01-apr-05', 700);
If I then do this query...
SQL> select a.accno,
2 a.total_paid,
3 b.due_date,
4 b.amount_due,
5 case
6 when sum(b.amount_due)
7 over (order by to_date(b.due_date, 'dd-mon-rr')) - a.total_paid <= 0
8 then 0
9 when sum(b.amount_due)
10 over (order by to_date(b.due_date, 'dd-mon-rr')) - a.total_paid < b.amount_due
11 then sum(b.amount_due)
12 over (order by to_date(b.due_date, 'dd-mon-rr')) - a.total_paid
13 when sum(b.amount_due)
14 over (order by to_date(b.due_date, 'dd-mon-rr')) - a.total_paid >= b.amount_due
15 and a.total_paid >= 0
16 then b.amount_due
17 end to_pay
18 from a,b
19 where a.accno = b.accno
20 order by a.accno,
21 to_date(b.due_date, 'dd-mon-rr')
22 /
ACCNO TOTAL_PAID DUE_DATE AMOUNT_DUE TO_PAY
---------- ---------- --------- ---------- ----------
1 1000 01-OCT-04 1000 1000
1 1000 01-JAN-05 900 900
1 1000 01-APR-05 700 700
2 1500 01-OCT-04 1000 1000
2 1500 01-JAN-05 900 900
2 1500 01-APR-05 700 700
3 2000 01-OCT-04 1000 1000
3 2000 01-JAN-05 900 900
3 2000 01-APR-05 700 700
4 3000 01-OCT-04 1000 1000
4 3000 01-JAN-05 900 900
4 3000 01-APR-05 700 700
12 rows selected.
...TO_PAY does not give what I was expecting. But if I run it for an individual accno, I get what
I'm after:
SQL> select a.accno,
2 a.total_paid,
3 b.due_date,
4 b.amount_due,
5 case
6 when sum(b.amount_due)
7 over (order by to_date(b.due_date, 'dd-mon-rr')) - a.total_paid <= 0
8 then 0
9 when sum(b.amount_due)
10 over (order by to_date(b.due_date, 'dd-mon-rr')) - a.total_paid < b.amount_due
11 then sum(b.amount_due)
12 over (order by to_date(b.due_date, 'dd-mon-rr')) - a.total_paid
13 when sum(b.amount_due)
14 over (order by to_date(b.due_date, 'dd-mon-rr')) - a.total_paid >= b.amount_due
15 and a.total_paid >= 0
16 then b.amount_due
17 end to_pay
18 from a,b
19 where a.accno = b.accno
20 and a.accno = &accno
21 order by a.accno,
22 to_date(b.due_date, 'dd-mon-rr')
23 /
Enter value for accno: 1
old 20: and a.accno = &accno
new 20: and a.accno = 1
ACCNO TOTAL_PAID DUE_DATE AMOUNT_DUE TO_PAY
---------- ---------- --------- ---------- ----------
1 1000 01-OCT-04 1000 0
1 1000 01-JAN-05 900 900
1 1000 01-APR-05 700 700
3 rows selected.
SQL> /
Enter value for accno: 2
old 20: and a.accno = &accno
new 20: and a.accno = 2
ACCNO TOTAL_PAID DUE_DATE AMOUNT_DUE TO_PAY
---------- ---------- --------- ---------- ----------
2 1500 01-OCT-04 1000 0
2 1500 01-JAN-05 900 400
2 1500 01-APR-05 700 700
3 rows selected.
SQL> /
Enter value for accno: 3
old 20: and a.accno = &accno
new 20: and a.accno = 3
ACCNO TOTAL_PAID DUE_DATE AMOUNT_DUE TO_PAY
---------- ---------- --------- ---------- ----------
3 2000 01-OCT-04 1000 0
3 2000 01-JAN-05 900 0
3 2000 01-APR-05 700 600
3 rows selected.
SQL> /
Enter value for accno: 4
old 20: and a.accno = &accno
new 20: and a.accno = 4
ACCNO TOTAL_PAID DUE_DATE AMOUNT_DUE TO_PAY
---------- ---------- --------- ---------- ----------
4 3000 01-OCT-04 1000 0
4 3000 01-JAN-05 900 0
4 3000 01-APR-05 700 0
3 rows selected.
What is needed for first query above to work?
cheers,
David
Followup November 19, 2004 - 11am Central time zone:
ops$tkyte@ORA9IR2> select a.accno,
       a.total_paid,
       b.due_date,
       b.amount_due,
       case
         when sum(b.amount_due)
                over (partition by a.accno order by to_date(b.due_date, 'dd-mon-rr')) - a.total_paid <= 0
         then 0
         when sum(b.amount_due)
                over (partition by a.accno order by to_date(b.due_date, 'dd-mon-rr')) - a.total_paid < b.amount_due
         then sum(b.amount_due)
                over (partition by a.accno order by to_date(b.due_date, 'dd-mon-rr')) - a.total_paid
         when sum(b.amount_due)
                over (partition by a.accno order by to_date(b.due_date, 'dd-mon-rr')) - a.total_paid >= b.amount_due
              and a.total_paid >= 0
         then b.amount_due
       end to_pay
  from a,b
 where a.accno = b.accno
 order by a.accno,
          to_date(b.due_date, 'dd-mon-rr')
/
ACCNO TOTAL_PAID DUE_DATE AMOUNT_DUE TO_PAY
---------- ---------- --------- ---------- ----------
1 1000 01-OCT-04 1000 0
1 1000 01-JAN-05 900 900
1 1000 01-APR-05 700 700
2 1500 01-OCT-04 1000 0
2 1500 01-JAN-05 900 400
2 1500 01-APR-05 700 700
3 2000 01-OCT-04 1000 0
3 2000 01-JAN-05 900 0
3 2000 01-APR-05 700 600
4 3000 01-OCT-04 1000 0
4 3000 01-JAN-05 900 0
4 3000 01-APR-05 700 0
12 rows selected.
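The CASE expression above is easier to follow as a plain-Python model. This is not Oracle code. The function name `to_pay` and the ISO date strings are assumptions made for illustration. The model keeps a running total per account, which is what `SUM(amount_due) OVER (PARTITION BY accno ORDER BY due_date)` produces when due dates are unique within an account, and compares that total with TOTAL_PAID. It ignores the extra `a.total_paid >= 0` guard, which only matters for negative payments.

```python
from itertools import groupby

def to_pay(rows):
    """Model the TO_PAY CASE expression.

    rows: (accno, total_paid, due_date, amount_due) tuples, with due_date as an
    ISO 'YYYY-MM-DD' string so that string order is date order.
    Returns (accno, due_date, to_pay) tuples ordered by accno, due_date.
    """
    result = []
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    for accno, group in groupby(ordered, key=lambda r: r[0]):
        running = 0  # SUM(amount_due) OVER (PARTITION BY accno ORDER BY due_date)
        for _, total_paid, due_date, amount_due in group:
            running += amount_due
            shortfall = running - total_paid
            if shortfall <= 0:
                pay = 0              # fully covered by what was already paid
            elif shortfall < amount_due:
                pay = shortfall      # partly covered: pay only the remainder
            else:
                pay = amount_due     # not covered at all
            result.append((accno, due_date, pay))
    return result
```

Given the twelve rows above, the model reproduces the TO_PAY column. Written compactly, the CASE expression amounts to least(amount_due, greatest(0, running_total - total_paid)).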
excellent
November 19, 2004 - 12pm Central time zone
Reviewer: David from UK
many thanks
Limitation of Analytic Functions
December 16, 2004 - 4am Central time zone
Reviewer: Nilanjan Ray from India
I am using the following view
create or replace view vw_history as
select
txm_dt,s_key,s_hist_slno,cm_key,burst_key,cm_channel_key
,(lag(s_hist_slno,1,0) over(partition by s_key,s_hist_slno order by s_key,s_hist_slno))
prv_hist_slno
from adc_history
The following SQL statement invariably does a full table scan on 112,861,91 rows of ADC_HISTORY and
runs for 20-25 mins.
select *
from vw_history
where txm_dt between to_date('01/01/2002','dd/mm/yyyy') and to_date('01/01/2002','dd/mm/yyyy');
The query returns 4200 rows. ADC_HISTORY has 112,861,91 rows. I have the following indexes:
ADC_HISTORY_IDX8 on txm_dt and ADC_HISTORY_IDX1 on spot_key columns. Both have good selectivity.
But when the required query is run without the view, it properly uses the index ADC_HISTORY_IDX8
select
txm_dt,s_key,s_hist_slno,cm_key,burst_key,cm_channel_key
,(lag(s_hist_slno,1,0) over(partition by s_key,s_hist_slno order by s_key,s_hist_slno))
prv_hist_slno
from adc_history
I raised a TAR and it says: This is the expected behaviour "PREDICATES ARE NOT PUSHED IN THE
VIEW IF ANY ANALYTIC FUNCTIONS ARE USED"
Is there any way to work around this limitation? I just cannot think of the painful situation if I
am unable to use views with analytics!!!!
Your help is absolutely necessary. Thanks in advance
Followup December 16, 2004 - 8am Central time zone:
guess what -- your two queries return different answers..
did you consider that? did you check that?
they are TOTALLY DIFFERENT. Analytics are applied after predicates. The view -- it has no
predicate. The query -- it has a predicate. You'll find that you have DIFFERENT result sets.
don't you see that as a problem?
It is not that you are "unable to use views"
It is that "when I use a view, I get answer 1, when I do not use a view, I get answer 2"
which answer is technically correct here?
Think about it.
consider this example (using RBO just to make it so that "if an index could be used it would" to
stress the point):
ops$tkyte@ORA9IR2> create table emp as select * from scott.emp;
Table created.
ops$tkyte@ORA9IR2> create index job_idx on emp(job);
Index created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create or replace view v
2 as
3 select ename, sal, job,
4 sum(sal) over (partition by job) sal_by_job,
5 sum(sal) over (partition by deptno) sal_by_deptno
6 from emp
7 /
View created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> set autotrace on explain
ops$tkyte@ORA9IR2> select *
2 from v
3 where job = 'CLERK'
4 /
ENAME SAL JOB SAL_BY_JOB SAL_BY_DEPTNO
---------- ---------- --------- ---------- -------------
MILLER 1300 CLERK 4150 8750
JAMES 950 CLERK 4150 9400
SMITH 800 CLERK 4150 10875
ADAMS 1100 CLERK 4150 10875
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=RULE
1 0 VIEW OF 'V'
2 1 WINDOW (SORT)
3 2 WINDOW (SORT)
4 3 TABLE ACCESS (FULL) OF 'EMP'
so, one might ask "well - hey, I've got that beautiful index on JOB, I said "where job =
'CLERK'", what's up with that full scan?"
in fact, when I do it "right" -- without the evil view:
ops$tkyte@ORA9IR2> select ename, sal, job,
2 sum(sal) over (partition by job) sal_by_job,
3 sum(sal) over (partition by deptno) sal_by_deptno
4 from emp
5 where job = 'CLERK'
6 /
ENAME SAL JOB SAL_BY_JOB SAL_BY_DEPTNO
---------- ---------- --------- ---------- -------------
MILLER 1300 CLERK 4150 1300
SMITH 800 CLERK 4150 1900
ADAMS 1100 CLERK 4150 1900
JAMES 950 CLERK 4150 950
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=RULE
1 0 WINDOW (SORT)
2 1 WINDOW (SORT)
3 2 TABLE ACCESS (BY INDEX ROWID) OF 'EMP'
4 3 INDEX (RANGE SCAN) OF 'JOB_IDX' (NON-UNIQUE)
it very rapidly uses my index !!! stupid views...
but wait.
what's up with SAL_BY_DEPTNO, that appears to be wrong... hmm, what happened?
What happened was we computed the sal_by_deptno in the query without the view AFTER doing "where job
= 'CLERK'"
YOU are doing your LAG() analysis AFTER applying the predicate. Your lags in your query without
the view -- they are pretty much "not accurate"
Note that when the predicate CAN be pushed:
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select ename, sal, sal_by_job
2 from v
3 where job = 'CLERK'
4 /
ENAME SAL SAL_BY_JOB
---------- ---------- ----------
SMITH 800 4150
ADAMS 1100 4150
JAMES 950 4150
MILLER 1300 4150
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=RULE
1 0 VIEW OF 'V'
2 1 WINDOW (BUFFER)
3 2 TABLE ACCESS (BY INDEX ROWID) OF 'EMP'
4 3 INDEX (RANGE SCAN) OF 'JOB_IDX' (NON-UNIQUE)
it most certainly is. here the predicate can safely be pushed -- since the analytic is computed
"by job", a predicate on "job" can be applied FIRST and then the analytic computed.
When pushing would change the answer -- we cannot do it.
When pushing the predicate would not change the answer -- we do it.
This is not a 'limitation', this is about "getting the right answer"
ops$tkyte@ORA9IR2> set autotrace off
ops$tkyte@ORA9IR2> alter session set optimizer_mode = choose;
Session altered.
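Here is the same point in miniature. This is a plain-Python model, not how Oracle implements it, and the helper names are made up for illustration. It computes SUM(sal) OVER (PARTITION BY ...) on the 14 SCOTT.EMP rows, applying job = 'CLERK' either after or before the analytic. When partitioned by deptno, the order changes the answer. When partitioned by job, it does not, which is why that predicate can safely be pushed.

```python
from collections import defaultdict

# (ename, job, deptno, sal): the 14 rows of the SCOTT.EMP demo table
EMP = [
    ("SMITH", "CLERK", 20, 800), ("ALLEN", "SALESMAN", 30, 1600),
    ("WARD", "SALESMAN", 30, 1250), ("JONES", "MANAGER", 20, 2975),
    ("MARTIN", "SALESMAN", 30, 1250), ("BLAKE", "MANAGER", 30, 2850),
    ("CLARK", "MANAGER", 10, 2450), ("SCOTT", "ANALYST", 20, 3000),
    ("KING", "PRESIDENT", 10, 5000), ("TURNER", "SALESMAN", 30, 1500),
    ("ADAMS", "CLERK", 20, 1100), ("JAMES", "CLERK", 30, 950),
    ("FORD", "ANALYST", 20, 3000), ("MILLER", "CLERK", 10, 1300),
]

def by_deptno(row):
    return row[2]

def by_job(row):
    return row[1]

def sum_sal_over(rows, part):
    """Model SUM(sal) OVER (PARTITION BY part): pair each row with its partition total."""
    totals = defaultdict(int)
    for r in rows:
        totals[part(r)] += r[3]
    return [(r, totals[part(r)]) for r in rows]

def analytic_then_filter(rows, job, part):
    # What the view must do: compute the analytic over ALL rows, then apply WHERE job = :job.
    return {r[0]: t for r, t in sum_sal_over(rows, part) if r[1] == job}

def filter_then_analytic(rows, job, part):
    # What "pushing" the predicate would do: apply WHERE first, then compute the analytic.
    kept = [r for r in rows if r[1] == job]
    return {r[0]: t for r, t in sum_sal_over(kept, part)}
```

With `by_deptno`, the two functions return the SAL_BY_DEPTNO values from the two listings above (8750/9400/10875 versus 1300/950/1900). With `by_job`, both functions return 4150 for every clerk.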
Great!!!
December 17, 2004 - 12pm Central time zone
Reviewer: Nilanjan Ray from India
Simply amazing explanation. Cleared my doubts still further. One of the best explanations, in simple
concise terms, I have seen on "Ask Tom". You know what, people should take enough caution and learn
lessons from you before making misleading statements like "...LIMITATIONS...". In your terms yet
again "Analytics Rock".
Regards
Using analytical function, LEAD, LAG
December 24, 2004 - 9am Central time zone
Reviewer: Praveen from Bangalore
Hi Tom,
Analytical function, LEAD (or LAG) accepts the offset parameter as an integer which is a count
of rows to be skipped from the current row before accessing the leading/lagging row. What if I want
to access leading rows based on the value of column of current row, like a function applied to the
column value of current row to access the leading row.
As an example: I have a table
create table t(id integer, dt date);
For each id, start with the first record, after ordering by dt ASC. Get the next record where dt =
first_row.dt + 10 min. Then the next record where dt = first_row.dt + 20 min, and so on. Each time, the
offset is cumulatively increased by 10 min.
If we don't get an exact match for the next record (i.e. next_row.dt <> first_row.dt + 10 min, say),
then we select the row closest to the expected record, provided it lies within +/-10 seconds.
insert into t values (1, to_date('12/20/2004 00:00:00', 'mm/dd/yyyy hh24:mi:ss')); --Selected.
insert into t values (1, to_date('12/20/2004 00:05:00', 'mm/dd/yyyy hh24:mi:ss'));
insert into t values (1, to_date('12/20/2004 00:09:55', 'mm/dd/yyyy hh24:mi:ss'));
insert into t values (1, to_date('12/20/2004 00:10:00', 'mm/dd/yyyy hh24:mi:ss')); --Selected.
insert into t values (1, to_date('12/20/2004 00:15:00', 'mm/dd/yyyy hh24:mi:ss'));
insert into t values (1, to_date('12/20/2004 00:19:54', 'mm/dd/yyyy hh24:mi:ss')); --Not selected.
insert into t values (1, to_date('12/20/2004 00:19:55', 'mm/dd/yyyy hh24:mi:ss')); --Selected.
insert into t values (1, to_date('12/20/2004 00:25:00', 'mm/dd/yyyy hh24:mi:ss'));
insert into t values (1, to_date('12/20/2004 00:30:05', 'mm/dd/yyyy hh24:mi:ss')); --Selected.
insert into t values (1, to_date('12/20/2004 00:30:06', 'mm/dd/yyyy hh24:mi:ss')); --Not Selected.
insert into t values (1, to_date('12/20/2004 00:35:00', 'mm/dd/yyyy hh24:mi:ss'));
insert into t values (1, to_date('12/20/2004 00:39:55', 'mm/dd/yyyy hh24:mi:ss')); --Either this or
below record is selected.
insert into t values (1, to_date('12/20/2004 00:40:05', 'mm/dd/yyyy hh24:mi:ss')); --Either this or
above record is selected.
My output would be:
id dt
-----------
1 12/20/2004 00:00:00 AM
1 12/20/2004 00:10:00 AM --Exactly matches first_row.dt + 10min
1 12/20/2004 00:19:55 AM --Closest to first_row.dt + 20min +/- 10sec
1 12/20/2004 00:30:05 AM --Closest to first_row.dt + 30min +/- 10sec
1 12/20/2004 00:39:55 AM OR 12/20/2004 00:40:05 AM --Closest to first_row.dt + 40min +/- 10sec
The method I followed, after failing with LEAD, is:
Step#1
------
Get a subset of dt values: cumulative 10 min steps starting from the dt value of the first
row (after rounding it to a multiple of 10 minutes).
In this example I will get a subset:
12/20/2004 00:00:00 AM
12/20/2004 00:10:00 AM
12/20/2004 00:20:00 AM
12/20/2004 00:30:00 AM
12/20/2004 00:40:00 AM
This query will do it:
SELECT t1.id,
( min_dt - MOD ((ROUND (min_dt, 'mi') - ROUND (min_dt, 'hh')) * 24 * 60, 10) / (24 *
60)) + (ROWNUM - 1) * 10 / (24 * 60) dt_rounded
FROM (SELECT id, MIN (dt) min_dt,
ROUND ((MAX (dt) - MIN (dt)) * 24 * 60 / 10) max_rows
FROM t
WHERE id = 1
GROUP BY id) t1, t
WHERE ROWNUM <= max_rows + 1
Step#2:
-------
This subquery is joined with table t to get only those records from t which are either equal to the
dts in the result set returned by the subquery or fall within +/-10sec of them (all of them, not only
the closest).
SELECT t.id, dt_rounded, ABS (t.dt - dt_rounded) * 24 * 60 * 60 dt_diff_in_sec
FROM t,
(SELECT t1.id,
( min_dt - MOD ((ROUND (min_dt, 'mi') - ROUND (min_dt, 'hh')) * 24 * 60, 10) / (24 *
60)) + (ROWNUM - 1) * 10 / (24 * 60) dt_rounded
FROM (SELECT id, MIN (dt) min_dt,
ROUND ((MAX (dt) - MIN (dt)) * 24 * 60 / 10) max_rows
FROM t
WHERE id = 1
GROUP BY id) t1, t
WHERE ROWNUM <= max_rows + 1) t2
WHERE t.id = 1
AND ABS (t.dt - dt_rounded) * 24 * 60 * 60 <= 10
ORDER BY t.id, dt_rounded, dt_diff_in_sec;
I agree, this resultset will include duplicate records which I need to remove procedurally, while
looping through the cursor; the order by clause simplifies this.
Now you might have guessed the problem. If table t contains more than 1000 records, the query makes
me wait at least 2 min! And that is when I am planning to put in at least 70,000 records!
I wrote a procedure which handles the situation a little better. But I don't know if an analytic
query can help me get the performance back. I could do it if LEAD had the functionality I
mentioned in the first paragraph. Do you have any hints?
Thanks and regards
Praveen
Followup December 24, 2004 - 9am Central time zone:
you'd be looking at first_value with range windows, not lag and lead in this case.
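Separately from how this is written in SQL, it helps to have a reference implementation of the requirement to check any query against. The following is a plain-Python sketch, not an analytic-function solution. It assumes the first description above: targets are anchored on the first row's dt at +0, +10, +20, ... minutes, the tolerance is +/-10 seconds, and the closest row wins. The rule that the earlier row wins a tie, and the name `pick_samples`, are assumptions. The sketch sorts once and uses a binary search per target, so it stays O(n log n) even at the 70,000 rows Praveen plans to load.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

def pick_samples(times, step=timedelta(minutes=10), tol=timedelta(seconds=10)):
    """Return one timestamp per target first + k*step (k = 0, 1, 2, ...):
    the timestamp closest to the target within +/- tol. Targets with no
    candidate are skipped. On a tie, the earlier timestamp wins (assumption)."""
    ts = sorted(times)
    if not ts:
        return []
    first, last = ts[0], ts[-1]
    picked = []
    k = 0
    while first + k * step <= last + tol:
        target = first + k * step
        i = bisect_left(ts, target - tol)  # first candidate >= target - tol
        best = None
        while i < len(ts) and ts[i] <= target + tol:
            if best is None or abs(ts[i] - target) < abs(best - target):
                best = ts[i]
            i += 1
        if best is not None:
            picked.append(best)
        k += 1
    return picked

if __name__ == "__main__":
    base = datetime(2004, 12, 20)
    # seconds after midnight for the 13 sample inserts from the question
    offsets = [0, 300, 595, 600, 900, 1194, 1195, 1500, 1805, 1806, 2100, 2395, 2405]
    for t in pick_samples([base + timedelta(seconds=s) for s in offsets]):
        print(t)
```

On the sample inserts, this picks 00:00:00, 00:10:00, 00:19:55, 00:30:05 and 00:39:55, which matches the expected output in the question.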
Windowing clause and range function.
December 25, 2004 - 1pm Central time zone
Reviewer: Praveen from Bangalore
Hi Tom,
Thank you for the suggestion. I am not very familiar with analytic queries. I have tried
based on your advice but am unable to even start. I am stuck at the first step itself - in
specifying the range in the windowing clause. In the windowing clause, we specify an integer to get
the preceding rows based on the current column value (CLARK's example-Page:556, Analytical
Functions).
In my above example I wrote a query which contains:
FIRST_VALUE(id)
OVER (ORDER BY dt DESC
RANGE 10 PRECEDING)
10, in the windowing clause, will give me a record that fall within 10 days preceding the current
row. But I need 10 minutes preceding records. Also at the same time all those records that span
within +/- 10 sec, if exact 10 minute later records are not found (please see the description of
the problem given in the previous question).
Kindly give me a clearer picture of the windowing clause.
Also, how would you approach the above problem?
Thanks and regards
Praveen
Followup December 26, 2004 - 12pm Central time zone:
do you have Expert One on One Oracle? I have extensive examples in there.
range 10 = 10 days.
range 10/24 = 10 hours
range 10/24/60 = 10 minutes......
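Because Oracle DATE arithmetic is done in days, a RANGE offset is simply a fraction of a day. The following small Python model illustrates `FIRST_VALUE(dt) OVER (ORDER BY dt RANGE <offset> PRECEDING)`. It is an illustration only, not Oracle's implementation, and the names are hypothetical. Each row's window covers the rows whose dt lies between dt - offset and dt, and FIRST_VALUE returns the earliest of them. The same idea gives N/24/60/60 for N seconds.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

# A RANGE offset expressed as a fraction of a day, as in Oracle DATE arithmetic.
TEN_DAYS = timedelta(days=10)
TEN_HOURS = timedelta(days=10 / 24)
TEN_MINUTES = timedelta(days=10 / 24 / 60)
TEN_SECONDS = timedelta(days=10 / 24 / 60 / 60)

def first_value_range_preceding(dts, width):
    """Model FIRST_VALUE(dt) OVER (ORDER BY dt RANGE width PRECEDING).
    Returns (dt, first_value) pairs in dt order."""
    ordered = sorted(dts)
    # the window for d starts at the first row with dt >= d - width
    return [(d, ordered[bisect_left(ordered, d - width)]) for d in ordered]

if __name__ == "__main__":
    base = datetime(2004, 12, 20)
    rows = [base + timedelta(seconds=s) for s in (0, 300, 595, 600, 900)]
    for dt, fv in first_value_range_preceding(rows, TEN_MINUTES):
        print(dt.time(), "->", fv.time())
```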
I do have Expert One on One
December 26, 2004 - 2pm Central time zone
Reviewer: Praveen from Bangalore
Hi Tom,
I got my first glimpse into analytic queries through your book. Although I had attempted
to learn them through the Oracle documentation a couple of times earlier, I was never able to write a
decent query using analytic functions. Now, after spending a few hours with your book, I can see
that these functions are not as complex as I thought.
The 'hiredate' example you have given in the book is calculating in terms of days. (Pg:555)
"select ename, sal, hiredate, hiredate-100 window_top,
first_value(ename)
over(order by hiredate asc
range 100 preceding) ename_prec,...."
I got the hint from your follow-up. I should have thought about it a little more myself.
Thank you Tom,
Praveen.

December 26, 2004 - 5pm Central time zone
Reviewer: A reader
Tom,
Any dates when you would be releasing your book on Analytic?
Thanks.
Followup December 26, 2004 - 6pm Central time zone:
doing a 2nd edition of Expert One on One Oracle now -- not on the list yet.
Great answer!
December 27, 2004 - 2am Central time zone
Reviewer: Shimon Tourgeman
Dear Tom,
Could you please tell us when you are going to publish the next edition of your books, covering
9iR2 and maybe 10g, as you stated here?
Merry Christmas and a Happy New Year!
Shimon.
Followup December 27, 2004 - 10am Central time zone:
sometime in 2005, but not the first 1/2 :)
Using range windows
January 3, 2005 - 8am Central time zone
Reviewer: Praveen from Bangalore
Hi Tom,
Please allow me to explain the problem again which you had
followed up earlier (Please refer: "Using analytical
function, LEAD, LAG"). In the table t(id integer, dt date)
I have records which only differ by seconds ('dt' column).
Could you please help me to write a query to create windows
such that each window groups records based on the
expression 590 <= dt_1 <= 610 (590 & 610 are date
difference between first record and current record in
seconds and dt1 is the 'dt' column value of first record in
each window after ordering by 'id' and 'dt' ASC).
The idea is to find a record following the first record
which leads by 10 minutes. If exact match is not found
apply a tolerance of +/-10 seconds. Once the nearest match
is found (if multiple matches are found, select any), start
from the next record and repeat the process. (Please see
the scripts I had given earlier).
In your follow up, you had suggested the use of
first_value() analytical function with range windows. But
it looks like it is pretty difficult to generate the kind
of windows I specified above. And in your book, examples of
such a complex nature were not given (pardon me for being
critical).
Your answer will help me to get a deeper and practical
understanding of analytical functions while at the same
time may help us to bring down a 12 hour procedure to less
than 5 hours.
Thanks and regards
Praveen
Followup January 3, 2005 - 9am Central time zone:
no idea what 590 is. days? hours? seconds?
sorry - this doesn't compute to me.
590 <= dt_1 <= 610???
Delete Records Older Than 90 Days While Keeping Max
January 3, 2005 - 10am Central time zone
Reviewer: Mac
There is a DATE column in a table. I need to delete all records older than 90 days -- except if the
newest record for a unique key happens to be older than 90 days, I want to keep it and delete the
prior records for that key value.
How?
Followup January 3, 2005 - 10am Central time zone:
if the "newest record for a unique key"
if the key is unique.... then the date column is the only thing to be looked at?
that is, if the key is unique, then the oldest record is the newest record, and is in fact the only
record....
Oops, but
January 3, 2005 - 11am Central time zone
Reviewer: A reader
Sorry, forgot to mention that the DATE column is a part of the unique key.
Sorry, I went a bit fast...
January 3, 2005 - 2pm Central time zone
Reviewer: Praveen from Bangalore
Hi Tom,
Sorry, I didn't explain it properly.
590 = (10 minutes * 60) seconds - 10 seconds
610 = (10 minutes * 60) seconds + 10 seconds
Here I am looking for a record (say rn) exactly
600 sec (10 min) later to the first record in
the range window. If I didn't get an exact match
I try to find a record which is closest to rn,
but lies within a range which is 10 seconds less
than or more than rn.
And the condition
"590 <= dt_1 <= 610" tries to eliminate all other
records inside the range window that does not follow
the above rule.
dt_1 is the dt column value of any row following the
first row in a given range window, such that the
difference between dt_1 and dt of first row is between
590 seconds and 610 seconds. I am interested in only
one record which lies closest to 600 seconds.
I hope, the picture is more clear to you now. As an
example,
id dt
-----------------------------
1 12/20/2004 00:00:00 AM --Range window #1
1 12/20/2004 00:09:55 AM
1 12/20/2004 00:10:00 AM --Selected (Closest to 12/20/2004 00:10:00 AM)
............................
1 12/20/2004 00:10:10 AM --Range window #2
1 12/20/2004 00:19:55 AM --Selected (Closest to 12/20/2004 00:20:00 AM)
1 12/20/2004 00:20:55 AM
............................
1 12/20/2004 00:20:55 AM --Range window #3
1 12/20/2004 00:25:00 AM --Nothing to select
1 12/20/2004 00:29:10 AM --Nothing to select
...........................
1 12/20/2004 00:30:05 AM --Range window #4
1 12/20/2004 00:39:55 AM --Either one is selected
1 12/20/2004 00:40:05 AM --Either one is selected
-----------------------------
Thanks and regards
Praveen
Followup January 3, 2005 - 10pm Central time zone:
that is first_value, last_value with a range window and the time range is
N * 1/24/60/60 -- for N seconds.
How to mimic Oracle 10g LAST_VALUE(... IGNORE NULLS)?
January 13, 2005 - 3pm Central time zone
Reviewer: jayaramj@quinnox.com
Hi Tom,
In answer to the question 'How to mimic Ora10g LAST_VALUE(... IGNORE NULLS)?' from reviewer Sergey
(from Norway) in this post you have proposed the following solution:
ops$tkyte@ORA10G> select t,
2 sign_v,
3 v,
4 substr( max(data) over (order by t), 7 ) v2
5 from (
6 SELECT TD.T,
7 SIGN(NVL(TV.V, 0)) sign_v,
8 NVL(TV.V, LAST_VALUE(TV.V IGNORE NULLS) OVER ( ORDER BY TD.T )) V,
9 case when tv.v is not null
10 then to_char( row_number()
over (order by td.t), 'fm000000' ) || tv.v
11 end data
12 FROM TD, TV
13 WHERE TV.T(+) = TD.T
14 )
15 ORDER BY T
16 ;
The problem is that this solution converts the data type of the column (in this case column TV.V)
to a string (V2 in the result is a string). The result would then need to be converted back to the
original data type.
It is best to avoid such data type conversion. Is there a solution to mimic Oracle 10g
LAST_VALUE(... IGNORE NULLS) in Oracle 9i without the datatype conversion?
Followup January 13, 2005 - 3pm Central time zone:
encode the date as a string using to_char( v, 'yyyymmddhh24miss' ) and, after the substr, convert it
back out with to_date( substr(...), 'yyyymmddhh24miss' )
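The following plain-Python model shows the 9i trick with the date encoding applied. It is not Oracle code, and the function name is hypothetical. Each non-null value is prefixed with a zero-padded row number (the `to_char(row_number() ..., 'fm000000')` part) and encoded as a sortable string. The model then takes a running MAX, strips the 6-character prefix as `substr(..., 7)` does, and decodes the result back to a date. It assumes at most 999,999 rows, as the 6-digit mask implies. Because later rows always carry a larger prefix, the running MAX returns the most recent non-null value rather than the largest value.

```python
from datetime import datetime

FMT = "%Y%m%d%H%M%S"  # strftime equivalent of Oracle's 'yyyymmddhh24miss'

def carry_forward_dates(values):
    """Model substr(max(data) over (order by t), 7) with
    data = to_char(row_number() over (order by t), 'fm000000') || to_char(v, FMT),
    then to_date(...) to decode. values must already be in t order."""
    running_max = None  # MAX ignores NULLs, like the SQL aggregate
    out = []
    for rn, v in enumerate(values, start=1):
        if v is not None:
            data = f"{rn:06d}{v.strftime(FMT)}"
            # zero-padded row numbers make later rows sort higher as strings
            if running_max is None or data > running_max:
                running_max = data
        # [6:] plays the role of substr(..., 7); strptime plays to_date
        out.append(None if running_max is None else datetime.strptime(running_max[6:], FMT))
    return out
```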
How to mimic Oracle 10g LAST_VALUE(... IGNORE NULLS)?
January 14, 2005 - 12am Central time zone
Reviewer: Jay
In response to your post above - Taking care of dates (for datatype conversion) is not complex
(though timestamp variants would require a different format string). Object columns are a different
story altogether. These cannot be easily converted to strings. Is there a better solution that does
not require datatype conversion (and hence does not require any knowledge of the column datatype in
this SQL).
Followup January 14, 2005 - 8am Central time zone:
upgrade to 10g.
find prior collect_date to the max collect_date for each customer
January 25, 2005 - 4pm Central time zone
Reviewer: JANE
Hello,Tom!
I work in ORACLE 8I
I have table with 2 columns:cstmr_no,collect_date
CREATE TABLE CSTMR_dates
(
CSTMR_NO NUMBER(8) NOT NULL,
COLLECT_DATE DATE NOT NULL);
insert into cstmr_dates
values(18,to_date('01/02/04','dd/mm/yy'));
insert into cstmr_dates
values(18,to_date('01/03/04','dd/mm/yy'));
insert into cstmr_dates
values(18,to_date('01/05/04','dd/mm/yy'));
insert into cstmr_dates
values(248,to_date('01/11/04','dd/mm/yy'));
insert into cstmr_dates
values(248,to_date('01/02/04','dd/mm/yy'));
insert into cstmr_dates
values(248,to_date('01/03/04','dd/mm/yy'));
How can I rewrite this query using an analytic
function:
select cstmr_no,max(collect_date) from
CSTMR_dates
where collect_date<(select max(RETURN_COLLECT_DATE)
group by cstmr_no
In production i have thousands records in the table.
THANKS A LOT
JANE
Followup January 25, 2005 - 6pm Central time zone:
no idea what "return_collect_date" is. or where it comes from.
the sql is not sql...
Mistake:return_collect_date is a collect_date
January 26, 2005 - 2am Central time zone
Reviewer: JANE
Thank you for answer
JANE
Followup January 26, 2005 - 8am Central time zone:
but this sql:
select cstmr_no,max(collect_date) from
CSTMR_dates
where collect_date<(select max(COLLECT_DATE)
group by cstmr_no
is still not sql and I don't know if you want to
a) delete all old data BY CSTMR_NO (eg: keep just the record with the max(collect_date) BY CSTMR_NO
b) delete all data such that the collect_date is not equal to the max(collect_date)
I cannot suggest a way to rewrite an invalid sql query.
No, I want to do the following:
January 26, 2005 - 9am Central time zone
Reviewer: A reader
I just have to present the data without deleting anything.
For each cstmr I have to see:
cstmr_no max(collect_date) last prior date to max
======== ================= ======================
18 01/05/04 01/03/04
248 01/11/04 01/03/04
insert into cstmr_dates
values(18,to_date('01/02/04','dd/mm/yy'));
insert into cstmr_dates
values(18,to_date('01/03/04','dd/mm/yy'));
insert into cstmr_dates
values(18,to_date('01/05/04','dd/mm/yy'));
insert into cstmr_dates
values(248,to_date('01/11/04','dd/mm/yy'));
insert into cstmr_dates
values(248,to_date('01/02/04','dd/mm/yy'));
insert into cstmr_dates
values(248,to_date('01/03/04','dd/mm/yy'));
Followup January 26, 2005 - 9am Central time zone:
wow, how we got from:
select cstmr_no,max(collect_date) from
CSTMR_dates
where collect_date<(select max(RETURN_COLLECT_DATE)
group by cstmr_no
to this, well -- just "wow". horse of a very different color.
I have to sort of guess -- maybe I'll get it right -- you want
a) every cstmr_no,
b) the last two dates recorded for them.
well, after editing your inserts to make them become actual sql that can run.... (you don't really
use YY in real life do you? please please say "no, that was a mistake...")
ops$tkyte@ORA9IR2> select cstmr_no,
2 max( decode(rn,1,collect_date) ) d1,
3 max( decode(rn,2,collect_date) ) d1
4 from (
5 select cstmr_no,
6 collect_date,
7 row_number() over (partition by cstmr_no order by collect_date desc nulls last) rn
8 from cstmr_dates
9 )
10 where rn <= 2
11 group by cstmr_no
12 /
CSTMR_NO D1 D1
---------- --------- ---------
18 01-MAY-04 01-MAR-04
248 01-NOV-04 01-MAR-04
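The following plain-Python model reproduces the query's steps. It is an illustration, not Oracle code, and the function name is hypothetical. First it ranks each customer's dates newest-first (ROW_NUMBER ... ORDER BY collect_date DESC). Then it keeps ranks 1 and 2 and pivots them into two columns, as MAX(DECODE(rn, ...)) does with GROUP BY cstmr_no.

```python
from collections import defaultdict
from datetime import date

def last_two_dates(rows):
    """rows: (cstmr_no, collect_date). Returns {cstmr_no: (newest, prior)},
    with prior = None when a customer has only one date."""
    by_cust = defaultdict(list)
    for cstmr_no, collect_date in rows:
        by_cust[cstmr_no].append(collect_date)
    result = {}
    for cstmr_no, dates in by_cust.items():
        ranked = sorted(dates, reverse=True)  # ROW_NUMBER() ... ORDER BY collect_date DESC
        top = ranked[:2]                      # WHERE rn <= 2
        # MAX(DECODE(rn,1,...)), MAX(DECODE(rn,2,...)): pivot ranks 1 and 2 into columns
        result[cstmr_no] = (top[0], top[1] if len(top) > 1 else None)
    return result

if __name__ == "__main__":
    rows = [(18, date(2004, 2, 1)), (18, date(2004, 3, 1)), (18, date(2004, 5, 1)),
            (248, date(2004, 11, 1)), (248, date(2004, 2, 1)), (248, date(2004, 3, 1))]
    print(last_two_dates(rows))
```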
Lead/Lag and Indexes
February 22, 2005 - 6pm Central time zone
Reviewer: Rob H from Winnipeg Canada
We are using the Lead and Lag functions and I have run into an issue with index usage.
Let's say I have 2 tables
select customer_account, prod_id, sales_date, total_sales from sales_table_NA
and
select customer_account, prod_id, sales_date, total_sales from sales_table_EUR
if I do a
create view eur_sales as
select customer_account, prod_id, trunc(sales_date,'mon') month_purch,
sum(total_sales) sales_current, lead(sum(total_sales),1) over(partition by customer_account,
prod_id order by trunc(sales_date,'mon') desc) sales_last
from sales_table_EUR
group by customer_account, prod_id, trunc(sales_date,'mon')
create view na_sales as
select customer_account, prod_id, trunc(sales_date,'mon') month_purch,
sum(total_sales) sales_current, lead(sum(total_sales),1) over(partition by customer_account,
prod_id order by trunc(sales_date,'mon') desc) sales_last
from sales_table_NA
group by customer_account, prod_id, trunc(sales_date,'mon')
There are indexes on the tables for customer_account
Now, if I
select * from na_sales where customer_account=1
the index is used. Same for eur_sales. However, if I UNION them together it does not (WINDOW SORT
on first select and WINDOW BUFFER on second). If I remove the lead function and UNION them, the
index is used.
Any help?
Followup February 23, 2005 - 1am Central time zone:
do you really want UNION or UNION ALL.........
(do you know the difference between the two)....
if you had given me simple setup scripts, I would have been happy to see if that makes a
difference, but oh well.
Potential Solution
February 22, 2005 - 6pm Central time zone
Reviewer: Rob H from Winnipeg Canada
Rather than pre-sum the data into 2 views, I found that UNION ALL'ing the data first,
then applying the sum and the lead, works fine.
ie
select
customer_account, prod_id, month_purch,
sum(total_sales) sales_current, lead(sum(total_sales),1) over(partition by
customer_account, prod_id order by month_purch desc) sales_last
from(
select customer_account, prod_id, trunc(sales_date,'mon') month_purch, total_sales from sales_table_NA
union all
select customer_account, prod_id, trunc(sales_date,'mon') month_purch, total_sales from
sales_table_EUR)
group by customer_account, prod_id, month_purch
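Here is a plain-Python model of "aggregate first, then LEAD" applied to rows from both regions. It assumes the two tables have already been concatenated, as UNION ALL would do. The rows (account, product, month, amount) and the function name are hypothetical. One behaviour is worth seeing: LEAD(..., 1) returns the next row in the partition, not the previous calendar month. So when a month has no sales, the "last" value comes from an older month.

```python
from collections import defaultdict

def monthly_with_prior(rows):
    """rows: (account, product, month, amount) with month as 'YYYY-MM'.
    Returns {(account, product, month): (sales_current, sales_last)} where
    sales_last models LEAD(SUM(amount), 1) OVER (PARTITION BY account, product
    ORDER BY month DESC), i.e. the total of the next-older month that has rows."""
    totals = defaultdict(int)                 # GROUP BY account, product, month
    for account, product, month, amount in rows:
        totals[(account, product, month)] += amount
    partitions = defaultdict(list)            # PARTITION BY account, product
    for (account, product, month), total in totals.items():
        partitions[(account, product)].append((month, total))
    result = {}
    for (account, product), items in partitions.items():
        items.sort(key=lambda item: item[0], reverse=True)  # ORDER BY month DESC
        for i, (month, total) in enumerate(items):
            lead = items[i + 1][1] if i + 1 < len(items) else None  # LEAD(..., 1)
            result[(account, product, month)] = (total, lead)
    return result
```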
Attitude....
February 23, 2005 - 9am Central time zone
Reviewer: Rob H from Winnipeg Canada
What's the deal? Having a bad day? I'm sorry, but I assumed from the select statements you could
infer structure. Yes, I was using UNION ALL, yes, I know the difference (uh, feeling a bit rude
are we?) but I didn't realize until after I posted that I missed that (a nice feature would be to
be able to edit a post for a certain time after post). I generalized the data structure and SQL
for confidentiality reasons. For a guy who is so hard on people's IM speak, you forget to
capitalize your sentences :)
Now, UNION Vs UNION ALL didn't affect index usage (it did however have 'other' performance issues).
You can see from my next post that I worked on the issue and resolved it by not pre-summing each
table. With the new query, if someone issues a select with no 'where customer_account=' then it's
slower (but that also wasn't the goal).
Thanks
Followup February 24, 2005 - 4am Central time zone:
No? I was simply asking "do you know the difference between the two" for I find most people
a) don't know union all exists
b) the semantic difference between union and union all
c) the performance penalty involved with union vs union all when they didn't need to use UNION
Your example, as posted, did not use UNION ALL. Look at your text:
<quote>
Now, if I
select * from na_sales where customer_account=1
the index is used. Same for eur_sales. However, if I UNION them together it
does not (WINDOW SORT on first select and WINDOW BUFFER on second). If I remove
the lead function and UNION them, the index is used.
</quote>
I quite simply asked:
does union all change the behaviour? (i did not have an example with table creates and such to work
with, so I couldn't really 'test it', I don't have your tables, your indexes, your datatypes, etc)
do you need to use union, you said union, you did not say union all. do you know the difference
between the two.
Sorry if you took it as an insult; I can only comment based on the data provided. I had to assume
you, like most of the world, were using UNION, not UNION ALL, and simply wanted to know if you could
use union all, if union all made a difference, and if you knew the difference between the two.
If I had prescience, I could have read your subsequent post and not asked any questions, I guess.
Not having a bad day, just working with information provided. I was not trying to insult you -- I
was simply "asking".
Analytics
February 24, 2005 - 5am Central time zone
Reviewer: Neelz from Japan
Dear Sir,
I had gone through the above examples and was wondering whether analytical functions could be used
when aggregating multiple columns from a table,
CREATE TABLE T (
SUPPLIER_CD CHAR(4) NOT NULL,
ORDERRPT_NO CHAR(8) NOT NULL,
ORDER_DATE CHAR(8) NOT NULL,
STORE_CD CHAR(4) NOT NULL,
POSITION_NO CHAR(3 ) NOT NULL,
CONTORL_FLAG CHAR(2 ),
ORDERQUANTITY_EXP NUMBER(3) DEFAULT (0) NOT NULL,
ORDERQUANTITY_RES NUMBER(3) DEFAULT (0) NOT NULL,
ENT_DATE DATE DEFAULT (SYSDATE) NOT NULL,
UPD_DATE DATE DEFAULT (SYSDATE) NOT NULL,
CONSTRAINT PK_T PRIMARY KEY(SUPPLIER_CD, ORDERRPT_NO, ORDER_DATE, STORE_CD));
CREATE INDEX IDX_T ON T (SUPPLIER_CD, ORDERRPT_NO, ORDER_DATE);
insert into t values('5636','62108373','20041129','0007','2','00',1,1, to_date('2004/11/29',
'yyyy/mm/dd'),to_date('2004/11/30', 'yyyy/mm/dd'));
insert into t values('5636','62108373','20041129','0012','2','00',1,1,to_date('2004/11/29',
'yyyy/mm/dd'), to_date('2004/11/30', 'yyyy/mm/dd'));
insert into t values('5636','62108384','20041129','0014','2','00',1,1,to_date('2004/11/29',
'yyyy/mm/dd'),to_date('2004/11/30', 'yyyy/mm/dd'));
insert into t values('5636','62108384','20041129','0015','3','00',1,1,to_date('2004/11/29',
'yyyy/mm/dd'),to_date('2004/11/30', 'yyyy/mm/dd'));
insert into t values('1000','11169266','20040805','1309','4','00',8,8,to_date('2004/11/29',
'yyyy/mm/dd'),to_date('2004/11/30', 'yyyy/mm/dd'));
insert into t values('1000','11169266','20040805','1312','12' ,'00',8,8,to_date('2004/04/22',
'yyyy/mm/dd'),to_date('2004/11/23', 'yyyy/mm/dd'));
insert into t values('1000','11169266','20040805','1313','13' ,'00',12,12,to_date('2004/04/22',
'yyyy/mm/dd'),to_date('2004/11/23', 'yyyy/mm/dd'));
Currently the following query is used:-
SELECT
SUPPLIER_CD, ORDERRPT_NO, ORDER_DATE,
SUM(DECODE(RTRIM(POSITION_NO),'1',ORDERQUANTITY_RES,0)) Q1,
SUM(DECODE(RTRIM(POSITION_NO),'2',ORDERQUANTITY_RES,0)) Q2,
SUM(DECODE(RTRIM(POSITION_NO),'3',ORDERQUANTITY_RES,0)) Q3,
SUM(ORDERQUANTITY_RES) ORDER_TOTAL
FROM
T
GROUP BY
SUPPLIER_CD, ORDERRPT_NO, ORDER_DATE
The execution plan when this query is executed on the real table which has 4m records is : -
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=103002 Card=3571095 Bytes=107132850)
1 0 SORT (GROUP BY NOSORT) (Cost=103002 Card=3571095 Bytes=107132850)
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=103002 Card=3571095 Bytes=107132850)
3 2 INDEX (FULL SCAN) OF 'IDX_T' (NON-UNIQUE) (Cost=26942 Card=3571095)
Could you please tell me whether analytical functions could be used over here or a better approach
for this query.
Thanks for your great help
Followup February 24, 2005 - 5am Central time zone:
there would be no need of analytics here. analytics would be useful to get the 'aggregates' while
preserving the 'details'
eg:
select empno, sal, sum(sal) over (partition by deptno)
from emp;
shows the empno, their sal and the sum of all salaries in their dept. that would be instead of
coding:
select empno, sal, sum_sal
from emp, (select deptno, sum(sal) sum_sal from emp group by deptno) t
where emp.deptno = t.deptno
/
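The following plain-Python model shows the two forms above on a few SCOTT.EMP rows (empno, sal, deptno). It is an illustration only, and the function names are hypothetical. The analytic version attaches each department's total to every detail row in one pass. The other version builds the aggregated inline view T and joins back to it on deptno. Both return the same detail rows.

```python
from collections import defaultdict

def sum_over_partition(emp):
    """Model: select empno, sal, sum(sal) over (partition by deptno) from emp"""
    totals = defaultdict(int)
    for _, sal, deptno in emp:
        totals[deptno] += sal
    return [(empno, sal, totals[deptno]) for empno, sal, deptno in emp]

def join_to_aggregate(emp):
    """Model: select empno, sal, sum_sal from emp,
    (select deptno, sum(sal) sum_sal from emp group by deptno) t
    where emp.deptno = t.deptno"""
    t = [(d, sum(s for _, s, dd in emp if dd == d))   # the aggregated inline view T
         for d in sorted({deptno for _, _, deptno in emp})]
    return [(empno, sal, sum_sal)
            for empno, sal, deptno in emp
            for t_deptno, sum_sal in t
            if deptno == t_deptno]                     # the join condition
```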
I was just wondering
February 24, 2005 - 6am Central time zone
Reviewer: A reader
how would analytics help in the following example (the data nodes are implemented as rows in a
table with two columns as pointers: split-from and merge-to, and the third column is "value", some
number, not shown on diagram):
http://img23.exs.cx/my.php?loc=img23&image=directedgraph11th.png
The task is to use this directed dependency graph and prorate the "value" column in each row/node
in the following way:
foreach node
-start with a node, for example 16
-visit each hierarchy on which 16 depends, in this case hierarchies for 14 and 15, SUM their values
and the current value of node 16, and that will be new, prorated value for node 16
-repeat this recursively for each sub-hierarchy
until all nodes are prorated
I was thinking maybe to use combination of sys_connect_by_path and AF but not sure how. Any
thoughts?
Followup February 24, 2005 - 6am Central time zone:
you won't get very far with that structure in 9i and before. connect by "loop" will be an error
you see lots of with a directed graph.
analytics won't be appropriate either, they work on windows - not on hierarchies.
sys_connect_by_path is going to give you a string, not a sum
a scalar subquery in 10g with NOCYCLE on the query might work.
What if there is no closure inside the graph?
February 24, 2005 - 9am Central time zone
Reviewer: A reader
i.e. if the link between nodes 9 and 5 is removed, and the link between nodes 6 and 0 is removed.
Would that make a difference? It would be a tree in that case. How should we proceed if that is the
case? I was thinking maybe to use sys_connect_by_path to pack all sub-hierarchies one after
another, with the depth or level as the marker in the window. If the level switches from n to 1, that
would mean the end of a sub-hierarchy; if the level switches from 1 to 2, that is the beginning of a
hierarchy. And then aggregate over a partition inside the hierarchy view. Or is there a better approach?
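For the tree case, one reading of the requested proration is: a node's new value is its own value plus the new values of the nodes it depends on, which for a tree is the sum over its whole sub-hierarchy. The Python sketch below models that rule outside the database. The `depends_on` mapping, node ids and values are hypothetical, and the `path` set only detects the kind of loop that CONNECT BY would report as an error; it is not a reimplementation of NOCYCLE.

```python
def prorate(values, depends_on):
    """Return {node: own value + prorated values of every node it depends on}.

    values     -- {node: value}                   (hypothetical input)
    depends_on -- {node: [nodes it depends on]}  (assumed to form a tree)
    """
    memo = {}

    def visit(node, path):
        if node in memo:
            return memo[node]
        if node in path:                   # a loop: CONNECT BY would raise an error here
            raise ValueError(f"loop detected at node {node}")
        path.add(node)
        total = values[node] + sum(visit(dep, path) for dep in depends_on.get(node, []))
        path.discard(node)
        memo[node] = total
        return total

    return {node: visit(node, set()) for node in values}


# Hypothetical tree: 16 depends on 14 and 15, and 14 depends on 10
values = {16: 1, 15: 2, 14: 3, 10: 4}
depends_on = {16: [14, 15], 14: [10]}
print(prorate(values, depends_on))   # {16: 10, 15: 2, 14: 7, 10: 4}
```

If the extra links are kept and the structure is a graph rather than a tree, a node reachable along two paths would be counted once per path, which may or may not be the intended proration.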
Lead/Lag and 0 Sales
February 24, 2005 - 1pm Central time zone
Reviewer: Rob H from Winnipeg Canada
Thanks for all of the help so far. I have run into an issue where I have companies and the contacts at
each company. Here are the tables.
create table SALES_TRANS
(
CUSTOMER_ACCOUNT VARCHAR2(8) ,
STATION_NUMBER VARCHAR2(7) ,
PRODUCT_CODE VARCHAR2(8) ,
QUANTITY NUMBER ,
DATE_ISSUE DATE ,
PRICE NUMBER ,
VALUE NUMBER );
Create table COMPANY_CUSTOMER
(
COMPANY_ID NUMBER(9),
CUSTOMER_ACCOUNT VARCHAR2(8));
Create table PRODUCT_INFO
(
PRODUCT_CODE VARCHAR2(8) ,
PRODUCT_GROUP VARCHAR2(25),
PRODUCT_DESC VARCHAR2(100)
);
Running a query by customer (this select is a view called - SUM_CUST_TRANS_PRODUCT_FY_V)
Select
c.COMPANY_ID,
t.CUSTOMER_ACCOUNT,
p.product_group,
FISCAL_YEAR(DATE_ISSUE) fiscal_year,
sum(VALUE) total_VALUE_curr_y,
lead(sum(VALUE),1) over (partition by c.COMPANY_ID, t.CUSTOMER_ACCOUNT, p.product_group order by
FISCAL_YEAR(DATE_ISSUE) desc) total_VALUE_pre_y
From SALES_TRANS t
inner join COMPANY_CUSTOMER c on t.CUSTOMER_ACCOUNT = C.CUSTOMER_ACCOUNT
inner join PRODUCT_INFO P ON t.PRODUCT_CODE = p.PRODUCT_CODE
group by c.COMPANY_ID, t.CUSTOMER_ACCOUNT, p.product_group, FISCAL_YEAR(DATE_ISSUE)
I get
COMPANY_ID,CUSTOMER_ACCOUNT,PRODUCT_GROUP,FISCAL_YEAR,TOTAL_VALUE_CURR_Y,TOTAL_VALUE_PRE_Y
"F0009631","27294370","Product1",2002,1460.08,0
"F0009631","27294370","Product2",2005,0,27926.31
"F0009631","27294370","Product2",2004,27926.31,18086.17
"F0009631","27294370","Product2",2003,18086.17,47597.05
"F0009631","27294370","Product2",2002,47597.05,0
"F0009631","27294370","Product2",2001,0,0
"F0009631","27294370","Product3",2004,64582.6,51041
"F0009631","27294370","Product3",2003,51041,60225
"F0009631","27294370","Product3",2002,60225,43150
"F0009631","27294370","Product3",2001,43150,50491
"F0009631","27294370","Product3",2000,50491,664
"F0009631","27294370","Product3",1999,664,0
"F0009631","27294370","Product4",2005,2119.1,1708.61
"F0009631","27294370","Product4",2004,1708.61,4050.82
"F0009631","27294370","Product4",2003,4050.82,15662.57
"F0009631","27294370","Product4",2002,15662.57,0
"F0009631","27294370","Product5",2005,0,351.64
"F0009631","27294370","Product5",2004,351.64,5873.61
"F0009631","27294370","Product5",2003,5873.61,2548.83
"F0009631","27294370","Product5",2002,2548.83,0
"F0009631","27294370","Product6",2004,17347.84,16781.33
"F0009631","27294370","Product6",2003,16781.33,10575
"F0009631","27294370","Product6",2002,10575,3659.67
"F0009631","27294370","Product6",2001,3659.67,4901.67
"F0009631","27294370","Product6",2000,4901.67,4073.47
"F0009631","27294370","Product6",1999,4073.47,0
"F0009631","27294370","Product7",2004,5377.5,2588
"F0009631","27294370","Product7",2003,2588,245
"F0009631","27294370","Product7",2000,245,0
"F0009631","27340843","Product2",2003,3013.71,0
"F0009631","27340843","Product3",1999,1411,0
"F0009631","27340843","Product5",2003,3254.9,0
Now if I run the same grouping by only company (this select is a view called -
SUM_COMPANY_TRANS_PRODUCT_FY_V)
Select
c.COMPANY_ID,
p.product_group,
FISCAL_YEAR(DATE_ISSUE) fiscal_year,
sum(VALUE) total_VALUE_curr_y,
lead(sum(VALUE),1) over (partition by c.COMPANY_ID, p.product_group order by
FISCAL_YEAR(DATE_ISSUE) desc) total_VALUE_pre_y
From SALES_TRANS t
inner join COMPANY_CUSTOMER c on t.CUSTOMER_ACCOUNT = C.CUSTOMER_ACCOUNT
inner join PRODUCT_INFO P ON t.PRODUCT_CODE = p.PRODUCT_CODE
group by c.COMPANY_ID, p.product_group, FISCAL_YEAR(DATE_ISSUE)
we get
COMPANY_ID,PRODUCT_GROUP,FISCAL_YEAR,TOTAL_VALUE_CURR_Y,TOTAL_VALUE_PRE_Y
"F0009631","Product1",2002,1460.08,0
"F0009631","Product2",2005,0,27926.31
"F0009631","Product2",2004,27926.31,21099.88
"F0009631","Product2",2003,21099.88,47597.05
"F0009631","Product2",2002,47597.05,0
"F0009631","Product2",2001,0,0
"F0009631","Product3",2004,64582.6,51041
"F0009631","Product3",2003,51041,60225
"F0009631","Product3",2002,60225,43150
"F0009631","Product3",2001,43150,50491
"F0009631","Product3",2000,50491,2075
"F0009631","Product3",1999,2075,0
"F0009631","Product4",2005,2119.1,1708.61
"F0009631","Product4",2004,1708.61,4050.82
"F0009631","Product4",2003,4050.82,15662.57
"F0009631","Product4",2002,15662.57,0
"F0009631","Product5",2005,0,351.64
"F0009631","Product5",2004,351.64,9128.51
"F0009631","Product5",2003,9128.51,2548.83
"F0009631","Product5",2002,2548.83,0
"F0009631","Product6",2004,17347.84,16781.33
"F0009631","Product6",2003,16781.33,10575
"F0009631","Product6",2002,10575,3659.67
"F0009631","Product6",2001,3659.67,4901.67
"F0009631","Product6",2000,4901.67,4073.47
"F0009631","Product6",1999,4073.47,0
"F0009631","Product7",2004,5377.5,2588
"F0009631","Product7",2003,2588,245
"F0009631","Product7",2000,245,0
The problem is that if I run
select * from SUM_CUST_TRANS_PRODUCT_FY_V where fiscal_year=2004
customer 27340843 will not show up (no 2004 purchases), which also means that the customer-level
total_VALUE_pre_y values for 2004 will never add up to the company-level total_VALUE_pre_y for 2004.
Is there a better way to do this? The goal is to show current year sales vs previous year sales by
company, by customer, and potentially at a summary level above company (city).
I guess the idea would be to somehow show, for all customers in a company, every product and every
year in which the company made purchases (a cartesian product). I think this is difficult with large
customer and sales transaction tables.
ie
"F0009631","27340843","Product2",2004,0,3013.71 <--- ***
"F0009631","27340843","Product2",2003,3013.71,0
*** This row doesn't exist in the customer view. There are no 2004 sales, so it doesn't appear, but
we would like to see it so that the previous year's value shows.
I would love to "attach" some of the transactions if it would help. Is there a better way?
hierarchical cubes + MV?
February 25, 2005 - 2pm Central time zone
Reviewer: Rob H from Winnipeg Canada
Would hierarchical cubes and MVs be the solution? It seems like a lot of metadata to create. We
would have to create it for all customers, for all years, for all product groups.
Followup February 25, 2005 - 6pm Central time zone:
if you have "missing data", the only way I know to "make it up" is an outer join (partitioned outer
joins in 10g rock, removing the need to create cartesian products of every dimension first)
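The idea behind the outer join is to "make up" the missing rows before LEAD looks for the previous year. The Python sketch below models that under stated assumptions: for each (company, customer, product group) it fills every fiscal year in a given, contiguous range with 0, then reads the previous year's value. The year range and the function name are illustrative; the input row is the customer 27340843 example above.

```python
from collections import defaultdict

def densify_with_prev_year(rows, years):
    """rows  -- [(company, customer, product_group, fiscal_year, value)]
    years -- every fiscal year that should appear for each key (assumed contiguous)

    Returns one row per key and year (missing years filled with 0), plus the
    value for the previous year (0 when there is none).
    """
    by_key = defaultdict(dict)
    for company, customer, group, year, value in rows:
        by_key[(company, customer, group)][year] = value

    out = []
    for key, per_year in by_key.items():
        for year in sorted(years, reverse=True):    # like ORDER BY fiscal_year DESC
            curr = per_year.get(year, 0)            # a "made up" row when the year is missing
            prev = per_year.get(year - 1, 0)        # what LEAD(...,1) sees once the gap is filled
            out.append((*key, year, curr, prev))
    return out


rows = [("F0009631", "27340843", "Product2", 2003, 3013.71)]
for r in densify_with_prev_year(rows, years=range(2002, 2005)):
    print(r)
```

The first row produced is the 2004 row with 0 for the current year and 3013.71 for the previous year, i.e. the row marked *** in the question.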

February 27, 2005 - 2am Central time zone
Reviewer: Neelz from Japan
Dear Sir,
This is with regards to my previous post which is 5th above from this.
<quote>
SELECT
SUPPLIER_CD, ORDERRPT_NO, ORDER_DATE,
SUM(DECODE(RTRIM(POSITION_NO),'1',ORDERQUANTITY_RES,0)) Q1,
SUM(DECODE(RTRIM(POSITION_NO),'2',ORDERQUANTITY_RES,0)) Q2,
SUM(DECODE(RTRIM(POSITION_NO),'3',ORDERQUANTITY_RES,0)) Q3,
SUM(ORDERQUANTITY_RES) ORDER_TOTAL
FROM
T
GROUP BY
SUPPLIER_CD, ORDERRPT_NO, ORDER_DATE
</quote>
As you mentioned, analytics cannot be used here, but could you please advise me on my problem?
The query is in fact big; for brevity I just put a few columns. The actual query is
SELECT
SUPPLIER_CD, ORDERRPT_NO, ORDER_DATE,
SUM(DECODE(RTRIM(POSITION_NO),'1',ORDERQUANTITY_RES,0)) Q1,
SUM(DECODE(RTRIM(POSITION_NO),'2',ORDERQUANTITY_RES,0)) Q2,
SUM(DECODE(RTRIM(POSITION_NO),'3',ORDERQUANTITY_RES,0)) Q3,
.....
.....
.....
.....
.....
.....
SUM(DECODE(RTRIM(POSITION_NO),'197',ORDERQUANTITY_RES,0)) Q197,
SUM(DECODE(RTRIM(POSITION_NO),'198',ORDERQUANTITY_RES,0)) Q198,
SUM(DECODE(RTRIM(POSITION_NO),'199',ORDERQUANTITY_RES,0)) Q199,
SUM(DECODE(RTRIM(POSITION_NO),'200',ORDERQUANTITY_RES,0)) Q200,
SUM(ORDERQUANTITY_RES) ORDER_TOTAL
FROM
T
GROUP BY
SUPPLIER_CD, ORDERRPT_NO, ORDER_DATE
As you could see there is a definite pattern on the sum function. Could you please help me in
tuning this query?
Thanks in advance
Followup February 27, 2005 - 8am Central time zone:
you are doing a pivot -- looks great to me? It is "classic"
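The classic pivot makes a single pass over the rows and, per group, adds each quantity into the bucket for its position while also keeping the row total. The Python model below shows that logic for a handful of positions; the sample rows and the `num_positions` parameter are hypothetical, and `rstrip(" ")` stands in for RTRIM.

```python
def pivot(rows, num_positions):
    """rows -- [(supplier_cd, orderrpt_no, order_date, position_no, qty)]

    Returns {(supplier_cd, orderrpt_no, order_date): [Q1, ..., QN, ORDER_TOTAL]}.
    """
    # position label -> column index, like the literals '1', '2', ... in the DECODEs
    columns = {str(n): n - 1 for n in range(1, num_positions + 1)}
    result = {}
    for supplier, rpt, order_date, position_no, qty in rows:       # one pass (full scan)
        buckets = result.setdefault((supplier, rpt, order_date), [0] * (num_positions + 1))
        idx = columns.get(position_no.rstrip(" "))                   # RTRIM(POSITION_NO)
        if idx is not None:
            buckets[idx] += qty                                      # SUM(DECODE(...)) for Qn
        buckets[num_positions] += qty                                # SUM(...) ORDER_TOTAL
    return result


# Hypothetical sample rows; note the trailing blank on the first position_no
rows = [
    ("S1", "R1", "2005-02-27", "1 ", 10),
    ("S1", "R1", "2005-02-27", "3", 5),
    ("S1", "R1", "2005-02-27", "1", 2),
    ("S2", "R9", "2005-02-27", "2", 7),
]
print(pivot(rows, num_positions=3))
```

Each row is visited once, and each group keeps only N+1 running totals.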

February 27, 2005 - 9am Central time zone
Reviewer: Neelz from Japan
Dear Sir,
I am sorry if you felt like that. It is quite a new world for me here; I started visiting this site
3-4 months back, then realized the enormity of it and it has become like an addiction. I bought both
of your books and started working through them, and I am reading the Oracle Concepts guide. Every day
I try many times to ask a question, but so far no luck, maybe because of the time zone difference.
Coming back to my question: since it is a huge query and was taking 35 min to execute, after
reading through many articles here and in the books I was really confused as to which approach
I should take. I still am. Analytic functions (not useful, as you told me), function-based indexes
(no, because we have Standard Edition), materialized views (no, because it is an OLTP system), stored
SQL functions, the DETERMINISTIC keyword, user-defined aggregates, optimizer hints... at present it is
confusing for me.
I am working on it with different approaches and could reduce the execution time to 9.08 minutes.
The query was written with an index hint earlier, and by removing it the execution time decreased
to 9+ minutes.
I was wondering whether you could advise on what approach I should take.
Thanks for your valuable time,
Followup February 27, 2005 - 10am Central time zone:
if that is taking 35 minutes you either
a) have the memory settings like pga_aggregate_target/sort_area_size set way too low
b) you have billions of records that are hundreds of bytes in width
c) really slow disks
d) an overloaded system
I mean -- that query is pretty "simple" full scan, aggregate, nothing to it -- unless it is a gross
simplification, it should not take 35 minutes. Can you trace it with the 10046 level 12 trace and
post the tkprof section that is relevant to just this query with the waits and all?

February 27, 2005 - 10am Central time zone
Reviewer: Neelz from Japan
Dear Sir,
Thank you for your kind reply,
This report is taken from the development system.
I used alter session set events '10046 trace name context forever, level 12'. The query execution
time was 00:08:15.03
select
supplier_cd, orderrpt_no, order_date,
sum(decode(rtrim(position_no),'1',orderquantity_res,0)) q1,
sum(decode(rtrim(position_no),'2',orderquantity_res,0)) q2,
sum(decode(rtrim(position_no),'3',orderquantity_res,0)) q3,
sum(decode(rtrim(position_no),'4',orderquantity_res,0)) q4,
sum(decode(rtrim(position_no),'5',orderquantity_res,0)) q5,
.....
.....
sum(decode(rtrim(position_no),'197',orderquantity_res,0)) q197,
sum(decode(rtrim(position_no),'198',orderquantity_res,0)) q198,
sum(decode(rtrim(position_no),'199',orderquantity_res,0)) q199,
sum(decode(rtrim(position_no),'200',orderquantity_res,0)) q200,
sum(orderquantity_res) order_total
from
t
group by
supplier_cd, orderrpt_no, order_date
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.03 0.04 0 0 0 0
Execute 2 0.02 0.04 0 0 0 0
Fetch 15 431.55 488.37 37147 36118 74 211
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 18 431.60 488.46 37147 36118 74 211
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 66
Rows Row Source Operation
------- ---------------------------------------------------
211 SORT GROUP BY
4205484 TABLE ACCESS FULL T
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 16 0.00 0.00
SQL*Net more data to client 30 0.00 0.00
db file sequential read 3 0.04 0.05
db file scattered read 2280 0.78 30.62
direct path write 4 0.00 0.00
direct path read 147 0.05 1.45
SQL*Net message from client 16 140.57 166.58
SQL*Net break/reset to client 2 0.01 0.01
********************************************************************************
Thank you
Followup February 27, 2005 - 11am Central time zone:
that is 8 minutes?
but I see some writes to temp here -- for 211 aggregated rows, perhaps your sort/pga is set small
Also, why do you need to rtrim() 4,205,484 rows? (and why is something called position NUMBER in a
string?) is that rtrim there "just in case" or is it really needed? why would it have trailing
blanks and is that not a data integrity issue that needs to be fixed?
(but this is an 8 minute query, not a 35 minute query, if it takes longer on production -- it'll be
because it is waiting for something -- like IO...)

February 27, 2005 - 11am Central time zone
Reviewer: Neelz from Japan
Dear Sir,
This is a 3rd party application and the query was written with an index hint earlier. After
removing the hint query execution time reduced to 8 min. Regarding the rtrim I have to check with
the team if it is really needed. I will try the trace on production tomorrow.
And at last I could see the link for "Submit a New Question"!, I think I should try around 1.00 AM
Thanking You a lot
Followup February 27, 2005 - 11am Central time zone:
depends on your time zone, rarely am I up at 1am east coast (gmt-5) time doing this stuff!

March 2, 2005 - 8am Central time zone
Reviewer: Miki from Hungary
Tom,
I need to produce a moving average with an even window size. If I want a window of size 28, I
need to look backward 14 rows, with the first value of the window divided by 2, and look
forward 14 rows, with the last value of the window also divided by 2.
(a1/2+a2+...+a28+a29/2)/28
How could I accomplish it with the function:
avg() over(...)?
Thanks in advance
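Written out, the formula weights the two end values of the 29-row window by 1/2 and every interior value by 1, so the weights add up to 28 for a full window. A minimal Python check of that arithmetic, assuming a plain list of values and leaving out NULLs and partial windows at the edges (both of which the SQL has to handle):

```python
def centered_ma(values, i, half=14):
    """(a[i-half]/2 + a[i-half+1] + ... + a[i+half-1] + a[i+half]/2) / (2*half)"""
    if i - half < 0 or i + half >= len(values):
        return None                                      # window not full; edge handling left out
    inner = sum(values[i - half + 1 : i + half])         # 27 interior values when half=14
    ends = (values[i - half] + values[i + half]) / 2     # the two halved end values
    return (inner + ends) / (2 * half)                   # weights sum to 28 when half=14


print(centered_ma(list(range(29)), 14))   # 14.0
```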
Followup March 2, 2005 - 10am Central time zone:
this is the first thought that popped into my head:
a) get the sum(val) over 13 before and 13 after (27 rows possible).
b) get the lag(val,14)/2 and lead(val,14)/2
c) add those three numbers
d) divide by the count of non-null VALS observed (count(val) 13 before/after + 1 if lag is not null
+ 1 if lead is not null)
ops$tkyte@ORA9IR2> create table t
2 as
3 select rownum id, object_id val
4 from all_objects
5 where rownum <= 30;
Table created.
so, this was my "debug" query, just to see the data:
ops$tkyte@ORA9IR2> select id,
2 sum(val) over
(order by id rows between 13 preceding and 13 following) sum,
3 count(val) over
(order by id rows between 13 preceding and 13 following)+
4 decode(lag(val,14) over (order by id),null,0,1)+
5 decode(lead(val,14) over (order by id),null,0,1) cnt,
6 lag(id,14) over (order by id) lagid,
7 lag(val,14) over (order by id) lagval,
8 lead(id,14) over (order by id) leadid,
9 lead(val,14) over (order by id) leadval
10 from t
11 order by id;
ID SUM CNT LAGID LAGVAL LEADID LEADVAL
---------- ---------- ---------- ---------- ---------- ---------- ----------
1 218472 15 15 6399
2 224871 16 16 19361
3 244232 17 17 23637
4 267869 18 18 14871
5 282740 19 19 20668
6 303408 20 20 18961
7 322369 21 21 15767
8 338136 22 22 20654
9 358790 23 23 7065
10 365855 24 24 17487
11 383342 25 25 11077
12 394419 26 26 20772
13 415191 27 27 15505
14 430696 28 28 12849
15 425648 29 1 17897 29 23195
16 441314 29 2 7529 30 18523
17 436505 28 3 23332
18 422306 27 4 14199
19 399409 26 5 22897
20 389266 25 6 10143
21 365728 24 7 23538
22 342135 23 8 23593
23 332316 22 9 9819
24 320581 21 10 11735
25 303084 20 11 17497
26 295369 19 12 7715
27 276010 18 13 19359
28 266791 17 14 9219
29 260392 16 15 6399
30 241031 15 16 19361
30 rows selected.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select id,
2 (sum(val) over
(order by id rows between 13 preceding and 13 following)+
3 nvl(lag(val,14) over (order by id)/2,0)+
4 nvl(lead(val,14) over (order by id)/2,0))/
5 nullif(
6 count(val) over
(order by id rows between 13 preceding and 13 following)+
7 decode(lag(val,14) over (order by id),null,0,1)+
8 decode(lead(val,14) over (order by id),null,0,1)
9 ,0) avg
10 from t
11 order by id;
ID AVG
---------- ----------
1 14778.1
2 14659.4688
3 15061.7941
4 15294.6944
5 15424.9474
6 15644.425
7 15726.3095
8 15839.2273
9 15753.1522
10 15608.2708
11 15555.22
12 15569.4231
13 15664.5741
14 15611.4464
15 15386
16 15666.8966
17 16006.1071
18 15903.9074
19 15802.2115
20 15773.5
21 15729.0417
22 15388.3261
23 15328.4318
24 15545.1667
25 15591.625
26 15748.7632
27 15871.6389
28 15964.7353
29 16474.4688
30 16714.1
30 rows selected.
ops$tkyte@ORA9IR2>
I did not do a detailed check of the results -- but that should get you going (remember -- there
are 29 rows -- 14+1+14!!! and beware NULLs)

March 2, 2005 - 10am Central time zone
Reviewer: Miki from Hungary
Tom,
Your answer is excellent. That is - almost - what I needed.
If my window size is odd I can simply use the avg() over() function. I am looking for a solution where
I can also use avg() over() instead of sum() over()/count().
Is it possible?
Thank you!
Followup March 2, 2005 - 11am Central time zone:
if you want to treat row 1 and row 29 of the window "specially" like this -- this was the only
thing I thought of.

March 2, 2005 - 11am Central time zone
Reviewer: Miki from Hungary
Thank you! I will use your recommended code.
consecutive days... 8.1.7
March 9, 2005 - 1pm Central time zone
Reviewer: Dean from IL
create table day_cd
(dt date
,cd varchar2(2))
/
insert into day_cd values ('08-MAR-05', 'BD');
insert into day_cd values ('09-MAR-05', 'AD');
insert into day_cd values ('10-MAR-05', 'AD');
insert into day_cd values ('11-MAR-05', 'AD');
insert into day_cd values ('12-MAR-05', 'AD');
insert into day_cd values ('13-MAR-05', 'AD');
insert into day_cd values ('14-MAR-05', 'CD');
insert into day_cd values ('15-MAR-05', 'CD');
insert into day_cd values ('16-MAR-05', 'AD');
insert into day_cd values ('17-MAR-05', 'AD');
insert into day_cd values ('18-MAR-05', 'AD');
insert into day_cd values ('19-MAR-05', 'CD')
/
SELECT * FROM DAY_CD;
DT CD
--------- --
08-MAR-05 BD
09-MAR-05 AD
10-MAR-05 AD
11-MAR-05 AD
12-MAR-05 AD
13-MAR-05 AD
14-MAR-05 CD
15-MAR-05 CD
16-MAR-05 AD
17-MAR-05 AD
18-MAR-05 AD
19-MAR-05 CD
I'd like to count the occurrences of each code, treating a run of consecutive days with the same code
as one occurrence, so that the output would be:
CD OCCURRENCES
-- -----------
AD 2
BD 1
CD 2
nevermind...
March 9, 2005 - 1pm Central time zone
Reviewer: Dean from IL
select cd, count(*)
from
(
select cd, dt, case when (lead(dt) over (partition by cd order by dt) - dt) = 1 then 1 else 0 end
day
from day_cd
)
where day = 0
group by cd
we were responding at the same time...
March 9, 2005 - 2pm Central time zone
Reviewer: Dean from IL
:)
select cd, count(*)
from
(
select cd, dt, case when (lead(dt) over (partition by cd order by dt) - dt) = 1 then 1 else 0 end
day
from day_cd
)
where day = 0
group by cd
CD COUNT(*)
-- ----------
AD 2
BD 1
CD 2
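Dean's query counts, per code, the rows whose next date for the same code is not exactly one day later, i.e. the last day of each run. The same rule as a small Python model, using the sample data above as `datetime.date` values:

```python
from collections import Counter, defaultdict
from datetime import date, timedelta

day_cd = [
    (date(2005, 3, 8), "BD"), (date(2005, 3, 9), "AD"), (date(2005, 3, 10), "AD"),
    (date(2005, 3, 11), "AD"), (date(2005, 3, 12), "AD"), (date(2005, 3, 13), "AD"),
    (date(2005, 3, 14), "CD"), (date(2005, 3, 15), "CD"), (date(2005, 3, 16), "AD"),
    (date(2005, 3, 17), "AD"), (date(2005, 3, 18), "AD"), (date(2005, 3, 19), "CD"),
]

def count_runs(rows):
    by_code = defaultdict(list)
    for dt, cd in rows:
        by_code[cd].append(dt)
    occurrences = Counter()
    for cd, dates in by_code.items():
        dates.sort()                                       # PARTITION BY cd ORDER BY dt
        for dt, nxt in zip(dates, dates[1:] + [None]):     # nxt plays the role of LEAD(dt)
            if nxt is None or nxt - dt != timedelta(days=1):
                occurrences[cd] += 1                       # last day of a run: "day = 0"
    return dict(occurrences)


print(count_runs(day_cd))
```

This relies on the dates having no time component, just like the `lead(dt) - dt = 1` test in the query.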
Thanks for all of your help...
max() over() till not the current row
March 10, 2005 - 4am Central time zone
Reviewer: Miki from Hungary
Tom,
I have the following input
DATUM T COL1 COL2 COL3 COL4
2005.02.19 9:29 T 1 0 0 0
2005.02.20 9:29 0 0 0 0
2005.02.21 9:29 0 0 0 0
2005.02.22 9:29 T 1 0 0 0
2005.02.23 9:29 0 0 0 0
2005.02.24 9:29 0 0 0 0
2005.02.25 9:29 0 0 0 0
2005.02.26 9:29 0 0 0 0
2005.02.27 9:29 T 0 1 0 0
2005.02.28 9:29 0 0 0 0
2005.03.01 9:29 0 0 0 0
2005.03.02 9:29 T 1 1 0 0
2005.03.03 9:29 0 0 0 0
2005.03.04 9:29 T 1 1 0 0
2005.03.05 9:29 0 0 0 0
2005.03.06 9:29 T 1 0 0 0
2005.03.07 9:29 0 0 0 0
2005.03.08 9:29 0 0 0 0
2005.03.09 9:29 0 0 0 0
When the value of column T is 'T', a rule determines which columns (col1, ..., col4) get 1 or 0.
Unfortunately, with this rule more than one column can get the value 1. So, if col1+...+col4 > 1, I
would like each colx to take the value of colx from the previous row where t = 'T' and col1+...+col4 = 1.
So, the output is the following
DATUM T COL1 COL2 COL3 COL4
2005.02.19 9:29 T 1 0 0 0
2005.02.20 9:29 0 0 0 0
2005.02.21 9:29 0 0 0 0
2005.02.22 9:29 T 1 0 0 0
2005.02.23 9:29 0 0 0 0
2005.02.24 9:29 0 0 0 0
2005.02.25 9:29 0 0 0 0
2005.02.26 9:29 0 0 0 0
2005.02.27 9:29 T 0 1 0 0
2005.02.28 9:29 0 0 0 0
2005.03.01 9:29 0 0 0 0
2005.03.02 9:29 T 0 1 0 0
2005.03.03 9:29 0 0 0 0
2005.03.04 9:29 T 0 1 0 0
2005.03.05 9:29 0 0 0 0
2005.03.06 9:29 T 1 0 0 0
2005.03.07 9:29 0 0 0 0
2005.03.08 9:29 0 0 0 0
2005.03.09 9:29 0 0 0 0
I tried to use a max() over() function to replace the wrong value, but it doesn't work because I
can't see the max datum up to the previous record where t = 'T' and col1+...+col4 = 1
...
case when t = 'T' and col1+col2+col3+col4 > 1 and
greatest(nvl(max(decode(col1,1,datum)) over(order by datum), sysdate-10000),
nvl(max(decode(col2,1,datum)) over(order by datum), sysdate-10000),
nvl(max(decode(col3,1,datum)) over(order by datum), sysdate-10000),
nvl(max(decode(col4,1,datum)) over(order by datum), sysdate-10000)
) = nvl(max(decode(col1,1,datum)) over(order by datum), sysdate-10000) then 1 else 0 end col1,
case when t = 'T' and col1+col2+col3+col4 > 1 and
greatest(nvl(max(decode(col1,1,datum)) over(order by datum), sysdate-10000),
nvl(max(decode(col2,1,datum)) over(order by datum), sysdate-10000),
nvl(max(decode(col3,1,datum)) over(order by datum), sysdate-10000),
nvl(max(decode(col4,1,datum)) over(order by datum), sysdate-10000)
) = nvl(max(decode(col4,1,datum)) over(order by datum), sysdate-10000) then 1 else 0 end col4
Could you give me a solution to my problem?
Thanks in advance
miki

March 10, 2005 - 8am Central time zone
Reviewer: Miki from Hungary
Here is my table populated with data:
create table T
(
DATUM DATE,
T VARCHAR2(1),
COL1 NUMBER,
COL2 NUMBER,
COL3 NUMBER,
COL4 NUMBER
);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('16-01-2005 13:17:46', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('04-01-2005 17:23:13', 'dd-mm-yyyy hh24:mi:ss'), 'T', 1, 1, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('01-03-2005 02:59:17', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('11-12-2004 21:59:18', 'dd-mm-yyyy hh24:mi:ss'), 'T', 1, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('10-01-2005 12:00:22', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('24-02-2005 02:36:51', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('08-12-2004 11:21:15', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('07-01-2005 20:52:26', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('02-02-2005 23:44:33', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('04-03-2005 16:25:12', 'dd-mm-yyyy hh24:mi:ss'), 'T', 1, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('01-01-2005 19:02:28', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('22-01-2005 11:21:41', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('19-01-2005 15:32:18', 'dd-mm-yyyy hh24:mi:ss'), 'T', 1, 1, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('19-12-2004 03:07:10', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('21-02-2005 16:25:42', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('01-01-2005 01:02:39', 'dd-mm-yyyy hh24:mi:ss'), 'T', 0, 1, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('15-12-2004 05:49:26', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('04-02-2005 14:35:34', 'dd-mm-yyyy hh24:mi:ss'), 'T', 0, 1, 0, 0);
insert into T (DATUM, T, COL1, COL2, COL3, COL4)
values (to_date('02-12-2004 15:01:42', 'dd-mm-yyyy hh24:mi:ss'), null, 0, 0, 0, 0);
commit;
select t.* from t t
order by 1;
DATUM T COL1 COL2 COL3 COL4
1 2004.12.02. 15:01:42 0 0 0 0
2 2004.12.08. 11:21:15 0 0 0 0
3 2004.12.11. 21:59:18 T 1 0 0 0
4 2004.12.15. 5:49:26 0 0 0 0
5 2004.12.19. 3:07:10 0 0 0 0
6 2005.01.01. 1:02:39 T 0 1 0 0
7 2005.01.01. 19:02:28 0 0 0 0
8 2005.01.04. 17:23:13 T 1 1 0 0
9 2005.01.07. 20:52:26 0 0 0 0
10 2005.01.10. 12:00:22 0 0 0 0
11 2005.01.16. 13:17:46 0 0 0 0
12 2005.01.19. 15:32:18 T 1 1 0 0
13 2005.01.22. 11:21:41 0 0 0 0
14 2005.02.02. 23:44:33 0 0 0 0
15 2005.02.04. 14:35:34 T 0 1 0 0
16 2005.02.21. 16:25:42 0 0 0 0
17 2005.02.24. 2:36:51 0 0 0 0
18 2005.03.01. 2:59:17 0 0 0 0
19 2005.03.04. 16:25:12 T 1 0 0 0
Lines 8 and 12 have more than one column that contains 1.
So, I need to "copy" every colx from line 6, because it is the nearest preceding line (ordered by
datum) that has value 'T' for column T and only one colx with value 1.
Thank you
Followup March 10, 2005 - 8am Central time zone:
ops$tkyte@ORA9IR2> select t, col1, col2, col3, col4,
2 substr(max(data) over (order by datum),11,1) c1,
3 substr(max(data) over (order by datum),12,1) c2,
4 substr(max(data) over (order by datum),13,1) c3,
5 substr(max(data) over (order by datum),14,1) c4,
6 case when col1+col2+col3+col4 > 1 then '<---' end fix
7 from (
8 select t.*,
9 case when t = 'T' and col1+col2+col3+col4 = 1
10 then to_char(row_number() over (order by datum) ,'fm0000000000') || col1 ||
col2 || col3 || col4
11 end data
12 from t
13 )
14 order by datum;
T COL1 COL2 COL3 COL4 C C C C FIX
- ---------- ---------- ---------- ---------- - - - - ----
0 0 0 0
0 0 0 0
T 1 0 0 0 1 0 0 0
0 0 0 0 1 0 0 0
0 0 0 0 1 0 0 0
T 0 1 0 0 0 1 0 0
0 0 0 0 0 1 0 0
T 1 1 0 0 0 1 0 0 <---
0 0 0 0 0 1 0 0
0 0 0 0 0 1 0 0
0 0 0 0 0 1 0 0
T 1 1 0 0 0 1 0 0 <---
0 0 0 0 0 1 0 0
0 0 0 0 0 1 0 0
T 0 1 0 0 0 1 0 0
0 0 0 0 0 1 0 0
0 0 0 0 0 1 0 0
0 0 0 0 0 1 0 0
T 1 0 0 0 1 0 0 0
19 rows selected.
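The trick works because MAX() OVER (ORDER BY datum) is a running maximum, and the zero-padded ROW_NUMBER prefix makes the most recent "good" row the largest string so far; SUBSTR then reads the flags back out. A Python model of the same carry-forward, assuming the rows are already sorted by datum:

```python
def carry_forward_flags(rows):
    """rows -- [(t, (col1, col2, col3, col4))], already ordered by datum.

    For each row, return the flags of the latest row so far (including itself)
    with t = 'T' and exactly one flag set, or None before the first such row.
    """
    last_good = None
    out = []
    for t, flags in rows:
        if t == "T" and sum(flags) == 1:
            last_good = flags              # plays the role of the running max(data)
        out.append(last_good)
    return out


# Rows 5 to 9 of the sample data above (None stands for a NULL T column)
sample = [
    (None, (0, 0, 0, 0)),
    ("T", (0, 1, 0, 0)),
    (None, (0, 0, 0, 0)),
    ("T", (1, 1, 0, 0)),   # two flags set: gets (0, 1, 0, 0) from the row before
    (None, (0, 0, 0, 0)),
]
for flags in carry_forward_flags(sample):
    print(flags)
```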
Great!
March 10, 2005 - 9am Central time zone
Reviewer: Miki from Hungary
Great solution!
Thank you, it is what I expected.
book on Analytics
March 10, 2005 - 11am Central time zone
Reviewer: A reader
Hi Tom,
It is high time that you published the book on 'Analytic functions' - there is a lot one can do with
these, but very few people are fully aware of it
When is this book due ?
thanks
A variation of Dean's question ...
March 10, 2005 - 8pm Central time zone
Reviewer: Julius from Fremont, CA
create table tt (
did number,
dd date,
status number);
alter table tt add constraint tt_pk primary key (did,dd) using index;
insert into tt values (-111,to_date('03/03/2005','mm/dd/yyyy'),11);
insert into tt values (-111,to_date('03/04/2005','mm/dd/yyyy'),22);
insert into tt values (-111,to_date('03/05/2005','mm/dd/yyyy'),22);
insert into tt values (-111,to_date('03/06/2005','mm/dd/yyyy'),11);
insert into tt values (-111,to_date('03/07/2005','mm/dd/yyyy'),33);
insert into tt values (-111,to_date('03/08/2005','mm/dd/yyyy'),22);
insert into tt values (-111,to_date('03/09/2005','mm/dd/yyyy'),22);
insert into tt values (-111,to_date('03/10/2005','mm/dd/yyyy'),22);
insert into tt values (-222,to_date('03/04/2005','mm/dd/yyyy'),33);
insert into tt values (-222,to_date('03/05/2005','mm/dd/yyyy'),33);
insert into tt values (-222,to_date('03/06/2005','mm/dd/yyyy'),77);
insert into tt values (-222,to_date('03/07/2005','mm/dd/yyyy'),33);
insert into tt values (-222,to_date('03/08/2005','mm/dd/yyyy'),55);
insert into tt values (-222,to_date('03/09/2005','mm/dd/yyyy'),11);
I need a query which would return following result set where days_in_status is a count of
consecutive days the did has been in its current status (dd values are days only). I've been trying
to use analytics but without much success so far. Any idea? Thanks!!
DID DD STATUS DAYS_IN_STATUS
----- ---------- ------ --------------
-111 03/10/2005 22 3
-222 03/09/2005 11 1
Followup March 10, 2005 - 9pm Central time zone:
ops$tkyte@ORA9IR2> select did, max(dd), count(*)
2 from (
3 select x.*, max(grp) over (partition by did order by dd desc) maxgrp
4 from (
5 select tt.*,
6 case when lag(status) over (partition by did order by dd desc) <> status
7 then 1
8 end grp
9 from tt
10 ) x
11 )
12 where maxgrp is null
13 group by did
14 /
DID MAX(DD) COUNT(*)
---------- --------- ----------
-222 09-MAR-05 1
-111 10-MAR-05 3
is one approach...
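Tom's query walks each DID backwards by date, flags the first row where the status changes, and keeps only the rows before any flag. In plain Python, assuming rows of (did, dd, status), that is the length of the trailing run of the latest status. Like the SQL, it counts rows, which equals days only if there is one row per day with no gaps.

```python
from collections import defaultdict
from datetime import date

tt = [
    (-111, date(2005, 3, 3), 11), (-111, date(2005, 3, 4), 22), (-111, date(2005, 3, 5), 22),
    (-111, date(2005, 3, 6), 11), (-111, date(2005, 3, 7), 33), (-111, date(2005, 3, 8), 22),
    (-111, date(2005, 3, 9), 22), (-111, date(2005, 3, 10), 22),
    (-222, date(2005, 3, 4), 33), (-222, date(2005, 3, 5), 33), (-222, date(2005, 3, 6), 77),
    (-222, date(2005, 3, 7), 33), (-222, date(2005, 3, 8), 55), (-222, date(2005, 3, 9), 11),
]

def days_in_status(rows):
    by_did = defaultdict(list)
    for did, dd, status in rows:
        by_did[did].append((dd, status))
    result = {}
    for did, hist in by_did.items():
        hist.sort(reverse=True)                 # PARTITION BY did ORDER BY dd DESC
        latest_dd, latest_status = hist[0]
        run = 0
        for _, status in hist:
            if status != latest_status:         # first change going backwards ends the run
                break
            run += 1
        result[did] = (latest_dd, latest_status, run)
    return result


print(days_in_status(tt))
```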
SQL Query
March 15, 2005 - 6pm Central time zone
Reviewer: a reader
Hi Tom,
create table a
(accno number(8) not null,
amount_paid number(7) not null)
/
insert into a values (1, 1000);
insert into a values (2, 1500);
insert into a values (3, 2000);
insert into a values (4, 3000);
insert into a values (5, 3000);
Could you please help me write the following query without using rownum or analytics:
list the accno corresponding to the maximum amount paid. If more than one account has the
same max amount paid, list any one.
I am expecting the result to be accno 4 or 5
Thanks for your time.
Regards
Followup March 15, 2005 - 9pm Central time zone:
sounds like homework.
I give a similar quiz question in interviews (find the most frequently occurring month)
tkyte@ORA8IW> select substr( max( to_char(amount_paid,'fm0000000') || accno ), 8 ) accno
2 from a;
ACCNO
-----------------------------------------
5
is one possible approach (assuming that amount_paid is positive)
tkyte@ORA8IW> select max(accno)
2 from a
3 where amount_paid = ( select max(amount_paid) from a );
MAX(ACCNO)
----------
5
is another (that would work well if amount_paid,accno were indexed....)
negatives to worry about ...
March 15, 2005 - 9pm Central time zone
Reviewer: Gabe
SQL> select * from a;
ACCNO AMOUNT_PAID
---------- -----------
1 -2
2 -1
SQL> select substr( max( to_char(amount_paid,'fm0000000') || accno ), 8 ) accno from a;
ACCNO
-----------------------------------------
21
Followup March 15, 2005 - 10pm Central time zone:
....
(assuming that amount_paid is positive)
.......
that was caveated and why I gave two answers ;)
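Why the caveat matters: the first query relies on fixed-width, zero-padded text sorting in the same order as the numbers. A minimal Python model of `to_char(amount_paid,'fm0000000') || accno` followed by `substr(max(...), 8)`; the helper names are made up, and the sign handling (a leading '-' in front of the seven digits) is an approximation that agrees with the result Gabe got:

```python
def encode(amount, accno):
    """Approximates TO_CHAR(amount, 'fm0000000') || accno: seven zero-padded
    digits, with a leading '-' for negative amounts."""
    digits = str(abs(amount)).zfill(7)
    return ("-" if amount < 0 else "") + digits + str(accno)

def accno_of_max(rows):
    """rows -- [(accno, amount_paid)]; models SUBSTR(MAX(encoded), 8)."""
    best = max(encode(amount, accno) for accno, amount in rows)
    return best[7:]                      # SUBSTR(..., 8) is 1-based, so skip 7 characters


print(accno_of_max([(1, 1000), (2, 1500), (3, 2000), (4, 3000), (5, 3000)]))  # 5
print(accno_of_max([(1, -2), (2, -1)]))  # 21: the '-' shifts the digits by one
```

With negatives, the extra sign character both shifts the SUBSTR offset and reverses the comparison order, so the string maximum no longer points at the numeric maximum.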
cannot read ...
March 15, 2005 - 10pm Central time zone
Reviewer: Gabe
Sorry about that ... missed it completely.
following an idea of mikito ...
March 16, 2005 - 8am Central time zone
Reviewer: Matthias Rogel from Kaiserslautern, Germany
1 select accno
2 from a
3 start with amount_paid = (select max(amount_paid) from a)
4 and accno = (select min(accno) from a where amount_paid = (select max(amount_paid) from
a))
5* connect by prior null is not null
SQL> /
ACCNO
----------
4
would be a third solution
Followup March 16, 2005 - 8am Central time zone:
there are many solutions -- this one would win a Rube Goldberg award though :)
another query using analytics
March 29, 2005 - 11am Central time zone
Reviewer: A reader
I've got 2 tables, t1 and t2.
t1(1 column):
t1.x(int ,primary key)
1
2
3
and t2(3 columns,index on t2.y):
t2.x(int) t2.y(int) t2.z(int)
1 7000 1
1 7000 6
1 8000 8
2 7000 1
2 7000 5
3 7000 3
3 8000 1
3 8000 7
3 9000 5
I would like to have a report like this:
t1.x t2.y count min max
1 7000 2 1 8
1 8000 1 1 8
2 7000 2 1 5
3 7000 1 1 7
3 8000 2 1 7
3 9000 1 1 7
What I came up with is:
select distinct t1.x,t2.y,
count(*) over (partition by t1.x,t2.y) as count,
min(t2.z) over (partition by t1.x) as min,
max(t2.z) over (partition by t1.x) as max
from t1, t2
where t1.x = t2.x;
I was wondering if this query is good enough, or if there's a better way(in terms of performance)
to write this query. I'm new to analytics, and your help would be very much appreciated.
Followup March 29, 2005 - 12pm Central time zone:
we could probably do this in analytics without the distinct, something like
select t1.x, t2.y, t2.cnt,
min(t2.min_z) over (partition by t1.x) min_z,
max(t2.max_z) over (partition by t1.x) max_z
from t1, (select x, y, count(*) cnt, min(z) min_z, max(z) max_z
from t2 group by x, y ) t2
where t1.x = t2.x;
and maybe even push the per-x min/max() down into the inline view as well.
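The shape of that answer, modelled in Python with the sample T2 rows from the question: aggregate T2 once per (x, y), keeping the count plus the min/max of z, then take the min/max of those per x, which is what `partition by t1.x` does. The join to T1 is left out here because every T2.x exists in T1 in the sample.

```python
from collections import defaultdict

t2 = [(1, 7000, 1), (1, 7000, 6), (1, 8000, 8),
      (2, 7000, 1), (2, 7000, 5),
      (3, 7000, 3), (3, 8000, 1), (3, 8000, 7), (3, 9000, 5)]

def report(rows):
    groups = defaultdict(list)                    # GROUP BY x, y
    for x, y, z in rows:
        groups[(x, y)].append(z)
    per_x_min, per_x_max = {}, {}                 # MIN/MAX(...) OVER (PARTITION BY x)
    for (x, _), zs in groups.items():
        per_x_min[x] = min(per_x_min.get(x, min(zs)), min(zs))
        per_x_max[x] = max(per_x_max.get(x, max(zs)), max(zs))
    return [(x, y, len(zs), per_x_min[x], per_x_max[x])
            for (x, y), zs in sorted(groups.items())]


for row in report(t2):
    print(row)
```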
Analytics problem
April 8, 2005 - 12pm Central time zone
Reviewer: Mark from NY
Hi Tom,
I have a problem whose solution I'm pretty sure involves analytic functions. I've been struggling
with it for some time, but analytics are new to me. I want to go from this:
/* create and inserts */
create table test.test (ordernum varchar2(10),
tasktype char(3),
feetype varchar2(20),
amount number(10,2));
insert into test.test(ordernum, tasktype, feetype, amount)
values('123123', 'DOC', 'Product Fee', 15);
insert into test.test(ordernum, tasktype, feetype, amount)
values('123123', 'DOC', 'Copy Fee', 1);
insert into test.test(ordernum, tasktype, feetype, amount)
values('34864', 'COS', 'Setup Fee', 23);
insert into test.test(ordernum, tasktype, feetype, amount)
values('34864', 'COS', 'File Review Fee', 27);
insert into test.test(ordernum, tasktype, feetype, amount)
values('34864', 'COS', 'Statutory Fee', 23);
insert into test.test(ordernum, tasktype, feetype, amount)
values('56432', 'DOC', 'Product Fee', 80);
insert into test.test(ordernum, tasktype, feetype, amount)
values('56432', 'DOC', 'Prepayment', -16);
SQL> select tasktype, ordernum, feetype, amount from test.test;
TAS ORDERNUM FEETYPE AMOUNT
--- ---------- -------------------- ----------
DOC 123123 Product Fee 15
DOC 123123 Copy Fee 1
COS 34864 Setup Fee 23
COS 34864 File Review Fee 27
COS 34864 Statutory Fee 22
DOC 56432 Product Fee 80
DOC 56432 Prepayment -16
...to this:
TAS ORDERNUM FEE1        FEE2            FEE3          FEE4 FEE5
--- -------- ----------- --------------- ------------- ---- ----
DOC          Product Fee Copy Fee        Prepayment
DOC 123123   15          1
DOC 56432    80                          -16
COS          Setup Fee   File Review Fee Statutory Fee
COS 34864    23          27              22
Allow me to explain. For each tasktype I would like a heading row, which, going across, contains
all the feetypes found in test.test for that particular tasktype. There should never be more than
five feetypes.
For each ordernum under each tasktype, I would like to have the amounts going across, underneath
the appropriate feetypes.
I'm pretty sure my solution involves the lag and/or lead functions, partitioning over tasktype. I
particularly seem to have trouble wrapping my brain around the problem of how to get a distinct
ordernum while keeping intact the data in other columns (where ordernums duplicate).
I hope my explanation is clear enough.
Hope you can help. Thanks in advance. I will continue working on this.
Followup April 8, 2005 - 12pm Central time zone:
ops$tkyte@ORA9IR2> with columns
as
(select tasktype, feetype, row_number() over (partition by tasktype order by feetype) rn
   from (select distinct tasktype, feetype from test )
)
select a.tasktype, a.ordernum,
       to_char( max( decode( rn, 1, amount ) )) fee1,
       to_char( max( decode( rn, 2, amount ) )) fee2,
       to_char( max( decode( rn, 3, amount ) )) fee3,
       to_char( max( decode( rn, 4, amount ) )) fee4,
       to_char( max( decode( rn, 5, amount ) )) fee5
  from test a, columns b
 where a.tasktype = b.tasktype
   and a.feetype = b.feetype
 group by a.tasktype, a.ordernum
union all
select tasktype, null,
       ( max( decode( rn, 1, feetype ) )) fee1,
       ( max( decode( rn, 2, feetype ) )) fee2,
       ( max( decode( rn, 3, feetype ) )) fee3,
       ( max( decode( rn, 4, feetype ) )) fee4,
       ( max( decode( rn, 5, feetype ) )) fee5
  from columns
 group by tasktype
 order by 1 desc, 2 nulls first
/
TAS ORDERNUM   FEE1            FEE2            FEE3            FEE4 FEE5
--- ---------- --------------- --------------- --------------- ---- ----
DOC            Copy Fee        Prepayment      Product Fee
DOC 123123     1                               15
DOC 56432                      -16             80
COS            File Review Fee Setup Fee       Statutory Fee
COS 34864      27              23              23
of course. :)
(Suggestion: break it out and run each of the bits to see what they do. Basically, COLUMNS is a view
used to "pivot" on -- we needed to assign a column number to each FEETYPE by TASKTYPE. That is all
that view does.
Then we join that to TEST and "pivot" naturally.
Then we UNION ALL in the pivot of the column names (the heading rows)...
and sort.)
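The steps described above can be modelled in a few lines of Python. This is only a sketch of the idea, not Oracle code: the hypothetical assign_columns plays the role of the COLUMNS view (a row number per feetype within each tasktype, ordered by feetype), and the hypothetical pivot drops each amount under its column number, adds the heading rows, and sorts tasktype descending with the heading row first. It assumes one amount per (ordernum, feetype) pair, as in the sample data.

```python
from collections import defaultdict

# (tasktype, ordernum, feetype, amount) rows, as loaded by the inserts above.
TEST = [("DOC", "123123", "Product Fee", 15), ("DOC", "123123", "Copy Fee", 1),
        ("COS", "34864", "Setup Fee", 23), ("COS", "34864", "File Review Fee", 27),
        ("COS", "34864", "Statutory Fee", 23), ("DOC", "56432", "Product Fee", 80),
        ("DOC", "56432", "Prepayment", -16)]
MAX_FEES = 5  # hypothetical constant: the question promises at most five feetypes


def assign_columns(rows):
    """Like the COLUMNS view: ROW_NUMBER() OVER (PARTITION BY tasktype ORDER BY feetype)."""
    feetypes = defaultdict(set)
    for tasktype, _ordernum, feetype, _amount in rows:
        feetypes[tasktype].add(feetype)
    return {(t, f): rn
            for t, fs in feetypes.items()
            for rn, f in enumerate(sorted(fs), start=1)}


def pivot(rows):
    cols = assign_columns(rows)
    out = {}
    # Heading row per tasktype (ordernum None): feetype names in column order.
    for (t, f), rn in cols.items():
        out.setdefault((t, None), [None] * MAX_FEES)[rn - 1] = f
    # One row per (tasktype, ordernum): each amount goes under its feetype's column,
    # like MAX(DECODE(rn, k, amount)) ... GROUP BY tasktype, ordernum.
    for t, ordernum, f, amount in rows:
        out.setdefault((t, ordernum), [None] * MAX_FEES)[cols[(t, f)] - 1] = str(amount)
    # ORDER BY tasktype DESC, ordernum NULLS FIRST (two stable sorts).
    keys = sorted(out, key=lambda k: (k[1] is not None, k[1] or ""))
    keys.sort(key=lambda k: k[0], reverse=True)
    return [(t, o, *out[(t, o)]) for t, o in keys]


for row in pivot(TEST):
    print(row)
```

Python's sort is stable even with reverse=True, so sorting by ordernum first and by tasktype second gives the same order as the two-key ORDER BY.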
RE: Analytics problem
April 8, 2005 - 1pm Central time zone
Reviewer: Mark from NY
Excellent! I'll definitely break it down to figure out exactly what you did. Thank you very much.
Re: another query using analytics
April 8, 2005 - 3pm Central time zone
Reviewer: Gabe
You weren't given any resources, so I understand your solution was in fact merely an [untested]
suggestion.
create table t1 ( x int primary key );
insert into t1 values (1);
insert into t1 values (2);
insert into t1 values (3);
create table t2 ( x int not null references t1(x), y int not null, z int not null );
insert into t2 values ( 1,7000,1);
insert into t2 values ( 1,7000,6);
insert into t2 values ( 1,8000,8);
insert into t2 values ( 2,7000,1);
insert into t2 values ( 2,7000,5);
insert into t2 values ( 3,7000,3);
insert into t2 values ( 3,8000,1);
insert into t2 values ( 3,8000,7);
insert into t2 values ( 3,9000,5);
My solution (avoiding the distinct) is not necessarily better than the one presented by "A
reader", but here goes:
flip@FLOP> select x, y, c
      ,min(f) over (partition by x) f
      ,max(l) over (partition by x) l
  from (
        select t2.x, t2.y, count(*) c
              ,min(t2.z) keep (dense_rank first order by t2.z) f
              ,max(t2.z) keep (dense_rank last order by t2.z) l
          from t1, t2
         where t1.x = t2.x
         group by t2.x, t2.y
       ) t
;
X Y C F L
---------- ---------- ---------- ---------- ----------
1 7000 2 1 8
1 8000 1 1 8
2 7000 2 1 5
3 7000 1 1 7
3 8000 2 1 7
3 9000 1 1 7
Cheers.
Followup April 8, 2005 - 3pm Central time zone:
Without create tables and inserts, I guess :)
It takes too much time to create the setup for every case (I wish people would read the page they
have to scroll down through to post something here...)
I'm confused
April 18, 2005 - 9pm Central time zone
Reviewer: Mikito
Given that
select distinct deptno
from emp
is essentially
select deptno
from emp
group by deptno
how should a distinct query be rewritten when it has analytic columns? Neither
SELECT deptno, count(1),
min(sal) over (partition by deptno) f
from emp
group by deptno,min(sal) over (partition by deptno);
nor
SELECT deptno, count(1),
min(sal) over (partition by deptno) f
from emp
group by deptno,f;
seems to be valid syntax.
(To repeat: "Does analytics scale?")
Followup April 19, 2005 - 7am Central time zone:
why would you use analytics that way?
Tell us the question, we'll tell you the method.
select deptno, count(*) /* because count(1) is counter-intuitive */,
min(sal) over (partition by deptno) f
from emp
group by deptno, min(sal) over (partition by deptno)
would not make sense. You are saying "get all deptnos, by deptno find the minimum salary and
associate that number with each one, then aggregate by deptno/min salary to count records"
You should just ask:
find the minimum salary and count of records by deptno.
select deptno, count(*), min(sal) from emp group by deptno;
is what you were looking for. Analytics scale up wonderfully. Say the question was instead:
you have a table full of records that have a cust_id and a sale_date; I would like you to
retrieve the last record for each customer.
select *
from ( select cust.*, max(sale_date) over (partition by cust_id) lsd
from cust )
where sale_date = lsd;
versus
select *
from cust
where sale_date =
(select max(sale_date) from cust c2 where cust_id = cust.cust_id )
/
or
select *
from cust, (select cust_id, max(sale_date) lsd from cust group by cust_id)x
where cust.cust_id = x.cust_id
and cust.sale_date = x.lsd
/
for example
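The analytic version reads the table once: compute the per-customer maximum as an extra column, then filter on it. A minimal Python sketch of that two-pass idea, with hypothetical CUST rows (cust_id, sale_date, item) and a hypothetical function name; like the SQL, it keeps every row that ties for the latest date.

```python
from collections import defaultdict
from datetime import date

# Hypothetical CUST rows: (cust_id, sale_date, item).
CUST = [(1, date(2005, 1, 3), "a"), (1, date(2005, 2, 9), "b"),
        (2, date(2005, 1, 15), "c"), (2, date(2005, 1, 15), "d"),
        (3, date(2004, 12, 1), "e")]


def last_per_customer(rows):
    """Like MAX(sale_date) OVER (PARTITION BY cust_id) lsd ... WHERE sale_date = lsd."""
    lsd = defaultdict(lambda: date.min)
    for cust_id, sale_date, _item in rows:         # pass 1: the window aggregate
        lsd[cust_id] = max(lsd[cust_id], sale_date)
    return [r for r in rows if r[1] == lsd[r[0]]]  # pass 2: the outer WHERE


for row in last_per_customer(CUST):
    print(row)
```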
Tricky SQL?
April 19, 2005 - 10am Central time zone
Reviewer: A reader
CREATE TABLE master
(
m_no INTEGER PRIMARY KEY,
m_name VARCHAR2(255) NOT NULL UNIQUE
);
create table detail
(
d_pk integer primary key,
d_no integer not null references master(m_no),
d_date date,
d_data varchar2(255)
);
Given a d_pk, how can I get the second-to-last (ordered by d_date) record from M for that M_NAME?
In other words, for a given m_name, there are multiple records in "detail" with different dates.
Given one of those records, I want the prior record in "detail" (there might not be any)
I tried to design a simple master detail table, but maybe I over-normalized?
Thanks
Followup April 19, 2005 - 12pm Central time zone:
are you saying "i have a detail record, I want the detail record that came 'in front' of this one"?
that is what I sort of hear, but the second to last is confusing me.
select *
from (
select ...., lead(d_pk) over (order by d_date) next_pk
from master, detail
where master.m_no = (select d_no from detail where d_pk = :x)
and master.m_no = detail.d_no
)
where next_pk = :x;
I think that does that:
a) get the master/detail rows for that d_pk (the inline view),
b) use lead to assign to each record the "next pk" after sorting by d_date,
c) keep the record whose "next" record's primary key is the one you wanted.
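The LEAD approach, modelled in Python as a sketch (hypothetical DETAIL rows and function name, not Oracle code): sort one master's details by d_date, pair each row with the pk of the row after it, and keep the row whose next pk is the one you were given. If two details share a d_date, the order between them (and so the answer) is not deterministic, in SQL as well as here.

```python
from datetime import date

# Hypothetical DETAIL rows: (d_pk, d_no, d_date).
DETAIL = [(10, 1, date(2005, 3, 1)), (11, 1, date(2005, 1, 1)),
          (12, 1, date(2005, 2, 1)), (20, 2, date(2005, 1, 5))]


def previous_detail(rows, pk):
    """Like LEAD(d_pk) OVER (ORDER BY d_date) next_pk ... WHERE next_pk = :x."""
    d_no = next(r[1] for r in rows if r[0] == pk)  # the master of :x
    ordered = sorted((r for r in rows if r[1] == d_no), key=lambda r: r[2])
    # LEAD: pair each row with the pk of the row after it; the last row gets None.
    next_pks = [r[0] for r in ordered[1:]] + [None]
    return [row for row, next_pk in zip(ordered, next_pks) if next_pk == pk]


print(previous_detail(DETAIL, 10))  # the detail dated just before pk 10
print(previous_detail(DETAIL, 11))  # earliest detail: no predecessor
```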
a little inconsistency
April 19, 2005 - 1pm Central time zone
Reviewer: mikito
I meant inconsistency, not scalability. Why is "distinct"
SELECT distinct deptno,
min(sal) over (partition by deptno) f
from emp
allowed, whereas "group by" isn't? If someone has trouble understanding what analytics with
"group by" means, the same should apply to analytics with "distinct" as well.
Followup April 19, 2005 - 1pm Central time zone:
Because group by is not distinct; frankly, they are very different concepts.
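One way to see the difference: Oracle completes WHERE, GROUP BY and HAVING before it computes analytic functions, and DISTINCT is applied to the finished select list. So an analytic column can feed a DISTINCT, but it does not exist yet when GROUP BY runs. A small Python sketch of the two pipelines, with hypothetical EMP rows (deptno, sal) and hypothetical function names:

```python
from collections import defaultdict

# Hypothetical EMP rows: (deptno, sal).
EMP = [(10, 1300), (10, 2450), (10, 5000), (20, 800), (20, 3000), (30, 950)]


def distinct_with_analytic(rows):
    """SELECT DISTINCT deptno, MIN(sal) OVER (PARTITION BY deptno) FROM emp."""
    mins = defaultdict(lambda: float("inf"))
    for deptno, sal in rows:                 # analytic step: computed per partition...
        mins[deptno] = min(mins[deptno], sal)
    projected = [(deptno, mins[deptno]) for deptno, _sal in rows]  # ...one value per row
    return sorted(set(projected))            # DISTINCT runs on the finished rows


def group_by(rows):
    """SELECT deptno, COUNT(*), MIN(sal) FROM emp GROUP BY deptno."""
    groups = defaultdict(list)
    for deptno, sal in rows:                 # GROUP BY runs before any analytics
        groups[deptno].append(sal)
    return sorted((d, len(s), min(s)) for d, s in groups.items())


print(distinct_with_analytic(EMP))
print(group_by(EMP))
```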
detail and summary in one sql statement
April 27, 2005 - 3pm Central time zone
Reviewer: A reader
hi tom,
Quick question. I have to produce many detail rows (columns a - f) and one summary row (containing
sum(column c) + count(*) over all records + some literal placeholders) within one SQL statement. Is
there another way than using a classic UNION ALL select? Any new way with analytic functions?
Followup April 27, 2005 - 3pm Central time zone:
I need a small example; I did not follow your description as stated.
detail and summary in one sql statement
April 28, 2005 - 10am Central time zone
Reviewer: A reader
hi tom,
here is the small and simple test case to show what i mean.
SQL> create table t1 (col1 number primary key, col2 number, col3 number);
Table created.
SQL> create table t2 (col0 number primary key, col1 number references t1 (col1), col2 number, col3 number, col4 number);
Table created.
SQL> create index t2_col1 on t2 (col1);
Index created.
SQL> insert into t1 values (1, 1, 1);
1 row created.
SQL> insert into t2 values (1, 1, 1, 1, 1);
1 row created.
SQL> insert into t2 values (2, 1, 2, 2, 2);
1 row created.
SQL> insert into t2 values (3, 1, 3, 3, 3);
1 row created.
SQL> analyze table t1 compute statistics;
Table analyzed.
SQL> analyze table t2 compute statistics;
Table analyzed.
SQL> select 0 rowtype, t1.col1 display1, t1.col2 display2, t2.col3 display3, t2.col4 display4
  from t1 join t2 on (t1.col1 = t2.col1)
 where t1.col1 = 1
UNION ALL
select 1 rowtype, t1.col1, count (*), null, sum (t2.col4)
  from t1 join t2 on (t1.col1 = t2.col1)
 where t1.col1 = 1
 group by t1.col1
 order by rowtype;
   ROWTYPE   DISPLAY1   DISPLAY2   DISPLAY3   DISPLAY4
---------- ---------- ---------- ---------- ----------
         0          1          1          1          1
         0          1          1          2          2
         0          1          1          3          3
         1          1          3                     6
that is creating detail + summary record within one sql statement!
Followup April 28, 2005 - 10am Central time zone:
ops$tkyte@ORA10G> select grouping_id(t1.col2) rowtype,
       t1.col1 d1,
       decode( grouping_id(t1.col2), 0, t1.col2, count(*) ) d2,
       decode( grouping_id(t1.col2), 0, t2.col3, null ) d3,
       decode( grouping_id(t1.col2), 0, t2.col4, sum(t2.col4) ) d4
  from t1, t2
 where t1.col1 = t2.col1
 group by grouping sets((t1.col1),(t1.col1,t1.col2,t2.col3,t2.col4))
/
   ROWTYPE         D1         D2         D3         D4
---------- ---------- ---------- ---------- ----------
         0          1          1          1          1
         0          1          1          2          2
         0          1          1          3          3
         1          1          3                     6
detail and summary in one sql statement
April 29, 2005 - 10am Central time zone
Reviewer: A reader
hi tom,
thanks for your help. that's exactly what i need. analytics rock, analytics roll as you said. :)
unfortunately it is hard to get. :(
i looked in the documentation but cannot understand the grouping_id values in the example. please
could you explain? what is "2" or "3" in the grouping column?
Examples
The following example shows how to extract grouping IDs from a query of the sample table sh.sales:
SELECT channel_id, promo_id, sum(amount_sold) s_sales,
GROUPING(channel_id) gc,
GROUPING(promo_id) gp,
GROUPING_ID(channel_id, promo_id) gcp,
GROUPING_ID(promo_id, channel_id) gpc
FROM sales
WHERE promo_id > 496
GROUP BY CUBE(channel_id, promo_id);
C   PROMO_ID    S_SALES         GC         GP        GCP        GPC
- ---------- ---------- ---------- ---------- ---------- ----------
C        497   26094.35          0          0          0          0
C        498    22272.4          0          0          0          0
C        499    19616.8          0          0          0          0
C       9999   87781668          0          0          0          0
C            87849651.6          0          1          1          2
I        497    50325.8          0          0          0          0
I        498    52215.4          0          0          0          0
I        499   58445.85          0          0          0          0
I       9999  169497409          0          0          0          0
I             169658396          0          1          1          2
P        497   31141.75          0          0          0          0
P        498    46942.8          0          0          0          0
P        499      24156          0          0          0          0
P       9999   70890248          0          0          0          0
P            70992488.6          0          1          1          2
S        497  110629.75          0          0          0          0
S        498   82937.25          0          0          0          0
S        499   80999.15          0          0          0          0
S       9999  267205791          0          0          0          0
S             267480357          0          1          1          2
T        497     8319.6          0          0          0          0
T        498    5347.65          0          0          0          0
T        499      19781          0          0          0          0
T       9999   28095689          0          0          0          0
T            28129137.3          0          1          1          2
         497  226511.25          1          0          2          1
         498   209715.5          1          0          2          1
         499   202998.8          1          0          2          1
        9999  623470805          1          0          2          1
              624110031          1          1          3          3
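The "2" and "3" come from GROUPING_ID being a bit vector: GROUPING(col) is 1 on rows where col was rolled up (shown as null) and 0 otherwise, and GROUPING_ID(c1, ..., cn) reads those GROUPING bits as a binary number with c1 as the most significant bit. A tiny Python sketch of that arithmetic (the function name is hypothetical), checked against the GC/GP/GCP/GPC columns above:

```python
def grouping_id(*grouping_bits):
    """Hypothetical helper: read GROUPING(c1), ..., GROUPING(cn) as a binary number,
    with c1 as the most significant bit."""
    value = 0
    for bit in grouping_bits:
        value = value * 2 + bit
    return value


# (GC, GP) pairs from the CUBE example: detail row, channel subtotal,
# promo subtotal, grand total. Prints GC, GP, GCP, GPC.
for gc, gp in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(gc, gp, grouping_id(gc, gp), grouping_id(gp, gc))
```

So a channel subtotal row (promo_id rolled up: GC=0, GP=1) gets GCP = 0*2 + 1 = 1 but GPC = 1*2 + 0 = 2, and the grand total gets 3 either way. With one argument, as in grouping_id(t1.col2) in the earlier answer, the value is just GROUPING(t1.col2): 0 for detail rows and 1 for the summary row.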
How to do this using Analytics
May 5, 2005 - 5pm Central time zone
Reviewer: A reader
Hello Sir,
I have a denormalized table, dept_emp, part of which I have reproduced here. It
has (and will have) duplicates.
I need to find all emps that belong to more than one dept using analytics (I want to avoid a self
join).
So the required output must be:
DEPTNO DNAME EMPNO ENAME
------ ---------- ----- --------------------
10 D10 1 E1
10 D10 1 E1
10 D10 2 E2
10 D10 2 E2
20 D20 1 E1
20 D20 1 E1
20 D20 2 E2
20 D20 2 E2
From the total set of:
SELECT * FROM DEPT_EMP ORDER BY DEPTNO ,EMPNO
DEPTNO DNAME EMPNO ENAME
------ ---------- ----- --------------------
10 D10 1 E1
10 D10 1 E1
10 D10 2 E2
10 D10 2 E2
10 D10 3 E3
10 D10 3 E3
20 D20 1 E1
20 D20 1 E1
20 D20 2 E2
20 D20 2 E2
20 D20 4 E4
20 D20 4 E4
20 D20 5 E5
20 D20 5 E5
14 rows selected
create table dept_emp (deptno number , dname varchar2(10) ,empno number ,ename varchar2(20) ) ;
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES (
10, 'D10', 1, 'E1');
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES (
10, 'D10', 2, 'E2');
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES (
10, 'D10', 3, 'E3');
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES (
20, 'D20', 4, 'E4');
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES (
20, 'D20', 5, 'E5');
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES (
20, 'D20', 1, 'E1');
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES (
20, 'D20', 2, 'E2');
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES (
10, 'D10', 1, 'E1');
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES (
10, 'D10', 2, 'E2');
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES (
10, 'D10', 3, 'E3');
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES (
20, 'D20', 4, 'E4');
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES (
20, 'D20', 5, 'E5');
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES (
20, 'D20', 1, 'E1');
INSERT INTO DEPT_EMP ( DEPTNO, DNAME, EMPNO, ENAME ) VALUES (
20, 'D20', 2, 'E2');
COMMIT ;
Thanx
Followup May 5, 2005 - 6pm Central time zone:
no analytics needed:
select empno, count(distinct deptno)
  from dept_emp
 group by empno
having count(distinct deptno) > 1;
Thanx Sir
May 5, 2005 - 9pm Central time zone
Reviewer: A reader
Actually, I was planning to use analytics to get the whole row's info; I will do the same trick with
analytics, then.
You are a Genius.
Followup May 6, 2005 - 7am Central time zone:
select *
from (
select t.*, count(distinct deptno) over (partition by empno) cnt
from dept_emp t
)
where cnt > 1;
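The analytic COUNT(DISTINCT ...) OVER keeps every original row and just tags it with the number of departments for that empno; the outer WHERE then filters. A Python sketch of the same filter over the 14 rows inserted above (the function name is hypothetical):

```python
from collections import defaultdict

# The 14 DEPT_EMP rows inserted above: (deptno, dname, empno, ename).
DEPT_EMP = [(10, "D10", 1, "E1"), (10, "D10", 2, "E2"), (10, "D10", 3, "E3"),
            (20, "D20", 4, "E4"), (20, "D20", 5, "E5"), (20, "D20", 1, "E1"),
            (20, "D20", 2, "E2")] * 2


def emps_in_many_depts(rows):
    """Like COUNT(DISTINCT deptno) OVER (PARTITION BY empno) cnt ... WHERE cnt > 1."""
    depts = defaultdict(set)
    for deptno, _dname, empno, _ename in rows:
        depts[empno].add(deptno)
    # Every original row (duplicates included) survives if its empno spans >1 dept.
    return [r for r in rows if len(depts[r[2]]) > 1]


for row in sorted(emps_in_many_depts(DEPT_EMP)):
    print(row)
```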
Analytical solution
May 10, 2005 - 6am Central time zone
Reviewer: Baiju Menon from India
Sir,
I want to list the department with the maximum number of employees, along with that number, by
using an analytic function (only the department in which the most employees are working).
the query without the Analytical function is
select deptno, count(deptno) from emp group by deptno having count(deptno) in (select
max(count(deptno)) from emp group by deptno)
Thanks
Followup May 10, 2005 - 9am Central time zone:
select deptno, cnt
  from (
        select deptno, cnt, max(cnt) over() max_cnt
          from (
                select deptno, count(*) cnt
                  from emp
                 group by deptno
               )
       )
 where cnt = max_cnt
scott@ORA9IR2> /
DEPTNO CNT
---------- ----------
30 6
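The inner query counts per department, MAX(cnt) OVER () attaches the overall maximum to every row, and the outer WHERE keeps the matches (all of them, if departments tie). A Python sketch of the same steps, assuming the deptno values of the classic SCOTT.EMP demo data; the function name is hypothetical.

```python
from collections import Counter


def busiest_departments(deptnos):
    """Hypothetical helper: COUNT(*) per deptno, then keep cnt = MAX(cnt) OVER ()."""
    cnt = Counter(deptnos)             # inner query: GROUP BY deptno
    max_cnt = max(cnt.values())        # MAX(cnt) OVER (): one value for every row
    return sorted((d, c) for d, c in cnt.items() if c == max_cnt)


# deptno of each of the 14 rows in the classic SCOTT.EMP demo table.
EMP_DEPTNOS = [20, 30, 30, 20, 30, 30, 10, 20, 10, 30, 20, 30, 20, 10]
print(busiest_departments(EMP_DEPTNOS))
```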
group by
May 11, 2005 - 4am Central time zone
Reviewer: Anoop Gupta from INDIA
Hi Tom,
I have a table; the data in it looks like this:
empid levelname
1001 Level1
1001 Level2
1001 Level3
1001 Level4
1002 Level1
1002 Level2
1002 Level3
...
...
That is, this table tells which levels each employee is assigned to.
Is there any query possible that will return data like this, without writing a function?
empid emp assigned on level
1001 level1,level2,level3,level4
1002 level1,level2,level3
...
...
Waiting for your response.....
Followup May 11, 2005 - 7am Central time zone:
Only if there is some reasonable maximum number of levelname rows per empid.
Is there?
Analytics Rock - But why are they slower for me
May 13, 2005 - 1am Central time zone
Reviewer: Jeff Plumb from Melbourne, Australia
Hi Tom,
I have followed your example about analytics from Effective Oracle by Design on page 516 (Find a
specific row in a partition). When I run the example and tkprof the 3 different queries, the
analytic version actually takes a lot longer to run, but it does fewer logical I/Os. It does a lot
more physical I/Os, so I am guessing that it is using a temporary segment on disk to perform the
window sort. To perform the test I created the big_table that you use and populated it with
1,000,000 rows. I am using Oracle 9i Release 2. Here is the output from TKPROF:
********************************************************************************
select owner, object_name, created
from big_table t
where created = (select max(created)
from big_table t2
where t2.owner = t.owner)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 8 5.32 6.42 13815 14669 0 694
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 10 5.32 6.42 13815 14669 0 694
Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 33
Rows Row Source Operation
------- ---------------------------------------------------
694 HASH JOIN
20 VIEW
20 SORT GROUP BY
1000000 TABLE ACCESS FULL BIG_TABLE
1000000 TABLE ACCESS FULL BIG_TABLE
********************************************************************************
select t.owner, t.object_name, t.created
from big_table t
join (select owner, max(created) maxcreated
from big_table
group by owner) t2
on (t2.owner = t.owner and t2.maxcreated = t.created)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 8 5.03 5.06 13816 14669 0 694
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 10 5.03 5.06 13816 14669 0 694
Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 33
Rows Row Source Operation
------- ---------------------------------------------------
694 HASH JOIN
20 VIEW
20 SORT GROUP BY
1000000 TABLE ACCESS FULL BIG_TABLE
1000000 TABLE ACCESS FULL BIG_TABLE
********************************************************************************
select owner, object_name, created
from
( select owner, object_name, created, max(created) over (partition by owner) as maxcreated
from big_table
)
where created = maxcreated
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 8 16.68 40.66 15157 7331 17 694
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 10 16.68 40.66 15157 7331 17 694
Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 33
Rows Row Source Operation
------- ---------------------------------------------------
694 VIEW
1000000 WINDOW SORT
1000000 TABLE ACCESS FULL BIG_TABLE
********************************************************************************
And when I run the query with the analytics using autotrace I get the following which shows a sort
to disk:
SQL*Plus: Release 9.2.0.6.0 - Production on Fri May 13 14:53:08 2005
Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
Connected to:
Oracle9i Enterprise Edition Release 9.2.0.6.0 - 64bit Production
With the Partitioning option
JServer Release 9.2.0.6.0 - Production
control@DWDEV> set autot traceonly
control@DWDEV> select owner, object_name, created
  from
( select owner, object_name, created, max(created) over (partition by owner) as maxcreated
    from big_table
)
 where created = maxcreated;
694 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4399 Card=1000000 Bytes=52000000)
1 0 VIEW (Cost=4399 Card=1000000 Bytes=52000000)
2 1 WINDOW (SORT) (Cost=4399 Card=1000000 Bytes=43000000)
3 2 TABLE ACCESS (FULL) OF 'BIG_TABLE' (Cost=637 Card=1000000 Bytes=43000000)
Statistics
----------------------------------------------------------
0 recursive calls
17 db block gets
7331 consistent gets
15348 physical reads
432 redo size
12784 bytes sent via SQL*Net to client
717 bytes received via SQL*Net from client
8 SQL*Net roundtrips to/from client
0 sorts (memory)
1 sorts (disk)
694 rows processed
So how can I stop the sorts (disk)? I am guessing that the pga_aggregate_target needs to be higher,
but it seems to already be set quite high.
control@DWDEV> show parameter pga
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
pga_aggregate_target big integer 524288000
I hope you can help clarify how to make the analytic version run quicker.
Thanks.
Followup May 13, 2005 - 9am Central time zone:
It'll be a function of the number of "owners" here.
You have 1,000,000 records.
You have but 20 users.
In this extreme case, having 50,000 records per window and swapping out was not as good as
squashing the data down to 20 records and joining -- the CBO quite smartly rewrote:
select owner, object_name, created
from big_table t
where created = (select max(created)
from big_table t2
where t2.owner = t.owner)
as
select ...
from big_table t, (select owner,max(created) created from big_table t2 ...)
where ....
So, does the data you analyze to find the "most current record" tend to have 50,000 records/key in
real life?
In your case, your hash table didn't spill to disk. In real life, though, the numbers would
probably be much different: a 1,000,000 row table would have keys with maybe 10 or 100 rows, not
50,000 (in general). There you would find the answer to be very different.
And if you let the sort run in memory, it would be different as well -- you would get a max of 25MB
for the sort given your pga_aggregate_target setting, which may have been too small.
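The arithmetic behind those numbers, as a quick Python check. The 5% figure is an assumption about 9i automatic PGA memory management (a single serial work area capped at 5% of pga_aggregate_target); it is consistent with the 25 megabyte maximum mentioned in the reply.

```python
PGA_AGGREGATE_TARGET = 524_288_000  # bytes, from "show parameter pga" above
ROWS, OWNERS = 1_000_000, 20        # big_table size and number of distinct owners

rows_per_window = ROWS // OWNERS
# Assumption: one serial work area is capped at 5% of pga_aggregate_target.
serial_cap = PGA_AGGREGATE_TARGET * 5 // 100

print(rows_per_window, "rows per owner window")
print(serial_cap / 2**20, "MB (2**20 bytes) available to a single serial sort")
```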
But consider what happens when the size of the "aggregate" goes up and diminishing marginal returns
set in:
select owner, object_name, created
from big_table t
where created = (select max(created)
from big_table t2
where t2.owner = t.owner)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 320 2.06 2.01 26970 29283 0 4775
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 322 2.06 2.01 26970 29283 0 4775
********************************************************************************
select owner, object_name, created
from
( select owner, object_name, created,
max(created) over (partition by owner) as maxcreated
from big_table
)
where created = maxcreated
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 320 4.57 10.05 30603 14484 15 4775
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 322 4.57 10.05 30603 14484 15 4775
********************************************************************************
select owner, object_name, created
from big_table t
where created = (select max(created)
from big_table t2
where t2.id = t.id)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.01 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 66668 7.70 12.04 33787 45393 2 1000000
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 66670 7.71 12.05 33787 45393 2 1000000
********************************************************************************
select owner, object_name, created
from
( select owner, object_name, created,
max(created) over (partition by id) as maxcreated
from big_table
)
where created = maxcreated
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 66668 7.00 9.60 9336 14484 2 1000000
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 66670 7.00 9.60 9336 14484 2 1000000
and, given sufficient space to work "in memory", these two big queries both benefited:
select owner, object_name, created
from big_table t
where created = (select max(created)
from big_table t2
where t2.owner = t.owner)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.01 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 320 1.82 1.96 9909 29283 0 4775
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 322 1.83 1.97 9909 29283 0 4775
********************************************************************************
select owner, object_name, created
from
( select owner, object_name, created,
max(created) over (partition by owner) as maxcreated
from big_table
)
where created = maxcreated
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 320 2.15 2.11 2858 14484 0 4775
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 322 2.15 2.11 2858 14484 0 4775
********************************************************************************
select owner, object_name, created
from big_table t
where created = (select max(created)
from big_table t2
where t2.id = t.id)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 66668 7.64 7.55 10181 94633 0 1000000
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 66670 7.65 7.56 10181 94633 0 1000000
********************************************************************************
select owner, object_name, created
from
( select owner, object_name, created,
max(created) over (partition by id) as maxcreated
from big_table
)
where created = maxcreated
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 66668 5.69 5.49 2699 14484 0 1000000
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 66670 5.69 5.49 2699 14484 0 1000000
(This was a dual-CPU Xeon using non-parallel query, run once with a 256MB pga_aggregate_target
and again with a 2GB one.)

May 14, 2005 - 3am Central time zone
Reviewer: kuldeep from India
Dear Tom,
I have three tables, t1, t2 and t3, where t2 and t3 are each joined to t1 on column "key_id".
Now I need the sum of key_val (amount) from t2 and the sum of key_val (amount) from t3 for each
key_id in table t1.
kuldeep@dlfscg> select * from t1;
KEY_ID KEY_VAL
---------- ----------
2 1980
1 1975
kuldeep@dlfscg> select * from t2;
KEY_ID KEY_VAL
---------- ----------
2 550
2 575
1 500
kuldeep@dlfscg> select * from t3;
KEY_ID KEY_VAL
---------- ----------
2 900
1 1000
1 750
***** QUERY 1 *****
kuldeep@dlfscg> SELECT t1.key_id, SUM(t2.key_val) sum_t2_key_val, SUM(t3.key_val) sum_t3_key_val
  FROM t1, t2, t3
 WHERE t1.key_id=t2.key_id
   AND t1.key_id=t3.key_id
 GROUP BY t1.key_id
/
KEY_ID SUM_T2_KEY_VAL SUM_T3_KEY_VAL
---------- -------------- --------------
1 1000 1750
2 1125 1800
***** QUERY 2 *****
kuldeep@dlfscg> SELECT t1.key_id, t2.sum_t2_key_val, t3.sum_t3_key_val
  FROM t1,
       (SELECT key_id, SUM(key_val) sum_t2_key_val FROM t2 GROUP BY key_id) t2,
       (SELECT key_id, SUM(key_val) sum_t3_key_val FROM t3 GROUP BY key_id) t3
 WHERE t1.key_id=t2.key_id
   AND t1.key_id=t3.key_id
/
KEY_ID SUM_T2_KEY_VAL SUM_T3_KEY_VAL
---------- -------------- --------------
1 500 1750
2 1125 900
Query 1 gives the wrong result, and I cannot use query 2 because its performance is very poor.
Oracle 9i has added a lot of new grouping features and a lot of analytic functions (all going over
my head).
Is there any "special" sum function, or other way, that picks a value only once per row (or per
query key, here "key_id"),
no matter how many times it appears in the query result?
KEY_ID T2_KEY_VAL T3_KEY_VAL
---------- ---------- ----------
1 500 1000
1 500 750 <---- 500 of t2 should not be calculated, it is repeat
2 550 900
2 575 900 <---- 900 of t3 should not be calculated, it is repeat
thanks and regards,
Followup May 14, 2005 - 9am Central time zone:
select t1.key_id, t2.sum_val, t3.sum_val
from t1,
(select key_id, sum(key_val) sum_val from t2 group by key_id ) t2,
(select key_id, sum(key_val) sum_val from t3 group by key_id ) t3
WHERE t1.key_id=t2.key_id
AND t1.key_id=t3.key_id
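Why QUERY 1 double counts: joining t1 to both child tables produces one row per (t2 row, t3 row) pair for each key, so each t2 amount is repeated once per matching t3 row and vice versa. Aggregating each child before the join avoids that fan-out. A Python sketch using the sample data above (function names are hypothetical):

```python
from collections import defaultdict

T1 = [2, 1]                          # key_id values in t1
T2 = [(2, 550), (2, 575), (1, 500)]  # (key_id, key_val)
T3 = [(2, 900), (1, 1000), (1, 750)]


def sum_after_join(t1, t2, t3):
    """QUERY 1: join first, then SUM; every (t2, t3) pair for a key becomes a row."""
    out = defaultdict(lambda: [0, 0])
    for k in t1:
        for k2, v2 in t2:
            for k3, v3 in t3:
                if k2 == k and k3 == k:
                    out[k][0] += v2
                    out[k][1] += v3
    return {k: tuple(v) for k, v in out.items()}


def sum_before_join(t1, t2, t3):
    """Aggregate each child table per key first, then (inner) join one row per key."""
    s2, s3 = defaultdict(int), defaultdict(int)
    for k, v in t2:
        s2[k] += v
    for k, v in t3:
        s3[k] += v
    return {k: (s2[k], s3[k]) for k in t1 if k in s2 and k in s3}


print(sum_after_join(T1, T2, T3))   # inflated sums
print(sum_before_join(T1, T2, T3))  # correct sums
```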
apply an amount across multiple records
May 15, 2005 - 8pm Central time zone
Reviewer: Dave from Seattle
I have a problem similar to what I call the invoice payment problem.
It would seem to be a common problem, but I have searched to no avail.
The idea is that a customer may have many outstanding invoices, and sends in a check for an
arbitrary amount. So we need to apply the money across the invoices oldest first.
Note that in my specific case, if a payment exceeds the total outstanding, the excess is ignored
(obviously not dealing with real money here!)
create table invoices (
cust_nbr integer not null,
invoice_nbr integer not null,
invoice_amt number not null,
payment_amt number not null,
primary key (cust_nbr, invoice_nbr)
);
begin
delete from invoices;
dbms_random.seed(123456789);
for c in 1 .. 2 loop
for i in 1 .. 3 loop
insert into invoices values (c, i, round(dbms_random.value * 10, 2)+1, 0);
end loop;
end loop;
update invoices
set payment_amt = round(dbms_random.value * invoice_amt, 2)
where invoice_nbr = 1;
commit;
end;
/
select cust_nbr, invoice_nbr, invoice_amt, payment_amt,
invoice_amt - payment_amt outstanding_amt
from invoices
where invoice_amt - payment_amt > 0
order by cust_nbr, invoice_nbr;
CUST_NBR INVOICE_NBR INVOICE_AMT PAYMENT_AMT OUTSTANDING_AMT
---------- ----------- ----------- ----------- ---------------
1 1 9.44 5.55 3.89
1 2 3.21 0 3.21
1 3 2.78 0 2.78
2 1 7.57 4.3 3.27
2 2 9.46 0 9.46
2 3 5.92 0 5.92
variable cust_nbr number;
variable received_amt number;
begin
:cust_nbr := 1;
:received_amt := 7.25;
end;
/
update invoices i1
set payment_amt = (... some query which applies
:received_amt to outstanding_amt ...)
where cust_nbr = :cust_nbr;
result should be:
CUST_NBR INVOICE_NBR INVOICE_AMT PAYMENT_AMT OUTSTANDING_AMT
---------- ----------- ----------- ----------- ---------------
1 1 9.44 9.44 0
1 2 3.21 3.21 0
1 3 2.78 .15 2.63
2 1 7.57 4.3 3.27
2 2 9.46 0 9.46
2 3 5.92 0 5.92
This is simple to solve in pl/sql with a cursor, but I thought it would be a good test for a
set-based solution with analytics. But after some effort, I'm stumped.
Followup May 16, 2005 - 7am Central time zone:
Using analytics we can see how to apply the inputs:
ops$tkyte@ORA9IR2> select cust_nbr, invoice_nbr, invoice_amt, payment_amt,
least( greatest( :received_amt - rt + outstanding_amt, 0 ), outstanding_amt ) amount_to_apply
from (
select cust_nbr, invoice_nbr, invoice_amt, payment_amt,
invoice_amt - payment_amt outstanding_amt,
sum(invoice_amt - payment_amt) over (partition by cust_nbr order by invoice_nbr) rt
from invoices
where cust_nbr = :cust_nbr
)
order by cust_nbr, invoice_nbr;
CUST_NBR INVOICE_NBR INVOICE_AMT PAYMENT_AMT AMOUNT_TO_APPLY
---------- ----------- ----------- ----------- ---------------
1 1 9.44 5.55 3.89
1 2 3.21 0 3.21
1 3 2.78 0 .15
Just needed a running total of outstanding amounts to take away from the received amount....
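To make the arithmetic in least(greatest(:received_amt - rt + outstanding_amt, 0), outstanding_amt) concrete, here is a minimal Python model of the same calculation (not Oracle code; the function name amounts_to_apply is hypothetical). Because rt includes the current invoice, rt - outstanding_amt is what the earlier invoices have already absorbed, so received - rt + outstanding_amt is the money still available for this invoice, clipped to the range 0 through outstanding_amt.

```python
from decimal import Decimal


def amounts_to_apply(outstanding, received):
    """Spread `received` over the outstanding amounts, oldest invoice first.

    Mirrors least(greatest(received - rt + outstanding_amt, 0), outstanding_amt),
    where rt is the running total of outstanding amounts up to and including
    the current invoice. Money beyond the total outstanding is ignored.
    """
    applied = []
    rt = Decimal("0")
    for amt in outstanding:
        rt += amt                        # sum(...) over (order by invoice_nbr)
        remaining = received - rt + amt  # money left after the earlier invoices
        applied.append(min(max(remaining, Decimal("0")), amt))
    return applied


# Customer 1 from the example: 3.89, 3.21 and 2.78 outstanding, 7.25 received.
print(amounts_to_apply([Decimal("3.89"), Decimal("3.21"), Decimal("2.78")], Decimal("7.25")))
# -> [Decimal('3.89'), Decimal('3.21'), Decimal('0.15')]
```

Decimal is used so the cents behave like Oracle NUMBER arithmetic rather than binary floats.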
Then, merge:
ops$tkyte@ORA9IR2> merge into invoices
using
(
select cust_nbr, invoice_nbr, invoice_amt, payment_amt,
least( greatest( :received_amt - rt + outstanding_amt, 0 ), outstanding_amt ) amount_to_apply
from (
select cust_nbr, invoice_nbr, invoice_amt, payment_amt,
invoice_amt - payment_amt outstanding_amt,
sum(invoice_amt - payment_amt) over (partition by cust_nbr order by invoice_nbr) rt
from invoices
where cust_nbr = :cust_nbr
)
) x
on ( invoices.cust_nbr = x.cust_nbr and invoices.invoice_nbr = x.invoice_nbr )
when matched then update set payment_amt = nvl(payment_amt,0)+x.amount_to_apply
when not matched /* never happens... */ then insert (cust_nbr) values (null);
3 rows merged.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select cust_nbr, invoice_nbr, invoice_amt, payment_amt,
invoice_amt - payment_amt outstanding_amt
from invoices
order by cust_nbr, invoice_nbr;
CUST_NBR INVOICE_NBR INVOICE_AMT PAYMENT_AMT OUTSTANDING_AMT
---------- ----------- ----------- ----------- ---------------
1 1 9.44 9.44 0
1 2 3.21 3.21 0
1 3 2.78 .15 2.63
2 1 7.57 4.3 3.27
2 2 9.46 0 9.46
2 3 5.92 0 5.92
6 rows selected.
Group by
May 16, 2005 - 10am Central time zone
Reviewer: Anoop Gupta from INDIA
Hi Tom,
As I asked before,
I have a table whose data looks like this:
empid leavelname
1001 Level1
1001 Level2
1001 Level3
1001 Level4
1002 Level1
1002 Level2
1002 Level3
...
...
This table tells which levels each employee is assigned to.
Is there any query possible that will return data like this without writing a
function?
empid emp_assigned_on_level
1001 level1,level2,level3,level4
1002 level1,level2,level3
...
...
How would I write such a query if, say, an employee can be assigned to at most
50 levels?
Please reply....
Followup May 16, 2005 - 1pm Central time zone:
select empid,
rtrim(
max(decode(rn,1,leavelname)) || ',' ||
max(decode(rn,2,leavelname)) || ',' ||
....
max(decode(rn,50,leavelname)), ',' )
from (select empid,
row_number() over (partition by empid order by leavelname) rn,
leavelname
from t
)
group by empid;
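The query above numbers each employee's levels with row_number(), gives each number its own max(decode(rn, N, ...)) column, concatenates the columns and trims the trailing commas. A minimal Python model of that pivot follows (hypothetical function name; it assumes, as the SQL does, that at most 50 levels per employee matter, and it relies on Oracle treating NULL as an empty string in ||):

```python
from collections import defaultdict

MAX_LEVELS = 50  # the fixed number of max(decode(rn, N, ...)) terms


def levels_by_emp(rows):
    """rows: iterable of (empid, leavelname) pairs -> {empid: 'L1,L2,...'}."""
    groups = defaultdict(list)
    for empid, name in rows:
        groups[empid].append(name)

    result = {}
    for empid, names in groups.items():
        # row_number() over (partition by empid order by leavelname) -> slot rn
        slots = [None] * MAX_LEVELS
        for rn, name in enumerate(sorted(names), start=1):
            if rn <= MAX_LEVELS:  # levels past the 50th have no column, so they are lost
                slots[rn - 1] = name
        # max(decode(rn,1,..)) || ',' || ... : NULL concatenates as ''
        joined = ",".join(name if name is not None else "" for name in slots)
        result[empid] = joined.rstrip(",")  # rtrim(..., ',')
    return result


data = [(1001, "Level1"), (1001, "Level2"), (1001, "Level3"), (1001, "Level4"),
        (1002, "Level1"), (1002, "Level2"), (1002, "Level3")]
print(levels_by_emp(data))
```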
special sum
May 17, 2005 - 12am Central time zone
Reviewer: kuldeep from India
Dear Tom,
Thanks for your response and for this useful site.
I was looking for a solution that avoids these inline views, which were making my query run
slowly. I tried and came up with this query:
/* DATA VIEW */
kuldeep@dlfscg> SELECT t1.key_id,
t2.ROWID t2_rowid, row_number() over (PARTITION BY t2.ROWID ORDER BY t3.ROWID) t2_rn, t2.key_val,
t3.ROWID t3_rowid, row_number() over (PARTITION BY t3.ROWID ORDER BY t2.ROWID) t2_rn, t3.key_val
FROM t1, t2, t3
WHERE t1.key_id=t2.key_id
AND t1.key_id=t3.key_id
ORDER BY t1.key_id
/
KEY_ID T2_ROWID T2_RN KEY_VAL T3_ROWID T2_RN KEY_VAL
---------- ------------------ ---------- ---------- ------------------ ---------- ----------
1 AAANZ5AAHAAAD94AAA 1 500 AAANZ4AAHAAAD9wAAA 1 1000
1 AAANZ5AAHAAAD94AAA 2 500 AAANZ4AAHAAAD9wAAB 1 750
2 AAANZ5AAHAAAD91AAA 1 550 AAANZ4AAHAAAD9tAAA 1 900
2 AAANZ5AAHAAAD91AAB 1 575 AAANZ4AAHAAAD9tAAA 2 900
/* FINAL QUERY */
kuldeep@dlfscg> SELECT key_id,
SUM(DECODE(t2_rn,1,t2_key_val,0)) t2_key_val,
SUM(DECODE(t3_rn,1,t3_key_val,0)) t3_key_val
FROM (SELECT t1.key_id,
t2.ROWID t2_rowid, row_number() over (PARTITION BY t2.ROWID ORDER BY t3.ROWID) t2_rn, t2.key_val t2_key_val,
t3.ROWID t3_rowid, row_number() over (PARTITION BY t3.ROWID ORDER BY t2.ROWID) t3_rn, t3.key_val t3_key_val
FROM t1, t2, t3
WHERE t1.key_id=t2.key_id
AND t1.key_id=t3.key_id)
GROUP BY key_id
/
KEY_ID T2_KEY_VAL T3_KEY_VAL
---------- ---------- ----------
1 500 1750
2 1125 900
regards,
Followup May 17, 2005 - 8am Central time zone:
one would need more information -- it APPEARS that you are trying to get a "random first hit" from
T2 and T3 by T1.key_id
That is, for every row in T1 -- find the first match (any match will do) in T2 and in T3
report that value
Is that correct?
And how big are t1, t2 and t3, and how long is "long"?
group by
May 17, 2005 - 9am Central time zone
Reviewer: Anoop Gupta from INDIA
Tom,
Thanks for your prompt response.
Analytical Problem
May 18, 2005 - 4am Central time zone
Reviewer: Imran
Look at the following two queries.
SQL> SELECT phone, MONTH, arrears, this_month, ABS (up_down),
CASE
WHEN up_down < 0
THEN 'DOWN'
WHEN up_down > 0
THEN 'UP'
ELSE 'BALANCE'
END CASE,
prev_month
FROM (SELECT exch || ' - ' || phone phone,
TO_CHAR (TO_DATE (MONTH, 'YYMM'), 'Mon, YYYY') MONTH, region,
instdate, paybefdue this_month, arrears,
LEAD (paybefdue, 1, 0) OVER (ORDER BY MONTH DESC) prev_month,
paybefdue
- (LEAD (paybefdue, 1, 0) OVER (ORDER BY MONTH DESC)) up_down
FROM ptc
WHERE phone IN (7629458));
PHONE MONTH ARREARS THIS_MONTH ABS(UP_DOWN) CASE PREV_MONTH
--------------- --------------- ---------- ---------- ------------ ------- ----------
202 - 7629458 Apr, 2005 2562.52 5265 5265 UP 0
SQL> SELECT phone, MONTH, arrears, this_month, ABS (up_down),
CASE
WHEN up_down < 0
THEN 'DOWN'
WHEN up_down > 0
THEN 'UP'
ELSE 'BALANCE'
END CASE,
prev_month
FROM (SELECT exch || ' - ' || phone phone,
TO_CHAR (TO_DATE (MONTH, 'YYMM'), 'Mon, YYYY') MONTH, region,
instdate, paybefdue this_month, arrears,
LEAD (paybefdue, 1, 0) OVER (ORDER BY MONTH DESC) prev_month,
paybefdue
- (LEAD (paybefdue, 1, 0) OVER (ORDER BY MONTH DESC)) up_down
FROM ptc
WHERE phone IN (7629459));
PHONE MONTH ARREARS THIS_MONTH ABS(UP_DOWN) CASE PREV_MONTH
--------------- --------------- ---------- ---------- ------------ ------- ----------
202 - 7629459 Apr, 2005 3516.62 7834 7834 UP 0
SQL>
Now when I combine the two queries, the results are different.
SELECT phone, MONTH, arrears, this_month, ABS (up_down),
CASE
WHEN up_down < 0
THEN 'DOWN'
WHEN up_down > 0
THEN 'UP'
ELSE 'BALANCE'
END CASE,
prev_month
FROM (SELECT exch || ' - ' || phone phone,
TO_CHAR (TO_DATE (MONTH, 'YYMM'), 'Mon, YYYY') MONTH, region,
instdate, paybefdue this_month, arrears,
LEAD (paybefdue, 1, 0) OVER (ORDER BY MONTH DESC) prev_month,
paybefdue
- (LEAD (paybefdue, 1, 0) OVER (ORDER BY MONTH DESC)) up_down
FROM ptc
WHERE phone IN (7629458,7629459))
SQL> /
PHONE MONTH ARREARS THIS_MONTH ABS(UP_DOWN) CASE PREV_MONTH
--------------- --------------- ---------- ---------- ------------ ------- ----------
202 - 7629458 Apr, 2005 2562.52 5265 2569 DOWN 7834
202 - 7629459 Apr, 2005 3516.62 7834 7834 UP 0
As you can see, the previous month's balance is now badly wrong.
Please tell me how to do this.
Followup May 18, 2005 - 8am Central time zone:
need a test case: create table and insert statements (like the page used to submit this said....)
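Without the test case this is an assumption, but the combined output is consistent with the LEAD calls having no PARTITION BY phone: with OVER (ORDER BY MONTH DESC) alone, all rows from both phones form one window, so phone 7629458's "next" row is phone 7629459's April row (7834). The minimal Python model below (hypothetical function and sample rows built from the posted output) shows the difference; in SQL the corresponding change would be OVER (PARTITION BY phone ORDER BY MONTH DESC).

```python
def lead_paybefdue(rows, partition_by=None):
    """Model of LEAD(paybefdue, 1, 0) OVER ([PARTITION BY ...] ORDER BY month DESC).

    rows: list of dicts with 'phone', 'month' (YYMM text) and 'paybefdue'.
    Returns {(phone, month): value from the "next" row, or 0 if there is none}.
    """
    windows = {}
    for row in rows:
        key = row[partition_by] if partition_by else None  # None: one window for everything
        windows.setdefault(key, []).append(row)

    result = {}
    for window in windows.values():
        # Python's sort is stable, so rows with equal months keep their input
        # order; Oracle makes no such promise for ties.
        window.sort(key=lambda r: r["month"], reverse=True)
        for i, row in enumerate(window):
            nxt = window[i + 1]["paybefdue"] if i + 1 < len(window) else 0
            result[(row["phone"], row["month"])] = nxt
    return result


ptc = [{"phone": 7629458, "month": "0504", "paybefdue": 5265},
       {"phone": 7629459, "month": "0504", "paybefdue": 7834}]

print(lead_paybefdue(ptc))           # phone 7629458 picks up 7629459's 7834
print(lead_paybefdue(ptc, "phone"))  # each phone gets the default 0
```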
Use of analytic functions in UPDATE statements
May 18, 2005 - 12pm Central time zone
Reviewer: Bob Lyon from Houston
Tom,
-- Given this sample data
CREATE TABLE GT (
XP_ID INTEGER,
OFFSET INTEGER,
PMAX NUMBER,
PRIOR_PMAX NUMBER
);
INSERT INTO GT (XP_ID, OFFSET, PMAX) VALUES( 123, 1, 3);
INSERT INTO GT (XP_ID, OFFSET, PMAX) VALUES( 123, 2, 8);
INSERT INTO GT (XP_ID, OFFSET, PMAX) VALUES( 155, 3, 5);
INSERT INTO GT (XP_ID, OFFSET, PMAX) VALUES( 173, 3, 7.3);
-- I want to update the table and set the PRIOR_PMAX column values to be as follows
SELECT XP_ID, OFFSET, PMAX,
LAG(PMAX, 1, NULL) OVER (PARTITION BY XP_ID
ORDER BY XP_ID, OFFSET) PRIOR_PMAX
FROM GT
ORDER BY XP_ID, OFFSET;
XP_ID OFFSET PMAX PRIOR_PMAX
---------- ---------- ---------- ----------
123 1 3
123 2 8 3
155 3 5
173 3 7.3
-- My update to do this reports "4 rows updated.", but it does not do what I want.
UPDATE GT A
SET PRIOR_PMAX = (
SELECT LAG(B.PMAX, 1, NULL) OVER (PARTITION BY B.XP_ID
ORDER BY B.XP_ID, B.OFFSET) PRIOR_PMAX
FROM GT B
WHERE A.ROWID = B.ROWID
);
-- but I get
SELECT xp_id, offset, pmax, prior_pmax
FROM GT
ORDER BY xp_id, offset;
XP_ID OFFSET PMAX PRIOR_PMAX
---------- ---------- ---------- ----------
123 1 3
123 2 8
155 3 5
173 3 7.3
-- Oracle doc states
-- "Therefore, analytic functions can appear only in the select list or ORDER BY clause."
-- which is perhaps a little ambiguous in this case.
-- Is there a way to do this update in "Straight SQL"?
Followup May 18, 2005 - 12pm Central time zone:
you can merge
merge into gt a
using ( SELECT rowid rid, XP_ID, OFFSET, PMAX,
LAG(PMAX, 1, NULL) OVER (PARTITION BY XP_ID
ORDER BY XP_ID, OFFSET) PRIOR_PMAX
FROM GT )b
on (a.rowid = b.rowid)
when matched then update ...
when not matched (never happens, just do a dummy insert of a single null in 9i or leave off
entirely in 10g)
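The reason the correlated UPDATE leaves PRIOR_PMAX null is that analytic functions are evaluated after the WHERE clause: with WHERE A.ROWID = B.ROWID each subquery sees exactly one row, so LAG never has a prior row. The MERGE computes LAG over the whole table first and only then matches rows back by rowid. A small Python model of both orders of evaluation (the "r1".."r4" rowids are hypothetical stand-ins):

```python
def lag_pmax(rows):
    """LAG(pmax) OVER (PARTITION BY xp_id ORDER BY offset).

    rows: list of (rowid, xp_id, offset, pmax). Returns {rowid: prior_pmax}.
    """
    result = {}
    prev = {}  # xp_id -> pmax of the previous row in that partition
    for rowid, xp_id, offset, pmax in sorted(rows, key=lambda r: (r[1], r[2])):
        result[rowid] = prev.get(xp_id)  # None when there is no prior row
        prev[xp_id] = pmax
    return result


gt = [("r1", 123, 1, 3), ("r2", 123, 2, 8), ("r3", 155, 3, 5), ("r4", 173, 3, 7.3)]

# Correlated UPDATE: the subquery's WHERE keeps a single row, then LAG runs.
per_row = {r[0]: lag_pmax([r])[r[0]] for r in gt}

# MERGE: LAG runs over the whole table, then rows are matched back by rowid.
whole_table = lag_pmax(gt)

print(per_row)      # every value is None
print(whole_table)  # {'r1': None, 'r2': 3, 'r3': None, 'r4': None}
```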
special sum
May 19, 2005 - 1am Central time zone
Reviewer: Kuldeep from India
My requirement was like this: I have receivables (bills, debit notes etc.) which I adjust against
the received payments and credit notes (both are in separate tables). To know the outstanding amount, I was
joining (outer join) my receivables with payments and credit notes.
Because one receivable can be adjusted against many payments and credit notes, the outstanding
amount was calculated like this:
outstanding = receivable amount - sum(payment amount) - sum(credit note amount)
This simple query using an outer join was giving the wrong result if a receivable was adjusted against one
payment and more than one credit note, or vice versa.
For example, with
receivable: 1000, payment: 400, CN: 400, 200
the join will appear as
1000 400 400
1000 400 200
--- ---
800 600 outstanding = -400 (wrong)
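The double counting in this example can be reproduced with a small Python model (plain lists standing in for the receivable, payment and credit note tables; not Oracle code). Joining one parent to two independent child tables yields one row per (payment, credit note) combination, so the single payment is counted twice; aggregating each child first, as the inline GROUP BY views do, counts every row once.

```python
receivable = 1000
payments = [400]           # one payment against the receivable
credit_notes = [400, 200]  # two credit notes against the same receivable

# receivable x payments x credit notes: the 400 payment appears on both rows.
joined = [(receivable, p, c) for p in payments for c in credit_notes]
naive_outstanding = (receivable
                     - sum(p for _, p, _ in joined)
                     - sum(c for _, _, c in joined))

# Aggregate each child table first, then combine: no repeated rows.
correct_outstanding = receivable - sum(payments) - sum(credit_notes)

print(naive_outstanding, correct_outstanding)  # -400 0
```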
My t1, t2 and t3 have 600,000, 350,000 and 80,000 rows respectively.
This is my actual inline view query
-----------------------------------
SELECT a.bill_type, a.bill_exact_type, a.period_id,
a.scheme_id, a.property_number, a.bill_number,
a.bill_amount, SUM(NVL(c.adj_amt,0)+NVL(p.adjust_amount,0)) adj_amt,
NVL(a.bill_amount,0) - SUM(NVL(c.adj_amt,0)+NVL(p.adjust_amount,0)) pending_amt
FROM ALL_RECEIVABLE a,
(SELECT bill_type, scheme_id, property_number, bill_exact_type, period_id, bill_number,
SUM(adj_amt) adj_amt
FROM CREDIT_NOTE_RECEIVABLE
WHERE bill_type=p_bill_type
AND scheme_id=p_scheme
AND property_number=p_prop
GROUP BY bill_type, scheme_id, property_number, bill_exact_type, period_id, bill_number) c,
(SELECT bill_type, scheme_id, property_number, bill_exact_type, period_id, bill_number,
SUM(adjust_amount) adjust_amount
FROM PAYMENT_RECEIPT_ADJ
WHERE bill_type=p_bill_type
AND scheme_id=p_scheme
AND property_number=p_prop
GROUP BY bill_type, scheme_id, property_number, bill_exact_type, period_id, bill_number) p
WHERE a.bill_type=P_BILL_TYPE
AND a.scheme_id=P_SCHEME
AND a.property_number=P_PROP
AND a.bill_type=c.bill_type(+)
AND a.bill_exact_type=c.bill_exact_type(+)
AND a.period_id=c.period_id(+)
AND a.scheme_id=c.scheme_id(+)
AND a.property_number=c.property_number(+)
AND a.bill_number=c.bill_number(+)
AND a.bill_type=p.bill_type(+)
AND a.bill_exact_type=p.bill_exact_type(+)
AND a.period_id=p.period_id(+)
AND a.scheme_id=p.scheme_id(+)
AND a.property_number=p.property_number(+)
AND a.bill_number=p.bill_number(+)
GROUP BY a.bill_type, a.bill_exact_type, a.period_id, a.scheme_id,
a.property_number, a.bill_number, a.bill_date, a.bill_amount
HAVING (NVL(a.bill_amount,0) - SUM(NVL(c.adj_amt,0)+NVL(p.adjust_amount,0))) > 0
ORDER BY a.bill_date;
-----------------------------------
It is not reporting just the first hit of t1 in t2 and t3. In my last posting, I was only trying
to exclude repeats of t2's and t3's rows from the sum calculation; that is, each row of t2 and t3
should be counted only once.
I have tried this query with more rows and applied the same approach to the actual query; it works fine
and gives the same result as the previous inline view query.
kuldeep@dlfscg> SELECT t1.key_id,
t2.ROWID t2_rowid, row_number() over (PARTITION BY t2.ROWID ORDER BY t3.ROWID) t2_rn, t2.key_val,
t3.ROWID t3_rowid, row_number() over (PARTITION BY t3.ROWID ORDER BY t2.ROWID) t2_rn, t3.key_val
FROM t1, t2, t3
WHERE t1.key_id=t2.key_id(+)
AND t1.key_id=t3.key_id(+)
ORDER BY t1.key_id
/
KEY_ID T2_ROWID T2_RN KEY_VAL T3_ROWID T2_RN KEY_VAL
---------- ------------------ ---------- ---------- ------------------ ---------- ----------
1 AAANZ5AAHAAAD94AAA 1 500 AAANZ4AAHAAAD9wAAA 1 1000
1 AAANZ5AAHAAAD94AAA 2 500 AAANZ4AAHAAAD9wAAB 1 750
1 AAANZ5AAHAAAD94AAA 3 500 AAANZ4AAHAAAD9wAAC 1 25
2 AAANZ5AAHAAAD91AAA 1 550 AAANZ4AAHAAAD9tAAA 1 900
2 AAANZ5AAHAAAD91AAB 1 575 AAANZ4AAHAAAD9tAAA 2 900
3 AAANZ5AAHAAAD91AAC 1 222 1
3 AAANZ5AAHAAAD91AAD 1 223 2
4 1 AAANZ4AAHAAAD9tAAB 1 333
8 rows selected.
kuldeep@dlfscg> SELECT key_id,
SUM(DECODE(t2_rn,1,t2_key_val,0)) t2_key_val,
SUM(DECODE(t3_rn,1,t3_key_val,0)) t3_key_val
FROM (SELECT t1.key_id,
t2.ROWID t2_rowid, row_number() over (PARTITION BY t2.ROWID ORDER BY t3.ROWID) t2_rn, t2.key_val t2_key_val,
t3.ROWID t3_rowid, row_number() over (PARTITION BY t3.ROWID ORDER BY t2.ROWID) t3_rn, t3.key_val t3_key_val
FROM t1, t2, t3
WHERE t1.key_id=t2.key_id(+)
AND t1.key_id=t3.key_id(+))
GROUP BY key_id
/
KEY_ID T2_KEY_VAL T3_KEY_VAL
---------- ---------- ----------
1 500 1775
2 1125 900
3 445 0
4 333
kuldeep@dlfscg>
thanks for your responses.
regards,
Followup May 19, 2005 - 7am Central time zone:
do not order by rowid to get a last row -- is that what you are trying to do??
which row do you want to get from t2 to join with t1
and which row do you want to get from t3 to join with t1
You must specify that based on attributes you manage (eg: there must be an orderable field that
helps you determine WHICH record is the right one)
consider rowid to be a random number that does not have any meaning when ordered by, it does not
imply order of insertion or anything.
null record
May 25, 2005 - 4pm Central time zone
Reviewer: yeshk from St.louis,mo,usa
I need help with this query; this is just part of the query I am working with.
I am not able to generate a NULL RECORD between the groups in the result set.
I should be able to pass this information out as a reference cursor.
create table test(state varchar2(2),svc_cat varchar2(3),measd_tkt number,non_measd_tkt number);
insert into test values('CA','NDS',100,200);
insert into test values('IL','DSL',200,300);
insert into test values('CA','DSL',100,300);
insert into test values('MO','NDS',1000,300);
insert into test values('MO','DSL',100,200);
I need a result like this
STATE SVC_CAT MEASD_TKT NON MEASD TKT
CA DSL 200 300
CA NDS 100 200
TOTAL 300 500
IL DSL 200 300
TOTAL 200 300
MO DSL 100 200
MO NDS 1000 300
TOTAL 1100 500
I am able to generate the result using a query with analytics, but I don't know how to get an empty
row after each state total.
Also, which is better: using a cursor, i.e.
1) open a cursor based on state,
2) get the data and insert it into a temporary table,
3) insert a null record,
or using analytics to get the complete data and putting it into a reference cursor?
Thanks
yeshk
Followup May 25, 2005 - 7pm Central time zone:
well, that would sort of be the job of the "pretty printing routine" -- eg: the report generator?
what tool is printing this out?
null record
May 26, 2005 - 9am Central time zone
Reviewer: yeshk from St.louis,mo,usa
We need to give the result set, with a null record after each state's calculation, to a front-end VB
application. It will be given in a reference cursor. They will just select * from the reference cursor
and display it on a report.
Followup May 26, 2005 - 10am Central time zone:
the VB application should do this, (it should be able to do something shouldn't it...)
ops$tkyte@ORA9IR2> select decode( grp, 0, state ) state,
decode( grp, 0, svc_cat) svc_cat,
decode( grp, 0, sum_mt ) sum_mt,
decode( grp, 0, sum_nmt ) sum_nmt
from (
select grouping(dummy) grp, state, svc_cat, sum(measd_tkt) sum_mt, sum(non_measd_tkt) sum_nmt
from (
select state, svc_cat, 1 dummy, measd_tkt, non_measd_tkt
from test
)
group by rollup( state, dummy, svc_cat )
)
/
ST SVC SUM_MT SUM_NMT
-- --- ---------- ----------
CA DSL 100 300
CA NDS 100 200
CA 200 500

IL DSL 200 300
IL 200 300

MO DSL 100 200
MO NDS 1000 300
MO 1100 500


12 rows selected.
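Why 12 rows for 8 visible lines: ROLLUP(state, dummy, svc_cat) produces detail rows, (state, dummy) subtotal rows, (state) rows and one grand-total row. On the last two levels grouping(dummy) = 1, and the decode turns every column into NULL, which prints as a blank line. A minimal Python model of those levels (hypothetical function; it sorts explicitly, whereas the SQL as posted has no ORDER BY and relies on the order the rollup happens to return):

```python
from itertools import groupby

test_rows = [("CA", "NDS", 100, 200), ("IL", "DSL", 200, 300), ("CA", "DSL", 100, 300),
             ("MO", "NDS", 1000, 300), ("MO", "DSL", 100, 200)]


def report(rows):
    """Return (state, svc_cat, sum_mt, sum_nmt) lines; all-None marks a blank line."""
    out = []
    for state, grp in groupby(sorted(rows), key=lambda r: r[0]):
        grp = list(grp)
        # Detail level: group by state, dummy, svc_cat.
        for svc_cat, g in groupby(grp, key=lambda r: r[1]):
            g = list(g)
            out.append((state, svc_cat, sum(r[2] for r in g), sum(r[3] for r in g)))
        # (state, dummy) level: svc_cat rolled up, grouping(dummy) = 0 -> subtotal line.
        out.append((state, None, sum(r[2] for r in grp), sum(r[3] for r in grp)))
        # (state) level: grouping(dummy) = 1 -> decode(...) blanks every column.
        out.append((None, None, None, None))
    # Grand-total level: grouping(dummy) = 1 as well -> one more blank line.
    out.append((None, None, None, None))
    return out


for line in report(test_rows):
    print(line)
```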
Can rollup do the thing??
May 26, 2005 - 9am Central time zone
Reviewer: Bhavesh Ghodasara from Ahmedabad,Gujarat,India
Hi yeshk,
create table test(state varchar2(2),svc_cat varchar2(3),measd_tkt
number,non_measd_tkt number);
insert into test values('CA','NDS',100,200);........
insert into test values('CA','DSL',100,300);....
STATE SVC_CAT MEASD_TKT NON MEASD TKT
CA DSL 200 300 <== Where does measd_tkt=200 come from?
CA NDS 100 200
TOTAL 300 500
Tom, can we do it like this:
break on state
select STATE,SVC_CAT,sum(measd_tkt),sum(non_measd_tkt)
from test
group by rollup(STATE,SVC_CAT)
order by state
............
If I have made any mistake, please tell me.
Thanks in advance.
Followup May 26, 2005 - 10am Central time zone:
see above
Which analytics to use?
May 30, 2005 - 9pm Central time zone
Reviewer: Marc-Andre Larochelle from Montreal, PQ
Hi Tom,
I have this 3rd party table:
drop table t;
create table t (atype varchar2(4),
acol# varchar2(3),
adin varchar2(8),
ares varchar2(8));
insert into t (atype, acol#, adin) values ('DUPT','001','02246569');
insert into t (atype, acol#, adin) values ('DUPT','002','00021474');
insert into t (atype, acol#, adin) values ('DUPT','003','02246569');
insert into t (atype, acol#, ares) values ('MACT','1','02246569');
insert into t (atype, acol#, ares) values ('MACT','6','02246569');
insert into t (atype, acol#, ares) values ('MACT','7','00021474');
select * from t;
ATYPE ACOL# ADIN ARES
----- ----- -------- --------
DUPT 001 02246569
DUPT 002 00021474
DUPT 003 02246569
MACT 1 02246569
MACT 6 02246569
MACT 7 00021474
I would like to get the following result :
DUPT 001 02246569 MACT 1 02246569
DUPT 002 00021474 MACT 7 00021474
DUPT 003 02246569 MACT 6 02246569
I need to match DUPT.adin=MACT.ares, while making sure each DUPT.acol# gets a different
MACT.acol#. Basically, this table stores different values in its columns depending on the type of row
(atype).
I have tried using lag, lead, rank and nothing seems to work but I am pretty sure it is doable with
analytics which is why I posted my question here.
Any hint/help would be appreciated.
Thank you,
Marc-Andre
Followup May 31, 2005 - 7am Central time zone:
question for you.
How did you know to put:
DUPT 001 02246569 together with MACT 1 02246569 and
DUPT 003 02246569 together with MACT 6 02246569
and not
DUPT 001 02246569 MACT 6 02246569
DUPT 003 02246569 MACT 1 02246569
for example. some missing logic here.
Am I Correct??
May 31, 2005 - 5am Central time zone
Reviewer: Bhavesh Ghodasara from Ahmedabad,Gujarat,India
Hi tom,
I solved the above problem...
The query is:
select atyp,acol,aadin,batype,bacol,bares
from (
select a.atype atyp,a.acol# acol,a.adin aadin,b.atype batype,b.acol# bacol,b.ares bares,
nvl(lead(b.acol# ) over(order by a.adin),0) lb,
count(*) over(partition by a.acol#) cnt
from t a,t b
where a.adin=b.ares
order by atyp,acol) t
where bacol<>lb
I think there must be a better way...
I know you will do it in a much, much better way.
Please suggest corrections.
Thanks in Advance..
Followup May 31, 2005 - 8am Central time zone:
ATYP ACO AADIN BATY BAC BARES
---- --- -------- ---- --- --------
DUPT 001 02246569 MACT 6 02246569
DUPT 002 00021474 MACT 7 00021474
DUPT 003 02246569 MACT 1 02246569
well, it gives a different result than the one you posted, it gives my hypothetical answer -- where
001 was combined with 6, not 1.
We can do this..
May 31, 2005 - 8am Central time zone
Reviewer: Bhavesh Ghodasara from Ahmedabad,Gujarat,India
Hi tom,
I have further modified my query;
now it gives the desired result....
(I agree that the question is ambiguous.)
select atyp,acol,aadin,batype,bacol,bares
from (
select a.atype atyp,a.acol# acol,a.adin aadin,b.atype batype,b.acol#
bacol,b.ares bares,
nvl(lead(b.acol# ) over(order by a.adin),0) lb,
min(b.acol#) over(partition by a.acol#) cnt
from t a,t b
where a.adin=b.ares
order by atyp,acol) t
where bacol=lb
or cnt>1
OUTPUT:
ATYP ACO AADIN BATY BAC BARES
---- --- -------- ---- --- --------
DUPT 001 02246569 MACT 1 02246569
DUPT 002 00021474 MACT 7 00021474
DUPT 003 02246569 MACT 6 02246569
So any corrections now??
Thanks in advance
Bhavesh
Followup May 31, 2005 - 8am Central time zone:
I don't know your data well enough, but your query is non-deterministic if you care. Consider:
ops$tkyte@ORA10G> create table t (atype varchar2(4),
acol# varchar2(3),
adin varchar2(8),
ares varchar2(8));
Table created.
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> insert into t (atype, acol#, adin) values ('DUPT','001','02246569');
1 row created.
ops$tkyte@ORA10G> insert into t (atype, acol#, adin) values ('DUPT','002','00021474');
1 row created.
ops$tkyte@ORA10G> insert into t (atype, acol#, adin) values ('DUPT','003','02246569');
1 row created.
ops$tkyte@ORA10G> insert into t (atype, acol#, ares) values ('MACT','1','02246569');
1 row created.
ops$tkyte@ORA10G> insert into t (atype, acol#, ares) values ('MACT','5','02246569');
1 row created.
ops$tkyte@ORA10G> insert into t (atype, acol#, ares) values ('MACT','6','02246569');
1 row created.
ops$tkyte@ORA10G> insert into t (atype, acol#, ares) values ('MACT','7','00021474');
1 row created.
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> select atyp,acol,aadin,batype,bacol,bares
from (
select a.atype atyp,a.acol# acol,a.adin aadin,b.atype batype,b.acol# bacol,b.ares bares,
nvl(lead(b.acol# ) over(order by a.adin),0) lb,
min(b.acol#) over(partition by a.acol#) cnt
from t a,t b
where a.adin=b.ares
order by atyp,acol) t
where bacol=lb
or cnt>1;
ATYP ACO AADIN BATY BAC BARES
---- --- -------- ---- --- --------
DUPT 002 00021474 MACT 7 00021474
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> truncate table t;
Table truncated.
ops$tkyte@ORA10G> insert into t (atype, acol#, adin) values ('DUPT','001','02246569');
1 row created.
ops$tkyte@ORA10G> insert into t (atype, acol#, adin) values ('DUPT','002','00021474');
1 row created.
ops$tkyte@ORA10G> insert into t (atype, acol#, adin) values ('DUPT','003','02246569');
1 row created.
ops$tkyte@ORA10G> insert into t (atype, acol#, ares) values ('MACT','1','02246569');
1 row created.
ops$tkyte@ORA10G> insert into t (atype, acol#, ares) values ('MACT','6','02246569');
1 row created.
ops$tkyte@ORA10G> insert into t (atype, acol#, ares) values ('MACT','7','00021474');
1 row created.
ops$tkyte@ORA10G> insert into t (atype, acol#, ares) values ('MACT','5','02246569');
1 row created.
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> select atyp,acol,aadin,batype,bacol,bares
from (
select a.atype atyp,a.acol# acol,a.adin aadin,b.atype batype,b.acol# bacol,b.ares bares,
nvl(lead(b.acol# ) over(order by a.adin),0) lb,
min(b.acol#) over(partition by a.acol#) cnt
from t a,t b
where a.adin=b.ares
order by atyp,acol) t
where bacol=lb
or cnt>1;
ATYP ACO AADIN BATY BAC BARES
---- --- -------- ---- --- --------
DUPT 001 02246569 MACT 6 02246569
DUPT 002 00021474 MACT 7 00021474
Same data both times, just different order of insertions. With analytics and order by, you need to
be concerned about duplicates.
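The same effect can be shown outside the database. Below, rows are numbered by a sort key the way row_number() over (order by ...) would number them. With a non-unique key, the numbering depends on the input order (Python's stable sort keeps input order for ties; Oracle promises nothing at all); adding a unique tie-breaker makes it repeatable. The function and key names are hypothetical.

```python
def number_rows(rows, key):
    """Mimic row_number() over (order by key): {acol: rn}."""
    return {r["acol"]: rn for rn, r in enumerate(sorted(rows, key=key), start=1)}


def by_ares(r):
    return r["ares"]              # non-unique: every row ties


def by_ares_then_acol(r):
    return (r["ares"], r["acol"])  # unique tie-breaker added


mact_a = [{"acol": "1", "ares": "02246569"},
          {"acol": "5", "ares": "02246569"},
          {"acol": "6", "ares": "02246569"}]
mact_b = [mact_a[0], mact_a[2], mact_a[1]]  # same rows, different insertion order

print(number_rows(mact_a, by_ares) == number_rows(mact_b, by_ares))                      # False
print(number_rows(mact_a, by_ares_then_acol) == number_rows(mact_b, by_ares_then_acol))  # True
```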
Answers
May 31, 2005 - 11am Central time zone
Reviewer: Marc-Andre Larochelle from Montreal, PQ
Tom, Bhavesh,
The problem resides exactly there: no logic to match the records. I know that DUPT.din1 must have a
MACT.din1 somewhere. I just don't know which one (1st one, 2nd one?). This is a decision I will
have to take.
DUPT 001 02246569 MACT 1 02246569
DUPT 003 02246569 MACT 6 02246569
and
DUPT 001 02246569 MACT 6 02246569
DUPT 003 02246569 MACT 1 02246569
are the same to me. But when I run the query, I want to always get the same results.
Anyway, all in all, your queries (Bhavesh - thank you - and yours) seem to answer my question.
I will watch out for duplicates.
Thank you very much for the quick help.
Marc-Andre
What I found
May 31, 2005 - 5pm Central time zone
Reviewer: Marc-Andre Larochelle from Montreal, PQ
Hi Tom,
Testing the SQL statement Bhavesh provided, I quickly discovered what you meant when you said the
query was non-deterministic. When I added a 4th record of each type:
insert into t (atype,acol#,adin) values ('DUPT','004','02246569');
insert into t (atype,acol#,ares) values ('MACT','5','02246569');
only one row was returned. I played with the query, and here is what I came up with:
select atyp,acol,aadin,batype,bacol,bares
from (
select atyp,acol,aadin,batype,bacol,bares,drnk ,
rank() over (partition by acol order by bacol) rnk
from (
select a.atype atyp,
a.acol# acol,
a.adin aadin,
b.atype batype,
b.acol# bacol,
b.ares bares,
dense_rank() over (partition by a.atype,a.adin order by a.acol#) drnk
from t a,t b
where a.adin=b.ares))
where drnk=rnk;
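The idea behind drnk = rnk is to pair the k-th DUPT row with the k-th MACT row for the same din value. A minimal Python sketch of that pairing, written here with row_number-style numbering on both sides and a join on (din, number) rather than the dense_rank/rank pair (hypothetical function; acol# is compared as text, as a VARCHAR2 would be, so '10' sorts before '5'):

```python
from collections import defaultdict


def pair_rows(dupt, mact):
    """Pair the k-th DUPT row with the k-th MACT row for each shared din value.

    dupt: list of (acol, adin); mact: list of (acol, ares).
    Each side is numbered by acol within its din value, like
    row_number() over (partition by din order by acol#); unmatched rows drop out.
    """
    def numbered(rows):
        groups = defaultdict(list)
        for acol, din in rows:
            groups[din].append(acol)
        return {(din, rn): acol
                for din, acols in groups.items()
                for rn, acol in enumerate(sorted(acols), start=1)}

    left, right = numbered(dupt), numbered(mact)
    return sorted((left[k], k[0], right[k]) for k in left.keys() & right.keys())


dupt = [("001", "02246569"), ("002", "00021474"), ("003", "02246569")]
mact = [("1", "02246569"), ("6", "02246569"), ("7", "00021474")]
for pair in pair_rows(dupt, mact):
    print(pair)
```

Because acol# is unique within each side, the result does not depend on insertion order.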
Feel free to comment.
Again thank you (and Bhavesh).
Marc-Andre
Using Analytical Values to find latest info
June 3, 2005 - 10am Central time zone
Reviewer: anirudh from newyork, NY
Hi Tom,
We have a fairly large table with about 100 million rows; among others, this table has
the following columns:
CREATE TABLE my_fact_table (
staff_number VARCHAR2 (10), -- staff number
per_end_dt DATE, -- last day of month
engagement_code VARCHAR2 (30), -- engagement code
client_code VARCHAR2 (20), -- client code
revenue NUMBER (15,2) -- revenue
);
In this table the same engagement code can have different client codes for different periods. This
was at one point desirable and that is the reason client code was stored in this fact table instead
of the engagement dimension.
Our users now want us to update the client code in these transactions to the latest value of the
client code (meaning - pick the client from the latest month for which we have got any transactions
for that engagement)
This situation where same engagement has multiple clients across periods is there for about 5 % of
the rows.
[btw - we do plan to do data-model change to reflect the new relationships - but that may take some
time - hence the interim need to just update the fact table]
To implement these updates, which may happen for several months, I'm trying to take the approach below,
which involves multiple queries and the creation of a couple of temp tables. Does it seem reasonable? I
have a lurking feeling that with a deeper understanding of analytic functions this can be further
simplified; I will appreciate your thoughts.
============= My Approach =================
-- Find the Engagements that have multiple Clients
CREATE TABLE amtest_mult_cli AS
WITH
v1 AS (SELECT DISTINCT engagement_code,client_code
FROM my_fact_table)
SELECT engagement_code
FROM v1
GROUP BY engagement_code
HAVING COUNT(*) > 1;
-- Find What should be the correct client for those engagements
CREATE TABLE amtest_use_cli AS
SELECT engagement_code,per_end_dt,client_code
FROM
(
SELECT a.engagement_code,a.per_end_dt,a.client_code,
row_number() OVER (PARTITION BY a.engagement_code
ORDER BY a.per_end_dt DESC, a.client_code DESC)
row_num
FROM my_fact_table a,
amtest_mult_cli b
WHERE a.engagement_code = b.engagement_code
)
WHERE row_num = 1;
-- Update Correct Clients for those engagements
UPDATE my_fact_table a
SET a.client_code =
(SELECT b.client_code
FROM amtest_use_cli b
WHERE a.engagement_code = b.engagement_code)
WHERE EXISTS
(SELECT 1
FROM amtest_use_cli c
WHERE a.engagement_code = c.engagement_code);
======================================================
Followup June 3, 2005 - 12pm Central time zone:
why not:
merge into my_fact_table F
using
( select engagement_code,
substr(max(to_char(per_end_dt,'yyyymmddhh24miss')||client_code ),15) cc
from my_fact_table
group by engagement_code
having count(distinct client_code) > 1 ) X
on ( f.engagement_code = x.engagement_code )
when matched
then update set client_code = x.cc
when not matched -- never can happen; in 10g this clause is not needed!
then insert ( client_code ) values ( null );
That select finds the client_code for the max per_end_dt by engagement_code for engagement_code's
that have more than one distinct client_code....
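The substr(max(date_string || client_code), 15) trick works because to_char(..., 'yyyymmddhh24miss') is always 14 characters and sorts like the date, so the maximum concatenated string belongs to the latest period, and the client_code starts at character 15. On equal dates the larger client_code wins, much like ORDER BY per_end_dt DESC, client_code DESC. A minimal Python model (hypothetical function and sample rows):

```python
from datetime import datetime


def latest_client(rows):
    """rows: (engagement_code, per_end_dt, client_code) tuples.

    For engagements with more than one distinct client_code, return the
    client_code of the latest per_end_dt, mimicking
    substr(max(to_char(per_end_dt, 'yyyymmddhh24miss') || client_code), 15).
    """
    best = {}
    clients = {}
    for eng, dt, cc in rows:
        key = dt.strftime("%Y%m%d%H%M%S") + cc  # 14-character sortable date prefix
        best[eng] = max(best.get(eng, key), key)
        clients.setdefault(eng, set()).add(cc)
    # having count(distinct client_code) > 1; k[14:] is substr(..., 15)
    return {eng: k[14:] for eng, k in best.items() if len(clients[eng]) > 1}


facts = [("E1", datetime(2005, 1, 31), "C1"),
         ("E1", datetime(2005, 3, 31), "C2"),
         ("E1", datetime(2005, 2, 28), "C3"),
         ("E2", datetime(2005, 1, 31), "C9"),
         ("E2", datetime(2005, 2, 28), "C9")]
print(latest_client(facts))  # {'E1': 'C2'}; E2 never changed client
```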
The analytic building blocks for the same thing would be:
first_value(client_code)
over (partition by engagement_code
order by per_end_dt desc, client_code desc ),
count(distinct client_code)
help with lead
June 9, 2005 - 1am Central time zone
Reviewer: Adolph from india
I have a table in the following structure:
create table cs_fpc_pr
(PRGM_C VARCHAR2(10) not null,
fpc_date date not null,
TIME_code VARCHAR2(3) not null,
SUN_TYPE varchar2(1));
insert into cs_fpc_pr values ('PRGM000222', to_date('08-may-2005','dd-mon-rrrr'), '33','1');
insert into cs_fpc_pr values ('PRGM000222', to_date('09-may-2005','dd-mon-rrrr'), '05','3');
insert into cs_fpc_pr values ('PRGM000222', to_date('09-may-2005','dd-mon-rrrr'), '25','1');
insert into cs_fpc_pr values ('PRGM000222', to_date('09-may-2005','dd-mon-rrrr'), '45','3');
insert into cs_fpc_pr values ('PRGM000222', to_date('10-may-2005','dd-mon-rrrr'), '05','3');
insert into cs_fpc_pr values ('PRGM000222', to_date('10-may-2005','dd-mon-rrrr'), '25','1');
insert into cs_fpc_pr values ('PRGM000222', to_date('10-may-2005','dd-mon-rrrr'), '45','3');
insert into cs_fpc_pr values ('PRGM000222', to_date('14-may-2005','dd-mon-rrrr'), '05','3');
insert into cs_fpc_pr values ('PRGM000222', to_date('14-may-2005','dd-mon-rrrr'), '24','1');
insert into cs_fpc_pr values ('PRGM000242', to_date('08-may-2005','dd-mon-rrrr'), '07','3');
insert into cs_fpc_pr values ('PRGM000242', to_date('08-may-2005','dd-mon-rrrr'), '23','1');
insert into cs_fpc_pr values ('PRGM000242', to_date('08-may-2005','dd-mon-rrrr'), '47','3');
insert into cs_fpc_pr values ('PRGM000242', to_date('08-may-2005','dd-mon-rrrr'), '48','3');
insert into cs_fpc_pr values ('PRGM000242', to_date('09-may-2005','dd-mon-rrrr'), '07','3');
insert into cs_fpc_pr values ('PRGM000242', to_date('09-may-2005','dd-mon-rrrr'), '33','1');
insert into cs_fpc_pr values ('PRGM000242', to_date('09-may-2005','dd-mon-rrrr'), '46','3');
insert into cs_fpc_pr values ('PRGM000242', to_date('10-may-2005','dd-mon-rrrr'), '07','3');
insert into cs_fpc_pr values ('PRGM000242', to_date('10-may-2005','dd-mon-rrrr'), '33','1');
insert into cs_fpc_pr values ('PRGM000242', to_date('10-may-2005','dd-mon-rrrr'), '46','3');
insert into cs_fpc_pr values ('PRGM000242', to_date('11-may-2005','dd-mon-rrrr'), '07','3');
insert into cs_fpc_pr values ('PRGM000242', to_date('11-may-2005','dd-mon-rrrr'), '33','1');
insert into cs_fpc_pr values ('PRGM000242', to_date('11-may-2005','dd-mon-rrrr'), '46','3');
insert into cs_fpc_pr values ('PRGM000242', to_date('14-may-2005','dd-mon-rrrr'), '07','3');
insert into cs_fpc_pr values ('PRGM000242', to_date('14-may-2005','dd-mon-rrrr'), '23','1');
commit;
select prgm_c,fpc_date,time_code,sun_type,
lead(fpc_date) over(partition by prgm_C order by fpc_date) next_date
from cs_fpc_pr
order by prgm_c,fpc_date,time_code;
PRGM_C FPC_DATE TIM S NEXT_DATE
---------- --------- --- - ---------
PRGM000222 08-MAY-05 33 1 09-MAY-05
PRGM000222 09-MAY-05 05 3 09-MAY-05
PRGM000222 09-MAY-05 25 1 09-MAY-05
PRGM000222 09-MAY-05 45 3 10-MAY-05
PRGM000222 10-MAY-05 05 3 10-MAY-05
PRGM000222 10-MAY-05 25 1 10-MAY-05
PRGM000222 10-MAY-05 45 3 14-MAY-05
PRGM000222 14-MAY-05 05 3 14-MAY-05
PRGM000222 14-MAY-05 24 1
PRGM000242 08-MAY-05 07 3 08-MAY-05
PRGM000242 08-MAY-05 23 1 08-MAY-05
PRGM000242 08-MAY-05 47 3 08-MAY-05
PRGM000242 08-MAY-05 48 3 09-MAY-05
PRGM000242 09-MAY-05 07 3 09-MAY-05
PRGM000242 09-MAY-05 33 1 09-MAY-05
PRGM000242 09-MAY-05 46 3 10-MAY-05
PRGM000242 10-MAY-05 07 3 10-MAY-05
PRGM000242 10-MAY-05 33 1 10-MAY-05
PRGM000242 10-MAY-05 46 3 11-MAY-05
PRGM000242 11-MAY-05 07 3 11-MAY-05
PRGM000242 11-MAY-05 33 1 11-MAY-05
PRGM000242 11-MAY-05 46 3 14-MAY-05
PRGM000242 14-MAY-05 07 3 14-MAY-05
PRGM000242 14-MAY-05 23 1
For a particular 'prgm_c', I need to find the next date & time code where the 'sun_type' field =
'1'.
A sample of the output should look something like this:
PRGM_C FPC_DATE TIM S NEXT_DATE next_time
---------- --------- --- - --------- -------
PRGM000222 08-MAY-05 33 1 09-MAY-05 25
PRGM000222 09-MAY-05 05 3 09-MAY-05 25
PRGM000222 09-MAY-05 25 1 10-MAY-05 25
PRGM000222 09-MAY-05 45 3 10-MAY-05 25
PRGM000222 10-MAY-05 05 3 10-MAY-05 25
PRGM000222 10-MAY-05 25 1 14-MAY-05 24
PRGM000222 10-MAY-05 45 3 14-MAY-05 24
PRGM000222 14-MAY-05 05 3 14-MAY-05 24
PRGM000222 14-MAY-05 24 1
Tom, can you please help me with this?
Regards
Followup June 9, 2005 - 6am Central time zone:
PRGM000222 10-MAY-05 05 3 10-MAY-05
PRGM000222 10-MAY-05 25 1 10-MAY-05
PRGM000222 10-MAY-05 45 3 14-MAY-05
PRGM000222 14-MAY-05 05 3 14-MAY-05
PRGM000222 14-MAY-05 24 1
you've got a problem with those fpc_dates and ordering by them. you have "dups", so no one of those
10-may-05 rows comes "first"; same with the 14th. You need to figure out how to really order this
data deterministically first.
My first attempt at this is:
tkyte@ORA9IR2W> select prgm_c, fpc_date, time_code, sun_type,
       to_date(substr( max(data)
                 over (partition by prgm_c order by fpc_date desc),
                 6, 14 ),'yyyymmddhh24miss') ndt,
       to_number( substr( max(data)
                 over (partition by prgm_c order by fpc_date desc), 20) ) ntc
  from (
select prgm_c,
       fpc_date,
       time_code,
       sun_type,
       case when lag(sun_type)
                 over (partition by prgm_c order by fpc_date desc) = '1'
            then to_char( row_number()
                 over (partition by prgm_c order by fpc_date desc) , 'fm00000') ||
                 to_char(lag(fpc_date)
                 over (partition by prgm_c order by fpc_date desc),'yyyymmddhh24miss')||
                 lag(time_code) over (partition by prgm_c order by fpc_date desc)
       end data
  from cs_fpc_pr
       )
 order by prgm_c,fpc_date,time_code
/
PRGM_C FPC_DATE TIM S NDT NTC
---------- --------- --- - --------- ----------
PRGM000222 08-MAY-05 33 1 09-MAY-05 25
PRGM000222 09-MAY-05 05 3 09-MAY-05 25
PRGM000222 09-MAY-05 25 1 09-MAY-05 25
PRGM000222 09-MAY-05 45 3 09-MAY-05 25
PRGM000222 10-MAY-05 05 3 10-MAY-05 25
PRGM000222 10-MAY-05 25 1 10-MAY-05 25
PRGM000222 10-MAY-05 45 3 10-MAY-05 25
PRGM000222 14-MAY-05 05 3
PRGM000222 14-MAY-05 24 1
PRGM000242 08-MAY-05 07 3 08-MAY-05 23
PRGM000242 08-MAY-05 23 1 08-MAY-05 23
PRGM000242 08-MAY-05 47 3 08-MAY-05 23
PRGM000242 08-MAY-05 48 3 08-MAY-05 23
PRGM000242 09-MAY-05 07 3 10-MAY-05 33
PRGM000242 09-MAY-05 33 1 10-MAY-05 33
PRGM000242 09-MAY-05 46 3 10-MAY-05 33
PRGM000242 10-MAY-05 07 3 10-MAY-05 33
PRGM000242 10-MAY-05 33 1 10-MAY-05 33
PRGM000242 10-MAY-05 46 3 10-MAY-05 33
PRGM000242 11-MAY-05 07 3 14-MAY-05 23
PRGM000242 11-MAY-05 33 1 14-MAY-05 23
PRGM000242 11-MAY-05 46 3 14-MAY-05 23
PRGM000242 14-MAY-05 07 3
PRGM000242 14-MAY-05 23 1
24 rows selected.
but the lack of distinctness on the fpc_date means you might get "a different answer" with the same
set of data.
reply
June 9, 2005 - 7am Central time zone
Reviewer: Adolph from India
Sorry for not being clear in the first instance, so here goes.... A program (prgm_C) will have a
maximum of one entry in the table for a combination of (fpc_date & time_code).
This time_code actually maps to another table where '01' is '01:00:00', '02' is '01:30:00' & so on
(i.e. times stored in varchar2 format).
So basically a program will exist for a fpc_date and a time_code only once.
I hope I'm making sense.
Regards
Followup June 9, 2005 - 7am Central time zone:
tkyte@ORA9IR2W> select prgm_c,
       fpc_date,
       time_code,
       sun_type,
       to_date(
         substr( max(data)
                   over (partition by prgm_c
                         order by fpc_date desc,
                                  time_code desc),
                 6, 14 ),'yyyymmddhh24miss') ndt,
       to_number(
         substr( max(data)
                   over (partition by prgm_c
                         order by fpc_date desc,
                                  time_code desc), 20) ) ntc
  from (
select prgm_c,
       fpc_date,
       time_code,
       sun_type,
       case when lag(sun_type)
                   over (partition by prgm_c
                         order by fpc_date desc,
                                  time_code desc) = '1'
            then
              to_char( row_number()
                         over (partition by prgm_c
                               order by fpc_date desc,
                                        time_code desc) , 'fm00000') ||
              to_char(lag(fpc_date)
                         over (partition by prgm_c
                               order by fpc_date desc,
                                        time_code desc),'yyyymmddhh24miss')||
              lag(time_code)
                         over (partition by prgm_c
                               order by fpc_date desc,
                                        time_code desc)
       end data
  from cs_fpc_pr
       )
 order by prgm_c,fpc_date,time_code
/
PRGM_C FPC_DATE TIM S NDT NTC
---------- --------- --- - --------- ----------
PRGM000222 08-MAY-05 33 1 09-MAY-05 25
PRGM000222 09-MAY-05 05 3 09-MAY-05 25
PRGM000222 09-MAY-05 25 1 10-MAY-05 25
PRGM000222 09-MAY-05 45 3 10-MAY-05 25
PRGM000222 10-MAY-05 05 3 10-MAY-05 25
PRGM000222 10-MAY-05 25 1 14-MAY-05 24
PRGM000222 10-MAY-05 45 3 14-MAY-05 24
PRGM000222 14-MAY-05 05 3 14-MAY-05 24
PRGM000222 14-MAY-05 24 1
PRGM000242 08-MAY-05 07 3 08-MAY-05 23
PRGM000242 08-MAY-05 23 1 09-MAY-05 33
PRGM000242 08-MAY-05 47 3 09-MAY-05 33
PRGM000242 08-MAY-05 48 3 09-MAY-05 33
PRGM000242 09-MAY-05 07 3 09-MAY-05 33
PRGM000242 09-MAY-05 33 1 10-MAY-05 33
PRGM000242 09-MAY-05 46 3 10-MAY-05 33
PRGM000242 10-MAY-05 07 3 10-MAY-05 33
PRGM000242 10-MAY-05 33 1 11-MAY-05 33
PRGM000242 10-MAY-05 46 3 11-MAY-05 33
PRGM000242 11-MAY-05 07 3 11-MAY-05 33
PRGM000242 11-MAY-05 33 1 14-MAY-05 23
PRGM000242 11-MAY-05 46 3 14-MAY-05 23
PRGM000242 14-MAY-05 07 3 14-MAY-05 23
PRGM000242 14-MAY-05 23 1
24 rows selected.
Just needed to add "time_code DESC".
See
http://www.oracle.com/technology/oramag/oracle/04-mar/o24asktom.html
("Analytics to the Rescue") for the "carry down" technique I used here. In 10g, we'd simplify this
by using "ignore nulls" in the LAST_VALUE function instead of the max() and row_number() trick.
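The carry-down idea can also be seen outside the database. The Python below is only an illustration, assuming rows are held as (prgm_c, fpc_date, time_code, sun_type) tuples with ISO date strings: it walks each prgm_c partition backwards in (fpc_date, time_code) order and remembers the most recent sun_type '1' row. That remembered value is what MAX() over the tagged data (or LAST_VALUE ... IGNORE NULLS in 10g) carries down in the SQL.

```python
from itertools import groupby
from operator import itemgetter

# (prgm_c, fpc_date, time_code, sun_type) for PRGM000222 from the test case;
# ISO date strings are used so that plain string sorting is chronological.
ROWS = [
    ("PRGM000222", "2005-05-08", "33", "1"),
    ("PRGM000222", "2005-05-09", "05", "3"),
    ("PRGM000222", "2005-05-09", "25", "1"),
    ("PRGM000222", "2005-05-09", "45", "3"),
    ("PRGM000222", "2005-05-10", "05", "3"),
    ("PRGM000222", "2005-05-10", "25", "1"),
    ("PRGM000222", "2005-05-10", "45", "3"),
    ("PRGM000222", "2005-05-14", "05", "3"),
    ("PRGM000222", "2005-05-14", "24", "1"),
]


def next_type1(rows):
    """Map each row to the (fpc_date, time_code) of the next sun_type '1' row
    for the same prgm_c, or to None when no such row follows it."""
    result = {}
    # Walk backwards in time: descending on (prgm_c, fpc_date, time_code).
    # time_code breaks ties on fpc_date, like "time_code desc" in the SQL.
    ordered = sorted(rows, key=itemgetter(0, 1, 2), reverse=True)
    for _prgm, partition in groupby(ordered, key=itemgetter(0)):
        carried = None  # most recent sun_type '1' row seen in this backwards walk
        for row in partition:
            result[row] = carried  # value "carried down" from later rows
            if row[3] == "1":
                carried = (row[1], row[2])
    return result


for row, nxt in sorted(next_type1(ROWS).items()):
    print(*row, *(nxt or ()))
```

Without time_code in the sort key, rows sharing an fpc_date could come out in any order, which is the non-determinism pointed out earlier.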
brilliant
June 9, 2005 - 9am Central time zone
Reviewer: Adolph from India
Thank you very much Tom. The query works like a charm. I will read up on the link. Analytics do
rock n roll :)
Working on an Analytic Query
June 9, 2005 - 12pm Central time zone
Reviewer: Scott from Long Island, NY USA
Tom,
From your example for Mark's problem on 4/8, it seems that you need to specify a number of
columns to output this way. Is there a way to have a varying number of columns? For example, I
need to have a query that takes a date range, and makes each date a column heading. Any help would
be greatly appreciated.
Thanks,
Scott
Followup June 9, 2005 - 6pm Central time zone:
you need dynamic sql. the number of columns in a query is "well defined, known at parse time" by
definition.
If you have access to Expert One-on-One Oracle, I demonstrated how to do this with ref cursors in a
stored procedure. but you have to run a query to get the set of column "headings" and write a
query based on that.
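A minimal sketch of the second step, assuming a hypothetical table t(id, dt, val): the first query supplies the column headings (passed in here as a list of dates), and the program then writes a query with one column per heading. The generated text would still have to be opened dynamically, for example through a ref cursor in a stored procedure; that part is not shown.

```python
from datetime import date


def build_pivot_sql(days):
    """Return query text with one column per day in `days`.

    t(id, dt, val) is a hypothetical table used only for illustration.
    `days` would come from a first query, e.g.
    select distinct trunc(dt) from t where dt between :lo and :hi
    """
    columns = []
    for d in days:
        iso = d.isoformat()  # built from date objects, so no user text is spliced in
        columns.append(
            f"max(decode(trunc(dt), to_date('{iso}','yyyy-mm-dd'), val)) \"{iso}\""
        )
    select_list = ",\n       ".join(["id"] + columns)
    return f"select {select_list}\n  from t\n group by id"


print(build_pivot_sql([date(2005, 6, 1), date(2005, 6, 2), date(2005, 6, 3)]))
```

The column count is fixed only once the text is built, which is why the heading query has to run first.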
Tom any idea how I can re write this piece of code
June 9, 2005 - 3pm Central time zone
Reviewer: A reader
decode ((SELECT ih.in_date
FROM major_sales ih
WHERE ih.container = i.container
AND sales > i.container_id
AND sales = (SELECT MIN(ihh.container_id)
FROM major_sales ihh
WHERE ihh.container_id > i.container_id
AND ihh.container = i.container)), NULL,
Followup June 9, 2005 - 6pm Central time zone:
not out of context, no.
I am still having problem with analytical function
July 1, 2005 - 12pm Central time zone
Reviewer: A reader
select i.container,ssl_user_code,ssl_user_code ssl,cl.code length_code, out_trucker_code,
i.chassis,
lead(in_date) over (partition by i.container order by in_date) next_in_date,
out_date,
lead (out_date) over (partition by i.container order by in_date) o_date
from his_containers i,
container_masters cm,
tml_container_lhts clht,
tml_container_lengths cl
WHERE cm.container = i.container
and cm.lht_code = clht.code
and clht.length_code = cl.code
and ssl_user_code = 'ACL'
and i.container like '%408014'
and voided_date is null
and ((in_date between to_date('01-MAR-05 00:00:00', 'DD-MON-RR HH24:MI:SS')
and to_date('31-MAR-05 23:59:59', 'DD-MON-RR HH24:MI:SS')) OR
(out_date between to_date('01-MAR-05 00:00:00', 'DD-MON-RR HH24:MI:SS')
and to_date('31-MAR-05 23:59:59', 'DD-MON-RR HH24:MI:SS')))
results:
----------
CONTAINER SSL_USER_CODE SSL LENGTH_CODE OUT_TRUCKER_CODE CHASSIS NEXT_IN_DATE OUT_DATE O_DATE
ACLU408014 ACL ACL 4 R0480 3/22/2005 2:52:41 PM 3/21/2005 3:45:48 PM 4/6/2005 2:25:59 PM
ACLU408014 ACL ACL 4 J1375 4/6/2005 2:25:59 PM
1. how can I get rid of the 4/6/2005 2:25:59 PM???
Followup July 1, 2005 - 1pm Central time zone:
can you be more specific about why you don't like April 6th at 2:25:59pm? what is it about that
you don't like?
That'll help me tell you how to remove it in general. What are the criteria for removal?
analytical query
July 1, 2005 - 2pm Central time zone
Reviewer: A reader
Tom,
We are trying to bill the client within the month, in this case within April. I would also like
to know how many days elapsed between 2 days so I can bill them.
Followup July 1, 2005 - 3pm Central time zone:
"how many days elapsed between 2 days"
the answer is: 2
but are you asking how to do date arithmetic? Just subtract.
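Subtracting one Oracle DATE from another gives the elapsed time in days, with hours, minutes and seconds as a fraction of a day. The Python below mimics that arithmetic as a sketch, using the out and in times for ACLU408014 that appear later in this thread.

```python
from datetime import datetime


def elapsed_days(start: datetime, end: datetime) -> float:
    """Elapsed time in (fractional) days, like Oracle's DATE - DATE."""
    return (end - start).total_seconds() / 86400


left = datetime(2005, 1, 18, 6, 39, 10)  # 1/18/2005 6:39:10 AM
back = datetime(2005, 3, 19, 2, 10, 24)  # 3/19/2005 2:10:24 AM
days = elapsed_days(left, back)
print(f"{days:.4f} days")  # prints 59.8134 days
```

Rounding or truncating that number (whole days, started days, and so on) is a billing rule the database cannot decide for you.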
sorry...within March
July 1, 2005 - 2pm Central time zone
Reviewer: A reader
more information
July 1, 2005 - 2pm Central time zone
Reviewer: A reader
Tom,
This is how the data looks
IN_DATE OUT_DATE CONTAINER
1/3/2005 2:23:05 PM 1/10/2005 5:05:16 PM ACLU408014
1/11/2005 1:04:49 PM 1/12/2005 8:49:06 AM ACLU408014
1/14/2005 12:09:50 PM 1/18/2005 6:39:10 AM ACLU408014
3/19/2005 2:10:24 AM 3/21/2005 3:45:48 PM ACLU408014
3/22/2005 2:52:41 PM 4/6/2005 2:25:59 PM ACLU408014
4/7/2005 1:24:43 PM 4/10/2005 2:21:59 AM ACLU408014
and I would like to get the pair within the same month
Followup July 1, 2005 - 3pm Central time zone:
the pair of "what"?
I would like to get all the dates within the month
July 1, 2005 - 4pm Central time zone
Reviewer: A reader
one more try
July 1, 2005 - 4pm Central time zone
Reviewer: A reader
This is how the data looks as of now with the above query.
IN_DATE OUT_DATE CONTAINER
1/3/2005 2:23:05 PM 1/10/2005 5:05:16 PM ACLU408014
1/11/2005 1:04:49 PM 1/12/2005 8:49:06 AM ACLU408014
1/14/2005 12:09:50 PM 1/18/2005 6:39:10 AM ACLU408014
3/19/2005 2:10:24 AM 3/21/2005 3:45:48 PM ACLU408014
3/22/2005 2:52:41 PM 4/6/2005 2:25:59 PM ACLU408014
4/7/2005 1:24:43 PM 4/10/2005 2:21:59 AM ACLU408014
I Would like to get it as the following
IN_DATE OUT_DATE CONTAINER
3/19/2005 2:10:24 AM 3/21/2005 3:45:48 PM ACLU408014
3/22/2005 2:52:41 PM
This is what I am looking for.....this way.
Followup July 1, 2005 - 4pm Central time zone:
still not much of a specification (important thing for those of us in this industry - being able to
describe the problem at hand in detail, so someone else can take the problem definition and code
it).
Let me try, this is purely a speculative guess on my part:
I would like all records in the table such that the in_date-out_date range covered at least part of
the month of march in the year 2005.
If the out_date falls AFTER march, I would like it nulled out.
(this part is a total guess) if the in_date falls BEFORE march, I would like it nulled out as well
(for consistency?)
Ok, stated like that I can give you untested pseudo code since there are no create tables and no
inserts to play with:
select case when in_date between to_date( :x, 'dd-mon-yyyy' )
and to_date( :y, 'dd-mon-yyyy' )-1/24/60/60
then in_date end,
case when out_date between to_date( :x, 'dd-mon-yyyy' )
and to_date( :y, 'dd-mon-yyyy' )-1/24/60/60
then out_date end,
container
from T
where in_date <= to_date( :y, 'dd-mon-yyyy' )-1/24/60/60
and out_date >= to_date( :x, 'dd-mon-yyyy' )
bind in :x = '01-mar-2005' and :y = '01-apr-2005' for your dates.
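The same guessed specification as a small Python sketch, assuming rows are (in_date, out_date, container) tuples and using the BETA rows that the poster supplies later in this thread: keep any row whose in/out range overlaps [:x, :y), and blank out a date that falls outside the period, as the two CASE expressions do.

```python
from datetime import datetime, timedelta

# (in_date, out_date, container) rows from the BETA test case in this thread
BETA = [
    (datetime(2005, 1, 3, 14, 23, 5), datetime(2005, 1, 10, 17, 5, 16), "ACLU408014"),
    (datetime(2005, 1, 11, 13, 4, 49), datetime(2005, 1, 12, 8, 49, 6), "ACLU408014"),
    (datetime(2005, 1, 14, 12, 9, 50), datetime(2005, 1, 18, 6, 39, 10), "ACLU408014"),
    (datetime(2005, 3, 19, 2, 10, 24), datetime(2005, 3, 21, 15, 45, 48), "ACLU408014"),
    (datetime(2005, 3, 22, 14, 52, 41), datetime(2005, 4, 6, 14, 25, 59), "ACLU408014"),
    (datetime(2005, 4, 7, 13, 24, 43), datetime(2005, 4, 10, 2, 21, 59), "ACLU408014"),
]


def overlapping(rows, x, y):
    """Rows whose in/out range overlaps [x, y); dates outside it become None.

    x and y play the role of the :x and :y binds.
    """
    last = y - timedelta(seconds=1)  # to_date(:y, ...) - 1/24/60/60
    result = []
    for in_date, out_date, container in rows:
        # A NULL out_date never satisfies "out_date >= :x" in SQL, so skip it.
        if in_date <= last and out_date is not None and out_date >= x:
            result.append((
                in_date if x <= in_date <= last else None,
                out_date if x <= out_date <= last else None,
                container,
            ))
    return result


for row in overlapping(BETA, datetime(2005, 3, 1), datetime(2005, 4, 1)):
    print(row)
```

For March this yields the 19-MAR/21-MAR row and the 22-MAR row with its April out_date nulled out, the same two rows the SQL returns against that data.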
As you requested
July 1, 2005 - 5pm Central time zone
Reviewer: A reader
CREATE TABLE CONTAINER_MASTERS
(
CONTAINER VARCHAR2(10 BYTE) NOT NULL,
CHECK_DIGIT VARCHAR2(1 BYTE) NOT NULL,
SSL_OWNER_CODE VARCHAR2(5 BYTE) NOT NULL,
LHT_CODE VARCHAR2(5 BYTE) NOT NULL
)
INSERT INTO CONTAINER_MASTERS ( CONTAINER, CHECK_DIGIT, SSL_OWNER_CODE,
LHT_CODE ) VALUES ( '045404', '1', 'BCL', '5AV');
commit;
CREATE TABLE TML_CONTAINER_LHTS
(
CODE VARCHAR2(5 BYTE) NOT NULL,
SHORT_DESCRIPTION VARCHAR2(10 BYTE) NOT NULL,
LONG_DESCRIPTION VARCHAR2(30 BYTE) NOT NULL,
ISO VARCHAR2(4 BYTE) NOT NULL,
LENGTH_CODE VARCHAR2(5 BYTE) NOT NULL
)
INSERT INTO TML_CONTAINER_LHTS ( CODE, SHORT_DESCRIPTION, LONG_DESCRIPTION, ISO, LENGTH_CODE,
HEIGHT_CODE, TYPE_CODE ) VALUES ( '5BR', '5BR', '45'' 9''6" Reefer', '5432', '5', 'B', 'R');
commit;
CREATE TABLE TML_CONTAINER_LENGTHS
(
CODE VARCHAR2(5 BYTE) NOT NULL,
SHORT_DESCRIPTION VARCHAR2(10 BYTE) NOT NULL,
)
INSERT INTO TML_CONTAINER_LENGTHS ( CODE, SHORT_DESCRIPTION,
LONG_DESCRIPTION ) VALUES (
'2', '20''', '20 Ft');
INSERT INTO TML_CONTAINER_LENGTHS ( CODE, SHORT_DESCRIPTION,
LONG_DESCRIPTION ) VALUES (
'4', '40''', '40 Ft');
commit;
Followup July 1, 2005 - 6pm Central time zone:
umm, specification?
did I get it right? if so, did you *try* the query at all???
Here is a SQL puzzle for analytics zealots
July 1, 2005 - 10pm Central time zone
Reviewer: Mikito Harakiri
OK, if anybody succeeds in writing the following with analytics, I will convert to analytics once
and forever. Credit it in the book, of course.
Given:
table Hotels (
name string,
price integer,
distance
)
Here is a query that sounds very analytical:
Order hotels by price, distance. Compare each record with its neighbour (lag?), and if one of them
is inferior to the other by both criteria -- more pricey and farther from the beach -- then throw
it away from the result.
Followup July 2, 2005 - 9am Central time zone:
define neighbor.
is neighbor defined by price or by distance? your specification is lacking many many details
(seems to be a recurring theme on this page for some reason)
sounds like you want the cheapest closest hotel to the beach. for each row, if something closer
and cheaper exists in the original set, do not keep that row.
sounds like a where not exists, not analytics to me. but then - the specification is lacking.
And let's see, in order to appreciate a tool, you have to be shown that the tool can be the end all,
be all answer to everything??!?? that is downright silly, don't you think?
Let's see:
"if anyone succeeds in making the Oracle 9i merge command select data, I would convert to merge
once and forever"
"if anyone succeeds in making my car fly into outer space, I would convert to cars once and
forever"
Think about your logic here.
There are no zealots here, there are people willing to read the documentation, understand that
things work the way they work, not the way THEY think they should have been made to work, and have
jobs to do, pragmatic practical things to accomplish and are willing to use the best tool for the
job.
specs
July 3, 2005 - 11pm Central time zone
Reviewer: Mikito Harakiri
Yes, find all the hotels that are not dominated by the others by both price and distance. That is a
"not exists" query, but it is a very inefficient one:
select * from hotels h
where not exists (select * from hotels hh
where hh.price < h.price and hh.distance <= h.distance
or hh.price <= h.price and hh.distance < h.distance
)
The one that reformulated is much more efficient, but how do I express it in SQL?
Followup July 4, 2005 - 10am Central time zone:
the one that reformulated?
and why do you have the or in there at all. to dominate by both price and distance would simply be:
where not exists ( select NULL
from hotels hh
where hh.price < h.price
AND hh.distance < h.distance )
You said "by BOTH price and distance", nothing but nothing about ties.
ops$tkyte@ORA9IR2> /*
DOC>
DOC>drop table hotels;
DOC>
DOC>create table hotels
DOC>as
DOC>select object_name name, object_id price, object_id distance, all_objects.*
DOC> from all_objects;
DOC>
DOC>create index hotel_idx on hotels(price,distance);
DOC>
DOC>exec dbms_stats.gather_table_stats( user, 'HOTELS', cascade=>true );
DOC>*/
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select h1.name, h1.price, h1.distance
  from hotels h1
 where not exists ( select NULL
                      from hotels h2
                     where h2.price < h1.price
                       AND h2.distance < h1.distance )
/
NAME PRICE DISTANCE
------------------------------ ---------- ----------
I_OBJ# 3 3
Elapsed: 00:00:00.22
ops$tkyte@ORA9IR2> select count(*) from hotels;
COUNT(*)
----------
27837
Elapsed: 00:00:00.00
it doesn't seem horribly inefficient.
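Mikito mentions a "reformulated", more efficient version without showing it. One common reformulation (an assumption here, not necessarily the one he had in mind) sorts the hotels by price once and keeps a running minimum of the distance among strictly cheaper hotels. A hotel survives if its distance is not greater than that minimum. This sketch uses Tom's strict "cheaper AND closer" definition; Mikito's version with the OR (ties allowed on one criterion) would need a different comparison.

```python
from itertools import groupby


def undominated(hotels):
    """Return the hotels for which no other hotel is strictly cheaper
    and strictly closer.

    hotels: iterable of (name, price, distance) tuples.
    """
    by_price = sorted(hotels, key=lambda h: h[1])
    keep = []
    best = None  # smallest distance among hotels strictly cheaper than this price
    for _price, same_price in groupby(by_price, key=lambda h: h[1]):
        same_price = list(same_price)
        for name, price, distance in same_price:
            # Dominated only if some strictly cheaper hotel is strictly closer.
            if best is None or distance <= best:
                keep.append((name, price, distance))
        # Only now do hotels at this price count as "strictly cheaper" ones.
        group_min = min(d for _, _, d in same_price)
        best = group_min if best is None else min(best, group_min)
    return keep


SAMPLE = [("A", 100, 5), ("B", 80, 7), ("C", 120, 6), ("D", 90, 3), ("E", 80, 9)]
print(undominated(SAMPLE))  # [('B', 80, 7), ('E', 80, 9), ('D', 90, 3)]
```

One sort replaces a correlated probe per row, which matters mostly when there is no index like hotel_idx to make the NOT EXISTS probes cheap.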
Tom Can we give it one more try
July 5, 2005 - 9am Central time zone
Reviewer: A reader
Tom, when I ran the query it returned nothing. I am sending you the whole test case. This is what I
would like to see in the report:
out_date in_date container
1/18/2005 6:39:10 AM 3/19/2005 2:10:24 AM ACLU408014
3/21/2005 3:45:48 PM 3/22/2005 2:52:41 PM ACLU408014
CREATE TABLE BETA
(
IN_DATE DATE NOT NULL,
OUT_DATE DATE,
CONTAINER VARCHAR2(10 BYTE) NOT NULL
);
INSERT INTO BETA ( IN_DATE, OUT_DATE, CONTAINER ) VALUES (
TO_Date( '01/03/2005 02:23:05 PM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '01/10/2005 05:05:16 PM',
'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU408014');
INSERT INTO BETA ( IN_DATE, OUT_DATE, CONTAINER ) VALUES (
TO_Date( '01/11/2005 01:04:49 PM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '01/12/2005 08:49:06 AM',
'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU408014');
INSERT INTO BETA ( IN_DATE, OUT_DATE, CONTAINER ) VALUES (
TO_Date( '01/14/2005 12:09:50 PM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '01/18/2005 06:39:10 AM',
'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU408014');
INSERT INTO BETA ( IN_DATE, OUT_DATE, CONTAINER ) VALUES (
TO_Date( '03/19/2005 02:10:24 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '03/21/2005 03:45:48 PM',
'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU408014');
INSERT INTO BETA ( IN_DATE, OUT_DATE, CONTAINER ) VALUES (
TO_Date( '03/22/2005 02:52:41 PM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '04/06/2005 02:25:59 PM',
'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU408014');
INSERT INTO BETA ( IN_DATE, OUT_DATE, CONTAINER ) VALUES (
TO_Date( '04/07/2005 01:24:43 PM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '04/10/2005 02:21:59 AM',
'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU408014');
commit;
select in_date, out_date,container,
case when in_date between to_date('01-mar-2005', 'dd-mon-yyyy' )
and to_date( '31-mar-2005', 'dd-mon-yyyy' )-1/24/60/60
then in_date end,
case when out_date between to_date( '01-mar-2005', 'dd-mon-yyyy' )
and to_date( '31-mar-2005', 'dd-mon-yyyy' )-1/24/60/60
then out_date end
container
from BETA
WHERE in_date <= to_date( '01-mar-2005', 'dd-mon-yyyy' )-1/24/60/60
and out_date >= to_date( '31-mar-2005', 'dd-mon-yyyy' )
Followup July 5, 2005 - 9am Central time zone:
you know, this is going beyond....
*s*p*e*c*i*f*i*c*a*t*i*o*n*
pretend you were explaining to your mother (who presumably doesn't work in IT and doesn't know sql
or databases or whatever) what needed to be done.
that is what I need to see. I obviously don't know your logic of getting from "A (inputs) to B
(outputs)" and you need to explain that.
and when I run my query:
ops$tkyte@ORA10G> variable x varchar2(20)
ops$tkyte@ORA10G> variable y varchar2(20)
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> exec :x := '01-mar-2005'; :y := '01-apr-2005'
PL/SQL procedure successfully completed.
ops$tkyte@ORA10G> select case when in_date between to_date( :x, 'dd-mon-yyyy' )
                          and to_date( :y, 'dd-mon-yyyy' )-1/24/60/60
            then in_date end,
       case when out_date between to_date( :x, 'dd-mon-yyyy' )
                          and to_date( :y, 'dd-mon-yyyy' )-1/24/60/60
            then out_date end,
       container
  from beta
 where in_date <= to_date( :y, 'dd-mon-yyyy' )-1/24/60/60
   and out_date >= to_date( :x, 'dd-mon-yyyy' )
/
CASEWHENI CASEWHENO CONTAINER
--------- --------- ----------
19-MAR-05 21-MAR-05 ACLU408014
22-MAR-05 ACLU408014
I do get output, not what you say you want, but output. you need to tell me THE LOGIC here. (and
maybe when you write it down, specify it, the answer will just naturally appear)
so yes, we can definitely give it one more try but if and only if you provide the details, the
specification, the logic, the thoughts behind this.
Not just "i have this and want that", it doesn't work that way.
in english
July 5, 2005 - 10am Central time zone
Reviewer: Jean
We are trying to bill from the time the truck left to the time it returned. For example, in the
above query, I would like to bill him from 1/18/2005 to 3/19/2005. So it must be part of the
report. That's the whole key here.
clarification!!
July 5, 2005 - 10am Central time zone
Reviewer: A reader
the time he left 1/18/2005 6:39:10 AM
the time he came back 3/22/2005 2:52:41 PM
hope this helps....
Followup July 5, 2005 - 11am Central time zone:
ops$tkyte@ORA9IR2> select * from beta order by in_date;
IN_DATE OUT_DATE CONTAINER
--------- --------- ----------
03-JAN-05 10-JAN-05 ACLU408014
11-JAN-05 12-JAN-05 ACLU408014 <<<=== gap, no 13
14-JAN-05 18-JAN-05 ACLU408014 <<=== big gap, no 19.... mar 18
19-MAR-05 21-MAR-05 ACLU408014
22-MAR-05 06-APR-05 ACLU408014
07-APR-05 10-APR-05 ACLU408014
6 rows selected.
I don't get it. I don't get it AT ALL. does anyone else ?
nope, not getting it even a teeny tiny bit myself.
give us LOGIC, ALGORITHM, INFORMATION.
like I said, pretend I'm your mother who has never seen a computer -- explain the logic at that
level (or I just give up)
BETTER TABLE
July 5, 2005 - 11am Central time zone
Reviewer: A reader
INSERT INTO BETA ( IN_DATE, OUT_DATE, CONTAINER ) VALUES (
TO_Date( '01/03/2005 02:23:05 PM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '01/10/2005 05:05:16 PM',
'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU408014');
INSERT INTO BETA ( IN_DATE, OUT_DATE, CONTAINER ) VALUES (
TO_Date( '01/11/2005 01:04:49 PM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '01/12/2005 08:49:06 AM',
'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU408014');
INSERT INTO BETA ( IN_DATE, OUT_DATE, CONTAINER ) VALUES (
TO_Date( '01/14/2005 12:09:50 PM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '01/18/2005 06:39:10 AM',
'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU408014');
INSERT INTO BETA ( IN_DATE, OUT_DATE, CONTAINER ) VALUES (
TO_Date( '03/19/2005 02:10:24 AM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '03/21/2005 03:45:48 PM',
'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU408014');
INSERT INTO BETA ( IN_DATE, OUT_DATE, CONTAINER ) VALUES (
TO_Date( '04/07/2005 01:24:43 PM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '04/10/2005 02:21:59 AM',
'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU408014');
INSERT INTO BETA ( IN_DATE, OUT_DATE, CONTAINER ) VALUES (
TO_Date( '03/22/2005 02:52:41 PM', 'MM/DD/YYYY HH:MI:SS AM'), TO_Date( '04/06/2005 02:25:59 PM',
'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU408014');
commit;
OUT_DATE IN_DATE
1/18/2005 6:39:10 AM 3/19/2005 2:10:24 AM
3/21/2005 3:45:48 PM 3/22/2005 2:52:41 PM
LEFT 1/18 CAME BACK 3/19
LEFT 3/21 CAME BACK 3/22
Followup July 5, 2005 - 12pm Central time zone:
you have totally and utterly missed my point.
IN_DATE OUT_DATE CONTAINER
--------- --------- ----------
03-JAN-05 10-JAN-05 ACLU408014
11-JAN-05 12-JAN-05 ACLU408014
14-JAN-05 18-JAN-05 ACLU408014
19-MAR-05 21-MAR-05 ACLU408014
22-MAR-05 06-APR-05 ACLU408014
07-APR-05 10-APR-05 ACLU408014
6 rows selected.
sigh.
what if the records are
IN_DATE OUT_DATE CONTAINER
--------- --------- ----------
03-JAN-05 10-JAN-05 ACLU408014
11-JAN-05 12-JAN-05 ACLU408014
14-JAN-05 18-JAN-05 ACLU408014
07-APR-05 10-APR-05 ACLU408014
specification, you know what, without it, I'm not even going to look anymore. Textual description
of precisely what you want. I'm tired of guessing. I think I can guess, but I don't even want to
guess about "missing" months like my second example here.
English Explanation
July 5, 2005 - 1pm Central time zone
Reviewer: A reader
Sorry for going back and forth on this report. All I want is the following: We have trucks that
come in and out of the yard. All we are looking for is when the truck came in and the "next
record", nothing in between, because a truck can come in many times during a month. So we want when
it first came in and the very last time he went out for a particular month. That is to say, the
last time he left the yard. So the date and time should give us this information. Finally, this
report should be within a month.
example:
IN_DATE OUT_DATE CONTAINER
--------- --------- ----------
03-JAN-05 10-JAN-05 ACLU408014
11-JAN-05 12-JAN-05 ACLU408014
14-JAN-05 18-JAN-05 ACLU408014
19-MAR-05 21-MAR-05 ACLU408014
22-MAR-05 06-APR-05 ACLU408014
07-APR-05 10-APR-05 ACLU408014
6 rows selected.
in this case we want
in_date out_date
-------- --------
3/22/2005 2:52:41PM 1/18/2005 6:39:10 AM
Followup July 5, 2005 - 1pm Central time zone:
so what happened to the 21st/22nd of march this time. the answer keeps changing?
and what if, there are no records for march in the table (nothing in_date/out_date wise)
follow up
July 5, 2005 - 1pm Central time zone
Reviewer: jean
Tom,
We realized that it may be too much to get the dates in between,
so we opt for just getting the in_date and out_date. By the way, there will always be data, so do
not worry about if....
Thanks!!
Followup July 5, 2005 - 3pm Central time zone:
feb, what about feb? you said there would always be data? I want to run this for feb?
do you or do you not need to be concerned about a missing month.
do not be concerned!
July 5, 2005 - 3pm Central time zone
Reviewer: A reader
Please do not be concerned about missing a month. This is a report.
Followup July 5, 2005 - 3pm Central time zone:
umm, I want the report for february
it is blank.
now what? it should not be blank should it? this is a problem, this is a problem in our industry
in general. You get what you ask for (sometimes) and if you ask repeatedly for the wrong thing,
that's what you'll get. I am concerned -- by this line of question here.
Hey, here you go:
ops$tkyte-ORA9IR2> select *
  from (
select
       lag(out_date) over (partition by container order by in_date) last_out_date,
       in_date,
       container
  from beta
       )
 where trunc(in_date,'mm') = to_date('01-mar-2005','dd-mon-yyyy')
    or trunc(last_out_date,'mm') = to_date('01-mar-2005','dd-mon-yyyy');
LAST_OUT_ IN_DATE CONTAINER
--------- --------- ----------
18-JAN-05 19-MAR-05 ACLU408014
21-MAR-05 22-MAR-05 ACLU408014
gets the answer given your data, makes a zillion assumptions (50% of which are probably wrong),
won't work for FEB, probably doesn't answer the question behind the question, but hey, there you
go.
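What that query does, restated as a Python sketch (assuming (in_date, out_date, container) tuples and the BETA rows above): pair each in_date with the container's previous out_date, as LAG(out_date) does, and keep a pair when either date falls in the requested month.

```python
from datetime import datetime

# (in_date, out_date, container) rows from the BETA test case in this thread
BETA = [
    (datetime(2005, 1, 3, 14, 23, 5), datetime(2005, 1, 10, 17, 5, 16), "ACLU408014"),
    (datetime(2005, 1, 11, 13, 4, 49), datetime(2005, 1, 12, 8, 49, 6), "ACLU408014"),
    (datetime(2005, 1, 14, 12, 9, 50), datetime(2005, 1, 18, 6, 39, 10), "ACLU408014"),
    (datetime(2005, 3, 19, 2, 10, 24), datetime(2005, 3, 21, 15, 45, 48), "ACLU408014"),
    (datetime(2005, 3, 22, 14, 52, 41), datetime(2005, 4, 6, 14, 25, 59), "ACLU408014"),
    (datetime(2005, 4, 7, 13, 24, 43), datetime(2005, 4, 10, 2, 21, 59), "ACLU408014"),
]


def out_in_pairs(rows, year, month):
    """Pair each in_date with the container's previous out_date, like
    LAG(out_date) OVER (PARTITION BY container ORDER BY in_date), and keep
    the pairs where either date falls in the given year/month."""

    def in_month(d):
        return d is not None and (d.year, d.month) == (year, month)

    pairs = []
    last_out = {}  # container -> out_date of its previous row (the LAG value)
    for in_date, out_date, container in sorted(rows, key=lambda r: (r[2], r[0])):
        prev_out = last_out.get(container)
        if in_month(in_date) or in_month(prev_out):
            pairs.append((prev_out, in_date, container))
        last_out[container] = out_date
    return pairs


for pair in out_in_pairs(BETA, 2005, 3):
    print(pair)
```

Asking it for February returns an empty list, although the container was out of the yard (1/18 to 3/19) for the whole month. That is the gap the question about February is pointing at.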
Thanks!!!
July 6, 2005 - 9am Central time zone
Reviewer: A reader
I will try it... Thanks a zillion for your efforts and your patience.
Thanks!
July 6, 2005 - 11am Central time zone
Reviewer: A reader
CREATE TABLE BETA3
(
IN_DATE DATE NOT NULL,
OUT_DATE DATE,
CONTAINER VARCHAR2(10 BYTE) NOT NULL
);
INSERT INTO BETA3 ( IN_DATE, OUT_DATE, CONTAINER ) VALUES (
TO_Date( '07/20/2004 03:08:49 PM', 'MM/DD/YYYY HH:MI:SS AM'),
TO_Date( '08/10/2004 02:45:52 AM', 'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU040312');
INSERT INTO BETA3 ( IN_DATE, OUT_DATE, CONTAINER ) VALUES (
TO_Date( '03/19/2005 01:55:06 AM', 'MM/DD/YYYY HH:MI:SS AM'),
TO_Date( '03/27/2005 05:05:36 AM', 'MM/DD/YYYY HH:MI:SS AM')
, 'ACLU040312');
commit;
Tom, I was able to get the first pair, as shown:
last_out_date in_date container
8/10/2004 2:45:52 AM 3/19/2005 1:55:06 AM ACLU040312
which is fine...
But can I get the other pair?
last_out_date in_date container
3/27/2005 5:05:36 AM
Followup July 6, 2005 - 12pm Central time zone:
problem is, you are "missing" a row and 'making up' data is hard.
it might be
ops$tkyte-ORA10G> select decode( r, 1, last_out_date, out_date ),
       decode( r, 1, in_date, next_in_date )
  from (
select
       lag(out_date) over (partition by container order by in_date) last_out_date,
       in_date, out_date,
       lead(in_date) over (partition by container order by in_date) next_in_date,
       container
  from beta3
       ), ( select 1 r from dual union all select 2 r from dual )
 where ((
         trunc(in_date,'mm') = to_date('01-mar-2005','dd-mon-yyyy')
         or
         trunc(last_out_date,'mm') = to_date('01-mar-2005','dd-mon-yyyy')
       ) and r = 1 )
    or
       ( next_in_date is null and r = 2 )
/
DECODE(R,1,LAST_OUT_ DECODE(R,1,IN_DATE,N
-------------------- --------------------
10-aug-2004 02:45:52 19-mar-2005 01:55:06
27-mar-2005 05:05:36
still curious what happens in feb.
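The join to the two-row set from dual is a row-doubling trick: every input row is considered twice, once as r = 1 and once as r = 2, and each copy is allowed to output a different pair of columns. A Python sketch of the same logic, assuming (in_date, out_date, container) tuples and the BETA3 rows:

```python
from datetime import datetime

# (in_date, out_date, container) rows from the BETA3 test case in this thread
BETA3 = [
    (datetime(2004, 7, 20, 15, 8, 49), datetime(2004, 8, 10, 2, 45, 52), "ACLU040312"),
    (datetime(2005, 3, 19, 1, 55, 6), datetime(2005, 3, 27, 5, 5, 36), "ACLU040312"),
]


def report_rows(rows, year, month):
    """Emit up to two output rows per input row:
    r = 1: (previous out_date, in_date) when either falls in the month
    r = 2: (out_date, next in_date) for a container's last row (next in_date is NULL)
    """

    def in_month(d):
        return d is not None and (d.year, d.month) == (year, month)

    by_container = {}
    for row in sorted(rows, key=lambda r: (r[2], r[0])):
        by_container.setdefault(row[2], []).append(row)

    result = []
    for ordered in by_container.values():
        for i, (in_date, out_date, _container) in enumerate(ordered):
            last_out = ordered[i - 1][1] if i > 0 else None  # LAG(out_date)
            next_in = ordered[i + 1][0] if i + 1 < len(ordered) else None  # LEAD(in_date)
            for r in (1, 2):  # the "select 1 r ... union all select 2 r" copies
                if r == 1 and (in_month(in_date) or in_month(last_out)):
                    result.append((last_out, in_date))
                elif r == 2 and next_in is None:
                    result.append((out_date, next_in))
    return result


for pair in report_rows(BETA3, 2005, 3):
    print(pair)
```

Like the SQL it mirrors, the r = 2 branch never checks the month, so a container's last visit is reported whatever month is requested. That may be one source of dates outside the range.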
Please refer some books to learn Oracle Analytic functions
July 7, 2005 - 7am Central time zone
Reviewer: Vijay from India
Followup July 7, 2005 - 9am Central time zone:
data warehousing guide (freely available on otn.oracle.com)
Expert one on one Oracle (I have a big chapter on them in there)
Thank you very much!!
July 8, 2005 - 10am Central time zone
Reviewer: Jean
I want to thank you for the last query!!! it worked very well, even though I still get dates
outside of the range. But overall it's fine.
How to get contiguous date ranges from Start_date, end_date pairs?
July 11, 2005 - 3pm Central time zone
Reviewer: Bob Lyon from Houston
-- Tom, Suppose I have a table with data...
-- MKT_CD START_DT_GMT END_DT_GMT
-- ------ ----------------- -----------------
-- AAA 07/11/05 00:00:00 07/12/05 00:00:00
-- BBB 07/11/05 00:00:00 07/11/05 01:00:00
-- BBB 07/11/05 01:00:00 07/11/05 02:00:00
-- BBB 07/11/05 02:00:00 07/11/05 03:00:00
-- BBB 07/11/05 06:00:00 07/11/05 07:00:00
-- BBB 07/11/05 07:00:00 07/11/05 08:00:00
-- What I would like to get is the "contiguous date ranges"
-- by MKT_CD, i.e.,
-- MKT_CD START_DT_GMT END_DT_GMT
-- ------ ----------------- -----------------
-- AAA 07/11/05 00:00:00 07/12/05 00:00:00
-- BBB 07/11/05 00:00:00 07/11/05 03:00:00
-- BBB 07/11/05 06:00:00 07/11/05 08:00:00
-- I have played with LAG/LEAD/FIRST_VALUE/LAST_VALUE
-- but seem to just "go in circles" trying to code this.
-- Here is the test data setup (Oracle 9.2.0.6) :
CREATE GLOBAL TEMPORARY TABLE NM_DEMAND_BIDS_API_GT
(
MKT_CD VARCHAR2(6) NOT NULL,
START_DT_GMT DATE NOT NULL,
END_DT_GMT DATE NOT NULL
)
ON COMMIT PRESERVE ROWS;
-- This code has 24 hours
INSERT INTO NM_DEMAND_BIDS_API_GT ( MKT_CD, START_DT_GMT, END_DT_GMT )
VALUES ('AAA', TRUNC(SYSDATE), TRUNC(SYSDATE) + 1);
-- A second code goes by hours
INSERT INTO NM_DEMAND_BIDS_API_GT ( MKT_CD, START_DT_GMT, END_DT_GMT )
VALUES ('BBB', TRUNC(SYSDATE)+ 00/24, TRUNC(SYSDATE) + 01/24);
INSERT INTO NM_DEMAND_BIDS_API_GT ( MKT_CD, START_DT_GMT, END_DT_GMT )
VALUES ('BBB', TRUNC(SYSDATE)+ 01/24, TRUNC(SYSDATE) + 02/24);
INSERT INTO NM_DEMAND_BIDS_API_GT ( MKT_CD, START_DT_GMT, END_DT_GMT )
VALUES ('BBB', TRUNC(SYSDATE)+ 02/24, TRUNC(SYSDATE) + 03/24);
-- and has an intentional gap
INSERT INTO NM_DEMAND_BIDS_API_GT ( MKT_CD, START_DT_GMT, END_DT_GMT )
VALUES ('BBB', TRUNC(SYSDATE)+ 06/24, TRUNC(SYSDATE) + 07/24);
INSERT INTO NM_DEMAND_BIDS_API_GT ( MKT_CD, START_DT_GMT, END_DT_GMT )
VALUES ('BBB', TRUNC(SYSDATE)+ 07/24, TRUNC(SYSDATE) + 08/24);
-- Query
SELECT MKT_CD, START_DT_GMT, END_DT_GMT
FROM NM_DEMAND_BIDS_API_GT;
Followup July 11, 2005 - 3pm Central time zone:
based on:
http://www.oracle.com/technology/oramag/oracle/04-mar/o24asktom.html
ops$tkyte@ORA9IR2> select mkt_cd, min(start_dt_gmt), max(end_dt_gmt)
  from (
select mkt_cd, start_dt_gmt, end_dt_gmt,
       max(grp) over (partition by mkt_cd order by start_dt_gmt) mgrp
  from (
SELECT MKT_CD,
       START_DT_GMT,
       END_DT_GMT,
       case when lag(end_dt_gmt) over (partition by mkt_cd order by start_dt_gmt) <> start_dt_gmt
              or
                 lag(end_dt_gmt) over (partition by mkt_cd order by start_dt_gmt) is null
            then row_number() over (partition by mkt_cd order by start_dt_gmt)
       end grp
  FROM NM_DEMAND_BIDS_API_GT
       )
       )
 group by mkt_cd, mgrp
 order by 1, 2
/
MKT_CD MIN(START_DT_GMT) MAX(END_DT_GMT)
------ -------------------- --------------------
AAA 11-jul-2005 00:00:00 12-jul-2005 00:00:00
BBB 11-jul-2005 00:00:00 11-jul-2005 03:00:00
BBB 11-jul-2005 06:00:00 11-jul-2005 08:00:00
Thanks!
July 11, 2005 - 5pm Central time zone
Reviewer: Bob Lyon from Houston
Wow, that was fast.
The trick here is the MAX() analytic function. I could tag the lines where a break was to occur
but couldn't figure out how to carry forward the tag/grp.
Thanks Again!
Analytical functions book
July 11, 2005 - 11pm Central time zone
Reviewer: Vijay from India
Thanks a lot
More Help
July 26, 2005 - 5pm Central time zone
Reviewer: Jean
Tom,
How can I get just the records within the scope? I am getting records outside of March.
select container, decode( r, 1, last_out_date, out_date ) out_date,
       decode( r, 1, in_date, next_in_date ) in_date,
       code length_code, chassis, out_trucker_code, ssl_user_code ssl, ssl_user_code, out_mode
  from (
select lag(out_date) over (partition by i.container order by in_date) last_out_date,
       i.ssl_user_code,
       in_date,
       cl.code,
       i.out_trucker_code,
       i.ssl_user_code ssl,
       i.container,
       i.chassis,
       out_mode,
       out_date,
       clht.length_code,
       lead(in_date) over (partition by i.container order by in_date) next_in_date
  from his_containers i, container_masters cm, tml_container_lhts clht, tml_container_lengths cl
where cm.container = i.container
and cm.lht_code = clht.code
and cl.code = clht.length_code
and ssl_user_code = 'ACL'
and i.container = 'ACLU214285'
and voided_date is null
and chassis is null
and in_mode = 'T'
and out_mode = 'T' ), ( select 1 r from dual union all select 2 r from dual )
where (( trunc(in_date,'mm') = to_date('01-mar-2005','dd-mon-yyyy')
or trunc(last_out_date,'mm') = to_date('01-mar-2005','dd-mon-yyyy'))
and r = 1 ) or ( next_in_date is null and r = 2 )
order by out_date
Followup July 26, 2005 - 5pm Central time zone:
select *
from (Q)
where <any other conditions you like>
order by out_date;
replace Q with your query.
that's what I got in my query.....
July 26, 2005 - 6pm Central time zone
Reviewer: A reader
Followup July 26, 2005 - 6pm Central time zone:
don't know what you mean
I thought I was doing what you suggested already...
July 26, 2005 - 6pm Central time zone
Reviewer: A reader
Followup July 26, 2005 - 6pm Central time zone:
I cannot see your output; obviously you are getting more data than you wanted -- add to the
predicate in order to filter it out. I don't know what else to say.
More information..
July 27, 2005 - 9am Central time zone
Reviewer: Jean
the way it was before
CONTAINER OUT_DATE IN_DATE LENGTH_CODE CHASSIS OUT_TRUCKER_CODE
ACLU217150 6/25/2004 2:58:01 PM 3/11/2005 7:36:29 PM 4 E2131 ACL ACL T
---with your changes---
CONTAINER OUT_DATE IN_DATE LENGTH_CODE CHASSIS OUT_TRUCKER_CODE
ACLU217150 6/25/2004 2:58:01 PM 3/11/2005 7:36:29 PM 4 E2131
my history tables
CONTAINER_ID OUT_DATE IN_DATE
31779 6/21/2004 10:03:25 AM 6/16/2004 1:33:50 AM
55317 6/25/2004 2:58:01 PM 6/25/2004 2:19:49 PM
672863 3/2/2005 7:03:31 PM 2/26/2005 6:03:49 PM
708598 4/4/2005 3:31:03 PM 3/11/2005 7:36:29 PM
779305 4/16/2005 1:03:36 PM 4/6/2005 2:04:53 PM
As you can see, I am not picking up the records within the month of March, with or without
the changes to the query.
Followup July 27, 2005 - 10am Central time zone:
sorry -- you'll need to work through this. You see the techniques involved, right -- lag, lead,
analytic functions. YOU understand your data much better than I do.
(In part because, frankly, the "way it was before" and "with your changes" outputs look, well, I
don't know -- the same to me as displayed here.)
Thanks for your help!
July 27, 2005 - 1pm Central time zone
Reviewer: A reader
I know the data; however, I thought it was going to be something easy just to get the dates within
March... I guess not.
count number of rows in a number of ranges
July 27, 2005 - 6pm Central time zone
Reviewer: A reader
Hi
I would like to count the number of rows I have per range of values. For example
SELECT RANGE, SUM(suma) total_per_deptno
FROM (SELECT CASE
WHEN deptno between 10 and 20 THEN '10-20'
ELSE '30'
END RANGE,
deptno, 1 SUMA
FROM scott$emp)
GROUP BY RANGE
RANGE TOTAL_PER_DEPTNO
----- ----------------
10-20 8
30 6
Can I rewrite that query in some other way so the ranges can be dynamic, such as
11-20
21-30
31-40
and count the number of rows in each range?
Thank you
Followup July 27, 2005 - 6pm Central time zone:
if you can come up with a function f(x) such that f(x) returns what you want, sure.
EG:
for your 11-20, 21-30, 31-40 -- well
f(deptno) = trunc( (deptno-0.1)/10)
(assuming deptno is an integer) -- that'll bin up deptno 0..10, 11..20, 21..30 and so on into
groups 0, 1, 2, 3, ....
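In SQL, this means grouping by the expression trunc((deptno-0.1)/10) instead of by a hand-written CASE. Here is a minimal Python sketch of the same binning and counting. The function names, the range labels and the sample data are hypothetical. The sample is an assumed 3/5/6 split across departments 10, 20 and 30.

```python
import math
from collections import Counter


def bin_of(deptno):
    """Tom's f(x) = trunc((deptno - 0.1) / 10): 0..10 -> 0, 11..20 -> 1, 21..30 -> 2."""
    return math.trunc((deptno - 0.1) / 10)


def label(b):
    # Hypothetical label for bin b, e.g. 1 -> '11-20' (bin 0 also holds deptno 0)
    return f"{10 * b + 1}-{10 * b + 10}"


def count_per_range(deptnos):
    counts = Counter(bin_of(d) for d in deptnos)
    return {label(b): n for b, n in sorted(counts.items())}


# Hypothetical sample: 3 rows in dept 10, 5 in dept 20, 6 in dept 30
sample = [10] * 3 + [20] * 5 + [30] * 6
print(count_per_range(sample))  # {'1-10': 3, '11-20': 5, '21-30': 6}
```

To use a different range width, change the divisor and the label function together.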

August 2, 2005 - 1pm Central time zone
Reviewer: A reader
Tom,
I hope you can provide an insight to this.
table emp1 is shown below.
EmpId Week Year Day0 Day1 ..... Day14
100 20 2005 8 8 8
200 22 2003 0 0 8
300 25 2004 8 8 0
400 06 2005 0 8 8
500 08 2002 8 0 8
create table emp1(empid varchar2(3), week varchar2(2), year varchar2(4), day0 number(2), day1
number(2), day2 number(2), day3 number(2), day4 number(2), day5 number(2), day6 number(2), day7
number(2), day8 number(2), day9 number(2), day10 number(2), day11 number(2), day12 number(2), day13
number(2), day14 number(2));
insert into emp1 values('100', '20', '2005', 8, 8, 0, 8, 0, 8, 0, 0, 8, 8, 8, 8, 0, 8);
insert into emp1 values('200', '22', '2003', 0, 8, 0, 8, 0, 8, 0, 0, 8, 8, 8, 8, 0, 8);
insert into emp1 values('300', '25', '2004', 8, 8, 0, 8, 0, 8, 0, 0, 8, 8, 8, 8, 0, 0);
insert into emp1 values('400', '06', '2005', 0, 8, 0, 8, 0, 8, 0, 0, 8, 8, 8, 8, 0, 8);
insert into emp1 values('500', '08', '2002', 8, 0, 0, 8, 0, 8, 0, 0, 8, 8, 8, 8, 0, 8);
I am trying to select emp1 records as follows:
EmpId, Date of the day, Hours worked per day
First, I have to calculate the date of the first day of a record (the day that corresponds to Day0)
using the week of the year and the year. Then I have to increment that date by 1, 2, ... 14 to get
the hours worked for each particular date.
Example: assume that week 20 of 2005 starts on 05/07/2005. That date corresponds to Day0 in the same
record; the Day1 column corresponds to the next day, 05/08/2005; Day2 becomes 05/09/2005, and so on.
Then, I have to print individual rows for each empid as:
100 05/07/2005 8
100 05/08/2005 8
.....
200 05/22/2003 0
200 05/23/2003 8
.. and so on for all empid's ...
Thank you.
Followup August 2, 2005 - 2pm Central time zone:
oh no, columns where rows should be :(
and basically you are saying "i need ROWS where these rows should be!"
tell me, how do you turn 20 into a date?

August 2, 2005 - 2pm Central time zone
Reviewer: A reader
Tom,
I should've explained it better. Week 20 of 2005 here should be translated to the first day of
week 20 of 2005 (assume it is 05/07/2005). That corresponds to Day0 of that row. Day1 becomes
05/08/2005, and so on.
Is there a function or approach that can convert columns to rows?
Followup August 2, 2005 - 3pm Central time zone:
no, i mean -- what function/logic/algorithm are you using to figure out "week 20 is this day"

August 2, 2005 - 9pm Central time zone
Reviewer: A reader
Tom,
Sorry, first of all, the date is not calculated the way I said above. It's not clear yet how the date
is obtained. This issue is under review, and I think I'll obtain the date by joining empid with some
table (say temp1). However, I am sure I will have to take a date (such as 05/07/2005) and associate it
with the Day0 column value; Day1 becomes 05/08/2005, and so on. What I am trying to get is SQL or
PL/SQL that can arrange the rows as described above. Any ideas? Thanks.
Followup August 3, 2005 - 10am Central time zone:
I cannot tell you how much I object to this model.
storing "week" and "year" - UGH.
storing them in STRINGS - UGH UGH UGH.
storing things that should be cross record in record UGH to the power of 10.
I had to fix your inserts; they did not work, so I added a day14 of zero.
ops$tkyte@ORA10G> with dates as
  (select to_date( '05/07/2005','mm/dd/yyyy')+level-1 dt, level-1 l
     from dual connect by level <= 15 )
select empid, dt,
       case when l = 0 then day0
            when l = 1 then day1
            when l = 2 then day2
            /* ... */
            when l = 13 then day13
            when l = 14 then day14
        end data
  from (select * from emp1 where week = 20), dates
/
EMP DT DATA
--- --------- ----------
100 07-MAY-05 8
100 08-MAY-05 8
100 09-MAY-05 0
100 10-MAY-05
100 11-MAY-05
100 12-MAY-05
100 13-MAY-05
100 14-MAY-05
100 15-MAY-05
100 16-MAY-05
100 17-MAY-05
100 18-MAY-05
100 19-MAY-05
100 20-MAY-05 8
100 21-MAY-05 0
15 rows selected.
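The pivot works by joining each emp1 row to the 15 generated rows. The offset l does two jobs: it selects the day<l> column, and it is added to the start date. Here is a minimal Python sketch of that unpivot. It assumes the day columns arrive as a list in day0..day14 order, and the function name is hypothetical. The listing above only writes out some of the CASE branches. This sketch fills in every day.

```python
from datetime import date, timedelta


def unpivot_days(empid, start, day_hours):
    """Turn one row's day0..dayN columns into (empid, date, hours) rows.

    Plays the role of the cartesian join with the 15-row DATES set:
    offset l picks column day<l> and the date start + l.
    """
    return [(empid, start + timedelta(days=l), hours)
            for l, hours in enumerate(day_hours)]


# Row 100 as Tom fixed it (a day14 of zero appended)
row_100 = [8, 8, 0, 8, 0, 8, 0, 0, 8, 8, 8, 8, 0, 8, 0]
for r in unpivot_days("100", date(2005, 5, 7), row_100):
    print(r)
```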

August 3, 2005 - 3pm Central time zone
Reviewer: A reader
Tom,
Thanks for the solution. I need some more help if you don't mind. The sql works excellently and I
experimented with it.
However, this question is based on a change of design here. The emp1 table is joined with the trn1
table (empid ~ trnid) to obtain values x and y. x and y should be passed to a function that returns a
date.
The emp1 table is like:
EmpId Day0 Day1 ..... Day14
100 8 8 8
200 0 0 8
300 8 8 0
400 0 8 8
500 8 0 8
trn1 table is like:
trnid x y
100 3 18
200 4 19
300 5 20
400 6 21
500 7 22
etc ...
create table emp1(empid varchar2(3), day0 number(2), day1 number(2), day2 number(2), day3
number(2), day4 number(2), day5 number(2), day6 number(2), day7 number(2), day8 number(2), day9
number(2), day10 number(2), day11 number(2), day12 number(2), day13 number(2), day14 number(2));
insert into emp1 values('100', 8, 8, 0, 8, 0, 8, 0, 0, 8, 8, 8, 8, 0, 8, 8);
insert into emp1 values('200', 0, 8, 0, 8, 0, 8, 0, 0, 8, 8, 8, 8, 0, 8, 0);
insert into emp1 values('300', 8, 8, 0, 8, 0, 8, 0, 0, 8, 8, 8, 8, 0, 0, 0);
insert into emp1 values('400', 0, 8, 0, 8, 0, 8, 0, 0, 8, 8, 8, 8, 0, 8, 8);
insert into emp1 values('500', 8, 0, 0, 8, 0, 8, 0, 0, 8, 8, 8, 8, 0, 8, 8);
create table trn1(empid varchar2(3), x number(2), y number(2));
insert into trn1 values('100', 3, 18);
insert into trn1 values('200', 4, 19);
insert into trn1 values('300', 5, 20);
insert into trn1 values('400', 6, 21);
insert into trn1 values('500', 7, 22);
I used this function on just one row of emp1 (by hard coding x and y values).
I replaced
with dates as
(select to_date( '05/07/2005','mm/dd/yyyy')+level-1 dt, level-1 l from dual
connect by level <= 15 )
with
with dates as
(select getXYDate(x,y)+level-1 dt, level-1 l from dual
connect by level <= 15 )
However, I am trying to apply this to every row of emp1 by obtaining x and y from trn1. There is
no week or year in the emp1 table. Any help? Thanks again.
Followup August 3, 2005 - 6pm Central time zone:
I didn't think it was possible, but now I like this even less than before! I didn't think you could
do that ;(
ops$tkyte@ORA10G> with dates as
  (select to_date( '05/07/2005','mm/dd/yyyy')+level-1 dt, level-1 l
     from dual connect by level <= 15 )
select empid, dt,
       case when l = 0 then day0
            when l = 1 then day1
            when l = 2 then day2
            /* ... */
            when l = 13 then day13
            when l = 14 then day14
        end data
  from ( QUERY ), dates
/
replace QUERY with a join of emp1 with trn1 and apply the function in there.

August 3, 2005 - 7pm Central time zone
Reviewer: A reader
Tom,
Sorry to bother you again. In my case, I think
(select to_date( '05/07/2005','mm/dd/yyyy') will not help me anymore, because I basically have to
find dates for Day0 .. Day14 of every row in the emp1 table. The first date (the date that corresponds
to Day0) for each record should be obtained using a function, by passing the x and y values of the
trn1 table, because each record may have different x, y values.
If it's not achievable this way, can you suggest an alternate approach? I am trying to write a
function that would use a loop. Also, the data should be written to a text file once complete; in
that case I think a procedure might help. If so, could you throw some light on it? Thanks for your
patience.
Followup August 3, 2005 - 8pm Central time zone:
well, you just need to generate a set of 15 numbers (L)
and add them in later then. No big change. You have the "start_date" from the function, right --
just add L to dt.

August 3, 2005 - 8pm Central time zone
Reviewer: A reader
OK, can you please show that, if possible?

August 3, 2005 - 9pm Central time zone
Reviewer: A reader
Tom,
I tried this and am getting an error: ORA-00904: "DAY13": invalid identifier
WITH DATES AS
(SELECT FUNC_XY(17,2003)+level-1 dt, level-1 l FROM DUAL
connect by level <= 15)
select empid, day0, day14, x, y, dt,
case when l = 0 then day0
when l = 1 then day1
when l = 2 then day2
when l = 3 then day3
when l = 4 then day4
when l = 5 then day5
when l = 6 then day6
when l = 7 then day7
when l = 8 then day8
when l = 9 then day9
when l = 10 then day10
when l = 11 then day11
when l = 12 then day12
when l = 13 then day13
when l = 14 then day14
end data
from (select emp1.empid, day0, day14, x, y from emp1, trn1 where emp1.empid = trn1.empid), dates
/
As said before ... I also have to use x and y instead of 17 and 2003 in order to compute it for
every row.
Followup August 4, 2005 - 8am Central time zone:
yeah, well -- you didn't select day1 through day13 in the inline view. fix that.
look, the concept is this:
with some_rows as ( select level-1 l from dual connect by level <= 15 )
select a.empid, a.dt+l, case when l=0 then a.day0
...
when l=14 then a.day14
end data
from some_rows,
(select emp1.empid, func_xy(trn1.x, trn1.y) dt,
emp1.day0, emp1.day1, .... <ALL OF THE DAYS>, emp1.day14
from emp1, trn1
where emp1.empid = trn1.empid ) a
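The same idea can be written procedurally. Each row first gets its own start date (func_xy applied to the joined trn1 row), and then the offsets 0..14 are added to it. The thread never shows what func_xy does, so this minimal Python sketch takes it as a parameter. The stand-in used here (fake_func_xy, which adds x + y days to Jan 1, 2005) is purely hypothetical, as are the dict shapes for emp1 and trn1.

```python
from datetime import date, timedelta


def expand_with_start(emp_rows, trn_rows, func_xy):
    """Join emp1 to trn1 on empid, compute each row's own start date with
    func_xy(x, y), then add offsets 0..14 as in the SOME_ROWS query.

    emp_rows: {empid: [day0, ..., day14]}
    trn_rows: {empid: (x, y)}
    func_xy:  stand-in for the poster's function (its logic is not shown).
    """
    out = []
    for empid in sorted(emp_rows):
        if empid not in trn_rows:  # inner join: skip unmatched rows
            continue
        x, y = trn_rows[empid]
        start = func_xy(x, y)
        for l, hours in enumerate(emp_rows[empid]):
            out.append((empid, start + timedelta(days=l), hours))
    return out


def fake_func_xy(x, y):
    # HYPOTHETICAL stand-in for func_xy: Jan 1, 2005 plus x + y days
    return date(2005, 1, 1) + timedelta(days=x + y)


emp1 = {"100": [8, 8, 0, 8, 0, 8, 0, 0, 8, 8, 8, 8, 0, 8, 8],
        "200": [0, 8, 0, 8, 0, 8, 0, 0, 8, 8, 8, 8, 0, 8, 0]}
trn1 = {"100": (3, 18), "200": (4, 19)}
for r in expand_with_start(emp1, trn1, fake_func_xy)[:3]:
    print(r)
```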

August 4, 2005 - 9am Central time zone
Reviewer: A reader
Tom,
Here, the SQL uses a.empid, a.dt+l ...
whereas the inner SQL uses emp1.day0, trn1.empid, etc. My real inner SQL uses some
more columns and joins as well. When this gave me an error, I just substituted emp1.day0, emp1.day14,
etc. with day0, day14, etc., and it worked. However, when there are several joins with alias
names, how should it be done?
To make it a bit clearer, this SQL looks similar to:
select emp1.empid, emp1.day0 from some_rows, (select emp1.empid, emp1.day0) ...
Any idea how to select from a select and still use multiple joins, etc.? Hope I am clear.
Followup August 4, 2005 - 9am Central time zone:
you can join as much as you WANT in the inline views.
Sorry, I cannot go further with this one, I've shown the technique -- it is just a pivot to turn
COLUMNS THAT SHOULD HAVE BEEN ROWS into rows -- very common.

August 4, 2005 - 9am Central time zone
Reviewer: A reader
Please ignore above post.
I need some help
August 9, 2005 - 10am Central time zone
Reviewer: Carlos
Insert into LOU_DATE
(IN_DATE, OUT_DATE)
Values
(TO_DATE('11/15/2004 17:42:56', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('11/18/2004 15:09:19',
'MM/DD/YYYY HH24:MI:SS'));
Insert into LOU_DATE
(IN_DATE, OUT_DATE)
Values
(TO_DATE('11/24/2004 09:38:15', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('11/30/2004 04:28:09',
'MM/DD/YYYY HH24:MI:SS'));
Insert into LOU_DATE
(IN_DATE, OUT_DATE)
Values
(TO_DATE('01/03/2005 14:36:24', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('01/05/2005 10:04:15',
'MM/DD/YYYY HH24:MI:SS'));
Insert into LOU_DATE
(IN_DATE, OUT_DATE)
Values
(TO_DATE('01/07/2005 08:54:59', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('01/10/2005 10:54:07',
'MM/DD/YYYY HH24:MI:SS'));
Insert into LOU_DATE
(IN_DATE, OUT_DATE)
Values
(TO_DATE('01/12/2005 10:13:13', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('01/18/2005 04:23:41',
'MM/DD/YYYY HH24:MI:SS'));
Insert into LOU_DATE
(IN_DATE, OUT_DATE)
Values
(TO_DATE('03/03/2005 03:15:05', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('03/09/2005 18:54:11',
'MM/DD/YYYY HH24:MI:SS'));
Insert into LOU_DATE
(IN_DATE, OUT_DATE)
Values
(TO_DATE('03/11/2005 13:25:40', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('03/15/2005 21:47:41',
'MM/DD/YYYY HH24:MI:SS'));
Insert into LOU_DATE
(IN_DATE, OUT_DATE)
Values
(TO_DATE('03/22/2005 20:27:03', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('03/29/2005 17:05:04',
'MM/DD/YYYY HH24:MI:SS'));
Insert into LOU_DATE
(IN_DATE, OUT_DATE)
Values
(TO_DATE('03/22/2005 20:27:15', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('03/30/2005 08:53:13',
'MM/DD/YYYY HH24:MI:SS'));
Insert into LOU_DATE
(IN_DATE, OUT_DATE)
Values
(TO_DATE('03/30/2005 13:16:00', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('04/16/2005 13:40:44',
'MM/DD/YYYY HH24:MI:SS'));
Insert into LOU_DATE
(IN_DATE, OUT_DATE)
Values
(TO_DATE('03/30/2005 15:08:39', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('04/16/2005 13:40:44',
'MM/DD/YYYY HH24:MI:SS'));
COMMIT;
Tom,
I hope you can help, since I have been struggling with this report. I would like to get something
like this.
IN OTHER WORDS, I WANT TO GET WHEN IT FIRST WAS LOGED IN INDATE AND WHEN IT WAS LAST LOGed IN
OUT_DATE. SORT OF LIKE MIN AND MAX. In this case, for example, for the month of March; however, it can
be for any given month. Any ideas how I can accomplish that?
IN_DATE OUT_DATE
3/22/2005 8:27:03 PM 3/30/2005 3:08:39 PM
----from the table above for the month of March
Followup August 9, 2005 - 10am Central time zone:
insufficient detail here, why won't min/max work for you for example.
but I don't understand the logic behind the two values you say you want, I don't get how you
arrived at them.
This is what I get
August 9, 2005 - 10am Central time zone
Reviewer: A reader
select in_date, out_date
from lou_date
where id = 201048
and ((out_date between to_date('01-MAR-05 00:00:00', 'DD-MON-RR HH24:MI:SS')
and to_date('31-MAR-05 23:59:59', 'DD-MON-RR HH24:MI:SS')) OR
(in_date between to_date('01-MAR-05 00:00:00', 'DD-MON-RR HH24:MI:SS')
and to_date('31-MAR-05 23:59:59', 'DD-MON-RR HH24:MI:SS')))
I get the following:
In_date out_date
3/22/2005 8:27:03 PM 3/29/2005 5:05:04 PM
3/30/2005 3:08:39 PM 4/16/2005 1:40:44 PM
Followup August 9, 2005 - 11am Central time zone:
ok,
Insert into LOU_DATE
(IN_DATE, OUT_DATE)
Values
(TO_DATE('03/11/2005 13:25:40', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('03/15/2005 21:47:41', 'MM/DD/YYYY HH24:MI:SS'));
why didn't you get that row, for example?

August 9, 2005 - 11am Central time zone
Reviewer: A reader
SQL Statement which produced this data:
select in_date, out_date
from lou_date
where ((out_date between to_date('01-MAR-05 00:00:00', 'DD-MON-RR HH24:MI:SS')
and to_date('31-MAR-05 23:59:59', 'DD-MON-RR HH24:MI:SS')) OR
(in_date between to_date('01-MAR-05 00:00:00', 'DD-MON-RR HH24:MI:SS')
and to_date('31-MAR-05 23:59:59', 'DD-MON-RR HH24:MI:SS')))
order by out_date
3/3/2005 3:15:05 AM 3/9/2005 6:54:11 PM
3/11/2005 1:25:40 PM 3/15/2005 9:47:41 PM
3/11/2005 1:25:40 PM 3/15/2005 9:47:41 PM
3/22/2005 8:27:03 PM 3/29/2005 5:05:04 PM
3/22/2005 8:27:15 PM 3/30/2005 8:53:13 AM
3/30/2005 1:16:00 PM 4/16/2005 1:40:44 PM
3/30/2005 3:08:39 PM 4/16/2005 1:40:44 PM
I guess my question is: when I get records that go beyond March, the date
beyond March should be replaced with blank or NULL, since I can't charge him/her
for April.
Followup August 9, 2005 - 12pm Central time zone:
I am so not following you here.

August 9, 2005 - 12pm Central time zone
Reviewer: A reader
Tom,
Pretend that you are charging someone for a particular month, let's say the month of March. So you
would like a query that reflects just that. A group of dates is given to you, and in that
group of dates you have multiple records with the same id. Also, some records were
initiated in March but came back in April. Here are the examples, but it should work with any
dates:
example 1.
in_date out_date
3/22/2005 8:27:15 PM 3/30/2005 8:53:13 AM
3/30/2005 1:16:00 PM 4/16/2005 1:40:44 PM
would like to see:
in_date out_date
3/22/2005 8:27:15 PM 3/30/2005 1:16:00 PM
example 2
In_date out_date
3/3/2005 3:15:05 AM 3/9/2005 6:54:11 PM
3/11/2005 1:25:40 PM 3/15/2005 9:47:41 PM
would like to see:
In_date out_date
3/3/2005 3:15:05 AM 3/15/2005 9:47:41 PM
Followup August 9, 2005 - 12pm Central time zone:
begs the question
in_date out_date
20-feb-2005 15-apr-2005
or
in_date out_date
3/22/2005 8:27:15 PM 3/25/2005 8:53:13 AM
3/30/2005 1:16:00 PM 4/16/2005 1:40:44 PM
what then. Being able to clearly specify the "goal" or the "algorithm" usually leads us straight to
the query itself. There are so many ambiguities here. Pretend you were actually documenting this
for a junior programmer to program. Give them the specifications. In gory detail.
please don't just answer these two what thens -- think of all of the cases (cause I'll just keep on
coming back with "what then" if you don't)
Remember -- I know NOTHING about your data, not a thing. This progression from
... I WANT TO GET WHEN IT FIRST WAS LOGED IN INDATE AND WHEN IT WAS
LAST LOGed IN OUT_DATE. SORT OF LIKE MIN AND MAX....
to this has been 'strange' to say the least.
Full explanation of requirements
August 9, 2005 - 3pm Central time zone
Reviewer: A reader
Sorry for the misunderstanding, Tom. Here are the full requirements. I hope I can explain it this
time.
The report is a billing report, and it goes as follows.
For example, for the month of March we have to bill
in the following way:
out_date date_in Bill
2/23 3/2 3/1 to 3/2
3/1 3/3 3/1 to 3/3
3/1 4/14 3/1 to 3/31
3/1 - 3/1 to 3/31
2/23 - 3/1 to 3/31
Followup August 9, 2005 - 3pm Central time zone:
well, i hope you give your programmers more detail. Here is the best I'll do
ops$tkyte@ORA9IR1> select t.*,
       greatest( in_date, to_date('mar-2005','mon-yyyy') ) fixed_in_date,
       least( nvl(out_date,to_date('3000','yyyy')), last_day( to_date( 'mar-2005', 'mon-yyyy' ) ) ) fixed_out_date
  from t
 where in_date < last_day( to_date( 'mar-2005', 'mon-yyyy' ) )+1
   and out_date >= to_date( 'mar-2005', 'mon-yyyy' );
IN_DATE OUT_DATE FIXED_IN_ FIXED_OUT
--------- --------- --------- ---------
03-MAR-05 09-MAR-05 03-MAR-05 09-MAR-05
11-MAR-05 15-MAR-05 11-MAR-05 15-MAR-05
22-MAR-05 29-MAR-05 22-MAR-05 29-MAR-05
22-MAR-05 30-MAR-05 22-MAR-05 30-MAR-05
30-MAR-05 16-APR-05 30-MAR-05 31-MAR-05
30-MAR-05 16-APR-05 30-MAR-05 31-MAR-05
6 rows selected.
predicate finds records that overlap march.
select adjusts the begin/end dates.
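Here is a minimal Python sketch of the same two steps, an overlap test followed by clipping. It follows the column roles in the query: in_date is the start and out_date is the end. The function name is hypothetical. There is one deliberate difference. The WHERE clause as written drops rows with a NULL out_date, because out_date >= ... is not true for NULL. The sketch instead keeps such rows as still open, mirroring the NVL in the SELECT list. That is an assumption, suggested by the "3/1 -" lines in the requirements.

```python
from datetime import datetime, timedelta


def clip_to_month(rows, year, month):
    """Keep (in_date, out_date) rows that overlap the month and clip them to it.

    Overlap test: in_date < first day of next month and out_date >= first day
    of the month. Clipped start = greatest(in_date, month start); clipped
    end = least(out_date, last_day(month)). A None out_date counts as open.
    """
    month_start = datetime(year, month, 1)
    if month == 12:
        next_month = datetime(year + 1, 1, 1)
    else:
        next_month = datetime(year, month + 1, 1)
    last_day = next_month - timedelta(days=1)  # midnight, like last_day()
    out = []
    for in_date, out_date in rows:
        end = out_date if out_date is not None else datetime.max
        if in_date < next_month and end >= month_start:
            out.append((in_date, out_date,
                        max(in_date, month_start), min(end, last_day)))
    return out


# The eleven LOU_DATE rows inserted earlier in the thread
lou_date = [
    (datetime(2004, 11, 15, 17, 42, 56), datetime(2004, 11, 18, 15, 9, 19)),
    (datetime(2004, 11, 24, 9, 38, 15), datetime(2004, 11, 30, 4, 28, 9)),
    (datetime(2005, 1, 3, 14, 36, 24), datetime(2005, 1, 5, 10, 4, 15)),
    (datetime(2005, 1, 7, 8, 54, 59), datetime(2005, 1, 10, 10, 54, 7)),
    (datetime(2005, 1, 12, 10, 13, 13), datetime(2005, 1, 18, 4, 23, 41)),
    (datetime(2005, 3, 3, 3, 15, 5), datetime(2005, 3, 9, 18, 54, 11)),
    (datetime(2005, 3, 11, 13, 25, 40), datetime(2005, 3, 15, 21, 47, 41)),
    (datetime(2005, 3, 22, 20, 27, 3), datetime(2005, 3, 29, 17, 5, 4)),
    (datetime(2005, 3, 22, 20, 27, 15), datetime(2005, 3, 30, 8, 53, 13)),
    (datetime(2005, 3, 30, 13, 16, 0), datetime(2005, 4, 16, 13, 40, 44)),
    (datetime(2005, 3, 30, 15, 8, 39), datetime(2005, 4, 16, 13, 40, 44)),
]
for row in clip_to_month(lou_date, 2005, 3):
    print(row)
```

As in the query output, six of the eleven rows survive, and the two April returns are clipped to 31-MAR.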
Thanks!!!
August 10, 2005 - 12pm Central time zone
Reviewer: A reader
Tom,
One more request. I would like to start the report with
the first time it went out. That is to say...
how it looks now with your help...
fix_in fix_out
3/22/2005 8:27:03 PM 3/29/2005 5:05:04 PM
3/30/2005 3:08:39 PM 3/31/2005
how the data looks
fix_in fix_out
3/22/2005 8:27:03 PM 3/29/2005 5:05:04 PM---first went out
3/30/2005 3:08:39 PM 4/16/2005 1:40:44 PM
How I would like to see it since we begin billing from
the first date the truck went out.
fix_in fix_out
3/29/2005 5:05:04 PM 3/30/2005 3:08:39 PM
3/30/2005 3:08:39 PM 3/31/2005
Thanks again Tom
Followup August 10, 2005 - 1pm Central time zone:
try to work it out yourself -- please.
why? because I'll do this little thing and it'll be "oh yeah, one more thing, when the data looks
like this...."
specifying requirements is like the most important thing in the world -- it is key, it is crucial.
It is obvious you know what you want (well, maybe -- it seems to change over time) but I don't "get
it" myself. Your simple example here with two rows begs so, so many questions, I don't even want to
get started.
You have lag() and lead() at your disposal; they probably come into play here. Check them out.
Thanks for help !
August 11, 2005 - 3pm Central time zone
Reviewer: A reader
The report is kind of tricky, especially when one of the dates originates in February and the other
date of the pair falls in March.
Hooked on Analytics worked for me!!
August 22, 2005 - 11am Central time zone
Reviewer: Greg from Toronto
I think I need to find a meeting group to help with my addiction ... I think I'm addicted to
analytics .. :\
Finally got a chance to read chapter 12 in "Expert Oracle" ... awesome!! 4 big, hairy Thumbs up!!
heh
But I got a question ... an "odd" behaviour that I don't understand ... was wondering if you could
help explain:
Test Script:
================
drop table junk2;
drop sequence seq_junk2;
create sequence seq_junk2;
create table junk2
(inv_num number,
cli_num number,
user_id number)
/
insert into junk2
values ( 123, 456, null );
insert into junk2
values ( 123, 678, null );
insert into junk2
values ( 234, 456, null );
insert into junk2
values ( 234, 678, null );
commit;
break on cli_num skip 1
select * from junk2;
select inv_num, cli_num,
NVL ( user_id, 999 ) chk1,
NVL2 ( user_id, 'NOT NULL', 'NULL' ) chk2,
seq_junk2.nextval seq,
FIRST_VALUE ( NVL ( user_id, seq_junk2.nextval ) )
OVER ( PARTITION BY cli_num ) user_id
from junk2
/
=====================
The final query shows this:
INV_NUM CLI_NUM CHK1 CHK2 SEQ USER_ID
---------- ---------- ---------- -------- ---------- ----------
123 456 999 NULL 1
234 999 NULL 2
123 678 999 NULL 3 2
234 999 NULL 4 2
4 rows selected.
and I'm kinda confused .. it appears that the analytic functions are not "processing" that sequence
... how do sequences and analytics work together?? (if at all??)
(In short, this is a simplified example of a bigger problem I tripped over. I'm trying to assign
new user_ids for existing clients, but only want 1 user_id assigned per client. The trick is, each
client can be associated with more than 1 investment ... so I have multiple rows with the same client,
but I want the same user_id assigned. Kind of: "Has this client got an id yet? If not, give him a new
one; otherwise display the one he's already been assigned".)
FIRST_VALUE and LAST_VALUE seemed the logical choice ...
The interesting thing is, when I use DBMS_RANDOM.VALUE (to assign a random PIN to start with) ...
it works fine. What am I missing/forgetting about sequences that changes their behaviour in this
regard?
Followup August 23, 2005 - 8am Central time zone:
that will be a tricky one, lots of assumptions on orders of rows processed and such.
that should throw an ora-2287 in my opinion.
I cannot see a safe way to do that without writing a plsql function and performing a lookup off to
the side by cli_num
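Tom suggests doing the lookup "off to the side", keyed by cli_num, instead of relying on how the sequence is evaluated inside an analytic function. Here is a minimal Python sketch of that lookup. A dict plays the role of the side lookup, and itertools.count stands in for the sequence. Both are assumptions made for illustration; this is not how Oracle evaluates sequences. The function name is hypothetical.

```python
from itertools import count


def assign_user_ids(rows, seq):
    """Give every client exactly one user_id.

    rows: (inv_num, cli_num, user_id) tuples; user_id may be None.
    seq:  iterator standing in for the sequence (e.g. itertools.count(1)).
    Clients that already have an id keep the first one found.
    """
    assigned = {}
    # First pass: remember any id a client already has.
    for _, cli_num, user_id in rows:
        if user_id is not None:
            assigned.setdefault(cli_num, user_id)
    # Second pass: "has this client got an id yet? if not, give him a new one".
    out = []
    for inv_num, cli_num, _ in rows:
        if cli_num not in assigned:
            assigned[cli_num] = next(seq)
        out.append((inv_num, cli_num, assigned[cli_num]))
    return out


junk2 = [(123, 456, None), (123, 678, None), (234, 456, None), (234, 678, None)]
print(assign_user_ids(junk2, count(1)))
# [(123, 456, 1), (123, 678, 2), (234, 456, 1), (234, 678, 2)]
```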
Sorry, I don't understand ...
August 23, 2005 - 11am Central time zone
Reviewer: Greg from Toronto
you wrote:
"that will be a tricky one, lots of assumptions on orders of rows processed and such."
I don't understand what assumptions I'm making ... in my example, I just got 4 rows, I don't care
what order they come back in, just so long as it deals with them in "groups of cli_nums" .. (hence
the partition by cli_num portion) ... if I "lose" sequence numbers, that's fine, too ... I don't
care about gaps in the sequence or "missing userids" ...
The only behaviour I'm seeing, is that the analytic function doesn't seem to be working with the
sequence properly ...
I guess I can simplify the question even further:
Why does the following query return "NULL" ?
SQL > select first_value ( seq_junk2.nextval ) over ( )
from dual
/
------more------
FIRST_VALUE(SEQ_JUNK2.NEXTVAL)OVER()
------------------------------------
1 row selected.
(with a "normal" sequence - nothing fancy):
SQL > select seq_junk2.nextval from dual;
------more------
NEXTVAL
----------
29
1 row selected.
Followup August 24, 2005 - 8am Central time zone:
as i said, i believe it should be raising an error (I have it on my list of things to file when I
get back in town).
I cannot make it work, I cannot think of a way to do it in a single statement, short of writing a
user defined function.
Connect by with self referenced parent
August 23, 2005 - 12pm Central time zone
Reviewer: Joe from Reston, VA
CONNECT BY works great, but I've run into a problem when the ultimate parent references itself as
its own parent. E.g., the data looks like:
SQL> select * from t;
OBJ_ID PARENT_ID
---------- ----------
1 1
2 1
3 1
4 2
5 4
But... using connect by generates an error..
SQL> select lpad(' ', 2*(level-1)) ||level "LEVEL",t.obj_id, t.parent_id
from t
connect by t.parent_id = prior t.obj_id;
ERROR:
ORA-01436: CONNECT BY loop in user data
If parent_id is null where obj_id = 1, then it's okay. Any suggestion on how to handle the other
case? I'm stumped.
Solution for connect by
August 23, 2005 - 5pm Central time zone
Reviewer: Logan Palanisamy from Sunnyvale, CA USA
SQL> select lpad(' ', 2*(level-1)) ||level "LEVEL",t.obj_id, t.parent_id
from t
connect by t.parent_id = prior t.obj_id and t.parent_id <> t.obj_id;
LEVEL OBJ_ID PARENT_ID
-------------------- ---------- ----------
1 1 1
2 2 1
3 4 2
4 5 4
2 3 1
1 2 1
2 4 2
3 5 4
1 3 1
1 4 2
2 5 4
1 5 4
12 rows selected.
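This query has no START WITH clause, so every row is used as a root. That is why 12 rows come back. Adding start with t.obj_id = t.parent_id should limit the result to the single tree rooted at 1, which is the first five rows above. Here is a minimal Python sketch of that walk. The function name is hypothetical. It treats a self-parented row as a root and does not follow its self-reference. It does not detect longer cycles.

```python
from collections import defaultdict


def walk_tree(edges):
    """Depth-first (level, obj_id, parent_id) listing, CONNECT BY style.

    edges: (obj_id, parent_id) pairs. A row whose parent_id equals its own
    obj_id is treated as a root, and that self-reference is not followed.
    Only self-loops are handled; longer cycles would recurse forever.
    """
    children = defaultdict(list)
    roots = []
    for obj_id, parent_id in edges:
        if obj_id == parent_id:
            roots.append((obj_id, parent_id))
        else:
            children[parent_id].append((obj_id, parent_id))
    out = []

    def visit(node, level):
        out.append((level,) + node)
        for child in sorted(children[node[0]]):
            visit(child, level + 1)

    for root in sorted(roots):
        visit(root, 1)
    return out


for row in walk_tree([(1, 1), (2, 1), (3, 1), (4, 2), (5, 4)]):
    print(row)
```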
re:Solution for connect by
August 24, 2005 - 8am Central time zone
Reviewer: Joe from Reston, VA
Thanks Logan. Often the solution is so simple! Thanks.
Seq problem
August 24, 2005 - 11am Central time zone
Reviewer: Bob B from Albany, NY
SELECT
A.*,
seq_junk2.currval CURR_SEQ,
seq_junk2.nextval - ROWNUM + VAL SEQ
FROM (
SELECT
inv_num,
cli_num,
NVL ( user_id, 999 ) chk1,
NVL2 ( user_id, 'NOT NULL', 'NULL' ) chk2,
DENSE_RANK() OVER ( ORDER BY CLI_NUM ) VAL
FROM JUNK2
) A
Might be a starting point. It works on the following ASSUMPTION: ROWNUM corresponds to the number
of times the sequence has been called. As Tom stated, this assumption can easily go out the window
(throw an analytic function or an order by on the outer query for a simple example).
A safer solution might be to run two updates. Update 1 will give a unique id to each null user id.
Update 2 will update the user id to the min or max user id for that cli_num. A little overhead,
but safer and simpler than the aforementioned alternative.
Still confused ... but working on it ...
August 24, 2005 - 1pm Central time zone
Reviewer: Greg from Toronto
Thanks, Bob!! Yeah, that does exactly what I wanted it to do, (but still doesn't really explain
the "why" part) ...
problem is, it looks like this is more a question on sequences now than analytics, so I'll see if I
can find a more appropriate thread to continue this on ..
Thanks!!
A slight twist on lag/lead
September 1, 2005 - 11am Central time zone
Reviewer: Sudha Bhagavatula from Buffalo, NY
That was useful to me. Could do a lot of queries easily. However I'm stuck at this point.
I have data like this:
subr_id dep_nbr grp eff_date term_date
1001 001 2112 01/01/2000 12/31/2000
1001 001 2112 01/01/2001 06/30/2001
1001 001 2112 07/01/2001 12/31/2001
1001 001 7552 01/01/2003 12/31/2003
1001 001 2112 06/30/2004 12/31/9999
I want my output to look like this:
subr_id dep_nbr grp eff_date term_date
1001 001 2112 01/01/2000 12/31/2001
1001 001 7552 01/01/2003 12/31/2003
1001 001 2112 06/30/2004 12/31/9999
How do I achieve this ?
Followup September 1, 2005 - 3pm Central time zone:
well, you should start by describing the logic from getting from A to B first.
otherwise it is just text. what are the rules that got you from inputs to outputs.
tell me the procedural algorithm you would use for example.
Rules from A to B
September 2, 2005 - 9am Central time zone
Reviewer: Sudha Bhagavatula from Buffalo, NY
A member is enrolled in a group for a timeframe. For all contiguous time frames for a group I can
take the min(eff_date) and max(term_date). For each break in group a new row with min(eff_date) and
max(term_date) again. So say a member was enrolled in a group from 01/01/2001 to 12/31/2001 and
then again with the same group from 01/01/2005 to 06/30/2005 then I need 2 rows for this member
with the dates as said just now. This is the sql that I'm running, hopefully I'm on the right track
but am stuck at this point:
SELECT SUBR_ID,
DEP_NBR,
GRP,
LAG_EFF_DATE,
LEAD_EFF_DATE,
EFF_DATE,
TERM_DATE,
LAG_TERM_DATE,
LEAD_TERM_DATE,
DECODE( LEAD_GRP, GRP, 1, 0 ) FIRST_OF_SET,
DECODE( LAG_GRP, GRP, 1, 0 ) LAST_OF_SET
FROM (SELECT M.SUBR_ID,
M.DEP_NBR,
LAG(GRP_NBR||SUB_GRP) OVER (PARTITION BY M.SUBR_ID, M.DEP_NBR ORDER BY CJ.EFF_DATE) LAG_GRP,
LEAD(GRP_NBR||SUB_GRP) OVER (PARTITION BY M.SUBR_ID, M.DEP_NBR ORDER BY CJ.EFF_DATE) LEAD_GRP,
GRP_NBR||SUB_GRP GRP,
CJ.EFF_DATE,
CJ.TERM_DATE,
LAG(CJ.EFF_DATE) OVER (PARTITION BY M.SUBR_ID, M.DEP_NBR ORDER BY CJ.EFF_DATE) LAG_EFF_DATE,
LEAD(CJ.EFF_DATE) OVER (PARTITION BY M.SUBR_ID, M.DEP_NBR ORDER BY CJ.EFF_DATE) LEAD_EFF_DATE,
LAG(CJ.TERM_DATE) OVER (PARTITION BY M.SUBR_ID, M.DEP_NBR ORDER BY CJ.EFF_DATE) LAG_TERM_DATE,
LEAD(CJ.TERM_DATE) OVER (PARTITION BY M.SUBR_ID, M.DEP_NBR ORDER BY CJ.EFF_DATE) LEAD_TERM_DATE
FROM DW.T_MEMBER_GROUP_JUNCTION CJ,
BCBS.T_GROUP_DIMENSION G,
BCBS.T_MEMBER_DIMENSION M
WHERE CJ.GRP_DIM_ID = G.GRP_DIM_ID
AND CJ.MBR_DIM_ID = M.MBR_DIM_ID
AND M.DEP_NBR != '000'
AND G.BENE_PKG IS NOT NULL)
WHERE LAG_GRP IS NULL
OR LEAD_GRP IS NULL
OR LEAD_GRP <> GRP
OR LAG_GRP <> GRP
Thanks for your reply.
Followup September 3, 2005 - 7am Central time zone:
you know, without a table, rows and something more concrete.... I have no comment.
More detail
September 4, 2005 - 10pm Central time zone
Reviewer: Sudha Bhagavatula from buffalo, NY
I have 3 tables:
Member_Dimension
Group_Dimension
Member_Group_Junction
Member_Dimension: columns are mbr_dim_id, subr_id, dep_nbr
Group_Dimension: columns are grp_dim_id, grp_nbr, sub_grp
Member_Group_Junction: columns are mbr_dim_id, grp_dim_id, eff_date, term_date
I have to create one row for each contiguous period of enrollment, with a new row for a new group or
a break in the dates.
Suppose a member (subr_id = 1001, dep_nbr = 001) is enrolled with group 001 from
01/01/2001 till 06/30/2001; he then changes to group 002 for the period 07/01/2001 till 12/31/2001.
He enrolls with the same group 002 from 01/01/2002 till 06/30/2002 with a change in benefits. He
then gets transferred to some other city or changes jobs. He joins group 001 again from
09/30/2003 till 11/30/2003 and quits again. He joins the same group 001 again from 01/01/2004 till
the present. The data in the junction table will look like this:
mbr_dim_id grp_dim_id eff_date term_date
1 1 01/01/2001 06/30/2001
1 2 07/01/2001 12/31/2001
1 2 01/01/2002 06/30/2002
1 1 09/30/2003 11/30/2003
1 1 01/01/2004 12/31/9999
My output should be like this:
mbr_dim_id grp_dim_id eff_date term_date
1 1 01/01/2001 06/30/2001
1 2 07/01/2001 06/30/2002
1 1 09/30/2003 11/30/2003
1 1 01/01/2004 12/31/9999
For each change in group or break in the contiguity of the dates I should get a new row. The
junction table is joined to the dimension tables on the respective dim_ids.
Hope I'm clearer this time.
Thanks
Sudha
Followup September 5, 2005 - 10am Central time zone:
tell you what, see
http://www.oracle.com/technology/oramag/oracle/04-mar/o24asktom.html
it shows a technique, in the "analytics to the rescue" article, that will be useful for grouping ranges
of records using the LAG() function.
But you need to read the text that you are supposed to read before putting an example here.
It is something I think I say a lot.
<quote>
If your followup requires a response that might include a query, you had better supply very very
simple create tables and insert statements. I cannot create a table and populate it for each and
every question. The SMALLEST create table possible (no tablespaces, no schema names, just like I do
in my examples for you)
</quote>
that is a direct cut and paste
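The rule described above (start a new row whenever the group changes or the next eff_date is more than one day after the previous term_date, then take min(eff_date) and max(term_date) per run) can be sketched procedurally. This is a minimal Python sketch, not Oracle code, assuming inclusive dates and rows already flattened to (member, grp, eff_date, term_date); it mirrors the "flag the start of a run with LAG, then carry the run number forward" idea, and the function name is hypothetical.

```python
from datetime import date
from itertools import groupby


def merge_enrollments(rows):
    """rows: iterable of (member, grp, eff_date, term_date); dates are inclusive.

    Returns one (member, grp, min_eff, max_term) tuple per contiguous run
    within the same group (hypothetical helper for illustration).
    """
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))  # partition by member, order by eff_date
    tagged = []
    run_id = 0
    prev = None
    for member, grp, eff, term in ordered:
        # "lag" test: first row, new member, new group, or a gap of more than one day
        starts_run = (
            prev is None
            or prev[0] != member
            or prev[1] != grp
            or (eff - prev[3]).days > 1
        )
        if starts_run:
            run_id += 1  # plays the role of carrying max(rn) forward
        tagged.append((run_id, member, grp, eff, term))
        prev = (member, grp, eff, term)

    merged = []
    for _, run in groupby(tagged, key=lambda t: t[0]):
        run = list(run)
        merged.append((run[0][1], run[0][2],
                       min(r[3] for r in run), max(r[4] for r in run)))
    return merged


junction = [
    (1, 1, date(2001, 1, 1), date(2001, 6, 30)),
    (1, 2, date(2001, 7, 1), date(2001, 12, 31)),
    (1, 2, date(2002, 1, 1), date(2002, 6, 30)),
    (1, 1, date(2003, 9, 30), date(2003, 11, 30)),
    (1, 1, date(2004, 1, 1), date(9999, 12, 31)),
]
for row in merge_enrollments(junction):
    print(row)
```

On the sample junction rows this yields the four output rows listed in the question: the two group-002 rows merge because 01/01/2002 is the day after 12/31/2001, while the two later group-001 rows stay separate because of the December 2003 gap.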
distinct last_value
September 6, 2005 - 4am Central time zone
Reviewer: Putchi from Stockholm, Sweden
When using last_value I am usually only interested in the last value, hence I need a DISTINCT in the
select to get it. It gives what I want, but it seems that the database has to do the work twice:
first a window sort and after that a unique sort. Is there any way to avoid the DISTINCT but still
get only one row per partition key?
create table a (num number(2), var1 varchar2(10), var2 varchar2(10));
insert into a values (1,'a','A');
insert into a values (2,'b','A');
insert into a values (3,'c','A');
insert into a values (1,'a','B');
insert into a values (2,'b','B');
insert into a values (3,'c','B');
commit;
SQL> select distinct
2 var2
3 ,last_value(var1) over (partition by var2 order by num
4 rows between unbounded preceding and unbounded following) var1
5 from a;
VAR2 VAR1
---------- ----------
A c
B c
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 SORT (UNIQUE)
2 1 WINDOW (SORT)
3 2 TABLE ACCESS (FULL) OF 'A'
Followup September 6, 2005 - 8am Central time zone:
nope, analytics are not aggregates, aggregates are not analytics.
A trick you can use to skip one or the other step is:
ops$tkyte@ORA817DEV> select var2,
2 substr( max(to_char( num,'fm0000000000') || var1), 11 ) data
3 from a
4 group by var2
5 /
VAR2 DATA
---------- -----------
A c
B c
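The trick works because a fixed-width, zero-padded number sorts as text in the same order as the number itself. So MAX() over the concatenation picks the row with the largest num, and SUBSTR strips the 10-character prefix off again. A minimal Python sketch of the same idea, assuming num is a non-negative integer that fits in 10 digits (the function name is hypothetical):

```python
def last_value_per_key(rows, width=10):
    """rows: iterable of (num, var1, var2).

    Emulates substr(max(to_char(num, 'fm0000000000') || var1), 11) ... group by var2
    in a single pass: no window sort followed by a DISTINCT sort.
    """
    best = {}
    for num, var1, var2 in rows:
        packed = f"{num:0{width}d}" + var1  # zero-padded so text order == numeric order
        if var2 not in best or packed > best[var2]:
            best[var2] = packed
    return {key: packed[width:] for key, packed in best.items()}


a = [(1, 'a', 'A'), (2, 'b', 'A'), (3, 'c', 'A'),
     (1, 'a', 'B'), (2, 'b', 'B'), (3, 'c', 'B')]
print(last_value_per_key(a))  # {'A': 'c', 'B': 'c'}
```

The padding is what makes this correct: without it, the string "9" would sort after "10".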
Analytics to the rescue
September 6, 2005 - 11am Central time zone
Reviewer: Sudha Bhagavatula from Buffalo, NY
Read that article. Helped me, but now I have another twist.
Create table contracts (subr_id varchar2(15), dep_nbr varchar2(3), grp_nbr varchar2(12), eff_date
date, term_date date);
insert into contracts values ('1001', '001', '2112', to_date('01/01/2000','mm/dd/yyyy'),
to_date('12/31/2000','mm/dd/yyyy'));
insert into contracts values ('1001', '001', '2112', to_date('01/01/2001','mm/dd/yyyy'),
to_date('06/30/2001','mm/dd/yyyy'));
insert into contracts values ('1001', '001', '2112', to_date('07/01/2001','mm/dd/yyyy'),
to_date('12/31/2001','mm/dd/yyyy'));
insert into contracts values ('1001', '001', '7552', to_date('01/01/2003','mm/dd/yyyy'),
to_date('12/31/2003','mm/dd/yyyy'));
insert into contracts values ('1001', '001', '2112', to_date('01/01/2004','mm/dd/yyyy'),
to_date('12/31/9999','mm/dd/yyyy'));
I ran this query to identify breaks in groups and dates for the above table:
select subr_id, dep_nbr, grp,
min_eff_date,
max_term_date
from
(select subr_id, dep_nbr, grp,
min(eff_date) min_eff_date,
max(term_date) max_term_date
from
(select subr_id, dep_nbr, eff_date, term_date, grp,
max(rn)
over(partition by subr_id, dep_nbr order by eff_date) max_rn
from
(select subr_id, dep_nbr, eff_date, term_date, grp,
(case
when eff_date-lag_term_date > 1
or lag_term_date is null
or lag_grp_nbr is null
or lag_grp_nbr <> grp
then row_num
end) rn
from (
select subr_id, dep_nbr, eff_date, term_date, grp_nbr grp,
lag(term_date)
over (partition by subr_id, dep_nbr order by eff_date) lag_term_date,
lag(grp_nbr)
over (partition by subr_id, dep_nbr order by eff_date) lag_grp_nbr,
row_number()
over (partition by subr_id, dep_nbr order by eff_date) row_num
from contracts )))
group by subr_id, dep_nbr, grp, max_rn )
order by subr_id, dep_nbr, min_eff_date
This gave me the output as :
subr_id dep_nbr grp eff_date term_date
1001 001 2112 01/01/2000 12/31/2001
1001 001 7552 01/01/2003 12/31/2003
1001 001 2112 01/01/2004 12/31/9999
I now have another table :
create table contract_pcp_junction (subr_id varchar2(15), dep_nbr varchar2(3), pcp_id varchar2(12),
eff_date date, term_date date);
insert into contract_pcp_junction values('1001','001','123765', to_date('07/01/2000','mm/dd/yyyy'),
to_date('06/30/2001','mm/dd/yyyy'));
insert into contract_pcp_junction values('1001','001','155165', to_date('01/01/2003','mm/dd/yyyy'),
to_date('12/31/9999','mm/dd/yyyy'));
This table identifies the provider coverage for each member. I need to identify the breaks in
coverage with regard to the contracts.
As per the data above, this member does not have a PCP from 01/01/2000 to 06/30/2000, nor
from 07/01/2001 to 12/31/2001.
I need to insert the breaks into another table. This table needs to have the subr_id, dep_nbr, grp
and eff_date, term_date.
create table contract_pcp_breaks (subr_id varchar2(15), dep_nbr varchar2(3), grp_nbr varchar2(12),
eff_date date, term_date date);
This table needs to have the data for the breaks
subr_id dep_nbr grp_nbr eff_date term_date
1001 001 2112 01/01/2000 06/30/2000
1001 001 2112 07/01/2001 12/31/2001
How do I do that? Hopefully I have provided the necessary scripts for you to work with.
Thanks a lot for your patience with this.
--Sudha
Followup September 6, 2005 - 8pm Central time zone:
yah, I have scripts, but no real idea how these tables relate. Your query looks overly complex for
the single table.
can't you take your data, join it, and get some "flat relation" on which simply using lag() will
solve the problem?
(please remember, you have been looking at this for hours. To you this data is natural. to
everyone else, it is just bits and bytes on the screen)
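One possible reading of the requirement, once the data is flattened: for each merged contract period, report the stretches that no PCP period covers. A minimal Python sketch under these assumptions: dates are inclusive, the contract periods are already merged as in the query output shown in the question, and the PCP rows belong to a single member. The function name is hypothetical. The sketch never computes term_date + 1 day when coverage runs to 12/31/9999, since that would overflow Python's date type.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def coverage_breaks(contracts, coverage):
    """contracts: list of (grp, eff_date, term_date); coverage: list of (eff_date, term_date).

    All dates are inclusive. Returns (grp, gap_eff, gap_term) for every stretch
    of a contract period that no coverage period overlaps.
    """
    periods = sorted(coverage)
    breaks = []
    for grp, eff, term in sorted(contracts, key=lambda c: c[1]):
        cursor = eff              # first day not yet known to be covered
        covered_to_end = False
        for c_eff, c_term in periods:
            if c_term < cursor or c_eff > term:
                continue          # this coverage period misses the remaining window
            if c_eff > cursor:
                breaks.append((grp, cursor, c_eff - ONE_DAY))
            if c_term >= term:
                covered_to_end = True   # avoids computing 12/31/9999 + 1 day
                break
            cursor = c_term + ONE_DAY
        if not covered_to_end:
            breaks.append((grp, cursor, term))
    return breaks


contracts = [
    ('2112', date(2000, 1, 1), date(2001, 12, 31)),
    ('7552', date(2003, 1, 1), date(2003, 12, 31)),
    ('2112', date(2004, 1, 1), date(9999, 12, 31)),
]
pcp = [(date(2000, 7, 1), date(2001, 6, 30)), (date(2003, 1, 1), date(9999, 12, 31))]
for row in coverage_breaks(contracts, pcp):
    print(row)
```

With the posted data this produces the two break rows the question asks for (01/01/2000 to 06/30/2000 and 07/01/2001 to 12/31/2001, both for group 2112).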
Combining two tables
September 9, 2005 - 6am Central time zone
Reviewer: Putchi from Stockholm, Sweden
Hi Tom!
I want to combine from/to history values from two tables into one sequence like this:
create table a (a varchar2(2)
,from_date date
,to_date date);
create table b (b varchar2(2)
,from_date date
,to_date date);
insert into a ( a, from_date, to_date ) values (
'a1', to_date( '01/13/2005', 'mm/dd/yyyy'), to_date('02/10/2005', 'mm/dd/yyyy'));
insert into a ( a, from_date, to_date ) values (
'a2', to_date( '02/10/2005', 'mm/dd/yyyy'), to_date( '05/01/2005', 'mm/dd/yyyy'));
insert into a ( a, from_date, to_date ) values (
'a3', to_date( '05/01/2005', 'mm/dd/yyyy'), to_date( '08/12/2005', 'mm/dd/yyyy'));
insert into b ( b, from_date, to_date ) values (
'b1', to_date( '01/13/2005', 'mm/dd/yyyy'), to_date( '01/22/2005', 'mm/dd/yyyy'));
insert into b ( b, from_date, to_date ) values (
'b2', to_date( '01/22/2005', 'mm/dd/yyyy'), to_date( '04/01/2005', 'mm/dd/yyyy'));
insert into b ( b, from_date, to_date ) values (
'b3', to_date( '04/01/2005', 'mm/dd/yyyy'), to_date( '09/07/2005', 'mm/dd/yyyy'));
commit;
select * from ("Magic");
A B FROM_DATE TO_DATE
-- -- ---------- ----------
a1 b1 2005-01-13 2005-01-22
a1 b2 2005-01-22 2005-02-10
a2 b2 2005-02-10 2005-04-01
a2 b3 2005-04-01 2005-05-01
a3 b3 2005-05-01 2005-08-12
Is it possible?
Followup September 9, 2005 - 8am Central time zone:
ops$tkyte@ORA10G> select a.* , b.*,
2 greatest(a.from_date,b.from_date),
3 least(a.to_date,b.to_date)
4 from a, b
5 where a.from_date <= b.to_date
6 and a.to_date >= b.from_date;
A FROM_DATE TO_DATE B FROM_DATE TO_DATE GREATEST( LEAST(A.T
-- --------- --------- -- --------- --------- --------- ---------
a1 13-JAN-05 10-FEB-05 b1 13-JAN-05 22-JAN-05 13-JAN-05 22-JAN-05
a1 13-JAN-05 10-FEB-05 b2 22-JAN-05 01-APR-05 22-JAN-05 10-FEB-05
a2 10-FEB-05 01-MAY-05 b2 22-JAN-05 01-APR-05 10-FEB-05 01-APR-05
a2 10-FEB-05 01-MAY-05 b3 01-APR-05 07-SEP-05 01-APR-05 01-MAY-05
a3 01-MAY-05 12-AUG-05 b3 01-APR-05 07-SEP-05 01-MAY-05 12-AUG-05
It won't be blindingly fast on huge things I would guess...
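If the join turns out to be slow on large tables, one alternative is a single merge-style sweep over both lists sorted by from_date, which pairs the periods in linear time. This is a swapped-in technique, not the query in the answer above. A minimal Python sketch, assuming that within each table the periods do not overlap and are half-open (to_date equals the next from_date, as in the sample data); the function name is hypothetical.

```python
from datetime import date


def overlay(a_rows, b_rows):
    """a_rows, b_rows: lists of (label, from_date, to_date), half-open [from, to),
    each non-overlapping within itself. Returns (a, b, from, to) for every pair
    that overlaps for a positive length of time, in date order."""
    a_rows = sorted(a_rows, key=lambda r: r[1])
    b_rows = sorted(b_rows, key=lambda r: r[1])
    out = []
    i = j = 0
    while i < len(a_rows) and j < len(b_rows):
        a, a_from, a_to = a_rows[i]
        b, b_from, b_to = b_rows[j]
        start, end = max(a_from, b_from), min(a_to, b_to)  # greatest() / least()
        if start < end:
            out.append((a, b, start, end))
        # advance whichever period finishes first (both if they finish together)
        if a_to <= b_to:
            i += 1
        if b_to <= a_to:
            j += 1
    return out


a_periods = [('a1', date(2005, 1, 13), date(2005, 2, 10)),
             ('a2', date(2005, 2, 10), date(2005, 5, 1)),
             ('a3', date(2005, 5, 1), date(2005, 8, 12))]
b_periods = [('b1', date(2005, 1, 13), date(2005, 1, 22)),
             ('b2', date(2005, 1, 22), date(2005, 4, 1)),
             ('b3', date(2005, 4, 1), date(2005, 9, 7))]
for row in overlay(a_periods, b_periods):
    print(row)
```

One design note: with half-open periods, the join's `<=` / `>=` tests would also pair periods that merely touch (a.to_date = b.from_date), which gives zero-length rows. That does not happen with this sample data. Using strict `<` and `>` in the join, or `start < end` as here, excludes such rows.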

September 9, 2005 - 9am Central time zone
Reviewer: Putchi from Stockholm, Sweden
OK, I will try it and see if it works; the real tables will have hundreds of thousands of records. I tried this
myself, but I couldn't come up with something that filled in the "null" values.
SQL> select a,b,from_date,lead(from_date) over (order by from_date)
2 from (
3 select a,null b,from_date,to_date from a
4 union all
5 select null a,b,from_date,to_date from b
6 order by from_date
7 );
A B FROM_DATE LEAD(FROM_
-- -- ---------- ----------
a1 2005-01-13 2005-01-13
b1 2005-01-13 2005-01-22
b2 2005-01-22 2005-02-10
a2 2005-02-10 2005-04-01
b3 2005-04-01 2005-05-01
a3 2005-05-01
Followup September 9, 2005 - 9am Central time zone:
that query won't work -- you need to join.
How to get the 1ST row of this distinct value in a single SELECT
September 16, 2005 - 11am Central time zone
Reviewer: Sean Chang from VA,USA
Thank you, Tom.
I have been reading about analytic functions for a while, but still
can't figure out a way to select the first row for each distinct
column value in a single SELECT statement, i.e.
by running the CREATE and INSERTs below:
create table INV (
inv# number(7),
add_time date ,
inv_type varchar2(10),
amount number(8,2));
insert into inv values(1, sysdate-1, 'CASH', 100);
insert into inv values(1, sysdate, 'VISA', 200);
insert into inv values(1, sysdate+1, 'COD', 100);
insert into inv values(1, sysdate, 'VISA', 200);
insert into inv values(2, sysdate, 'MC', 10);
insert into inv values(3, sysdate-1, 'AMEX', 30);
insert into inv values(3, sysdate, 'CASH', 30);
I can get the first row of distinct INV# this way:
select * from (select a.*,
rank() over (partition by inv# order by add_time) time_order
from inv a) where time_order=1;
But how can I achieve this with a single SELECT statement?
The reason is that we have lots of tables where we only need to
look at the very first row for each value of a column, and I
don't want to end up with lots of in-line views in the SELECT
statement.
Followup September 16, 2005 - 1pm Central time zone:
that is a single select.
why not? (on the lots of in-line views). If you think they are evil - then you wouldn't like my
code ;)
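For comparison, here is what the inline view plus `where time_order = 1` computes, written procedurally. This is a minimal Python sketch with a hypothetical helper name, using integer day offsets in place of sysdate. One difference to keep in mind: rank() returns every row tied for first place, while row_number() returns exactly one. This sketch behaves like row_number(). Because Python's sort is stable, it keeps the earliest input row among ties, whereas Oracle makes no such promise.

```python
from itertools import groupby
from operator import itemgetter


def first_per_key(rows, key, order):
    """Keep the first row of each `key` value when ordered by `order`;
    like row_number() over (partition by key order by order) = 1."""
    ordered = sorted(rows, key=itemgetter(key, order))
    return [next(group) for _, group in groupby(ordered, key=itemgetter(key))]


inv = [
    {'inv': 1, 'add_time': -1, 'inv_type': 'CASH', 'amount': 100},
    {'inv': 1, 'add_time': 0, 'inv_type': 'VISA', 'amount': 200},
    {'inv': 1, 'add_time': 1, 'inv_type': 'COD', 'amount': 100},
    {'inv': 1, 'add_time': 0, 'inv_type': 'VISA', 'amount': 200},
    {'inv': 2, 'add_time': 0, 'inv_type': 'MC', 'amount': 10},
    {'inv': 3, 'add_time': -1, 'inv_type': 'AMEX', 'amount': 30},
    {'inv': 3, 'add_time': 0, 'inv_type': 'CASH', 'amount': 30},
]
for row in first_per_key(inv, 'inv', 'add_time'):
    print(row['inv'], row['inv_type'])
```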
Are analytics a good fit in this situation?
October 3, 2005 - 10am Central time zone
Reviewer: A reader
select b.damage_inspection_date,
b.damage_inspection_by
,b.status
,NVL(a.cnt,0) CNT
from
(select aa.damage_inspection_date,
aa.damage_inspection_by,
bb.status
from (select distinct trunc(gc.damage_inspection_date) damage_inspection_date,
gc.damage_inspection_by
from gate_damages gd, gate_containers gc
where gd.gate_id = gc.gate_id
) aa,
(select *
from (select 'MAJOR' STATUS from dual
union all
select 'MINOR' STATUS from dual
union all
select 'TOTAL' STATUS from dual
)
) bb
)b,
((SELECT damage_inspection_date,
damage_inspection_by,
Status,
cnt
FROM (select trunc(c.damage_inspection_date) damage_inspection_date,
c.damage_inspection_by,
'MAJOR' STATUS,
count(distinct c.gate_id) cnt
from gate_containers c,
gate_damages d
where c.gate_id = d.gate_id and
d.damage_type_code = 'A'
group by trunc(c.damage_inspection_date),c.damage_inspection_by
UNION ALL
select trunc(g.damage_inspection_date) damage_inspection_date,
g.damage_inspection_by,
'MINOR' STATUS,
count(distinct g.gate_id) cnt
from gate_containers g,
gate_damages z
where g.gate_id = z.gate_id and
z.damage_type_code = 'F'
group by trunc(g.damage_inspection_date),g.damage_inspection_by
UNION ALL
select trunc(ab.damage_inspection_date) damage_inspection_date,
ab.damage_inspection_by,
'TOTAL' STATUS,
count(distinct ab.gate_id) cnt
from gate_containers ab,
gate_damages ac
where ab.gate_id = ac.gate_id
group by trunc(ab.damage_inspection_date),ab.damage_inspection_by
)
group by damage_inspection_date, damage_inspection_by, status, cnt
)
) a
where b.damage_inspection_by = a.damage_inspection_by(+)
and b.damage_inspection_date = a.damage_inspection_date(+)
and b.status = a.status(+);
Followup October 3, 2005 - 11am Central time zone:
((SELECT damage_inspection_date,
damage_inspection_by,
Status,
cnt
FROM (select trunc(c.damage_inspection_date) damage_inspection_date,
c.damage_inspection_by,
'MAJOR' STATUS,
count(distinct c.gate_id) cnt
from gate_containers c,
gate_damages d
where c.gate_id = d.gate_id and
d.damage_type_code = 'A'
group by trunc(c.damage_inspection_date),c.damage_inspection_by
UNION ALL
select trunc(g.damage_inspection_date) damage_inspection_date,
g.damage_inspection_by,
'MINOR' STATUS,
count(distinct g.gate_id) cnt
from gate_containers g,
gate_damages z
where g.gate_id = z.gate_id and
z.damage_type_code = 'F'
group by trunc(g.damage_inspection_date),g.damage_inspection_by
UNION ALL
select trunc(ab.damage_inspection_date) damage_inspection_date,
ab.damage_inspection_by,
'TOTAL' STATUS,
count(distinct ab.gate_id) cnt
from gate_containers ab,
gate_damages ac
where ab.gate_id = ac.gate_id
group by trunc(ab.damage_inspection_date),ab.damage_inspection_by
)
should be a single query without UNIONs - you don't need to make three passes on that data
select ..., count(distinct case when damage_code = 'A' then gate_id),
count(distinct case when damage_code = 'F' then gate_id end),
count(distinct gate_id)
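The single query works because one pass over the join can feed all three counts. Each CASE expression returns gate_id only for the matching damage code and NULL otherwise, and COUNT ignores NULLs. A minimal Python sketch of the same one-pass idea, under these assumptions: rows from the gate_containers/gate_damages join are already reduced to (inspection day, inspector, gate_id, damage_type_code), and, as in the query being replaced, 'A' means major and 'F' means minor. The helper name and sample rows are hypothetical.

```python
from collections import defaultdict


def damage_counts(rows, major_code='A', minor_code='F'):
    """rows: iterable of (inspection_day, inspector, gate_id, damage_type_code).

    One pass; returns {(day, inspector): (major, minor, total)} where each value
    is a count of DISTINCT gate_ids, like count(distinct case ... end).
    """
    major, minor, total = defaultdict(set), defaultdict(set), defaultdict(set)
    for day, inspector, gate_id, code in rows:
        k = (day, inspector)
        total[k].add(gate_id)
        if code == major_code:
            major[k].add(gate_id)
        elif code == minor_code:
            minor[k].add(gate_id)
    return {k: (len(major[k]), len(minor[k]), len(total[k])) for k in total}


sample = [
    ('2004-06-12', 'CRAIG', 36, 'A'),
    ('2004-06-12', 'CRAIG', 36, 'F'),
    ('2004-06-12', 'CRAIG', 37, 'F'),
    ('2004-06-12', 'CRAIG', 37, 'F'),   # duplicate damage row: counted once
    ('2004-06-12', 'LOU', 39, 'B'),
]
print(damage_counts(sample))
```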
Great!
October 3, 2005 - 4pm Central time zone
Reviewer: A reader
Tom,
When I put in the changes, it says "missing keyword". What am I doing wrong?
select b.damage_inspection_date,
b.damage_inspection_by
,b.status
,NVL(a.cnt,0) CNT
from
(select aa.damage_inspection_date,
aa.damage_inspection_by,
bb.status
from (select distinct trunc(gc.damage_inspection_date)
damage_inspection_date, gc.damage_inspection_by
from gate_damages gd, gate_containers gc
where gd.gate_id = gc.gate_id
) aa,
(select *
from (select 'MAJOR' STATUS from dual
union all
select 'MINOR' STATUS from dual
union all
select 'TOTAL' STATUS from dual
)
) bb
)b,
((SELECT damage_inspection_date,
damage_inspection_by,
Status,
count(distinct case when damage_code = 'A' then gate_id),
count(distinct case when damage_code = 'F' then gate_id end),
count(distinct gate_id))
from gate_containers ab,gate_damages ac
where ab.gate_id = ac.gate_id
group by trunc(ab.damage_inspection_date),ab.damage_inspection_by
)
where b.damage_inspection_by = a.damage_inspection_by(+)
and b.damage_inspection_date = a.damage_inspection_date(+)
and b.status = a.status(+);
Followup October 3, 2005 - 8pm Central time zone:
sorry, I am not a sql compiler, I cannot reproduce since I don't have the tables or anything.
Case when ... then ... end
October 4, 2005 - 8am Central time zone
Reviewer: Greg from Toronto
Just lucked out and saw this:
"select ..., count(distinct case when damage_code = 'A' then gate_id),
count(distinct case when damage_code = 'F' then gate_id end),
count(distinct gate_id)"
Should be:
"select ..., count(distinct case when damage_code = 'A' then gate_id end),
count(distinct case when damage_code = 'F' then gate_id end),
count(distinct gate_id)"
Tom just missed the "end" for the case statement ... (I got lucky and spotted it .. heh)
Followup October 4, 2005 - 4pm Central time zone:
(that is why i always ask for create tables and inserts - without them, it is not possible to test)
thanks!!
October 4, 2005 - 2pm Central time zone
Reviewer: A reader
Well Taken
October 5, 2005 - 10am Central time zone
Reviewer: A reader
Tom,
This is what I would like to see..
damage_inspection_date damage_inspection_by counts
xx/xx/xxxx Louis 2 minors
xx/xx/xxxx juan 1 major
thanks.
can analytics help me?
October 5, 2005 - 2pm Central time zone
Reviewer: Susan from Watertown, MA
My result set needs to be ordered by the sum of multiple columns, with a weight assigned to each column. The
SQL below works and gives me what I want, but maybe there is an analytic function solution?
Thanks for all your help.
SELECT ename, job, sal, comm FROM scott.BONUS
ORDER BY DECODE(job, -2, 0, job)*100000+DECODE(sal, -2, 0, sal)*10000+DECODE(comm, -2,0,comm)*100
DESC
Followup October 5, 2005 - 3pm Central time zone:
not in this case - you want to order by a simple function of attributes of a single row.
You don't need to look across rows - analytics look across rows.
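To make the distinction concrete: the ORDER BY expression can be computed from each row on its own, so a plain sort is enough. A minimal Python sketch with hypothetical numeric values for job, sal and comm, treating the sentinel -2 as 0 the way the DECODE does (the function name is hypothetical):

```python
def bonus_sort_key(row):
    """Weighted ORDER BY expression from the query above; uses only this row."""
    def val(x):
        return 0 if x == -2 else x  # DECODE(x, -2, 0, x)
    return val(row['job']) * 100000 + val(row['sal']) * 10000 + val(row['comm']) * 100


bonus = [
    {'ename': 'X', 'job': 1, 'sal': 2, 'comm': -2},
    {'ename': 'Y', 'job': -2, 'sal': 5, 'comm': 3},
    {'ename': 'Z', 'job': 1, 'sal': 3, 'comm': 0},
]
print([r['ename'] for r in sorted(bonus, key=bonus_sort_key, reverse=True)])
```

No row's key depends on any other row, which is exactly why no analytic function is needed here.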
Thanks Tom
October 5, 2005 - 3pm Central time zone
Reviewer: Susan from Watertown, MA
Thanks for your reply. Do you agree with the DECODE approach or am I missing a more elegant
solution?
Followup October 5, 2005 - 8pm Central time zone:
the decode looks fine here - shorter than case but in this "case" just as easy to read.
Tom
October 5, 2005 - 4pm Central time zone
Reviewer: A reader
Tom,
Can you please point me in the right direction...
This is what I am getting with the following query...
damage_inspection_date damage_inspection_by status
6/12/2004 CCCT MAJOR
6/12/2004 CCCT MINOR
6/12/2004 CCCT TOTAL
6/12/2004 LOU MAJOR
6/12/2004 LOU MINOR
and this is what I would like to get....
damage_inspection_date damage_inspection_by status count
6/12/2004 CCCT MAJOR 2
6/12/2004 CCCT MINOR 2
6/12/2004 CCCT TOTAL 1
select b.damage_inspection_date,
b.damage_inspection_by
,b.status
from
(select aa.damage_inspection_date,
aa.damage_inspection_by,
bb.status
from (select distinct trunc(gc.damage_inspection_date) damage_inspection_date,
gc.damage_inspection_by
from gate_damages gd, gate_containers gc
where gd.gate_id = gc.gate_id
) aa,
(select *
from (select 'MAJOR' STATUS from dual
union all
select 'MINOR' STATUS from dual
union all
select 'TOTAL' STATUS from dual
)
) bb
)b,
((SELECT ab.damage_inspection_date,
damage_inspection_by,
STATUS_CODE,
count(distinct case when ac.damage_location_code = 'A' then ab.gate_id end),
count(distinct case when ac.damage_location_code = 'F' then ab.gate_id end),
count(distinct ab.gate_id )
from gate_containers ab,gate_damages ac
where ab.gate_id = ac.gate_id
group by ab.damage_inspection_date,ab.damage_inspection_by,status_code,
ab.gate_id))a
where b.damage_inspection_by = a.damage_inspection_by(+)
and b.damage_inspection_date = a.damage_inspection_date(+)
group by (b.damage_inspection_date, b.damage_inspection_by,b.status)
Followup October 5, 2005 - 8pm Central time zone:
....
damage_inspection_date damage_inspection_by status
6/12/2004 CCCT MAJOR
6/12/2004 CCCT MINOR
6/12/2004 CCCT TOTAL
6/12/2004 LOU MAJOR
6/12/2004 LOU MINOR
and this is what I would like to get....
damage_inspection_date damage_inspection_by status count
6/12/2004 CCCT MAJOR 2
6/12/2004 CCCT MINOR 2
6/12/2004 CCCT TOTAL 1
...
by what "logic"? can you explain how you get from A to B?
follow up
October 6, 2005 - 9am Central time zone
Reviewer: A reader
Tom,
I already got the first part done. All I need is to somehow show the counts in another
column: how many minor, major and total I have. Is that possible?
Just maybe like in the second example.
Followup October 6, 2005 - 11am Central time zone:
first part of WHAT?
more information
October 6, 2005 - 12pm Central time zone
Reviewer: A reader
Sorry about the lack of information before.
Here I will try to do better. I am trying to write
a query where I need to count the majors and minors
and then get a total.
requirements:
1. If a container has both majors and minors, total the
counts: major + minor = total count.
2. Where a container has a minor and no major, count the minor only:
count = minor.
inspector major minor total
1 major, 0 minor , other 1 1
inspector
2 major , 1 minor , other 2 1 3
inspector
0 major, 1 minor, other 0 1 1
Followup October 6, 2005 - 1pm Central time zone:
sorry -- going back to your original example, I still cannot see the logic behind "what I have" and
"what I want" there.
I don't know what you mean by "i have the first part"
this what I have now
October 6, 2005 - 2pm Central time zone
Reviewer: A reader
Tom,
This is my query and result...
select b.damage_inspection_date,
b.damage_inspection_by
,b.status
,NVL(a.cnt,0) CNT
from
(select aa.damage_inspection_date,
aa.damage_inspection_by,
bb.status
from (select distinct trunc(gc.damage_inspection_date) damage_inspection_date,
gc.damage_inspection_by
from gate_damages gd, gate_containers gc
where gd.gate_id = gc.gate_id
) aa,
(select *
from (select 'MAJOR' STATUS from dual
union all
select 'MINOR' STATUS from dual
union all
select 'TOTAL' STATUS from dual
)
) bb
)b,
((SELECT damage_inspection_date,
damage_inspection_by,
Status,
cnt
FROM (select trunc(c.damage_inspection_date) damage_inspection_date,
c.damage_inspection_by,
'MAJOR' STATUS,
count(distinct c.gate_id) cnt
from gate_containers c,
gate_damages d
where c.gate_id = d.gate_id and
d.damage_type_code = 'F'
group by trunc(c.damage_inspection_date),c.damage_inspection_by
UNION ALL
select trunc(g.damage_inspection_date) damage_inspection_date,
g.damage_inspection_by,
'MINOR' STATUS,
count(distinct g.gate_id) cnt
from gate_containers g,
gate_damages z
where g.gate_id = z.gate_id and
z.damage_type_code = 'A'
group by trunc(g.damage_inspection_date),g.damage_inspection_by
UNION ALL
select trunc(ab.damage_inspection_date) damage_inspection_date,
ab.damage_inspection_by,
'TOTAL' STATUS,
count(distinct ab.gate_id) cnt
from gate_containers ab,
gate_damages ac
where ab.gate_id = ac.gate_id(+) and
SUBSTR(ab.action,2,1) != 'C'
group by trunc(ab.damage_inspection_date),ab.damage_inspection_by
)
group by damage_inspection_date, damage_inspection_by, status, cnt
)
) a
where b.damage_inspection_by = a.damage_inspection_by(+)
and b.damage_inspection_date = a.damage_inspection_date(+)
and b.status = a.status(+);
RESULT:
SQL Statement which produced this data:
select * from MAJOR_MINOR_COUNT_VIEW
where rownum < 10
6/12/2004 CCCT TOTAL 1
6/12/2004 CRAIG TOTAL 6
6/13/2004 CCCT TOTAL 5
6/14/2004 CCCT TOTAL 46
6/14/2004 FYFE TOTAL 30
6/14/2004 HALM TOTAL 38
6/14/2004 MUTH MAJOR 2
6/14/2004 MUTH MINOR 14
6/14/2004 MUTH TOTAL 40
And I would like to have it as in
the requirements above... hope this helps.
Followup October 6, 2005 - 2pm Central time zone:
take your query - call it Q
select inspector,
max(decode(status,'MINOR',cnt)) minor,
max(decode(status,'MAJOR',cnt)) major,
max(decode(status,'TOTAL',cnt)) total
from (Q)
group by inspector
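The pivot works because, after GROUP BY, each status feeds exactly one column and MAX() simply picks out that single value. DECODE returns NULL for every other status, and MAX ignores NULLs. A minimal Python sketch of the same reshaping, grouping by date and inspector (Q returns both) and using a few rows from the result posted above; the function name is hypothetical.

```python
def pivot_status(rows):
    """rows: (inspection_date, inspector, status, cnt), e.g. the output of query Q.

    Mirrors max(decode(status, 'MINOR', cnt)) ... group by date, inspector:
    a status with no row stays None, just as DECODE yields NULL.
    """
    out = {}
    for day, inspector, status, cnt in rows:
        rec = out.setdefault((day, inspector), {'MINOR': None, 'MAJOR': None, 'TOTAL': None})
        if status in rec and (rec[status] is None or cnt > rec[status]):
            rec[status] = cnt
    return out


q = [
    ('6/12/2004', 'CCCT', 'TOTAL', 1),
    ('6/12/2004', 'CRAIG', 'TOTAL', 6),
    ('6/14/2004', 'MUTH', 'MAJOR', 2),
    ('6/14/2004', 'MUTH', 'MINOR', 14),
    ('6/14/2004', 'MUTH', 'TOTAL', 40),
]
for key, counts in pivot_status(q).items():
    print(key, counts)
```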
Year to date + month to date
October 6, 2005 - 2pm Central time zone
Reviewer: reader from US
CREATE TABLE TEST (ID VARCHAR2(10),sale_dt DATE ,amount NUMBER(6,2) );
INSERT INTO TEST VALUES ('aa','14-OCT-2005',65.25);
INSERT INTO TEST VALUES ('aa','14-OCT-2005',56.25);
INSERT INTO TEST VALUES ('aa','15-SEP-2005',72.25);
INSERT INTO TEST VALUES ('aa','19-OCT-2005',43.25);
INSERT INTO TEST VALUES ('bb','14-SEP-2005',67.25);
INSERT INTO TEST VALUES ('bb','13-OCT-2005',235.25);
INSERT INTO TEST VALUES ('bb','15-OCT-2005',365.25);
INSERT INTO TEST VALUES ('bb','14-NOV-2005',465.25);
INSERT INTO TEST VALUES ('bb','14-SEP-2005',165.25);
commit;
SELECT DISTINCT id,sale_dt,SUM (amount)
OVER (PARTITION BY id ORDER BY sale_dt ASC) sale_daily,
SUM (amount)
OVER (PARTITION BY id, TO_CHAR(invoice_dt, 'MON-YYYY') ORDER BY TO_CHAR(sale_dt, 'MON-YYYY') ASC)
mon_sal,
SUM (sale_price_usd * qty_sold)
OVER (PARTITION BY id, TO_CHAR(sale_dt, 'YYYY') ORDER BY TO_CHAR(sale_dt, 'YYYY') ASC) yr_sal,
FROM test
ID SALE_DT SALE_DAILY MON_SAL YR_SAL
---------- --------- ---------- ---------- ----------
aa 15-SEP-05 72.25 72.25 237
aa 14-OCT-05 121.5 164.75 237
aa 19-OCT-05 43.25 164.75 237
bb 14-SEP-05 232.5 232.5 1298.25
bb 13-OCT-05 235.25 600.5 1298.25
bb 15-OCT-05 365.25 600.5 1298.25
bb 14-NOV-05 465.25 465.25 1298.25
7 rows selected.
Ideally ,it should have been ----
ID SALE_DT SALE_DAILY MON_SAL YR_SAL
---------- --------- ---------- ---------- ----------
aa 15-SEP-05 72.25 72.25 72.25
aa 14-OCT-05 121.5 121.5 193.75
aa 19-OCT-05 43.25 164.75 237
bb 14-SEP-05 232.5 232.5 232.5
bb 13-OCT-05 235.25 235.25 467.75
bb 15-OCT-05 365.25 600.5 833.0
bb 14-NOV-05 465.25 465.25 1298.25
How can I do this ?
Will appreciate your help .
THANKS
Followup October 6, 2005 - 3pm Central time zone:
ideally - there would be a qty_sold column somewhere :)
ideally you will ONLY use to_char to *format* data, never to process it.
trunc(invoice_dt,'y') NOT to_char(invoice_dt,'yyyy')
trunc(sale_dt,'mm') NOT to_char(sale_dt, 'MON-YYYY' )
Year to Date and Month to date
October 6, 2005 - 10pm Central time zone
Reviewer: READER from US
As per your suggestion, I made the changes, but I still need your help.
CREATE TABLE TEST (ID VARCHAR2(10),sale_dt DATE ,amount NUMBER(6,2) );
INSERT INTO TEST VALUES ('aa','14-OCT-2005',65.25);
INSERT INTO TEST VALUES ('aa','14-OCT-2005',56.25);
INSERT INTO TEST VALUES ('aa','15-SEP-2005',72.25);
INSERT INTO TEST VALUES ('aa','19-OCT-2005',43.25);
INSERT INTO TEST VALUES ('bb','14-SEP-2005',67.25);
INSERT INTO TEST VALUES ('bb','13-OCT-2005',235.25);
INSERT INTO TEST VALUES ('bb','15-OCT-2005',365.25);
INSERT INTO TEST VALUES ('bb','14-NOV-2005',465.25);
INSERT INTO TEST VALUES ('bb','14-SEP-2005',165.25);
commit;
SELECT DISTINCT id,sale_dt,SUM (amount)
OVER (PARTITION BY id ORDER BY sale_dt ASC) sale_daily,
SUM (amount)
OVER (PARTITION BY id,trunc(sale_dt,'MM') ORDER BY trunc(sale_dt,'MM') ASC) mon_sal,
SUM (amount)
OVER (PARTITION BY id,trunc(sale_dt,'Y') ORDER BY trunc(sale_dt,'Y') ASC) yr_sal
FROM test
ID SALE_DT SALE_DAILY MON_SAL YR_SAL
---------- --------------------- ---------- ---------- ----------
aa 9/15/2005 72.25 72.25 237
aa 10/14/2005 193.75 164.75 237
aa 10/19/2005 237 164.75 237
bb 9/14/2005 232.5 232.5 1298.25
bb 10/13/2005 467.75 600.5 1298.25
bb 10/15/2005 833 600.5 1298.25
bb 11/14/2005 1298.25 465.25 1298.25
7 rows selected
Ideally ,it should have been ----
ID SALE_DT SALE_DAILY MON_SAL YR_SAL
---------- --------- ---------- ---------- ----------
aa 15-SEP-05 72.25 72.25 72.25
aa 14-OCT-05 121.5 121.5 193.75
aa 19-OCT-05 43.25 164.75 237
bb 14-SEP-05 232.5 232.5 232.5
bb 13-OCT-05 235.25 235.25 467.75
bb 15-OCT-05 365.25 600.5 833.0
bb 14-NOV-05 465.25 465.25 1298.25
Thanks again .
Followup October 7, 2005 - 8am Central time zone:
you shall have to explain how you derived your "optimal" output.
certainly isn't sorted by anything? I don't get the numbers.
Year to date /Month to date
October 7, 2005 - 9am Central time zone
Reviewer: Reader from US
I wish to create a summary table where we will have the sale for every day, the sales up to that day in that
month, and the sales up to that day in that year,
i.e. a running total or cumulative total.
Thanks
Followup October 7, 2005 - 8pm Central time zone:
ok?
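A minimal Python sketch of that description, under two assumptions: "sale for every day" means the total per id per calendar day, and the month-to-date and year-to-date figures restart at each new month and year for each id. In Oracle terms this presumably corresponds to aggregating by id and sale_dt first, then applying SUM() OVER (PARTITION BY id, TRUNC(sale_dt,'MM') ORDER BY sale_dt), and likewise with TRUNC(sale_dt,'Y'). Ordering the window by TRUNC(sale_dt,'MM') instead makes every row in a month a peer of every other, which is why the query above returns whole-month totals. Names are hypothetical.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal


def running_totals(sales):
    """sales: iterable of (id, sale_dt, amount).

    Returns one row per (id, day): (id, day, day_total, month_to_date, year_to_date),
    with both running totals restarting at each new month / year for each id.
    """
    daily = defaultdict(Decimal)          # total per id per calendar day
    for sale_id, dt, amount in sales:
        daily[(sale_id, dt)] += amount

    out = []
    mtd = ytd = Decimal(0)
    prev = None                           # (id, month start, year start) of previous row
    for sale_id, dt in sorted(daily):
        month, year = dt.replace(day=1), dt.replace(month=1, day=1)
        if prev is None or prev[0] != sale_id or prev[2] != year:
            ytd = Decimal(0)
        if prev is None or prev[0] != sale_id or prev[1] != month:
            mtd = Decimal(0)
        day_total = daily[(sale_id, dt)]
        mtd += day_total
        ytd += day_total
        out.append((sale_id, dt, day_total, mtd, ytd))
        prev = (sale_id, month, year)
    return out


sales_rows = [
    ('aa', date(2005, 10, 14), Decimal('65.25')),
    ('aa', date(2005, 10, 14), Decimal('56.25')),
    ('aa', date(2005, 9, 15), Decimal('72.25')),
    ('aa', date(2005, 10, 19), Decimal('43.25')),
    ('bb', date(2005, 9, 14), Decimal('67.25')),
    ('bb', date(2005, 10, 13), Decimal('235.25')),
    ('bb', date(2005, 10, 15), Decimal('365.25')),
    ('bb', date(2005, 11, 14), Decimal('465.25')),
    ('bb', date(2005, 9, 14), Decimal('165.25')),
]
for row in running_totals(sales_rows):
    print(row)
```

On the sample data this gives, for example, a day total of 121.5, a month-to-date of 121.5 and a year-to-date of 193.75 for aa on 14-OCT-05.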
Follow up
October 7, 2005 - 9am Central time zone
Reviewer: A reader
Tom,
The above pivot worked well; however, my counts are off since
I ONLY want to count the minor when there is no Major.
Something like this..
major minor count
1 major, 0 minor , other 1 1
2 major , 1 minor , other 2 2
0 major, 1 minor, other 0 1 1
* count the minor when there is no major
CREATE TABLE GATE_CONTAINERS
(
GATE_ID NUMBER ,
VISIT NUMBER ,
REFERENCE_ID NUMBER ,
DAMAGE_INSPECTION_BY VARCHAR2(30),
DAMAGE_INSPECTION_DATE DATE
);
Insert into GATE_CONTAINERS
(GATE_ID, VISIT)
Values
(1, 1);
Insert into GATE_CONTAINERS
(GATE_ID, VISIT)
Values
(17, 10);
Insert into GATE_CONTAINERS
(GATE_ID, VISIT)
Values
(21, 12);
Insert into GATE_CONTAINERS
(GATE_ID, VISIT)
Values
(31, 18);
Insert into GATE_CONTAINERS
(GATE_ID, VISIT)
Values
(33, 19);
Insert into GATE_CONTAINERS
(GATE_ID, VISIT, DAMAGE_INSPECTION_DATE, DAMAGE_INSPECTION_BY)
Values
(36, 22, TO_DATE('06/12/2004 11:48:49', 'MM/DD/YYYY HH24:MI:SS'), 'CRAIG');
Insert into GATE_CONTAINERS
(GATE_ID, VISIT, DAMAGE_INSPECTION_DATE, DAMAGE_INSPECTION_BY)
Values
(37, 23, TO_DATE('06/12/2004 11:50:11', 'MM/DD/YYYY HH24:MI:SS'), 'CRAIG');
Insert into GATE_CONTAINERS
(GATE_ID, VISIT, DAMAGE_INSPECTION_DATE, DAMAGE_INSPECTION_BY)
Values
(39, 25, TO_DATE('06/12/2004 11:48:19', 'MM/DD/YYYY HH24:MI:SS'), 'CRAIG');
Insert into GATE_CONTAINERS
(GATE_ID, VISIT)
Values
(45, 30);
COMMIT;
CREATE TABLE GATE_DAMAGES
(
GATE_ID NUMBER NOT NULL,
DAMAGE_LOCATION_CODE VARCHAR2(5 BYTE) NOT NULL,
DAMAGE_TYPE_CODE VARCHAR2(5 BYTE) NOT NULL
);
Insert into GATE_DAMAGES
(GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE)
Values
(34, '01', '9');
Insert into GATE_DAMAGES
(GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE)
Values
(34, '02', 'C');
Insert into GATE_DAMAGES
(GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE)
Values
(37, '01', 'B');
Insert into GATE_DAMAGES
(GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE)
Values
(62, '05', 'B');
Insert into GATE_DAMAGES
(GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE)
Values
(101, '23', 'C');
Insert into GATE_DAMAGES
(GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE)
Values
(183, '99', '9');
Insert into GATE_DAMAGES
(GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE)
Values
(188, '01', 'D');
Insert into GATE_DAMAGES
(GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE)
Values
(188, '04', 'B');
Insert into GATE_DAMAGES
(GATE_ID, DAMAGE_LOCATION_CODE, DAMAGE_TYPE_CODE)
Values
(188, '07', 'B');
COMMIT;
Followup October 7, 2005 - 8pm Central time zone:
The above pivot worked well, however my count are off since
I ONLY want to count the minor when there is no Major.
Something like this..
major minor count
1 major, 0 minor , other 1 1
2 major , 1 minor , other 2 2
0 major, 1 minor, other 0 1 1
so tell me why there are minor counts when major > 0???
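One possible reading of "only count the minor when there is no major", stated here as an assumption because the thread does not settle it: classify each container (gate_id) once, as major if it has any major damage, otherwise as minor if it has any minor damage, and let total count every damaged container. The damage codes are parameters because the posted queries use 'A' and 'F' for opposite statuses at different points. The defaults follow the query posted on October 6 ('F' = major, 'A' = minor). The helper name and sample rows are hypothetical. A minimal Python sketch under that assumption:

```python
from collections import defaultdict


def classify_containers(damages, major_code='F', minor_code='A'):
    """damages: iterable of (inspector, gate_id, damage_type_code).

    Assumed rule: a container with any major damage counts as major only;
    it counts as minor only when it has minor damage and no major damage.
    Returns {inspector: {'major': n, 'minor': n, 'total': n}}.
    """
    codes = defaultdict(set)
    for inspector, gate_id, code in damages:
        codes[(inspector, gate_id)].add(code)

    result = defaultdict(lambda: {'major': 0, 'minor': 0, 'total': 0})
    for (inspector, _gate_id), seen in codes.items():
        counts = result[inspector]
        counts['total'] += 1
        if major_code in seen:
            counts['major'] += 1
        elif minor_code in seen:
            counts['minor'] += 1
    return dict(result)


damages = [('LOU', 1, 'F'), ('LOU', 1, 'A'), ('LOU', 2, 'A'), ('LOU', 3, 'B'), ('JUAN', 4, 'F')]
print(classify_containers(damages))
```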
and this is my query
October 7, 2005 - 9am Central time zone
Reviewer: A reader
select damage_inspection_date,damage_inspection_by,
max(decode(status,'MINOR',cnt)) minor,
max(decode(status,'MAJOR',cnt)) major,
max(decode(status,'TOTAL',cnt)) total
from (select b.damage_inspection_date,
b.damage_inspection_by
,b.status
,NVL(a.cnt,0) CNT
from
(select aa.damage_inspection_date,
aa.damage_inspection_by,
bb.status
from (select distinct trunc(gc.damage_inspection_date) damage_inspection_date,
gc.damage_inspection_by
from gate_damages gd, gate_containers gc
where gd.gate_id = gc.gate_id
) aa,
(select *
from (select 'MAJOR' STATUS from dual
union all
select 'MINOR' STATUS from dual
union all
select 'TOTAL' STATUS from dual
)
) bb
)b,
((SELECT damage_inspection_date,
damage_inspection_by,
Status,
cnt
FROM (select trunc(c.damage_inspection_date) damage_inspection_date,
c.damage_inspection_by,
'MAJOR' STATUS,
count(distinct c.gate_id) cnt
from gate_containers c,
gate_damages d
where c.gate_id = d.gate_id and
d.damage_type_code = 'F'
group by trunc(c.damage_inspection_date),c.damage_inspection_by
UNION ALL
select trunc(g.damage_inspection_date) damage_inspection_date,
g.damage_inspection_by,
'MINOR' STATUS,
count(distinct g.gate_id) cnt
from gate_containers g,
gate_damages z
where g.gate_id = z.gate_id and
z.damage_type_code = 'A'
group by trunc(g.damage_inspection_date),g.damage_inspection_by
UNION ALL
select trunc(ab.damage_inspection_date) damage_inspection_date,
ab.damage_inspection_by,
'TOTAL' STATUS,
count(distinct ab.gate_id) cnt
from gate_containers ab,
gate_damages ac
where ab.gate_id = ac.gate_id(+) and
SUBSTR(ab.action,2,1) != 'C'
group by trunc(ab.damage_inspection_date),ab.damage_inspection_by
)
group by damage_inspection_date, damage_inspection_by, status, cnt
)
) a
where b.damage_inspection_by = a.damage_inspection_by(+)
and b.damage_inspection_date = a.damage_inspection_date(+)
and b.status = a.status(+)
)
group by damage_inspection_date,damage_inspection_by
I got it....