Usage of Analytic Functions within a query having grouping

Tom Kyte

Thanks for the question, Rennie.

Asked: May 31, 2002 - 5:24 pm UTC

Last updated: August 11, 2009 - 1:52 pm UTC

Version: 8.1.7

You Asked

Tom,
Table tab1 has 3 columns: col1, col2, and col3.
I have a query grouped on col1. Columns col2 and col3 have non-unique values for a particular value of col1. I want the value of col2 for the row having the maximum value of col3 within the col1 grouping.
Tab1
col1 col2 col3
'A' 'x' 1
'A' 'x' 3
'A' 'y' 5
'B' 'c' 9
'B' 'd' 2
'B' 'c' 7
In this case I want my output to be

col1 max_val
'A' 'y'
'B' 'c'

It's something like this: if you treat col1 as a class, col2 as a student, and col3 as marks, I want the student having maximum marks in that class. But the column col3 contains marks for all subjects, so the student will have multiple records under that class.

CREATE TABLE tab1 (col1 VARCHAR2(10),col2 VARCHAR2(10), col3 NUMBER(10));

INSERT INTO tab1(col1,col2,col3) VALUES('A','x',1);
INSERT INTO tab1(col1,col2,col3) VALUES('A','x',3);
INSERT INTO tab1(col1,col2,col3) VALUES('A','y',5);

INSERT INTO tab1(col1,col2,col3) VALUES('B','c',9);
INSERT INTO tab1(col1,col2,col3) VALUES('B','d',2);
INSERT INTO tab1(col1,col2,col3) VALUES('B','c',7);

SELECT col1, <greatest col2> FROM tab1 GROUP BY col1;

Please suggest a method other than using an inline view, because in the real scenario col2 and col3 come from an inline view, as they are obtained using analytic functions.

I am not able to combine an analytic function with a grouping, and it gives me the wrong result. Please throw some light on this in general as well, since I have many queries affected by this. The error I get is 'window functions are not allowed here'.

Please advise.

regards
rennie


Hi Tom,
I am sorry that I replied to this id. It's just that your site does not
reflect multiple updates. Here is the query I was trying. I have added
the table creation and insert SQL but could not add this as well.

I wanted something like this.

SELECT col1, MAX(FIRST_VALUE(col2) OVER (PARTITION BY col1 ORDER BY col3
DESC))
FROM tab1
GROUP BY col1;

This gives error - ORA-30483. My question is that on a query that has a
group by clause, how do i use an analytic function.

and Tom said...



Let's say it was valid -- what would it return? It would return the MAX( constant ) for a given col1. It seems you want

select DISTINCT col1,
FIRST_VALUE(col2) OVER (PARTITION BY col1 ORDER BY col3 DESC)
from tab1;

Since first_value (being partitioned by col1) will effectively be a CONSTANT by col1 -- distinct is what you are looking for.

scott@ORA817DEV.US.ORACLE.COM> select deptno,
2 max(first_value(sal) over (partition by deptno order by job DESC )) fv
3 from emp
4 group by deptno
5 /
max(first_value(sal) over (partition by deptno order by job DESC )) fv
*
ERROR at line 2:
ORA-30483: window functions are not allowed here


scott@ORA817DEV.US.ORACLE.COM> select DISTINCT deptno,
2 first_value(sal) over (partition by deptno order by job DESC ) fv
3 from emp
4 /

DEPTNO FV
---------- ----------
10 5000
20 2975
30 1250

scott@ORA817DEV.US.ORACLE.COM>

Alternatively, but not as good, you can

scott@ORA817DEV.US.ORACLE.COM> select deptno, max(fv)
2 from ( select deptno,
3 first_value(sal) over (partition by deptno order by job DESC ) fv
4 from emp
5 )
6 group by deptno
7 /

DEPTNO MAX(FV)
---------- ----------
10 5000
20 2975
30 1250




And yes, I allow you ONE chance to update a question -- your update will flip a bit and the question becomes mine again -- so if you prematurely submit the update -- the question goes into my queue and your updates don't "stick". I've added an error message to that effect now -- you'll be told "the question is in tom's queue"...




Comments

Analytic Function

Rennie Sreekumar, June 03, 2002 - 4:48 pm UTC

Thanks Tom,
Your response was fast. What I wanted was to have it in the same query, avoiding an inline view. Seems that is not possible.
Can you please tell me why Oracle does not support something like GROUP FUNCTION on (<analytic function>)?
For example:
MAX(FIRST_VALUE(col2) OVER (PARTITION BY col1 ORDER BY col3 DESC))

regards
rennie

Tom Kyte
June 03, 2002 - 5:39 pm UTC

Because it doesn't make SENSE? Your query doesn't really "compute" (to me anyway)

Distinct is exactly what you are looking for.

select DISTINCT col1,
FIRST_VALUE(col2) OVER (PARTITION BY col1 ORDER BY col3 DESC)
from tab1;

that

a) has no inline view
b) gets you the answer
c) makes "sense"
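
(Not from the original thread: on later releases -- the FIRST/LAST aggregate, which I believe arrived with 9i -- the same answer falls out of a plain GROUP BY, with no DISTINCT and no inline view. A minimal sketch:)

select col1,
       max(col2) keep (dense_rank last order by col3) max_val
  from tab1
 group by col1;

The keep (dense_rank last order by col3) clause restricts the MAX to just the rows having the greatest col3 in each col1 group.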

Could Analytic Functions be Used Here?

Richard, December 16, 2003 - 11:05 am UTC

Hi,

My set up:
Oracle9i Enterprise Edition Release 9.2.0.1.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.1.0 - Production


I have a table named myTable, created as follows:
DROP TABLE myTable
/


CREATE TABLE myTable (
Ship_date DATE NOT NULL
,Ship_type VARCHAR2(10) NOT NULL
,Shipment NUMBER(6,0) NOT NULL
)
/


It has data in it as follows:
SHIP_DATE SHIP_TYPE SHIPMENT
---------- ---------- ----------
01-01-2000 SHIP1 27
01-01-2000 SHIP1 26
01-01-2000 SHIP1 25
01-01-2000 SHIP2 24
01-01-2000 SHIP2 23
01-01-2000 SHIP2 22
01-01-2000 SHIP3 21
01-01-2000 SHIP3 20
01-01-2000 SHIP3 19
01-01-2001 SHIP1 18
01-01-2001 SHIP1 17
01-01-2001 SHIP1 16
01-01-2001 SHIP2 15
01-01-2001 SHIP2 14
01-01-2001 SHIP2 13
01-01-2001 SHIP3 12
01-01-2001 SHIP3 11
01-01-2001 SHIP3 10
01-01-2002 SHIP1 9
01-01-2002 SHIP1 8
01-01-2002 SHIP1 7
01-01-2002 SHIP2 6
01-01-2002 SHIP2 5
01-01-2002 SHIP2 4
01-01-2002 SHIP3 3
01-01-2002 SHIP3 2
01-01-2002 SHIP3 1


I wish to select only those rows that have the greatest Shipment value, per Ship_type, per Ship_date. The result I want is:
SHIP_DATE SHIP_TYPE SHIPMENT
---------- ---------- ----------
01-01-2002 SHIP1 9
01-01-2002 SHIP2 6
01-01-2002 SHIP3 3


I have created a query which does this, as follows:
SELECT ship_date, ship_type, shipment
FROM myTable
WHERE TO_CHAR( TRUNC( ship_date ))
||
ship_type
||
TO_CHAR( shipment )
IN
(
SELECT MAX
(
TO_CHAR( TRUNC( ship_date ))
||
ship_type
||
TO_CHAR( shipment )
)
FROM myTable
GROUP BY ship_TYPE
)
/


However, I think it's clunky and is probably a rubbishy way of going about it. Is my suspicion correct that analytics might be of use?

Thanks in advance.


P.S. myTable can be populated with:
BEGIN
INSERT INTO myTable VALUES('01-01-2000', 'SHIP1', 27);
INSERT INTO myTable VALUES('01-01-2000', 'SHIP1', 26);
INSERT INTO myTable VALUES('01-01-2000', 'SHIP1', 25);
INSERT INTO myTable VALUES('01-01-2000', 'SHIP2', 24);
INSERT INTO myTable VALUES('01-01-2000', 'SHIP2', 23);
INSERT INTO myTable VALUES('01-01-2000', 'SHIP2', 22);
INSERT INTO myTable VALUES('01-01-2000', 'SHIP3', 21);
INSERT INTO myTable VALUES('01-01-2000', 'SHIP3', 20);
INSERT INTO myTable VALUES('01-01-2000', 'SHIP3', 19);
INSERT INTO myTable VALUES('01-01-2001', 'SHIP1', 18);
INSERT INTO myTable VALUES('01-01-2001', 'SHIP1', 17);
INSERT INTO myTable VALUES('01-01-2001', 'SHIP1', 16);
INSERT INTO myTable VALUES('01-01-2001', 'SHIP2', 15);
INSERT INTO myTable VALUES('01-01-2001', 'SHIP2', 14);
INSERT INTO myTable VALUES('01-01-2001', 'SHIP2', 13);
INSERT INTO myTable VALUES('01-01-2001', 'SHIP3', 12);
INSERT INTO myTable VALUES('01-01-2001', 'SHIP3', 11);
INSERT INTO myTable VALUES('01-01-2001', 'SHIP3', 10);
INSERT INTO myTable VALUES('01-01-2002', 'SHIP1', 9);
INSERT INTO myTable VALUES('01-01-2002', 'SHIP1', 8);
INSERT INTO myTable VALUES('01-01-2002', 'SHIP1', 7);
INSERT INTO myTable VALUES('01-01-2002', 'SHIP2', 6);
INSERT INTO myTable VALUES('01-01-2002', 'SHIP2', 5);
INSERT INTO myTable VALUES('01-01-2002', 'SHIP2', 4);
INSERT INTO myTable VALUES('01-01-2002', 'SHIP3', 3);
INSERT INTO myTable VALUES('01-01-2002', 'SHIP3', 2);
INSERT INTO myTable VALUES('01-01-2002', 'SHIP3', 1);
COMMIT;
END;
/

Tom Kyte
December 16, 2003 - 12:01 pm UTC

ops$tkyte@ORA9IR2> select * from (
  2  select myTable.*, max(shipment) over (partition by ship_date, ship_type) max_shipment
  3    from myTable
  4    ) where shipment = max_shipment;
 
SHIP_DATE  SHIP_TYPE    SHIPMENT MAX_SHIPMENT
---------- ---------- ---------- ------------
01-01-2000 SHIP1              27           27
01-01-2000 SHIP2              24           24
01-01-2000 SHIP3              21           21
01-01-2001 SHIP1              18           18
01-01-2001 SHIP2              15           15
01-01-2001 SHIP3              12           12
01-01-2002 SHIP1               9            9
01-01-2002 SHIP2               6            6
01-01-2002 SHIP3               3            3
 
9 rows selected.
 

A reader, December 16, 2003 - 1:45 pm UTC

!!!!!

A Bit More Help, Please!

Richard, December 18, 2003 - 4:19 am UTC

Hi,

I didn't word my initial request very well (my apologies).

What I SHOULD have asked is:

I wish to select only those rows that have the greatest Shipment value, per Ship_type, per Ship_date, showing ONLY those records with greatest Ship_date per Ship_type. The result I want is:
SHIP_DATE SHIP_TYPE SHIPMENT
---------- ---------- ----------
01-01-2002 SHIP1 9
01-01-2002 SHIP2 6
01-01-2002 SHIP3 3

i.e. using the data I supplied, I would wish to see the 3 records above only.

As an aside, does using TO_CHAR (as I have done in my supplied example) *on the fly* perform better, worse, or the same as a function that would do the same?

Merry Christmas & Happy New Year!


Tom Kyte
December 18, 2003 - 11:13 am UTC

just use the same technique!  a simple variation on a theme.



ops$tkyte@ORA920PC> select *
  2    from (
  3  select myTable.*, max(ship_date) over (partition by ship_type) max_ship_date
  4    from (
  5  select myTable.*, max(shipment) over (partition by ship_date, ship_type) max_shipment
  6    from myTable
  7    )  myTable
  8   where shipment = max_shipment
  9   )
 10   where ship_date = max_ship_date
 11  /
 
SHIP_DATE  SHIP_TYPE    SHIPMENT MAX_SHIPMENT MAX_SHIP_D
---------- ---------- ---------- ------------ ----------
01-01-2002 SHIP1               9            9 01-01-2002
01-01-2002 SHIP2               6            6 01-01-2002
01-01-2002 SHIP3               3            3 01-01-2002
 

I don't know how to answer your other question -- as I don't see how "a function" (whatever you mean by that -- as TO_CHAR is itself a function) would help at all.  not sure what you are asking to compare. 

Excellent Answers!

Richard, December 20, 2003 - 11:56 am UTC

Thank you very much for your help!

Forget about my second question regarding TO_CHAR... I'm no longer sure what I meant myself!!!

Fishing out a row as of a specific date

Michael, January 15, 2004 - 5:16 pm UTC

I have a mock audit table based on hr.employees. I copied all the data from employees into employee_aud and added some columns for sys_tm, os_usr, and operation (update, etc.). I have been using some of your advice in other threads to organize the data using the lag function. Still learning on those analytics. Now, I have a different problem. One of the requirements is to pull a row out of the audit table for a specific PK (in my test case, employee_id) showing the data as of a given date, even if there is no sys_tm that matches it. We cannot use WM for versioning, due to unique constraints that are not PKs. Suppose I have a number of updates for a given employee_id (108) where the sys_tm dates are 01-JAN-98, 12-FEB-98, 30-JUN-98, and 26-SEP-98. For each update, the triggers populate all of the columns in the audit table: if a column did not change, it has the old value; if it did change, the new value. Someone wants to know what the data looked like on 15-MAY-98. I have been trying to tweak the logic above, but I keep returning all of the rows that match those 4 dates.
Here is my latest attempt; it still returns all 4 dates listed above, not just the most recent one preceding the 15-MAY-98 date.

select * from (select employee_aud.*, max(sys_tm) over (partition by sys_tm) asofdate
from employee_aud) where sys_tm <= '15-MAY-98'
and employee_id = 108;

What I am trying to get is the row with sys_tm 12-FEB-98. That would represent what the data looked like on 15-MAY-98, because there were no updates in between.

Tom Kyte
January 16, 2004 - 1:04 am UTC

select *
from ( select *
from employee_aud
where sys_tm <= TO_DATE( :bv, 'dd-mon-yyyy' )
order by sys_tm DESC )
where rownum = 1;

you want the max record such that the sys_tm is less than or equal to the date in question -- ordering and rownum can do that.


comments:

o compare dates to dates, strings to strings, numbers to numbers -- never anything else

o use 4 digits for a year.

Works Great!

Michael, January 16, 2004 - 11:22 am UTC

Your "Expert One on One" book is on order. (I should have ordered it when it came out!).

Tom Kyte
January 16, 2004 - 11:37 am UTC

small world -- i grew up in bethlehem pa ;) not too far from you.

A reader, January 16, 2004 - 6:52 pm UTC

Yeps ... And don't forget to order EOBD once you have finished the first one :-)

A reader, January 16, 2004 - 6:54 pm UTC

It looks as if Tom's missing home !!!!

group partitions

A reader, February 03, 2004 - 5:36 am UTC

Hi

let's say I have data like this


deptno empno
------- --------
10 1001
10 1002
20 2003
20 2004

how can I obtain a result such as

deptno empno numrow
------- -------- --------
10 1001 1
10 1002 1
20 2003 2
20 2004 2

partition by deptno, but I don't know which function to use!

cheers

Tom Kyte
February 03, 2004 - 7:54 am UTC

You cannot really with analytics -- you already have a nice "partition" column to group by and what-not (deptno).

You could use regular old sql

scott@ORA10G> select emp.deptno, emp.empno, a.rn
  2    from emp, ( select deptno, rn
  3                  from ( select deptno, rownum rn
  4                           from ( select distinct deptno
  5                                    from emp
  6                                   order by deptno )
  7                       )
  8              ) a
  9   where emp.deptno = a.deptno
 10   order by rn, empno
 11  /

    DEPTNO      EMPNO         RN
---------- ---------- ----------
        10       7782          1
        10       7839          1
        10       7934          1
        20       7369          2
        20       7566          2
        20       7788          2
        20       7876          2
        20       7902          2
        30       7499          3
        30       7521          3
        30       7654          3
        30       7698          3
        30       7844          3
        30       7900          3

14 rows selected.


What about?

A reader, February 03, 2004 - 8:00 am UTC

SQL> select deptno, ename, dense_rank() over (order by deptno) from emp;

    DEPTNO ENAME      DENSE_RANK()OVER(ORDERBYDEPTNO)
---------- ---------- -------------------------------
        10 CLARK                                    1
        10 KING                                     1
        10 MILLER                                   1
        20 SMITH                                    2
        20 ADAMS                                    2
        20 FORD                                     2
        20 SCOTT                                    2
        20 JONES                                    2
        30 ALLEN                                    3
        30 BLAKE                                    3
        30 MARTIN                                   3
        30 JAMES                                    3
        30 TURNER                                   3
        30 WARD                                     3 

Tom Kyte
February 03, 2004 - 8:29 am UTC

doh, very good -- very good :)

I was thinking about it the wrong way...... that does it for sure.

Query

A reader, February 07, 2004 - 2:40 pm UTC

Tom,

Could you please show me how to do this.

table C1
--------
cust_no acct_no type balance

11 111 A 500
22 222 A 100

table C2
--------
cust_no acct_no type balance

11 1111 B 50
22 2222 B 60

table C3
---------
cust_no acct_no type balance

33 888 C 100
11 333 C 400

table C4 (resultant) table
I want to generate another table which will be the union of tables C1, C2, and C3,
but the derived_balance field will be as per the following:

if type is either 'A' or 'B' then add the balance, else if type is 'C' then subtract
it, and derive a resultant balance for each cust_no. The rest of the fields will just be the union.


cust_no acct_no Type balance derived_balance
11 111 A 500 500+50-400
11 1111 B 50 500+50-400
11 333 C 400 500+50-400

22 222 A 100 100+60
22 2222 B 60 100+60

33 888 C 400 0+0-400




Thanks very much,





Tom Kyte
February 07, 2004 - 3:52 pm UTC

ops$tkyte@ORA9IR2> select cust_no, acct_no, type, balance,
  2         sum(xbalance) over (partition by cust_no) derived_balance
  3    from ( select c1.*, balance xbalance from c1
  4           union all
  5                   select c2.*, balance xbalance from c2
  6                   union  all
  7                   select c3.*, -1 * balance xbalance from c3
  8             )
  9  order by cust_no
 10  /
 
   CUST_NO    ACCT_NO T    BALANCE DERIVED_BALANCE
---------- ---------- - ---------- ---------------
        11        111 a        500             150
        11        111 b         50             150
        11        333 c        400             150
        22        222 a        100             160
        22        222 b         60             160
        33        888 c        100            -100
 
6 rows selected.



(you have a typo in your example output -- 33/888 is -100) 

Can Analytics Help me ?

Riaz Shahid, February 14, 2004 - 3:49 am UTC

Consider:

SQL> select * From test;

         A          B C
========== ========== =
         1        100 Y
         2        200 Y
         3        200 Y
         4        300 Y
         5        100 N
         6        200 Y
         7        200 Y
         8        300 Y
         9        100 Y
        10        200 N
        11        200 Y
        12        300 Y
        12        300 Y

13 rows selected.

I have to choose all the records whose status = 'Y'
and display the following info.

A     B     Count

 1    4     4
 6    9     4
11   12     2

Regards 

Tom Kyte
February 14, 2004 - 5:11 pm UTC

ops$tkyte@ORA9IR2> select min(a), max(a), count(*)
  2    from (
  3  select a, c, max(rn) over (order by a) grp
  4    from (
  5  select a, c,
  6         case when c = 'N' and lag(c) over (order by a) = 'Y'
  7                  then row_number() over (order by a)
  8          end rn
  9    from t
 10         )
 11   where c = 'Y' or rn is not null
 12         )
 13   where c = 'Y'
 14   group by grp
 15   order by grp nulls first
 16  /
 
    MIN(A)     MAX(A)   COUNT(*)
---------- ---------- ----------
         1          4          4
         6          9          4
        11         12          3
 
Analytics rock...
Analytics roll....


(in order to figure this thingy out -- please run the queries from the inside out to see how they "build" on one another...) 

Little Correction...

Riaz Shahid, February 14, 2004 - 3:54 am UTC

I need output like:

I have to choose all the records whose status = 'Y'
and display the following info.

MinA MaxA Count

1 4 4
6 9 4
11 12 2

Where MinA & MaxA are values for column A with an interval of 3 (if possible, else the interval will be according to the last value, as shown in the last row of output)



Excellent ! ! !

Riaz Shahid, February 16, 2004 - 6:26 am UTC

That's just brilliant. I was wondering where you get (simple) ideas to solve difficult (looking) problems.

Thanks a lot

Counting the number of occurrences

A reader, February 17, 2004 - 7:09 am UTC

Perhaps this is a new question but...

I have a report to write that I thought at first would be very straightforward, but I'm actually completely stumped. I *guess* that analytic functions will do me grand, but I can't fathom how to do it.

The base table looks like...

IDENTIFIER NUMBER,
ACCOMMODATION_COMP_PLANNED DATE,
ACCOMMODATION_COMP_FORECAST DATE,
BASE_BUILT_PLANNED DATE,
BASE_BUILT_FORECAST DATE,
BUILT_POWER_RIGGED_PLANNED DATE,
BUILT_POWER_RIGGED_FORECAST DATE,
DAS_COMMISSION_PLANNED DATE,
DAS_COMMISSION_FORECAST DATE,
DETAIL_DESIGN_APP_PLANNED DATE,
DETAIL_DESIGN_APP_FORECAST DATE,

(In actual fact I have 19 of these date "pairs", but I figure these few are enough for demonstration purposes)

It's also worth pointing out that some of these dates may be null (as that particular identifier does not do that activity)


The output I'm looking for is (Excuse the column name chopping)...

DATE ACCOM_PLANNED ACCOM_FCAST, BASE_PLANNED, BASE_FCAST, BUILT_PLANNED, BUILT_FCAST, DAS_PLANNED, DAS_FCAST, DETAIL_PLANNED, DETAIL_FCAST, ...
JAN 99 23 24 40 42 150 140 120 90 130 190
FEB 99 34 34 50 50 43 54 34 54 45 45
...


So I want to count the number of occurrences by month. I thought a whole load of count(*) over (partition by to_char(column,'MON-YY')) would do it, but to no avail (I get lots of dups). I'm completely confused as to how I'd structure the SQL to do this. In the end I hope to turn this into a view so it can just be selected...


Thanks,

Mike.


Tom Kyte
February 17, 2004 - 9:53 am UTC

well, couple of things - first, we need a set of dates to report on.  you can either query the table to determine what the min/max dates could be -- or generate a set of dates.

I'm going to generate a set of dates, starting with this sample data:


DOC>create table t ( id number, ap date, af date, bp date, bf date, cp date, cf  date);
DOC>
DOC>insert into t
DOC>select rownum,
DOC>       add_months( trunc(sysdate,'y'), dbms_random.value( 0, 12 ) ),
DOC>       add_months( trunc(sysdate,'y'), dbms_random.value( 0, 12 ) ),
DOC>       add_months( trunc(sysdate,'y'), dbms_random.value( 0, 12 ) ),
DOC>       add_months( trunc(sysdate,'y'), dbms_random.value( 0, 12 ) ),
DOC>       add_months( trunc(sysdate,'y'), dbms_random.value( 0, 12 ) ),
DOC>       add_months( trunc(sysdate,'y'), dbms_random.value( 0, 12 ) )
DOC>  from all_objects;
DOC> analyze table t compute statistics for table;
DOC>*/


ops$tkyte@ORA920PC> select add_months( to_date('01-jan-2004'), rownum-1 ) dt
  2    from all_objects
  3   where rownum <= 12
  4  /
 
DT
---------
01-JAN-04
01-FEB-04
01-MAR-04
01-APR-04
01-MAY-04
01-JUN-04
01-JUL-04
01-AUG-04
01-SEP-04
01-OCT-04
01-NOV-04
01-DEC-04
 
12 rows selected.
 

we'll generate the months for 2004 and then we can:

ops$tkyte@ORA920PC> select dt,
  2         sum(decode(trunc(ap,'mm'),dt,1)) ap,
  3         sum(decode(trunc(af,'mm'),dt,1)) af,
  4         sum(decode(trunc(bp,'mm'),dt,1)) bp,
  5         sum(decode(trunc(bf,'mm'),dt,1)) bf,
  6         sum(decode(trunc(cp,'mm'),dt,1)) cp,
  7         sum(decode(trunc(cf,'mm'),dt,1)) cfA
  8    from (
  9  select add_months( to_date('01-jan-2004'), rownum-1 ) dt
 10    from all_objects
 11   where rownum <= 12
 12         ),
 13             t
 14   group by dt
 15  /
 
DT                AP         AF         BP         BF         CP        CFA
--------- ---------- ---------- ---------- ---------- ---------- ----------
01-JAN-04       2521       2489       2545       2445       2472       2566
01-FEB-04       2529       2462       2493       2547       2522       2540
01-MAR-04       2513       2587       2427       2493       2569       2460
01-APR-04       2544       2642       2531       2528       2534       2465
01-MAY-04       2461       2500       2530       2556       2514       2559
01-JUN-04       2527       2482       2557       2526       2546       2517
01-JUL-04       2548       2528       2502       2471       2599       2519
01-AUG-04       2548       2561       2508       2524       2558       2552
01-SEP-04       2556       2453       2528       2582       2422       2466
01-OCT-04       2447       2510       2455       2507       2460       2483
01-NOV-04       2570       2465       2568       2549       2557       2530
01-DEC-04       2486       2571       2606       2522       2497       2593
 
12 rows selected.
 


Yes, that is a cartesian join -- but it needs to be since each row in T might need to be joined to EVERY row in the set of generated dates.


or, you can take this approach:


ops$tkyte@ORA920PC> select dt, sum(ap), sum(af), sum(bp), sum(bf), sum(cp), sum(cf)
  2    from (
  3  select trunc(ap,'mm') dt, count(*) ap, 0 af, 0 bp, 0 bf, 0 cp, 0 cf from t group by trunc(ap,'mm') union all
  4  select trunc(af,'mm') dt, 0, count(*), 0, 0, 0, 0 from t group by trunc(af,'mm') union all
  5  select trunc(bp,'mm') dt, 0, 0, count(*), 0, 0, 0 from t group by trunc(bp,'mm') union all
  6  select trunc(bf,'mm') dt, 0, 0, 0, count(*), 0, 0 from t group by trunc(bf,'mm') union all
  7  select trunc(cp,'mm') dt, 0, 0, 0, 0, count(*), 0 from t group by trunc(cp,'mm') union all
  8  select trunc(cf,'mm') dt, 0, 0, 0, 0, 0, count(*) from t group by trunc(cf,'mm')
  9         )
 10   group by dt
 11  /
 
DT           SUM(AP)    SUM(AF)    SUM(BP)    SUM(BF)    SUM(CP)    SUM(CF)
--------- ---------- ---------- ---------- ---------- ---------- ----------
01-JAN-04       2521       2489       2545       2445       2472       2566
01-FEB-04       2529       2462       2493       2547       2522       2540
01-MAR-04       2513       2587       2427       2493       2569       2460
01-APR-04       2544       2642       2531       2528       2534       2465
01-MAY-04       2461       2500       2530       2556       2514       2559
01-JUN-04       2527       2482       2557       2526       2546       2517
01-JUL-04       2548       2528       2502       2471       2599       2519
01-AUG-04       2548       2561       2508       2524       2558       2552
01-SEP-04       2556       2453       2528       2582       2422       2466
01-OCT-04       2447       2510       2455       2507       2460       2483
01-NOV-04       2570       2465       2568       2549       2557       2530
01-DEC-04       2486       2571       2606       2522       2497       2593
 
12 rows selected.
 
 

Replacing attached code with Analytic Function?

Denni50, March 02, 2004 - 10:39 am UTC

Tom

I'm developing a step matrix report (as we call it, which
may be incorrect terminology for what I am trying
to achieve).

So far the logic behind what I am trying to achieve is
shown in the code below:

FUNCTION GetDollars(amt IN NUMBER) RETURN NUMBER IS
AmtRange NUMBER:=0;

BEGIN
CASE
WHEN amt BETWEEN .01 AND 9.99 THEN AmtRange :=1;
WHEN amt BETWEEN 10 AND 19.99 THEN AmtRange :=2;
WHEN amt BETWEEN 20 AND 29.99 THEN AmtRange :=3;
WHEN amt BETWEEN 30 AND 39.99 THEN AmtRange :=4;
WHEN amt BETWEEN 40 AND 49.99 THEN AmtRange :=5;
WHEN amt BETWEEN 50 AND 59.99 THEN AmtRange :=6;
WHEN amt BETWEEN 60 AND 69.99 THEN AmtRange :=7;
WHEN amt BETWEEN 70 AND 79.99 THEN AmtRange :=8;
WHEN amt BETWEEN 80 AND 89.99 THEN AmtRange :=9;
WHEN amt BETWEEN 90 AND 99.99 THEN AmtRange :=10;
WHEN amt BETWEEN 100 AND 249.99 THEN AmtRange :=11;
WHEN amt BETWEEN 250 AND 999999.99 THEN AmtRange :=12;
ELSE AmtRange:=13; --if NULL
END CASE;
RETURN AmtRange;
END;

FUNCTION GetDates(dte IN DATE,dteComp IN DATE) RETURN NUMBER IS
DteRange NUMBER:=0;
monDiff NUMBER:=0;
BEGIN
monDiff:= MONTHS_BETWEEN(dteComp,dte);
CASE
WHEN monDiff BETWEEN 0 AND 3 THEN DteRange :=1;
WHEN monDiff BETWEEN 3 AND 6 THEN DteRange :=2;
WHEN monDiff BETWEEN 6 AND 12 THEN DteRange :=3;
WHEN monDiff BETWEEN 12 AND 18 THEN DteRange :=4;
WHEN monDiff BETWEEN 18 AND 24 THEN DteRange :=5;
WHEN monDiff BETWEEN 24 AND 36 THEN DteRange :=6;
WHEN monDiff BETWEEN 36 AND 48 THEN DteRange :=7;
WHEN monDiff BETWEEN 48 AND 999999 THEN DteRange :=8;
ELSE DteRange:=9; --if NULL
END CASE;
RETURN DteRange;
END;

I'm trying to achieve a grid like report where AmtRange
would represent the horizontal(Column portion)of the
report and the DteRange would represent the vertical(rows)
and results would be processed in 'buckets' accordingly.

my question is:
can this laborious process be better accomplished using
AF, and if so... perhaps give an example to get me started
in the right direction.

thanks a bunch..









Tom Kyte
March 02, 2004 - 6:43 pm UTC

I hope you are not going to call these from SQL -- if so, I would move 100% of the logic out of plsql and directly into a view. You'll find it *many* times faster.


You can definitely make it "smaller" with a simple divide -- but I do not see analytics helping you here; they work across rows. This is just a simple "take this continuous set of inputs and map them to this discrete set of outputs using this method" (eg: a function)

thanks Tom...

Denni50, March 03, 2004 - 9:05 am UTC

As I think about this more and more... initially I was
hoping AF could somehow meet this need more efficiently,
but as you explained, AF would not work in this case.

Plans are to call this from a stored procedure.
The Functions I posted were just one part of the entire
procedure. Create a table where the output would be
inserted then create a report from that table.

Now I'm planning to install Oracle Developer Reports 6i
on my 9i db. I've worked with Forms and know you can call
PL/SQL procedures, triggers..etc

For right now I'm going to create a scaled-down test
table with donors who have count(gftamt)=1 and
gftamt > 0. (We have nondonors who do not give but respond to campaigns; we keep track of them for purposes of converting them into donors and assign them a gftamt=0.)
We have multi donors but for now I want to focus on single
donors.

Then get the procedure to insert how many donors
gave in the 0-3 month, .01-9.99 range
0-3 month, 10-19.99 range
0-3 month, 20-29.99 range...and on and on

which would look like:

.01-9.99 10-19.99 20-29.99 30-39.99
0-3 xx xx xx xx
3-6 xx xx xx xx
6-12 xx xx xx xx
...and so on


you mention something about using a *divide* routine
and using a *view*...could you expand on that.

Tom Kyte
March 03, 2004 - 10:12 am UTC

the stored procedures are going to impose a huge cost on you.

I would URGE you to use views -- strongly URGE.


Instead of:

FUNCTION GetDates(dte IN DATE,dteComp IN DATE) RETURN NUMBER IS
DteRange NUMBER:=0;
monDiff NUMBER:=0;
BEGIN
monDiff:= MONTHS_BETWEEN(dteComp,dte);
CASE
WHEN monDiff BETWEEN 0 AND 3 THEN DteRange :=1;
WHEN monDiff BETWEEN 3 AND 6 THEN DteRange :=2;
WHEN monDiff BETWEEN 6 AND 12 THEN DteRange :=3;
WHEN monDiff BETWEEN 12 AND 18 THEN DteRange :=4;
WHEN monDiff BETWEEN 18 AND 24 THEN DteRange :=5;
WHEN monDiff BETWEEN 24 AND 36 THEN DteRange :=6;
WHEN monDiff BETWEEN 36 AND 48 THEN DteRange :=7;
WHEN monDiff BETWEEN 48 AND 999999 THEN DteRange :=8;
ELSE DteRange:=9; --if NULL
END CASE;
RETURN DteRange;
END;

and then


select dte, dteComp, getDates( dte, dteComp ) xxx from t;

you would:

create or replace view v
as
select dte, dteComp, case
when ( months_between(dte,dteComp) between 0 and 3) then 1
when ( months_between(dte,dteComp) between 3 and 6) then 2
when ( months_between(dte,dteComp) between 6 and 12) then 3
when ( months_between(dte,dteComp) between 12 and 18) then 4
when ( months_between(dte,dteComp) between 18 and 24) then 5
when ( months_between(dte,dteComp) between 24 and 36) then 6
when ( months_between(dte,dteComp) between 36 and 48) then 7
when ( months_between(dte,dteComp) between 48 and 99999) then 8
else 9
end
xxx
from t;

and have your reports use V instead of T.




and by divide, i just meant that you could do some divisions on the numbers to roll them up "shorter" but in a much less clear fashion. I'd stick with case IN a view.
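
(For what it's worth, a minimal sketch of that "divide" idea -- my illustration, with t and amt standing in for the real table and column; it only covers the evenly sized ten-dollar buckets, so the two wide top buckets and NULL still need explicit handling:)

select amt,
       case when amt is null then 13
            when amt >= 250  then 12
            when amt >= 100  then 11
            else trunc(amt/10) + 1   -- .01-9.99 -> 1, 10-19.99 -> 2, ... 90-99.99 -> 10 (assumes amt > 0)
       end amtrange
  from t;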

THANKS!!! Tom

Denni50, March 03, 2004 - 11:02 am UTC

I understand what you mean now...since Views are like
stored queries anyhow...thanks for the explanation and
example.

:~)


analytics

kit, March 03, 2004 - 11:24 am UTC

HI,

are you recommending the views to complement your theory that if it can be done in SQL, then use SQL as opposed to PL/SQL stored procs?

Also, can you please expand on how analytics work across rows?

Tom Kyte
March 03, 2004 - 3:27 pm UTC

I use views to hide complexity like the case above. I would not want to code the case statement for all 50 reports I need to generate (i made up the number 50, it is "whatever").

If you look at the data dictionary -- the ALL_, USER_, DBA_ views are littered with this using DECODE()....


analytics work "across rows". functions like "lag()" reach "back" in the result set, "lead()" reach forward.

sum() over (partition by deptno)

sums OVER all of the records by deptno...

and so on -- analytics work "across" rows. this function (above function) is working solely on the data within a single row.
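
(A minimal sketch against scott.emp -- my illustration, not from the thread:)

select ename, deptno, sal,
       lag(sal)  over (order by sal) prev_sal,          -- reaches back one row in the result set
       lead(sal) over (order by sal) next_sal,          -- reaches forward one row
       sum(sal)  over (partition by deptno) dept_total  -- sums over all rows sharing the deptno
  from emp;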

Report generation

A reader, July 19, 2004 - 12:09 pm UTC

Hi,

I want some help with generation of reports.

I have the following table

create table test
(
id number,
act number,
timestamp date
);

insert into test (123,1,'06-12-2000 11:23:00');
insert into test (123,2,'06-12-2000 11:25:01');
insert into test (123,3,'06-12-2000 11:40:00');
insert into test (124,1,'06-12-2000 11:24:00');
commit;

is it possible for me to get an output like

session act1 timestamp1 act2 timestamp2 act3 timestamp3

123 1 06-12-2000 11:23:00 2
06-12-2000 11:25:01 3 06-12-2000 11:40:00

Pls help.

Thanks.

Tom Kyte
July 19, 2004 - 12:14 pm UTC

be nice to have tested the inserts (they are not sql)....


but -- are the act values always 1, 2, 3 -- or random?
are there just three of them?

Inserts

A reader, July 19, 2004 - 12:21 pm UTC

Sorry about the insert scripts Tom. I was typing in a hurry. I have retyped them.

create table test
(
id number,
act number,
timestamp date
);

insert into test values (123,1,to_date('06-12-2000 11:23:00','dd-mm-yyyy hh24:mi:ss'));
insert into test values (123,2,to_date('06-12-2000 11:25:01','dd-mm-yyyy hh24:mi:ss'));
insert into test values (123,3,to_date('06-12-2000 11:40:00','dd-mm-yyyy hh24:mi:ss'));
insert into test values (124,1,to_date('06-12-2000 11:24:00','dd-mm-yyyy hh24:mi:ss'));
commit;


The values for act can be anything from 1..6. some ids can have all 6 values, some 1..5, some 1..3 and so on.

Thanks

Tom Kyte
July 19, 2004 - 1:47 pm UTC

without knowing how many there are before you code the query -- you cannot "transpose" them like that into columns -- it is a chicken and egg problem -- in order to write the query, you need to know the number of columns you are expecting.

is there a reasonable maximum?
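
(For what it's worth, if six really is a hard maximum, a minimal sketch of the classic max(decode()) transpose -- my illustration, against the test table above:)

select id,
       max(decode(act, 1, act)) act1,
       max(decode(act, 1, timestamp)) timestamp1,
       max(decode(act, 2, act)) act2,
       max(decode(act, 2, timestamp)) timestamp2,
       max(decode(act, 3, act)) act3,
       max(decode(act, 3, timestamp)) timestamp3
       -- ...and so on up to act6/timestamp6, the stated maximum
  from test
 group by id;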

Convert to Analytic Function

denni50, July 23, 2004 - 8:59 am UTC

Tom

Can the below sql statement be converted to AF?

thanks


select g.idnumber,sum(g.giftamount)LTGifts,count(g.giftamount)LTCount,max(g.giftamount)LargestGift,
min(g.giftdate)FirstGiftDate,max(g.giftdate)LastGiftDate,
to_number(substr(min(case when g.giftamount>0 then
to_char(g.giftdate,'yyyymmddhh24miss')||g.giftamount end),15))FirstGiftAmt,
to_number(substr(max(case when g.giftamount>0 then
to_char(g.giftdate,'yyyymmddhh24miss')||g.giftamount end),15))LastGiftAmt,
(select sum(ga.giftamount)
from gift ga
where ga.giftdate >=(SELECT ADD_MONTHS(max(gi.giftdate),-12)
from gift gi
where gi.idnumber=ga.idnumber)
and ga.idnumber=g.idnumber) AnnualSum
from gift g,aclj_drL a
where g.idnumber=a.idnumber
group by g.idnumber

Tom Kyte
July 23, 2004 - 9:17 am UTC

ops$tkyte@ORA9IR2> desc t
 Name                                     Null?    Type
 ---------------------------------------- -------- ----------------------------
 IDNUMBER                                          NUMBER(38)
 GIFTAMOUNT                                        NUMBER(38)
 GIFTDATE                                          DATE
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select idnumber, sum(giftamount), sum(ga2)
  2    from (
  3  select idnumber, giftamount,
  4         case when add_months(giftdate,12) >= max(giftdate) over (partition by idnumber) then giftamount end ga2
  5    from t
  6         )
  7   group by idnumber
  8  /
 
  IDNUMBER SUM(GIFTAMOUNT)   SUM(GA2)
---------- --------------- ----------
         0          361982     117340
         1          394998     109448
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select idnumber, sum(giftamount),
  2         (select sum(giftamount)
  3            from t ga
  4           where giftdate >= (select add_months(max(giftdate),-12)
  5                                from t gi
  6                               where gi.idnumber = ga.idnumber )
  7             and ga.idnumber = g.idnumber) ga2
  8    from t g
  9   group by idnumber
 10  /
 
  IDNUMBER SUM(GIFTAMOUNT)        GA2
---------- --------------- ----------
         0          361982     117340
         1          394998     109448
 
 

How to do this

A reader, August 12, 2004 - 6:01 pm UTC

Tom,

I read your site and found many ways of doing complex logic using analytic functions. Is there a way to get the following using a simple query.

I have a table TEST2 initially set-up as follows

CREATE TABLE test2
(
end_date DATE,
tot_return FLOAT,
unit_value FLOAT
)
/
INSERT INTO test2 (end_date,tot_return,unit_value)
VALUES ('31-JAN-04',0,100)
/
INSERT INTO test2 (end_date,tot_return)
VALUES ('29-FEB-04',10)
/
INSERT INTO test2 (end_date,tot_return)
VALUES ('31-MAR-04',20)
/
INSERT INTO test2 (end_date,tot_return)
VALUES ('30-APR-04',50)
/

I want to get output as follows :

End_Date Tot_Return Unit_Value
==================================
31-JAN-04 0 100
29-FEB-04 10 110*
31-MAR-04 20 132*
30-APR-04 50 198*

The calculation is as follows :

Current Unit Value =
Prior Unit Value * (1 + Tot_Return/100)

110* = 100 * (1 + 10/100)
132* = 110 * (1 + 20/100)
198* = 132 * (1 + 50/100)

It is the same as calculating compound interest. Is there any way to get this output in a single query?

As always, thanks much for your help to the Oracle community

Tom Kyte
August 13, 2004 - 9:06 am UTC

ops$tkyte@ORA9IR2> select end_date, tot_return,
  2     nvl(exp( sum( decode(tot_return,0,to_number(0),ln(1+tot_return/100)) )
  3                 over (order by end_date)),1) factor,
  4      first_value(unit_value) over (order by end_date) x,
  5      nvl(exp( sum( decode(tot_return,0,to_number(0),ln(1+tot_return/100)) )
  6                 over (order by end_date)),1) *
  7      first_value(unit_value) over (order by end_date) current_unit_value
  8    from test2
  9  /
 
END_DATE  TOT_RETURN     FACTOR          X CURRENT_UNIT_VALUE
--------- ---------- ---------- ---------- ------------------
31-JAN-04          0          1        100                100
29-FEB-04         10        1.1        100                110
31-MAR-04         20       1.32        100                132
30-APR-04         50       1.98        100                198
 
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:2200571416651
will help with some of the background -- the exp(sum(ln)) creates a product -- what we wanted to do was just get the product(tot_return) OVER (order by end_date) -- multiply that by the first_value and wah-lah....


 

Negative numbers?

A reader, August 13, 2004 - 4:20 pm UTC

Tom

Thanks for a wonderful solution. I also read the URL that explains the use of EXP, SUM, and LN. In that discussion, one of the readers mentioned that the above logic will not work for negative numbers. I tried the same query, and when a negative number is found, it says that it is out of range. Is there a query that would work for negative numbers, or would we have to write some functions to handle that?

Thanks

Tom Kyte
August 13, 2004 - 6:44 pm UTC

well, you could use the ABS and count the number of negatives...

ops$tkyte@ORA9IR2> select dt, num,
  2         exp(sum(ln(abs(num))) over (order by dt)) product,
  3             case when mod( count(case when num < 0 then 1 end) over (order by dt), 2 )=1 then -1
  4                  else 1
  5              end sign,
  6         exp(sum(ln(abs(num))) over (order by dt))*
  7             case when mod( count(case when num < 0 then 1 end) over (order by dt), 2 )=1 then -1
  8                  else 1
  9              end real_answer
 10    from t;
 
DT               NUM    PRODUCT       SIGN REAL_ANSWER
--------- ---------- ---------- ---------- -----------
14-AUG-04         -1          1         -1          -1
15-AUG-04          2          2         -1          -2
16-AUG-04         -3          6          1           6
17-AUG-04          4         24          1          24
18-AUG-04         -5        120         -1        -120
19-AUG-04          6        720         -1        -720
20-AUG-04         -7       5040          1        5040
21-AUG-04          8      40320          1       40320
22-AUG-04         -9     362880         -1     -362880
23-AUG-04         10    3628800         -1    -3628800
 
10 rows selected. 

handling 0 in exp(sum(ln(abs)))

Dong Wang, August 13, 2004 - 8:24 pm UTC

the product function using exp(sum(ln(abs))) does not handle 0 values. It will give an error of ORA-01428: argument '0' is out of range

How about the following one to compute a product:

create table t (
a varchar2(10),
b number(9,2)
);

insert into t values ('a', 1);
insert into t values ('a', 0);
insert into t values ('a', -1);


select
decode(
max(case when b <> 0 then 0 else 1 end),
1, 0,
exp(sum(ln(abs(decode(b, 0, 1, b))))) *
decode(
mod(sum(case when b >= 0 then 0 else 1 end), 2),
0, 1,
-1
)
)
from t
group by a;

Tom Kyte
August 14, 2004 - 1:13 pm UTC

I know, that is why I nvl'ed it

2 nvl(exp( sum( decode(tot_return,0,to_number(0),ln(1+tot_return/100)) )
3 over (order by end_date)),1) factor,


in my example too.

Calculating Rate of Returns

A reader, August 17, 2004 - 4:55 pm UTC

Tom,

I have the following table and the INSERT script :

DROP TABLE test_tab CASCADE CONSTRAINTS
/
CREATE TABLE test_tab
(
account_id VARCHAR2(20),
freq_ind VARCHAR2(1),
perf_rollupid VARCHAR2(1),
month_end_date DATE,
rate_of_ret NUMBER(28,12),
unit_value_tot NUMBER(28,12)
)
/
INSERT INTO test_tab VALUES('2195','M','1',TO_DATE('20020213','YYYYMMDD'),0,100)
/
INSERT INTO test_tab VALUES('2195','M','1',TO_DATE('20020228','YYYYMMDD'),1.634,0)
/
INSERT INTO test_tab VALUES('2195','M','1',TO_DATE('20020331','YYYYMMDD'),3.328,0)
/
INSERT INTO test_tab VALUES('2195','M','1',TO_DATE('20020430','YYYYMMDD'),-1.215,0)
/
INSERT INTO test_tab VALUES('2195','M','1',TO_DATE('20020531','YYYYMMDD'),0.655,0)
/
INSERT INTO test_tab VALUES('2195','M','1',TO_DATE('20020630','YYYYMMDD'),1.516,0)
/
INSERT INTO test_tab VALUES('2195','M','1',TO_DATE('20020731','YYYYMMDD'),-0.927,0)
/
INSERT INTO test_tab VALUES('2195','M','1',TO_DATE('20020831','YYYYMMDD'),-1.588,0)
/
INSERT INTO test_tab VALUES('2195','M','1',TO_DATE('20020930','YYYYMMDD'),-1.49,0)
/
INSERT INTO test_tab VALUES('2195','M','1',TO_DATE('20021031','YYYYMMDD'),-4.538,0)
/
INSERT INTO test_tab VALUES('2195','M','1',TO_DATE('20021130','YYYYMMDD'),2.428,0)
/

I was able to calculate the Unit Value total based on the total return using the formula above (i.e., using EXP(SUM(LN())) to perform a product). However, I want to calculate the QTD_TOT_RETURN as follows:

QTD_TOT_RETURN :=
(Unit Val Tot/Prev Qtr Unit value Tot - 1) * 100

So the output of the above query should be as follows :
Unit Value QTD Tot Return
2195 M 1 13-FEB-02 0 100 0
2195 M 1 28-FEB-02 1.634 101.634 1.634
2195 M 1 31-MAR-02 3.328 105.01637952 5.01637952
2195 M 1 30-APR-02 -1.215 103.7404305088 -1.215*
2195 M 1 31-MAY-02 .655 104.4199303286 -0.56795825*
2195 M 1 30-JUN-02 1.516 106.0029364724 0.93943150293*
2195 M 1 31-JUL-02 -.927 105.0202892513 -0.927*
2195 M 1 31-AUG-02 -1.588 103.3525670580 -2.50027924*
2195 M 1 30-SEP-02 -1.49 101.8126138088 -3.953025079324
2195 M 1 31-OCT-02 -4.538 97.1923573942 -4.538
2195 M 1 30-NOV-02 2.428 99.5521878317 -2.22018264
2195 M 1 31-DEC-02 2.132 101.6746404763 -0.135516933885
....

For example, the calculation for -1.215 is as follows:

-1.215 := (103.7404305088 / 105.01637952 - 1) * 100
(105.01637952 being the unit value total for the previous quarter end, which is March)

-2.50027924 := (103.3525670580 / 106.0029364724 - 1) * 100
(106.0029364724 being the unit value total for the previous quarter end, which is June)

How do I get the unit value total for the previous quarter? Is there an analytic function that I could use to get the previous quarter? Please help.

Tom Kyte
August 17, 2004 - 6:14 pm UTC

ops$tkyte@ORA9IR2> select t.*,
  2          nvl(exp( sum( decode(rate_of_ret,0,to_number(0),ln(1+rate_of_ret/100)) )
  3                    over (order by month_end_date)),1) *
  4          first_value(unit_value_tot) over (order by month_end_date) unit_value,
  5                  trunc(month_end_date,'q')-1 last_q
  6    from test_tab t
  7  /
 
ACCO F P MONTH_END    ror  uvt UNIT_VALUE LAST_Q
---- - - --------- ------ ---- ---------- ---------
2195 M 1 13-FEB-02   .000  100        100 31-DEC-01
2195 M 1 28-FEB-02  1.634    0    101.634 31-DEC-01
2195 M 1 31-MAR-02  3.328    0  105.01638 31-DEC-01
2195 M 1 30-APR-02 -1.215    0 103.740431 31-MAR-02
2195 M 1 31-MAY-02   .655    0  104.41993 31-MAR-02
2195 M 1 30-JUN-02  1.516    0 106.002936 31-MAR-02
2195 M 1 31-JUL-02  -.927    0 105.020289 30-JUN-02
2195 M 1 31-AUG-02 -1.588    0 103.352567 30-JUN-02
2195 M 1 30-SEP-02 -1.490    0 101.812614 30-JUN-02
2195 M 1 31-OCT-02 -4.538    0 97.1923574 30-SEP-02
2195 M 1 30-NOV-02  2.428    0 99.5521878 30-SEP-02
 
11 rows selected.

we take your query, assign the last qtr end date to each record...

 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select       x.*,
  2          first_value( unit_value ) over (order by month_end_date range month_end_date-last_q preceding) y,
  3          first_value( month_end_date ) over (order by month_end_date range month_end_date-last_q preceding) z
  4     from (
  5  select t.*,
  6          nvl(exp( sum( decode(rate_of_ret,0,to_number(0),ln(1+rate_of_ret/100)) )
  7                    over (order by month_end_date)),1) *
  8          first_value(unit_value_tot) over (order by month_end_date) unit_value,
  9                  trunc(month_end_date,'q')-1 last_q
 10    from test_tab t
 11         ) x
 12  /
 
ACCO F P MONTH_END    ror  uvt UNIT_VALUE LAST_Q             Y Z
---- - - --------- ------ ---- ---------- --------- ---------- ---------
2195 M 1 13-FEB-02   .000  100        100 31-DEC-01        100 13-FEB-02
2195 M 1 28-FEB-02  1.634    0    101.634 31-DEC-01        100 13-FEB-02
2195 M 1 31-MAR-02  3.328    0  105.01638 31-DEC-01        100 13-FEB-02
2195 M 1 30-APR-02 -1.215    0 103.740431 31-MAR-02  105.01638 31-MAR-02
2195 M 1 31-MAY-02   .655    0  104.41993 31-MAR-02  105.01638 31-MAR-02
2195 M 1 30-JUN-02  1.516    0 106.002936 31-MAR-02  105.01638 31-MAR-02
2195 M 1 31-JUL-02  -.927    0 105.020289 30-JUN-02 106.002936 30-JUN-02
2195 M 1 31-AUG-02 -1.588    0 103.352567 30-JUN-02 106.002936 30-JUN-02
2195 M 1 30-SEP-02 -1.490    0 101.812614 30-JUN-02 106.002936 30-JUN-02
2195 M 1 31-OCT-02 -4.538    0 97.1923574 30-SEP-02 101.812614 30-SEP-02
2195 M 1 30-NOV-02  2.428    0 99.5521878 30-SEP-02 101.812614 30-SEP-02
 
11 rows selected.

then we compute the first_value in the range window that precedes that row -- just pulled up the date to see that I was getting what I meant to get...
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select       x.*,
  2          ( unit_value/
  3          first_value( unit_value ) over (order by month_end_date range month_end_date-last_q preceding) -1)*100  qtd
  4     from (
  5  select t.*,
  6          nvl(exp( sum( decode(rate_of_ret,0,to_number(0),ln(1+rate_of_ret/100)) )
  7                    over (order by month_end_date)),1) *
  8          first_value(unit_value_tot) over (order by month_end_date) unit_value,
  9                  trunc(month_end_date,'q')-1 last_q
 10    from test_tab t
 11         ) x
 12  /
 
ACCO F P MONTH_END    ror  uvt UNIT_VALUE LAST_Q           QTD
---- - - --------- ------ ---- ---------- --------- ----------
2195 M 1 13-FEB-02   .000  100        100 31-DEC-01          0
2195 M 1 28-FEB-02  1.634    0    101.634 31-DEC-01      1.634
2195 M 1 31-MAR-02  3.328    0  105.01638 31-DEC-01 5.01637952
2195 M 1 30-APR-02 -1.215    0 103.740431 31-MAR-02     -1.215
2195 M 1 31-MAY-02   .655    0  104.41993 31-MAR-02 -.56795825
2195 M 1 30-JUN-02  1.516    0 106.002936 31-MAR-02 .939431503
2195 M 1 31-JUL-02  -.927    0 105.020289 30-JUN-02      -.927
2195 M 1 31-AUG-02 -1.588    0 103.352567 30-JUN-02 -2.5002792
2195 M 1 30-SEP-02 -1.490    0 101.812614 30-JUN-02 -3.9530251
2195 M 1 31-OCT-02 -4.538    0 97.1923574 30-SEP-02     -4.538
2195 M 1 30-NOV-02  2.428    0 99.5521878 30-SEP-02 -2.2201826
 
11 rows selected.
and there you go.  analytics rock, analytics roll
 

Excellent

A reader, August 17, 2004 - 9:50 pm UTC

Tom,

I don't find any words to describe your brilliance. I would consider myself fortunate if I could get an office workplace by your side. Just curious, how do you find solutions to this complex logic?

Thanks a million.



A reader, September 02, 2004 - 9:34 am UTC

Tom,

I have a table similar to dept in which I have to do a count of people who belong to deptno 10,20 and 30 as category I
and people who belong to deptno 30,40 as category II.
Can I do this using an analytic function without having to do multiple scans of the table?

Thanks.

Tom Kyte
September 02, 2004 - 9:58 am UTC

select count( case when deptno in ( 10, 20, 30 ) then 1 end ) cat_I,
       count( case when deptno in ( 30, 40 ) then 1 end ) cat_II
  from emp;




A reader, September 02, 2004 - 11:05 am UTC

Tom,

I am looking for it the other way, e.g.

Cat_name count
Cat1     10
Cat2     20

Thanks

Tom Kyte
September 02, 2004 - 11:09 am UTC

select decode( r, 1, 'Cat1', 2, 'Cat2' ),
       decode( r, 1, cat_i, 2, cat_ii )
  from
 (
  select count( case when deptno in ( 10, 20, 30 ) then 1 end ) cat_I,
         count( case when deptno in ( 30, 40 ) then 1 end ) cat_II
    from emp),
 (select rownum r from all_objects where rownum <= 2 )



since you need to double count dept 30, you'll need to pivot. if you didn't need to double count deptno=30, answer would of course be different.




The other way

A reader, September 02, 2004 - 11:13 am UTC

SQL> select deptno, count(*) from emp group by deptno;

    DEPTNO   COUNT(*)
---------- ----------
        10          3
        20          5
        30          6

SQL> select decode(deptno, 10, 'Cat1', 20, 'Cat1', 30, 'Cat2'), count(*)
  2  from emp
  3  group by decode(deptno, 10, 'Cat1', 20, 'Cat1', 30, 'Cat2');

DECO   COUNT(*)
---- ----------
Cat1          8
Cat2          6
 

Tom Kyte
September 02, 2004 - 11:19 am UTC

...
belong to deptno 10,20 and 30 as category I
and people who belong to deptno 30,40 as category II.
........

I took a literal read of their request -- if deptno=30 is "wrongly specified", then your way is correct....

scott@ORA9IR2> select decode( r, 1, 'Cat1', 2, 'Cat2' ),
  2         decode( r, 1, cat_i, 2, cat_ii )
  3    from
  4    (
  5     select count( case when deptno in ( 10, 20, 30 ) then 1 end ) cat_I,
  6            count( case when deptno in ( 30, 40 ) then 1 end ) cat_II
  7       from emp),
  8    (select rownum r from all_objects where rownum <= 2)
  9  /

DECO DECODE(R,1,CAT_I,2,CAT_II)
---- --------------------------
Cat1                         14
Cat2                          6

scott@ORA9IR2>
scott@ORA9IR2> select decode(deptno, 10, 'Cat1', 20, 'Cat1', 30, 'Cat2'), count(*)
  2    from emp
  3   group by decode(deptno, 10, 'Cat1', 20, 'Cat1', 30, 'Cat2');

DECO   COUNT(*)
---- ----------
Cat1          8
Cat2          6



A reader, September 02, 2004 - 11:29 am UTC

Thanks...

How to do this??

A reader, October 11, 2004 - 11:04 pm UTC

Tom,

In the rate of return calculation example above, I have the following situation :

The ror for 31-OCT-02 becomes -100. In that case I should get the following output :

2195 M 1 31-OCT-02 -100 0 0 30-SEP-02 ...
2195 M 1 30-NOV-02 2.248 0 0 30-SEP-02 ...

On 31-OCT-02 since ror is -100, UNIT_VALUE should become 0 for this month and all subsequent months. However, since ror is -100, 1+ror/100 becomes 0 (1 + (-100)/100) and so I am getting the error 'Zero is not a valid argument' because LN(1+ror/100) does not work if (1+ror/100) is 0. Any way to overcome this problem?

Your valuable suggestion is appreciated




Tom Kyte
October 12, 2004 - 7:48 am UTC

use CASE or decode I guess (but i don't see it as -100 in the examples above?)

use CASE to say "do this when it is zero, do this when it is not zero"
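
(One possible shape for that CASE -- a sketch only, untested against the real data: keep ln() from ever seeing zero inside the running sum, and force the unit value to 0 from the first -100 month onward:)

select t.*,
       case
         when count(case when rate_of_ret = -100 then 1 end)
                over (order by month_end_date) > 0
         then 0   -- a -100 return zeroes this month and every later one
         else nvl(exp(sum(case when rate_of_ret in (0, -100) then 0   -- 0 for -100 is a dummy, masked by the outer case
                               else ln(1+rate_of_ret/100)
                          end) over (order by month_end_date)),1)
              * first_value(unit_value_tot) over (order by month_end_date)
       end unit_value
  from test_tab t;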

Could you please show me how to do this with oracle 8.05

Poernomo Halim, January 18, 2005 - 10:35 pm UTC

Hi Tom,

Could you please show me how to do this.

table A
--------
cust_no Debet Credit balance

1 0 1000 1000
2 100 0 900
3 250 0 650
4 0 100 750
----------------------------------

and

Table B
--------
cust_no Mutasi Type balance

1 1000 K 1000
2 100 K 1100 ---> if K = +
3 200 D 900 ---> if D = -
4 50 D 850
5 150 K 1000
----------------------------------


Could this work with Oracle 8.0.5 with ease?

Please advise.

Regards

Poernomo


Tom Kyte
January 19, 2005 - 10:23 am UTC

not even sure what I'm looking at here, but 8.0.5 did not have analytics way back when, so using functions like sum() over () which are typically necessary for this is impossible.
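
(For what it's worth, a running balance like table B's can be computed without analytics using the classic self-join -- a minimal sketch only; tab_b is my stand-in table name, and it assumes cust_no defines the row order and is unique, with K adding and D subtracting as in the example:)

select a.cust_no, a.mutasi, a.type,
       sum(decode(b.type, 'K', b.mutasi, -b.mutasi)) balance   -- all rows up to and including this one
  from tab_b a, tab_b b
 where b.cust_no <= a.cust_no
 group by a.cust_no, a.mutasi, a.type
 order by a.cust_no;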


analytic function for this query

A reader, January 19, 2005 - 9:39 am UTC

Hi

I have these data in a staging table

CO_CUSTOMER CO_CONTRACT
1 1
1 1
2 3
2 1

The condition is that a contract belongs only to a customer and that contract plus customer cannot be duplicated so in this case the only valid data is

CO_CUSTOMER CO_CONTRACT
2 3

How can we filter other rows to only this row? I can do it in three queries but is it possible using only one?

Thanks

Tom Kyte
January 19, 2005 - 10:58 am UTC

ops$tkyte@ORA9IR2> select * from t;
 
         X          Y
---------- ----------
         1          1
         1          1
         2          3
         2          1
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select *
  2    from (
  3  select x, y, count(distinct x) over (partition by y) cnt
  4    from t
  5         )
  6   where cnt = 1;
 
         X          Y        CNT
---------- ---------- ----------
         2          3          1
 

excellent, but why does this happen when I have this data?

A reader, January 19, 2005 - 11:20 am UTC

Hi

The query works great, thanks, but with this data it returns the wrong result:

CO_CLIENTE CO_C
---------- ----
1 1
1 1
1 1
2 1
2 2
5 2
2 3
2 3
3 4
4 5
5 6


select * from (
select co_cliente, co_contrato, count(distinct co_cliente) over (partition by co_contrato) cnt
from cont
)
where cnt = 1

CO_CLIENTE CO_C CNT
---------- ---- ----------
2 3 1
2 3 1
3 4 1
4 5 1
5 6 1

shouldn't the 2,3 rows disappear too?

Tom Kyte
January 19, 2005 - 9:05 pm UTC

not according to your specification:

The condition is that a contract belongs only to a customer and that contract
plus customer cannot be duplicated so in this case the only valid data is



not the way I read it, you can of course just count(*) over (partition by co_contrato) if you meant "only one record per co_contrato"

Solution

Bob B, January 19, 2005 - 12:45 pm UTC

If the 2 3 should disappear, use this query:

select * from (
select co_cliente, co_contrato, count(co_cliente) over (partition by co_contrato) cnt
from cont
)
where cnt = 1

Using "count( distinct co_cliente ) over (partition by co_contrato) cnt" returns a "cnt" of 1 whenever there is only 1 unique co_cliente assigned to co_contrato.

Switching it to "count( co_cliente ) over (partition by co_contrato) cnt" returns a "cnt" of 1 whenever there is exactly 1 row with the given co_contrato number.

Both of the above assume that co_cliente is not null.

sorry Tom, the specification you read was correct

A reader, January 20, 2005 - 7:11 am UTC

Hi

you are right, your understanding was correct!

CO_CLIENTE CO_C CNT
---------- ---- ----------
2 3 1
2 3 1
3 4 1
4 5 1
5 6 1

is the correct result set

cheers



analytic function by date range

Ryan Gaffuri, March 24, 2005 - 9:19 am UTC

Is it possible to do the following with an analytic function?

I have 2 fields

type, date

I need to partition by type,date, then have a count(*) for each partition.

However, my date partition needs to have the following ranges

less than 3 months
3-6 months
6-9 months
9-12 months
more than 1 year

Tom Kyte
March 24, 2005 - 3:43 pm UTC

less than 3 months of what?

you have a date -- is it three months from sysdate?


this should get you going:


select type, date_string, count(*)
from (
select type, case when months_between(sysdate,dt) < 3
then 'less than 3 months'
when months_between(sysdate,dt) < 6
then '3 to less than 6 months'
when months_between(sysdate,dt) < 9
then '6 to less than 9 months'
when months_between(sysdate,dt) < 12
then '9 to less than 12 months'
else 'more than 1 year'
end date_string
from t
)
group by type, date_string;


Analytical Function

Lindsey, April 14, 2005 - 11:43 am UTC

You are so brilliant! Please help with my following issue:

My table is
TypeID Submitted Approved Denied
1 06-APR-05 07-APR-05
1 06-APR-05 12-APR-05
2 05-APR-05 06-APR-05
2 05-APR-05 07-APR-05
3 07-APR-05 08-APR-05
3 05-APR-05 08-APR-05

My result will count how many Submitted, Approved, and Denied
for each type within a period, e.g. from 06-apr-05 to 07-apr-05. The result should be like:

TypeID Submitted Approved Denied
1 2 1 0
2 0 1 1
3 1 0 0

Can a query establish this? Please help. Thanks.

Tom Kyte
April 14, 2005 - 11:52 am UTC

that is easy

select typeid,
count( case when submitted between :x and :y then submitted end ),
count( case when approved between :x and :y then approved end ),
count( case when denied between :x and :y then denied end )
from t
group by typeid;


you might need to nvl() those if you want zeroes.

Analytical Function

James Stansell, April 14, 2005 - 12:17 pm UTC

Lindsey was asking about limiting the count to a certain time period. I think something like this might be required:

count(case when submitted between date '2005-04-06' and date '2005-04-07' then 1 else null end),
count(case when approved between date '2005-04-06' and date '2005-04-07' then 1 else null end),
count(case when denied between date '2005-04-06' and date '2005-04-07' then 1 else null end)

I didn't put the date criteria in a where clause because that wouldn't produce the desired output, as specified by the sample in the question.

[Disclaimer: This is begging for bind variables for the dates. Also I didn't actually create a table and try to run this, so my syntax might be slightly off.]

Tom Kyte
April 14, 2005 - 1:20 pm UTC

that is what I did? I used bind variable notation, :x and :y, but that is what I did?

I also believed nvl() may be needed ...

Gabe, April 14, 2005 - 1:06 pm UTC

<quote>
you might need to nvl() those if you want zeroes.
</quote>

No need for nvl ... but there is a need for a where condition to avoid getting all the 'typeid's back.

Analytical Function

Lindsey, April 14, 2005 - 3:03 pm UTC

Thanks. The query looks so simple on Oracle. I'm proud to be an Oracle user. Is it possible to make a view out of that? I need that logic to build a report (result count table) from a non-Oracle reporting product which I don't think can produce that simple query.

Tom Kyte
April 14, 2005 - 3:09 pm UTC

can you run a stored procedure right before selecting from the view in this tool?

Yes, we can make it a view.
But that makes it hard to pass the dates into it!
If we can call a procedure (or just execute a piece of plsql) first, then we can do it.
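
One common way to fake a parameterized view (a sketch with made-up names, not something from this thread; it only works if the tool can run the setter procedure first, which is exactly the question above) is to have the view read its "parameters" from an application context:

create or replace context report_ctx using set_report_dates;

create or replace procedure set_report_dates( p_start in date, p_end in date )
as
begin
    -- stash the "parameters" in a session-private application context
    dbms_session.set_context( 'report_ctx', 'start_date', to_char( p_start, 'yyyymmdd' ) );
    dbms_session.set_context( 'report_ctx', 'end_date',   to_char( p_end,   'yyyymmdd' ) );
end;
/

create or replace view report_counts
as
select typeid,
       count( case when submitted between to_date( sys_context('report_ctx','start_date'), 'yyyymmdd' )
                                      and to_date( sys_context('report_ctx','end_date'),   'yyyymmdd' )
                   then submitted end ) submitted,
       count( case when approved between to_date( sys_context('report_ctx','start_date'), 'yyyymmdd' )
                                     and to_date( sys_context('report_ctx','end_date'),   'yyyymmdd' )
                   then approved end ) approved,
       count( case when denied between to_date( sys_context('report_ctx','start_date'), 'yyyymmdd' )
                                   and to_date( sys_context('report_ctx','end_date'),   'yyyymmdd' )
                   then denied end ) denied
  from t
 group by typeid;

-- then, per session:
--   exec set_report_dates( date '2005-04-06', date '2005-04-07' );
--   select * from report_counts;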

Analytic Function

Lindsey, April 14, 2005 - 5:58 pm UTC

Hi Tom,

No, I can't execute a procedure from that tool. It can only generate a report based on tables and views. It has the ability to join tables, but I would rather build the logic in Oracle and pass the view to it to generate the report. I posted a question to their tech support about generating that report with my current table data; they are still researching it. Thanks for any help that you have.

Tom Kyte
April 14, 2005 - 6:15 pm UTC

what is the tool?

and are you sure you cannot use the query as coded and bind the values into it?

Analytical Function

Lindsey, April 15, 2005 - 9:47 am UTC

The tool is WebFocus.
Yes, WebFocus can do joins, and we can create the report with parameters so the user can pass values to it. My point here is to build the logic into an Oracle view that can feed the report; the report then generates the crosstab counting status (submitted, approved, ...) based on the dates the user enters. Does that make sense to you? Thanks.

Tom Kyte
April 15, 2005 - 10:37 am UTC

but the view would need "parameters" (because you have to put the :x and :y in the select list)

and you cannot parameterize a view without calling a stored procedure.

For Lindsay ...

Gabe, April 15, 2005 - 11:39 am UTC

</code> http://www.iwaysoftware.com/products/adapters/oracle.html <code>

iWay's Oracle Adapter for WebFocus does support stored procedures (which likely means it can do ref cursors ... which renders the view approach moot ... just pass parameters in and let Oracle do the heavy lifting in the procedure).

You may have an older version though ... but do some searching in the WebFocus doco beyond mapping tables and defining joins.

Cheers.

Analytical Function

Lindsey, April 15, 2005 - 11:49 am UTC

Hi Tom,

You can ignore my issue. I was able to work with a consultant from WebFocus to build the report.
Thanks anyway for your help on the query.
I always learn something every time I visit your site,
on any topic. Thanks!


Can Analytics help me with this query ?

A reader, May 10, 2005 - 11:20 am UTC

We have this query as a part of a huge PL/SQL program.

select mp.ey_rpt_man_cntry,
gp.iu_legal_ent_bu,
me.ey_rpt_man_cntry,
ge.iu_legal_ent_bu
into prj_mc,
prj_le,
emp_mc,
emp_le
from ps_pc_job e,
ps_ey_proj_gl_org o,
ps_bus_unit_tbl_gl gp,
ps_ey_rpt_lgle_tbl mp,
ps_bus_unit_tbl_gl ge,
ps_ey_rpt_lgle_tbl me
where o.business_unit =(SELECT EY_CODE_VALUE
FROM PS_EY_CODE_VAL_TBL
WHERE EY_CODE_ID='PCBU'
AND EY_CODE_TYPE='GFIS_PCBU')
and o.project_id = prj_id
and o.effdt = (select max(o2.effdt)
from ps_ey_proj_gl_org o2
where o2.business_unit = o.business_unit
and o2.project_id = o.project_id
and o2.effdt <= asof_date)
and gp.business_unit = o.business_unit_gl
and mp.iu_legal_ent_bu = gp.iu_legal_ent_bu
and mp.effdt = (select max(mp2.effdt)
from ps_ey_rpt_lgle_tbl mp2
where mp2.iu_legal_ent_bu = mp.iu_legal_ent_bu
and mp2.effdt <= asof_date)
and e.business_unit = (SELECT EY_CODE_VALUE
FROM PS_EY_CODE_VAL_TBL
WHERE EY_CODE_ID = 'PCBU'
AND EY_CODE_TYPE = 'GFIS_PCBU')

and e.emplid = emp_id
and e.effdt = (select max(e2.effdt)
from ps_pc_job e2
where e2.business_unit = e.business_unit
and e2.emplid = e.emplid
and e2.effdt <= asof_date)
and e.effseq = (select max(e3.effseq)
from ps_pc_job e3
where e3.business_unit = e.business_unit
and e3.emplid = e.emplid
and e3.effdt = e.effdt)
and ge.business_unit = e.business_unit_gl
and me.iu_legal_ent_bu = ge.iu_legal_ent_bu
and me.effdt = (select max(me2.effdt)
from ps_ey_rpt_lgle_tbl me2
where me2.iu_legal_ent_bu = me.iu_legal_ent_bu
and me2.effdt <= asof_date);

prj_id and asof_date are being passed as parameters to the procedure. Would it be possible to rewrite the max() subquery parts using analytics? Could you please show me how?

Thanks a lot.



Tom Kyte
May 10, 2005 - 1:22 pm UTC

*maybe*

the problem is that the analytics happen AFTER the predicate is applied; the subqueries you have are not affected by the predicate. E.g.:

(select max(e2.effdt)
from ps_pc_job e2
where e2.business_unit = e.business_unit
and e2.emplid = e.emplid
and e2.effdt <= asof_date)

this sees all of the rows in ps_pc_job for that bu/emplid/effdt; the analytics would only see the rows that survive the join of E to GE, ME, O and so on.



but I cannot even tell which are PLSQL inputs and which are columns -- it is not clear (I strongly, very very very strongly suggest you PREFIX ALL plsql variables: p_ for parameters, l_ for local variables -- always)
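
For what it is worth, the general shape of such a rewrite (a sketch against one table only, assuming the as-of date is a parameter named p_asof_date; per the caveat above, the window must be computed in an inline view BEFORE joining, or it will only see post-join rows):

select *
  from (select e.*,
               row_number() over
                 (partition by e.business_unit, e.emplid
                      order by e.effdt desc, e.effseq desc) rn
          from ps_pc_job e
         where e.effdt <= p_asof_date) e
 where e.rn = 1

This one inline view replaces both the max(effdt) and the max(effseq) subqueries against ps_pc_job; the other effective-dated tables would get the same treatment.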

A reader, May 16, 2005 - 3:23 pm UTC


Identify groups of transactions using analytics

andrew, July 13, 2005 - 4:54 pm UTC

I am working with a 3rd party application on 8.1.7.4, and I need to write a query to identify a family of transactions. CONNECT BY does not work because this is a self-referencing table.
I have been trying to find a way to use LEAD and LAG, but so far no luck.

The number of transactions per family is variable.

The transactions which define a family may be randomly distributed within the table.

The starting record for a family will have a value in wss_tid and all other "tid" fields will be null
OR
wss_tid = to_orig_tid and orig_tid is null and ndf_orig_tid is null.

Members of a family are identified by orig_tid, to_orig_tid or ndf_orig_tid = wss_tid.

A transaction may only be a member of one family.

In the example below there are 5 families of records

wss_tid : 1,2,3,4,5 and 7
wss_tid : 6
wss_tid : 8,9,12
wss_tid : 10
wss_tid : 11

My results should look like :

OWNER_TID WSS_TID ORIG_TID TO_ORIG_TID NDF_ORIG_TID
--------- ------- -------- ----------- ------------
        1       1                    1
        1       2                                 1
        1       3                    1
        1       4        1
        1       5                                 2
        1       7        2
        6       6
        8       8
        8       9                    8
        8      12        9
       10      10                   10
       11      11

create table my_table
(WSS_TID VARCHAR2(15) NOT NULL,
ORIG_TID VARCHAR2(15),
TO_ORIG_TID VARCHAR2(15),
NDF_ORIG_TID VARCHAR2(15),
VALUE_DATE DATE,
WSS_GDP_TIME DATE)
/

insert into my_table values ('1','','1','','04-JUL-05','04-JUL-05');
insert into my_table values ('2','','','1','05-JUL-05','04-JUL-05');
insert into my_table values ('3','','1','','04-JUL-05','04-JUL-05');
insert into my_table values ('4','1','','','04-JUL-05','04-JUL-05');
insert into my_table values ('5','','','2','04-JUL-05','04-JUL-05');
insert into my_table values ('6','','','','10-JUL-05','04-JUL-05');
insert into my_table values ('7','2','','','04-JUL-05','04-JUL-05');
insert into my_table values ('8','','','','27-JUN-05','04-JUL-05');
insert into my_table values ('9','','8','','04-JUL-05','04-JUL-05');
insert into my_table values ('10','','10','','04-JUL-05','04-JUL-05');
insert into my_table values ('11','','','','04-JUL-05','04-JUL-05');
insert into my_table values ('12','9','','','30-JUN-05','04-JUL-05');

select * from my_table;
WSS_TID   ORIG_TID   TO_ORIG_TID  NDF_ORIG_TID    VALUE_DAT WSS_GDP_T
--------- ---------- ------------ --------------- --------- ---------
1                    1                            04-JUL-05 04-JUL-05
2                                 1               05-JUL-05 04-JUL-05
3                    1                            04-JUL-05 04-JUL-05
4         1                                       04-JUL-05 04-JUL-05
5                                 2               04-JUL-05 04-JUL-05
6                                                 10-JUL-05 04-JUL-05
7         2                                       04-JUL-05 04-JUL-05
8                                                 27-JUN-05 04-JUL-05
9                    8                            04-JUL-05 04-JUL-05
10                   10                           04-JUL-05 04-JUL-05
11                                                04-JUL-05 04-JUL-05
12        9                                       30-JUN-05 04-JUL-05

Thank you for all the great work on this site.


Tom Kyte
July 13, 2005 - 5:01 pm UTC

I do not understand why "1 and 2" are in the same family; there doesn't seem to be a "start" record for them?

Identify groups of transactions using analytics

Andrew, July 14, 2005 - 9:34 am UTC

Hi Tom,

1 is the starting record for the first family of transactions because wss_tid = 1 and to_orig_tid = 1 and orig_tid and ndf_orig_tid are null. This is equivalent to the situation where wss_tid = 1 and orig_tid,to_orig_tid and ndf_to_orig_tid are null.

2 is part of family 1 because the record where wss_tid = 2 the field ndf_orig_tid = 1, hence the relationship to the record where wss_tid = 1. Any records related to wss_tid = 2 are part of family 1, because the record where wss_tid = 2 is related to the record where wss_tid = 1.


The purpose of this query is to improve the performance of a purging routine which is currently written using PLSQL cursors that traverse the transaction table many times.

The business rule is that a record may only be purged if all related records also meet the purge criteria, in this case a cut off date.


Many thanks.


Tom Kyte
July 14, 2005 - 11:12 am UTC

I don't know how to group and order things here. What is the relevance of owner_tid, and where did it go? It looks like it might be relevant.

Why varchar2's -- are these numbers or strings? What sequences this set, what "groups" it?

Identify groups of transactions using analytics

Andrew, July 15, 2005 - 12:22 pm UTC

The vendor of this product has defined the fields as varchar2, in our use of the product the data is always numeric.

The application will populate WSS_TID, and may also populate one of TO_ORIG_TID,ORIG_TID or NDF_ORIG_TID depending on the type of transaction. WSS_TID is mandatory and will be unique.

Sorry, I wasn't very clear about OWNER_TID. OWNER_TID is a copy of WSS_TID for the starting record of a family of transactions; it is a derived field. This is the field that the result set should be grouped and ordered by.

Thank you for the prompt replies.


Tom Kyte
July 15, 2005 - 6:09 pm UTC

OWNER_TID WSS_TID ORIG_TID TO_ORIG_TID NDF_ORIG_TID
--------- ------- -------- ----------- ------------
        1       1                    1
        1       2                                 1
        1       3                    1
        1       4        1
        1       5                                 2
        1       7        2
        6       6
        8       8
        8       9                    8
        8      12        9
       10      10                   10
       11      11

but you see -- I don't understand how to "sort" the input data (which is vital for the analytics) in order to get wss_tid = 1,2,3,4,5,7 together (but not 6). 7? how did 7 skip back?

Identify groups of transactions using analytics

Andrew, July 20, 2005 - 2:38 pm UTC

Yes, I see why analytics won't work here, thank you for taking the time to explain things.

My followup on the original question

Rennie, December 07, 2005 - 6:23 am UTC

Tom,

I had started this thread 3 years back with a rather strange question. What I asked was impossible then, with 8i.

I wanted a value from the resultset without disturbing the existing query (no inline views - no change in group by etc). It was a very complex query (several inline views) and I kind of represented it in a simple manner using the table tab1.

Anyway, this followup is just to say that I can get my result in 9i if I fire this query:

select
col1,
MAX(col2) KEEP (DENSE_RANK LAST ORDER BY col3)
from
tab1
group by
col1;

Thanks,
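
For anyone following along, a tiny self-contained sketch (made-up data) of what KEEP (DENSE_RANK LAST ...) does -- the aggregate is applied only to the rows that rank last in each group:

with t as (
  select 'A' grp, 'x' who, 1 score from dual union all
  select 'A',     'y',     5       from dual union all
  select 'B',     'c',     9       from dual union all
  select 'B',     'd',     2       from dual
)
select grp,
       max(who) keep (dense_rank last order by score) top_scorer
  from t
 group by grp;

-- GRP TOP_SCORER
-- A   y
-- B   c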

Randall, December 14, 2005 - 8:34 am UTC

Tom,

I have a problem and I wanted to know if it could be solved with Analytics. But I just can't wrap my head around it. I need to get a min val for a grouping but the records have to stay in order prior to grouping. Here is a sample.

Oracle Ver 9.2.0.7

CREATE TABLE T
( SCHED_NO CHAR(12 BYTE) NOT NULL,
GLOBAL_ORDER_SEQ NUMBER(10) NOT NULL,
PCS_GROUP_CODE CHAR(20 BYTE) NOT NULL,
STAT_ID CHAR(1 BYTE) NOT NULL
);

INSERT INTO T VALUES ('AAAA100',1,'1157A','C');
INSERT INTO T VALUES ('AAAA102',3,'1157A','C');

INSERT INTO T VALUES ('AAAA103',7,'997A','C');
INSERT INTO T VALUES ('AAAA104',12,'997A','C');

INSERT INTO T VALUES ('AAAA105',13,'1157A','C');
INSERT INTO T VALUES ('AAAA106',16,'1157A','C');
INSERT INTO T VALUES ('AAAA107',17,'1157A','A');
INSERT INTO T VALUES ('AAAA108',18,'1157A','N');
INSERT INTO T VALUES ('AAAA109',20,'1157A','N');

INSERT INTO T VALUES ('AAAA110',21,'8957A','N');
INSERT INTO T VALUES ('AAAA111',30,'8957A','N');
INSERT INTO T VALUES ('AAAA112',31,'1157A','N');

What I want to find is the min GLOBAL_ORDER_SEQ for the active order grouping. SCHED_NO 105-109 are all part of the same group so in this case I would return.

GLOBAL_ORDER_SEQ
----------------
13


I have built a function that does this by looping through the records in reverse order. It works but it won't be very fast.


By the way... I am up to Chapter 9 of your new book. Great job... I look forward to part two.

Thanks..

Tom Kyte
December 14, 2005 - 9:12 am UTC

not sure what the "active order grouping" is precisely.

for example, why is the last record in the group it is in? what is the logic here? what defines your concept of a group?

Analytic is fun

Michel Cadot, December 14, 2005 - 12:44 pm UTC

Hi Randall,

I think this one will match your requirement:

SQL> select sched_no, global_order_seq, pcs_group_code, stat_id
  2  from t
  3  order by sched_no
  4  /
SCHED_NO     GLOBAL_ORDER_SEQ PCS_GROUP_CODE       S
------------ ---------------- -------------------- -
AAAA100                     1 1157A                C
AAAA102                     3 1157A                C
AAAA103                     7 997A                 C
AAAA104                    12 997A                 C
AAAA105                    13 1157A                C
AAAA106                    16 1157A                C
AAAA107                    17 1157A                A
AAAA108                    18 1157A                N
AAAA109                    20 1157A                N
AAAA110                    21 8957A                N
AAAA111                    30 8957A                N
AAAA112                    31 1157A                N

12 rows selected.

SQL> select min(global_order_seq) global_order_seq
  2  from ( select sched_no, global_order_seq, pcs_group_code, stat_id,
  3                max(grp) over (order by sched_no) grp
  4         from ( select sched_no, global_order_seq, pcs_group_code, stat_id,
  5                       case 
  6                       when prev_group != pcs_group_code or prev_group is null
  7                       then rn end grp
  8                from ( select sched_no, global_order_seq, pcs_group_code, stat_id,
  9                              lag(pcs_group_code) over (order by sched_no) prev_group,
 10                              row_number () over (order by sched_no) rn
 11                       from t ) ) )
 12  group by grp
 13  having min(stat_id) = 'A'
 14  /
GLOBAL_ORDER_SEQ
----------------
              13

1 row selected.

Regards
Michel
 

Tom Kyte
December 14, 2005 - 1:00 pm UTC

(still would like him to say "how" the logic works here).

Randall, December 14, 2005 - 2:26 pm UTC

What we are going for is this.

Orders are processed in order (Global_order_seq). STAT_ID is (N)New, (A)Active, (C)Complete.

The Orders are grouped via the pcs_group_code, so when the group code changes, it starts a new group.

The front end will pull all orders that are New or Active. But we also want to pull all Completed orders that are part of the Active group.

The user views the orders in group mode which looks like one large active order.


Thanks... Michel I will give it a try...


Randall, December 14, 2005 - 2:34 pm UTC

Thanks Michel, Tom

It worked with one small change. I changed the sorting to order by Global_Order_seq

SELECT MIN (global_order_seq) global_order_seq
FROM (SELECT sched_no,
global_order_seq,
pcs_group_code,
stat_id,
MAX (grp) OVER (ORDER BY global_order_seq) grp
FROM (SELECT sched_no,
global_order_seq,
pcs_group_code,
stat_id,
CASE
WHEN prev_group != pcs_group_code
OR prev_group IS NULL THEN rn
END grp
FROM (SELECT sched_no,
global_order_seq,
pcs_group_code,
stat_id,
LAG (pcs_group_code) OVER (ORDER BY global_order_seq) prev_group,
ROW_NUMBER () OVER (ORDER BY global_order_seq) rn
FROM t)))
GROUP BY grp
HAVING MIN (stat_id) = 'A'
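
For readers puzzling over how this works: the idiom (sketched generically below, against a made-up table t(seq, val)) is to use LAG to flag the first row of each consecutive run of equal values, number those flag rows, and then carry the number forward over the NULLs with MAX() OVER -- every row in a run ends up with the same group id, which can then be grouped by:

select seq, val,
       max(start_flag) over (order by seq) grp   -- carry the flag forward over the nulls
  from (select seq, val,
               case when lag(val) over (order by seq) is null
                      or lag(val) over (order by seq) != val
                    then row_number() over (order by seq)
               end start_flag                    -- non-null only on rows that begin a new run
          from t);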

is it possible to do an "exclusive or" in sql?

Ryan, December 19, 2005 - 5:11 pm UTC

I have two conditions. Condition 1 and condition 2.

if condition 1 exists don't look at condition 2.
If condition 1 does not exist try condition 2.

If I do

select col1
from tab1
where col2 = 'stuff'
or col3 = 'stuff2'

it gets both. I only want to look at col3 = 'stuff2' if the first condition does not exist.

If I do the following

select col1
from tab1
where col2 = 'stuff'
or (col2 != 'stuff' and col3 = 'stuff2')

Oracle evaluates the conditions on a row-by-row basis. So if the first condition is met in row 1, good; but if it is not met in row 2, that row still comes back via the second condition.

Basically, at a set level: if condition 1 is ever met in the query, do not evaluate condition 2.

Tom Kyte
December 19, 2005 - 5:17 pm UTC

you mean "short circuit evaluation"

it does short circuit. But you have no control over which part goes first - I don't care how many () you add.


I don't get the last part ?

i'll try to explain it better

Ryan, December 20, 2005 - 12:49 am UTC

Lets say I have a set of the following records with 3 fields:

1,'Hello','Bob'
2, 'Goodbye','Tom'
3, 'See Ya','Jack'

If any rows in field2 = 'Goodbye', stop and do not look at the second condition.

If no rows in column2 = 'Goodbye' look at the second condition



select col1
from tab1
where col2 = 'Hello'
or (col2 != 'stuff' and col3 = 'Jack') /*+ only do this condition if there are NO rows returned where col2 = 'Hello' */


Tom Kyte
December 20, 2005 - 8:44 am UTC

That is hardly short circuit evaluation or "exclusive or" to me. But it can be done in sql. (is there a real world use for this?)

with data
as
(
select *
from t
where COND1
)
select *
from data
union all
select *
from t
where NOT EXISTS (select null from data)
and ( COND2 );
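
Instantiated with the example data above (a sketch, with col2 = 'Goodbye' as condition 1 and col3 = 'Jack' as condition 2, per the narrative):

with data as
( select col1, col2, col3
    from tab1
   where col2 = 'Goodbye'
)
select col1 from data
union all
select col1
  from tab1
 where not exists (select null from data)   -- only evaluated if COND1 matched nothing
   and col3 = 'Jack';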





"most recently created" data for each distinct period

Bob Lyon, March 03, 2006 - 1:05 pm UTC

Tom,

The system I am supporting has tables sort of like this...
the general idea is that the most recently created record for any given
day is the value we want to select.

CREATE GLOBAL TEMPORARY TABLE t0 (
PK_ID INTEGER
-- Lots of other columns
);

CREATE GLOBAL TEMPORARY TABLE t1 (
PK_ID INTEGER,
t0_FK_ID INTEGER, -- Foreign Key on t0.PK_ID
START_DT_GMT DATE,
END_DT_GMT DATE,
CREATED_DT DATE,
VALUE NUMBER,
COMMENT_TEXT VARCHAR2(30)
-- Lots of other columns
);

CREATE GLOBAL TEMPORARY TABLE t2 (
PK_ID INTEGER,
t0_FK_ID INTEGER, -- Foreign Key on t0.PK_ID
START_DT_GMT DATE,
END_DT_GMT DATE,
CREATED_DT DATE,
VALUE NUMBER,
COMMENT_TEXT VARCHAR2(30)
-- Lots of other columns
);

Here is some simplistic data...
END_DT_GMTs represent "date when the data is no longer valid"
rather than "the last day the data is valid".

ALTER SESSION SET NLS_DATE_FORMAT='MM/DD/YYYY';

INSERT INTO t0 VALUES (1);

INSERT INTO t1 VALUES (2,1,TO_DATE('08/01/2005'), TO_DATE('10/01/2005'), TO_DATE('08/01/2005'), 1, 'T1-2');
INSERT INTO t1 VALUES (7,1,TO_DATE('08/15/2005'), TO_DATE('08/30/2005'), TO_DATE('08/15/2005'), 2, 'T1-7');
INSERT INTO t1 VALUES (8,1,TO_DATE('09/02/2005'), TO_DATE('09/15/2005'), TO_DATE('09/02/2005'), 3, 'T1-8');
INSERT INTO t1 VALUES (9,1,TO_DATE('08/20/2005'), TO_DATE('09/10/2005'), TO_DATE('09/05/2005'), 4, 'T1-9');

INSERT INTO t2 VALUES (4,1,TO_DATE('08/18/2005'), TO_DATE('08/26/2005'), TO_DATE('08/15/2005'), 12, 'T2-2');
INSERT INTO t2 VALUES (5,1,TO_DATE('08/21/2005'), TO_DATE('09/29/2005'), TO_DATE('09/02/2005'), 23, 'T2-8');

What we want, and what the below query (Analytics ROCK!) produces,
is the "most recently created" data for each distinct period across tables t1 and t2:

START_DATE END_DATE T1_VALUE T1_COMMENT_TEXT T2_VALUE T2_COMMENT_TEXT
---------- ---------- ---------- --------------- ---------- ---------------
08/01/2005 08/15/2005 1 T1-2
08/15/2005 08/18/2005 2 T1-7
08/18/2005 08/20/2005 2 T1-7 12 T2-2
08/20/2005 08/21/2005 4 T1-9 12 T2-2
08/21/2005 09/10/2005 4 T1-9 23 T2-8
09/10/2005 09/15/2005 3 T1-8 23 T2-8
09/15/2005 09/29/2005 1 T1-2 23 T2-8
09/29/2005 10/01/2005 1 T1-2

8 rows selected.



This query is intended to replace some rather ugly procedural code
and several rollup tables, and we think that for a single parent id
it will perform fine.

The gotcha is that we have a requirement to provide a view which
could be used to select this data for all of the (possibly millions
of) rows in "t0". The child tables might have anything up to (SWAG)
about 1000 records for any given parent id, but typically less than 100.

This is for a system not yet in production and so we don't have
any realistic test data. Do you have any suggestions off the top
of your head to improve the performance of such a view?




col START_DT_GMT HEADING 'START_DATE'
col END_DT_GMT HEADING 'END_DATE'
col t1_COMMENT_TEXT FORMAT A15
col t2_COMMENT_TEXT FORMAT A15

variable x number
exec :x := 1

SELECT anal.start_dt_gmt
,anal.end_dt_gmt
,t1.value t1_value
,t1.comment_text t1_comment_text
-- Lots of other t1 columns
,t2.value t2_value
,t2.comment_text t2_comment_text
-- Lots of other t2 columns
FROM (
SELECT grouped_id -- just for possible display
,MIN(start_date) start_dt_gmt
,MAX(start_date)+1 end_dt_gmt
,MAX(t1_row_id) t1_row_id -- all row_ids are the same for a given group_id
,MAX(t2_row_id) t2_row_id -- all row_ids are the same for a given group_id
FROM ( -- Carry the group identifier forward (fill in the NULL values with the prior group identifier)
SELECT start_date
,t1_row_id
,t2_row_id
,MAX(gid) OVER (ORDER BY start_date) grouped_id
FROM ( -- Assign a group identifier when either rowid changes or is the first record in the set
SELECT start_date
,t1_row_id
,t2_row_id
,CASE
WHEN ROWNUM = 1
THEN t1_row_id||t2_row_id
ELSE LEAD(t1_row_id||t2_row_id) OVER (ORDER BY start_date DESC)
END start_dtw
,CASE
WHEN ROWNUM = 1
THEN ROWNUM
WHEN NVL(t1_row_id||t2_row_id,'~') != LEAD(t1_row_id||t2_row_id)
OVER (ORDER BY start_date DESC)
THEN ROWNUM
ELSE NULL
END gid
FROM (
WITH date_data AS
(
-- Gets the widest date range for the given parent key
-- (either table may overlap date ranges on the other)
-- (Note: The TRUNCs are due to some dates not being GMT)
SELECT start_dt_gmt, LEVEL-1 lvl
FROM (SELECT MAX(end_dt_gmt) - MIN(start_dt_gmt) n_days
,MIN(start_dt_gmt) start_dt_gmt
FROM (SELECT MAX(TRUNC(t1.end_dt_gmt)) end_dt_gmt
,MIN(TRUNC(t1.start_dt_gmt)) start_dt_gmt
FROM t1
WHERE t0_FK_ID = :x
UNION ALL
SELECT MAX(TRUNC(t2.end_dt_gmt)) end_dt_gmt
,MIN(TRUNC(t2.start_dt_gmt)) start_dt_gmt
FROM t2
WHERE t0_FK_ID = :x
)
) n_days
CONNECT BY LEVEL <= n_days
)
SELECT dated_t1.start_date
,dated_t1.row_id t1_row_id
,dated_t2.row_id t2_row_id
FROM ( -- Gets the most recently created rowid for each day
SELECT date_data.start_dt_gmt + lvl start_date
-- This trick gets the rowid for the latest CREATED_DT
-- for a single START_DT_GMT.
,SUBSTR( MAX( TO_CHAR(t1.created_dt, 'YYYYMMDDHH24MISS')
|| t1.ROWID
)
, 15
) row_id
FROM date_data
LEFT OUTER JOIN
t1
ON (date_data.start_dt_gmt + lvl < TRUNC(t1.end_dt_gmt)
AND
date_data.start_dt_gmt + lvl >= TRUNC(t1.start_dt_gmt)
AND
t1.t0_FK_ID = :x
)
GROUP BY date_data.start_dt_gmt + lvl
ORDER BY 1
) dated_t1
,( -- Gets the most recently created rowid for each day
SELECT date_data.start_dt_gmt + lvl start_date
-- This trick gets the rowid for the latest CREATED_DT
-- for a single START_DT_GMT.
,SUBSTR( MAX( TO_CHAR(t2.created_dt, 'YYYYMMDDHH24MISS')
|| t2.ROWID
)
, 15
) row_id
FROM date_data
LEFT OUTER JOIN
t2
ON (date_data.start_dt_gmt + lvl < TRUNC(t2.end_dt_gmt)
AND
date_data.start_dt_gmt + lvl >= TRUNC(t2.start_dt_gmt)
AND
t2.t0_FK_ID = :x
)
GROUP BY date_data.start_dt_gmt + lvl
ORDER BY 1
) dated_t2
WHERE dated_t1.start_date = dated_t2.start_date
)
ORDER BY 1
)
)
GROUP BY grouped_id
ORDER BY grouped_id
) anal
LEFT OUTER JOIN
t1
ON (t1.rowid = CAST(anal.t1_row_id AS ROWID))
LEFT OUTER JOIN
t2
ON (t2.rowid = CAST(anal.t2_row_id AS ROWID));


Tom Kyte
March 03, 2006 - 2:15 pm UTC

sorry, cannot really read, digest and reverse engineer a big query like that on the hypothetical grounds that it might some day be "slow"

just not enough minutes in the day.

use of analytic function

KR, April 04, 2006 - 4:48 pm UTC

Hi Tom
I have a table that has history records
say account_Dim ( Id, Ods_id, Current_ind, etc ...)
sample records

ID  ODS_ID  Current_ind
 1     234  N
 2     234  N
10     234  Y
 4     465  N
 5     465  N
 6     465  N
 8     465  N
34     765  N

I want just the distinct ods_id values for which no record has Current_ind = 'Y'. I have been looking at this so long that I am not getting anywhere. Can you please help me here?
Thanks


Tom Kyte
April 04, 2006 - 7:42 pm UTC

select ods_id
from t
group by ods_id
having count( decode( current_ind, 'Y', 1 ) ) = 0;
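
The trick here: decode(current_ind,'Y',1) returns 1 for 'Y' rows and NULL for everything else, and count() ignores NULLs -- so the HAVING keeps exactly the ods_id groups containing no 'Y' row. A minimal self-contained sketch of the behavior (made-up inline data):

with t as (
  select 234 ods_id, 'N' current_ind from dual union all
  select 234,        'Y'             from dual union all
  select 765,        'N'             from dual
)
select ods_id
  from t
 group by ods_id
having count( decode( current_ind, 'Y', 1 ) ) = 0;

-- returns only 765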



Analytic functions

Chris, April 05, 2006 - 9:55 am UTC

Tom,
Is there a way to use multiple analytic functions on the same "over" without putting them on separate lines, which appears to slow things down as you add each one? For example, I want the values from 8 columns in the row for each business key having the max timestamp (the table writes records every minute). So I have something like this:

SELECT DISTINCT keycol1, keycol2, keycol3, keycol4,
first_value(col5) over (partition by keycol1, keycol2, keycol3, keycol4
order by create_timestamp desc) last_col5,
first_value(col6) over (partition by keycol1, keycol2, keycol3, keycol4
order by create_timestamp desc) last_col6,
first_value(col7) over (partition by keycol1, keycol2, keycol3, keycol4
order by create_timestamp desc) last_col7,

etc...

With each column I add like that, it gets a little bit slower (but still much faster than a correlated sub-query). Is there a way to tell it that I want the first value for several columns with the same partition and order, so it grabs them all at once? Something like:

SELECT DISTINCT keycol1, keycol2, keycol3, keycol4,
first_value(col5), first_value(col6), first_value(col7)
over (partition by keycol1, keycol2, keycol3, keycol4
order by create_timestamp desc)

Obviously that syntax is no good, but I think you get the gist of my question.

Tom Kyte
April 06, 2006 - 8:54 am UTC

seems you might want:

select *
from (
select keycol1, ...., col5, col6, col7,
row_number() over
(partition by keycol1, ... keycol4 order by create_timestamp desc) rn
from t
)
where rn = 1;



using analytics to replace a group by with a count(*)

Ryan, April 18, 2006 - 2:03 pm UTC

select columnA, count(*)
from tableA
group by columnA;

Is there a way to do this with analytics that has superior performance? I need to do this with tables that have millions of rows. I am hoping for something a little better.

Tom Kyte
April 18, 2006 - 3:45 pm UTC

shouldn't take long, mostly just the time to scan millions of rows assuming the distinct cardinality of columnA is relatively small

but analytics would be wholly and entirely inappropriate here, they do NOT SQUISH OUT THE DATA.

analytics do aggregates without aggregating. You - you need aggregation. Analytics would return millions of rows from millions of rows - exactly the same number of millions of rows.

One would anticipate a group by to return less than the original number of rows.


an index on columnA would be useful as a skinny version of the table - but would be problematic if columnA is nullable (we'd have to index (columnA,1) or something to make sure the entire set of rows appears in the index).
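
A sketch of that trick (hypothetical names -- the constant second expression guarantees that every row, even one with a NULL columnA, gets an index entry, so the skinny index can stand in for the table):

create index tablea_idx on tableA( columnA, 1 );

select columnA, count(*)
  from tableA
 group by columnA;   -- can now be satisfied by an index fast full scan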



Using Analytics

A Reader, September 25, 2006 - 4:58 pm UTC

I have a Table T which stores origin airport, destination airport and distance.

select * from T

AIRPORT_1 AIRPORT_2 DISTANCE
D         C              201
C         D              200
B         A              101
A         B              100


I need to get an output like this.

AIRPORT_1 AIRPORT_2 DISTANCE
A         B              101
B         A              101
C         D              201
D         C              201

is it possible using analytic functions?

The scripts .

CREATE TABLE T
(
AIRPORT_1 VARCHAR2(3),
AIRPORT_2 VARCHAR2(3),
DISTANCE NUMBER
);

Insert into T
(AIRPORT_1, AIRPORT_2, DISTANCE)
Values
('B', 'A', 101);
Insert into T
(AIRPORT_1, AIRPORT_2, DISTANCE)
Values
('A', 'B', 100);

Insert into T
(AIRPORT_1, AIRPORT_2, DISTANCE)
Values
('C', 'D', 200);

Insert into T
(AIRPORT_1, AIRPORT_2, DISTANCE)
Values
('D', 'C', 201);
COMMIT;




Tom Kyte
September 26, 2006 - 2:15 am UTC

you know - giving specifications is a fine skill, a necessary skill to have...

"i have this input data"
"the output needs to look like this"

without explanation - makes us GUESS what you mean, what you want.

I suppose you want the max distance by airport pairs.


ops$tkyte%ORA9IR2> select airport_1, airport_2,
  2         max(distance) over
  3           ( partition by least(airport_1,airport_2),
  4                          greatest(airport_1,airport_2)) dist
  5    from t;

AIR AIR       DIST
--- --- ----------
B   A          101
A   B          101
C   D          201
D   C          201

ops$tkyte%ORA9IR2>
ops$tkyte%ORA9IR2> select least(airport_1,airport_2),
  2         greatest(airport_1,airport_2),
  3         max(distance)
  4    from t
  5   group by least(airport_1,airport_2),
  6         greatest(airport_1,airport_2)
  7  /

LEA GRE MAX(DISTANCE)
--- --- -------------
A   B             101
C   D             201

either one suffices for me... 

Hep with Analytics

Rita, October 06, 2006 - 3:12 pm UTC

Hi Tom,
I need your help writing a query where a calculated total from the previous row is used in the calculation for the subsequent row.
Please create a test table as follows:


Create table test ( date_in date, qoh number, qty_out number);
INSERT INTO TEST VALUES (TRUNC(SYSDATE),500,5);
INSERT INTO TEST VALUES(TRUNC(SYSDATE +1),0,2);
INSERT INTO TEST VALUES(TRUNC(SYSDATE + 2),0,1);
INSERT INTO TEST VALUES (TRUNC(SYSDATE + 3),0,1);
INSERT INTO TEST VALUES (TRUNC(SYSDATE + 4),0,6);



This is what the o/p needs to be


DATE_IN      QOH  QTY_OUT  END_OOH
10/6/2006    500        5      495
10/7/2006    495        2      493
10/8/2006    493        1      492
10/9/2006    492        1      491
10/10/2006   491        6      485


Basically, the end_qoh of the previous day becomes the qoh for the subsequent day, sorted on date.
I tried using an analytic function (LAG), but did not get these results.
Appreciate your help on how to do this.

Thanks,
Rita

Tom Kyte
October 06, 2006 - 3:40 pm UTC

ops$tkyte%ORA10GR2> select date_in, qoh, qty_out,
  2         sum(qoh-qty_out) over (order by date_in)+qty_out new_qoh,
  3         sum(qoh-qty_out) over (order by date_in) end_ooh
  4    from test
  5   order by date_in
  6  /

DATE_IN          QOH    QTY_OUT    NEW_QOH    END_OOH
--------- ---------- ---------- ---------- ----------
06-OCT-06        500          5        500        495
07-OCT-06          0          2        495        493
08-OCT-06          0          1        493        492
09-OCT-06          0          1        492        491
10-OCT-06          0          6        491        485
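
(The reason the running total works: only the first row carries an opening balance -- qoh = 500, the later rows have qoh = 0 -- and every row contributes qoh - qty_out, so sum(qoh - qty_out) over (order by date_in) is the opening stock minus everything issued up to and including that day, i.e. the ending balance; adding the current day's qty_out back gives that day's starting balance.)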
 

Without analytics?

V, November 29, 2006 - 3:39 pm UTC

Tom,

I have the following data:
USERID LOGIN               ROLE
------ ------------------- ----
     1 29-11-2006 14:04:20 A,B
     2 29-11-2006 14:05:06
     1 29-11-2006 14:05:13 A,B
     3 29-11-2006 14:11:24 A
     3 29-11-2006 14:11:49 A,B

create table a (userid number, login date, role varchar2(10));
insert into a values (1,sysdate,'A,B');
insert into a values (2,sysdate,null);
insert into a values (1,sysdate,'A,B');
insert into a values (3,sysdate,'A');
insert into a values (3,sysdate,'A,B');

Without using analytics, how would I bring back:
All data for the latest login for each userid?
i.e.
USERID LOGIN               ROLE
     1 29-11-2006 14:05:13 A,B
     2 29-11-2006 14:05:06
     3 29-11-2006 14:11:49 A,B

Tom Kyte
November 30, 2006 - 9:30 am UTC

well, your test case stinks... sysdate - hmm.

ops$tkyte%ORA10GR2> select a.*
  2    from a, (select userid, max(login) max_login from a group by userid) b
  3   where a.userid = b.userid
  4     and a.login = b.max_login
  5  /
 

A Question that resembles the first in this tread

Mette, January 23, 2007 - 10:56 am UTC

Hi Tom

I have an analytics question and/or sql .....

I have a large table where we have to do some cleaning up.

The idea is that requestid/response is a transaction out and in stream. There may be multiple stations (0 or many) on both ways, each logging an entry.
Deletion should take place of all intermediate stations (per request/response) when:
- all rows in a requestid (not just the request/response) are older than 3 days

Never delete start/ending points within a request/response group

Thank you for a great site
Best regards from
Mette


Example

PART_ID REQUESTID RESPONSE SEQUENCENUMBER DB_TIMESTAMP
a1      1         0        1              19-01-2007 16:21:17,000000  -- all records for req=1 are
a2      1         0        2              19-01-2007 16:21:17,000000  -- kept, because seq 6 is too
a3      1         0        3              19-01-2007 16:21:17,000000  -- young (it disqualifies the
a4      1         0        4              19-01-2007 16:21:17,000000  -- whole bunch)
ba5     1         1        5              19-01-2007 16:21:17,000000
ba6     1         1        6              21-01-2007 16:21:18,000000
ba7     1         1        7              19-01-2007 16:21:18,000000
ba8     1         1        8              19-01-2007 16:21:18,000000
ba9     1         1        9              19-01-2007 16:21:18,000000
ba10    1         1        10             19-01-2007 16:21:18,000000

ca1     21        0        1              19-01-2007 16:21:18,000000  -- start point (don't delete)
ca2     21        0        2              19-01-2007 16:21:18,000000  -- DEL
ca3     21        0        3              19-01-2007 16:21:19,000000  -- DEL
ca4     21        0        4              19-01-2007 16:21:19,000000  -- end point (don't delete)
cba5    21        1        5              19-01-2007 16:21:19,000000  -- start point (don't delete)
cba6    21        1        6              19-01-2007 16:21:19,000000  -- DEL
cba7    21        1        7              19-01-2007 16:21:19,000000  -- DEL
cba8    21        1        8              19-01-2007 16:21:19,000000  -- DEL
cba9    21        1        9              19-01-2007 16:21:20,000000  -- DEL
cba10   21        1        10             19-01-2007 16:21:20,000000  -- end point (don't delete)

Here are the appropriate scripts for defining the stuff:

drop table mette;

create table mette (part_id varchar2(10) primary key, requestid number,
response number, sequencenumber number,
db_timestamp timestamp(6) default systimestamp not null);

create index mette_ix1 on mette(requestid);

create index mette_ix2 on mette(db_timestamp);

insert into mette (part_id, requestid, response, sequencenumber) values('a1',1,0,1);
insert into mette (part_id, requestid, response, sequencenumber) values('a2',1,0,2);
insert into mette (part_id, requestid, response, sequencenumber) values('a3',1,0,3);
insert into mette (part_id, requestid, response, sequencenumber) values('a4',1,0,4);
insert into mette (part_id, requestid, response, sequencenumber) values('ba5',1,1,5);
insert into mette (part_id, requestid, response, sequencenumber) values('ba6',1,1,6);
insert into mette (part_id, requestid, response, sequencenumber) values('ba7',1,1,7);
insert into mette (part_id, requestid, response, sequencenumber) values('ba8',1,1,8);
insert into mette (part_id, requestid, response, sequencenumber) values('ba9',1,1,9);
insert into mette (part_id, requestid, response, sequencenumber) values('ba10',1,1,10);

insert into mette (part_id, requestid, response, sequencenumber) values('ca1',21,0,1);
insert into mette (part_id, requestid, response, sequencenumber) values('ca2',21,0,2);
insert into mette (part_id, requestid, response, sequencenumber) values('ca3',21,0,3);
insert into mette (part_id, requestid, response, sequencenumber) values('ca4',21,0,4);
insert into mette (part_id, requestid, response, sequencenumber) values('cba5',21,1,5);
insert into mette (part_id, requestid, response, sequencenumber) values('cba6',21,1,6);
insert into mette (part_id, requestid, response, sequencenumber) values('cba7',21,1,7);
insert into mette (part_id, requestid, response, sequencenumber) values('cba8',21,1,8);
insert into mette (part_id, requestid, response, sequencenumber) values('cba9',21,1,9);
insert into mette (part_id, requestid, response, sequencenumber) values('cba10',21,1,10);

-- make all records eligible for deletion
update mette set db_timestamp = db_timestamp - 4;

-- make one row in requestid 1 younger (could have been manually injected)
update mette set db_timestamp = db_timestamp + 2 where part_id = 'ba6';


And here is my DELETE statement (which works) - but it must be possible to do a more readable version (or better performing):

delete from mette where db_timestamp < systimestamp - 3 and part_id in (
select part_id from mette x where exists
(
select '1' from (select * from
(select requestid, response, min(sequencenumber) as firstpart, max(sequencenumber) as lastpart,
max(max(db_timestamp)) over (partition by requestid) as latest_message
from mette
group by requestid, response
)
where latest_message < systimestamp - 3
) a
where
x.requestid = a.requestid and
x.response = a.response and
x.sequencenumber > firstpart and
x.sequencenumber < lastpart
)
);

LEAD/LAG with GROUPS

Robert, February 05, 2007 - 5:03 pm UTC

Hi Tom,

(Oracle 9.2.0.7)

I am trying to use analytics on the stats$snapshot table to
create an overview report of my statspack snapshot info.
My question involves the functionality of LEAD/LAG...

drop table test_snapshot;

create table test_snapshot
(
snap_id number
, snap_time date
, startup_time date
)
/

insert into test_snapshot values (10, trunc(sysdate), trunc(sysdate - 3));
insert into test_snapshot values (9, trunc(sysdate - 1), trunc(sysdate - 3));
insert into test_snapshot values (8, trunc(sysdate - 2), trunc(sysdate - 3));
insert into test_snapshot values (7, trunc(sysdate - 3), trunc(sysdate - 3));
insert into test_snapshot values (6, trunc(sysdate - 4), trunc(sysdate - 6));
insert into test_snapshot values (5, trunc(sysdate - 5), trunc(sysdate - 6));
insert into test_snapshot values (4, trunc(sysdate - 6), trunc(sysdate - 6));
insert into test_snapshot values (3, trunc(sysdate - 7), trunc(sysdate - 9));
insert into test_snapshot values (2, trunc(sysdate - 8), trunc(sysdate - 9));
insert into test_snapshot values (1, trunc(sysdate - 9), trunc(sysdate - 9));

commit;

select snap_id
, snap_time
, startup_time
, lag(startup_time) over (partition by startup_time order by startup_time) prev_startup
, lead(startup_time) over (partition by startup_time order by startup_time) next_startup
from test_snapshot
order by snap_id
/

This is what the above query returns.... (LEAD/LAG reading SINGLE columns in order)

  SNAP_ID SNAP_TIME STARTUP_T PREV_STAR NEXT_STAR
--------- --------- --------- --------- ---------
        1 27-JAN-07 27-JAN-07           27-JAN-07
        2 28-JAN-07 27-JAN-07 27-JAN-07 27-JAN-07
        3 29-JAN-07 27-JAN-07 27-JAN-07
        4 30-JAN-07 30-JAN-07           30-JAN-07
        5 31-JAN-07 30-JAN-07 30-JAN-07 30-JAN-07
        6 01-FEB-07 30-JAN-07 30-JAN-07
        7 02-FEB-07 02-FEB-07           02-FEB-07
        8 03-FEB-07 02-FEB-07 02-FEB-07 02-FEB-07
        9 04-FEB-07 02-FEB-07 02-FEB-07 02-FEB-07
       10 05-FEB-07 02-FEB-07 02-FEB-07

This is what I want it to return...(LEAD/LAG reading GROUP of columns in order)

  SNAP_ID SNAP_TIME STARTUP_T PREV_STAR NEXT_STAR
--------- --------- --------- --------- ---------
        1 27-JAN-07 27-JAN-07           30-JAN-07
        2 28-JAN-07 27-JAN-07           30-JAN-07
        3 29-JAN-07 27-JAN-07           30-JAN-07
        4 30-JAN-07 30-JAN-07 27-JAN-07 02-FEB-07
        5 31-JAN-07 30-JAN-07 27-JAN-07 02-FEB-07
        6 01-FEB-07 30-JAN-07 27-JAN-07 02-FEB-07
        7 02-FEB-07 02-FEB-07 30-JAN-07
        8 03-FEB-07 02-FEB-07 30-JAN-07
        9 04-FEB-07 02-FEB-07 30-JAN-07
       10 05-FEB-07 02-FEB-07 30-JAN-07

How do I write a query to use LEAD/LAG against a GROUP of columns instead of going against SINGLE columns?

Thanks,

Robert.

Tom Kyte
February 05, 2007 - 6:41 pm UTC

you call lag and lead once for each column, that is they only way

Don't understand?

Robert, February 05, 2007 - 6:59 pm UTC

Tom,

I don't understand "you call lag and lead once for each column"

How do I write the query to get the results, below? (for some reason my text is not posting as fixed-width font(?)).

  SNAP_ID SNAP_TIME STARTUP_T PREV_STAR NEXT_STAR
--------- --------- --------- --------- ---------
        1 27-JAN-07 27-JAN-07           30-JAN-07
        2 28-JAN-07 27-JAN-07           30-JAN-07
        3 29-JAN-07 27-JAN-07           30-JAN-07
        4 30-JAN-07 30-JAN-07 27-JAN-07 02-FEB-07
        5 31-JAN-07 30-JAN-07 27-JAN-07 02-FEB-07
        6 01-FEB-07 30-JAN-07 27-JAN-07 02-FEB-07
        7 02-FEB-07 02-FEB-07 30-JAN-07
        8 03-FEB-07 02-FEB-07 30-JAN-07
        9 04-FEB-07 02-FEB-07 30-JAN-07
       10 05-FEB-07 02-FEB-07 30-JAN-07

Thanks,

Robert.

Tom Kyte
February 06, 2007 - 10:48 am UTC

you use the code button or just use the code tags to get fixed width.
select snap_id, snap_time, 
       startup_time,
       lag(startup_time) over (order by snap_id) prev_startup,
       lead(startup_time) over (order by snap_id) next_startup
 from ...

Clarification to above...

Robert, February 05, 2007 - 9:44 pm UTC

Tom,

I said "group of COLUMNS"...
I meant to say "group of ROWS"

Is there a way to do this?

Thanks,

Robert.

To: Robert

Michel Cadot, February 06, 2007 - 3:00 am UTC

If I clearly understand your issue:
SQL> with 
  2    data as (
  3      select snap_id, snap_time, startup_time,
  4             case 
  5             when lag(startup_time) over (order by startup_time, snap_id) != startup_time
  6               then lag(startup_time) over (order by startup_time, snap_id)
  7             end prev_startup,
  8             case 
  9             when lead(startup_time) over (order by startup_time, snap_id) != startup_time
 10               then lead(startup_time) over (order by startup_time, snap_id)
 11             end next_startup
 12      from test_snapshot 
 13    )
 14  select snap_id, snap_time, startup_time,
 15         first_value(prev_startup) 
 16           over (partition by startup_time order by snap_id) prev_startup,
 17         first_value(next_startup) 
 18           over (partition by startup_time order by snap_id desc)
 19           next_startup
 20  from data
 21  order by snap_id
 22  /
   SNAP_ID SNAP_TIME  STARTUP_TI PREV_START NEXT_START
---------- ---------- ---------- ---------- ----------
         1 28/01/2007 28/01/2007            31/01/2007
         2 29/01/2007 28/01/2007            31/01/2007
         3 30/01/2007 28/01/2007            31/01/2007
         4 31/01/2007 31/01/2007 28/01/2007 03/02/2007
         5 01/02/2007 31/01/2007 28/01/2007 03/02/2007
         6 02/02/2007 31/01/2007 28/01/2007 03/02/2007
         7 03/02/2007 03/02/2007 31/01/2007
         8 04/02/2007 03/02/2007 31/01/2007
         9 05/02/2007 03/02/2007 31/01/2007
        10 06/02/2007 03/02/2007 31/01/2007

The inner query catches the previous and next startup times only if they are different from the current one.
The outer query extracts the first or last value for each group ("partition by startup_time") and propagates them to each row of the group.

Regards
Michel

Thanks Michel... that's it!

Robert Wood, February 06, 2007 - 11:40 am UTC

Michel,

You understood perfectly.
Thank you for your help.

I just have one question/comment.....

I thought I had a fairly good grasp of analytics, and it seemed very 'intuitive' to me that when using lead/lag, if you partition by a certain column, lead/lag *should* consider the entire GROUP of distinct values and go to the next/prev GROUP instead of the next/prev row. By considering the next/prev *row*, it seems to me that lead/lag is effectively *ignoring* the fact that you are partitioning on that column (!?).

I guess I have some more learning to do on how analytics work.

Thanks again for your help.
Robert.

Analytics and group

Michel Cadot, February 06, 2007 - 1:43 pm UTC

Robert,

Analytics work on each partition independently. You can think of each group as a different result set, as if they came from different queries. An analytic function reevaluates its result for each group, restarting from its first value (1 for row_number or rank, null for lag, and so on).

Regards
Michel

Usage of Analytic Functions within a query having grouping

A reader, May 29, 2008 - 10:26 am UTC

Hi Tom,

For the first question in this thread, I think one more alternative is :

select t.col1, t.col2
from tab1 t,
(select distinct t.col1, max(t.col3) over(partition by(t.col1)) col3
from tab1 t) c
where t.col1 = c.col1
and t.col3 = c.col3;

Other than the fact that the table is referred to twice, can you please point out if there are other issues?

Long live "Tom Kyte" !!!

Thanks,
r-a-v-i

Reader, February 02, 2009 - 12:15 pm UTC

create table tst
(dt_key number,
 val varchar2(10),
 last_time  timestamp(6)
);


insert into tst
values
(20080819,'ABC',to_timestamp('08/19/2008 2:08:02.753000 PM','mm/dd/yyyy hh:mi:ss.ff am'));

insert into tst
values
(20080819,'ABC',to_timestamp('08/19/2008 2:30:51.229000 PM','mm/dd/yyyy hh:mi:ss.ff am'));

commit;

select * from tst order by last_time;

    DT_KEY VAL        LAST_TIME
---------- ---------- ------------------------------
  20080819 ABC        19-AUG-08 02.08.02.753000 PM
  20080819 ABC        19-AUG-08 02.30.51.229000 PM



create table tst_data
(dt_key number
,val varchar2(10)
,ts  timestamp(6)
,price1 number
,price2 number);


select * from tst_data order by 3;

insert into tst_data
values
(20080819,'ABC',to_timestamp('08/19/2008 02.08.02.753000 PM','mm/dd/yyyy hh:mi:ss.ff am'),100,200);

insert into tst_data
values
(20080819,'ABC',to_timestamp('08/19/2008 2:30:46.000000 PM','mm/dd/yyyy hh:mi:ss.ff am'),100.1,50.72);

insert into tst_data
values
(20080819,'ABC',to_timestamp('08/19/2008 2:30:53.000000 PM','mm/dd/yyyy hh:mi:ss.ff am'),55.63,10.72);

commit;

  1* select * from tst_data order by 3
SQL> /

    DT_KEY VAL        ts                               PRICE1     PRICE2
---------- ---------- ------------------------------  -------    -------    
  20080819 ABC        19-AUG-08 02.08.02.753000 PM     100        200
  20080819 ABC        19-AUG-08 02.30.46.000000 PM     100.1      50.72
  20080819 ABC        19-AUG-08 02.30.53.000000 PM     55.63      10.72
  
If tst.val=tst_data.val and tst.dt_key=tst_data.dt_key and  tst.last_time = tst_data.ts, then I want the price1 and price2 from tst_data
Eg: for record 1 in tst, the last_time = 19-AUG-08 02.08.02.753000 PM
    and there is a match in tst_data for that dt_key, val and time. So I want price1=100 and price2=200
    
If I cannot find an exact match for the time, I need to get the tst_data.price1 and tst_data.price2 for the closest time in tst_data.
Eg: for record 2 in tst, the last_time = 19-AUG-08 02.30.51.229000 PM; there is no match in tst_data for this time, hence I want price1 and price2 for the closest time,
    i.e. price1=100.1; price2=50.72, the 2nd record in tst_data (ts = 19-AUG-08 02.30.46.000000 PM)


Can you advise how to get this?

Tom Kyte
February 02, 2009 - 1:34 pm UTC

If your goal is accurately stated:

... hence I want price1 and prioce2 for the
closest time
...

then AN answer is:

ops$tkyte%ORA10GR2> select tst.dt_key, tst.val,
  2         max(tst_data.price1)
  3            keep (dense_rank first
  4                  order by greatest(tst.last_time,tst_data.ts)
  5                          -least(tst.last_time,tst_data.ts) ) p1,
  6         max(tst_data.price2)
  7            keep (dense_rank first
  8                  order by greatest(tst.last_time,tst_data.ts)
  9                          -least(tst.last_time,tst_data.ts) ) p2,
 10         max(tst_data.ts)
 11            keep (dense_rank first
 12                  order by greatest(tst.last_time,tst_data.ts)
 13                          -least(tst.last_time,tst_data.ts) ) ts1,
 14         max(tst.last_time)
 15            keep (dense_rank first
 16                  order by greatest(tst.last_time,tst_data.ts)
 17                          -least(tst.last_time,tst_data.ts) ) ts2
 18    from tst, tst_data
 19   where tst.dt_key = tst_data.dt_key
 20     and tst.val = tst_data.val
 21   group by tst.rowid, tst.dt_key, tst.val
 22  /

    DT_KEY VAL         P1         P2 TS1                          TS2
---------- --- ---------- ---------- ---------------------------- ----------------------------
  20080819 ABC        100        200 19-AUG-08 02.08.02.753000 PM 19-AUG-08 02.08.02.753000 PM
  20080819 ABC      55.63      10.72 19-AUG-08 02.30.53.000000 PM 19-AUG-08 02.30.51.229000 PM




(53 seconds is closer to 51.229 seconds than 46 seconds is)

If the actual need was for the closest TS less than or equal to last time, then AN answer is:
ops$tkyte%ORA10GR2> select tst.dt_key, tst.val,
  2         max(tst_data.price1)
  3               keep (dense_rank first
  4                   order by tst.last_time-tst_data.ts ) p1,
  5         max(tst_data.price2)
  6               keep (dense_rank first
  7                   order by tst.last_time-tst_data.ts ) p2,
  8         max(tst_data.ts)
  9               keep (dense_rank first
 10                   order by tst.last_time-tst_data.ts ) ts1,
 11         max(tst.last_time)
 12               keep (dense_rank first
 13                   order by tst.last_time-tst_data.ts ) ts2
 14    from tst, tst_data
 15   where tst.dt_key = tst_data.dt_key
 16     and tst.val = tst_data.val
 17     and tst.last_time >= tst_data.ts
 18   group by tst.rowid, tst.dt_key, tst.val
 19  /

    DT_KEY VAL         P1         P2 TS1                          TS2
---------- --- ---------- ---------- ---------------------------- ----------------------------
  20080819 ABC        100        200 19-AUG-08 02.08.02.753000 PM 19-AUG-08 02.08.02.753000 PM
  20080819 ABC      100.1      50.72 19-AUG-08 02.30.46.000000 PM 19-AUG-08 02.30.51.229000 PM


Reader, February 02, 2009 - 4:53 pm UTC

Tom,
Apologies for not being specific about my requirement. The second case you mentioned is what I am supposed to use. In tst_data, I will have 7 to 10 million records every day. On six million records of data for 2008-08-19 in tst_data, your query returned results in 1.50 mins. Can you please explain how it works so fast?
Tom Kyte
February 03, 2009 - 9:12 am UTC

I am so bookmarking this response!

probably two nice big juicy full scans and a hash join followed by a nice sort aggregate step.


if you trace it, you'll see exactly what it is doing - and if you see physical writes in the processing - you might consider increasing the pga memory if you have some in reserve.

Viji, February 03, 2009 - 8:20 am UTC

How badly will performance be affected when using "keep ... dense_rank()" on a volatile table? (The table is used for a what-if scenario, so we expect heavy INSERTs.) Right now, in Development, when I try it, the performance is fantastic.
Or will it not matter?

Thank you!!
Tom Kyte
February 03, 2009 - 10:33 am UTC

keep dense_rank is a 'red herring', it is just an aggregate.

anyway - if the plan is as I think it is, there is a full scan (of the table we think will have the fewer rows...). That full scan will use read consistency to retrieve the data - meaning, if an uncommitted insert, or an insert started after our query began, is hit during the full scan, we'll read the undo tablespace to undo the insert. This is common, it happens all of the time; it might increase the time to full scan, it might not.

As we full scan this table, we hash it into memory.

We then start full scanning the other table - same story with read consistency. As we full scan this table, we are joining it to the hashed results from before.

Then, we are taking the output of that join and hash/sort aggregating it
Then we start returning rows.


So, the inserts will affect the performance of the full scans.



Reader, February 03, 2009 - 2:29 pm UTC

Tom,
As you said, it is two full scans on the daily partition tables on tst_data and tst, hash join and aggregation.

I have no control to change the pga size. Can you please let me know if I can take any other steps?

Plan.
SELECT STATEMENT ALL_ROWS  Cost: 14,256  Bytes: 103,404  Cardinality: 1,231
  6 SORT GROUP BY  Cost: 14,256  Bytes: 103,404  Cardinality: 1,231
    5 HASH JOIN  Cost: 14,255  Bytes: 103,404  Cardinality: 1,231
      2 PARTITION RANGE SINGLE  Cost: 147  Bytes: 24,388  Cardinality: 469  Partitions accessed: #166
        1 TABLE ACCESS FULL TABLE TST  Cost: 147  Bytes: 24,388  Cardinality: 469
      4 PARTITION RANGE SINGLE  Cost: 14,081  Bytes: 247,215,136  Cardinality: 7,725,473  Partitions accessed: #263
        3 TABLE ACCESS FULL TABLE TST_DATA  Cost: 14,081  Bytes: 247,215,136  Cardinality: 7,725,473
Tom Kyte
February 03, 2009 - 2:39 pm UTC

I would need to see a tkprof with the row source operation section (not explain plan!) in particular

with wait events too :)

Reader, February 04, 2009 - 11:36 am UTC

Tom,
I do not have access to tracefiles. Can I ask the dba to use the latest trace file and get me a report?
Tom Kyte
February 04, 2009 - 12:27 pm UTC

You can

Old fashioned SQL equivalent of Analytics

Sanji, April 20, 2009 - 4:19 pm UTC

Tom,

Table/ Data sample

select * from tab1;

RPT_ID     GL_AMT     RPT_NUM
---------- ---------- ----------
a                   0         89
a                   2         88
a                  -5         88
b                  10         10
b                   2         10
c                   0         89

What would be the "non-analytical" equivalent of the following SQL. I mean, without using analytics.

select rpt_id || '->' || row_number() over (partition by rpt_id order by rpt_id) rpt_id_rank,
rpt_num, sum(gl_amt)
from tab1
group by rpt_id, rpt_num
/

RPT_ID_RANK RPT_NUM SUM(GL_AMT)
-------------------- ---------- -----------
a->1 88 -3
a->2 89 0
b->1 10 12
c->1 89 0

Rgds
Sanji
Tom Kyte
April 21, 2009 - 2:46 pm UTC

no create table
no insert intos
no look

Old fashioned SQL equivalent of Analytics

Sanji, April 21, 2009 - 3:00 pm UTC

Sincere apologies

create table tab1 (
rpt_id varchar2(5),
gl_amt number,
rpt_num number);

insert into tab1 values('a',0,89);
insert into tab1 values('a',2,88);
insert into tab1 values('a',-5,88);
insert into tab1 values('b',10,10);
insert into tab1 values('b',2,10);
insert into tab1 values('c',0,89);

What i'm looking at is the following result set like without using analytics

select rpt_id || '->' || row_number() over (partition by rpt_id order by rpt_id) rpt_id_rank,
rpt_num, sum(gl_amt)
from tab1
group by rpt_id, rpt_num
/

RPT_ID_RAN RPT_NUM SUM(GL_AMT)
---------- ---------- -----------
a->1 88 -3
a->2 89 0
b->1 10 12
c->1 89 0

Rgds
Sanji
Tom Kyte
April 21, 2009 - 3:57 pm UTC

ops$tkyte%ORA10GR2> select t1.*,
  2         (select count(distinct rpt_id||'.'||rpt_num)+1 from tab1 t2 where t2.rpt_id = t1.rpt_id and t2.rpt_num < t1.rpt_num) rn
  3    from (
  4  select rpt_id, rpt_num, sum(gl_amt)
  5    from tab1
  6   group by rpt_id, rpt_num
  7         ) t1
  8  /

RPT_I    RPT_NUM SUM(GL_AMT)         RN
----- ---------- ----------- ----------
a             88          -3          1
a             89           0          2
c             89           0          1
b             10          12          1


it won't be fast....

Need some clarifications on analytics

Manjunath, July 09, 2009 - 1:51 pm UTC

Tom,
Lately we have found in our project((dataware house), many queries are improved when we convert the queries using aggregates with a group by and order by clause.
On one particular query however, I found not much difference in terms of elapsed time. THe query with analytics was using FTS with window sorting and the other was doing an index scan. I even confirmed that the explain plan is true by actually setting the trace and running the query for some time. My question is this. My environment does not mimic the production but is atleast 10 to 15 times smaller for many of the tables. So, even though there is not much between the 2 alternates in my environment, is it safe to assume that on prod the analytics would scale better? I am not currently using the parallel option, but that is also available as we do FTS in the analytics alternate.
Your inputs in this regards would be highly useful.

I am sorry, I currently do not have the explain plan and rowsource, but if you think they will be useful before you comment, I will update tomorrow when I get back to work again.

Thanks and regards
manjunath
Tom Kyte
July 14, 2009 - 3:10 pm UTC

... On one particular query however, I found not much difference in terms of
elapsed time. ...

that does not surprise me. If analytics were always faster, we'd remove the other bits from the language :) Nothing is always true.

Never say never, never say Always, I always say.


... is it safe to assume that on
prod the analytics would scale better? ....

impossible to say, you give nothing to work with. Nothing is "always better".

Analytic functions with dates

Manuel, August 06, 2009 - 9:36 am UTC

Hi Tom,

I have two tables and I must join them using dates.
I think analytic functions could help me, but I haven't
been able to solve the problem, though I've tried.

MACHINE_CONNECTIONS table

MACHINE_ID AP_ID START END
---------------------------------------------------------------
M1 1110803 04/01/2008 11:09:24 05/03/2008 12:59:03
M1 3000890 05/03/2008 12:59:03 26/08/2008 7:29:24
M1 1111436 26/08/2008 7:29:24 04/02/2009 14:06:01
M1 1107816 04/02/2009 14:06:01 25/02/2009 16:44:46
M1 1111436 25/02/2009 16:44:46 null

ACCESSPOINT_LOCATIONS table

AP_ID LOCATION_ID START END
---------------------------------------------------------------
1110803 14395 23/02/2007 16:00:20 09/08/2007 13:26:59
1110803 17298 09/08/2007 13:26:59 09/08/2007 13:31:06
1110803 17150 04/01/2008 11:00:02 05/03/2008 12:59:50
1110803 18327 12/03/2008 11:38:59 25/11/2008 11:47:50
1110803 18901 25/11/2008 11:47:50 null

The following rules apply:

- an AP_ID is installed in a LOCATION_ID, creating a record
in table ACCESSPOINT_LOCATIONS with a START date and a
null END date
- only after an AP_ID has been located may a machine (M1)
connect to it, creating a record in table
MACHINE_CONNECTIONS with a START date and a null END date
- a machine may later connect to another access point, at
which point the previous record gets an END date and a
new one is created with the new AP_ID and a START date

What I want is to obtain the history of LOCATION_IDs
for every machine, knowing that for every record in
MACHINE_CONNECTIONS with a given AP_ID there is exactly
one ACCESSPOINT_LOCATIONS record that satisfies the START
and END dates, because the START/END period from
MACHINE_CONNECTIONS is always contained in a START/END
period from ACCESSPOINT_LOCATIONS.

Thank you very much in advance


Tom Kyte
August 06, 2009 - 10:42 am UTC

no create
no inserts
never ever looked at.


but you want to JOIN; analytics are not about joining, so I seriously doubt they have anything to do with it.


can you specify a where clause that matches a record from one table to the other (you have to be able to do that, just write down the pseudo code, the specification, the rules for a record in one table to be matched to the other)

that is the answer to your question - when you do that - when you tell us how to join the two together, you'll have answered your own question.

Analytic functions with dates?

Manuel, August 07, 2009 - 6:01 am UTC


Hi,

Sorry for not making myself clear. Here are the creates
and some inserts.

CREATE TABLE MACHINE_CONNECTIONS
(
MACHINE_ID VARCHAR2(2 BYTE),
AP_ID INTEGER,
D_START DATE,
D_END DATE
);

CREATE TABLE ACCESSPOINT_LOCATIONS
(
AP_ID INTEGER,
LOCATION_ID INTEGER,
D_START DATE,
D_END DATE
);

Insert into ACCESSPOINT_LOCATIONS
(AP_ID, LOCATION_ID, D_START, D_END)
Values
(1110803, 14395, TO_DATE('02/23/2007 16:00:20', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('08/09/2007 13:26:59', 'MM/DD/YYYY HH24:MI:SS'));
Insert into ACCESSPOINT_LOCATIONS
(AP_ID, LOCATION_ID, D_START, D_END)
Values
(1110803, 14395, TO_DATE('08/09/2007 13:26:59', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('08/09/2007 13:31:06', 'MM/DD/YYYY HH24:MI:SS'));
Insert into ACCESSPOINT_LOCATIONS
(AP_ID, LOCATION_ID, D_START, D_END)
Values
(1110803, 17298, TO_DATE('01/04/2008 11:00:02', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('03/05/2008 12:59:50', 'MM/DD/YYYY HH24:MI:SS'));
Insert into ACCESSPOINT_LOCATIONS
(AP_ID, LOCATION_ID, D_START, D_END)
Values
(1110803, 17150, TO_DATE('03/05/2008 12:59:50', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('03/12/2008 11:38:59', 'MM/DD/YYYY HH24:MI:SS'));
Insert into ACCESSPOINT_LOCATIONS
(AP_ID, LOCATION_ID, D_START, D_END)
Values
(1110803, 18327, TO_DATE('03/12/2008 11:38:59', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('11/25/2008 11:47:50', 'MM/DD/YYYY HH24:MI:SS'));
Insert into ACCESSPOINT_LOCATIONS
(AP_ID, LOCATION_ID, D_START, D_END)
Values
(1111436, 7189, TO_DATE('02/28/2007 10:54:04', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('03/01/2007 15:53:29', 'MM/DD/YYYY HH24:MI:SS'));
Insert into ACCESSPOINT_LOCATIONS
(AP_ID, LOCATION_ID, D_START, D_END)
Values
(1111436, 3967, TO_DATE('03/29/2007 07:23:23', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('03/29/2007 07:59:31', 'MM/DD/YYYY HH24:MI:SS'));
Insert into ACCESSPOINT_LOCATIONS
(AP_ID, LOCATION_ID, D_START, D_END)
Values
(1111436, 7189, TO_DATE('03/30/2007 14:48:18', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('03/30/2007 14:49:10', 'MM/DD/YYYY HH24:MI:SS'));
Insert into ACCESSPOINT_LOCATIONS
(AP_ID, LOCATION_ID, D_START)
Values
(1111436, 7310, TO_DATE('04/20/2009 13:22:04', 'MM/DD/YYYY HH24:MI:SS'));
Insert into MACHINE_CONNECTIONS
(MACHINE_ID, AP_ID, D_START, D_END)
Values
('M1', 1110803, TO_DATE('01/04/2008 11:09:24', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('03/05/2008 12:59:03', 'MM/DD/YYYY HH24:MI:SS'));
Insert into MACHINE_CONNECTIONS
(MACHINE_ID, AP_ID, D_START, D_END)
Values
('M1', 1111436, TO_DATE('08/26/2008 07:29:24', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('02/04/2009 14:06:01', 'MM/DD/YYYY HH24:MI:SS'));
Insert into MACHINE_CONNECTIONS
(MACHINE_ID, AP_ID, D_START)
Values
('M1', 1111436, TO_DATE('02/25/2009 16:44:46', 'MM/DD/YYYY HH24:MI:SS'));

I want to get the location history for a machine_id
(LOCATION_ID in the ACCESSPOINT_LOCATIONS table),
knowing that there is only one record in table
ACCESSPOINT_LOCATIONS that gives the location for
a given record in MACHINE_CONNECTIONS. A MACHINE_ID
can never be connected to an AP_ID unless that AP_ID
has a location in ACCESSPOINT_LOCATIONS. While a
MACHINE_ID remains connected to an AP_ID (d_end is
null in MACHINE_CONNECTIONS), the AP_ID may change
location, and the MACHINE_ID moves with it while
d_end remains null for that AP_ID.

The tables join via the AP_ID field.

Tom Kyte
August 07, 2009 - 10:11 am UTC

I did not follow your description.

do you mean something like this?

ops$tkyte%ORA10GR2> select * from machine_connections;

MA      AP_ID D_START   D_END
-- ---------- --------- ---------
M1    1110803 04-JAN-08 05-MAR-08
M1    1111436 26-AUG-08 04-FEB-09
M1    1111436 25-FEB-09

ops$tkyte%ORA10GR2> select * from accesspoint_locations;

     AP_ID LOCATION_ID D_START   D_END
---------- ----------- --------- ---------
   1110803       14395 23-FEB-07 09-AUG-07
   1110803       14395 09-AUG-07 09-AUG-07
   1110803       17298 04-JAN-08 05-MAR-08
   1110803       17150 05-MAR-08 12-MAR-08
   1110803       18327 12-MAR-08 25-NOV-08
   1111436        7189 28-FEB-07 01-MAR-07
   1111436        3967 29-MAR-07 29-MAR-07
   1111436        7189 30-MAR-07 30-MAR-07
   1111436        7310 20-APR-09

9 rows selected.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select *
  2    from machine_connections mc, accesspoint_locations al
  3   where mc.ap_id = al.ap_id
  4     and mc.d_start <= nvl(al.d_end,sysdate)
  5     and nvl(mc.d_end,sysdate) >= al.d_start
  6  /

MA      AP_ID D_START   D_END          AP_ID LOCATION_ID D_START   D_END
-- ---------- --------- --------- ---------- ----------- --------- ---------
M1    1110803 04-JAN-08 05-MAR-08    1110803       17298 04-JAN-08 05-MAR-08
M1    1111436 25-FEB-09              1111436        7310 20-APR-09




If not, walk us through the entire process, step by step, using pseudo code.
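
A side note on the predicate above: the nvl(..., sysdate) calls treat a null
d_end as "still open", turning the match into a standard interval-overlap test
(two periods overlap when each starts before the other ends). The same query
again, only with the reasoning spelled out as comments:

select mc.machine_id, mc.ap_id, al.location_id, mc.d_start, mc.d_end
  from machine_connections mc, accesspoint_locations al
 where mc.ap_id = al.ap_id
       -- intervals [s1,e1] and [s2,e2] overlap when s1 <= e2 and e1 >= s2;
       -- nvl() substitutes sysdate so an open-ended period extends to "now"
   and mc.d_start <= nvl(al.d_end, sysdate)
   and nvl(mc.d_end, sysdate) >= al.d_start
/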

Analytic functions with dates

Manuel, August 10, 2009 - 8:06 am UTC

Thanks for the answer, Tom, but I'm not sure it does
what I need.
I'll try to explain using pseudocode:

1) For every record in MC, find all records (locations)
   in AL that satisfy:
   - MC.ap_id = AL.ap_id
   - MC.d_start >= AL.d_start
   - MC.d_end <= AL.d_end (both d_end can be null)

   There can only be one record in MC with d_end = null
   for the same machine_id, and only one record in AL
   with d_end = null for the same ap_id. If MC.d_end is
   null, the machine_id is still connected to the ap_id;
   if AL.d_end is null, the ap_id is still in the same
   location_id.

I could send another bunch of inserts and the desired
result to make it clearer if you want.

Thanks
Tom Kyte
August 11, 2009 - 1:52 pm UTC

isn't that simply

MC.ap_id = AL.ap_id and MC.d_start >= AL.d_start
and (MC.d_end <= AL.d_end or (mc.d_end is null and al.d_end is null))



If you can phrase it as a list of things that must be true, it turns right into a where clause
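
Putting that where clause into a full statement, a minimal sketch against the
sample tables above (this assumes the pseudocode rules are the complete
specification; the order by is added only for readability):

select mc.machine_id, al.location_id, mc.d_start, mc.d_end
  from machine_connections mc, accesspoint_locations al
 where mc.ap_id = al.ap_id
   and mc.d_start >= al.d_start
   and (mc.d_end <= al.d_end
        or (mc.d_end is null and al.d_end is null))
 order by mc.machine_id, mc.d_start
/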

Books on Analytics

enayet, August 12, 2009 - 10:20 am UTC

Tom,

When will you be authoring a book about analytics? It's long overdue, right?

Thanks,
