Question and Answer

Tom Kyte

Thanks for the question, Prince.

Asked: October 02, 2002 - 8:44 pm UTC

Last updated: October 03, 2006 - 4:59 pm UTC

Version: 8.1.7.4

Viewed 1000+ times

You Asked

CREATE TABLE EMP ( EMPNO NUMBER NOT NULL,
DEPTNO NUMBER,
EMPNAME VARCHAR2(30),
MGRNO NUMBER,
SAL NUMBER(8),
CONSTRAINT PK_EMP PRIMARY KEY(EMPNO)
);


SQL> select * from emp where rownum < 10;

     EMPNO     DEPTNO EMPNAME         MGRNO        SAL
---------- ---------- ---------- ---------- ----------
         1          2 czueltcagt                 20000
         2          3 yxycarvtqn          1      30000
         3          4 ucbaicclek          1      40000
         4          5 gzlocwxque          1      50000
         5          6 bbukploxzc          1      10000
         6          7 hcxiuyiudc          1      20000
         7          8 ygrnqwjdjj          1      30000
         8          9 nwppvwwdmi          1      40000
         9         10 htchnvnpvs          1      50000

Now, I want to apply, say, 100,000 against the SAL column: deduct the first row's SAL from the 100K, then the second row's SAL from the remainder, and so on, until my original 100K becomes zero or negative.

Note: the following is just a test case.

EMPNO DEPTNO EMPNAME    MGRNO    SAL Remaining
----- ------ ---------- ----- ------ ---------
    1      2 czueltcagt        20000 0    (100 - 20 -> 80K) We applied full 20K and now have 80K on hand
    2      3 yxycarvtqn     1  30000 0    (80 - 30 -> 50K) we applied full 30K from the remaining 80K
    3      4 ucbaicclek     1  40000 0    (50 - 40 -> 10K) we applied full 40K from the remaining 50K
    4      5 gzlocwxque     1  50000 40K  (10 - 50 -> 40K) we applied remaining 10K here and stop applying anymore.
    5      6 bbukploxzc     1  10000 10K
    6      7 hcxiuyiudc     1  20000 20K
    7      8 ygrnqwjdjj     1  30000 .
    8      9 nwppvwwdmi     1  40000 .
    9     10 htchnvnpvs     1  50000 .

I tried with sum OVER (ORDER BY empno RANGE UNBOUNDED PRECEDING) and lag etc, but couldn't find a way to stop when it reaches the first negative value.


and Tom said...

something like this perhaps:

scott@ORA920.LOCALHOST> select ename, sal, sum_sal,
2 case when ( sum_sal <= 10000 )
3 then 10000-sum_sal
4 else null
5 end bal
6 from (
7 select empno,
8 ename,
9 sal,
10 sum(sal) over ( order by empno ) sum_sal
11 from emp
12 )
13 order by empno
14 /

ENAME             SAL    SUM_SAL        BAL
---------- ---------- ---------- ----------
SMITH             800        800       9200
ALLEN            1600       2400       7600
WARD             1250       3650       6350
JONES            2975       6625       3375
MARTIN           1250       7875       2125
BLAKE            2850      10725
CLARK            2450      13175
SCOTT            3000      16175
KING             5000      21175
TURNER           1500      22675
ADAMS            1100      23775
JAMES             950      24725
FORD             3000      27725
MILLER           1300      29025

14 rows selected.

scott@ORA920.LOCALHOST>


Comments

I need a slightly different result

Prince, October 03, 2002 - 3:07 pm UTC

Maybe I was not clear when I explained the case.

Assume the SAL column is what I have to pay, and I have 10K on hand. I need to pay them off in order of empno and show the remaining amount each employee has yet to get.

In fact, the emp table is not a good example; it is just the table I had handy.


ENAME             SAL    SUM_SAL        BAL       ToReceive
---------- ---------- ---------- ---------- ---------------
SMITH             800        800       9200               0
ALLEN            1600       2400       7600               0
WARD             1250       3650       6350               0
JONES            2975       6625       3375               0
MARTIN           1250       7875       2125               0
BLAKE            2850      10725            2850-2125 = 725
CLARK            2450      13175                        2450
SCOTT            3000      16175                        3000
KING             5000      21175                        5000
TURNER           1500      22675                          ..
ADAMS            1100      23775                          ..
JAMES             950      24725                          ..
FORD             3000      27725                          ..
MILLER           1300      29025                          ..


What I had was something like,
select *
  from (
select empno, sal, ssum,
       lag(ssum) over (order by empno) lag_s,
       nvl(10000 - lag(ssum) over (order by empno), 10000) diff,
       sal - nvl(10000 - lag(ssum) over (order by empno), 10000) rem
  from (select empno, sal, sum(sal) over (order by empno) ssum from emp) t1
       )

I can now apply the CASE on "diff":
if diff >= 0
  display 0 when sign(sal - diff) = -1,
  else display sal - diff;
else display SAL as "ToReceive".

But the one I have has so many levels of subqueries. I want it to be simple and efficient. Also, can I stop the sum() once it reaches a value greater than 10K? I guess I need another level of nesting for that to work, since "window functions are not allowed" in the WHERE clause.

Thanks for the CASE statement. I wasn't thinking about CASE at all. What I had before was just the SQL I pasted above.


Tom Kyte
October 03, 2002 - 6:46 pm UTC

Using inline views makes it simple.

Using inline views does not mean "inefficient"

Use inline views, use case....
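For instance, a minimal sketch along those lines (not part of the original answer) that produces the requested "ToReceive" column with a single inline view and a CASE, assuming the 10,000 on hand from the comment above:

select ename, sal, sum_sal,
       case
         when sum_sal <= 10000      then 0                -- fully paid from the 10K
         when sum_sal - sal < 10000 then sum_sal - 10000  -- partially paid
         else                            sal              -- nothing applied; owed in full
       end to_receive
  from (
select empno, ename, sal,
       sum(sal) over (order by empno) sum_sal
  from emp
       )
 order by empno
/

With the 14-row SCOTT.EMP data shown above, this returns 0 through MARTIN, 725 for BLAKE, and the full SAL for everyone after him.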

I am not concerned about the inline views, but

Prince., October 04, 2002 - 12:17 am UTC

Thanks Tom,

I am not concerned much about the inline views. But I am wondering whether I am using too many inline views and whether there is a better way to do this.

It looks to me that I am following a convoluted approach!

Do you see any simpler way of doing this? If not, I will continue with what I have right now.

Tom Kyte
October 04, 2002 - 8:17 am UTC

use analytics, use case and you are going down the right path IMO

How can I do this?

Parag Jayant Patankar, January 03, 2005 - 5:04 am UTC

Hi Tom,

I am trying to compute an average using analytic functions.

1. The transaction table is parag, having column a (a date) and column b (a number).

15:12:47 SQL> select * from parag;

A                 B
-------- ----------
20041130       5000
20041210       6000
20041229      10000
20041230       7000


I want to take an average for a financial calculation by doing the following:

select a, b, x,
       lag(b) over (order by a) y,
       x * lag(b) over (order by a) z
  from (
select a, b,
       to_date(a, 'RRRRMMDD') - to_date(lag(a) over (order by a), 'RRRRMMDD') x
  from parag
       )
/

A                 B          X          Y          Z
-------- ---------- ---------- ---------- ----------
20041130       5000
20041210       6000         10       5000      50000
20041229      10000         19       6000     114000
20041230       7000          2      10000      20000

which is simply taking the number of days between transactions and multiplying it by the amount.

I have following questions

1. How do I add a last row to this same calculation, where the number of days is the month's last day minus the date of the last transaction? For example, how do I include the 7000 in this output?

So my output should be

A                 B          X          Y          Z
-------- ---------- ---------- ---------- ----------
20041130       5000
20041210       6000         10       5000      50000
20041229      10000         19       6000     114000
20041230       7000          2      10000      20000
                             1       7000       7000

Suppose instead my table has the following data (desired output shown after it):

A                 B
-------- ----------
20041130       5000
20041210       6000
20041229      10000


A                 B          X          Y          Z
-------- ---------- ---------- ---------- ----------
20041130       5000
20041210       6000         10       5000      50000
20041229      10000         19       6000     114000
                             2      10000      20000

The 2 days in column X of the added row are arrived at by subtracting 29/12 from 31/12.

2. By doing this I am arriving at the average balance for the month. Is there any simpler method for this kind of calculation?

Kindly guide me on analytic functions.

regards & thanks
pjp 

Tom Kyte
January 03, 2005 - 8:54 am UTC

too hard for me to turn "select * from's" into insert intos

How can I do this? (weighted average)

Parag Jayant Patankar, January 03, 2005 - 9:11 am UTC

Hi Tom,

I think I was not able to explain the question properly. I will try to state my question again.

I have a transaction table "parag" with the following details:

A                 B
-------- ----------
20041130       5000
20041210       6000
20041228      10000
20041229       7000

I want the output of the select statement to be as follows:

A                 B          X          Y          Z
-------- ---------- ---------- ---------- ----------
20041130       5000
20041210       6000         10       5000      50000
20041228      10000         18       6000     108000
20041229       7000          1      10000      10000
                             2       7000      14000  <= in this last row the number of days (column X) should be arrived at by deducting the last transaction date from the last day of the month; here 2 = '20041231' - '20041229'.

How can I write a SQL statement to arrive at this, possibly using analytic functions?

regards & thanks
pjp

Tom Kyte
January 03, 2005 - 9:14 am UTC

in the future, what I mean is that it is too hard for me to convert your stuff above into insert into's with create tables...

You do that. give me something to cut and paste and run in sqlplus to get going.


but here -- just think "outer join". You need a table of dates that should be in your output; outer join parag to that table of dates and you have therefore filled in the blanks.
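For example, the join being described might look roughly like this (a sketch only; "all_dates" is a placeholder for whatever generates the needed dates -- the next reply shows one way to build such a set):

select d.dt, p.b
  from all_dates d, parag p
 where to_date(p.a(+), 'RRRRMMDD') = d.dt
 order by d.dt
/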




Please find table creation and insert statements

Parag Jayant Patankar, January 03, 2005 - 9:22 am UTC

Hi Tom,

Please find below the SQL statements you required:

create table parag ( a char(8), b number );
insert into parag values ('20041130', 5000);
insert into parag values ('20041210', 6000);
insert into parag values ('20041228', 10000);
insert into parag values ('20041229', 7000);

My SQL, which is not giving the desired output, is:

select a, b, x,
       lag(b) over (order by a) y,
       x * lag(b) over (order by a) z
  from (
select a, b,
       to_date(a, 'RRRRMMDD') - to_date(lag(a) over (order by a), 'RRRRMMDD') x
  from parag
       );

thanks & regards
pjp



Tom Kyte
January 03, 2005 - 9:54 am UTC

did you even try the suggestion above? outer join to a table with all of your needed dates -- eg:

select to_date( :x, 'dd-mon-yyyy') + rownum-1 dt
from all_objects
where rownum <= to_char( last_day( to_date(:x,'dd-mon-yyyy') ), 'dd' )

will generate all of the days in a month (put 01-dec-2004 into :x)

outer join parag to that.

then analyze.

How to proceed?

Parag Jayant Patankar, January 04, 2005 - 4:13 am UTC

Hi Tom,

Thanks for your tip. Now, after doing the outer join with the following command,

select dt, l, m
from
(
select dt, nvl(b, 0) l,
dt - lag(dt) over (order by dt) m
from parag,
(
select to_date( '30-NOV-2004', 'dd-mon-yyyy') + rownum-1 dt
from all_objects
where rownum <= to_char(last_day(to_date('30-NOV-2004','dd-mon-yyyy') ),'dd') +2
order by 1
) x
where to_date(parag.a(+), 'RRRRMMDD') = dt
order by 1, 2, 3
);

I get the following output:

DT                 L          M
--------- ---------- ----------
30-NOV-04       5000
01-DEC-04          0          1
02-DEC-04          0          1
03-DEC-04          0          1
04-DEC-04          0          1
05-DEC-04          0          1
06-DEC-04          0          1
07-DEC-04          0          1
08-DEC-04          0          1
09-DEC-04          0          1
10-DEC-04       6000          1
11-DEC-04          0          1
12-DEC-04          0          1
13-DEC-04          0          1
14-DEC-04          0          1
15-DEC-04          0          1
16-DEC-04          0          1
17-DEC-04          0          1
18-DEC-04          0          1
19-DEC-04          0          1
20-DEC-04          0          1
21-DEC-04          0          1
22-DEC-04          0          1
23-DEC-04          0          1
24-DEC-04          0          1
25-DEC-04          0          1
26-DEC-04          0          1
27-DEC-04          0          1
28-DEC-04      10000          1
29-DEC-04       7000          1
30-DEC-04          0          1
31-DEC-04          0          1

32 rows selected.

Kindly guide me on how to proceed now. Sorry to trouble you with such a small problem, but I am not very clear on how to proceed, and I am trying to learn from you how to tackle such situations in SQL.

regards & thanks
pjp


Tom Kyte
January 04, 2005 - 8:33 am UTC

how to proceed with what?

Required help

Parag Jayant Patankar, January 04, 2005 - 8:45 am UTC

Hi Tom,

From the following output

DT                 L          M
--------- ---------- ----------
30-NOV-04       5000
01-DEC-04          0          1
02-DEC-04          0          1
03-DEC-04          0          1
04-DEC-04          0          1
05-DEC-04          0          1
06-DEC-04          0          1
07-DEC-04          0          1
08-DEC-04          0          1
09-DEC-04          0          1
10-DEC-04       6000          1
11-DEC-04          0          1
12-DEC-04          0          1
13-DEC-04          0          1
14-DEC-04          0          1
15-DEC-04          0          1
16-DEC-04          0          1
17-DEC-04          0          1
18-DEC-04          0          1
19-DEC-04          0          1
20-DEC-04          0          1
21-DEC-04          0          1
22-DEC-04          0          1
23-DEC-04          0          1
24-DEC-04          0          1
25-DEC-04          0          1
26-DEC-04          0          1
27-DEC-04          0          1
28-DEC-04      10000          1
29-DEC-04       7000          1
30-DEC-04          0          1
31-DEC-04          0          1

I want an output like this:

DT                 L          M          N
--------- ---------- ---------- ----------
30-NOV-04       5000                  5000
01-DEC-04          0          1       5000  <= ( 5000 * 1, i.e. value of column M times the last non-zero value of column L )
02-DEC-04          0          1       5000  <= ( 5000 * 1 )
03-DEC-04          0          1       5000  <= ( 5000 * 1 )
04-DEC-04          0          1       5000  <= ( 5000 * 1 )
05-DEC-04          0          1       5000  <= ( 5000 * 1 )
06-DEC-04          0          1       5000  <= ( 5000 * 1 )
07-DEC-04          0          1       5000  <= ( 5000 * 1 )
08-DEC-04          0          1       5000  <= ( 5000 * 1 )
09-DEC-04          0          1       5000  <= ( 5000 * 1 )
10-DEC-04       6000          1       6000  <= ( 6000 * 1 )
11-DEC-04          0          1       6000  <= ( 6000 * 1 )
12-DEC-04          0          1       6000  <= ( 6000 * 1 )
13-DEC-04          0          1       6000  <= ( 6000 * 1 )
14-DEC-04          0          1       6000  <= ( 6000 * 1 )
15-DEC-04          0          1       6000  <= ( 6000 * 1 )
16-DEC-04          0          1       6000  <= ( 6000 * 1 )
17-DEC-04          0          1       6000  <= ( 6000 * 1 )
18-DEC-04          0          1       6000  <= ( 6000 * 1 )
19-DEC-04          0          1       6000  <= ( 6000 * 1 )
20-DEC-04          0          1       6000  <= ( 6000 * 1 )
21-DEC-04          0          1       6000  <= ( 6000 * 1 )
22-DEC-04          0          1       6000  <= ( 6000 * 1 )
23-DEC-04          0          1       6000  <= ( 6000 * 1 )
24-DEC-04          0          1       6000  <= ( 6000 * 1 )
25-DEC-04          0          1       6000  <= ( 6000 * 1 )
26-DEC-04          0          1       6000  <= ( 6000 * 1 )
27-DEC-04          0          1       6000  <= ( 6000 * 1 )
28-DEC-04      10000          1      10000  <= ( 10000 * 1 )
29-DEC-04       7000          1       7000  <= ( 7000 * 1 )
30-DEC-04          0          1       7000  <= ( 7000 * 1 )
31-DEC-04          0          1       7000  <= ( 7000 * 1 )

How can I write SQL to achieve this, preferably using analytics?

thanks for helping me.

regards & thanks
pjp



Tom Kyte
January 04, 2005 - 8:58 am UTC

variation on a theme

https://www.oracle.com/technetwork/issue-archive/2014/14-mar/o24asktom-2147206.html
Analytics to the rescue...

same concept, a "carry forward"

ops$tkyte@ORA9IR2> select dt, l,
  2         to_number(substr(max(last_l) over (order by dt),11)) last_l, m
  3    from (
  4  select dt, l,
  5         case when l is not null then to_char(row_number() over (order by dt),'fm0000000000')||l end last_l,
  6         m
  7  from
  8  (
  9  select dt, b l,
 10         dt - lag(dt) over (order by dt) m
 11  from parag,
 12  (
 13  select to_date( '30-NOV-2004', 'dd-mon-yyyy') + rownum-1 dt
 14  from all_objects
 15  where rownum <= to_char(last_day(to_date('30-NOV-2004','dd-mon-yyyy') ),'dd') +2
 16  order by 1
 17  ) x
 18  where to_date(parag.a(+), 'RRRRMMDD') = dt
 19  )
 20  )
 21  order by 1, 3, 4
 22  /
 
DT                 L     LAST_L          M
--------- ---------- ---------- ----------
30-NOV-04       5000       5000
01-DEC-04                  5000          1
02-DEC-04                  5000          1
03-DEC-04                  5000          1
04-DEC-04                  5000          1
05-DEC-04                  5000          1
06-DEC-04                  5000          1
07-DEC-04                  5000          1
08-DEC-04                  5000          1
09-DEC-04                  5000          1
10-DEC-04       6000       6000          1
11-DEC-04                  6000          1
12-DEC-04                  6000          1
13-DEC-04                  6000          1
14-DEC-04                  6000          1
15-DEC-04                  6000          1
16-DEC-04                  6000          1
17-DEC-04                  6000          1
18-DEC-04                  6000          1
19-DEC-04                  6000          1
20-DEC-04                  6000          1
21-DEC-04                  6000          1
22-DEC-04                  6000          1
23-DEC-04                  6000          1
24-DEC-04                  6000          1
25-DEC-04                  6000          1
26-DEC-04                  6000          1
27-DEC-04                  6000          1
28-DEC-04      10000      10000          1
29-DEC-04       7000       7000          1
30-DEC-04                  7000          1
31-DEC-04                  7000          1
 
32 rows selected.
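As a hedged footnote to the above (not part of the original answer): on 10g and later, LAST_VALUE ... IGNORE NULLS can replace the row_number||value carry-forward trick, and averaging the carried-forward balance over the December rows answers the earlier "average balance for the month" question:

select avg(bal) avg_dec_balance
  from (
select dt,
       last_value(b ignore nulls) over (order by dt) bal
  from (
        select to_date('30-nov-2004','dd-mon-yyyy') + rownum - 1 dt
          from all_objects
         where rownum <= 32          -- 30-NOV-2004 through 31-DEC-2004
       ) d, parag
 where to_date(parag.a(+), 'RRRRMMDD') = d.dt
       )
 where dt between to_date('01-dec-2004','dd-mon-yyyy')
              and to_date('31-dec-2004','dd-mon-yyyy')
/

With the sample data this averages 5000 over Dec 1-9, 6000 over Dec 10-27, 10000 on Dec 28 and 7000 over Dec 29-31.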
 

Statistics

A reader, January 04, 2005 - 10:16 am UTC

create table txn
(
txn_id int primary key,
load_seq_no int
);

I want to find out
1. Min, max and average no. of txns per load seq_no
2. Number of unique load_seq_nos
3. Total number of records in the table

All in one query. One FTS of the table, this is a large table (50 million rows)

Help? Thanks

Tom Kyte
January 05, 2005 - 8:19 am UTC

1) is ambiguous -- you put "min", "max" and "avg" num of txns/load_seq_no together.  I know what "min", "max" txn_id per load_seq_no is:

select min(txn_id), max(txn_id) from t group by load_seq_no;

but an "average" number of transactions per load_seq_no is

select avg( count(txn_id) ) from t group by load_seq_no (an aggregate of an aggregate)


so, I'll assume you want:

by load_seq_no, the min/max txn id, the cnt of txn_ids

and then, the avg number of cnt of txn_ids
          the count of all records
          the count of distinct load_seq_no's


ops$tkyte@ORA9IR2> select * from t;
 
    TXN_ID LOAD_SEQ_NO
---------- -----------
         1           1
         2           1
         3           1
         4           2
         5           2
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> set autotrace on explain
ops$tkyte@ORA9IR2> select load_seq_no,
  2         min_tid,
  3         max_tid,
  4         avg(cnt_tid) over () avg_txn,
  5         sum(cnt_tid) over () tot_records,
  6         count(*) over () num_unq_load_seq
  7    from (
  8  select load_seq_no, min(txn_id) min_tid, max(txn_id) max_tid, count(txn_id) cnt_tid
  9    from t
 10   group by load_seq_no
 11         )
 12  /
 
LOAD_SEQ_NO    MIN_TID    MAX_TID    AVG_TXN TOT_RECORDS NUM_UNQ_LOAD_SEQ
----------- ---------- ---------- ---------- ----------- ----------------
          1          1          3        2.5           5                2
          2          4          5        2.5           5                2
 
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   WINDOW (BUFFER)
   2    1     SORT (GROUP BY)
   3    2       TABLE ACCESS (FULL) OF 'T'
 
 
 
ops$tkyte@ORA9IR2> set autotrace off
 

what is wrong with analytics?

mikito, January 04, 2005 - 8:54 pm UTC

Zillions of extra unnecessary concepts.

* lag with an offset parameter -- I was naively assuming that lag(lag) must work
* a confusing window definition with preceding, unbounded, following

How should analytical SQL have been designed? Well, let's see what concepts are really required.

First, we have to be able to introduce a counter on an ordered result set. Any query block that has an "order by" clause should have an implicit row_num() pseudocolumn. This is IMO one of the major weaknesses of current SQL: they have underutilized the "order by" clause. Now, when we have row numbers, we can define windows any way we like with a scalar subquery within the select clause. When leveraging subqueries we specify the window with predicates, while analytic ANSI SQL uses COBOL-style descriptions like "RANGE BETWEEN INTERVAL '1' DAY PRECEDING AND INTERVAL '1' DAY FOLLOWING".

Admittedly,

with ordemp as (
select e.*, rownum rn from (
select * from emp e order by deptno, sal
) e
)
select oe.*,
(select oe.rn - min(rn) from ordemp oee
where oee.deptno = oe.deptno)
from ordemp oe;

looks a little bit less succinct than

select e.*,
row_number()
over (partition by deptno order by sal)
from emp e;

mostly because the row_number() functionality that I have described is not there. On the brighter side, when writing the first query I don't have to consult the reference manual every 5 minutes.


Tom Kyte
January 05, 2005 - 9:24 am UTC

yawn, boring...

we know you don't understand "why they are cool"
we know you don't like them
we know you don't like aggregates and count(*) either


and frankly..... :) well -- you can fill in the blanks.


I don't have to consult the manuals every 5 minutes for them. And on the bright side, my stuff runs tons faster -- that is a bonus. I won't even comment on the readability or understandability of your convoluted solution with the "with" clause.


I truly and utterly fail to see how:

row_number() over (PARTITION BY DEPT
ORDER BY SAL)

is "hard" -- either to remember, or to type. If you do top-n processing lots, guess what, that statement just falls out of your hands and into the keyboard.


What I do have to reference the manual for is "how to create a partitioned table" -- why? because I don't do that 50 times a day.

So, if you are reaching for the manual every 5 minutes to remember the syntax of something you do every 5 minutes -- well, that might speak to your short term memory more than anything else. If you use analytics, you'll know the syntax (it takes very little time actually to "get them" -- you have partition (break the result set up), you have order them (should already know that stuff) and then you have the less frequently used RANGE/ROWS clause -- which is the "hard part" conceptually -- but also the "hardly used" part)



There ain't nothing wrong with analytics.

There is something wrong with not learning them.
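To make those three pieces concrete, here is a small hedged illustration against the same SCOTT.EMP table used earlier (not part of the reply above) -- PARTITION BY breaks the result set up, ORDER BY sequences the rows within each partition, and the ROWS clause is the window itself, written out even though it is the default:

select deptno, ename, sal,
       sum(sal) over (
             partition by deptno                  -- break the result set up by department
             order by sal, empno                  -- order the rows within each department
             rows between unbounded preceding     -- the windowing clause (this is the
                  and current row                 -- default, spelled out for clarity)
       ) running_dept_sal
  from emp
 order by deptno, sal, empno
/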


Stats

A reader, January 05, 2005 - 9:45 am UTC

I had come up with the following before I read your solution

SELECT MIN(cnt),MAX(cnt),MIN(dist),MAX(dist),MIN(total),AVG(cnt),
percentile_cont(0.5) within GROUP (ORDER BY cnt) median1,
percentile_disc(0.5) within GROUP (ORDER BY cnt) median2
FROM (
SELECT
COUNT(*) over (PARTITION BY load_sequence_number) cnt,
COUNT(DISTINCT load_sequence_number) over () dist,
COUNT(*) over () total
FROM t
)

Worked fine, but took 20 minutes for a 50 million row table. Isn't that excessive? If I take out the percentile_* functions, it goes down to 10 minutes. Those functions double the run-time!? Why?

Thanks

Tom Kyte
January 05, 2005 - 10:51 am UTC

use sql_trace and tkprof to see what things are happening.


think about the work that needs to be performed. you are applying analytics to analytics -- it is virtually having to run the query twice -- inner query to temp probably, outer query from temp to temp and then you get the answer.


larger pga_aggregate_targets or sort_area_sizes might be called for but use a 10046 level 12 trace, tkprof it and see what you see.
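For reference, a sketch of the trace being suggested (10046 level 12 captures binds and waits); the trace file name below is only an example -- the real one lands in user_dump_dest:

alter session set timed_statistics = true;
alter session set events '10046 trace name context forever, level 12';

-- ... run the 50-million-row query here ...

alter session set events '10046 trace name context off';

-- then, at the operating system prompt:
--   tkprof ora_12345.trc report.prf sys=no sort=prsela,exeela,fchela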

Analytics and ANSI SQL

Su Baba, October 03, 2006 - 4:54 pm UTC

In Chapter 12, p. 545 of your book "Expert One-on-One Oracle", you mentioned that "these extensions [analytic functions] are currently under review by the ANSI SQL committee for inclusion in the SQL specification."

Do you know if this has already happened or will happen soon?

Tom Kyte
October 03, 2006 - 4:59 pm UTC

I believe it has, but so have a billion other things. Anyone that says "we are fully ansi compliant" is lying to you :) there are so many things in the spec that I seriously doubt anyone will ever be compliant again.


