Skip to Main Content

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Jan.

Asked: October 29, 2001 - 10:35 am UTC

Last updated: November 07, 2011 - 11:59 am UTC

Version: 8.16

Viewed 10K+ times! This question is

You Asked

Tom,

You used 'first_value(ename) over ( partition by deptno order by hiredate desc NULLS LAST )' in one of your example queries, which made me wonder why you don't use last_value...

These two queries both get the first and last snap id for a day, one uses 'first_value .. sort by .. desc', the other 'last_value'. Which one would be more efficient?

select snap_id
, snap_time
, FIRST_VALUE(snap_id) OVER (partition by trunc(snap_time) ORDER BY snap_time) AS first_snap
, FIRST_VALUE(snap_id) OVER (partition by trunc(snap_time) ORDER BY snap_time desc) AS last_snap
from STATS$SNAPSHOT
;

select snap_id
, snap_time
, FIRST_VALUE(snap_id) OVER (partition by trunc(snap_time) ORDER BY snap_time ) AS first_snap
, LAST_VALUE(snap_id) OVER (partition by trunc(snap_time) ORDER BY snap_time
rows between unbounded preceding and unbounded following) AS last_snap
from STATS$SNAPSHOT
;

The first explain plan shows:
Operation
SELECT STATEMENT
WINDOW SORT
WINDOW SORT
TABLE ACCESS FULL STATS$SNAPSHOT

The second
Operation
SELECT STATEMENT
WINDOW SORT
TABLE ACCESS FULL STATS$SNAPSHOT

Thanks a lot, jan

and Tom said...

I goofed. I did it the inefficient way. It works better if you can keep the windows sorted by the same thing.

What I did to test this is:

REM drop table t;
REM
REM create table t as select deptno, hiredate from emp;
REM insert /*+ append */ into t select * from t;
REM commit;
REM insert /*+ append */ into t select * from t;
REM commit;
REM insert /*+ append */ into t select * from t;
REM commit;
.... (lots more, got over 200k rows in there)

and then TKPROF shows:

select deptno,
first_value(hiredate) over ( partition by deptno order by hiredate ),
last_value(hiredate) over ( partition by deptno order by hiredate rows between
unbounded preceding and unbounded following)
from t

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.01 0 0 0 0
Execute 2 0.00 0.00 0 0 0 0
Fetch 15293 10.76 25.97 4282 519 34 229376
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 15296 10.76 25.98 4282 519 34 229376

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 61

Rows Row Source Operation
------- ---------------------------------------------------
229376 WINDOW SORT
229376 TABLE ACCESS FULL T

********************************************************************************

select deptno,
first_value(hiredate) over ( partition by deptno order by hiredate ),
first_value(hiredate) over ( partition by deptno order by hiredate desc nulls last )
from t

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 2 0.00 0.00 0 0 0 0
Fetch 15293 18.01 58.21 8919 519 51 229376
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 15296 18.01 58.21 8919 519 51 229376

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 61

Rows Row Source Operation
------- ---------------------------------------------------
229376 WINDOW SORT
229376 WINDOW SORT
229376 TABLE ACCESS FULL T

keeping the windows the same, especially for LARGE windows as I had, is much more efficient.

Thanks for the heads up.

Rating

  (19 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Performance difference?

jan van mourik, October 29, 2001 - 4:45 pm UTC

Thanks for the quick reply Tom!

first_value performance

steve, September 24, 2002 - 11:37 am UTC

Hi Tom,

Platform: Oracle 8.1.7, Solaris 8.

I have 4 tables with the following rowcount:

ppw_customer_dim            96111
ppw_cust_msisdn_history     96111
ppw_service_class_history   178574
ppw_service_class_dim       184





The following query works well (note: all join 
columns are indexed)

  1  select count(*)
  2  from    ppw_customer_dim          ppw
  3         ,ppw_cust_msisdn_history   pcmh
  4         ,ppw_service_class_history sch
  5         ,ppw_service_class_dim     scd
  6  where       ppw.cust_id                 = pcmh.cust_id
  7          and ppw.curr_msisdn             = pcmh.msisdn
  8          and pcmh.cust_msisdn_key        = sch.cust_msisdn_key
  9*         and sch.service_class_id        = scd.service_class_id
SQL> /

  COUNT(*)
----------
    178573

Elapsed: 00:00:03.29












Putting the whole thing as a sub-query, still ok:
=================================================

Elapsed: 00:00:03.34
  1  select count(*) from (
  2     select   ppw.curr_msisdn
  3     from    ppw_customer_dim          ppw
  4            ,ppw_cust_msisdn_history   pcmh
  5            ,ppw_service_class_history sch
  6            ,ppw_service_class_dim     scd
  7     where       ppw.cust_id                 = pcmh.cust_id
  8             and ppw.curr_msisdn             = pcmh.msisdn
  9             and pcmh.cust_msisdn_key        = sch.cust_msisdn_key
 10             and sch.service_class_id        = scd.service_class_id
 11*    )
SQL> /

  COUNT(*)
----------
    178573

Elapsed: 00:00:03.44






Now, I start introducing the first_value:
=========================================

  1  select count(*) from (
  2     select   ppw.curr_msisdn
  3             ,first_value(ppw.cust_id)
  4                  over (partition by curr_MSISDN
  5                        order     by ppw.gen_ymdhms desc nulls first
  6                       ) phone_account
  7     from    ppw_customer_dim          ppw
  8            ,ppw_cust_msisdn_history   pcmh
  9            ,ppw_service_class_history sch
 10            ,ppw_service_class_dim     scd
 11     where       ppw.cust_id                 = pcmh.cust_id
 12             and ppw.curr_msisdn             = pcmh.msisdn
 13             and pcmh.cust_msisdn_key        = sch.cust_msisdn_key
 14             and sch.service_class_id        = scd.service_class_id
 15*    )
SQL> /

  COUNT(*)
----------
    178573

Elapsed: 00:00:11.74



The increase in execution time surprised me a little, but
fine. I guess there's some overhead with the first_value.




Now, I introduced a second first_value:
=======================================


SQL> 

  1  select count(*) from (
  2     select   ppw.curr_msisdn
  3              --
  4             ,first_value(ppw.cust_id)
  5                  over (partition by curr_MSISDN
  6                        order     by ppw.gen_ymdhms desc nulls first
  7                       ) phone_account
  8              --
  9             ,first_value(ppw.master_cust_id)
 10                 over (partition by curr_MSISDN
 11                       order     by ppw.gen_ymdhms desc nulls first
 12                      ) master_account
 13     from    ppw_customer_dim          ppw
 14            ,ppw_cust_msisdn_history   pcmh
 15            ,ppw_service_class_history sch
 16            ,ppw_service_class_dim     scd
 17     where       ppw.cust_id                 = pcmh.cust_id
 18             and ppw.curr_msisdn             = pcmh.msisdn
 19             and pcmh.cust_msisdn_key        = sch.cust_msisdn_key
 20             and sch.service_class_id        = scd.service_class_id
 21*    )
SQL> /

  COUNT(*)
----------
    178573

Elapsed: 00:00:16.80


Hmmm... Same partition and sorting. Surprised at the increase.




Now, I introduced a third first_value:
======================================


  1  select count(*) from (
  2     select   ppw.curr_msisdn
  3              --
  4             ,first_value(ppw.cust_id)
  5                  over (partition by curr_MSISDN
  6                        order     by ppw.gen_ymdhms desc nulls first
  7                       ) phone_account
  8              --
  9             ,first_value(ppw.master_cust_id)
 10                 over (partition by curr_MSISDN
 11                       order     by ppw.gen_ymdhms desc nulls first
 12                      ) master_account
 13              --
 14             ,first_value(ppw.cust_status)
 15                 over (partition by curr_MSISDN
 16                       order     by ppw.gen_ymdhms desc nulls first
 17                      ) cust_status
 18     from    ppw_customer_dim          ppw
 19            ,ppw_cust_msisdn_history   pcmh
 20            ,ppw_service_class_history sch
 21            ,ppw_service_class_dim     scd
 22     where       ppw.cust_id                 = pcmh.cust_id
 23             and ppw.curr_msisdn             = pcmh.msisdn
 24             and pcmh.cust_msisdn_key        = sch.cust_msisdn_key
 25             and sch.service_class_id        = scd.service_class_id
 26*    )
SQL> /

  COUNT(*)
----------
    178573

Elapsed: 00:00:21.55




Now, I introduced a fourth first_value:
=======================================

  1   select count(*) from (
  2      select   ppw.curr_msisdn
  3               --
  4              ,first_value(ppw.cust_id)
  5                   over (partition by curr_MSISDN
  6                         order     by ppw.gen_ymdhms desc nulls first
  7                        ) phone_account
  8               --
  9              ,first_value(ppw.master_cust_id)
 10                  over (partition by curr_MSISDN
 11                        order     by ppw.gen_ymdhms desc nulls first
 12                       ) master_account
 13               --
 14              ,first_value(ppw.cust_status)
 15                  over (partition by curr_MSISDN
 16                        order     by ppw.gen_ymdhms desc nulls first
 17                       ) cust_status
 18              ,first_value(ppw.serv_status)
 19                  over (partition by curr_MSISDN
 20                        order     by ppw.gen_ymdhms desc nulls first
 21                       ) serv_status
 22      from    ppw_customer_dim          ppw
 23             ,ppw_cust_msisdn_history   pcmh
 24             ,ppw_service_class_history sch
 25             ,ppw_service_class_dim     scd
 26      where       ppw.cust_id                 = pcmh.cust_id
 27              and ppw.curr_msisdn             = pcmh.msisdn
 28              and pcmh.cust_msisdn_key        = sch.cust_msisdn_key
 29              and sch.service_class_id        = scd.service_class_id
 30*     )
SQL> /

  COUNT(*)
----------
    178573

Elapsed: 00:00:27.38





BUT: If I sort a subquery first and then apply the first_value()
on the result set, I get much better results:




  1    select count(*) from (
  2       select   curr_msisdn
  3                --
  4               ,first_value(cust_id)
  5                    over (partition by curr_MSISDN
  6                          order     by gen_ymdhms desc nulls first
  7                         ) phone_account
  8                --
  9               ,first_value(master_cust_id)
 10                   over (partition by curr_MSISDN
 11                         order     by gen_ymdhms desc nulls first
 12                        ) master_account
 13                --
 14               ,first_value(cust_status)
 15                   over (partition by curr_MSISDN
 16                         order     by gen_ymdhms desc nulls first
 17                        ) cust_status
 18               ,first_value(serv_status)
 19                   over (partition by curr_MSISDN
 20                         order     by gen_ymdhms desc nulls first
 21                        ) serv_status
 22       from    (select  pcmh.gen_ymdhms,ppw.curr_msisdn,ppw.cust_id,ppw.master_cust_id,ppw.cust_status,ppw.serv_status
 23              from  ppw_customer_dim          ppw
 24              ,ppw_cust_msisdn_history   pcmh
 25              ,ppw_service_class_history sch
 26              ,ppw_service_class_dim     scd
 27       where       ppw.cust_id                 = pcmh.cust_id
 28               and ppw.curr_msisdn             = pcmh.msisdn
 29               and pcmh.cust_msisdn_key        = sch.cust_msisdn_key
 30               and sch.service_class_id        = scd.service_class_id
 31       --order by curr_msisdn,gen_ymdhms)
 32       order by ppw.curr_msisdn,pcmh.gen_ymdhms)
 33*      )
SQL> /

  COUNT(*)
----------
    178573

Elapsed: 00:00:19.63


But this is still dependent on the number of first_value operators:
===================================================================


  1    select count(*) from (
  2       select   curr_msisdn
  3                --
  4               ,first_value(cust_status)
  5                   over (partition by curr_MSISDN
  6                         order     by gen_ymdhms desc nulls first
  7                        ) cust_status
  8               ,first_value(serv_status)
  9                   over (partition by curr_MSISDN
 10                         order     by gen_ymdhms desc nulls first
 11                        ) serv_status
 12       from    (select  pcmh.gen_ymdhms,ppw.curr_msisdn,ppw.cust_id,ppw.master_cust_id,ppw.cust_status,ppw.serv_status
 13              from  ppw_customer_dim          ppw
 14              ,ppw_cust_msisdn_history   pcmh
 15              ,ppw_service_class_history sch
 16              ,ppw_service_class_dim     scd
 17       where       ppw.cust_id                 = pcmh.cust_id
 18               and ppw.curr_msisdn             = pcmh.msisdn
 19               and pcmh.cust_msisdn_key        = sch.cust_msisdn_key
 20               and sch.service_class_id        = scd.service_class_id
 21       --order by curr_msisdn,gen_ymdhms)
 22       order by ppw.curr_msisdn,pcmh.gen_ymdhms)
 23*      )
SQL> /

  COUNT(*)
----------
    178573

Elapsed: 00:00:11.90






Questions:
----------

1. Why does the number of first_value operations affect the performance?
   If each column is partitioned and sorted in the same way, I thought
   performance would be the same.

2. Is it normal that the second query approach (the one that has
   the first_value() working on a sorted result set) would
   be faster?
   
3. Any performance advice for this query. The real query has about
   16 first_value() calls and takes minutes to run.



Tom: If you would like the exec plans, please let me know

Thanks Tom

Steve 

Tom Kyte
September 24, 2002 - 3:39 pm UTC

1) its doing tons more work -- partitioning the result set, sorting it.

2) many times, when something is done has a large effect on the time -- so yes, it is normal in both ways (faster and slower -- depending)

3) umm, get rid of the first_value calls since they don't do anything?




I guess I didn't ask the right questions

steve, September 24, 2002 - 3:46 pm UTC

Regarding Q1:

if we have one first_value() or 10 first_value(),
if they are all partitioning and sorting in the
same way, is the optimizer not able to detect
this and therefore not repeat the work for each column?

Regarding answer 2:

Sorry, but I don't understand the answer.

Regarding answer 3:

Why do you say they "don't do anything"? It's taking the most recent value for the column over time.

Tom Kyte
September 24, 2002 - 4:04 pm UTC

1) the addition of each column did not add linearly to the time

Elapsed: 00:00:03.29 no first value
Elapsed: 00:00:11.74 1 FV +8
Elapsed: 00:00:16.80 2 FV +5
Elapsed: 00:00:21.55 3 FV +5
Elapsed: 00:00:27.38 4 FV +5

The work *is in fact done once* and the optimizer recognized that. If not, all would be +8 perhaps -- consider, you proved that with:

Elapsed: 00:00:19.63 4 FV, but pre-sorted
Elapsed: 00:00:11.90 2 FV, but pre-sorted


But as for "why does each take longer" -- that is like asking "why does select sum(sal), sum(comm) from emp take longer then select sum(sal) from emp"?

You are asking for more work to be done.

and don't forget, the simple select count(*) might have been skipping lots of table access by rowids that the first values added back in (the first values are not coming from the same tables either -- so more then one extra "table access by rowid" could have been added.

2) Umm, sometims SORTING and then doing an operation is faster then doing a little bit -- sorting -- doing a little bit -- sorting and so on.

For example, sometimes "select * from t where x > 0" should use an index (when 5-15% of the table is > 0) and sometimes it should blow off the index and just do a full tablescan. Even if I add "order by x" onto the query -- we should avoid the index, full scan/sort -- instead of index range scan/table access by rowid.

Sometimes doing a function later (like you did, sort and then function) rather then sorting in the function will be better -- sometimes not.

3) did it change the result set? the answer posted back to you?





Much clearer!! Thanks!

steve, September 24, 2002 - 4:25 pm UTC

Hey Tom,

Much clearer. Thank you.

I'm still surprised though at the increase (5 seconds or so
per first_value()). But hey, life is full of surprises I guess!

I'll get back to you on #3. I sure as heck hope it changes
the result set or else.....

Thanks again, AS ALWAYS, Tom. You rock man!

Steve

answer for Q3

steve, September 24, 2002 - 5:27 pm UTC

Here's the answer on the result set with/without
the first_value calls


CREATE OR REPLACE VIEW V ( CURR_MSISDN, 
PHONE_ACCOUNT, MASTER_ACCOUNT, CUST_STATUS, SERV_STATUS
 ) AS (select   distinct ppw.curr_msisdn
               --
              ,first_value(ppw.cust_id)
                   over (partition by curr_MSISDN
                         order     by ppw.gen_ymdhms desc nulls first
                        ) phone_account
               --
              ,first_value(ppw.master_cust_id)
                  over (partition by curr_MSISDN
                        order     by ppw.gen_ymdhms desc nulls first
                       ) master_account
               --
              ,first_value(ppw.cust_status)
                  over (partition by curr_MSISDN
                        order     by ppw.gen_ymdhms desc nulls first
                       ) cust_status
              ,first_value(ppw.serv_status)
                  over (partition by curr_MSISDN
                        order     by ppw.gen_ymdhms desc nulls first
                       ) serv_status
      from    ppw_customer_dim          ppw
             ,ppw_cust_msisdn_history   pcmh
             ,ppw_service_class_history sch
             ,ppw_service_class_dim     scd
      where       ppw.cust_id                 = pcmh.cust_id
              and ppw.curr_msisdn             = pcmh.msisdn
              and pcmh.cust_msisdn_key        = sch.cust_msisdn_key
              and sch.service_class_id        = scd.service_class_id
      )



CREATE OR REPLACE VIEW V2 ( CURR_MSISDN, 
PHONE_ACCOUNT, MASTER_ACCOUNT, CUST_STATUS, SERV_STATUS
 ) AS (select   distinct ppw.curr_msisdn,
               --
              ppw.cust_id,ppw.master_cust_id,ppw.cust_status,ppw.serv_status
      from    ppw_customer_dim          ppw
             ,ppw_cust_msisdn_history   pcmh
             ,ppw_service_class_history sch
             ,ppw_service_class_dim     scd
      where       ppw.cust_id                 = pcmh.cust_id
              and ppw.curr_msisdn             = pcmh.msisdn
              and pcmh.cust_msisdn_key        = sch.cust_msisdn_key
              and sch.service_class_id        = scd.service_class_id
      )




SQL> select * from v;

CURR_MSISDN                    PHONE_ACCOUNT MASTER_ACCOUNT C S
------------------------------ ------------- -------------- - -
525510022031                          119057         119057 q I

Elapsed: 00:00:00.01




SQL>  select * from v2;

CURR_MSISDN                    PHONE_ACCOUNT MASTER_ACCOUNT C S
------------------------------ ------------- -------------- - -
525510022031                          119057         119057 P I
525510022031                          119057         119057 q I

Elapsed: 00:00:00.00


Thanks again for your help Tom.

Steve
 

Tom Kyte
September 24, 2002 - 5:37 pm UTC

well, those are not what we were looking at! not a count(*) to be seen.

right

steve, September 25, 2002 - 6:21 am UTC

You're right that with the count(*) there's no
difference. And in the original posting, I was using the
count(*).

I added the count(*) in the posting and during testing to stop the thousands of lines from being spit out to the screen. But in the real application, there's no count(*). So that's why I was confused by the original comment (I was thinking about the real application - not the example I posted). My mistake.

Now we're on the same page!

Ciao

Steve

Tom Kyte
September 25, 2002 - 8:30 am UTC

don't do that. you cannot compare count(*) to the REAL QUERY!!!!

count(*) will many many many times let us skip the "TABLE ACCESS by ROWID" after an index lookup. When you put your real columns back in -- we need to add those.

Consider this trivial example:

big_table@ORA920.US.ORACLE.COM> select count(*) from ( select * from big_table where owner = user );


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=211 Card=1 Bytes=5)
1 0 SORT (AGGREGATE)
2 1 INDEX (RANGE SCAN) OF 'BIG_INDEX_1' (NON-UNIQUE) (Cost=211 Card=33234 Bytes=166170)




Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
3 consistent gets
3 physical reads
0 redo size
379 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

big_table@ORA920.US.ORACLE.COM> select * from big_table where owner = user;

32 rows selected.


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1237 Card=33234 Bytes=3190464)
1 0 TABLE ACCESS (FULL) OF 'BIG_TABLE' (Cost=1237 Card=33234 Bytes=3190464)




Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
12712 consistent gets
12694 physical reads
300 redo size
1637 bytes sent via SQL*Net to client
521 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
32 rows processed



or this one:


big_table@ORA920.US.ORACLE.COM> select count(*)
2 from ( select * from big_Table where owner = 'OUTLN' )
3 /


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=5)
1 0 SORT (AGGREGATE)
2 1 INDEX (RANGE SCAN) OF 'BIG_INDEX_1' (NON-UNIQUE) (Cost=4 Card=224 Bytes=1120)




Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
380 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

big_table@ORA920.US.ORACLE.COM> select * from big_Table where owner = 'OUTLN'
2 /

224 rows selected.


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=228 Card=224 Bytes=21504)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'BIG_TABLE' (Cost=228 Card=224 Bytes=21504)
2 1 INDEX (RANGE SCAN) OF 'BIG_INDEX_1' (NON-UNIQUE) (Cost=4 Card=224)




Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
248 consistent gets
0 physical reads
300 redo size
4433 bytes sent via SQL*Net to client
653 bytes received via SQL*Net from client
16 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
224 rows processed


4 consistent gets vs 248 consistent gets! The resulting performance difference makes it IMPOSSIBLE, totally 100% IMPOSSIBLE to use select count(*) from ( YOUR_QUERY_HERE ) as a method of tuning or testing ideas!!!!


Use autotrace traceonly as I did and you'll find the query runs, but the output is never printed





Really Useful!

Steve, September 25, 2002 - 2:19 pm UTC

I feel like I've just been scolded by the principal :-)
That's good cause I need a good slap behind the head now and again.

Thanks Tom. I will definitely use autotrace traceonly. This is exactly what I needed.

Steve

How to show "Elapsed time"

Abubaker, September 29, 2002 - 3:27 am UTC

Oracle 8.1.6.3
NT 4

Hi all,
Can someone show me how to display the "Elapsed:00:00:00.00"


Thanks,

Tom Kyte
September 29, 2002 - 8:09 am UTC

SQL> set timing on 

If you do that and see "real nnnnnn", then use the 8i or above version of sqlplus.  It shows the time in that format.  older releases on windows did not.

 

rows between unbounded preceding and unbounded following

Kamal Kishore, May 27, 2003 - 9:57 pm UTC

Hi Tom,
What does this clause do:
"rows between unbounded preceding and unbounded following"
when using last_value?

I get different results from these two queries:

SQL> select distinct deptno,
  2                   last_value(ename) over (partition by deptno order by sal
  3                                           rows between unbounded preceding and unbounded following) as highest_sal
  4  from scott.emp
  5  /

    DEPTNO HIGHEST_SA
---------- ----------
        10 KING
        20 FORD
        30 BLAKE

3 rows selected.

SQL>
SQL> select distinct deptno,
  2                   last_value(ename) over (partition by deptno order by sal) as highest_sal
  3  from scott.emp
  4  /

    DEPTNO HIGHEST_SA
---------- ----------
        10 CLARK
        10 KING
        10 MILLER
        20 ADAMS
        20 JONES
        20 SCOTT
        20 SMITH
        30 ALLEN
        30 BLAKE
        30 JAMES
        30 TURNER
        30 WARD

12 rows selected.

SQL>
 

Tom Kyte
May 28, 2003 - 6:51 am UTC

the default window is from the beginning of the partition TO THE CURRENT ROW within the partition for many analytics (ones that use order by in them).

you changed the defaul window to the entire partition using the unbounded preceding/following.  Consider:

ops$tkyte@ORA920LAP> select deptno, ename, sal,
  2         last_value(ename) over (partition by deptno order by sal
  3                                 rows between unbounded preceding
  4                                  and unbounded following) as highest_sal1,
  5         last_value(ename) over (partition by deptno order by sal) highest_sal2
  6      from scott.emp
  7   order by deptno, sal
  8  /

    DEPTNO ENAME             SAL HIGHEST_SA HIGHEST_SA
---------- ---------- ---------- ---------- ----------
        10 MILLER           1300 KING       MILLER
           CLARK            2450 KING       CLARK
           KING             5000 KING       KING

        20 SMITH             800 SCOTT      SMITH
           ADAMS            1100 SCOTT      ADAMS
           JONES            2975 SCOTT      JONES
           FORD             3000 SCOTT      SCOTT
           SCOTT            3000 SCOTT      SCOTT

        30 JAMES             950 BLAKE      JAMES
           MARTIN           1250 BLAKE      WARD
           WARD             1250 BLAKE      WARD
           TURNER           1500 BLAKE      TURNER
           ALLEN            1600 BLAKE      ALLEN
           BLAKE            2850 BLAKE      BLAKE


14 rows selected.

ops$tkyte@ORA920LAP>

See how the default last_value() with default window (upto the current row in the partition) is the same as the current row within the partition?  You were using a sliding window


if you have my book expert one on one oracle -- i've got a pretty detailed walkthru of this stuff in there with pictures and stuff. 

Excellent explanation

Kamal Kishore, May 28, 2003 - 8:36 am UTC

Hi Tom,
That was an excellent explanation, more than several pages of any book or documentation.
I have your book, but have yet to reach this chapter.
With all the things going on right now, I find very little time to read. But whenever I do find time, Expert One-On-One is the only book I read.
Thanks,


Please explain window sort

A reader, August 27, 2004 - 10:00 am UTC

Hi Tom,

The explain plan says "WINDOW (SORT)"

The Enterprise Manager says"

Sort/Merge
Sort Output

against a single WINDOW (SORT)

is this the only way to implement WINDOW (SORT)?

Can this be implemented in any other way, if the system has more memory?

The above actually goes to disk when sorting (I'll try increasing the SORT_AREA_SIZE with WORKAREA_SIZE_POLICY to MANUAL), but just confirming the most performant implementation of WINDOW/SORT.

Thanks

Tom Kyte
August 27, 2004 - 10:13 am UTC

a window sort is just a sort of data in an analytic functions window.

it is just a sort, period. just a sort.

FIRST_VALUE vs LAST_VALUE gotta watch the defaults !

A reader, August 16, 2006 - 5:19 pm UTC

FIRST_VALUE operates "intuitively", and LAST_VALUE "counter-intuitively" but correctly:

select rec#
     , gender
     , first_value(gender)
        over (order by rec#
              ASC)            "1"
     , first_value(gender)
        over (order by rec#
              DESC)           "2"
     , first_value(gender)
        over (order by rec#
              ASC
rows between unbounded preceding and unbounded following) "3"
     , first_value(gender)
        over (order by rec#
              DESC
rows between unbounded preceding and unbounded following) "4"
     , last_value(gender)
        over (order by rec#
              DESC)          "5"
     , last_value(gender)
        over (order by rec#
              ASC)           "6"
     , last_value(gender)
        over (order by rec#
              ASC
rows between unbounded preceding and unbounded following) "7"
     , last_value(gender)
        over (order by rec#
              DESC
rows between unbounded preceding and unbounded following) "8"
from (    select 1 rec#, 'F' gender from dual
UNION ALL select 2 rec#, 'M' gender from dual
UNION ALL select 3 rec#, '?' gender from dual )
order by rec#
SQL> /

 REC# G 1 2 3 4 5 6 7 8
----- - - - - - - - - -
    1 F F ? F ? F F ? F
    2 M F ? F ? M M ? F
    3 ? F ? F ? ? ? ? F

#5 and 6 give the results because, according to Bug No. 2227479 (Status: Closed, Not a Bug):

If you omit the windowing_clause entirely, the default is RANGE BETWEEN  UNBOUNDED PRECEDING AND CURRENT ROW.  So effectively each row forms a window and all values for [each row are treated] as last value  

Reader, April 07, 2009 - 12:24 pm UTC

create table tst_first_last_value
(dt_key number,
sym varchar2(100),
ts timestamp(6),
price number,
volume number
)

insert into tst_first_last_value
values
(20090401,'ABC',to_timestamp('4/1/2009 9:07:42.550000000 AM','mm/dd/yyyy hh:mi:ss.ff am'),104,1000);

insert into tst_first_last_value
values
(20090401,'ABC',to_timestamp('4/1/2009 9:07:42.550000000 AM','mm/dd/yyyy hh:mi:ss.ff am'),104,100);

insert into tst_first_last_value
values
(20090401,'ABC',to_timestamp('4/1/2009 9:07:42.550000000 AM','mm/dd/yyyy hh:mi:ss.ff am'),104,100);

insert into tst_first_last_value
values
(20090401,'ABC',to_timestamp('4/1/2009 9:07:42.550000000 AM','mm/dd/yyyy hh:mi:ss.ff am'),104,750);

insert into tst_first_last_value
values
(20090401,'ABC',to_timestamp('4/1/2009 9:07:42.563000000 AM','mm/dd/yyyy hh:mi:ss.ff am'),104,260);

insert into tst_first_last_value
values
(20090401,'ABC',to_timestamp('4/1/2009 9:07:42.563000000 AM','mm/dd/yyyy hh:mi:ss.ff am'),104.36,260);

insert into tst_first_last_value
values
(20090401,'ABC',to_timestamp('4/1/2009 9:07:42.707000000 AM','mm/dd/yyyy hh:mi:ss.ff am'),104.1,100);


insert into tst_first_last_value
values
(20090401,'ABC',to_timestamp('4/1/2009 9:09:48.643000000 AM','mm/dd/yyyy hh:mi:ss.ff am'),104.14,300);

insert into tst_first_last_value
values
(20090401,'ABC',to_timestamp('4/1/2009 9:09:48.643000000 AM','mm/dd/yyyy hh:mi:ss.ff am'),104.2,200);

commit;

select dt_key,sym,ts,to_char(ts,'hh:mi:ss') ts_hh_mi_ss,price,
from tst_first_last_value
order by ts;

I need to get the first value and last value of the price for the time 09:07:42 and 09:09:48 (basically ordered by ts). I wrote the query as shown below



select dt_key
,sym
,ts
,price
,first_value(price) over ( partition by sym,to_char(ts,'HH:MI:SS') order by ts) first_val
,last_value(price) over (partition by sym,to_char(ts,'HH:MI:SS') order by ts ) last_val
,last_value(price) over (partition by sym,to_char(ts,'HH:MI:SS') order by ts
rows between unbounded preceding and unbounded following) last_val_new
from tst_first_last_value;

I am getting different results for last_val and last_val_new columns. Can you explain what is the djavascript:doSubmit('SUBMIT_REVIEW')
Submit Reviewifference between using rows between and not using it?
Tom Kyte
April 13, 2009 - 12:46 pm UTC

when you use an order by with the analytic - and you do not specify a window (rows or range), the default window is the CURRENT ROW and everything PRECEDING it.


So, the last_value would always be the CURRENT row without a window clause.

ops$tkyte%ORA10GR2> create table t
  2  as
  3  select * from all_users where rownum <= 10;

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select t.*,
  2         first_value(user_id) over (order by user_id) fv,
  3         last_value(user_id) over (order by user_id) lv1_a,
  4         last_value(user_id) over
  5                 (order by user_id rows between unbounded preceding and current row) lv1_b,
  6         last_value(user_id) over
  7                 (order by user_id rows between current row and unbounded following) lv2
  8    from t
  9   order by user_id
 10  /

USERNAME                          USER_ID CREATED           FV      LV1_A      LV1_B        LV2
------------------------------ ---------- --------- ---------- ---------- ---------- ----------
DIP                                    19 30-JUN-05         19         19         19        228
TSMSYS                                 21 30-JUN-05         19         21         21        228
MDDATA                                 50 30-JUN-05         19         50         50        228
BIG_TABLE                              58 14-DEC-05         19         58         58        228
FOO$TKYTE                              60 19-DEC-05         19         60         60        228
LOTTOUSER                              65 30-DEC-05         19         65         65        228
R                                      76 09-JAN-06         19         76         76        228
SCOTT                                  84 12-FEB-06         19         84         84        228
QUOTA                                  94 22-FEB-06         19         94         94        228
AQ                                    228 15-OCT-07         19        228        228        228

10 rows selected.

DISTINCT last_value?

A reader, April 07, 2010 - 8:18 am UTC

If I have a query like

select
 ...
 (select last_value(col) over (order by dt rows between unbounded preceding and unbounded following) from x where x.pk=t.pk and <other predicates on x>)
 ...
from t


that basically subqueries out the last value (last record for a particular date from another table), if there are multiple rows for the same date, I will get 'single row subquery returns more than 1 row'

This can be addressed by adding DISTINCT before the last_value because of how the last_value function works with the window specification, all the rows will have the same value.

Is this a valid approach or would you suggest something else?

Thanks
Tom Kyte
April 12, 2010 - 8:41 pm UTC

that basically subqueries out the last value (last record for a particular date from another table), if there are multiple rows for the same date, I will get 'single row subquery returns more than 1 row'

no, it is if your predicate "where x.pk=t.pk and <other predicates on x>" RETURNS MORE THAN ONE ROW. Last_value is not an aggregate, it does not reduce the number of rows in a result set.

that'll always return the number of records the where clause returns - please explain in more detail here.

FIRST_VALUE() vs LAST_VALUE()

Praveen Ray, February 22, 2011 - 8:11 am UTC

Hi Tom,

Thanks for your great replies. I have a table 't' with the following data:

select * from t;

COL1 COL2
---- ----
x 1
x 2
x 3
y 4
y 5
y 6

As per documentation, I predicted the following two statements should produce the identical results which is not the case:

select distinct first_value(col2) over (partition by col1 order by col2 desc) from t;

FIRST_VALUE(COL2)OVER(PARTITIO
------------------------------
6
3

select distinct last_value(col2) over (partition by col1 order by col2 /*asc*/) from t;

LAST_VALUE(COL2)OVER(PARTITION
------------------------------
1
6
2
4
5
3

Please, guide what am I unable to get.
Tom Kyte
February 23, 2011 - 10:55 am UTC

ops$tkyte%ORA11GR2> select t.*,
  2  first_value( col2 ) over (partition by col1 order by col2 desc ) fv,
  3  last_value( col2 ) over (partition by col1 order by col2 ) lv
  4  from t
  5  order by col1, col2 DESC;

C       COL2         FV         LV
- ---------- ---------- ----------
x          3          3          3
x          2          3          2
x          1          3          1
y          6          6          6
y          5          6          5
y          4          6          4

6 rows selected.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> select t.*,
  2  first_value( col2 ) over (partition by col1 order by col2 desc ) fv,
  3  last_value( col2 ) over (partition by col1 order by col2 ) lv
  4  from t
  5  order by col1, col2 ASC;

C       COL2         FV         LV
- ---------- ---------- ----------
x          1          3          1
x          2          3          2
x          3          3          3
y          4          6          4
y          5          6          5
y          6          6          6

6 rows selected.




the default window is from the current row and all PRECEDING rows. So, in the latter result set - the last value for the first row is "1", then when we move onto the second row - the last value is "2" because the window is the first two rows now.

and so on.

the first value never changes in the partition because the default window grows from the top down. but the "last row" is always changing as the window grows.

You'd have to add "rows between current row and unbounded following"

ops$tkyte%ORA11GR2> select t.*,
  2  first_value( col2 ) over (partition by col1 order by col2 desc ) fv,
  3  last_value( col2 ) over (partition by col1 order by col2 rows between current row
  4  and unbounded following ) lv
  5  from t
  6  order by col1, col2 ASC;

C       COL2         FV         LV
- ---------- ---------- ----------
x          1          3          3
x          2          3          3
x          3          3          3
y          4          6          6
y          5          6          6
y          6          6          6

6 rows selected.


so that the window grows "down" from the current row to the bottom of the partition instead of from the current row UP to the top of the partition

first_value() vs last_value

Praveen Ray, February 22, 2011 - 11:11 pm UTC

Hi Tom,

Sorry to bother you. I found the answer myself :)

select distinct last_value(col2) over (partition by col1 order by col2 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) from t;

select distinct last_value(col2) over (partition by col1 order by col2 RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) from t;

Regards,
Ray

first_value vs last_value

Praveen Ray, February 23, 2011 - 10:45 pm UTC

Hi,

Thank you so much Tom, for your nice illustration...Your way of describing really fits into mind :)

Regards,
Ray

Discrepancy in outputs of last_value function

A reader, November 07, 2011 - 2:25 am UTC

Hi Tom,

I am getting the output while using the last_value function as below...

scripts are as follows..

CREATE TABLE C1
(
  PK1       VARCHAR2(10 CHAR),
  PK2       VARCHAR2(10 CHAR),
  SERIALNO  NUMBER(10,2),
  IND       VARCHAR2(1 CHAR),
  VAL1      FLOAT(126),
  VAL2      FLOAT(126)
);

Insert into C1
   (PK1, PK2, SERIALNO, IND, VAL1, VAL2)
 Values
   ('a1', 'b1', 1, 'G', 10.5, 11.5);
Insert into C1
   (PK1, PK2, SERIALNO, IND, VAL1, VAL2)
 Values
   ('a1', 'b1', 2, 'G', 12.5, 12.5);
Insert into C1
   (PK1, PK2, SERIALNO, IND, VAL1, VAL2)
 Values
   ('a1', 'b1', 3, 'H', 13.5, 10.5);
Insert into C1
   (PK1, PK2, SERIALNO, IND, VAL1, VAL2)
 Values
   ('a1', 'b1', 4, 'G', 18.5, 15.5);
Insert into C1
   (PK1, PK2, SERIALNO, IND, VAL1, VAL2)
 Values
   ('a1', 'b1', 5, 'G', 21.5, 18.5);
COMMIT;

SQL> select * from c1;

PK1        PK2          SERIALNO I       VAL1       VAL2
---------- ---------- ---------- - ---------- ----------
a1         b1                  1 G       10.5       11.5
a1         b1                  2 G       12.5       12.5
a1         b1                  3 H       13.5       10.5
a1         b1                  4 G       18.5       15.5
a1         b1                  5 G       21.5       18.5

SQL> select val1,first_value(val1) over (order by serialno) fv,
  2  last_value(val1) over (order by serialno) lv from c1;

      VAL1         FV         LV
---------- ---------- ----------
      10.5       10.5       10.5
      12.5       10.5       12.5
      13.5       10.5       13.5
      18.5       10.5       18.5
      21.5       10.5       21.5

But i expect my output to be as below but i am getting above output.It's giving correctly for the first_value function but not for last_value function

     VAL1         FV         LV
--------- ---------- ----------
     21.5       10.5       21.5
     18.5       10.5       21.5
     13.5       10.5       21.5
     12.5       10.5       21.5
     10.5       10.5       21.5

Am i missing something or doing any mistake.Please clarify..

Tom Kyte
November 07, 2011 - 11:59 am UTC

let's plug in the default windowing clause and maybe it'll make more sense:

ops$tkyte%ORA11GR2> select val1,
  2         first_value(val1) over (order by serialno rows between unbounded preceding and current row ) fv,
  3         last_value(val1)  over (order by serialno rows between unbounded preceding and current row ) lv
  4    from c1;

      VAL1         FV         LV
---------- ---------- ----------
      10.5       10.5       10.5
      12.5       10.5       12.5
      13.5       10.5       13.5
      18.5       10.5       18.5
      21.5       10.5       21.5


see, the default window of data you look at is the current row and everything in front of it (it is an anchored window, like a window shade being pulled down).

So, the last value in such a window is ALWAYS the current row!

You need to either look from the current row on down (an anchored window, but anchored at the bottom of the partition) or turn the window upside down.

I recommend the first approach so you only need one sort.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> select val1,
  2         first_value(val1) over (order by serialno rows between unbounded preceding and current row ) fv,
  3         last_value(val1)  over (order by serialno rows between current row and unbounded following ) lv,
  4         first_value(val1) over (order by serialno DESC rows between unbounded preceding and current row ) lv2
  5    from c1;

      VAL1         FV         LV        LV2
---------- ---------- ---------- ----------
      21.5       10.5       21.5       21.5
      18.5       10.5       21.5       21.5
      13.5       10.5       21.5       21.5
      12.5       10.5       21.5       21.5
      10.5       10.5       21.5       21.5

Discrepancy in outputs of last_value function

A reader, November 07, 2011 - 9:06 pm UTC

Thanks a lot for the clarification Tom.It's clear now

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.