Performance difference?
jan van mourik, October 29, 2001 - 4:45 pm UTC
Thanks for the quick reply Tom!
first_value performance
steve, September 24, 2002 - 11:37 am UTC
Hi Tom,
Platform: Oracle 8.1.7, Solaris 8.
I have 4 tables with the following rowcount:
ppw_customer_dim 96111
ppw_cust_msisdn_history 96111
ppw_service_class_history 178574
ppw_service_class_dim 184
The following query works well (note: all join
columns are indexed)
1 select count(*)
2 from ppw_customer_dim ppw
3 ,ppw_cust_msisdn_history pcmh
4 ,ppw_service_class_history sch
5 ,ppw_service_class_dim scd
6 where ppw.cust_id = pcmh.cust_id
7 and ppw.curr_msisdn = pcmh.msisdn
8 and pcmh.cust_msisdn_key = sch.cust_msisdn_key
9* and sch.service_class_id = scd.service_class_id
SQL> /
COUNT(*)
----------
178573
Elapsed: 00:00:03.29
Putting the whole thing as a sub-query, still ok:
=================================================
Elapsed: 00:00:03.34
1 select count(*) from (
2 select ppw.curr_msisdn
3 from ppw_customer_dim ppw
4 ,ppw_cust_msisdn_history pcmh
5 ,ppw_service_class_history sch
6 ,ppw_service_class_dim scd
7 where ppw.cust_id = pcmh.cust_id
8 and ppw.curr_msisdn = pcmh.msisdn
9 and pcmh.cust_msisdn_key = sch.cust_msisdn_key
10 and sch.service_class_id = scd.service_class_id
11* )
SQL> /
COUNT(*)
----------
178573
Elapsed: 00:00:03.44
Now, I start introducing the first_value:
=========================================
1 select count(*) from (
2 select ppw.curr_msisdn
3 ,first_value(ppw.cust_id)
4 over (partition by curr_MSISDN
5 order by ppw.gen_ymdhms desc nulls first
6 ) phone_account
7 from ppw_customer_dim ppw
8 ,ppw_cust_msisdn_history pcmh
9 ,ppw_service_class_history sch
10 ,ppw_service_class_dim scd
11 where ppw.cust_id = pcmh.cust_id
12 and ppw.curr_msisdn = pcmh.msisdn
13 and pcmh.cust_msisdn_key = sch.cust_msisdn_key
14 and sch.service_class_id = scd.service_class_id
15* )
SQL> /
COUNT(*)
----------
178573
Elapsed: 00:00:11.74
The increase in execution time surprised me a little, but
fine. I guess there's some overhead with the first_value.
Now, I introduced a second first_value:
=======================================
SQL>
1 select count(*) from (
2 select ppw.curr_msisdn
3 --
4 ,first_value(ppw.cust_id)
5 over (partition by curr_MSISDN
6 order by ppw.gen_ymdhms desc nulls first
7 ) phone_account
8 --
9 ,first_value(ppw.master_cust_id)
10 over (partition by curr_MSISDN
11 order by ppw.gen_ymdhms desc nulls first
12 ) master_account
13 from ppw_customer_dim ppw
14 ,ppw_cust_msisdn_history pcmh
15 ,ppw_service_class_history sch
16 ,ppw_service_class_dim scd
17 where ppw.cust_id = pcmh.cust_id
18 and ppw.curr_msisdn = pcmh.msisdn
19 and pcmh.cust_msisdn_key = sch.cust_msisdn_key
20 and sch.service_class_id = scd.service_class_id
21* )
SQL> /
COUNT(*)
----------
178573
Elapsed: 00:00:16.80
Hmmm... Same partition and sorting. Surprised at the increase.
Now, I introduced a third first_value:
======================================
1 select count(*) from (
2 select ppw.curr_msisdn
3 --
4 ,first_value(ppw.cust_id)
5 over (partition by curr_MSISDN
6 order by ppw.gen_ymdhms desc nulls first
7 ) phone_account
8 --
9 ,first_value(ppw.master_cust_id)
10 over (partition by curr_MSISDN
11 order by ppw.gen_ymdhms desc nulls first
12 ) master_account
13 --
14 ,first_value(ppw.cust_status)
15 over (partition by curr_MSISDN
16 order by ppw.gen_ymdhms desc nulls first
17 ) cust_status
18 from ppw_customer_dim ppw
19 ,ppw_cust_msisdn_history pcmh
20 ,ppw_service_class_history sch
21 ,ppw_service_class_dim scd
22 where ppw.cust_id = pcmh.cust_id
23 and ppw.curr_msisdn = pcmh.msisdn
24 and pcmh.cust_msisdn_key = sch.cust_msisdn_key
25 and sch.service_class_id = scd.service_class_id
26* )
SQL> /
COUNT(*)
----------
178573
Elapsed: 00:00:21.55
Now, I introduced a fourth first_value:
=======================================
1 select count(*) from (
2 select ppw.curr_msisdn
3 --
4 ,first_value(ppw.cust_id)
5 over (partition by curr_MSISDN
6 order by ppw.gen_ymdhms desc nulls first
7 ) phone_account
8 --
9 ,first_value(ppw.master_cust_id)
10 over (partition by curr_MSISDN
11 order by ppw.gen_ymdhms desc nulls first
12 ) master_account
13 --
14 ,first_value(ppw.cust_status)
15 over (partition by curr_MSISDN
16 order by ppw.gen_ymdhms desc nulls first
17 ) cust_status
18 ,first_value(ppw.serv_status)
19 over (partition by curr_MSISDN
20 order by ppw.gen_ymdhms desc nulls first
21 ) serv_status
22 from ppw_customer_dim ppw
23 ,ppw_cust_msisdn_history pcmh
24 ,ppw_service_class_history sch
25 ,ppw_service_class_dim scd
26 where ppw.cust_id = pcmh.cust_id
27 and ppw.curr_msisdn = pcmh.msisdn
28 and pcmh.cust_msisdn_key = sch.cust_msisdn_key
29 and sch.service_class_id = scd.service_class_id
30* )
SQL> /
COUNT(*)
----------
178573
Elapsed: 00:00:27.38
BUT: If I sort a subquery first and then apply the first_value()
on the result set, I get much better results:
1 select count(*) from (
2 select curr_msisdn
3 --
4 ,first_value(cust_id)
5 over (partition by curr_MSISDN
6 order by gen_ymdhms desc nulls first
7 ) phone_account
8 --
9 ,first_value(master_cust_id)
10 over (partition by curr_MSISDN
11 order by gen_ymdhms desc nulls first
12 ) master_account
13 --
14 ,first_value(cust_status)
15 over (partition by curr_MSISDN
16 order by gen_ymdhms desc nulls first
17 ) cust_status
18 ,first_value(serv_status)
19 over (partition by curr_MSISDN
20 order by gen_ymdhms desc nulls first
21 ) serv_status
22 from (select pcmh.gen_ymdhms,ppw.curr_msisdn,ppw.cust_id,ppw.master_cust_id,ppw.cust_status,ppw.serv_status
23 from ppw_customer_dim ppw
24 ,ppw_cust_msisdn_history pcmh
25 ,ppw_service_class_history sch
26 ,ppw_service_class_dim scd
27 where ppw.cust_id = pcmh.cust_id
28 and ppw.curr_msisdn = pcmh.msisdn
29 and pcmh.cust_msisdn_key = sch.cust_msisdn_key
30 and sch.service_class_id = scd.service_class_id
31 --order by curr_msisdn,gen_ymdhms)
32 order by ppw.curr_msisdn,pcmh.gen_ymdhms)
33* )
SQL> /
COUNT(*)
----------
178573
Elapsed: 00:00:19.63
But this is still dependent on the number of first_value operators:
===================================================================
1 select count(*) from (
2 select curr_msisdn
3 --
4 ,first_value(cust_status)
5 over (partition by curr_MSISDN
6 order by gen_ymdhms desc nulls first
7 ) cust_status
8 ,first_value(serv_status)
9 over (partition by curr_MSISDN
10 order by gen_ymdhms desc nulls first
11 ) serv_status
12 from (select pcmh.gen_ymdhms,ppw.curr_msisdn,ppw.cust_id,ppw.master_cust_id,ppw.cust_status,ppw.serv_status
13 from ppw_customer_dim ppw
14 ,ppw_cust_msisdn_history pcmh
15 ,ppw_service_class_history sch
16 ,ppw_service_class_dim scd
17 where ppw.cust_id = pcmh.cust_id
18 and ppw.curr_msisdn = pcmh.msisdn
19 and pcmh.cust_msisdn_key = sch.cust_msisdn_key
20 and sch.service_class_id = scd.service_class_id
21 --order by curr_msisdn,gen_ymdhms)
22 order by ppw.curr_msisdn,pcmh.gen_ymdhms)
23* )
SQL> /
COUNT(*)
----------
178573
Elapsed: 00:00:11.90
Questions:
----------
1. Why does the number of first_value operations affect the performance?
If each column is partitioned and sorted in the same way, I thought
performance would be the same.
2. Is it normal that the second query approach (the one that has
the first_value() working on a sorted result set) would
be faster?
3. Any performance advice for this query. The real query has about
16 first_value() calls and takes minutes to run.
Tom: If you would like the exec plans, please let me know
Thanks Tom
Steve
September 24, 2002 - 3:39 pm UTC
1) its doing tons more work -- partitioning the result set, sorting it.
2) many times, when something is done has a large effect on the time -- so yes, it is normal in both ways (faster and slower -- depending)
3) umm, get rid of the first_value calls since they don't do anything?
I guess I didn't ask the right questions
steve, September 24, 2002 - 3:46 pm UTC
Regarding Q1:
if we have one first_value() or 10 first_value(),
if they are all partitioning and sorting in the
same way, is the optimizer not able to detect
this and therefore not repeat the work for each column?
Regarding answer 2:
Sorry, but I don't understand the answer.
Regarding answer 3:
Why do you say they "don't do anything"? It's taking the most recent value for the column over time.
September 24, 2002 - 4:04 pm UTC
1) the addition of each column did not add linearly to the time
Elapsed: 00:00:03.29 no first value
Elapsed: 00:00:11.74 1 FV +8
Elapsed: 00:00:16.80 2 FV +5
Elapsed: 00:00:21.55 3 FV +5
Elapsed: 00:00:27.38 4 FV +5
The work *is in fact done once* and the optimizer recognized that. If not, all would be +8 perhaps -- consider, you proved that with:
Elapsed: 00:00:19.63 4 FV, but pre-sorted
Elapsed: 00:00:11.90 2 FV, but pre-sorted
But as for "why does each take longer" -- that is like asking "why does select sum(sal), sum(comm) from emp take longer then select sum(sal) from emp"?
You are asking for more work to be done.
and don't forget, the simple select count(*) might have been skipping lots of table access by rowids that the first values added back in (the first values are not coming from the same tables either -- so more then one extra "table access by rowid" could have been added.
2) Umm, sometims SORTING and then doing an operation is faster then doing a little bit -- sorting -- doing a little bit -- sorting and so on.
For example, sometimes "select * from t where x > 0" should use an index (when 5-15% of the table is > 0) and sometimes it should blow off the index and just do a full tablescan. Even if I add "order by x" onto the query -- we should avoid the index, full scan/sort -- instead of index range scan/table access by rowid.
Sometimes doing a function later (like you did, sort and then function) rather then sorting in the function will be better -- sometimes not.
3) did it change the result set? the answer posted back to you?
Much clearer!! Thanks!
steve, September 24, 2002 - 4:25 pm UTC
Hey Tom,
Much clearer. Thank you.
I'm still surprised though at the increase (5 seconds or so
per first_value()). But hey, life is full of surprises I guess!
I'll get back to you on #3. I sure as heck hope it changes
the result set or else.....
Thanks again, AS ALWAYS, Tom. You rock man!
Steve
answer for Q3
steve, September 24, 2002 - 5:27 pm UTC
Here's the answer on the result set with/without
the first_value calls
CREATE OR REPLACE VIEW V ( CURR_MSISDN,
PHONE_ACCOUNT, MASTER_ACCOUNT, CUST_STATUS, SERV_STATUS
) AS (select distinct ppw.curr_msisdn
--
,first_value(ppw.cust_id)
over (partition by curr_MSISDN
order by ppw.gen_ymdhms desc nulls first
) phone_account
--
,first_value(ppw.master_cust_id)
over (partition by curr_MSISDN
order by ppw.gen_ymdhms desc nulls first
) master_account
--
,first_value(ppw.cust_status)
over (partition by curr_MSISDN
order by ppw.gen_ymdhms desc nulls first
) cust_status
,first_value(ppw.serv_status)
over (partition by curr_MSISDN
order by ppw.gen_ymdhms desc nulls first
) serv_status
from ppw_customer_dim ppw
,ppw_cust_msisdn_history pcmh
,ppw_service_class_history sch
,ppw_service_class_dim scd
where ppw.cust_id = pcmh.cust_id
and ppw.curr_msisdn = pcmh.msisdn
and pcmh.cust_msisdn_key = sch.cust_msisdn_key
and sch.service_class_id = scd.service_class_id
)
CREATE OR REPLACE VIEW V2 ( CURR_MSISDN,
PHONE_ACCOUNT, MASTER_ACCOUNT, CUST_STATUS, SERV_STATUS
) AS (select distinct ppw.curr_msisdn,
--
ppw.cust_id,ppw.master_cust_id,ppw.cust_status,ppw.serv_status
from ppw_customer_dim ppw
,ppw_cust_msisdn_history pcmh
,ppw_service_class_history sch
,ppw_service_class_dim scd
where ppw.cust_id = pcmh.cust_id
and ppw.curr_msisdn = pcmh.msisdn
and pcmh.cust_msisdn_key = sch.cust_msisdn_key
and sch.service_class_id = scd.service_class_id
)
SQL> select * from v;
CURR_MSISDN PHONE_ACCOUNT MASTER_ACCOUNT C S
------------------------------ ------------- -------------- - -
525510022031 119057 119057 q I
Elapsed: 00:00:00.01
SQL> select * from v2;
CURR_MSISDN PHONE_ACCOUNT MASTER_ACCOUNT C S
------------------------------ ------------- -------------- - -
525510022031 119057 119057 P I
525510022031 119057 119057 q I
Elapsed: 00:00:00.00
Thanks again for your help Tom.
Steve
September 24, 2002 - 5:37 pm UTC
well, those are not what we were looking at! not a count(*) to be seen.
right
steve, September 25, 2002 - 6:21 am UTC
You're right that with the count(*) there's no
difference. And in the original posting, I was using the
count(*).
I added the count(*) in the posting and during testing to stop the thousands of lines from being spit out to the screen. But in the real application, there's no count(*). So that's why I was confused by the original comment (I was thinking about the real application - not the example I posted). My mistake.
Now we're on the same page!
Ciao
Steve
September 25, 2002 - 8:30 am UTC
don't do that. you cannot compare count(*) to the REAL QUERY!!!!
count(*) will many many many times let us skip the "TABLE ACCESS by ROWID" after an index lookup. When you put your real columns back in -- we need to add those.
Consider this trivial example:
big_table@ORA920.US.ORACLE.COM> select count(*) from ( select * from big_table where owner = user );
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=211 Card=1 Bytes=5)
1 0 SORT (AGGREGATE)
2 1 INDEX (RANGE SCAN) OF 'BIG_INDEX_1' (NON-UNIQUE) (Cost=211 Card=33234 Bytes=166170)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
3 consistent gets
3 physical reads
0 redo size
379 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
big_table@ORA920.US.ORACLE.COM> select * from big_table where owner = user;
32 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1237 Card=33234 Bytes=3190464)
1 0 TABLE ACCESS (FULL) OF 'BIG_TABLE' (Cost=1237 Card=33234 Bytes=3190464)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
12712 consistent gets
12694 physical reads
300 redo size
1637 bytes sent via SQL*Net to client
521 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
32 rows processed
or this one:
big_table@ORA920.US.ORACLE.COM> select count(*)
2 from ( select * from big_Table where owner = 'OUTLN' )
3 /
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=5)
1 0 SORT (AGGREGATE)
2 1 INDEX (RANGE SCAN) OF 'BIG_INDEX_1' (NON-UNIQUE) (Cost=4 Card=224 Bytes=1120)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
380 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
big_table@ORA920.US.ORACLE.COM> select * from big_Table where owner = 'OUTLN'
2 /
224 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=228 Card=224 Bytes=21504)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'BIG_TABLE' (Cost=228 Card=224 Bytes=21504)
2 1 INDEX (RANGE SCAN) OF 'BIG_INDEX_1' (NON-UNIQUE) (Cost=4 Card=224)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
248 consistent gets
0 physical reads
300 redo size
4433 bytes sent via SQL*Net to client
653 bytes received via SQL*Net from client
16 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
224 rows processed
4 consistent gets vs 248 consistent gets! The resulting performance difference makes it IMPOSSIBLE, totally 100% IMPOSSIBLE to use select count(*) from ( YOUR_QUERY_HERE ) as a method of tuning or testing ideas!!!!
Use autotrace traceonly as I did and you'll find the query runs, but the output is never printed
Really Useful!
Steve, September 25, 2002 - 2:19 pm UTC
I feel like I've just been scolded by the principal :-)
That's good cause I need a good slap behind the head now and again.
Thanks Tom. I will definitely use autotrace traceonly. This is exactly what I needed.
Steve
How to show "Elapsed time"
Abubaker, September 29, 2002 - 3:27 am UTC
Oracle 8.1.6.3
NT 4
Hi all,
Can someone show me how to display the "Elapsed:00:00:00.00"
Thanks,
September 29, 2002 - 8:09 am UTC
SQL> set timing on
If you do that and see "real nnnnnn", then use the 8i or above version of sqlplus. It shows the time in that format. older releases on windows did not.
rows between unbounded preceding and unbounded following
Kamal Kishore, May 27, 2003 - 9:57 pm UTC
Hi Tom,
What does this clause do:
"rows between unbounded preceding and unbounded following"
when using last_value?
I get different results from these two queries:
SQL> select distinct deptno,
2 last_value(ename) over (partition by deptno order by sal
3 rows between unbounded preceding and unbounded following) as highest_sal
4 from scott.emp
5 /
DEPTNO HIGHEST_SA
---------- ----------
10 KING
20 FORD
30 BLAKE
3 rows selected.
SQL>
SQL> select distinct deptno,
2 last_value(ename) over (partition by deptno order by sal) as highest_sal
3 from scott.emp
4 /
DEPTNO HIGHEST_SA
---------- ----------
10 CLARK
10 KING
10 MILLER
20 ADAMS
20 JONES
20 SCOTT
20 SMITH
30 ALLEN
30 BLAKE
30 JAMES
30 TURNER
30 WARD
12 rows selected.
SQL>
May 28, 2003 - 6:51 am UTC
the default window is from the beginning of the partition TO THE CURRENT ROW within the partition for many analytics (ones that use order by in them).
you changed the defaul window to the entire partition using the unbounded preceding/following. Consider:
ops$tkyte@ORA920LAP> select deptno, ename, sal,
2 last_value(ename) over (partition by deptno order by sal
3 rows between unbounded preceding
4 and unbounded following) as highest_sal1,
5 last_value(ename) over (partition by deptno order by sal) highest_sal2
6 from scott.emp
7 order by deptno, sal
8 /
DEPTNO ENAME SAL HIGHEST_SA HIGHEST_SA
---------- ---------- ---------- ---------- ----------
10 MILLER 1300 KING MILLER
CLARK 2450 KING CLARK
KING 5000 KING KING
20 SMITH 800 SCOTT SMITH
ADAMS 1100 SCOTT ADAMS
JONES 2975 SCOTT JONES
FORD 3000 SCOTT SCOTT
SCOTT 3000 SCOTT SCOTT
30 JAMES 950 BLAKE JAMES
MARTIN 1250 BLAKE WARD
WARD 1250 BLAKE WARD
TURNER 1500 BLAKE TURNER
ALLEN 1600 BLAKE ALLEN
BLAKE 2850 BLAKE BLAKE
14 rows selected.
ops$tkyte@ORA920LAP>
See how the default last_value() with default window (upto the current row in the partition) is the same as the current row within the partition? You were using a sliding window
if you have my book expert one on one oracle -- i've got a pretty detailed walkthru of this stuff in there with pictures and stuff.
Excellent explanation
Kamal Kishore, May 28, 2003 - 8:36 am UTC
Hi Tom,
That was an excellent explanation, more than several pages of any book or documentation.
I have your book, but have yet to reach this chapter.
With all the things going on right now, I find very little time to read. But whenever I do find time, Expert One-On-One is the only book I read.
Thanks,
Please explain window sort
A reader, August 27, 2004 - 10:00 am UTC
Hi Tom,
The explain plan says "WINDOW (SORT)"
The Enterprise Manager says"
Sort/Merge
Sort Output
against a single WINDOW (SORT)
is this the only way to implement WINDOW (SORT)?
Can this be implemented in any other way, if the system has more memory?
The above actually goes to disk when sorting (I'll try increasing the SORT_AREA_SIZE with WORKAREA_SIZE_POLICY to MANUAL), but just confirming the most performant implementation of WINDOW/SORT.
Thanks
August 27, 2004 - 10:13 am UTC
a window sort is just a sort of data in an analytic functions window.
it is just a sort, period. just a sort.
FIRST_VALUE vs LAST_VALUE gotta watch the defaults !
A reader, August 16, 2006 - 5:19 pm UTC
FIRST_VALUE operates "intuitively", and LAST_VALUE "counter-intuitively" but correctly:
select rec#
, gender
, first_value(gender)
over (order by rec#
ASC) "1"
, first_value(gender)
over (order by rec#
DESC) "2"
, first_value(gender)
over (order by rec#
ASC
rows between unbounded preceding and unbounded following) "3"
, first_value(gender)
over (order by rec#
DESC
rows between unbounded preceding and unbounded following) "4"
, last_value(gender)
over (order by rec#
DESC) "5"
, last_value(gender)
over (order by rec#
ASC) "6"
, last_value(gender)
over (order by rec#
ASC
rows between unbounded preceding and unbounded following) "7"
, last_value(gender)
over (order by rec#
DESC
rows between unbounded preceding and unbounded following) "8"
from ( select 1 rec#, 'F' gender from dual
UNION ALL select 2 rec#, 'M' gender from dual
UNION ALL select 3 rec#, '?' gender from dual )
order by rec#
SQL> /
REC# G 1 2 3 4 5 6 7 8
----- - - - - - - - - -
1 F F ? F ? F F ? F
2 M F ? F ? M M ? F
3 ? F ? F ? ? ? ? F
#5 and 6 give the results because, according to Bug No. 2227479 (Status: Closed, Not a Bug):
If you omit the windowing_clause entirely, the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. So effectively each row forms a window and all values for [each row are treated] as last value
Reader, April 07, 2009 - 12:24 pm UTC
create table tst_first_last_value
(dt_key number,
sym varchar2(100),
ts timestamp(6),
price number,
volume number
)
insert into tst_first_last_value
values
(20090401,'ABC',to_timestamp('4/1/2009 9:07:42.550000000 AM','mm/dd/yyyy hh:mi:ss.ff am'),104,1000);
insert into tst_first_last_value
values
(20090401,'ABC',to_timestamp('4/1/2009 9:07:42.550000000 AM','mm/dd/yyyy hh:mi:ss.ff am'),104,100);
insert into tst_first_last_value
values
(20090401,'ABC',to_timestamp('4/1/2009 9:07:42.550000000 AM','mm/dd/yyyy hh:mi:ss.ff am'),104,100);
insert into tst_first_last_value
values
(20090401,'ABC',to_timestamp('4/1/2009 9:07:42.550000000 AM','mm/dd/yyyy hh:mi:ss.ff am'),104,750);
insert into tst_first_last_value
values
(20090401,'ABC',to_timestamp('4/1/2009 9:07:42.563000000 AM','mm/dd/yyyy hh:mi:ss.ff am'),104,260);
insert into tst_first_last_value
values
(20090401,'ABC',to_timestamp('4/1/2009 9:07:42.563000000 AM','mm/dd/yyyy hh:mi:ss.ff am'),104.36,260);
insert into tst_first_last_value
values
(20090401,'ABC',to_timestamp('4/1/2009 9:07:42.707000000 AM','mm/dd/yyyy hh:mi:ss.ff am'),104.1,100);
insert into tst_first_last_value
values
(20090401,'ABC',to_timestamp('4/1/2009 9:09:48.643000000 AM','mm/dd/yyyy hh:mi:ss.ff am'),104.14,300);
insert into tst_first_last_value
values
(20090401,'ABC',to_timestamp('4/1/2009 9:09:48.643000000 AM','mm/dd/yyyy hh:mi:ss.ff am'),104.2,200);
commit;
select dt_key,sym,ts,to_char(ts,'hh:mi:ss') ts_hh_mi_ss,price,
from tst_first_last_value
order by ts;
I need to get the first value and last value of the price for the time 09:07:42 and 09:09:48 (basically ordered by ts). I wrote the query as shown below
select dt_key
,sym
,ts
,price
,first_value(price) over ( partition by sym,to_char(ts,'HH:MI:SS') order by ts) first_val
,last_value(price) over (partition by sym,to_char(ts,'HH:MI:SS') order by ts ) last_val
,last_value(price) over (partition by sym,to_char(ts,'HH:MI:SS') order by ts
rows between unbounded preceding and unbounded following) last_val_new
from tst_first_last_value;
I am getting different results for last_val and last_val_new columns. Can you explain what is the djavascript:doSubmit('SUBMIT_REVIEW')
Submit Reviewifference between using rows between and not using it?
April 13, 2009 - 12:46 pm UTC
when you use an order by with the analytic - and you do not specify a window (rows or range), the default window is the CURRENT ROW and everything PRECEDING it.
So, the last_value would always be the CURRENT row without a window clause.
ops$tkyte%ORA10GR2> create table t
2 as
3 select * from all_users where rownum <= 10;
Table created.
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select t.*,
2 first_value(user_id) over (order by user_id) fv,
3 last_value(user_id) over (order by user_id) lv1_a,
4 last_value(user_id) over
5 (order by user_id rows between unbounded preceding and current row) lv1_b,
6 last_value(user_id) over
7 (order by user_id rows between current row and unbounded following) lv2
8 from t
9 order by user_id
10 /
USERNAME USER_ID CREATED FV LV1_A LV1_B LV2
------------------------------ ---------- --------- ---------- ---------- ---------- ----------
DIP 19 30-JUN-05 19 19 19 228
TSMSYS 21 30-JUN-05 19 21 21 228
MDDATA 50 30-JUN-05 19 50 50 228
BIG_TABLE 58 14-DEC-05 19 58 58 228
FOO$TKYTE 60 19-DEC-05 19 60 60 228
LOTTOUSER 65 30-DEC-05 19 65 65 228
R 76 09-JAN-06 19 76 76 228
SCOTT 84 12-FEB-06 19 84 84 228
QUOTA 94 22-FEB-06 19 94 94 228
AQ 228 15-OCT-07 19 228 228 228
10 rows selected.
DISTINCT last_value?
A reader, April 07, 2010 - 8:18 am UTC
If I have a query like
select
...
(select last_value(col) over (order by dt rows between unbounded preceding and unbounded following) from x where x.pk=t.pk and <other predicates on x>)
...
from t
that basically subqueries out the last value (last record for a particular date from another table), if there are multiple rows for the same date, I will get 'single row subquery returns more than 1 row'
This can be addressed by adding DISTINCT before the last_value because of how the last_value function works with the window specification, all the rows will have the same value.
Is this a valid approach or would you suggest something else?
Thanks
April 12, 2010 - 8:41 pm UTC
that basically subqueries out the last value (last record for a particular date from another table), if there are multiple rows for the same date, I will get 'single row subquery returns more than 1 row'
no, it is if your predicate "where x.pk=t.pk and <other predicates on x>" RETURNS MORE THAN ONE ROW. Last_value is not an aggregate, it does not reduce the number of rows in a result set.
that'll always return the number of records the where clause returns - please explain in more detail here.
FIRST_VALUE() vs LAST_VALUE()
Praveen Ray, February 22, 2011 - 8:11 am UTC
Hi Tom,
Thanks for your great replies. I have a table 't' with the following data:
select * from t;
COL1 COL2
---- ----
x 1
x 2
x 3
y 4
y 5
y 6
As per documentation, I predicted the following two statements should produce the identical results which is not the case:
select distinct first_value(col2) over (partition by col1 order by col2 desc) from t;
FIRST_VALUE(COL2)OVER(PARTITIO
------------------------------
6
3
select distinct last_value(col2) over (partition by col1 order by col2 /*asc*/) from t;
LAST_VALUE(COL2)OVER(PARTITION
------------------------------
1
6
2
4
5
3
Please, guide what am I unable to get.
February 23, 2011 - 10:55 am UTC
ops$tkyte%ORA11GR2> select t.*,
2 first_value( col2 ) over (partition by col1 order by col2 desc ) fv,
3 last_value( col2 ) over (partition by col1 order by col2 ) lv
4 from t
5 order by col1, col2 DESC;
C COL2 FV LV
- ---------- ---------- ----------
x 3 3 3
x 2 3 2
x 1 3 1
y 6 6 6
y 5 6 5
y 4 6 4
6 rows selected.
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> select t.*,
2 first_value( col2 ) over (partition by col1 order by col2 desc ) fv,
3 last_value( col2 ) over (partition by col1 order by col2 ) lv
4 from t
5 order by col1, col2 ASC;
C COL2 FV LV
- ---------- ---------- ----------
x 1 3 1
x 2 3 2
x 3 3 3
y 4 6 4
y 5 6 5
y 6 6 6
6 rows selected.
the default window is from the current row and all PRECEDING rows. So, in the latter result set - the last value for the first row is "1", then when we move onto the second row - the last value is "2" because the window is the first two rows now.
and so on.
the first value never changes in the partition because the default window grows from the top down. but the "last row" is always changing as the window grows.
You'd have to add "rows between current row and unbounded following"
ops$tkyte%ORA11GR2> select t.*,
2 first_value( col2 ) over (partition by col1 order by col2 desc ) fv,
3 last_value( col2 ) over (partition by col1 order by col2 rows between current row
4 and unbounded following ) lv
5 from t
6 order by col1, col2 ASC;
C COL2 FV LV
- ---------- ---------- ----------
x 1 3 3
x 2 3 3
x 3 3 3
y 4 6 6
y 5 6 6
y 6 6 6
6 rows selected.
so that the window grows "down" from the current row to the bottom of the partition instead of from the current row UP to the top of the partition
first_value() vs last_value
Praveen Ray, February 22, 2011 - 11:11 pm UTC
Hi Tom,
Sorry to bother you. I found the answer myself :)
select distinct last_value(col2) over (partition by col1 order by col2 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) from t;
select distinct last_value(col2) over (partition by col1 order by col2 RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) from t;
Regards,
Ray
first_value vs last_value
Praveen Ray, February 23, 2011 - 10:45 pm UTC
Hi,
Thank you so much Tom, for your nice illustration...Your way of describing really fits into mind :)
Regards,
Ray
Discrepancy in outputs of last_value function
A reader, November 07, 2011 - 2:25 am UTC
Hi Tom,
I am getting the output while using the last_value function as below...
scripts are as follows..
CREATE TABLE C1
(
PK1 VARCHAR2(10 CHAR),
PK2 VARCHAR2(10 CHAR),
SERIALNO NUMBER(10,2),
IND VARCHAR2(1 CHAR),
VAL1 FLOAT(126),
VAL2 FLOAT(126)
);
Insert into C1
(PK1, PK2, SERIALNO, IND, VAL1, VAL2)
Values
('a1', 'b1', 1, 'G', 10.5, 11.5);
Insert into C1
(PK1, PK2, SERIALNO, IND, VAL1, VAL2)
Values
('a1', 'b1', 2, 'G', 12.5, 12.5);
Insert into C1
(PK1, PK2, SERIALNO, IND, VAL1, VAL2)
Values
('a1', 'b1', 3, 'H', 13.5, 10.5);
Insert into C1
(PK1, PK2, SERIALNO, IND, VAL1, VAL2)
Values
('a1', 'b1', 4, 'G', 18.5, 15.5);
Insert into C1
(PK1, PK2, SERIALNO, IND, VAL1, VAL2)
Values
('a1', 'b1', 5, 'G', 21.5, 18.5);
COMMIT;
SQL> select * from c1;
PK1 PK2 SERIALNO I VAL1 VAL2
---------- ---------- ---------- - ---------- ----------
a1 b1 1 G 10.5 11.5
a1 b1 2 G 12.5 12.5
a1 b1 3 H 13.5 10.5
a1 b1 4 G 18.5 15.5
a1 b1 5 G 21.5 18.5
SQL> select val1,first_value(val1) over (order by serialno) fv,
2 last_value(val1) over (order by serialno) lv from c1;
VAL1 FV LV
---------- ---------- ----------
10.5 10.5 10.5
12.5 10.5 12.5
13.5 10.5 13.5
18.5 10.5 18.5
21.5 10.5 21.5
But i expect my output to be as below but i am getting above output.It's giving correctly for the first_value function but not for last_value function
VAL1 FV LV
--------- ---------- ----------
21.5 10.5 21.5
18.5 10.5 21.5
13.5 10.5 21.5
12.5 10.5 21.5
10.5 10.5 21.5
Am i missing something or doing any mistake.Please clarify..
November 07, 2011 - 11:59 am UTC
let's plug in the default windowing clause and maybe it'll make more sense:
ops$tkyte%ORA11GR2> select val1,
2 first_value(val1) over (order by serialno rows between unbounded preceding and current row ) fv,
3 last_value(val1) over (order by serialno rows between unbounded preceding and current row ) lv
4 from c1;
VAL1 FV LV
---------- ---------- ----------
10.5 10.5 10.5
12.5 10.5 12.5
13.5 10.5 13.5
18.5 10.5 18.5
21.5 10.5 21.5
see, the default window of data you look at is the current row and everything in front of it (it is an anchored window, like a window shade being pulled down).
So, the last value in such a window is ALWAYS the current row!
You need to either look from the current row on down (an anchored window, but anchored at the bottom of the partition) or turn the window upside down.
I recommend the first approach so you only need one sort.
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> select val1,
2 first_value(val1) over (order by serialno rows between unbounded preceding and current row ) fv,
3 last_value(val1) over (order by serialno rows between current row and unbounded following ) lv,
4 first_value(val1) over (order by serialno DESC rows between unbounded preceding and current row ) lv2
5 from c1;
VAL1 FV LV LV2
---------- ---------- ---------- ----------
21.5 10.5 21.5 21.5
18.5 10.5 21.5 21.5
13.5 10.5 21.5 21.5
12.5 10.5 21.5 21.5
10.5 10.5 21.5 21.5
Discrepancy in outputs of last_value function
A reader, November 07, 2011 - 9:06 pm UTC
Thanks a lot for the clarification Tom.It's clear now