Skip to Main Content

Breadcrumb

May 4th

Question and Answer

Tom Kyte

Thanks for the question.

Asked: February 27, 2008 - 8:43 am UTC

Last updated: September 20, 2012 - 5:26 am UTC

Version: 9.1

Viewed 10K+ times! This question is

You Asked

Lets say i have a table:

create table t ( a int, b date, c varchar2(30) );

with this data in it:

insert into t values ( 1, to_date( '01-jan-2004'), 'hello' );
insert into t values ( 2, to_date( '15-jan-2004'), 'world' );
insert into t values (3, to_date('17-jul-2005'),'how');
insert into t values (4, to_date('25-dec-2007'),'are you');
insert into t valued (5,to_date('27-feb-2008'),'today');

I need the 'are you' row because it matches some criteria
having found that row I want to reference the column values in the 3 preceding rows based on the date column "b"

Is that easy to do?

and Tom said...

ops$tkyte%ORA11GR1> select *
  2    from (
  3  select *
  4    from t
  5   where b <= (select b
  6                 from t
  7                        where c = 'are you')
  8   order by b desc
  9         )
 10   where rownum <= 4
 11  /

         A B         C
---------- --------- ------------------------------
         4 25-DEC-07 are you
         3 17-JUL-05 how
         2 15-JAN-04 world
         1 01-JAN-04 hello



and you can lag/lead on that small result set if you need them all on a single row...

Now, you have to figure out how to deal with the case where

B is not unique...
C is not unique...

but you didn't tell us enough about the data to know if that will be an issue...

Rating

  (19 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Toby, February 27, 2008 - 6:01 pm UTC

Thanks Tom.

B is unique C is not.

I am tinkering with lag and lead but can't quite get what I need

I do want to the whole result set on a single row please

Tom Kyte
February 27, 2008 - 7:25 pm UTC

define please what you expect when c is not unique.

eg: BE SPECIFIC AND VERY CLEAR for every case.

A reader, February 27, 2008 - 7:55 pm UTC

create table t (mykey int, the_date date, drink_sales int, food_sales int, 
            petrol_sales int,tyre_sales int);

insert into t values (1,to_date('20-feb-2008'), 100, 120,100,0);
insert into t values (2,to_date('21-feb-2008'), 150, 120,100,0);
insert into t values (3,to_date('22-feb-2008'), 90, 120,140,10);
insert into t values (4,to_date('23-feb-2008'), 200, 200,110,60);
insert into t values (5,to_date('24-feb-2008'), 200, 180,110,60);
insert into t values (6,to_date('25-feb-2008'), 120, 250,110,120);
insert into t values (7,to_date('26-feb-2008'), 100, 100,300,60);



column mykey is the primary key
column the_date is unique
remaining columns are not I am not concerned if these columns are unique
or not just there relationship to the values of the preceding day.


I would like to do something like this

select all the columns for a day where
the drink_sales(today) are less then the drink_sales(yesterday)
and the food_sales(today) are greater then food_sales(yesterday)
and the food_sales(yesterday) are less then food_sale(the day before that)



Did I get this clearer?
Tom Kyte
February 28, 2008 - 3:28 pm UTC

ops$tkyte%ORA10GR2> select *
  2    from (
  3  select t.*,
  4         case when lag(the_date) over (order by the_date) = the_date-1 then lag(drink_sales) over (order by the_date) end yesterdays_drink,
  5         case when lag(the_date) over (order by the_date) = the_date-1 then lag(food_sales) over (order by the_date) end yesterdays_food,  6         case when lag(the_date,2) over (order by the_date) = the_date-2 then lag(food_sales,2) over (order by the_date) end day_before_yesterdays_food
  7    from t
  8         )
  9   where drink_sales < yesterdays_drink
 10     and food_sales > yesterdays_food
 11     and yesterdays_food < day_before_yesterdays_food
 12  /

     MYKEY THE_DATE  DRINK_SALES FOOD_SALES PETROL_SALES TYRE_SALES YESTERDAYS_DRINK YESTERDAYS_FOOD DAY_BEFORE_YESTERDAYS_FOOD
---------- --------- ----------- ---------- ------------ ---------- ---------------- --------------- --------------------------
         6 25-FEB-08         120        250          110        120              200             180                        200

specific enough

Duke Ganote, February 27, 2008 - 10:25 pm UTC

I'm sure Tom's will be slicker:

create table t
( mykey int primary key
, the_date date unique
, drink_sales int, food_sales int
, petrol_sales int,tyre_sales int)
;
insert into t values (1,to_date('20-feb-2008','dd-mon-yyyy'), 100, 120,100,0);
insert into t values (2,to_date('21-feb-2008','dd-mon-yyyy'), 150, 120,100,0);
insert into t values (3,to_date('22-feb-2008','dd-mon-yyyy'), 90, 120,140,10);
insert into t values (4,to_date('23-feb-2008','dd-mon-yyyy'), 200, 200,110,60);
insert into t values (5,to_date('24-feb-2008','dd-mon-yyyy'), 200, 180,110,60);
insert into t values (6,to_date('25-feb-2008','dd-mon-yyyy'), 120, 250,110,120);
insert into t values (7,to_date('26-feb-2008','dd-mon-yyyy'), 100, 100,300,60);

SELECT today, "food(today)", "drink(today)"
, "drink(dayB4)"
, "food(dayB4)"
, "food(2dayB4)"
FROM
(
select the_date today
, food_sales "food(today)"
, drink_sales "drink(today)"
, first_value(drink_sales) OVER (order by the_date rows 1 preceding )
"drink(dayB4)"
, first_value(food_sales) OVER (order by the_date rows 1 preceding)
"food(dayB4)"
, first_value(food_sales) OVER (order by the_date rows 2 preceding)
"food(2dayB4)"
, row_number() OVER (order by the_date) r#
from t
order by the_date DESC
)
WHERE r# > 2
AND "drink(today)" < "drink(dayB4)"
AND "food(today)" > "food(dayB4)"
AND "food(dayB4)" < "food(2dayB4)"
/

TODAY food(today) drink(today) drink(dayB4) food(dayB4) food(2dayB4)
---------- ----------- ------------ ------------ ----------- ------------
2008-02-25 250 120 200 180 200
Tom Kyte
February 28, 2008 - 3:30 pm UTC


what about non-contiguous dates.....

lag(column,offset) with the case to check the dates...

Looks good

Toby, February 27, 2008 - 11:30 pm UTC

Hey thanks Duke!

Looks very good, I haven't used first_value before.
Can ask what the row number > 2 doing in the query?

I got lag and lead working.
I have used lag to get the the date, yesterday and the day before yesterdays date.

Thanks again for your answer.
Still interested in seeing your solution Tom.




Tom Kyte
February 28, 2008 - 3:31 pm UTC

see above

on FIRST_VALUE for 'Knight Move' and ROW_NUMBER

Duke Ganote, February 28, 2008 - 5:59 am UTC

FIRST_VALUE may be my favorite analytic function. Here it allows me to make a 'knight move': count 2 rows ordered by date, then get the value of food_sales. (Anthony Molinaro, in his book "SQL Cookbook", refers to 'knight values', but I prefer the Bob Seger allusion. :)

ROW_NUMBER covers the boundary condition. I explicitly excluded the first/earliest two rows because they couldn't possibly satisfy the condition "food_sales(yesterday) are less then food_sale(the day before that)". Row 1 has no known "food_sales(yesterday)" and row 2 has no known food_sale "food_sale(the day before that)".
Tom Kyte
February 28, 2008 - 3:46 pm UTC

what about non-contiguous dates....

re: FIRST_VALUE and non-contiguous dates.....

Duke Ganote, February 28, 2008 - 5:55 pm UTC

Now you made me review p.553-555 of the ultimate Oracle book (your EO1O1, of course) and learn something new...

You mean like this?

SELECT today -- REVISED QUERY AGAINST ORIGINAL DATA
, "drink(today)"
, "drink(dayB4)"
, "food(today)"
, "food(dayB4)"
, "food(2dayB4)"
FROM
(
select the_date today
, food_sales "food(today)"
, drink_sales "drink(today)"
, FIRST_VALUE(drink_sales) OVER
(order by the_date
range between 1 preceding and 1 preceding)
"drink(dayB4)"
, FIRST_VALUE(food_sales) OVER
(order by the_date
range between 1 preceding and 1 preceding)
"food(dayB4)"
, FIRST_VALUE(food_sales) OVER
(order by the_date
range between 2 preceding and 2 preceding)
"food(2dayB4)"
from t
order by the_date DESC
)
WHERE "drink(today)" < "drink(dayB4)"
AND "food(today)" > "food(dayB4)"
AND "food(dayB4)" < "food(2dayB4)"
/

TODAY drink(today) drink(dayB4) food(today) food(dayB4) food(2dayB4)
---------- ------------ ------------ ----------- ----------- ------------
2008-02-25 120 200 250 180 200

update t set the_date = the_date + mykey
/
select * from t -- NON-CONTIGUOUS DATES
/

MYKEY THE_DATE DRINK_SALES FOOD_SALES PETROL_SALES TYRE_SALES
---------- ---------- ----------- ---------- ------------ ----------
1 2008-02-21 100 120 100 0
2 2008-02-23 150 120 100 0
3 2008-02-25 90 120 140 10
4 2008-02-27 200 200 110 60
5 2008-02-29 200 180 110 60
6 2008-03-02 120 250 110 120
7 2008-03-04 100 100 300 60

7 rows selected.

SELECT today
, "drink(today)"
, "drink(dayB4)"
, "food(today)"
, "food(dayB4)"
, "food(2dayB4)"
FROM
(
select the_date today
, food_sales "food(today)"
, drink_sales "drink(today)"
, FIRST_VALUE(drink_sales) OVER
(order by the_date
range between 1 preceding and 1 preceding)
"drink(dayB4)"
, FIRST_VALUE(food_sales) OVER
(order by the_date
range between 1 preceding and 1 preceding)
"food(dayB4)"
, FIRST_VALUE(food_sales) OVER
(order by the_date
range between 2 preceding and 2 preceding)
"food(2dayB4)"
from t
order by the_date DESC
)
WHERE "drink(today)" < "drink(dayB4)"
AND "food(today)" > "food(dayB4)"
AND "food(dayB4)" < "food(2dayB4)"
/

no rows selected

LAG vs FIRST_VALUE. Subtle, no?

Duke Ganote, February 28, 2008 - 7:21 pm UTC

There's a subtle difference between the inner queries, which I noticed when testing with the non-contiguous data.

The LAG(...,2) query doesn't show show food sales from two days prior on non-contiguous data because LAG(...,2) is a ROW offset (p. 564 of EO1O1):

select the_date,
food_sales "food(today)",
case when lag(the_date ,2) over (order by the_date) = the_date-2
then lag(food_sales,2) over (order by the_date)
end "food(2dayB4)"
from t
order by the_date
/
THE_DATE food(today) food(2dayB4)
---------- ----------- ------------
2008-02-21 120
2008-02-23 120
2008-02-25 120
2008-02-27 200
2008-02-29 180
2008-03-02 250
2008-03-04 100

7 rows selected.

This query shows sales from two days prior because of the windowing RANGE clause:

select the_date today
, food_sales "food(today)"
, FIRST_VALUE(food_sales) OVER
(order by the_date
range between 2 preceding and 2 preceding)
"food(2dayB4)"
from t
order by the_date
/

TODAY food(today) food(2dayB4)
---------- ----------- ------------
2008-02-21 120
2008-02-23 120 120
2008-02-25 120 120
2008-02-27 200 120
2008-02-29 180 200
2008-03-02 250 180
2008-03-04 100 250

7 rows selected.

Actually, it shows the same result with FIRST_VALUE, MIN, MAX, SUM, etc.

Learned alot

Toby, February 29, 2008 - 8:45 am UTC

Thanks Tom and Duke

I learned an awful lot from your posts

Learned a lot

Duke Ganote, February 29, 2008 - 9:45 am UTC

Me too! Analytic functions offer lots of power. However, I tend to only use what I originally needed, learned (and still regularly use).

Until Tom "pushed back", I'd never looked at the row offset parameter on LAG, or used a windowing clause. I regularly use PARTITIONING and ORDER BY clauses, sure. But never a windowing clause like this:
RANGE BETWEEN 2 PRECEDING AND 2 PRECEDING
to capture a value "the day before yesterday", that is, 2 days ago (in calendar days, not business days).

I toyed with it further by compressing the time slices to half-daily:

update t set the_date = the_date-mykey*.5;

7 rows updated.

select * from t;

MYKEY THE_DATE DRINK_SALES FOOD_SALES PETROL_SALES TYRE_SALES
---------- ------------------- ----------- ---------- ------------ ----------
1 2008-02-19 12:00:00 100 120 100 0
2 2008-02-20 00:00:00 150 120 100 0
3 2008-02-20 12:00:00 90 120 140 10
4 2008-02-21 00:00:00 200 200 110 60
5 2008-02-21 12:00:00 200 180 110 60
6 2008-02-22 00:00:00 120 250 110 120
7 2008-02-22 12:00:00 100 100 300 60

7 rows selected.

and got the sum of yesterday's and the day-before-yesterday's sales like this:

select the_date this_datetime
, food_sales "food(this_datetime)"
, sum(food_sales) OVER
(order by trunc(the_date)
range between 1 preceding and 1 preceding)
"food(dayB4)"
, sum(food_sales) OVER
(order by trunc(the_date)
range between 2 preceding and 2 preceding)
"food(2dayB4)"
from t
order by the_date
/

THIS_DATETIME food(this_datetime) food(dayB4) food(2dayB4)
------------------- ------------------- ----------- ------------
2008-02-19 12:00:00 120
2008-02-20 00:00:00 120 120
2008-02-20 12:00:00 120 120
2008-02-21 00:00:00 200 240 120
2008-02-21 12:00:00 180 240 120
2008-02-22 00:00:00 250 380 240
2008-02-22 12:00:00 100 380 240

7 rows selected.

ah, the possibilities...

How about if the column is a computed value?

Chuck jolley, February 29, 2008 - 12:15 pm UTC

As an example, would it be possible to create a factorial table by referencing prior calculations?

eg
1 * 1 = 1
2 * 1 = 2
3 * 2 = 6
4 * 6 = 24
5 * 24 = 120
etc...

Something along the lines of
select level,
level * nvl(lag(factorial) over (order by level), level) factorial
from dual connect by level <= 5

Except, of course, that that won't compile because of the self reverencing computed column.

But, is there a way it can be done?

Tom Kyte
March 01, 2008 - 11:59 am UTC

ops$tkyte%ORA10GR2> with data as
  2  (select level l from dual connect by level <= 10)
  3  select l, exp( sum( ln(l) ) over (order by l) ) fact
  4    from data
  5  /

         L       FACT
---------- ----------
         1          1
         2          2
         3          6
         4         24
         5        120
         6        720
         7       5040
         8      40320
         9     362880
        10    3628800

10 rows selected.


RE: factorial table by referencing prior calculations?

Duke Ganote, February 29, 2008 - 7:20 pm UTC

10g's MODEL clause makes it easy:

SELECT * FROM
(
SELECT level r#
FROM dual
CONNECT BY LEVEL <= 5
) x
MODEL
DIMENSION BY ( r# )
MEASURES ( CAST(NULL AS NUMBER) factorial )
RULES AUTOMATIC ORDER
( factorial[r#] = cv(r#)* CASE cv(r#)
WHEN 1 THEN 1
ELSE factorial[cv()-1]
END
)
/

R# FACTORIAL
---------- ----------
1 1
2 2
3 6
4 24
5 120

A whole lesson in one statement

Chuck Jolley, March 01, 2008 - 8:52 am UTC

That is very interesting, thanks.
It's going to take me a bit to pick it apart and understand the general case.

If the factorial example was too easy, how about a Pascal's Triangle. ;)

Tom Kyte
March 01, 2008 - 12:15 pm UTC

ops$tkyte%ORA10GR2> with data as
  2  (select level l from dual connect by level <= 10)
  3  select l, exp( sum( ln(l) ) over (order by l) ) fact
  4    from data
  5  /

         L       FACT
---------- ----------
         1          1
         2          2
         3          6
         4         24
         5        120
         6        720
         7       5040
         8      40320
         9     362880
        10    3628800

10 rows selected.


is another approach.

see
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:12257624552691#599027400346211164

for the triangle ;)

Chuck Jolley, March 01, 2008 - 4:19 pm UTC

Tom,
I knew about the factorial formula.
I really just wondered if there was a way to reference prior calculated rows and used factorials as an example for simplicity.

Still, interesting stuff!

Chuck Jolley, March 01, 2008 - 4:21 pm UTC

I should have followed your link before I replied :D

I guess I didn't have anything else to do for the next week but wade through that.

pascal in sql

A reader, March 02, 2008 - 11:18 am UTC

RE: pick it apart and understand the general case

Duke Ganote, March 03, 2008 - 5:19 am UTC

If you mean the "general case" of MODEL, I found Joe Fuda and his SQL Snippets invaluable:
http://www.sqlsnippets.com/en/topic-11663.html
especially on the infuriating CV() for the righthand side:
http://pages.citebite.com/c2n9a8t4hhsa

Difference between Lag/Lead and range/row preceding

Raghu, September 13, 2012 - 4:34 pm UTC

Tom,

The question might sound noobish, but is there a difference between

LAG(SUM(salary), 1) OVER (ORDER BY hiredate)
and
sum(salary) over (order by hiredate rows/range 1 preceding)?

Thanks in advance
Tom Kyte
September 14, 2012 - 6:53 pm UTC

if you just want the preceding salary, please just use

lag(salary) over (order by hiredate)

why are you confusing everything with sum()s and such.


use the one that makes sense

lag(salary) over (order by hiredate)


Selecting previous rows

Raghu, September 18, 2012 - 1:42 pm UTC

Thanks Tom.

I've a table

create table t (attribute_name varchar2(50), attribute_length number
);

insert into t values ('ATTRIBUTE_1',10);
insert into t values ('ATTRIBUTE_2',10);
insert into t values ('ATTRIBUTE_2',20);
insert into t values ('ATTRIBUTE_3',30);
insert into t values ('ATTRIBUTE_4',10);
insert into t values ('ATTRIBUTE_4',40);
insert into t values ('ATTRIBUTE_5',50);
insert into t values ('ATTRIBUTE_6',60);
insert into t values ('ATTRIBUTE_7',70);
insert into t values ('ATTRIBUTE_8',80);
insert into t values ('ATTRIBUTE_8',80);
commit;
/

Now,

select attribute_name, attribute_length,
row_number () over (partition by attribute_name order by attribute_name) rn
from t;

would give me

ATTRIBUTE_NAME ATTRIBUTE_LENGTH RN

ATTRIBUTE_1 10 1
ATTRIBUTE_2 20 1
ATTRIBUTE_2 10 2
ATTRIBUTE_3 30 1
ATTRIBUTE_4 10 1
ATTRIBUTE_4 40 2
ATTRIBUTE_5 50 1
ATTRIBUTE_6 60 1
ATTRIBUTE_7 70 1
ATTRIBUTE_8 80 1
ATTRIBUTE_8 80 2

Is there a way to select only attributes 2,4 and not 8 (since the difference in lengths are only in 2 and 4)

I'm sure analytics is the way to go but uncertain which function to use.

Thanks
Tom Kyte
September 20, 2012 - 5:26 am UTC

ops$tkyte%ORA11GR2> select attribute_name, attribute_length,
  2         row_number() over (partition by attribute_name order by attribute_length) rn
  3    from (
  4  select attribute_name, attribute_length,
  5         count(distinct attribute_length) over (partition by attribute_name ) cnt
  6    from t
  7         )
  8   where cnt > 1
  9   order by attribute_name, attribute_length
 10  /

ATTRIBUTE_NAME                                     ATTRIBUTE_LENGTH         RN
-------------------------------------------------- ---------------- ----------
ATTRIBUTE_2                                                      10          1
ATTRIBUTE_2                                                      20          2
ATTRIBUTE_4                                                      10          1
ATTRIBUTE_4                                                      40          2

4 rows selected.

Rephrasing the question

Raghu, September 18, 2012 - 1:47 pm UTC

I need both the rows of attributes 2 and 4

i.e. result should return,

ATTRIBUTE_NAME ATTRIBUTE_LENGTH RN

ATTRIBUTE_2 20 1
ATTRIBUTE_2 10 2
ATTRIBUTE_4 10 1
ATTRIBUTE_4 40 2

only.

Thanks a million.