Skip to Main Content

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Michael.

Asked: March 31, 2009 - 5:23 am UTC

Last updated: August 13, 2011 - 3:23 pm UTC

Version: 9.2.0

Viewed 1000+ times

You Asked


I have a bit of a poser..

create table temp_mich
(
mydate date,
mytime varchar2(4),
job_type varchar2(4),
empid number
)

insert into temp_mich values('24-mar-09','0830','CAP1',1234);
insert into temp_mich values('24-mar-09','0837','CAP1',1234);
insert into temp_mich values('24-mar-09','0848','CAP1',1234);
insert into temp_mich values('24-mar-09','0852','CAP1',1234);
insert into temp_mich values('24-mar-09','0859','CAP2',1234);
insert into temp_mich values('24-mar-09','0900','CAP2',1234);
insert into temp_mich values('24-mar-09','0910','CAP1',1234);
insert into temp_mich values('24-mar-09','0914','CAP1',1234);
insert into temp_mich values('24-mar-09','0930','CAP1',1234);
insert into temp_mich values('24-mar-09','0830','CAP1',5678);
insert into temp_mich values('24-mar-09','0837','CAP1',5678);
insert into temp_mich values('24-mar-09','0848','CAP1',5678);
insert into temp_mich values('24-mar-09','0852','CAP1',5678);
insert into temp_mich values('24-mar-09','0859','CAP2',5678);
insert into temp_mich values('24-mar-09','0900','CAP2',5678);
insert into temp_mich values('24-mar-09','0910','CAP1',5678);
insert into temp_mich values('24-mar-09','0914','CAP1',5678);
insert into temp_mich values('24-mar-09','0930','CAP1',5678);


As you can see, we have a poor design arounf the date and time column with a four digit time being instered rather than just using the date field.

Anyway, from my sql, what I'd like to see is

mydate     mytime  job_type  empid    seq
24-MAR-09 0830 CAP1 1234 1
24-MAR-09 0837 CAP1 1234 1
24-MAR-09 0848 CAP1 1234 1
24-MAR-09 0852 CAP1 1234 1
24-MAR-09 0859 CAP2 1234 2
24-MAR-09 0900 CAP2 1234 2
24-MAR-09 0910 CAP1 1234 3
24-MAR-09 0914 CAP1 1234 3
24-MAR-09 0930 CAP1 1234 3
24-MAR-09 0830 CAP1 5678 1
24-MAR-09 0837 CAP1 5678 1
24-MAR-09 0848 CAP1 5678 1
24-MAR-09 0852 CAP1 5678 1
24-MAR-09 0859 CAP2 5678 2
24-MAR-09 0900 CAP2 5678 2
24-MAR-09 0910 CAP1 5678 3
24-MAR-09 0914 CAP1 5678 3
24-MAR-09 0930 CAP1 5678 3

The change to the seq should be on the change of job_type, and it should reset to one on change of empid.

Now, using this
select mydate, mytime, job_type, empid,
dense_rank() over (partition by empid order by empid, job_type) seq from temp_mich ORDER by empid,  to_date(mydate|| mytime,'dd-mon-yyhh24mi')

I get
mydate     mytime  job_type  empid    seq
24-MAR-09 0830 CAP1 1234 1
24-MAR-09 0837 CAP1 1234 1
24-MAR-09 0848 CAP1 1234 1
24-MAR-09 0852 CAP1 1234 1
24-MAR-09 0859 CAP2 1234 2
24-MAR-09 0900 CAP2 1234 2
24-MAR-09 0910 CAP1 1234 1
24-MAR-09 0914 CAP1 1234 1
24-MAR-09 0930 CAP1 1234 1
24-MAR-09 0830 CAP1 5678 1
24-MAR-09 0837 CAP1 5678 1
24-MAR-09 0848 CAP1 5678 1
24-MAR-09 0852 CAP1 5678 1
24-MAR-09 0859 CAP2 5678 2
24-MAR-09 0900 CAP2 5678 2
24-MAR-09 0910 CAP1 5678 1
24-MAR-09 0914 CAP1 5678 1


which is close, but not close enough.
Ayy pointers would be very much appreciated.

and Tom said...

ahh, the old "group em, carry down and then rank" problem....


<b>we start by marking the interesting records:</b>

ops$tkyte%ORA10GR2> select mydate, mytime, job_type, empid,
  2         decode(lag(job_type) over (partition by empid order by mydate, mytime), job_type, to_number(null), row_number() over (partition by empid order by mydate, mytime)) rn
  3    from t
  4  /

MYDATE    MYTI JOB_      EMPID         RN
--------- ---- ---- ---------- ----------
24-MAR-09 0830 CAP1       1234          1
24-MAR-09 0837 CAP1       1234
24-MAR-09 0848 CAP1       1234
24-MAR-09 0852 CAP1       1234
24-MAR-09 0859 CAP2       1234          5
24-MAR-09 0900 CAP2       1234
24-MAR-09 0910 CAP1       1234          7
24-MAR-09 0914 CAP1       1234
24-MAR-09 0930 CAP1       1234
24-MAR-09 0830 CAP1       5678          1
24-MAR-09 0837 CAP1       5678
24-MAR-09 0848 CAP1       5678
24-MAR-09 0852 CAP1       5678
24-MAR-09 0859 CAP2       5678          5
24-MAR-09 0900 CAP2       5678
24-MAR-09 0910 CAP1       5678          7
24-MAR-09 0914 CAP1       5678
24-MAR-09 0930 CAP1       5678

18 rows selected.

<b>Now, we have to carry down the last non-null value in the group - we could use ignore nulls in 10g, but you are 9i...</b>

ops$tkyte%ORA10GR2> select mydate, mytime, job_type, empid,
  2         max(rn) over (partition by empid order by mydate, mytime) grp
  3    from (
  4  select mydate, mytime, job_type, empid,
  5         decode(lag(job_type) over (partition by empid order by mydate, mytime), job_type, to_number(null), row_number() over (partition by empid order by mydate, mytime)) rn
  6    from t
  7         )
  8  /

MYDATE    MYTI JOB_      EMPID        GRP
--------- ---- ---- ---------- ----------
24-MAR-09 0830 CAP1       1234          1
24-MAR-09 0837 CAP1       1234          1
24-MAR-09 0848 CAP1       1234          1
24-MAR-09 0852 CAP1       1234          1
24-MAR-09 0859 CAP2       1234          5
24-MAR-09 0900 CAP2       1234          5
24-MAR-09 0910 CAP1       1234          7
24-MAR-09 0914 CAP1       1234          7
24-MAR-09 0930 CAP1       1234          7
24-MAR-09 0830 CAP1       5678          1
24-MAR-09 0837 CAP1       5678          1
24-MAR-09 0848 CAP1       5678          1
24-MAR-09 0852 CAP1       5678          1
24-MAR-09 0859 CAP2       5678          5
24-MAR-09 0900 CAP2       5678          5
24-MAR-09 0910 CAP1       5678          7
24-MAR-09 0914 CAP1       5678          7
24-MAR-09 0930 CAP1       5678          7

18 rows selected.

<b>and then we rank the groups...</b>

ops$tkyte%ORA10GR2> select mydate, mytime, job_type, empid,
  2         dense_rank() over (partition by empid order by grp) dr
  3    from (
  4  select mydate, mytime, job_type, empid,
  5         max(rn) over (partition by empid order by mydate, mytime) grp
  6    from (
  7  select mydate, mytime, job_type, empid,
  8         decode(lag(job_type) over (partition by empid order by mydate, mytime), job_type, to_number(null), row_number() over (partition by empid order by mydate, mytime)) rn
  9    from t
 10         )
 11         )
 12   order by empid, mydate, mytime
 13  /

MYDATE    MYTI JOB_      EMPID         DR
--------- ---- ---- ---------- ----------
24-MAR-09 0830 CAP1       1234          1
24-MAR-09 0837 CAP1       1234          1
24-MAR-09 0848 CAP1       1234          1
24-MAR-09 0852 CAP1       1234          1
24-MAR-09 0859 CAP2       1234          2
24-MAR-09 0900 CAP2       1234          2
24-MAR-09 0910 CAP1       1234          3
24-MAR-09 0914 CAP1       1234          3
24-MAR-09 0930 CAP1       1234          3
24-MAR-09 0830 CAP1       5678          1
24-MAR-09 0837 CAP1       5678          1
24-MAR-09 0848 CAP1       5678          1
24-MAR-09 0852 CAP1       5678          1
24-MAR-09 0859 CAP2       5678          2
24-MAR-09 0900 CAP2       5678          2
24-MAR-09 0910 CAP1       5678          3
24-MAR-09 0914 CAP1       5678          3
24-MAR-09 0930 CAP1       5678          3

18 rows selected.

Rating

  (9 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Awesome!!!

Michael Peel, March 31, 2009 - 10:25 am UTC

The title says it all.

Thanks,

How about this...

Steve, March 31, 2009 - 1:37 pm UTC

This may also work.....

with change_flag as (
select
mydate
, mytime
, job_type
, empid
, case when nvl(lag(job_type) over (partition by empid order by mydate, mytime), job_type) = job_type
then 0
else 1
end as changed
from temp_mich
)
select
mydate
, mytime
, job_type
, empid
, sum(changed) over(partition by empid order by mydate, mytime) + 1 as rank
from change_flag
Tom Kyte
April 01, 2009 - 7:33 am UTC

neat, yes.

Steve, March 31, 2009 - 1:57 pm UTC

more explanation...

1. First step is to mark each record which the job changes, making sure this is done by the specific order. The results of this query is a value of
1 whenever the job has changed.

2. The second step is to simply do a running total of the changes. Because only the records that change have a 1 the running total will provide the ranking desired. The "+ 1" is simply used to make the ranking start at 1 rather than 0

with change_flag as (
select
mydate
, mytime
, job_type
, empid
, case when lag(job_type,1, job_type) over (partition by empid order by mydate, mytime) = job_type
then 0
else 1
end as changed
from temp_mich
)
select
mydate
, mytime
, job_type
, empid
, sum(changed) over(partition by empid order by mydate, mytime) + 1 as rank
from change_flag
Tom Kyte
April 01, 2009 - 7:40 am UTC

and it requires only two window sorts - nice.

great !

Sokrates, April 01, 2009 - 7:09 am UTC

I vote for steve's version:
simpler and cheaper

tkyte:
...
-------------------------------------------------------------------------------------
| Id  | Operation               | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |           |    18 |   774 |     5  (60)| 00:00:01 |
|   1 |  SORT ORDER BY          |           |    18 |   774 |     5  (60)| 00:00:01 |
|   2 |   WINDOW SORT           |           |    18 |   774 |     5  (60)| 00:00:01 |
|   3 |    VIEW                 |           |    18 |   774 |     3  (34)| 00:00:01 |
|   4 |     WINDOW BUFFER       |           |    18 |   774 |     3  (34)| 00:00:01 |
|   5 |      VIEW               |           |    18 |   774 |     3  (34)| 00:00:01 |
|   6 |       WINDOW SORT       |           |    18 |   396 |     3  (34)| 00:00:01 |
|   7 |        TABLE ACCESS FULL| TEMP_MICH |    18 |   396 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------


Statistiken
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
        875  bytes sent via SQL*Net to client
        349  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          4  sorts (memory)
          0  sorts (disk)
         18  rows processed


steve (with added "order by empid, mydate, mytime"):
----------------------------------------------------------------------------------
| Id  | Operation            | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |           |    18 |   594 |     3  (34)| 00:00:01 |
|   1 |  WINDOW BUFFER       |           |    18 |   594 |     3  (34)| 00:00:01 |
|   2 |   VIEW               |           |    18 |   594 |     3  (34)| 00:00:01 |
|   3 |    WINDOW SORT       |           |    18 |   396 |     3  (34)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| TEMP_MICH |    18 |   396 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------


Statistiken
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
        877  bytes sent via SQL*Net to client
        349  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
         18  rows processed


only half the number of sorts, thus showing
that semantically equivalent query can easily result in different plans
Tom Kyte
April 01, 2009 - 9:47 am UTC

I would argue that these are not "semantically equivalent" in the truest sense of the word.

They come to the same answer, but they are *very* different in nature.


but I agree, I like Steve's approach and have thusly filed it away in my head for future use.

Nitpicking

Stew Ashton, April 06, 2009 - 5:59 pm UTC


Very nice question and answers, thanks to all, but if you don't mind my saying so:

In the absence of a unique constraint on empid + mydate + mytime, results are not deterministic unless you

(partition by empid order by mydate, mytime, job_type)




Clarification please

Huey, April 07, 2009 - 5:04 am UTC

I'm struggling with the 2nd part of Tom's solution to this problem, i.e. the query:
select mydate,
       mytime, 
    job_type, 
    empid,
       max(rn) over (partition by empid order by mydate, mytime) grp
from ( select mydate,
              mytime, 
     job_type, 
     empid,
     decode(lag(job_type) over (partition by empid order by mydate, mytime), 
                     job_type,
      to_number(null), 
      row_number() over (partition by empid order by mydate, mytime)) rn
       from t)

From my basic understanding of analytics I would expect the value of:
max(rn) over (partition by empid order by mydate, mytime) grp
to be 7 throughout, my reason being that the max function takes the largest value of rn from each partition. There are two partitions, one for 1234 and one for 5678, and the largest value of rn in each is 7.

This is obviously wrong, but I can't see why. Could someone please explain.
Tom Kyte
April 13, 2009 - 11:21 am UTC

when you use order by - that created an implicit window of "rows between the current row and all preceding rows"

so the max(rn) was computed after the sort, and only considered the current row and preceding rows.

@Huey re:clarification

someone, April 07, 2009 - 2:07 pm UTC


Hi Huey,

Since Tom is traveling and you asked for me by name :)

Analytic functions have windowing clauses. You are expecting the behaviour
(partition BY empid order by mydate, mytime
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
but the documentation says "If you omit the windowing_clause entirely, then the default is
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW."

Try each windowing clause in turn and you will see the difference.
Tom Kyte
April 13, 2009 - 12:55 pm UTC

... but the documentation says "If you omit the windowing_clause entirely, then the default is
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW." ...

not entirely true - it depends on the presence of an order by clause.

ops$tkyte%ORA10GR2> select user_id,
  2         max(user_id) over () max1,
  3             max(user_id) over (order by user_id) max2
  4    from t;

   USER_ID       MAX1       MAX2
---------- ---------- ----------
        19        228         19
        21        228         21
        50        228         50
        58        228         58
        60        228         60
        65        228         65
        76        228         76
        84        228         84
        94        228         94
       228        228        228

10 rows selected.


without order by - the default window is the entire partition.

with order by - the default window is the current row and every row "in front of it" (before it) in the partition.

Another Logic to get the result

Orlando Migotto, April 17, 2009 - 9:07 pm UTC

Hi all,

I think I did it an easy way.

Look at the query:

select y.mydate,
y.mytime,
y.job_type,
y.empid,
linha-sum(aux) over(partition by empid order by empid rows unbounded preceding) grp
from
(select t.*
,row_number() over(partition by empid order by empid) linha
,case when lag(job_type,1) over(partition by empid order by empid) = job_type then 1 else 0 end aux
from temp_mich t) y


Analytics really rocks !!


Rank Based on Multiple Records

A Reader, August 08, 2011 - 4:32 am UTC

Dear Tom,

With below data
create table Orders
(
order_no number,
item_code varchar2(100)
);

insert into orders values (1, 'Item-1');
insert into orders values (1, 'Item-2');
insert into orders values (1, 'Item-3');
insert into orders values (2, 'Item-7');
insert into orders values (2, 'Item-8');
insert into orders values (3, 'Item-2');
insert into orders values (3, 'Item-1');
insert into orders values (3, 'Item-3');
insert into orders values (4, 'Item-7');
insert into orders values (4, 'Item-8');

commit;

Is it possible to get below output.

Order Item Rank

1 Item-1 1
1 Item-2 1
1 Item-3 1
3 Item-1 1
3 Item-2 1
3 Item-3 1
2 Item-7 2
2 Item-8 2
4 Item-7 2
4 Item-8 2

thanks,
A reader
Tom Kyte
August 13, 2011 - 3:23 pm UTC

maybe yes
maybe no

You would sort of have to explain what this rank means. I don't see the logic behind it.

explain it like you would document it in a specification, be detailed and consider all of the edge cases.

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.