Thanks for the question, nordine.

Asked: May 27, 2008 - 10:38 am UTC

Last updated: June 05, 2008 - 7:14 pm UTC

Version: oracle 9.2

Viewed 1000+ times

You Asked

Hello Tom, thank you kindly answer my question :

create table t1 (c1 number, c2 char(1), c3 date);

insert into t1 values (1,'p','01/01/2000');
insert into t1 values (1,'p','01/01/2000');
insert into t1 values (1,'c','01/01/2000');
insert into t1 values (1,'p','01/01/2005');
insert into t1 values (2,'p','01/01/2000');
insert into t1 values (2,'p','01/01/2000');
insert into t1 values (2,'p','01/01/2000');
insert into t1 values (3,'p','01/01/2006');
insert into t1 values (3,'p','01/01/2006');
insert into t1 values (4,'c','01/01/2008');
insert into t1 values (4,'c','01/01/2008');

COMMIT;

c3 is not always constant
there is no primary key (datawarehouse table)

my question,

What is the fastest way to get the next output :

1,'c','01/01/2000'
1,'p','01/01/2005'
2,'p','01/01/2000'
3,'p','01/01/2006'
4,'c','01/01/2008'
Note : c category is more priority than the category p
Exemple :
for the person number 1, three events have occurred the same day ('01/01/2000'), and I have to keep the third row (category c) because its highest priority.
we must understand: 'c' is more important than 'p'

Thank you Sir TOM

and Tom said...

ops$tkyte%ORA10GR2> select *
  2    from (
  3  select c1, c2, c3,
  4         row_number() over (partition by c1, c3 order by c2) rn
  5    from t1
  6         )
  7   where rn = 1;

        C1 C C3                 RN
---------- - ---------- ----------
         1 c 01/01/2000          1
         1 p 01/01/2005          1
         2 p 01/01/2000          1
         3 p 01/01/2006          1
         4 c 01/01/2008          1


or

ops$tkyte%ORA10GR2> select c1, min(c2), c3 from t1 group by c1, c3;

        C1 M C3
---------- - ----------
         1 c 01/01/2000
         1 p 01/01/2005
         2 p 01/01/2000
         3 p 01/01/2006
         4 c 01/01/2008

Rating

(11 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

more details

nordine, May 29, 2008 - 4:05 am UTC

Tom, thank you for the answer. The question is a bit more complicated than that
Suppose that I add the following row to my table :
insert into t1 values (1,'a','01/01/2000');
the result will be :
C1 C C3 RN
---------- - -------- ----------
1 a 01/01/00 1
1 p 01/01/05 1
2 p 01/01/00 1
3 p 01/01/06 1
4 c 01/01/08 1

it is not really what I want because the category 'c' is always more priority than all other categories
the result should have been as follows:
C1 C C3 RN
---------- - -------- ----------
1 c 01/01/00 1
1 p 01/01/05 1
2 p 01/01/00 1
3 p 01/01/06 1
4 c 01/01/08 1

Thank you for your help

May 29, 2008 - 8:07 am UTC

... The question is a bit more complicated than that
......

I knew it was, and I tried really really hard to get to you to TELL ME WHAT the real thing was.... we went back and forth and back and forth on this many times.... sigh.

You didn't say that 'a' could be there.

but it seems rather simple - sort by things in the way you need them sorted.

order by case when c2 = 'c' then 0 else 1 end, c2

ops$tkyte%ORA9IR2> select *
  2    from (
  3  select c1, c2, c3,
  4         row_number() over (partition by c1, c3 order by case when c2 = 'c' then 0 else 1 end, c2) rn
  5    from t1
  6         )
  7   where rn = 1;

        C1 C C3                 RN
---------- - ---------- ----------
         1 c 01/01/2000          1
         1 p 01/01/2005          1
         2 p 01/01/2000          1
         3 p 01/01/2006          1
         4 c 01/01/2008          1

create 2 buckets, one for 'c' and another for rest

Shailendra Parate, May 29, 2008 - 5:09 am UTC

Hi Nordine,

If I understand correctly, you want the 'c' to take priority and from other categories only the earlier (or the latest) as the case maybe. If this is correct, I have made a slight modification to Tom's original query below.

The case statement : case when c2='c' then 1 else 0 end forces the data into 2 buckets, one priority bucket for 'c' and another for the rest of the categories.

select c1,c2,c3
    from (
    select c1, c2, c3, case when c2='c' then 1 else 0 end as priority_partition,
           row_number() over (partition by c1,  case when c2='c' then 1 else 0 end order by c3) rn
      from t1
           )
where rn = 1

Thanks, Shailendra ..

more details

nordine, May 29, 2008 - 7:41 am UTC

thank you Shailendra for your help
Your solution produce the following result :
C1 C C3
---------- - --------
1 p 01/01/00
1 c 01/01/00
2 p 01/01/00
3 p 01/01/06
4 c 01/01/08

It's not exactly what I want.

I should not have the line : 1 p 01/01/00 in the result.
Only 1 c 01/01/00 must apear for this person.
Thank you

Re:delete duplicate rows

Mital, May 29, 2008 - 12:06 pm UTC

Hi Nordine,
  Based upon your requirements and following up on Tom's answers, I modified the partition clause to get your desired result set.


I inserted two more values to test further.

insert into t values(4,'a','03/02/2008');
insert into t values(3,'c','03/02/2006');

COMMIT;

I just removed the column c2 from the partition by clause from Tom's original SQL.

SQL> select c1,c2,c3
  2    from ( select t.*,
  3           row_number() over(partition by c1 order by case when c2 = 'c' then 0 else 1 end) rn
  4         from t)
  5   where rn=1;



        C1 C C3
---------- - ---------
         1 c 01-JAN-00
         2 p 01-JAN-00
         3 c 02-MAR-06
         4 c 01-JAN-08

May 29, 2008 - 1:21 pm UTC

Mital

But then Nordine will come back and tell us about the letter z and how sometimes the letter z appears instead of p when c is not there, but since there is a p, it should show p because p < z

so...

row_number() over (partition by c1, c3 order by case when c2 = 'c' then 0 else 1 end,
c2) rn

put the 'c' values first, then order by c2 normally - that'll cover it.

Re:delete duplicate rows

Mital, May 29, 2008 - 3:10 pm UTC

Tom,
considering your expertise with such requirements (on the fly change in requirements and requester unaware of what they want), I can understand your point.

The clause
row_number() over(partition by c1,c3 order by case when c2 = 'c' then 0 else 1 end) rn

will put the 'c' values first(based on Nordine's initial and second requirement).
you are spot on when any values after 'p' is inserted (q,r,s.....)

Thanks a lot for the in-depth explanation.

May 29, 2008 - 9:39 pm UTC

hah, I am totally guessing - we had much back and forth that no one saw on this, trying to extract the requirements - I made them change the example, provide more information

and yet, it was insufficient.... I didn't think to ask this time "are P and C the only values" (you should have seen the first example - all of the dates were the same - constant, not very "useful". And that C came before P - that is explained as "c is more priority" - not very clear... but anyway)

I think the best people in our industry are those that can ask a well formed question - not provide the answer - but ask the question.

Getting the answer is actually rather easy.

Asking the right question, specified correctly, that is more difficult.

A reader, May 31, 2008 - 8:47 am UTC

Very pertinent remarks TOM. I will try to be more precise in the future. Thank you for your invaluable assistance and thank you to all who have contributed to enriching this post
Nordine

how about using distinct

kiran, June 02, 2008 - 2:22 am UTC

why dont u try

select distinct(c1),c2,c3 from t1

June 02, 2008 - 11:17 am UTC

you understand that is identical to

select distinct c1, c2, c3 from t1;

the () around c1 do *nothing*

distinct is not a function, distinct works on the entire result set there.

select dummy from dual;
select (dummy) from dual;

they are the same...

there is:

select ALL columns....
select DISTINCT columns....
select UNIQUE columns.....

the default is SELECT ALL, the ALL word is optional.

did you think to TRY your solution before posting it? I do not believe you read the original posters question at all.

I hope "U" does try it, so they can verify it does not work at all.

delete duplicate rows

r-a-v-i, June 04, 2008 - 6:13 pm UTC

Hi Tom,

Schema : Scott/ Tiger
Version : Oracle 10gR2

On the similar lines of the above technique, the following query produces a result, where for each department, the manager record will show up first.

Query :

select e.deptno,e.ename, e.job, 
 row_number() over(partition by e.deptno order by case when job='MANAGER' then 0 else 1 end, job) rn
 from emp e ;

Result :

   DEPTNO ENAME JOB  RN
1 10 CLARK MANAGER  1
2 10 MILLER CLERK  2
3 10 KING PRESIDENT 3
4 20 JONES MANAGER  1
5 20 SCOTT ANALYST  2
6 20 FORD ANALYST  3
7 20 SMITH CLERK  4
8 20 ADAMS CLERK  5
9 30 BLAKE MANAGER  1
10 30 JAMES CLERK  2
11 30 WARD SALESMAN 3
12 30 ALLEN SALESMAN 4
13 30 TURNER SALESMAN 5
14 30 MARTIN SALESMAN 6

Say, I don't want to display the column "RN". So, if I do :

select deptno, ename, job from (
select e.deptno,e.ename, e.job, row_number() over(partition by e.deptno order by 
case when job='MANAGER' then 0 else 1 end,
job) rn
 from emp e);

the above sorting order is not maintained.

Could you please explain why.

Thanks,
r-a-v-i

June 05, 2008 - 9:39 am UTC

because UNLESS AND UNTIL YOU HAVE AN ORDER BY ON THE QUERY YOU HAVE NO REASON TO EXPECT THE DATA TO BE SORTED AT ALL.

...
select e.deptno,e.ename, e.job,
row_number() over(partition by e.deptno order by case when job='MANAGER'
then 0 else 1 end, job) rn
from emp e ;
....

that query does NOT show the manager first.

that query assigned a number "RN" to each row sequetially, after sorting the data by the CASE statement. That the rows accidentally were sorted was just that - an ACCIDENT and not assured to happen in real life.

Until you have ORDER BY, you cannot expect data to be sorted!!!!! order by on the STATMENT, not in an analytic function - IN THE STATEMENT.

You meant to use:

ops$tkyte%ORA11GR1> select e.deptno,e.ename, e.job
  2   from scott.emp e
  3  order by deptno, case when job = 'MANAGER' then 0 else 1 end;

    DEPTNO ENAME      JOB
---------- ---------- ---------
        10 CLARK      MANAGER
        10 KING       PRESIDENT
        10 MILLER     CLERK
        20 JONES      MANAGER
        20 SCOTT      ANALYST
        20 FORD       ANALYST
        20 ADAMS      CLERK
        20 SMITH      CLERK
        30 BLAKE      MANAGER
        30 ALLEN      SALESMAN
        30 TURNER     SALESMAN
        30 JAMES      CLERK
        30 WARD       SALESMAN
        30 MARTIN     SALESMAN

14 rows selected.

Your "partition by" and "order by" in the analytic do not mean, imply, infer that the data would be sorted in any way shape or form

If you learn one thing today make it be this:

UNLESS AND UNTIL YOUR SQL ENDS WITH "ORDER <sibilings> BY", you have no reason to expect the data to be retrieved in any order.

NOTHING short of ORDER BY orders data, group by does not assure your data is sorted, neither do analytics, INDEX hints - NOTHING but ORDER BY on the end of the query does that

delete duplicate rows

r-a-v-i, June 05, 2008 - 2:33 pm UTC

Thanks for the detailed explanation Tom.

You always rock !!

Here is one more way

Bijay, June 05, 2008 - 3:56 pm UTC

drop table t1;
create table t1 ( c1 int, c2 int, c3 char(1) );
insert into t1 values ( 1, 50, 'a' );
insert into t1 values ( 1, 50, 'a' );
insert into t1 values ( 1, 50, 'a' );
insert into t1 values ( 1, 50, 'b' );
insert into t1 values ( 2, 89, 'x' );
insert into t1 values ( 2, 89, 'y' );
insert into t1 values ( 2, 89, 'x' );
insert into t1 values ( 2, 89, 'x' );
insert into t1 values ( 2, 89, 'z' );
insert into t1 values ( 2, 89, 'z' );
insert into t1 values ( 2, 89, 'z' );
commit;
select * from t1;

delete from T1
where rowid <> ( select max(rowid)
from t1 b
where b.c1 = t1.c1
and b.c2 = t1.c2 )
/
select * from t1;

June 05, 2008 - 7:14 pm UTC

ummm, please read the original question.

you need to order some data there to get the "right one" and in fact, they don't really want to delete - they want to select out their concept of "the right unique row"

A reader, June 07, 2008 - 3:21 pm UTC

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.

Breadcrumb

Question and Answer

You Asked

and Tom said...

Rating

Comments

more details

create 2 buckets, one for 'c' and another for rest

more details

Re:delete duplicate rows

Re:delete duplicate rows

how about using distinct

delete duplicate rows

delete duplicate rows

Here is one more way

More to Explore

Analytics