Home>Question Details



Pankaj -- Thanks for the question regarding "DISTINCT vs, GROUP BY", version 8.1.7

Submitted on 30-Jan-2005 22:39 Central time zone
Last updated 2-Nov-2008 16:59

You Asked

Tom, 

Just want to know the difference between DISTINCT and GROUP BY in queries where I'm not 
using any aggregate functions.

Like for example.

Select emp_no, name from Emp
Group by emo_no, name

And 

Select distinct emp_no, name from emp;


Which one is faster and why ? 


Thanks
 

and we said...

they are for all intents and purposes the same...  and really easy for you to 
evaluate!!

if you run:

set linesize 121
set echo on
                                                                                          
                                  
drop table t;
                                                                                          
                                  
create table t
as
select * from all_objects;
                                                                                          
                                  
alter table t add constraint t_pk primary key(object_id);
exec dbms_stats.gather_table_stats( user, 'T', cascade=>true );
                                                                                          
                                  
set autotrace traceonly
select distinct owner, object_name, object_type from t;
select owner, object_name, object_type from t group by owner, object_name, object_type;
set autotrace off
alter session set sql_trace=true;
set autotrace traceonly
select distinct owner, object_name, object_type from t;
select distinct owner, object_name, object_type from t;
select distinct owner, object_name, object_type from t;
select owner, object_name, object_type from t group by owner, object_name, object_type;
select owner, object_name, object_type from t group by owner, object_name, object_type;
select owner, object_name, object_type from t group by owner, object_name, object_type;
set autotrace off


select distinct owner, object_name, object_type from t
                                                                                          
                                  
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        3      0.00       0.00          0          0          0           0
Execute      3      0.00       0.00          0          0          0           0
Fetch     9525      0.92       0.83          0       1992          0      142815
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     9531      0.92       0.83          0       1992          0      142815
                                                                                          
                                  
Rows     Row Source Operation
-------  ---------------------------------------------------
  47605  SORT UNIQUE (cr=664 pr=0 pw=0 time=177034 us)
  47938   TABLE ACCESS FULL T (cr=664 pr=0 pw=0 time=48087 us)
********************************************************************************
select owner, object_name, object_type from t group by owner, object_name, object_type
                                                                                          
                                  
                                                                                          
                                  
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        3      0.00       0.00          0          0          0           0
Execute      3      0.00       0.00          0          0          0           0
Fetch     9525      0.93       0.85          0       1992          0      142815
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     9531      0.93       0.85          0       1992          0      142815
                                                                                          
                                  
Rows     Row Source Operation
-------  ---------------------------------------------------
  47605  SORT GROUP BY (cr=664 pr=0 pw=0 time=166792 us)
  47938   TABLE ACCESS FULL T (cr=664 pr=0 pw=0 time=48006 us)
 

Reviews    
4 stars Thanks Tom   January 31, 2005 - 9am Central time zone
Reviewer: Pankaj from MD, USA
As always, thanks tom. 


3 stars DISTINCT issues   December 19, 2005 - 2pm Central time zone
Reviewer: Mike Angelastro from Salt Lake City, UT USA
Isn't using a "DISTINCT" sometimes a sign of a query that hasn't been fully thought out?  Whenever 
I create a query, I run it with and without a "DISTINCT" and, if there is a difference in the 
record counts, I try to figure out why. Usually, if the record counts are different, there is 
something I hadn't considered. 

 


Followup   December 19, 2005 - 4pm Central time zone:

it *could* mean that.  

it doesn't have to mean that.


 

3 stars   January 19, 2006 - 3am Central time zone
Reviewer: A reader 


5 stars Direct Answer   May 11, 2006 - 8pm Central time zone
Reviewer: A reader from Austin, TX USA
It is always nice to see an answer backed up with data rather than conjecture. 


1 stars Distinct vs Group By   October 5, 2006 - 4am Central time zone
Reviewer: Victor from RSA
Hi there. I disagree with the statement that they are the same. 

I had a statement, built of two nested "group by" tables, and it hung... changed to "distinct" and 
voila! 1.5min execution.

I think distinct *pure thumbsuck* only takes one of each result value, where group by takes all 
results, and summarises the result into unique values. This is support by the added functionality 
group by has - HAVING!

ok then
Victor 


Followup   October 5, 2006 - 8am Central time zone:

they are the same in that the results they return are ....... ta-dah - the same.  They have the 
same effect.

Just like:

select * from t where x in ( select y from t2 )

is the "same" as:

select t.* from t, (select distinct y from t2) t2 where t.x = t2.y;

they may well perform differently (different plans) but they are "interchangable, the same" 

3 stars DISTINCT de-mythified by Prometheus   October 5, 2006 - 9am Central time zone
Reviewer: Duke Ganote from Terra\USA\Ohio\ClermontCty\BataviaTwp
Oh, this takes me back-- one of the rule-of-thumb (ROT) myths I remember hearing from crusty DBAs 
when I started working with Oracle DBMS late last century:  
* "always use GROUP BY instead of DISTINCT because it performs better."  
* "Never use outer joins; make the application do that"
My first purchased book, "Oracle 8i: The Complete Reference", had a scant page on explain plan that 
illustrated just a few lines of cryptic-looking output...

Then early in this century I noticed a piece in Oracle magazine by a guy who knew how to get 
performance statistics from the DBMS, interpret the results, and who used the experimental method 
to test hypotheses.  I was impressed.  And he writes some great books!

Thanks (exponentially), Tom ! 


4 stars   October 5, 2006 - 5pm Central time zone
Reviewer: David Aldridge from Colorado Springs, CO
I ran exactly the same test in 10.2 just to confirm that nothing about the HASH GROUP BY changed 
this, and noticed that the distinct query used HASH UNIQUE, which made me initially believe that 
both operations were still internally the same.

However the tkprof'd trace file showed the DISTINCT operation using more CPU than the GROUP BY ...

select distinct owner, object_name, object_type 
from
 t


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        3      0.00       0.00          0          0          0           0
Execute      3      0.00       0.00          0          0          0           0
Fetch    10095      1.85       1.78         45       2106          0      151374
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    10101      1.85       1.79         45       2106          0      151374

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 62  

Rows     Row Source Operation
-------  ---------------------------------------------------
  50458  HASH UNIQUE (cr=702 pr=45 pw=45 time=2405972 us)
  50750   TABLE ACCESS FULL T (cr=702 pr=0 pw=0 time=2334900 us)

********************************************************************************


********************************************************************************

select owner, object_name, object_type from t group by owner, object_name,
object_type

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        3      0.00       0.00          0          0          0           0
Execute      3      0.00       0.00          0          0          0           0
Fetch    10095      1.18       1.08          0       2106          0      151374
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    10101      1.19       1.08          0       2106          0      151374

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 62  

Rows     Row Source Operation
-------  ---------------------------------------------------
  50458  HASH GROUP BY (cr=702 pr=0 pw=0 time=624258 us)
  50750   TABLE ACCESS FULL T (cr=702 pr=0 pw=0 time=507831 us)

********************************************************************************


If you can't reproduce at your end Tom then I can send trace files and whatnot. 


Followup   October 6, 2006 - 8am Central time zone:

I would say those numbers are close enough for me to call them "the same" over time. (look at your 
elapsed times.... for example)


ops$tkyte%ORA10GR2> create table t as select * from all_objects;

Table created.

ops$tkyte%ORA10GR2> insert /*+ APPEND */ into t select * from t;

50036 rows created.

ops$tkyte%ORA10GR2> commit;

Commit complete.

ops$tkyte%ORA10GR2> insert /*+ APPEND */ into t select * from t;

100072 rows created.

ops$tkyte%ORA10GR2> commit;

Commit complete.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> begin
  2          for i in 1 .. 1
  3          loop
  4                  for x in (select distinct owner, object_name, object_type from t)
  5                  loop
  6                          null;
  7                  end loop;
  8                  for x in (select owner, object_name, object_type from t group by owner, 
object_name, object_type)
  9                  loop
 10                          null;
 11                  end loop;
 12          end loop;
 13  end;
 14  /

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> @trace
ops$tkyte%ORA10GR2> alter session set events '10046 trace name context forever, level 12';

Session altered.

ops$tkyte%ORA10GR2> begin
  2          for i in 1 .. 5
  3          loop
  4                  for x in (select distinct owner, object_name, object_type from t)
  5                  loop
  6                          null;
  7                  end loop;
  8                  for x in (select owner, object_name, object_type from t group by owner, 
object_name, object_type)
  9                  loop
 10                          null;
 11                  end loop;
 12          end loop;
 13  end;
 14  /

PL/SQL procedure successfully completed.





SELECT DISTINCT OWNER, OBJECT_NAME, OBJECT_TYPE FROM T
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          1          0           0
Execute      5      0.00       0.00          0          0          0           0
Fetch     2475      1.54       1.59        165      13805          0      247425
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     2481      1.54       1.59        165      13806          0      247425

Rows     Row Source Operation
-------  ---------------------------------------------------
 247425  HASH UNIQUE (cr=13805 pr=165 pw=165 time=1381040 us)
1000720   TABLE ACCESS FULL T (cr=13805 pr=0 pw=0 time=1000940 us)



********************************************************************************
SELECT OWNER, OBJECT_NAME, OBJECT_TYPE FROM T GROUP BY OWNER, OBJECT_NAME, OBJECT_TYPE
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          1          0           0
Execute      5      0.00       0.00          0          0          0           0
Fetch     2475      1.51       1.86          0      13805          0      247425
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     2481      1.51       1.86          0      13806          0      247425

Rows     Row Source Operation
-------  ---------------------------------------------------
 247425  HASH GROUP BY (cr=13805 pr=0 pw=0 time=1573230 us)
1000720   TABLE ACCESS FULL T (cr=13805 pr=0 pw=0 time=1000937 us)
 

3 stars   December 7, 2006 - 8pm Central time zone
Reviewer: Chi H from California
I couldn't reproduce this, but found some production data that resembled the following:
select distinct id from t1@dblink
where id = 123;
-- returned one record

select id from t1@dblink
where id = 123;
-- returns 3 records

select *
from t2
where id in (select distinct id from t1@dlink where id = 123);
-- returned 3 records

Like I said, I couldn't create a test case to demonstrate this.  Do you have any idea how this 
could be? 


Followup   December 8, 2006 - 7am Central time zone:

ops$tkyte%ORA10GR2> create table t1 ( id int );

Table created.

ops$tkyte%ORA10GR2> create table t2 ( id int );

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> insert into t1 select 123 from all_users where rownum <= 3;

3 rows created.

ops$tkyte%ORA10GR2> insert into t2 select 123 from all_users where rownum <= 3;

3 rows created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select distinct id from t1 where id = 123;

        ID
----------
       123

ops$tkyte%ORA10GR2> select id from t1 where id = 123;

        ID
----------
       123
       123
       123

ops$tkyte%ORA10GR2> select * from t2 where id in (select distinct id from t1 where id = 123);

        ID
----------
       123
       123
       123

the cardinality of a result returned from T2 isn't going to be affected by the cardinality of the 
subquery against t1 - that is LOSE THE DISTINCT, it is not relevant at all.  

T2 must have three records such that ID=123, that is all - no big mystery really...
 

5 stars LOSE THE DISTINCT   December 8, 2006 - 8am Central time zone
Reviewer: Matthew from Warwickshire
Or move it to the outermost SELECT if you just want distinct records 


3 stars   December 8, 2006 - 2pm Central time zone
Reviewer: Chi H from California
I had tried this test case before my posting yesterday:
SQL> create table t1 (a number, b varchar2(10));

Table created.

SQL> insert into t1 values (1, 'A');

1 row created.

SQL> create table t2 (a number);

Table created.

SQL> insert into t2 values (1)
  2  ;

1 row created.

SQL> insert into t2 values (1);

1 row created.

SQL> commit;

Commit complete.

SQL> select distinct a from t2;

         A
----------
         1

SQL> select * from t1
  2  where a in (select a from t2);

         A B
---------- ----------
         1 A

SQL> select * from t1
  2  where a in (select distinct a from t2);

         A B
---------- ----------
         1 A

I'm feeling a bit too dense.  I don't understand the difference between our results. 


Followup   December 9, 2006 - 12pm Central time zone:

umm, I selected from t2, not t1 and I had different numbers of rows.

so, not sure where you are going with this. 

4 stars DISTINCT vs GROUP BY?   December 9, 2006 - 10am Central time zone
Reviewer: Alejandro Daza from Colombia
Hey David Aldridge, that test you did is not the same, you have to create the index that Tom´s 
create.

A conclusion of these tests, it´s that you have to evaluate your queries, ONE by ONE, and you have 
to make proper indexes to faster up operations like hash. Only with time (expertise (30%)) and the 
way you perform your architect (or your program data IO structure (70%)), you would prefer to make 
some operation or another.

My conclusions here are:
1. If you have sort unique, hey I think could be more faster or the same as group by unique
2. If you have hash unique consider a index to cenvert it to sort unique
3. If you can´t index it all, do the group by operation,
 BUT REMEMBER TRY ALWAYS TO EVALUATE FIRST
 


3 stars   December 10, 2006 - 3pm Central time zone
Reviewer: Chi H from California
Tom,
I just took a closer look and see where we are on different pages.  My fault for not being clear.

As I mentioned, I created the test case before posting the question, so I just copied and pasted.  
It just happened that we used the same table names.

My test case shows that t1 has only 1 record and t2 has 2 records.  Doing a select from t1 where id 
in t2 should only give one record.  In our production environment, this type of scenario returned 
multiple records (3 in the production system).

I was just wondering if you could explain it.  Your test case is different from the scenario. 


Followup   December 10, 2006 - 7pm Central time zone:

ok, tell you what - you post the 100% complete, concise, yet 100% here test case - and let us look 
at it. 

far far far too confusing to piece all of the tiny bits together here. 

3 stars   December 11, 2006 - 3pm Central time zone
Reviewer: Chi H from California
Tom,
Sorry to have confused you.  As I mentioned, I couldn't create a test case, and was hoping you knew 
something from the back of your mind about this situation.

I think the next best thing is to ping support to have them take a look.

Thanks for the follow-up. 


Followup   December 11, 2006 - 7pm Central time zone:

nope, need test case - not following your sequence of events in my head - need to see it STEP by 
STEP 

5 stars Is this a bug?   January 10, 2007 - 4pm Central time zone
Reviewer: A reader 
SQL> select object_type from dba_objects where owner='SYSTEM' and status='INVALI
D' order by owner;

OBJECT_TYPE
-------------------
PROCEDURE
FUNCTION
PACKAGE BODY
PACKAGE BODY
PROCEDURE
PROCEDURE
PROCEDURE
FUNCTION
PROCEDURE

9 rows selected.

SQL> select distinct object_type from dba_objects where owner='SYSTEM' and statu
s='INVALID' order by owner;

OBJECT_TYPE
-------------------
PROCEDURE
PACKAGE BODY
FUNCTION

SQL> select distinct object_type from dba_objects where owner like 'SYSTEM' and statu
s='INVALID' order by owner

              *
ERROR at line 1:
ORA-01791: not a SELECTed expression


SQL>

Oracle 10g r2.

Thanks


2 stars please think about all posibilities   March 26, 2007 - 4am Central time zone
Reviewer: deccar from your ass
fools... what will be happen if you use an aggregation function with a group by clause when you really wanted to use distinct? well I'll tell you, your results will be erroneous, cause the function DOES use all the resulting tuples, not only the ones youre seeing.

Followup   March 26, 2007 - 7am Central time zone:

brilliant.

absolute hogwash as stated

but well done, thanks so much for your "input"




5 stars Please be professional !!   March 26, 2007 - 2pm Central time zone
Reviewer: Tom Admirer from WA, USA
Tom,

The above review comments do not adhere to the professional standards. People participating here should apply some professional standards and avoid using some coarse or vulgar words...Tom...you should take this seriously and discourage them (as you do it for the people who do the IM talk)...Probably you should consider removing the previous review from deccar.

Well tom, it finally rests in your discretion.

To all other readers and reviewers, please be professional in here...it is not the site to vent your anger...Please express your ideas in a more elegant/proper way.


Followup   March 27, 2007 - 9am Central time zone:

moderating is a slippery slope. where does it end.

I don't moderate, but I have no problem making fun :)

So, if we either

o laugh at them
o ignore them

which ever you prefer - they lose, we win.


4 stars How would I do that?   May 3, 2007 - 5pm Central time zone
Reviewer: daniel from iowa
create table test_table
(t_id number,
t_name varchar2(10)
);

insert into test_table VALUES (1, 'RAIN')
/
insert into test_table VALUES (2, 'RAIN')
/
insert into test_table VALUES (3, 'RAIN')
/
insert into test_table VALUES (1, 'FOG')
/
insert into test_table VALUES (2, 'FOG')
/
My goal is to select records (all fields) based on the t_name being distinct. I just need to get 1 record for 'RAIN' AND 1 for 'FOG'. Doesn't matter which one. Distinct will look on all the fields and will return 5 rows.



Followup   May 4, 2007 - 12pm Central time zone:

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:228182900346230020


shows three different techniques.
4 stars this seems to work   May 3, 2007 - 5pm Central time zone
Reviewer: daniel from iowa
Would this be correct?

select distinct t_name, first_value(t_id) over (partition by t_name)
from test_table



Followup   May 4, 2007 - 12pm Central time zone:

no, that returns every row

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:228182900346230020


shows three techniques
3 stars So am I just getting lucky?   May 4, 2007 - 3pm Central time zone
Reviewer: daniel from iowa
SQL> select * from test_table;

    T_ID T_NAME
---------- ----------
      1 RAIN
      3 RAIN
      1 FOG
      2 FOG

SQL> select distinct t_name, first_value(t_id) over (partition by t_name)
2 from test_table
3 /

T_NAME  FIRST_VALUE(T_ID)OVER(PARTITIONBYT_NAME)
---------- ----------------------------------------
FOG                              1
RAIN                              1                          1

Followup   May 4, 2007 - 4pm Central time zone:

sorry, did not see the distinct

yes, that will work as well

4 stars Distinct on Analytics?   May 5, 2007 - 10pm Central time zone
Reviewer: Tom Admirer from WA, USA
Tom,

I do not think that you would appreciate the idea of using "DISTINCT" on analytics function. Do you?

So the better idea would be to use any of the three ways you suggested:
select t_id,t_name from
(select a.*,row_number() over (partition by t_name order by t_id) rn from test_table a )
where rn=1


SCOTT @ MYDB> /

      T_ID T_NAME
---------- ----------
         1 FOG
         1 RAIN


Or is there any workaround to avoid the distinct?

Followup   May 8, 2007 - 10am Central time zone:

you just demonstrated one way.

I pointed you to three ways.
3 stars Probably no difference   May 7, 2007 - 10am Central time zone
Reviewer: Mark Brady from Baltimore, MD USA
The Analytic function and the Distinct will both cause a sort - I believe. Did you cost both out? don't just guess if distinct is worse, show that it is.

It's true that I usually tell my developers that if you have to add distinct to make the results correct and you don't know why, it's a missed join causing a cartesian. It's also true that if you use an AF to do normal aggregation you'll get back a lot of duplicate rows that you have to eliminate. (There's another post somewhere on this site where TK does show this point and that the Distinct is ok)

Although I think you tried to simplify your example too much and you made it non-sensical.

Followup   May 8, 2007 - 11am Central time zone:

No, the distinct will be in general much worse - the optimizer recognizes top-n quereis with row_number()

T is a copy of all objects:

select *
  from (select t.*, row_number() over (partition by owner order by created) rn
          from t )
 where rn = 1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          1          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        3      0.20       0.20          3        692          4          30
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        5      0.21       0.21          3        693          4          30

Rows     Row Source Operation
-------  ---------------------------------------------------
     30  VIEW  (cr=692 pr=3 pw=3 time=206306 us)
     68   WINDOW SORT PUSHED RANK (cr=692 pr=3 pw=3 time=206278 us)
  49998    TABLE ACCESS FULL T (cr=692 pr=0 pw=0 time=600043 us)

********************************************************************************
select distinct owner,
first_value( OWNER ) over (partition by owner ),
first_value( OBJECT_NAME ) over (partition by owner ),
first_value( SUBOBJECT_NAME ) over (partition by owner ),
first_value( OBJECT_ID ) over (partition by owner ),
first_value( DATA_OBJECT_ID ) over (partition by owner ),
first_value( OBJECT_TYPE ) over (partition by owner ),
first_value( CREATED ) over (partition by owner ),
first_value( LAST_DDL_TIME ) over (partition by owner ),
first_value( TIMESTAMP ) over (partition by owner ),
first_value( STATUS ) over (partition by owner ),
first_value( TEMPORARY ) over (partition by owner ),
first_value( GENERATED ) over (partition by owner ),
first_value( SECONDARY ) over (partition by owner )
from t

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          1          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        3      0.96       1.93       2713        692         16          30
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        5      0.96       1.94       2713        693         16          30

Rows     Row Source Operation
-------  ---------------------------------------------------
     30  HASH UNIQUE (cr=692 pr=2713 pw=2024 time=1935907 us)
  49998   WINDOW SORT (cr=692 pr=2713 pw=2024 time=2031276 us)
  49998    TABLE ACCESS FULL T (cr=692 pr=0 pw=0 time=350063 us)

5 stars   May 9, 2007 - 5pm Central time zone
Reviewer: orafan from VA
I have a table with receipe and ingredient information.
Pleas help me out to write a query to bring all receipes which has 'ING1' and 'ING2' in it .So in this case the result is receipe1 and receipe2.

create table abc ( receipe varchar2(10) ,ingredient varchar2(10))
/

insert into abc values ('Receipe1', 'ING1')
/
insert into abc values ('Receipe1', 'ING2')
/
insert into abc values ('Receipe1', 'ING3')
/
insert into abc values ('Receipe2', 'ING1')
/
insert into abc values ('Receipe2', 'ING3')
/
insert into abc values ('Receipe3', 'ING1')
/
insert into abc values ('Receipe3', 'ING2')
/
insert into abc values ('Receipe3', 'ING10')
/

Followup   May 11, 2007 - 10am Central time zone:

smells like homework.

you should be able to come with with at least 5 ways to do this... go for it, on a scale of 1 to 10 sql-wise, this is a 2 in terms of complexity (think GROUP BY and HAVING).
5 stars   May 9, 2007 - 10pm Central time zone
Reviewer: orafan from VA
Forgot to maintain that I am looking for a sql solution without using set operation.

Followup   May 11, 2007 - 10am Central time zone:

"sql solution without using a set operation"

well, that is going to be HARD :)

given that, well, SELECT is a set operation.
4 stars   May 10, 2007 - 3am Central time zone
Reviewer: yaz from CT
select receipe from abc
where receipe in (select receipe from abc where ingredient = 'ING1')
and ingredient = 'ING2';


5 stars Analytics Way   May 11, 2007 - 9pm Central time zone
Reviewer: A reader 
select receipe from (
select receipe,count(case when ingredient = 'ING1' then ingredient end) CNT_ING1,
               count(case when ingredient = 'ING2' then ingredient end) CNT_ING2
from abc group by receipe) where CNT_ING1>=1 and CNT_ING2>=1 order by 1


How about this?

Followup   May 14, 2007 - 12pm Central time zone:

that is not analytics, that is aggregation


select recipe
  from table
 where recipe in ( 'ING1', 'ING2' )
having count(distinct ingredient) = 2;


would be a "better" way to use aggregation....
4 stars Small Correction   May 14, 2007 - 4pm Central time zone
Reviewer: A reader 
Tom,

Small correction in your query, Probably this is what you meant...The logic was good.


select receipe from abc where ingredient in ( 'ING1', 'ING2' ) group by receipe having 
count(distinct ingredient) = 2;


Followup   May 14, 2007 - 5pm Central time zone:

correct, where ingredient in ....
3 stars Trick Question.. sort of!   May 16, 2007 - 7am Central time zone
Reviewer: Richard Armstrong-Finnerty from UK
The question is "a query to bring all receipes which has 'ING1' and 'ING2' in it .So in this case the result is receipe1 and receipe2"... which is impossible, as receipe2 does not have ING2! The 2 receipes (sic) that do have ING1 & ING2 are receipe1 & receipe3.

5 stars i love you tom!   March 26, 2008 - 12pm Central time zone
Reviewer: dfxgirl from cambodia
yeah that works! Tom is so sexy... uh.. i´m wet...


1 stars   April 16, 2008 - 11pm Central time zone
Reviewer: A reader 


3 stars Different results with Distinct vs. Group By   October 31, 2008 - 6pm Central time zone
Reviewer: Dylan from Albany, NY
While doing some performance turning on a procedure, I came across a case where not only does the performance vary between a statement using distinct vs. group by, but I'm also getting different results. This had me scratching my head a little bit...

EUSER@stdv > set autotrace traceonly
EUSER@stdv > select billing_day
  2  from a_table
  3  group by billing_day;

<b>138 rows selected.

Elapsed: 00:00:00.78</b>

Execution Plan
----------------------------------------------------------

-----------------------------------------------------------------------
| Id  | Operation          | Name             | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                  |   136 |  1088 |   485 |
|   1 |  SORT GROUP BY     |                  |   136 |  1088 |   485 |
|   2 |   TABLE ACCESS FULL|     A_TABLE      |   617K|  4826K|   105 |
-----------------------------------------------------------------------

Note
-----
   - 'PLAN_TABLE' is old version
   - cpu costing is off (consider enabling it)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       2703  consistent gets
       2692  physical reads
        188  redo size
       1913  bytes sent via SQL*Net to client
        300  bytes received via SQL*Net from client
         11  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        138  rows processed

EUSER@stdv > select distinct billing_day
  2  from a_table;

<b>137 rows selected.

Elapsed: 00:00:23.03</b>

Execution Plan
----------------------------------------------------------

------------------------------------------------------------------------
| Id  | Operation          | Name              | Rows  | Bytes | Cost  |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                   |   137 |  1096 | 11407 |
|   1 |  SORT UNIQUE       |                   |   137 |  1096 | 11407 |
|   2 |   TABLE ACCESS FULL|     A_TABLE       |    14M|   111M|  2437 |
------------------------------------------------------------------------

Note
-----
   - 'PLAN_TABLE' is old version
   - cpu costing is off (consider enabling it)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      63811  consistent gets
      63633  physical reads
       2464  redo size
       1981  bytes sent via SQL*Net to client
        300  bytes received via SQL*Net from client
         11  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        137  rows processed


A_TABLE is a relatively small staging table for a data warehouse which contains < 350,000 rows. Since it's a dev environment and I didn't know what condition the table was in, I reorged it and analyzed it just for kicks. That had no impact.

Any thought on what could be causing something like this? We're on 10.2.0.4, but optimizer is still on 9.2.


Followup   November 2, 2008 - 4pm Central time zone:

you would need to provide a full example

and of course, if you believe you are getting a wrong result - especially since you obviously have support - you would file it with them - right??
3 stars Ignore my last   November 3, 2008 - 9am Central time zone
Reviewer: Dylan from Albany, NY
Figured out what it was.  User error after a long week.



Write a Review
 


All information and materials provided here are provided "as-is"; Oracle disclaims all express and implied warranties, including, the implied warranties of merchantability or fitness for a particular use. Oracle shall not be liable for any damages, including, direct, indirect, incidental, special or consequential damages for loss of profits, revenue, data or data use, incurred by you or any third party in connection with the use of this information or these materials.

About Oracle | Legal Notices and Terms of Use | Privacy Statement