Thanks Tom
Pankaj, January   31, 2005 - 9:38 am UTC
 
 
As always, thanks tom. 
 
 
DISTINCT issues
Mike Angelastro, December  19, 2005 - 2:33 pm UTC
 
 
Isn't using a "DISTINCT" sometimes a sign of a query that hasn't been fully thought out?  Whenever I create a query, I run it with and without a "DISTINCT" and, if there is a difference in the record counts, I try to figure out why. Usually, if the record counts are different, there is something I hadn't considered. 
 
 
December  19, 2005 - 4:04 pm UTC 
 
it *could* mean that.  
it doesn't have to mean that.
 
 
 
 
A reader, January   19, 2006 - 3:36 am UTC
 
 
 
 
Direct Answer
A reader, May       11, 2006 - 8:40 pm UTC
 
 
It is always nice to see an answer backed up with data rather than conjecture. 
 
 
Distinct vs Group By
Victor, October   05, 2006 - 4:13 am UTC
 
 
Hi there. I disagree with the statement that they are the same. 
I had a statement, built of two nested "group by" tables, and it hung... changed to "distinct" and voila! 1.5min execution.
I think distinct *pure thumbsuck* only takes one of each result value, where group by takes all results, and summarises the result into unique values. This is support by the added functionality group by has - HAVING!
ok then
Victor 
 
October   05, 2006 - 8:14 am UTC 
 
they are the same in that the results they return are ....... ta-dah - the same.  They have the same effect.
Just like:
select * from t where x in ( select y from t2 )
is the "same" as:
select t.* from t, (select distinct y from t2) t2 where t.x = t2.y;
they may well perform differently (different plans) but they are "interchangable, the same" 
 
 
 
DISTINCT de-mythified by Prometheus
Duke Ganote, October   05, 2006 - 9:55 am UTC
 
 
Oh, this takes me back-- one of the rule-of-thumb (ROT) myths I remember hearing from crusty DBAs when I started working with Oracle DBMS late last century:  
* "always use GROUP BY instead of DISTINCT because it performs better."  
* "Never use outer joins; make the application do that"
My first purchased book, "Oracle 8i: The Complete Reference", had a scant page on explain plan that illustrated just a few lines of cryptic-looking output...
Then early in this century I noticed a piece in Oracle magazine by a guy who knew how to get performance statistics from the DBMS, interpret the results, and who used the experimental method to test hypotheses.  I was impressed.  And he writes some great books!
Thanks (exponentially), Tom ! 
 
 
David Aldridge, October   05, 2006 - 5:03 pm UTC
 
 
I ran exactly the same test in 10.2 just to confirm that nothing about the HASH GROUP BY changed this, and noticed that the distinct query used HASH UNIQUE, which made me initially believe that both operations were still internally the same.
However the tkprof'd trace file showed the DISTINCT operation using more CPU than the GROUP BY ...
select distinct owner, object_name, object_type 
from
 t
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        3      0.00       0.00          0          0          0           0
Execute      3      0.00       0.00          0          0          0           0
Fetch    10095      1.85       1.78         45       2106          0      151374
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    10101      1.85       1.79         45       2106          0      151374
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 62  
Rows     Row Source Operation
-------  ---------------------------------------------------
  50458  HASH UNIQUE (cr=702 pr=45 pw=45 time=2405972 us)
  50750   TABLE ACCESS FULL T (cr=702 pr=0 pw=0 time=2334900 us)
********************************************************************************
********************************************************************************
select owner, object_name, object_type from t group by owner, object_name,
object_type
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        3      0.00       0.00          0          0          0           0
Execute      3      0.00       0.00          0          0          0           0
Fetch    10095      1.18       1.08          0       2106          0      151374
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    10101      1.19       1.08          0       2106          0      151374
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 62  
Rows     Row Source Operation
-------  ---------------------------------------------------
  50458  HASH GROUP BY (cr=702 pr=0 pw=0 time=624258 us)
  50750   TABLE ACCESS FULL T (cr=702 pr=0 pw=0 time=507831 us)
********************************************************************************
If you can't reproduce at your end Tom then I can send trace files and whatnot. 
 
October   06, 2006 - 8:26 am UTC 
 
I would say those numbers are close enough for me to call them "the same" over time. (look at your elapsed times.... for example)
ops$tkyte%ORA10GR2> create table t as select * from all_objects;
Table created.
ops$tkyte%ORA10GR2> insert /*+ APPEND */ into t select * from t;
50036 rows created.
ops$tkyte%ORA10GR2> commit;
Commit complete.
ops$tkyte%ORA10GR2> insert /*+ APPEND */ into t select * from t;
100072 rows created.
ops$tkyte%ORA10GR2> commit;
Commit complete.
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> begin
  2          for i in 1 .. 1
  3          loop
  4                  for x in (select distinct owner, object_name, object_type from t)
  5                  loop
  6                          null;
  7                  end loop;
  8                  for x in (select owner, object_name, object_type from t group by owner, object_name, object_type)
  9                  loop
 10                          null;
 11                  end loop;
 12          end loop;
 13  end;
 14  /
PL/SQL procedure successfully completed.
ops$tkyte%ORA10GR2> @trace
ops$tkyte%ORA10GR2> alter session set events '10046 trace name context forever, level 12';
Session altered.
ops$tkyte%ORA10GR2> begin
  2          for i in 1 .. 5
  3          loop
  4                  for x in (select distinct owner, object_name, object_type from t)
  5                  loop
  6                          null;
  7                  end loop;
  8                  for x in (select owner, object_name, object_type from t group by owner, object_name, object_type)
  9                  loop
 10                          null;
 11                  end loop;
 12          end loop;
 13  end;
 14  /
PL/SQL procedure successfully completed.
SELECT DISTINCT OWNER, OBJECT_NAME, OBJECT_TYPE FROM T
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          1          0           0
Execute      5      0.00       0.00          0          0          0           0
Fetch     2475      1.54       1.59        165      13805          0      247425
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     2481      1.54       1.59        165      13806          0      247425
Rows     Row Source Operation
-------  ---------------------------------------------------
 247425  HASH UNIQUE (cr=13805 pr=165 pw=165 time=1381040 us)
1000720   TABLE ACCESS FULL T (cr=13805 pr=0 pw=0 time=1000940 us)
********************************************************************************
SELECT OWNER, OBJECT_NAME, OBJECT_TYPE FROM T GROUP BY OWNER, OBJECT_NAME, OBJECT_TYPE
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          1          0           0
Execute      5      0.00       0.00          0          0          0           0
Fetch     2475      1.51       1.86          0      13805          0      247425
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     2481      1.51       1.86          0      13806          0      247425
Rows     Row Source Operation
-------  ---------------------------------------------------
 247425  HASH GROUP BY (cr=13805 pr=0 pw=0 time=1573230 us)
1000720   TABLE ACCESS FULL T (cr=13805 pr=0 pw=0 time=1000937 us)
 
 
 
 
 
Chi H, December  07, 2006 - 8:48 pm UTC
 
 
I couldn't reproduce this, but found some production data that resembled the following:
select distinct id from t1@dblink
where id = 123;
-- returned one record
select id from t1@dblink
where id = 123;
-- returns 3 records
select *
from t2
where id in (select distinct id from t1@dlink where id = 123);
-- returned 3 records
Like I said, I couldn't create a test case to demonstrate this.  Do you have any idea how this could be? 
 
December  08, 2006 - 7:33 am UTC 
 
ops$tkyte%ORA10GR2> create table t1 ( id int );
Table created.
ops$tkyte%ORA10GR2> create table t2 ( id int );
Table created.
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> insert into t1 select 123 from all_users where rownum <= 3;
3 rows created.
ops$tkyte%ORA10GR2> insert into t2 select 123 from all_users where rownum <= 3;
3 rows created.
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select distinct id from t1 where id = 123;
        ID
----------
       123
ops$tkyte%ORA10GR2> select id from t1 where id = 123;
        ID
----------
       123
       123
       123
ops$tkyte%ORA10GR2> select * from t2 where id in (select distinct id from t1 where id = 123);
        ID
----------
       123
       123
       123
<b>the cardinality of a result returned from T2 isn't going to be affected by the cardinality of the subquery against t1 - that is LOSE THE DISTINCT, it is not relevant at all.  
T2 must have three records such that ID=123, that is all - no big mystery really...
</b> 
 
 
 
 
LOSE THE DISTINCT
Matthew, December  08, 2006 - 8:48 am UTC
 
 
Or move it to the outermost SELECT if you just want distinct records 
 
 
Chi H, December  08, 2006 - 2:28 pm UTC
 
 
I had tried this test case before my posting yesterday:
SQL> create table t1 (a number, b varchar2(10));
Table created.
SQL> insert into t1 values (1, 'A');
1 row created.
SQL> create table t2 (a number);
Table created.
SQL> insert into t2 values (1)
  2  ;
1 row created.
SQL> insert into t2 values (1);
1 row created.
SQL> commit;
Commit complete.
SQL> select distinct a from t2;
         A
----------
         1
SQL> select * from t1
  2  where a in (select a from t2);
         A B
---------- ----------
         1 A
SQL> select * from t1
  2  where a in (select distinct a from t2);
         A B
---------- ----------
         1 A
I'm feeling a bit too dense.  I don't understand the difference between our results. 
 
 
December  09, 2006 - 12:41 pm UTC 
 
umm, I selected from t2, not t1 and I had different numbers of rows.
so, not sure where you are going with this. 
 
 
 
DISTINCT vs GROUP BY?
Alejandro Daza, December  09, 2006 - 10:13 am UTC
 
 
Hey David Aldridge, that test you did is not the same, you have to create the index that Tom´s create.
A conclusion of these tests, it´s that you have to evaluate your queries, ONE by ONE, and you have to make proper indexes to faster up operations like hash. Only with time (expertise (30%)) and the way you perform your architect (or your program data IO structure (70%)), you would prefer to make some operation or another.
My conclusions here are:
1. If you have sort unique, hey I think could be more faster or the same as group by unique
2. If you have hash unique consider a index to cenvert it to sort unique
3. If you can´t index it all, do the group by operation,
 BUT REMEMBER TRY ALWAYS TO EVALUATE FIRST
 
 
 
Chi H, December  10, 2006 - 3:06 pm UTC
 
 
Tom,
I just took a closer look and see where we are on different pages.  My fault for not being clear.
As I mentioned, I created the test case before posting the question, so I just copied and pasted.  It just happened that we used the same table names.
My test case shows that t1 has only 1 record and t2 has 2 records.  Doing a select from t1 where id in t2 should only give one record.  In our production environment, this type of scenario returned multiple records (3 in the production system).
I was just wondering if you could explain it.  Your test case is different from the scenario. 
 
December  10, 2006 - 7:41 pm UTC 
 
ok, tell you what - you post the 100% complete, concise, yet 100% here test case - and let us look at it. 
far far far too confusing to piece all of the tiny bits together here. 
 
 
 
Chi H, December  11, 2006 - 3:09 pm UTC
 
 
Tom,
Sorry to have confused you.  As I mentioned, I couldn't create a test case, and was hoping you knew something from the back of your mind about this situation.
I think the next best thing is to ping support to have them take a look.
Thanks for the follow-up. 
 
December  11, 2006 - 7:18 pm UTC 
 
nope, need test case - not following your sequence of events in my head - need to see it STEP by STEP 
 
 
 
Is this a bug?
A reader, January   10, 2007 - 4:46 pm UTC
 
 
SQL> select object_type from dba_objects where owner='SYSTEM' and status='INVALI
D' order by owner;
OBJECT_TYPE
-------------------
PROCEDURE
FUNCTION
PACKAGE BODY
PACKAGE BODY
PROCEDURE
PROCEDURE
PROCEDURE
FUNCTION
PROCEDURE
9 rows selected.
SQL> select distinct object_type from dba_objects where owner='SYSTEM' and statu
s='INVALID' order by owner;
OBJECT_TYPE
-------------------
PROCEDURE
PACKAGE BODY
FUNCTION
SQL> select distinct object_type from dba_objects where owner like 'SYSTEM' and statu
s='INVALID' order by owner
                     *
ERROR at line 1:
ORA-01791: not a SELECTed expression
SQL>
Oracle 10g r2.
Thanks
 
 
please think about all posibilities
deccar, March     26, 2007 - 4:32 am UTC
 
 
fools... what will be happen if you use an aggregation function with a group by clause when you really wanted to use distinct? well I'll tell you, your results will be erroneous, cause the function DOES use all the resulting tuples, not only the ones youre seeing. 
March     26, 2007 - 7:41 am UTC 
 
brilliant.
absolute hogwash as stated
but well done, thanks so much for your "input"
 
 
 
Please be professional !!
Tom Admirer, March     26, 2007 - 2:37 pm UTC
 
 
Tom,
The above review comments do not adhere to the professional standards. People participating here should apply some professional standards and avoid using some coarse or vulgar words...Tom...you should take this seriously and discourage them (as you do it for the people who do the IM talk)...Probably you should consider removing the previous review from deccar. 
Well tom, it finally rests in your discretion.
To all other readers and reviewers, please be professional in here...it is not the site to vent your anger...Please express your ideas in a more elegant/proper way.
 
March     27, 2007 - 9:16 am UTC 
 
moderating is a slippery slope.  where does it end.  
I don't moderate, but I have no problem making fun :)
So, if we either
o laugh at them
o ignore them
which ever you prefer - they lose, we win.
 
 
 
How would I do that?
daniel, May       03, 2007 - 5:02 pm UTC
 
 
create table test_table
(t_id number,
t_name varchar2(10)
);
insert into test_table VALUES (1, 'RAIN')
/
insert into test_table VALUES (2, 'RAIN')
/
insert into test_table VALUES (3, 'RAIN')
/
insert into test_table VALUES (1, 'FOG')
/
insert into test_table VALUES (2, 'FOG')
/
My goal is to select records (all fields) based on the t_name being distinct.  I just need to get 1 record for 'RAIN' AND 1 for 'FOG'. Doesn't  matter which one.  Distinct will look on all the fields and will return 5 rows. 
 
May       04, 2007 - 12:48 pm UTC 
 
 
 
this seems to work
daniel, May       03, 2007 - 5:12 pm UTC
 
 
Would this be correct?
select distinct t_name, first_value(t_id)  over (partition by t_name)
from test_table
 
May       04, 2007 - 12:48 pm UTC 
 
 
 
So am I just getting lucky?
daniel, May       04, 2007 - 3:20 pm UTC
 
 
SQL> select * from test_table;
      T_ID T_NAME
---------- ----------
         1 RAIN
         3 RAIN
         1 FOG
         2 FOG
SQL> select distinct t_name, first_value(t_id) over (partition by t_name)
  2  from test_table 
  3  /
T_NAME     FIRST_VALUE(T_ID)OVER(PARTITIONBYT_NAME)
---------- ----------------------------------------
FOG                                               1
RAIN                                              1                                        1 
May       04, 2007 - 4:06 pm UTC 
 
sorry, did not see the distinct
yes, that will work as well
 
 
 
Distinct on Analytics?
Tom Admirer, May       05, 2007 - 10:06 pm UTC
 
 
Tom,
I do not think that you would appreciate the idea of using "DISTINCT" on analytics function. Do you?
So the better idea would be to use any of the three ways you suggested:
select t_id,t_name from
(select a.*,row_number() over (partition by t_name order by t_id) rn from test_table a )
where rn=1
SCOTT @ MYDB> /
      T_ID T_NAME
---------- ----------
         1 FOG
         1 RAIN
Or is there any workaround to avoid the distinct? 
May       08, 2007 - 10:22 am UTC 
 
you just demonstrated one way.
I pointed you to three ways. 
 
 
Probably no difference
Mark Brady, May       07, 2007 - 10:58 am UTC
 
 
The Analytic function and the Distinct will both cause a sort - I believe. Did you cost both out? don't just guess if distinct is worse, show that it is.
It's true that I usually tell my developers that if you have to add distinct to make the results correct and you don't know why, it's a missed join causing a cartesian. It's also true that if you use an AF to do normal aggregation you'll get back a lot of duplicate rows that you have to eliminate. (There's another post somewhere on this site where TK does show this point and that the Distinct is ok)
Although I think you tried to simplify your example too much and you made it non-sensical.  
May       08, 2007 - 11:07 am UTC 
 
No, the distinct will be in general much worse - the optimizer recognizes top-n quereis with row_number()
T is a copy of all objects:
select *
  from (select t.*, row_number() over (partition by owner order by created) rn
          from t )
 where rn = 1
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          1          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        3      0.20       0.20          3        692          4          30
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        5      0.21       0.21          3        693          4          30
Rows     Row Source Operation
-------  ---------------------------------------------------
     30  VIEW  (cr=692 pr=3 pw=3 time=206306 us)
     68   WINDOW SORT PUSHED RANK (cr=692 pr=3 pw=3 time=206278 us)
  49998    TABLE ACCESS FULL T (cr=692 pr=0 pw=0 time=600043 us)
********************************************************************************
select distinct owner,
first_value( OWNER ) over (partition by owner ),
first_value( OBJECT_NAME ) over (partition by owner ),
first_value( SUBOBJECT_NAME ) over (partition by owner ),
first_value( OBJECT_ID ) over (partition by owner ),
first_value( DATA_OBJECT_ID ) over (partition by owner ),
first_value( OBJECT_TYPE ) over (partition by owner ),
first_value( CREATED ) over (partition by owner ),
first_value( LAST_DDL_TIME ) over (partition by owner ),
first_value( TIMESTAMP ) over (partition by owner ),
first_value( STATUS ) over (partition by owner ),
first_value( TEMPORARY ) over (partition by owner ),
first_value( GENERATED ) over (partition by owner ),
first_value( SECONDARY ) over (partition by owner )
from t
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          1          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        3      0.96       1.93       2713        692         16          30
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        5      0.96       1.94       2713        693         16          30
Rows     Row Source Operation
-------  ---------------------------------------------------
     30  HASH UNIQUE (cr=692 pr=2713 pw=2024 time=1935907 us)
  49998   WINDOW SORT (cr=692 pr=2713 pw=2024 time=2031276 us)
  49998    TABLE ACCESS FULL T (cr=692 pr=0 pw=0 time=350063 us)
 
 
 
orafan, May       09, 2007 - 5:29 pm UTC
 
 
I have a table with receipe and ingredient information.
Pleas help me out to write a query to bring all receipes which has 'ING1' and 'ING2' in it .So in this case the result is receipe1 and receipe2.
create table abc ( receipe varchar2(10) ,ingredient varchar2(10))
/
insert into abc values ('Receipe1', 'ING1')
/
insert into abc values ('Receipe1', 'ING2')
/
insert into abc values ('Receipe1', 'ING3')
/
insert into abc values ('Receipe2', 'ING1')
/
insert into abc values ('Receipe2', 'ING3')
/
insert into abc values ('Receipe3', 'ING1')
/
insert into abc values ('Receipe3', 'ING2')
/
insert into abc values ('Receipe3', 'ING10')
/ 
May       11, 2007 - 10:32 am UTC 
 
smells like homework.
you should be able to come with with at least 5 ways to do this...  go for it, on a scale of 1 to 10 sql-wise, this is a 2 in terms of complexity (think GROUP BY and HAVING). 
 
 
orafan, May       09, 2007 - 10:17 pm UTC
 
 
Forgot to maintain that I am looking for a sql solution without using set operation. 
May       11, 2007 - 10:33 am UTC 
 
"sql solution without using a set operation"
well, that is going to be HARD :)
given that, well, SELECT is a set operation. 
 
 
yaz, May       10, 2007 - 3:47 am UTC
 
 
select receipe from abc
where receipe in (select receipe from abc where ingredient = 'ING1')
and ingredient = 'ING2'; 
 
Analytics Way
A reader, May       11, 2007 - 9:05 pm UTC
 
 
select receipe from (
select receipe,count(case when ingredient = 'ING1' then ingredient end) CNT_ING1,
               count(case when ingredient = 'ING2' then ingredient end) CNT_ING2
from abc group by receipe) where CNT_ING1>=1 and CNT_ING2>=1 order by 1
How about this? 
 
May       14, 2007 - 12:54 pm UTC 
 
that is not analytics, that is aggregation
select recipe
  from table
 where recipe in ( 'ING1', 'ING2' )
having count(distinct ingredient) = 2;
would be a "better" way to use aggregation.... 
 
 
Small Correction
A reader, May       14, 2007 - 4:40 pm UTC
 
 
Tom,
Small correction in your query, Probably this is what you meant...The logic was good.
select receipe from abc where ingredient in ( 'ING1', 'ING2' ) group by receipe having count(distinct ingredient) = 2;
 
May       14, 2007 - 5:30 pm UTC 
 
correct, where ingredient in .... 
 
 
Trick Question.. sort of!
Richard Armstrong-Finnerty, May       16, 2007 - 7:53 am UTC
 
 
The question is "a query to bring all receipes which has 'ING1' and 'ING2' in it .So in this case the result is receipe1 and receipe2"... which is impossible, as receipe2 does not have ING2! The 2 receipes (sic) that do have ING1 & ING2 are receipe1 & receipe3. 
 
i love you tom!
dfxgirl, March     26, 2008 - 12:23 pm UTC
 
 
yeah that works! Tom is so sexy... uh.. i´m wet... 
 
A reader, April     16, 2008 - 11:38 pm UTC
 
 
 
 
Different results with Distinct vs. Group By
Dylan, October   31, 2008 - 6:33 pm UTC
 
 
While doing some performance turning on a procedure, I came across a case where not only does the performance vary between a statement using distinct vs. group by, but I'm also getting different results.  This had me scratching my head a little bit...
EUSER@stdv > set autotrace traceonly
EUSER@stdv > select billing_day
  2  from a_table
  3  group by billing_day;
<b>138 rows selected.
Elapsed: 00:00:00.78</b>
Execution Plan
----------------------------------------------------------
-----------------------------------------------------------------------
| Id  | Operation          | Name             | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                  |   136 |  1088 |   485 |
|   1 |  SORT GROUP BY     |                  |   136 |  1088 |   485 |
|   2 |   TABLE ACCESS FULL|     A_TABLE      |   617K|  4826K|   105 |
-----------------------------------------------------------------------
Note
-----
   - 'PLAN_TABLE' is old version
   - cpu costing is off (consider enabling it)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       2703  consistent gets
       2692  physical reads
        188  redo size
       1913  bytes sent via SQL*Net to client
        300  bytes received via SQL*Net from client
         11  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        138  rows processed
EUSER@stdv > select distinct billing_day
  2  from a_table;
<b>137 rows selected.
Elapsed: 00:00:23.03</b>
Execution Plan
----------------------------------------------------------
------------------------------------------------------------------------
| Id  | Operation          | Name              | Rows  | Bytes | Cost  |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                   |   137 |  1096 | 11407 |
|   1 |  SORT UNIQUE       |                   |   137 |  1096 | 11407 |
|   2 |   TABLE ACCESS FULL|     A_TABLE       |    14M|   111M|  2437 |
------------------------------------------------------------------------
Note
-----
   - 'PLAN_TABLE' is old version
   - cpu costing is off (consider enabling it)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      63811  consistent gets
      63633  physical reads
       2464  redo size
       1981  bytes sent via SQL*Net to client
        300  bytes received via SQL*Net from client
         11  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        137  rows processed
A_TABLE is a relatively small staging table for a data warehouse which contains < 350,000 rows.  Since it's a dev environment and I didn't know what condition the table was in, I reorged it and analyzed it just for kicks.  That had no impact.
Any thought on what could be causing something like this?  We're on 10.2.0.4, but  optimizer is still on 9.2. 
November  02, 2008 - 4:59 pm UTC 
 
you would need to provide a full example
and of course, if you believe you are getting a wrong result - especially since you obviously have support - you would file it with them - right?? 
 
 
Ignore my last
Dylan, November  03, 2008 - 9:42 am UTC
 
 
Figured out what it was.  User error after a long week. 
 
An example where 'distinct' and 'group by' are not the same
Jack Douglas, May       02, 2011 - 5:11 am UTC
 
 
with w as (select round(level/2) as id from dual connect by level < 11)
select distinct id, count(*) over (partition by id) from w;
with w as (select round(level/2) as id from dual connect by level < 11)
select id, count(*) over (partition by id) from w group by id; 
May       04, 2011 - 12:12 pm UTC 
 
yes, true, because analytics are done after the where clause/aggregation takes place...
 
 
 
Question?
chithambaram.p, May       24, 2011 - 11:57 pm UTC
 
 
Hi Tom,
 How can I get distinct value from the table without using DISTINCT and GROUP BY...Now am using query
select distinct col_name from table1
where col_name>= 0 
group by col_name order by col_name
this query is taking long time bcoz the table has 6 crores records..
how can i get good performance plz help me?
 
May       25, 2011 - 11:26 am UTC 
 
if you have an index on col_name, we can index fast full scan that instead of the table - but distinct is going to be what you use.
There are other ways, but they certainly won't be faster.   
 
 
@ chithambaram.p
Sokrates, May       25, 2011 - 11:48 am UTC
 
 
How can I get distinct value from the table without using DISTINCT and GROUP BY...
query
try UNIQUE
 
May       25, 2011 - 2:17 pm UTC 
 
;) good one, I should have thought of that - as "select unique" is the same as "select distinct"
Just like "select" is the same as "select all" 
 
 
not distinct, or unique, or group by but ...
Dave, May       25, 2011 - 10:44 pm UTC
 
 
select stuff from mytab
union
select stuff from mytab where 1 = 0;
(sorry :-)) 
 
How about this one?
Nathan Marston, May       26, 2011 - 9:56 pm UTC
 
 
SELECT stuff
from mytab t1
where not exists
(select null
 from mytab t2
 where t2.rowid < t1.rowid 
 and t2.col1 = t1.col1
 and t2.col2 = t2.col2
 ... repeat for each column in "stuff"
);
The thinking is that wherever you get a "duplicated" row, all the columns will match - but you want one of the duplicate rows (not zero), so the rowid comparison ensures you just get the one with minimum rowid.
I'm not convinced it even works in the face of nullable columns - I'm thinking NULL values will cause it to behave differently from a DISTINCT/GROUP BY.
You could work your way around that using nvl (and a sentinel value for NULL), but ugh... I'd rather not think about that. That SQL's ugly enough as it is :). 
 
May       27, 2011 - 10:31 am UTC 
 
what about nulls indeed.
it won't work with nulls like distinct, unique or group by would. 
 
 
A reader, May       27, 2011 - 2:51 am UTC
 
 
Hi all,
   Thanks for your reply...I tried all this way...
but still its very slow...and the column also has index..
the reason is am using this query in PRO*C... and the query is condition based if it is pl/sql then we can create seperate view for this query...but here there is no way to do...
here is my Execution Plan...plz help me to get performance in this...
                          SQL PLAN REPORT                 
 ------------------------------------------------------------------------------
   SELECT STATEMENT       Cost = 72928              
   2.1 SORT GROUP BY                              
     3.1 INDEX FAST FULL SCAN COL1_IDX3_NEW INDEX                                                                           
      
---------------------------------------------------------------------------------------                                                                                                                                                     
| Id  | Operation               | Name             | Rows  | Bytes |TempSpc| Cost (%CPU)|                           
---------------------------------------------- ------------------------------------------                           
|   0 | SELECT STATEMENT        |                  |     2 |   108 |       |   108K  (4)|                 
|*  1 |  FILTER                 |                  |       |       |       |            |         
|   2 |   SORT GROUP BY         |                  |     2 |   108 |       |   108K  (4)|                           
|*  3 |    HASH JOIN            |                  |    18M|   932M|   425M|   105K  (2)|                           
|*  4 |     INDEX FAST FULL SCAN| COL1_IDX3_NEW    |    13M|   275M|       | 28472   (3)|                           
|*  5 |     INDEX FAST FULL SCAN| COL1_IDX3_ NEW   |    13M|   400M|       | 28472   (3)|                 
----------------------------------------------------------------------------------------         
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter(ROWID=MIN(ROWID))
   3 - access("COL1"="A"."COL1")
   4 - filter("COL1">0)
   5 - filter("COL1">0)
 
Thanks to All..
by
Chithu...
   
May       27, 2011 - 10:40 am UTC 
 
I don't know who you are or what you are talking about "reader"
no query to look at, no real context (you cannot be the one from above since they were obviously using a different query, they were using:
select distinct col_name from table1
where col_name>= 0 
group by col_name order by col_name
 
 
 
Sambhav, May       28, 2011 - 5:55 am UTC
 
 
Hi Tom,
How can we find the sum of each depatment salary without using group by clause.
Regards
Sambhav
 
May       31, 2011 - 9:45 am UTC 
 
please ask questions in ONE and ONLY ONE place.  Look in the other place you asked (and I answered) this same exact question. 
 
 
answer to sambhav
Mani, May       30, 2011 - 6:06 am UTC
 
 
Sambhav, write a plsql procedure where you sum all the salary departmentwise and store it in a temporaray table and do a select from there.
its what is simply re-inventing the wheel :)..any reason as to why group by should not be used??
 
 
@Sambhav
A reader, May       30, 2011 - 8:16 am UTC
 
 
Sambhav,
may be you can use the analytical function:
<code><i>select distinct * from (select  deptno, sum(sal) over(partition by deptno order by deptno)sum_over_dept from emp);</i>
Data:
DeptNo   Sal
10       300
20       200
10       200
Result of query:
DeptNo   Sum_Over_Dept
10       500
20       200 
</code> 
 
 
SORT GROUP BY Vs HASH GROUP BY
Rajeshwaran, Jeyabal, June      09, 2011 - 12:12 pm UTC
 
 
Tom:
If i remember correct, In the previous release of Oracle, we have SORT ORDER BY access path, if a query has Order by clause.
but in Oracle 10g, I don't see that happening
SORT GROUP BY - if Group by & Order by in query 
HASH GROUP BY - if only Group by and Order by is missing 
SORT ORDER BY - if only no group by and only order by clause. 
rajesh@ORA10GR2> set autotrace traceonly explain;
rajesh@ORA10GR2>
rajesh@ORA10GR2> select owner,object_name,count(*)
  2  from big_table
  3  group by owner,object_name
  4  order by owner,object_name;
48441 rows selected.
Elapsed: 00:00:59.88
Execution Plan
----------------------------------------------------------
Plan hash value: 3184991183
----------------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |   437K|    12M|       | 84566   (4)| 00:16:55 |
|   1 |  SORT GROUP BY     |           |   437K|    12M|   505M| 84566   (4)| 00:16:55 |
|   2 |   TABLE ACCESS FULL| BIG_TABLE |    12M|   377M|       | 41518   (2)| 00:08:19 |
----------------------------------------------------------------------------------------
rajesh@ORA10GR2>
rajesh@ORA10GR2> select owner,object_name,count(*)
  2  from big_table
  3  group by owner,object_name;
48441 rows selected.
Elapsed: 00:00:46.22
Execution Plan
----------------------------------------------------------
Plan hash value: 1753714399
----------------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |   437K|    12M|       | 84566   (4)| 00:16:55 |
|   1 |  HASH GROUP BY     |           |   437K|    12M|   505M| 84566   (4)| 00:16:55 |
|   2 |   TABLE ACCESS FULL| BIG_TABLE |    12M|   377M|       | 41518   (2)| 00:08:19 |
----------------------------------------------------------------------------------------
rajesh@ORA10GR2> select owner,object_name,cnt
  2  from (
  3  select owner,object_name,count(*) as cnt
  4  from big_table
  5  group by owner,object_name
  6  ) order by owner,object_name;
48441 rows selected.
Elapsed: 00:01:06.73
Execution Plan
----------------------------------------------------------
Plan hash value: 3184991183
----------------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |   437K|    12M|       | 84566   (4)| 00:16:55 |
|   1 |  SORT GROUP BY     |           |   437K|    12M|   505M| 84566   (4)| 00:16:55 |
|   2 |   TABLE ACCESS FULL| BIG_TABLE |    12M|   377M|       | 41518   (2)| 00:08:19 |
----------------------------------------------------------------------------------------
rajesh@ORA10GR2>
rajesh@ORA10GR2> select owner,object_name
  2  from big_table
  3  order by owner,object_name
  4  /
Elapsed: 00:00:00.00
Execution Plan
----------------------------------------------------------
Plan hash value: 1472477105
----------------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |    12M|   377M|       |   152K  (3)| 00:30:32 |
|   1 |  SORT ORDER BY     |           |    12M|   377M|   977M|   152K  (3)| 00:30:32 |
|   2 |   TABLE ACCESS FULL| BIG_TABLE |    12M|   377M|       | 41518   (2)| 00:08:19 |
----------------------------------------------------------------------------------------
rajesh@ORA10GR2>
Questions1)What does this SORT GROUP BY & HASH GROUP BY is really about? I am checking in new features guide. but i dont find there
 http://download.oracle.com/docs/cd/B19306_01/server.102/b14214/toc.htm  
June      09, 2011 - 12:20 pm UTC 
 
I'm very confused by this question.
You say 
If i remember correct, In the previous release of Oracle, we have SORT ORDER BY access path, if a query has Order by clause. 
but in Oracle 10g, I don't see that happening 
SORT GROUP BY - if Group by & Order by in query 
HASH GROUP BY - if only Group by and Order by is missing 
SORT ORDER BY - if only no group by and only order by clause. 
and then proceeded to post and example that contradicts 100% what you just said.  We see that in queries with order bys, it tends to use a sort order/group by.  
Which is what you just said you didn't see happening?????
and they are not access paths - table access by index rowid, that's an access path.  Sort group by/Hash group by is a step in a plan, it isn't used as an access path to data. 
 
 
Internal Sorts
Snehasish Das, December  14, 2012 - 1:41 am UTC
 
 
Hi Tom,
I read your articles about sort in distinct and group by. Also seen the examples where Hash Group by is used in Grouping operations. 
Can you please let me know which operations in oracle cause Internal sorting, i.e may use SORT_AREA.
One is order by, other can be UNION/Minus.
Regards,
Snehasish Das 
December  17, 2012 - 4:07 pm UTC 
 
Can you please let me know which operations in oracle cause Internal sorting, 
i.e may use SORT_AREA.
too numerous to list, if you see "order" in the plan, then you found one.  most all of the set operations (union, intersect, minus, distinct...) and many functions (like analytics), join operations (like sort merge joins), etc.
and we don't really use sort_area anymore, we use PGA automatic memory management and that ignores sort_area_size and such. 
 
 
Ranjan, May       25, 2013 - 3:21 pm UTC
 
 
Hi Tom,
I found some strange thing , please have a look on this.
SQL> drop table tt;
Table dropped.
SQL> create table tt(a number,b number);
Table created.
SQL> insert into tt select level,level+1 from dual connect by level <6 union all select 1,2 from dua
l;
6 rows created.
SQL> commit;
Commit complete.
SQL> select * from tt;
         A          B
---------- ----------
         1          2
         2          3
         3          4
         4          5
         5          6
         1          2
6 rows selected.
SQL> select distinct a from tt order by b;
select distinct a from tt order by b
                                   *
ERROR at line 1:
ORA-01791: not a SELECTed expression -- Here as we know, order by would work at the end then why this query is throwing error.(but gone if order by column is in select list).
SQL> select distinct a from tt order by a;
         A
----------
         1
         2
         3
         4
         5
SQL> select distinct a from tt where b=2 order by b;
         A
----------
         1
SQL> select distinct a from tt where b in(2) order by b;
         A
----------
         1
SQL> select distinct a from tt where b>2 order by b;
select distinct a from tt where b>2 order by b
                                             *
ERROR at line 1:
ORA-01791: not a SELECTed expression
SQL> select distinct a from tt where b<3 order by b;
select distinct a from tt where b<3 order by b
                                             *
ERROR at line 1:
ORA-01791: not a SELECTed expression
Here why it is throwing error as data concerned b<3
 is equivalent as b=2(coz only less that b data is 2 here).
SQL> select * from tt;
         A          B
---------- ----------
         1          2
         2          3
         3          4
         4          5
         5          6
         1          2
6 rows selected.
SQL> select distinct a from tt where b=b order by b;
select distinct a from tt where b=b order by b
                                             *
ERROR at line 1:
ORA-01791: not a SELECTed expression
--even got error here .
Could you please give your thoughts for this strange behaviour.
thanks as always. 
 
May       29, 2013 - 5:04 pm UTC 
 
where b=2 order by b;
is the same as:
where b=2;
the order by is known to be not necessary and disappears.
same with in.
anytime the optimizer KNOWS there is only one value for B, it knows the order by is not necessary (you are ordering by a constant at that point! just b=2)
where b=b returns ALL rows in this case, the optimizer knows this and throws the error "I cannot sort by B, B is not selected" 
 
 
got few and have doubt on few
Ranjan, May       30, 2013 - 6:41 am UTC
 
 
Hi Tom,
Thanks for your reply.
Yes that is true b=2 or b in (2) order by b will be disappear(coz it has nothing coz b's values will be 2 anyway).
I didnt understand b=b will return all rows,but I think there is a distinct clause so why it will return all rows and why the optimizer will not able to do order by b.
And could you please tell why it is throughing error in case of b<3.
Thanks as always,
Ranjan 
May       30, 2013 - 2:50 pm UTC 
 
you don't understand why "b=b" would return all rows in your case?
You have no nulls.
In ever row, B in that row is equal to B in that row.
So, every row will be returned.  
where b=b would only NOT return a row if B were NULL.
when you have b<3, you have the ability to return b=2, b=1, b=0, b=..........  you could be returning all rows in the table with many different values of B.  Your data might not allow it right this second, but a simple insert could change that.  
IF and ONLY IF the optimizer is able to mathematically prove that B is constant in the result set can it optimize away "b".
 
 
 
cont to my last post
Ranjan, May       30, 2013 - 4:48 pm UTC
 
 
thank you for your nice explanation.
I was thinking how distinct + b=b will return all row because there is one duplicate record in my data :) 1  2 and 1 2 .
your last line actually makes everything clear now.
thanks again.