Thanks Tom
Pankaj, January 31, 2005 - 9:38 am UTC
As always, thanks tom.
DISTINCT issues
Mike Angelastro, December 19, 2005 - 2:33 pm UTC
Isn't using a "DISTINCT" sometimes a sign of a query that hasn't been fully thought out? Whenever I create a query, I run it with and without a "DISTINCT" and, if there is a difference in the record counts, I try to figure out why. Usually, if the record counts are different, there is something I hadn't considered.
December 19, 2005 - 4:04 pm UTC
it *could* mean that.
it doesn't have to mean that.
A reader, January 19, 2006 - 3:36 am UTC
Direct Answer
A reader, May 11, 2006 - 8:40 pm UTC
It is always nice to see an answer backed up with data rather than conjecture.
Distinct vs Group By
Victor, October 05, 2006 - 4:13 am UTC
Hi there. I disagree with the statement that they are the same.
I had a statement, built of two nested "group by" tables, and it hung... changed to "distinct" and voila! 1.5min execution.
I think distinct *pure thumbsuck* only takes one of each result value, where group by takes all results, and summarises the result into unique values. This is support by the added functionality group by has - HAVING!
ok then
Victor
October 05, 2006 - 8:14 am UTC
they are the same in that the results they return are ....... ta-dah - the same. They have the same effect.
Just like:
select * from t where x in ( select y from t2 )
is the "same" as:
select t.* from t, (select distinct y from t2) t2 where t.x = t2.y;
they may well perform differently (different plans) but they are "interchangable, the same"
DISTINCT de-mythified by Prometheus
Duke Ganote, October 05, 2006 - 9:55 am UTC
Oh, this takes me back-- one of the rule-of-thumb (ROT) myths I remember hearing from crusty DBAs when I started working with Oracle DBMS late last century:
* "always use GROUP BY instead of DISTINCT because it performs better."
* "Never use outer joins; make the application do that"
My first purchased book, "Oracle 8i: The Complete Reference", had a scant page on explain plan that illustrated just a few lines of cryptic-looking output...
Then early in this century I noticed a piece in Oracle magazine by a guy who knew how to get performance statistics from the DBMS, interpret the results, and who used the experimental method to test hypotheses. I was impressed. And he writes some great books!
Thanks (exponentially), Tom !
David Aldridge, October 05, 2006 - 5:03 pm UTC
I ran exactly the same test in 10.2 just to confirm that nothing about the HASH GROUP BY changed this, and noticed that the distinct query used HASH UNIQUE, which made me initially believe that both operations were still internally the same.
However the tkprof'd trace file showed the DISTINCT operation using more CPU than the GROUP BY ...
select distinct owner, object_name, object_type
from
t
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 3 0.00 0.00 0 0 0 0
Execute 3 0.00 0.00 0 0 0 0
Fetch 10095 1.85 1.78 45 2106 0 151374
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 10101 1.85 1.79 45 2106 0 151374
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 62
Rows Row Source Operation
------- ---------------------------------------------------
50458 HASH UNIQUE (cr=702 pr=45 pw=45 time=2405972 us)
50750 TABLE ACCESS FULL T (cr=702 pr=0 pw=0 time=2334900 us)
********************************************************************************
********************************************************************************
select owner, object_name, object_type from t group by owner, object_name,
object_type
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 3 0.00 0.00 0 0 0 0
Execute 3 0.00 0.00 0 0 0 0
Fetch 10095 1.18 1.08 0 2106 0 151374
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 10101 1.19 1.08 0 2106 0 151374
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 62
Rows Row Source Operation
------- ---------------------------------------------------
50458 HASH GROUP BY (cr=702 pr=0 pw=0 time=624258 us)
50750 TABLE ACCESS FULL T (cr=702 pr=0 pw=0 time=507831 us)
********************************************************************************
If you can't reproduce at your end Tom then I can send trace files and whatnot.
October 06, 2006 - 8:26 am UTC
I would say those numbers are close enough for me to call them "the same" over time. (look at your elapsed times.... for example)
ops$tkyte%ORA10GR2> create table t as select * from all_objects;
Table created.
ops$tkyte%ORA10GR2> insert /*+ APPEND */ into t select * from t;
50036 rows created.
ops$tkyte%ORA10GR2> commit;
Commit complete.
ops$tkyte%ORA10GR2> insert /*+ APPEND */ into t select * from t;
100072 rows created.
ops$tkyte%ORA10GR2> commit;
Commit complete.
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> begin
2 for i in 1 .. 1
3 loop
4 for x in (select distinct owner, object_name, object_type from t)
5 loop
6 null;
7 end loop;
8 for x in (select owner, object_name, object_type from t group by owner, object_name, object_type)
9 loop
10 null;
11 end loop;
12 end loop;
13 end;
14 /
PL/SQL procedure successfully completed.
ops$tkyte%ORA10GR2> @trace
ops$tkyte%ORA10GR2> alter session set events '10046 trace name context forever, level 12';
Session altered.
ops$tkyte%ORA10GR2> begin
2 for i in 1 .. 5
3 loop
4 for x in (select distinct owner, object_name, object_type from t)
5 loop
6 null;
7 end loop;
8 for x in (select owner, object_name, object_type from t group by owner, object_name, object_type)
9 loop
10 null;
11 end loop;
12 end loop;
13 end;
14 /
PL/SQL procedure successfully completed.
SELECT DISTINCT OWNER, OBJECT_NAME, OBJECT_TYPE FROM T
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 1 0 0
Execute 5 0.00 0.00 0 0 0 0
Fetch 2475 1.54 1.59 165 13805 0 247425
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2481 1.54 1.59 165 13806 0 247425
Rows Row Source Operation
------- ---------------------------------------------------
247425 HASH UNIQUE (cr=13805 pr=165 pw=165 time=1381040 us)
1000720 TABLE ACCESS FULL T (cr=13805 pr=0 pw=0 time=1000940 us)
********************************************************************************
SELECT OWNER, OBJECT_NAME, OBJECT_TYPE FROM T GROUP BY OWNER, OBJECT_NAME, OBJECT_TYPE
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 1 0 0
Execute 5 0.00 0.00 0 0 0 0
Fetch 2475 1.51 1.86 0 13805 0 247425
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2481 1.51 1.86 0 13806 0 247425
Rows Row Source Operation
------- ---------------------------------------------------
247425 HASH GROUP BY (cr=13805 pr=0 pw=0 time=1573230 us)
1000720 TABLE ACCESS FULL T (cr=13805 pr=0 pw=0 time=1000937 us)
Chi H, December 07, 2006 - 8:48 pm UTC
I couldn't reproduce this, but found some production data that resembled the following:
select distinct id from t1@dblink
where id = 123;
-- returned one record
select id from t1@dblink
where id = 123;
-- returns 3 records
select *
from t2
where id in (select distinct id from t1@dlink where id = 123);
-- returned 3 records
Like I said, I couldn't create a test case to demonstrate this. Do you have any idea how this could be?
December 08, 2006 - 7:33 am UTC
ops$tkyte%ORA10GR2> create table t1 ( id int );
Table created.
ops$tkyte%ORA10GR2> create table t2 ( id int );
Table created.
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> insert into t1 select 123 from all_users where rownum <= 3;
3 rows created.
ops$tkyte%ORA10GR2> insert into t2 select 123 from all_users where rownum <= 3;
3 rows created.
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select distinct id from t1 where id = 123;
ID
----------
123
ops$tkyte%ORA10GR2> select id from t1 where id = 123;
ID
----------
123
123
123
ops$tkyte%ORA10GR2> select * from t2 where id in (select distinct id from t1 where id = 123);
ID
----------
123
123
123
<b>the cardinality of a result returned from T2 isn't going to be affected by the cardinality of the subquery against t1 - that is LOSE THE DISTINCT, it is not relevant at all.
T2 must have three records such that ID=123, that is all - no big mystery really...
</b>
LOSE THE DISTINCT
Matthew, December 08, 2006 - 8:48 am UTC
Or move it to the outermost SELECT if you just want distinct records
Chi H, December 08, 2006 - 2:28 pm UTC
I had tried this test case before my posting yesterday:
SQL> create table t1 (a number, b varchar2(10));
Table created.
SQL> insert into t1 values (1, 'A');
1 row created.
SQL> create table t2 (a number);
Table created.
SQL> insert into t2 values (1)
2 ;
1 row created.
SQL> insert into t2 values (1);
1 row created.
SQL> commit;
Commit complete.
SQL> select distinct a from t2;
A
----------
1
SQL> select * from t1
2 where a in (select a from t2);
A B
---------- ----------
1 A
SQL> select * from t1
2 where a in (select distinct a from t2);
A B
---------- ----------
1 A
I'm feeling a bit too dense. I don't understand the difference between our results.
December 09, 2006 - 12:41 pm UTC
umm, I selected from t2, not t1 and I had different numbers of rows.
so, not sure where you are going with this.
DISTINCT vs GROUP BY?
Alejandro Daza, December 09, 2006 - 10:13 am UTC
Hey David Aldridge, that test you did is not the same, you have to create the index that Tom´s create.
A conclusion of these tests, it´s that you have to evaluate your queries, ONE by ONE, and you have to make proper indexes to faster up operations like hash. Only with time (expertise (30%)) and the way you perform your architect (or your program data IO structure (70%)), you would prefer to make some operation or another.
My conclusions here are:
1. If you have sort unique, hey I think could be more faster or the same as group by unique
2. If you have hash unique consider a index to cenvert it to sort unique
3. If you can´t index it all, do the group by operation,
BUT REMEMBER TRY ALWAYS TO EVALUATE FIRST
Chi H, December 10, 2006 - 3:06 pm UTC
Tom,
I just took a closer look and see where we are on different pages. My fault for not being clear.
As I mentioned, I created the test case before posting the question, so I just copied and pasted. It just happened that we used the same table names.
My test case shows that t1 has only 1 record and t2 has 2 records. Doing a select from t1 where id in t2 should only give one record. In our production environment, this type of scenario returned multiple records (3 in the production system).
I was just wondering if you could explain it. Your test case is different from the scenario.
December 10, 2006 - 7:41 pm UTC
ok, tell you what - you post the 100% complete, concise, yet 100% here test case - and let us look at it.
far far far too confusing to piece all of the tiny bits together here.
Chi H, December 11, 2006 - 3:09 pm UTC
Tom,
Sorry to have confused you. As I mentioned, I couldn't create a test case, and was hoping you knew something from the back of your mind about this situation.
I think the next best thing is to ping support to have them take a look.
Thanks for the follow-up.
December 11, 2006 - 7:18 pm UTC
nope, need test case - not following your sequence of events in my head - need to see it STEP by STEP
Is this a bug?
A reader, January 10, 2007 - 4:46 pm UTC
SQL> select object_type from dba_objects where owner='SYSTEM' and status='INVALI
D' order by owner;
OBJECT_TYPE
-------------------
PROCEDURE
FUNCTION
PACKAGE BODY
PACKAGE BODY
PROCEDURE
PROCEDURE
PROCEDURE
FUNCTION
PROCEDURE
9 rows selected.
SQL> select distinct object_type from dba_objects where owner='SYSTEM' and statu
s='INVALID' order by owner;
OBJECT_TYPE
-------------------
PROCEDURE
PACKAGE BODY
FUNCTION
SQL> select distinct object_type from dba_objects where owner like 'SYSTEM' and statu
s='INVALID' order by owner
*
ERROR at line 1:
ORA-01791: not a SELECTed expression
SQL>
Oracle 10g r2.
Thanks
please think about all posibilities
deccar, March 26, 2007 - 4:32 am UTC
fools... what will be happen if you use an aggregation function with a group by clause when you really wanted to use distinct? well I'll tell you, your results will be erroneous, cause the function DOES use all the resulting tuples, not only the ones youre seeing.
March 26, 2007 - 7:41 am UTC
brilliant.
absolute hogwash as stated
but well done, thanks so much for your "input"
Please be professional !!
Tom Admirer, March 26, 2007 - 2:37 pm UTC
Tom,
The above review comments do not adhere to the professional standards. People participating here should apply some professional standards and avoid using some coarse or vulgar words...Tom...you should take this seriously and discourage them (as you do it for the people who do the IM talk)...Probably you should consider removing the previous review from deccar.
Well tom, it finally rests in your discretion.
To all other readers and reviewers, please be professional in here...it is not the site to vent your anger...Please express your ideas in a more elegant/proper way.
March 27, 2007 - 9:16 am UTC
moderating is a slippery slope. where does it end.
I don't moderate, but I have no problem making fun :)
So, if we either
o laugh at them
o ignore them
which ever you prefer - they lose, we win.
How would I do that?
daniel, May 03, 2007 - 5:02 pm UTC
create table test_table
(t_id number,
t_name varchar2(10)
);
insert into test_table VALUES (1, 'RAIN')
/
insert into test_table VALUES (2, 'RAIN')
/
insert into test_table VALUES (3, 'RAIN')
/
insert into test_table VALUES (1, 'FOG')
/
insert into test_table VALUES (2, 'FOG')
/
My goal is to select records (all fields) based on the t_name being distinct. I just need to get 1 record for 'RAIN' AND 1 for 'FOG'. Doesn't matter which one. Distinct will look on all the fields and will return 5 rows.
May 04, 2007 - 12:48 pm UTC
this seems to work
daniel, May 03, 2007 - 5:12 pm UTC
Would this be correct?
select distinct t_name, first_value(t_id) over (partition by t_name)
from test_table
May 04, 2007 - 12:48 pm UTC
So am I just getting lucky?
daniel, May 04, 2007 - 3:20 pm UTC
SQL> select * from test_table;
T_ID T_NAME
---------- ----------
1 RAIN
3 RAIN
1 FOG
2 FOG
SQL> select distinct t_name, first_value(t_id) over (partition by t_name)
2 from test_table
3 /
T_NAME FIRST_VALUE(T_ID)OVER(PARTITIONBYT_NAME)
---------- ----------------------------------------
FOG 1
RAIN 1 1
May 04, 2007 - 4:06 pm UTC
sorry, did not see the distinct
yes, that will work as well
Distinct on Analytics?
Tom Admirer, May 05, 2007 - 10:06 pm UTC
Tom,
I do not think that you would appreciate the idea of using "DISTINCT" on analytics function. Do you?
So the better idea would be to use any of the three ways you suggested:
select t_id,t_name from
(select a.*,row_number() over (partition by t_name order by t_id) rn from test_table a )
where rn=1
SCOTT @ MYDB> /
T_ID T_NAME
---------- ----------
1 FOG
1 RAIN
Or is there any workaround to avoid the distinct?
May 08, 2007 - 10:22 am UTC
you just demonstrated one way.
I pointed you to three ways.
Probably no difference
Mark Brady, May 07, 2007 - 10:58 am UTC
The Analytic function and the Distinct will both cause a sort - I believe. Did you cost both out? don't just guess if distinct is worse, show that it is.
It's true that I usually tell my developers that if you have to add distinct to make the results correct and you don't know why, it's a missed join causing a cartesian. It's also true that if you use an AF to do normal aggregation you'll get back a lot of duplicate rows that you have to eliminate. (There's another post somewhere on this site where TK does show this point and that the Distinct is ok)
Although I think you tried to simplify your example too much and you made it non-sensical.
May 08, 2007 - 11:07 am UTC
No, the distinct will be in general much worse - the optimizer recognizes top-n quereis with row_number()
T is a copy of all objects:
select *
from (select t.*, row_number() over (partition by owner order by created) rn
from t )
where rn = 1
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 1 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 3 0.20 0.20 3 692 4 30
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 5 0.21 0.21 3 693 4 30
Rows Row Source Operation
------- ---------------------------------------------------
30 VIEW (cr=692 pr=3 pw=3 time=206306 us)
68 WINDOW SORT PUSHED RANK (cr=692 pr=3 pw=3 time=206278 us)
49998 TABLE ACCESS FULL T (cr=692 pr=0 pw=0 time=600043 us)
********************************************************************************
select distinct owner,
first_value( OWNER ) over (partition by owner ),
first_value( OBJECT_NAME ) over (partition by owner ),
first_value( SUBOBJECT_NAME ) over (partition by owner ),
first_value( OBJECT_ID ) over (partition by owner ),
first_value( DATA_OBJECT_ID ) over (partition by owner ),
first_value( OBJECT_TYPE ) over (partition by owner ),
first_value( CREATED ) over (partition by owner ),
first_value( LAST_DDL_TIME ) over (partition by owner ),
first_value( TIMESTAMP ) over (partition by owner ),
first_value( STATUS ) over (partition by owner ),
first_value( TEMPORARY ) over (partition by owner ),
first_value( GENERATED ) over (partition by owner ),
first_value( SECONDARY ) over (partition by owner )
from t
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 1 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 3 0.96 1.93 2713 692 16 30
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 5 0.96 1.94 2713 693 16 30
Rows Row Source Operation
------- ---------------------------------------------------
30 HASH UNIQUE (cr=692 pr=2713 pw=2024 time=1935907 us)
49998 WINDOW SORT (cr=692 pr=2713 pw=2024 time=2031276 us)
49998 TABLE ACCESS FULL T (cr=692 pr=0 pw=0 time=350063 us)
orafan, May 09, 2007 - 5:29 pm UTC
I have a table with receipe and ingredient information.
Pleas help me out to write a query to bring all receipes which has 'ING1' and 'ING2' in it .So in this case the result is receipe1 and receipe2.
create table abc ( receipe varchar2(10) ,ingredient varchar2(10))
/
insert into abc values ('Receipe1', 'ING1')
/
insert into abc values ('Receipe1', 'ING2')
/
insert into abc values ('Receipe1', 'ING3')
/
insert into abc values ('Receipe2', 'ING1')
/
insert into abc values ('Receipe2', 'ING3')
/
insert into abc values ('Receipe3', 'ING1')
/
insert into abc values ('Receipe3', 'ING2')
/
insert into abc values ('Receipe3', 'ING10')
/
May 11, 2007 - 10:32 am UTC
smells like homework.
you should be able to come with with at least 5 ways to do this... go for it, on a scale of 1 to 10 sql-wise, this is a 2 in terms of complexity (think GROUP BY and HAVING).
orafan, May 09, 2007 - 10:17 pm UTC
Forgot to maintain that I am looking for a sql solution without using set operation.
May 11, 2007 - 10:33 am UTC
"sql solution without using a set operation"
well, that is going to be HARD :)
given that, well, SELECT is a set operation.
yaz, May 10, 2007 - 3:47 am UTC
select receipe from abc
where receipe in (select receipe from abc where ingredient = 'ING1')
and ingredient = 'ING2';
Analytics Way
A reader, May 11, 2007 - 9:05 pm UTC
select receipe from (
select receipe,count(case when ingredient = 'ING1' then ingredient end) CNT_ING1,
count(case when ingredient = 'ING2' then ingredient end) CNT_ING2
from abc group by receipe) where CNT_ING1>=1 and CNT_ING2>=1 order by 1
How about this?
May 14, 2007 - 12:54 pm UTC
that is not analytics, that is aggregation
select recipe
from table
where recipe in ( 'ING1', 'ING2' )
having count(distinct ingredient) = 2;
would be a "better" way to use aggregation....
Small Correction
A reader, May 14, 2007 - 4:40 pm UTC
Tom,
Small correction in your query, Probably this is what you meant...The logic was good.
select receipe from abc where ingredient in ( 'ING1', 'ING2' ) group by receipe having count(distinct ingredient) = 2;
May 14, 2007 - 5:30 pm UTC
correct, where ingredient in ....
Trick Question.. sort of!
Richard Armstrong-Finnerty, May 16, 2007 - 7:53 am UTC
The question is "a query to bring all receipes which has 'ING1' and 'ING2' in it .So in this case the result is receipe1 and receipe2"... which is impossible, as receipe2 does not have ING2! The 2 receipes (sic) that do have ING1 & ING2 are receipe1 & receipe3.
i love you tom!
dfxgirl, March 26, 2008 - 12:23 pm UTC
yeah that works! Tom is so sexy... uh.. i´m wet...
A reader, April 16, 2008 - 11:38 pm UTC
Different results with Distinct vs. Group By
Dylan, October 31, 2008 - 6:33 pm UTC
While doing some performance turning on a procedure, I came across a case where not only does the performance vary between a statement using distinct vs. group by, but I'm also getting different results. This had me scratching my head a little bit...
EUSER@stdv > set autotrace traceonly
EUSER@stdv > select billing_day
2 from a_table
3 group by billing_day;
<b>138 rows selected.
Elapsed: 00:00:00.78</b>
Execution Plan
----------------------------------------------------------
-----------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
-----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 136 | 1088 | 485 |
| 1 | SORT GROUP BY | | 136 | 1088 | 485 |
| 2 | TABLE ACCESS FULL| A_TABLE | 617K| 4826K| 105 |
-----------------------------------------------------------------------
Note
-----
- 'PLAN_TABLE' is old version
- cpu costing is off (consider enabling it)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
2703 consistent gets
2692 physical reads
188 redo size
1913 bytes sent via SQL*Net to client
300 bytes received via SQL*Net from client
11 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
138 rows processed
EUSER@stdv > select distinct billing_day
2 from a_table;
<b>137 rows selected.
Elapsed: 00:00:23.03</b>
Execution Plan
----------------------------------------------------------
------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 137 | 1096 | 11407 |
| 1 | SORT UNIQUE | | 137 | 1096 | 11407 |
| 2 | TABLE ACCESS FULL| A_TABLE | 14M| 111M| 2437 |
------------------------------------------------------------------------
Note
-----
- 'PLAN_TABLE' is old version
- cpu costing is off (consider enabling it)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
63811 consistent gets
63633 physical reads
2464 redo size
1981 bytes sent via SQL*Net to client
300 bytes received via SQL*Net from client
11 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
137 rows processed
A_TABLE is a relatively small staging table for a data warehouse which contains < 350,000 rows. Since it's a dev environment and I didn't know what condition the table was in, I reorged it and analyzed it just for kicks. That had no impact.
Any thought on what could be causing something like this? We're on 10.2.0.4, but optimizer is still on 9.2.
November 02, 2008 - 4:59 pm UTC
you would need to provide a full example
and of course, if you believe you are getting a wrong result - especially since you obviously have support - you would file it with them - right??
Ignore my last
Dylan, November 03, 2008 - 9:42 am UTC
Figured out what it was. User error after a long week.
An example where 'distinct' and 'group by' are not the same
Jack Douglas, May 02, 2011 - 5:11 am UTC
with w as (select round(level/2) as id from dual connect by level < 11)
select distinct id, count(*) over (partition by id) from w;
with w as (select round(level/2) as id from dual connect by level < 11)
select id, count(*) over (partition by id) from w group by id;
May 04, 2011 - 12:12 pm UTC
yes, true, because analytics are done after the where clause/aggregation takes place...
Question?
chithambaram.p, May 24, 2011 - 11:57 pm UTC
Hi Tom,
How can I get distinct value from the table without using DISTINCT and GROUP BY...Now am using query
select distinct col_name from table1
where col_name>= 0
group by col_name order by col_name
this query is taking long time bcoz the table has 6 crores records..
how can i get good performance plz help me?
May 25, 2011 - 11:26 am UTC
if you have an index on col_name, we can index fast full scan that instead of the table - but distinct is going to be what you use.
There are other ways, but they certainly won't be faster.
@ chithambaram.p
Sokrates, May 25, 2011 - 11:48 am UTC
How can I get distinct value from the table without using DISTINCT and GROUP BY...
query
try UNIQUE
May 25, 2011 - 2:17 pm UTC
;) good one, I should have thought of that - as "select unique" is the same as "select distinct"
Just like "select" is the same as "select all"
not distinct, or unique, or group by but ...
Dave, May 25, 2011 - 10:44 pm UTC
select stuff from mytab
union
select stuff from mytab where 1 = 0;
(sorry :-))
How about this one?
Nathan Marston, May 26, 2011 - 9:56 pm UTC
SELECT stuff
from mytab t1
where not exists
(select null
from mytab t2
where t2.rowid < t1.rowid
and t2.col1 = t1.col1
and t2.col2 = t2.col2
... repeat for each column in "stuff"
);
The thinking is that wherever you get a "duplicated" row, all the columns will match - but you want one of the duplicate rows (not zero), so the rowid comparison ensures you just get the one with minimum rowid.
I'm not convinced it even works in the face of nullable columns - I'm thinking NULL values will cause it to behave differently from a DISTINCT/GROUP BY.
You could work your way around that using nvl (and a sentinel value for NULL), but ugh... I'd rather not think about that. That SQL's ugly enough as it is :).
May 27, 2011 - 10:31 am UTC
what about nulls indeed.
it won't work with nulls like distinct, unique or group by would.
A reader, May 27, 2011 - 2:51 am UTC
Hi all,
Thanks for your reply...I tried all this way...
but still its very slow...and the column also has index..
the reason is am using this query in PRO*C... and the query is condition based if it is pl/sql then we can create seperate view for this query...but here there is no way to do...
here is my Execution Plan...plz help me to get performance in this...
SQL PLAN REPORT
------------------------------------------------------------------------------
SELECT STATEMENT Cost = 72928
2.1 SORT GROUP BY
3.1 INDEX FAST FULL SCAN COL1_IDX3_NEW INDEX
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)|
---------------------------------------------- ------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 108 | | 108K (4)|
|* 1 | FILTER | | | | | |
| 2 | SORT GROUP BY | | 2 | 108 | | 108K (4)|
|* 3 | HASH JOIN | | 18M| 932M| 425M| 105K (2)|
|* 4 | INDEX FAST FULL SCAN| COL1_IDX3_NEW | 13M| 275M| | 28472 (3)|
|* 5 | INDEX FAST FULL SCAN| COL1_IDX3_ NEW | 13M| 400M| | 28472 (3)|
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWID=MIN(ROWID))
3 - access("COL1"="A"."COL1")
4 - filter("COL1">0)
5 - filter("COL1">0)
Thanks to All..
by
Chithu...
May 27, 2011 - 10:40 am UTC
I don't know who you are or what you are talking about "reader"
no query to look at, no real context (you cannot be the one from above since they were obviously using a different query, they were using:
select distinct col_name from table1
where col_name>= 0
group by col_name order by col_name
Sambhav, May 28, 2011 - 5:55 am UTC
Hi Tom,
How can we find the sum of each depatment salary without using group by clause.
Regards
Sambhav
May 31, 2011 - 9:45 am UTC
please ask questions in ONE and ONLY ONE place. Look in the other place you asked (and I answered) this same exact question.
answer to sambhav
Mani, May 30, 2011 - 6:06 am UTC
Sambhav, write a plsql procedure where you sum all the salary departmentwise and store it in a temporaray table and do a select from there.
its what is simply re-inventing the wheel :)..any reason as to why group by should not be used??
@Sambhav
A reader, May 30, 2011 - 8:16 am UTC
Sambhav,
may be you can use the analytical function:
<code><i>select distinct * from (select deptno, sum(sal) over(partition by deptno order by deptno)sum_over_dept from emp);</i>
Data:
DeptNo Sal
10 300
20 200
10 200
Result of query:
DeptNo Sum_Over_Dept
10 500
20 200
</code>
SORT GROUP BY Vs HASH GROUP BY
Rajeshwaran, Jeyabal, June 09, 2011 - 12:12 pm UTC
Tom:
If i remember correct, In the previous release of Oracle, we have SORT ORDER BY access path, if a query has Order by clause.
but in Oracle 10g, I don't see that happening
SORT GROUP BY - if Group by & Order by in query
HASH GROUP BY - if only Group by and Order by is missing
SORT ORDER BY - if only no group by and only order by clause.
rajesh@ORA10GR2> set autotrace traceonly explain;
rajesh@ORA10GR2>
rajesh@ORA10GR2> select owner,object_name,count(*)
2 from big_table
3 group by owner,object_name
4 order by owner,object_name;
48441 rows selected.
Elapsed: 00:00:59.88
Execution Plan
----------------------------------------------------------
Plan hash value: 3184991183
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 437K| 12M| | 84566 (4)| 00:16:55 |
| 1 | SORT GROUP BY | | 437K| 12M| 505M| 84566 (4)| 00:16:55 |
| 2 | TABLE ACCESS FULL| BIG_TABLE | 12M| 377M| | 41518 (2)| 00:08:19 |
----------------------------------------------------------------------------------------
rajesh@ORA10GR2>
rajesh@ORA10GR2> select owner,object_name,count(*)
2 from big_table
3 group by owner,object_name;
48441 rows selected.
Elapsed: 00:00:46.22
Execution Plan
----------------------------------------------------------
Plan hash value: 1753714399
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 437K| 12M| | 84566 (4)| 00:16:55 |
| 1 | HASH GROUP BY | | 437K| 12M| 505M| 84566 (4)| 00:16:55 |
| 2 | TABLE ACCESS FULL| BIG_TABLE | 12M| 377M| | 41518 (2)| 00:08:19 |
----------------------------------------------------------------------------------------
rajesh@ORA10GR2> select owner,object_name,cnt
2 from (
3 select owner,object_name,count(*) as cnt
4 from big_table
5 group by owner,object_name
6 ) order by owner,object_name;
48441 rows selected.
Elapsed: 00:01:06.73
Execution Plan
----------------------------------------------------------
Plan hash value: 3184991183
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 437K| 12M| | 84566 (4)| 00:16:55 |
| 1 | SORT GROUP BY | | 437K| 12M| 505M| 84566 (4)| 00:16:55 |
| 2 | TABLE ACCESS FULL| BIG_TABLE | 12M| 377M| | 41518 (2)| 00:08:19 |
----------------------------------------------------------------------------------------
rajesh@ORA10GR2>
rajesh@ORA10GR2> select owner,object_name
2 from big_table
3 order by owner,object_name
4 /
Elapsed: 00:00:00.00
Execution Plan
----------------------------------------------------------
Plan hash value: 1472477105
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 12M| 377M| | 152K (3)| 00:30:32 |
| 1 | SORT ORDER BY | | 12M| 377M| 977M| 152K (3)| 00:30:32 |
| 2 | TABLE ACCESS FULL| BIG_TABLE | 12M| 377M| | 41518 (2)| 00:08:19 |
----------------------------------------------------------------------------------------
rajesh@ORA10GR2>
Questions1)What does this SORT GROUP BY & HASH GROUP BY is really about? I am checking in new features guide. but i dont find there
http://download.oracle.com/docs/cd/B19306_01/server.102/b14214/toc.htm
June 09, 2011 - 12:20 pm UTC
I'm very confused by this question.
You say
If i remember correct, In the previous release of Oracle, we have SORT ORDER BY access path, if a query has Order by clause.
but in Oracle 10g, I don't see that happening
SORT GROUP BY - if Group by & Order by in query
HASH GROUP BY - if only Group by and Order by is missing
SORT ORDER BY - if only no group by and only order by clause.
and then proceeded to post and example that contradicts 100% what you just said. We see that in queries with order bys, it tends to use a sort order/group by.
Which is what you just said you didn't see happening?????
and they are not access paths - table access by index rowid, that's an access path. Sort group by/Hash group by is a step in a plan, it isn't used as an access path to data.
Internal Sorts
Snehasish Das, December 14, 2012 - 1:41 am UTC
Hi Tom,
I read your articles about sort in distinct and group by. Also seen the examples where Hash Group by is used in Grouping operations.
Can you please let me know which operations in oracle cause Internal sorting, i.e may use SORT_AREA.
One is order by, other can be UNION/Minus.
Regards,
Snehasish Das
December 17, 2012 - 4:07 pm UTC
Can you please let me know which operations in oracle cause Internal sorting,
i.e may use SORT_AREA.
too numerous to list, if you see "order" in the plan, then you found one. most all of the set operations (union, intersect, minus, distinct...) and many functions (like analytics), join operations (like sort merge joins), etc.
and we don't really use sort_area anymore, we use PGA automatic memory management and that ignores sort_area_size and such.
Ranjan, May 25, 2013 - 3:21 pm UTC
Hi Tom,
I found some strange thing , please have a look on this.
SQL> drop table tt;
Table dropped.
SQL> create table tt(a number,b number);
Table created.
SQL> insert into tt select level,level+1 from dual connect by level <6 union all select 1,2 from dua
l;
6 rows created.
SQL> commit;
Commit complete.
SQL> select * from tt;
A B
---------- ----------
1 2
2 3
3 4
4 5
5 6
1 2
6 rows selected.
SQL> select distinct a from tt order by b;
select distinct a from tt order by b
*
ERROR at line 1:
ORA-01791: not a SELECTed expression -- Here as we know, order by would work at the end then why this query is throwing error.(but gone if order by column is in select list).
SQL> select distinct a from tt order by a;
A
----------
1
2
3
4
5
SQL> select distinct a from tt where b=2 order by b;
A
----------
1
SQL> select distinct a from tt where b in(2) order by b;
A
----------
1
SQL> select distinct a from tt where b>2 order by b;
select distinct a from tt where b>2 order by b
*
ERROR at line 1:
ORA-01791: not a SELECTed expression
SQL> select distinct a from tt where b<3 order by b;
select distinct a from tt where b<3 order by b
*
ERROR at line 1:
ORA-01791: not a SELECTed expression
Here why it is throwing error as data concerned b<3
is equivalent as b=2(coz only less that b data is 2 here).
SQL> select * from tt;
A B
---------- ----------
1 2
2 3
3 4
4 5
5 6
1 2
6 rows selected.
SQL> select distinct a from tt where b=b order by b;
select distinct a from tt where b=b order by b
*
ERROR at line 1:
ORA-01791: not a SELECTed expression
--even got error here .
Could you please give your thoughts for this strange behaviour.
thanks as always.
May 29, 2013 - 5:04 pm UTC
where b=2 order by b;
is the same as:
where b=2;
the order by is known to be not necessary and disappears.
same with in.
anytime the optimizer KNOWS there is only one value for B, it knows the order by is not necessary (you are ordering by a constant at that point! just b=2)
where b=b returns ALL rows in this case, the optimizer knows this and throws the error "I cannot sort by B, B is not selected"
got few and have doubt on few
Ranjan, May 30, 2013 - 6:41 am UTC
Hi Tom,
Thanks for your reply.
Yes that is true b=2 or b in (2) order by b will be disappear(coz it has nothing coz b's values will be 2 anyway).
I didnt understand b=b will return all rows,but I think there is a distinct clause so why it will return all rows and why the optimizer will not able to do order by b.
And could you please tell why it is throughing error in case of b<3.
Thanks as always,
Ranjan
May 30, 2013 - 2:50 pm UTC
you don't understand why "b=b" would return all rows in your case?
You have no nulls.
In ever row, B in that row is equal to B in that row.
So, every row will be returned.
where b=b would only NOT return a row if B were NULL.
when you have b<3, you have the ability to return b=2, b=1, b=0, b=.......... you could be returning all rows in the table with many different values of B. Your data might not allow it right this second, but a simple insert could change that.
IF and ONLY IF the optimizer is able to mathematically prove that B is constant in the result set can it optimize away "b".
cont to my last post
Ranjan, May 30, 2013 - 4:48 pm UTC
thank you for your nice explanation.
I was thinking how distinct + b=b will return all row because there is one duplicate record in my data :) 1 2 and 1 2 .
your last line actually makes everything clear now.
thanks again.