Thanks for the question, Indrakala.

Asked: August 31, 2002 - 8:00 pm UTC

Last updated: July 16, 2012 - 3:03 pm UTC

Version: 8.1.7

Viewed 10K+ times! This question is

You Asked

Tom,
Is there any advantage of using primary keys in the GROUP BY clause. Will the query performance improve that way? I believe that it doesnt and all you have to take care is that your sortkey should be as small a value as possible. Is it correct?

regards
ik

and Tom said...

Umm -- the columns in a group by clause are dictated PURELY by the question being asked. It has nothing to do with performance.

Besides, if the primary key is in the group by - the group by is REDUNDANT isn't it. I mean, you'll NEVER have two rows with the same keys, so the group by is not relevant.

Rating

(47 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Just clarifying

ik, August 31, 2002 - 8:32 pm UTC

Thanks Tom. Yes, GROUP BY on Primary Keys will result in only one row, but my case is when multiple tables are joined.
For example :-
I have a table tab1 having primary key on col_pk. col_id is another field which is unique for a given col_pk. There is a child table having multiple values for a given col_pk. So, if i want to select statistics grouped on col_pk or col_id from a child table and i want col_id in the selected list, i would naturally group on col_id. My question is that would there be a difference in performance if the gouping was on the primary key field?

Also, can you please explain how oracle computes group by. Does it happen after all data is fetched? What is the sequence if a query has GROUP BY, Analytic functions and an ORER BY Clause.

regards
ik

September 01, 2002 - 10:08 am UTC

The question isn't relevant (that is my point).

Look -- if you NEED something grouped by X,Y,Z it really just doesn't matter AT ALL if it would be faster to group by A, B does it? It isn't what you need. It is not like "grouping by a primary key is an ALTERNATIVE". When tuning, you are looking for same result alternatives -- this just isn't one of them!

(as an aside, there is perhaps an advantage to getting the first rows back fastest when the group by columns are indexed -- which will be the case in a primary key situation. That does not mean "index all group by columns"!!! Not even a little)

As for the "when/how" -- the answer is "it depends". It depends on the predicate, the table(s) involved, the lack or presence of indices and so on.

Sometimes after all data, sometimes as soon as the first group is known to be done.

GROUPING is logically done first.
Then Analytics.
And finally sorting.

We can see this with this query:

scott@ORA920.US.ORACLE.COM> select deptno, max(sal), lag(deptno) over ( order by deptno )
2 from emp
3 group by deptno
4 order by 2
5 /

DEPTNO MAX(SAL) LAG(DEPTNO)OVER(ORDERBYDEPTNO)
------ ---------- ------------------------------
30 2850 20
20 3000 10
10 5000

scott@ORA920.US.ORACLE.COM>

Now, the analytics can only work on group by columns in this case (when you have an aggregate, analytics can only "see" the aggregated columns, this perhaps shows that even better:
scott@ORA920.US.ORACLE.COM> select deptno,
2 max(sal),
3 lag(deptno) over ( order by deptno ),
4 sum(max(sal)) over ()
5 from emp
6 group by deptno
7 order by 2
8 /

DEPTNO MAX(SAL) LAG(DEPTNO)OVER(ORDERBYDEPTNO) SUM(MAX(SAL))OVER()
------ ---------- ------------------------------ -------------------
30 2850 20 10850
20 3000 10 10850
10 5000 10850

scott@ORA920.US.ORACLE.COM>

we can sum() over() the max(sal) -- meaning the analytic MUST be done after the grouping has already taken place.

The order by is always done dead last.

Just what I wanted to know

Dave, September 16, 2002 - 11:17 am UTC

I have the same kind of situation, where I GROUP BY primary key columns, such as ...

SELECT PK1,PK2,PK3,PK4,SUM(NUM1) NUM1
FROM TABLE_WITH_4PK_COLS
GROUP BY PK1,PK2,PK3,PK4
HAVING SUM(NUM1) > 1000

... and obviously it is non-optimum, as you say. It would be better to ...

SELECT PK1,PK2,PK3,PK4,NUM1
FROM TABLE_WITH_4PK_COLS
WHERE NUM1 > 1000

... but I do not have a choice in this. Like many organizations out there we use a COTS query tool that generates SQL for us, and the tool will not use PK information to change the query appropriatly. I've used a tool that does, but I've also found that it sometimes did so in inappropriate circumstances. I guess that's what can happen when you have multiple applications storing metadata redundantly.

I've been very impressed with the level of "intelligence" shown by the Oracle optimizer when using materialized views and applying query rewrite, and it was a real surprise to find that this optimization is not automatically performed. It would seem to me to be an ideal candidate for a performance-enhancing optimization. The tools I use are also "aggregate-aware", but I strongly/fanatically believe in Oracle's approach to performing these optimizations in the layer that has the most robust metadata on data integrity/relationships/quality -- the database itself.

Just my 2 cents Tom, but perhaps you could pick up the phone and tell the developers to add a couple of lines of code to 9i release 3 to do this ;-). Maybe there's actually a mathematical/logical reason not to do it, who knows?

GROUP BY - a pseudo column (back to SQL 102)

robert, November 14, 2002 - 4:17 pm UTC

Tom, about the SQL below...
on the "decode(emp_status, 'ACTIVE', hiredate, firedate)"
in the GROUP BY...I am not sure this is OK.
Will you pls confirm that Oracle, although
will not accept in the GROUP BY clause the pseudo
column name (l_date), will do the "decode..." evaluation,
"gather" the result-set and do the aggregation on the
l_date column of the result-set. Thanks

SELECT deptno, count(*),
decode(emp_status, 'ACTIVE', hiredate, firedate) l_date
FROM emp
WHERE (
(hiredate BETWEEN '01-Jan-2002' AND '13-Nov-2002' AND emp_status = 'ACTIVE')
OR
(firedate BETWEEN '01-Jan-2002' AND '13-Nov-2002'
AND emp_status = 'GONE'))
GROUP BY
deptno,
decode(emp_status, 'ACTIVE', hiredate, firedate)

November 14, 2002 - 7:50 pm UTC

sorry -- not following you on this one.

If you are asking "will the above work and work correctly", yes -- it will. it will group by the decode.

if you don't like the double decode coding

select deptno, count(*), thedate
from ( select deptno, decode(emp_status, 'ACTIVE', hiredate, firedate) thedate
from emp
where .... )
group by thedate

GROUP BY and performance

Oleksandr Alesinskyy, November 15, 2002 - 11:08 am UTC

Hi Tom,

Y say: "the columns in a group by clause are dictated PURELY by the question being asked. It has nothing to do with performance.", but this isn't true completely.
E.g. we have view that allow us to obtain some data grouped by city, but underlining table is indexed only by countries and we don't want to add more indexes. In this case it is quite reasonable to add to GROUP BY clause of the view coutry, which is redudant but maz drastically improve performance for some queries from this view. I suppose that similar results maz be achieved with DIMENSIONS, but sometimes just addition of 1 column is much simpler solution.

November 15, 2002 - 7:24 pm UTC

If the index was on city, country -- you would have no need to add the extra and most like very confusing group by column.

Quick question

A reader, December 04, 2002 - 8:11 pm UTC

Tom,
If a query with a SELECT DISTINCT and a SELECT.. GROUP BY would return the same data which would you prefer and why. It's a choice between sort unique and sort group by.

For eg:-
SELECT DISTINCT a FROM tab;
or
SELECT a FROM tab GROUP BY a;

regards

December 05, 2002 - 7:05 am UTC

tell me -- do you want distinct rows -- is that your goal? then by all means use distinct.

tell me -- do you want to group by, to perform aggregates? then by all means use group by (as nothing else will do)

Use the one that has more semantic meaning so that 2 years from now after you've moved on -- it is more clear to the person inheriting your code what you meant.

A reader, December 05, 2002 - 10:08 am UTC

Tom,

This is the scenario.

TABLE -tab1
name role

1 A
1 B
1 C
2 A
2 C
3 A
3 D
3 E

The result expected is -
name
1
2
3

Which would be a better option considering that tab1 would have thousands of rows. No aggregation required, just the name column is required.

Both DISTINCT and Group By name gives me same result. Since they are two different operations altogether i wanted to know given such a situation which one would be better?

December 05, 2002 - 11:23 am UTC

the answer is to use distinct.

the optimizer knows what to do in both cases. but only distinct is semantically correct.

Group By Data

Harry, December 06, 2002 - 1:37 am UTC

Hi Tom!
Your information is very good. But i have to get a group by column along with a non-grouped column too.
Like i need to select product_code,dealer_code as grouped columns and distinct shipment code to be compared within the inner loop.
If i use group by all 3, i will get duplicate code for each dealer. I want only distinct product code,dealer code and shipment code. The data is in 3 tables.

Sql> Select distinct product_code,dealer_code,shipment_code from product_master a,dealer_master b,ship_maaster c where a.product_code = b.product_code and b.shipment_code = c.shipment_code and
dealer_code = 'XX';

This query works for a single dealer well. If i need to query for all the dealers then i am not able group.
like

Sql> Select distinct product_code,dealer_code,shipment_code from product_master a,dealer_master b,ship_maaster c where a.product_code = b.product_code and b.shipment_code = c.shipment_code and
dealer_code like '%';

This brings in duplicate entries. Can you give me a solution for this?
Thanks
Harry

December 06, 2002 - 7:08 am UTC

"Like i need to select product_code,dealer_code as grouped columns and distinct
shipment code to be compared within the inner loop."

makes no sense whatsoever to me.

I fail to see what "duplicate" entries you could be getting from the query with distinct.

Tell me this -- if there is more then one shipment_code for each product_code, dealer_code -- which "random" shipment_code do you want???

You'll need to use an example here with data -- not getting your point.

Need help on this one...

A reader, December 06, 2002 - 11:03 am UTC

Tom,
Thanks for earlier clarification-

I have yet another situation on hand.

DESC tab1;

a VARCHAR2(20), --- PK
NAME VARCHAR2(100),
salary NUMBER,
flg VARCHAR2(1),
address1 VARCHAR2(200),
address2 VARCHAR2(200));

The developer has coded as -
SELECT
a,
name,
SUM(salary),
flg,
address1,
address2
FROM
tab1
GROUP BY
a,
name,
flg,
address1,
address2;

This is what i propose.

SELECT
a,
MAX(name) name,
SUM(salary),
MAX(flg) flg,
MAX(address1) address1,
MAX(address2) address2
FROM
tab1
GROUP BY
a;

I have to justify why this piece of code needs to be changed to the second one.
The reasons i gave was less sort memory usage and hence better performance and also easier to understand.

Both queries return same result. But, however i cannot approve the first one. Please help me out on this. Am i doing the right thing here.

regards

December 06, 2002 - 11:21 am UTC

if A is the primary key -- i propose this

SELECT
a,
name,
salary,
flg,
address1,
address2
FROM
tab1

done -- same answer, less work. if you group by primary key -- you get exactly the same thing as if you DO NOT group by

Ooops...sorry

A reader, December 06, 2002 - 1:27 pm UTC

Tom,
That was just an example query. Sorry, i kind of goofed up the whole thing. Here it is again...Assuming that we join tab1 to a child table tab2 which has a whole set of numerics which i need to aggregate on. For every row in tab1 i have numerous records in tab2. So in that, what difference would it make if i group by just the Primary Key alone or if i group by this and a whole lot of other columns?

regards

December 06, 2002 - 3:52 pm UTC

ops$tkyte@ORA817DEV.US.ORACLE.COM> create table t1 as select * from all_objects;

Table created.

ops$tkyte@ORA817DEV.US.ORACLE.COM> create table t2 as select * from all_objects;

Table created.

ops$tkyte@ORA817DEV.US.ORACLE.COM> insert /*+ append */ into t2 select * from t2;

23655 rows created.

ops$tkyte@ORA817DEV.US.ORACLE.COM> commit;

Commit complete.

ops$tkyte@ORA817DEV.US.ORACLE.COM> insert /*+ append */ into t2 select * from t2;

47310 rows created.

ops$tkyte@ORA817DEV.US.ORACLE.COM> commit;

Commit complete.

ops$tkyte@ORA817DEV.US.ORACLE.COM> insert /*+ append */ into t2 select * from t2;

94620 rows created.

ops$tkyte@ORA817DEV.US.ORACLE.COM> commit;

Commit complete.

ops$tkyte@ORA817DEV.US.ORACLE.COM> update t2 set object_name = rownum;

189240 rows updated.

ops$tkyte@ORA817DEV.US.ORACLE.COM> commit;

Commit complete.

ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> alter table t1 add constraint t1_pk primary key(object_id);

Table altered.

ops$tkyte@ORA817DEV.US.ORACLE.COM> alter table t2 add constraint t2_pk primary key(object_id,object_name);

Table altered.

ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze table t1 compute statistics
  2  for table
  3  for all indexes
  4  for all indexed columns;

Table analyzed.

ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze table t2 compute statistics
  2  for table
  3  for all indexes
  4  for all indexed columns;

Table analyzed.

ops$tkyte@ORA817DEV.US.ORACLE.COM> set autotrace traceonly
ops$tkyte@ORA817DEV.US.ORACLE.COM> select a.object_id, a.object_name, a.owner, a.created, sum(b.data_object_id)
  2    from t1 a, t2 b
  3   where a.object_id = b.object_id
  4   group by a.object_id, a.object_name, a.owner, a.created
  5  /

23654 rows selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4467 Card=23654 Bytes=1513856)
   1    0   SORT (GROUP BY) (Cost=4467 Card=23654 Bytes=1513856)
   2    1     MERGE JOIN (Cost=1382 Card=189232 Bytes=12110848)
   3    2       SORT (JOIN) (Cost=203 Card=23654 Bytes=1111738)
   4    3         TABLE ACCESS (FULL) OF 'T1' (Cost=10 Card=23654 Bytes=1111738)
   5    2       SORT (JOIN) (Cost=1179 Card=189240 Bytes=3217080)
   6    5         TABLE ACCESS (FULL) OF 'T2' (Cost=75 Card=189240 Bytes=3217080)




Statistics
----------------------------------------------------------
          0  recursive calls
         49  db block gets
       2934  consistent gets
        977  physical reads
          0  redo size
    1778152  bytes sent via SQL*Net to client
     175361  bytes received via SQL*Net from client
       1578  SQL*Net roundtrips to/from client
          0  sorts (memory)
          3  sorts (disk)
      23654  rows processed

ops$tkyte@ORA817DEV.US.ORACLE.COM> select a.object_id, max(a.object_name), max(a.owner), max(a.created), sum(b.data_object_id)
  2    from t1 a, t2 b
  3   where a.object_id = b.object_id
  4   group by a.object_id
  5  /

23654 rows selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1382 Card=23654 Bytes=1513856)
   1    0   SORT (GROUP BY NOSORT) (Cost=1382 Card=23654 Bytes=1513856)
   2    1     MERGE JOIN (Cost=1382 Card=189232 Bytes=12110848)
   3    2       SORT (JOIN) (Cost=203 Card=23654 Bytes=1111738)
   4    3         TABLE ACCESS (FULL) OF 'T1' (Cost=10 Card=23654 Bytes=1111738)
   5    2       SORT (JOIN) (Cost=1179 Card=189240 Bytes=3217080)
   6    5         TABLE ACCESS (FULL) OF 'T2' (Cost=75 Card=189240 Bytes=3217080)




Statistics
----------------------------------------------------------
          0  recursive calls
         47  db block gets
       2934  consistent gets
        734  physical reads
          0  redo size
    1778173  bytes sent via SQL*Net to client
     175361  bytes received via SQL*Net from client
       1578  SQL*Net roundtrips to/from client
          2  sorts (memory)
          2  sorts (disk)
      23654  rows processed

ops$tkyte@ORA817DEV.US.ORACLE.COM> set autotrace off


<b>that shows they use the same memory but differing amounts of temp in different ways (sorts memory/disk counts are different).

I suggest you stick with the way you have (group by c1, c2, c3, c4, ... ) instead of applying MAX()

reasoning:  it is semantically more meaningful to the developers looking at your query 3 years from now.</b>

Group By Month

Rohit Gulati, May 23, 2003 - 6:22 am UTC

Hi Tom
I have a table containing the invoice number,Date adn other detail. If i use group by clause on month then it gives results for every month. But my problem is that i have to group the month from 8th of current month to the 7th of next month i.e for April months it should show all the invoices from 8th April to 7 may.
I want these results by single sql command.
will you please help me.
thanks & regards

May 23, 2003 - 8:54 am UTC

group by trunc(date_column-7,'mm')

How to do this Group by

Reader, May 23, 2003 - 8:56 am UTC

I am having a problem.I have a SQL which is not giving the desired result.Can you help me.

select
a.storeID,
a.STORE_NM,
SUM(decode(upper(a.MTH),'JAN',a.MTHQ,null)) ||'|'||
SUM(decode(upper(a.MTH),'FEB',a.MTHQ,null)) ||'|'||
SUM(decode(upper(a.MTH),'MAR',a.MTHQ,null))||chr(10)||
SUM(decode(upper(a.MTH),'JAN',a.FCST_AMT,null)) ||'|'||
SUM(decode(upper(a.MTH),'FEB',a.FCST_AMT,null)) ||'|'||
SUM(decode(upper(a.MTH),'MAR',a.FCST_AMT,null))||chr(10)||
SUM(decode(upper(a.MTH),'JAN',a.DSAR_FRD_AMT_PRIOR,null)) ||'|'||
SUM(decode(upper(a.MTH),'FEB',a.DSAR_FRD_AMT_PRIOR,null)) ||'|'||
SUM(decode(upper(a.MTH),'MAR',a.DSAR_FRD_AMT_PRIOR,null)) ||'|'||chr(10)||
SUM(decode(upper(a.MTH),'JAN',a.DSAR_FRD_AMT,null))||'|'||
SUM(decode(upper(a.MTH),'FEB',a.DSAR_FRD_AMT,null))||'|'||
SUM(decode(upper(a.MTH),'MAR',a.DSAR_FRD_AMT,null))||'|'||chr(10)||
SUM(decode(upper(b.CATG_ALT_CD),'ANL',b.ALLOC_AMT,null)) ||'|'||
SUM(decode(upper(b.CATG_ALT_CD),'SKC',b.ALLOC_AMT,null))||'|'||
SUM(decode(upper(b.CATG_ALT_CD),'CLD',b.ALLOC_AMT,null)) ||'|'||
SUM(decode(upper(b.CATG_ALT_CD),'DIA',b.ALLOC_AMT,null)) ||'|'||
SUM(decode(upper(b.CATG_ALT_CD),'OTH',b.ALLOC_AMT,null)) ||'|'||chr(10)||
SUM(decode(upper(b.CATG_ALT_CD),'ANL',b.PLAN_PMT,null)) ||'|'||
SUM(decode(upper(b.CATG_ALT_CD),'SKC',b.PLAN_PMT,null))||'|'||
SUM(decode(upper(b.CATG_ALT_CD),'CLD',b.PLAN_PMT,null)) ||'|'||
SUM(decode(upper(b.CATG_ALT_CD),'DIA',b.PLAN_PMT,null)) ||'|'||
SUM(decode(upper(b.CATG_ALT_CD),'OTH',b.PLAN_PMT,null)) ||'|'||chr(10)||
SUM(decode(upper(b.CATG_ALT_CD),'ANL',b.PRIOR_PMT_AMT,null)) ||'|'||
SUM(decode(upper(b.CATG_ALT_CD),'SKC',b.PRIOR_PMT_AMT,null))||'|'||
SUM(decode(upper(b.CATG_ALT_CD),'CLD',b.PRIOR_PMT_AMT,null)) ||'|'||
SUM(decode(upper(b.CATG_ALT_CD),'DIA',b.PRIOR_PMT_AMT,null)) ||'|'||
SUM(decode(upper(b.CATG_ALT_CD),'OTH',b.PRIOR_PMT_AMT,null)) ||'|'||chr(10)||
SUM(decode(upper(b.CATG_ALT_CD),'ANL',b.PMT_AMT,null)) ||'|'||
SUM(decode(upper(b.CATG_ALT_CD),'SKC',b.PMT_AMT,null))||'|'||
SUM(decode(upper(b.CATG_ALT_CD),'CLD',b.PMT_AMT,null)) ||'|'||
SUM(decode(upper(b.CATG_ALT_CD),'DIA',b.PMT_AMT,null)) ||'|'||
SUM(decode(upper(b.CATG_ALT_CD),'OTH',b.PMT_AMT,null)) ||'|'||
c.brd_alt_ds || '|' ||
to_char(c.event_start_dt,'yyyy-mm-dd') || '|' ||
to_char(c.price_amt,'999.99')
from     cso.VOLUME_QTRLY_TMP a,
cso.TRADE_QTRLY_TMP b,
cso.PROMO_QTRLY_TMP c
where     a.storeID in ( '007300')
and a.storeID =b.storeID
and a.storeID =c.storeID
group by a.storeID,a.STORE_NM,c.BRD_ALT_DS,c.EVENT_START_DT,c.PRICE_AMT


SQL> /

STORE STORE_NM
------ -----------------------------------
SUM(DECODE(UPPER(A.MTH),'JAN',A.MTHQ,NULL))||'|'||SUM(DECODE(UPPER(
--------------------------------------------------------------------------------

EVENT_STAR  PRICE_AMT
---------- ----------

007300 GROCERY STORE
||
510300|894000|1662000
1564097.16|1671413.52|2870744.28|
1020685.2|2299272.48|1888686.96|
3225000|378000|351000||154500|
5015520|102774|25455||120|
8832101.1|3443561.52|1222076.1||21888|
1719421.2|185466.96|47736||115.2|Product1
2003-01-01       3.94

007300 GROCERY STORE
||
510300|894000|1662000
1564097.16|1671413.52|2870744.28|
1020685.2|2299272.48|1888686.96|
3225000|378000|351000||154500|
5015520|102774|25455||120|
8832101.1|3443561.52|1222076.1||21888|
1719421.2|185466.96|47736||115.2|Product2
2003-02-01       3.65

007300 GROCERY STORE
||
510300|894000|1662000
1564097.16|1671413.52|2870744.28|
1020685.2|2299272.48|1888686.96|
3225000|378000|351000||154500|
5015520|102774|25455||120|
8832101.1|3443561.52|1222076.1||21888|
1719421.2|185466.96|47736||115.2|Product3
2003-02-22       6.64

007300 GROCERY STORE
||
510300|894000|1662000
1564097.16|1671413.52|2870744.28|
1020685.2|2299272.48|1888686.96|
3225000|378000|351000||154500|
5015520|102774|25455||120|
8832101.1|3443561.52|1222076.1||21888|
1719421.2|185466.96|47736||115.2|Product3
2003-01-01       9.58

007300 GROCERY STORE
||
510300|894000|1662000
1564097.16|1671413.52|2870744.28|
1020685.2|2299272.48|1888686.96|
3225000|378000|351000||154500|
5015520|102774|25455||120|
8832101.1|3443561.52|1222076.1||21888|
1719421.2|185466.96|47736||115.2|Product4
2003-01-01             12.99


Will like to get this:

007300 GROCERY STORE
||
510300|894000|1662000
1564097.16|1671413.52|2870744.28|
1020685.2|2299272.48|1888686.96|
3225000|378000|351000||154500|
5015520|102774|25455||120|
8832101.1|3443561.52|1222076.1||21888|
1719421.2|185466.96|47736||115.2|Product1|2003-01-01||| 3.94
1719421.2|185466.96|47736||115.2|Product2|2003-02-01||| 3.65
1719421.2|185466.96|47736||115.2|Product3|2003-02-22||| 6.64
1719421.2|185466.96|47736||115.2|Product3|2003-01-01||| 9.58
1719421.2|185466.96|47736||115.2|Product4|2003-01-01|||12.99

May 23, 2003 - 9:39 am UTC

looks like you want to write a report then -- in master/detail style.

Yes

Reader, May 23, 2003 - 10:55 am UTC

Thanks for the prompt reply.Actually this is currently going to be an Adhoc report and maybe finally (with other requirements) will become a standard report that users/management will pull into Excel/Access or any other tool.

May 23, 2003 - 11:48 am UTC

great, so write a report -- not everything is doable in a single query every time without some procedural workings on it.

Distinct

arc, July 31, 2003 - 2:05 am UTC

Hi,
I have a query to retrive 1 Million records.
I need your advise in elminating duplicates.
Which is better either distinct or group by?

Q1)
SELECT DISTINCT
CIF_KEY,
BANK_CODE ,
BRANCH_CODE ,
ACCOUNT_NUMBER,
ISO_CURRENCY_CD,
EFF_TRANSFER_RATE ,
GL_CORP_CD,
GL_ACCOUNT_ID,
PRODUCT_COA_ID,
TP_COA_ID
FROM FEM_COMMERCIAL_LOANS
WHERE AS_OF_DATE = (SELECT AS_OF_DATE FROM TETL_ASOFDATE);

Q2)
SELECT CIF_KEY,
BANK_CODE ,
BRANCH_CODE ,
ACCOUNT_NUMBER,
ISO_CURRENCY_CD,
EFF_TRANSFER_RATE ,
GL_CORP_CD,
GL_ACCOUNT_ID,
PRODUCT_COA_ID,
TP_COA_ID
FROM FEM_COMMERCIAL_LOANS
WHERE AS_OF_DATE = (SELECT AS_OF_DATE FROM TETL_ASOFDATE)
GROUP BY CIF_KEY,
BANK_CODE ,
BRANCH_CODE ,
ACCOUNT_NUMBER,
ISO_CURRENCY_CD,
EFF_TRANSFER_RATE ,
GL_CORP_CD,
GL_ACCOUNT_ID,
PRODUCT_COA_ID,
TP_COA_ID;

July 31, 2003 - 7:10 am UTC

benchmark it.

in either case, it'll only take a couple of seconds/minutes to retrieve both result sets so you can easily test both and see what happens. Just

alter session set timed_statistics=true;
alter session set sql_trace=true;
set arraysize 100
set termout off
run query 1
run query 2
set termout on
exit

run that in plus and then use tkprof

A grouping

A reader, December 05, 2003 - 5:26 am UTC

Hello Tom!
Can you please suggest how to create a query?

For table:
create table t (d1 date, d2 date, v number)
/

And data:
insert into t values (sysdate - 20, sysdate - 15, 1);
insert into t values (sysdate - 15, sysdate - 5, 1);
insert into t values (sysdate - 5, sysdate - 1, 2);
insert into t values (sysdate - 1, sysdate + 15, 1);
insert into t values (sysdate + 15, sysdate + 25, 1);
insert into t values (sysdate + 25, sysdate + 45, 3);
insert into t values (sysdate + 45, sysdate + 55, 3);
insert into t values (sysdate + 55, sysdate + 75, 3);

How to get min and max date inside groups by value of column v, that are continuous (not broken by another value of v).
I.e. from
15.11.2003 20.11.2003 1
20.11.2003 30.11.2003 1
30.11.2003 04.12.2003 2
04.12.2003 20.12.2003 1
20.12.2003 30.12.2003 1
30.12.2003 19.01.2004 3
19.01.2004 29.01.2004 3
29.01.2004 18.02.2004 3

get
15.11.2003 30.11.2003 1
30.11.2003 04.12.2003 2
04.12.2003 30.12.2003 1
30.12.2003 18.02.2004 3

Thank you in advance!

December 05, 2003 - 7:40 am UTC

ops$tkyte@ORA9IR2> select min(d1), max(d2), v
  2    from (
  3  select d1, d2, v, max(rn) over ( order by d1 ) max_rn
  4    from (
  5  select t.*,
  6         case when (lag(v) over (order by d1) <> v)
  7              then row_number() over (order by d1)
  8          end rn
  9    from t
 10         )
 11         )
 12   group by v, max_rn
 13   order by 1
 14  /
 
MIN(D1)   MAX(D2)            V
--------- --------- ----------
15-NOV-03 30-NOV-03          1
30-NOV-03 04-DEC-03          2
04-DEC-03 30-DEC-03          1
30-DEC-03 18-FEB-04          3
 

run the inline views one by one by one -- wrapping the innermost with the next layer and so on to see what I did.

Thank you, Tom!!

A reader, December 05, 2003 - 9:21 am UTC

Interesting; Is there a way to make grouping perform faster

Tom, December 10, 2003 - 12:31 pm UTC

Group by is a great tool. We have a consistent problem with them here in that we construct queries that do the selective part well (i.e. use the expected indexes and return the data quickly) but when we aggregate using GROUP BY, the queries "go away and return hours later." For example, one query took 4.5 seconds to select 350K records, however when we add some AVG functions and GROUP BY the query flys off to neverneverland and returns 2 plus hours later. What is going on and how can we improve the performance?

December 10, 2003 - 4:05 pm UTC

it'll be a function of your SORT_AREA_SIZE if you are using 8i and before -- pga_aggregate_target | sort_area_size in 9i and up

so -- what are you (version)
and what are they (init.oras)

while this does not show "2 hours" (a TKPROF would be really really handy) it does show that sort area size can affect this greatly

big_table@ORA9IR2> alter session set workarea_size_policy = manual;

Session altered.

big_table@ORA9IR2> alter session set sort_area_size = 65536;

Session altered.

big_table@ORA9IR2> set arraysize 100
big_table@ORA9IR2> set timing on
big_table@ORA9IR2> set autotrace traceonly
big_table@ORA9IR2> select * from big_table where rownum <= 350000;

350000 rows selected.

Elapsed: 00:00:04.91

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1379 Card=350000 Bytes=35000000)
1 0 COUNT (STOPKEY)
2 1 TABLE ACCESS (FULL) OF 'BIG_TABLE' (Cost=1379 Card=999620 Bytes=99962000)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
8469 consistent gets
4456 physical reads
0 redo size
24242394 bytes sent via SQL*Net to client
38988 bytes received via SQL*Net from client
3501 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
350000 rows processed

big_table@ORA9IR2> select id, object_name, owner, count(*), avg(data_object_id), min(last_ddl_time), max(created)
2 from (select * from big_table where rownum <= 350000 )
3 group by id, object_name, owner;

350000 rows selected.

Elapsed: 00:00:43.40

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=15844 Card=350000 Bytes=20650000)
1 0 SORT (GROUP BY) (Cost=15844 Card=350000 Bytes=20650000)
2 1 VIEW (Cost=1379 Card=350000 Bytes=20650000)
3 2 COUNT (STOPKEY)
4 3 TABLE ACCESS (FULL) OF 'BIG_TABLE' (Cost=1379 Card=999620 Bytes=58977580)

Statistics
----------------------------------------------------------
0 recursive calls
689 db block gets
5021 consistent gets
22901 physical reads
0 redo size
16534945 bytes sent via SQL*Net to client
38988 bytes received via SQL*Net from client
3501 SQL*Net roundtrips to/from client
0 sorts (memory)
1 sorts (disk)
350000 rows processed

big_table@ORA9IR2> alter session set sort_area_size = 6553600;

Session altered.

Elapsed: 00:00:00.01
big_table@ORA9IR2> select id, object_name, owner, count(*), avg(data_object_id), min(last_ddl_time), max(created)
2 from (select * from big_table where rownum <= 350000 )
3 group by id, object_name, owner;

350000 rows selected.

Elapsed: 00:00:05.05

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4755 Card=350000 Bytes=20650000)
1 0 SORT (GROUP BY) (Cost=4755 Card=350000 Bytes=20650000)
2 1 VIEW (Cost=1379 Card=350000 Bytes=20650000)
3 2 COUNT (STOPKEY)
4 3 TABLE ACCESS (FULL) OF 'BIG_TABLE' (Cost=1379 Card=999620 Bytes=58977580)

Statistics
----------------------------------------------------------
0 recursive calls
6 db block gets
5021 consistent gets
8080 physical reads
0 redo size
16534945 bytes sent via SQL*Net to client
38988 bytes received via SQL*Net from client
3501 SQL*Net roundtrips to/from client
0 sorts (memory)
1 sorts (disk)
350000 rows processed

big_table@ORA9IR2> set autotrace off
big_table@ORA9IR2> set timing off
big_table@ORA9IR2>

Thank you

Tom, December 12, 2003 - 10:49 am UTC

This helps. Thank you.

Group by cluase

Ganesh, February 24, 2004 - 12:11 am UTC

exllent discussion .. very useful .

Is there any limit in oracle 9i , the maximum columns in group by clause.
how many Columns can we give in GROUP BY, ORDER BY

Regds
Ganesh

February 24, 2004 - 6:49 am UTC

</code> http://docs.oracle.com/docs/cd/B10501_01/server.920/a96536/ch44.htm <code>

Group by cluase

Ganesh, February 26, 2004 - 12:24 am UTC

Hi Tom,
I got the relavent information from the link. I am preparing one document on limitaions. Most of the limitation ,I got from the above link.
still i am not able fine the below information

1. Columns referenced by a view
2. Columns per SELECT statement
3. Parameters per stored procedure
4. Nested stored procedure levels
5. Tables per SELECT statement
6. XML support
Tom ! can you please provide me the above. Thanks in advance.

February 26, 2004 - 9:03 am UTC

1) $ oerr ora 1792
01792, 00000, "maximum number of columns in a table or view is 1000"



ops$tkyte@ORA920PC> declare
  2          l_stmt long := 'create or replace view v as select 1 c1';
  3  begin
  4          for i in 2 .. 2000
  5          loop
  6                  l_stmt := l_stmt || ',1 c'||i;
  7          end loop;
  8          l_stmt := l_stmt || ' from dual';
  9          execute immediate l_stmt;
 10  end;
 11  /
declare
*
ERROR at line 1:
ORA-01792: maximum number of columns in a table or view is 1000
ORA-06512: at line 9


2) unlimited in theory, limited by practical limits
3) same as #2
4) same as #2
5) same as #2
6) unlimited?  not sure what you mean



ops$tkyte@ORA920PC> variable x refcursor
ops$tkyte@ORA920PC> declare
  2          l_stmt long := 'select 1 c1';
  3  begin
  4          for i in 2 .. 2000
  5          loop
  6                  l_stmt := l_stmt || ',1 c'||i;
  7          end loop;
  8          l_stmt := l_stmt || ' from dual';
  9          open :x for l_stmt;
 10  end;
 11  /
 
PL/SQL procedure successfully completed.
 
 
ops$tkyte@ORA920PC> print x

        C1         C2         C3         C4         C5         C6         C7         C8         C9        C10        C11       C12
---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
...............

Distinct is taking long time

ZS, April 14, 2004 - 10:54 am UTC

Tom,
We have a query that has a DISTINCT and ORDER BY. The query takes more than 30 minutes to run. When we removed the DISTINCT it runs for less than 1 minute. We are running Oracle9i.
Can you please explain why this happens and what are the steps involved with DISTINCT.
Thanks,
ZS

April 14, 2004 - 10:59 am UTC

I'll guess, are you running this in toad or something.

SQL

ZS, April 14, 2004 - 11:06 am UTC

We ran it in from SQL Plus and Discoverer.

April 14, 2004 - 11:34 am UTC

ok, set autotrace traceonly explain and run both queries, lets see the plans

(and you did retrieve the ENTIRE result set in plus right -- you are not just timing the time to get the first rows back correct?)

I know!!

ZS, April 15, 2004 - 7:58 pm UTC

oh..I think I know why now. we checked the time to get the first rows back(in discoverer). when we run it in sqlplus they are almost the same (about 20 seconds more for distinct).
Can you please explain what are the steps that oracle do with distinct vs. order by and group by. We need to explain them to Discoverer users.
Thanks,
ZS.

April 15, 2004 - 8:49 pm UTC

distinct -- think about it. in order to DISTINCT a set of rows - you need to get *every row*. And then you can distinct. (in general, an index on every column being distincted could change that)

in order to show an ordered set -- maybe we just need to read data via an index. you could get the first row hours/days before the last row. (group by, same thing, you group by some indexed columns -- an index can be used to rapidly return the first group)

distinct and use of PK

dd, May 18, 2004 - 6:06 pm UTC

Tom, suppose I did a distinct on the some of the columns of a multi column primary key. Would the query make use of the primary key index somehow to do the sort?

select distinct pk1, pk2, pk3 from table1;

PK of table1 is pk1+pk2+pk3+pk4+pk5

What about if I do the same using a Group BY?

May 19, 2004 - 7:39 am UTC

for the distinct it would probably use the index as a skinnier version of the table, fast full scan and return the distincted values.  for a group by - it would either do that as well, or it might well use the index in an index range scan (single block io) to return first rows fastest.

different data might have different results.


ops$tkyte@ORA9IR2> create table t ( a int, b int, c int, d int, e int, f char(80),
  2  primary key(a,b,c,d,e) );
 
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec dbms_stats.set_table_stats(user,'T',numrows=>1000000,numblks=>1000000);
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> select distinct a,b,c from t;
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=6865 Card=1000000 Bytes=39000000)
   1    0   SORT (UNIQUE) (Cost=6865 Card=1000000 Bytes=39000000)
   2    1     INDEX (FAST FULL SCAN) OF 'SYS_C005303' (UNIQUE) (Cost=4 Card=1000000 Bytes=39000000)
 
 
 
ops$tkyte@ORA9IR2> select /*+ FIRST_ROWS */ distinct a,b,c from t;
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=HINT: FIRST_ROWS (Cost=6865 Card=1000000 Bytes=39000000)
   1    0   SORT (UNIQUE) (Cost=6865 Card=1000000 Bytes=39000000)
   2    1     INDEX (FAST FULL SCAN) OF 'SYS_C005303' (UNIQUE) (Cost=4 Card=1000000 Bytes=39000000)
 
 
 
ops$tkyte@ORA9IR2> select a,b,c from t group by a,b,c;
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=26 Card=1000000 Bytes=39000000)
   1    0   SORT (GROUP BY NOSORT) (Cost=26 Card=1000000 Bytes=39000000)
   2    1     INDEX (FULL SCAN) OF 'SYS_C005303' (UNIQUE) (Cost=26 Card=1000000 Bytes=39000000)
 
 
 
ops$tkyte@ORA9IR2> select /*+ FIRST_ROWS */ a,b,c from t group by a,b,c;
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=HINT: FIRST_ROWS (Cost=26 Card=1000000 Bytes=39000000)
   1    0   SORT (GROUP BY NOSORT) (Cost=26 Card=1000000 Bytes=39000000)
   2    1     INDEX (FULL SCAN) OF 'SYS_C005303' (UNIQUE) (Cost=26 Card=1000000 Bytes=39000000)
 
 
 
ops$tkyte@ORA9IR2> set autotrace off
ops$tkyte@ORA9IR2>

distinct and returns other columns ( Column wise distinct and not row wise )

Wor, July 12, 2004 - 2:57 pm UTC

Hi Tom,
I want do to something like Column wise distinct and not row wise in a single select statement.

I have a table that i am doing distinct on the employee number , but I want to return all the other columns, how can i do this in the select statement?

select distinct(empno) from employee.
select * from employee

( basically one employee can have multiple rows with unique data except the empno )

It's like I want to merge the 2 statements together....
I want the distinct records ( only on empno ) but I want all columns returned. Any one record e.g. asc or desc if multiple rows exists for that empno.

Is this possible ?

Thanks

July 12, 2004 - 11:25 pm UTC

if an employess can have >1 row with "unique data"

which "row" would be returned in your world?

you can:

select *
from (select emp.*,
row_number() over (partition by empno order by <something>) rn
from emp )
where rn = 1;

use something to decide "which row is first for an empno")

Complilation error

Wor, July 13, 2004 - 12:19 pm UTC

Hi Tom,
I could sucessfully run the SQL direclty but when I put that SQL inside a Package/Stored Procedure I get compilation error
like the following below.

The complilation error comes on line
"row_number() over (partition "
at the "(" before the word "partition".

I am using 8.1.7 on Win2k

I am really very confused , how it works directly in the query but same thing I put in the package it gives compilation error.

Please Help .

(1): PLS-00103: Encountered the symbol "(" when expecting one of the following:
(1):
(1): , from

July 13, 2004 - 12:32 pm UTC

</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:3027089372477 <code>

same issue.

Group by Month starting from 8th...

Kamran, July 13, 2004 - 11:42 pm UTC

group by trunc(date_column-7,'mm')
it is simple, cool, lovely technique. It gave idea to resolve
many other cases (Excellent)

Group By clause

A reader, July 26, 2004 - 5:41 am UTC

Hi Tom,
I have the following situation

Rollnumber subject
1 Maths
2 Science
3 Physics
4 Maths
5 Physics
6 Biology
7 Science

Let us say i have the above data in a table...
I want the following information from a single query (ordered by on subject)
I want only those subjects where there are more than 1 rollnumber in it.
The result should be
1 Maths
4 Maths
2 Science
7 Science
3 Physics
5 Physics

July 26, 2004 - 7:31 am UTC

select *
from ( select rollnumber, subject, count(*) over (partition by subject) cnt
from t )
where cnt > 1

that is just psuedo code -- as you didn't give me a create table, or insert into statements. the crux of the answer is there, but the sql is untested....

scott@ORA9IR2> select empno, job from emp order by job;

EMPNO JOB
---------- ---------
7788 ANALYST
7902 ANALYST
7369 CLERK
7876 CLERK
7934 CLERK
7900 CLERK
7566 MANAGER
7782 MANAGER
7698 MANAGER
7839 PRESIDENT
7499 SALESMAN
7654 SALESMAN
7844 SALESMAN
7521 SALESMAN

14 rows selected.

scott@ORA9IR2> select *
2 from ( select empno, job, count(*) over (partition by job) cnt
3 from emp )
4 where cnt > 1;

EMPNO JOB CNT
---------- --------- ----------
7788 ANALYST 2
7902 ANALYST 2
7369 CLERK 4
7876 CLERK 4
7934 CLERK 4
7900 CLERK 4
7566 MANAGER 3
7782 MANAGER 3
7698 MANAGER 3
7499 SALESMAN 4
7654 SALESMAN 4
7844 SALESMAN 4
7521 SALESMAN 4

13 rows selected.

Group By output where grouped expression only appears once per grouping

Ian Murphy, July 29, 2004 - 2:18 pm UTC

Hi Tom,

If my title is misleading, I apologise.  Basically, I have a table:

SQL> select * from probcode;

CATG_CODE    PROB_CODE    DESCRIPTION
---------    ----------    ---------------------
ACCOUNT        LOCKOUT        Account locked out
ACCOUNT        PASSWORD    Password reset
ACCOUNT        SETUP        Account creation
HW ISSUE    MONITOR        Problem with monitor
HW ISSUE    KEYBOARD    problem with keyboard

but would like it to appear thusly:

CATG_CODE    PROB_CODE    DESCRIPTION
---------    ----------    -----------------------
ACCOUNT        LOCKOUT        Account locked out
        PASSWORD    Password reset
        SETUP        Account creation
HW ISSUE    MONITOR        Problem with monitor
        KEYBOARD    problem with keyboard

I don't want you to tell me how but I'm really struggling on exactly what to search for.  Any pointers you could provide would be greatly appreciated and once I find a solution (assuming I do!) I'll post it back.

Cheers,

IM

July 29, 2004 - 2:23 pm UTC

scott@ORA9IR2> break on deptno
scott@ORA9IR2> select deptno, ename from emp order by deptno;

DEPTNO ENAME
---------- ----------
10 CLARK
KING
MILLER
20 SMITH
ADAMS
FORD
SCOTT
JONES
30 ALLEN
BLAKE
MARTIN
JAMES
TURNER
WARD

14 rows selected.

:-O You are a legend!

Ian Murphy, July 29, 2004 - 2:27 pm UTC

Tom,

Not only was that incredibly quick, but it is spot on.

Thanks for your time.

IM

for the query above

A reader, July 29, 2004 - 3:39 pm UTC

Tom
is not "break" sqlplus specific feature - in which
case it may not work outside sqlplus?

July 29, 2004 - 4:10 pm UTC

yes, break is a sqlplus reporting format.

you could use row_number() and decode if you really wanted, but the tool that prints this out would be "best able to surpress" this

1 select decode(row_number() over (partition by deptno order by ename),1,deptno) deptno_fmt, ename
2 from emp
3* order by deptno, ename
scott@ORA9IR2> /

DEPTNO_FMT ENAME
---------- ----------
10 CLARK
KING
MILLER
20 ADAMS
FORD
JONES
SCOTT
SMITH
30 ALLEN
BLAKE
JAMES
MARTIN
TURNER
WARD

14 rows selected.

GROUP BY and DISTINCT

Asim Naveed, March 16, 2005 - 10:28 am UTC

Can there be a situation where DISTINCT and GROUP BY
are both used in one SELECT query

e.g

SELECT DISTINCT ............
FROM TABLEA
GROUP BY ............

Am I right on saying
"There is no situation in which DISTINCT and GROUP BY
are togather needed in one query, BUT oracle will not
give you any syntax error if you use DISTINCT and GROUP
BY togather in one query"

Thanks

March 17, 2005 - 6:57 am UTC

select distinct ... group by ....
and
select .... group by .....

can return different results


ops$tkyte@ORA9IR2> create table t ( x int, y int, z int );
Table created.
 
ops$tkyte@ORA9IR2> insert into t values ( 1, 1, 1 );
1 row created.
 
ops$tkyte@ORA9IR2> insert into t values ( 1, 2, 1 );
1 row created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select x, sum(z)
  2    from t
  3   group by x, y
  4  /
 
         X     SUM(Z)
---------- ----------
         1          1
         1          1
 
ops$tkyte@ORA9IR2> select distinct x, sum(z)
  2    from t
  3   group by x, y
  4  /
 
         X     SUM(Z)
---------- ----------
         1          1
 
ops$tkyte@ORA9IR2>

Having Clause Problem

Michaeel Curtis, March 18, 2005 - 8:43 pm UTC

SELECT ptcd.region, ptcd.phone, COUNT (*) calls,
INITCAP ( addr1
|| ' '
|| addr2
|| ' '
|| addr3
|| ' '
|| addr4
|| ' '
|| addr5
) address,
SUM (dur (DURATION)) "Unit Consumed"
FROM glaxyd.ptcd, ptc
WHERE glaxyd.ptcd.phone = ptc.phone
AND dialno LIKE '%0300%'
AND TYPE || nature IN ('NA', 'NL')
AND cdate BETWEEN '041201' AND '041231'
GROUP BY ptcd.region, ptcd.phone, addr1, addr2, addr3, addr4, addr5
HAVING SUM (dur (DURATION)) > 1000
/

Hi there, The above Query is correct but I want to include some other columns, What I want is to include TYPE || nature in the select list. But the problem is then I must include this in group by clause and so results break up into over type||nature. Now the having clause is failed here as then no SUM (dur (DURATION)) > 1000 evaluate to true. Please can you guide me how to break up records for TYPE||NATURE but with having clasue SUM (dur (DURATION)) > 1000

March 19, 2005 - 6:48 am UTC

of course it does -- think about it.

for a given:

GROUP BY ptcd.region, ptcd.phone, addr1, addr2, addr3, addr4, addr5

you have two type||natures:

AND TYPE || nature IN ('NA', 'NL')

So, which one do you want? Either of them? if so, just take the max(type||nature), if you say "don't know which one I want", then neither do I.

if you want "both", if they exist,

max( decode( type||nature, 'NA', 'NA' ) ) type_nature_1,
max( decode( type||nature, 'NL', 'NL' ) ) type_nature_2,

if there is an NA, the _1 would be filled it for that grouping, if there is an NL -- _2 would be. If one or the other is not there for a grouping, it would show "null"

DISTINCT / GROUP BY with ORDER BY clause

Roy, March 16, 2007 - 4:04 am UTC

Hi Tom,

We knew that any column (apart from an aggregate) in the SELECT list should appear in the GROUP BY clause.

But, we didn't know that any column that is there in the ORDER BY clause but not in the SELECT list should also be there in the GROUP BY clause of a query. We guess, nothing like this is mentioned in the documentation.

Similarly, anything not in the SELECT DISTINCT list cannot appear in the ORDER BY clause too. We are not sure why is it so!

SQL*Plus: Release 9.0.1.0.1 - Production on Fri Mar 16 12:37:42 2007

(c) Copyright 2001 Oracle Corporation. All rights reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.1.0.2.0 - Production
With the Partitioning, OLAP and Data Mining options

SQL> create table t_videos (title_id number primary key, title varchar2(20) not null, rented_date da
te not null);

Table created.

SQL> insert into t_videos values (1, 'TITANIC', sysdate - 10);

1 row created.

SQL> insert into t_videos values (2, 'TERMINATOR-II', sysdate - 12);

1 row created.

SQL> insert into t_videos values (3, 'MEN IN BLACK', sysdate - 12);

1 row created.

SQL> insert into t_videos values (4, 'TITANIC', sysdate - 10);

1 row created.

SQL> insert into t_videos values (5, 'MEN IN BLACK', sysdate - 12);

1 row created.

SQL> select * from t_videos order by rented_date desc;

TITLE_ID TITLE RENTED_DA
---------- -------------------- ---------
4 TITANIC 06-MAR-07
1 TITANIC 06-MAR-07
5 MEN IN BLACK 04-MAR-07
3 MEN IN BLACK 04-MAR-07
2 TERMINATOR-II 04-MAR-07

SQL> select distinct title from t_videos order by rented_date desc;
select distinct title from t_videos order by rented_date desc
*
ERROR at line 1:
ORA-01791: not a SELECTed expression

SQL> select distinct title from (select * from t_videos order by rented_date desc);

TITLE
--------------------
MEN IN BLACK
TERMINATOR-II
TITANIC

SQL> select title from t_videos group by title order by rented_date desc;
select title from t_videos group by title order by rented_date desc
*
ERROR at line 1:
ORA-00979: not a GROUP BY expression

SQL> select title from (select * from t_videos order by rented_date desc) group by title;

TITLE
--------------------
MEN IN BLACK
TERMINATOR-II
TITANIC

As you can see above, in case of both DISTINCT and GROUP BY queries (that worked), we did not get the result set in the desired order, since the DISTINCT or GROUP BY in the outer query involves SORTING and that destroys the ORDER within the subquery. So, we tried this and it worked as per our desire:

SQL> select title from (select title, row_number() over (partition by title order by rented_date des
c) rno from t_videos order by rented_date desc) where rno = 1;

TITLE
--------------------
TITANIC
MEN IN BLACK
TERMINATOR-II

Do you think the above solution is fine? Can you please probably suggest of a better solution for this if ours is not fine or costly?

Thanks.

March 17, 2007 - 2:35 pm UTC

... But, we didn't know that any column that is there in the ORDER BY clause but not in the SELECT list should also be there in the GROUP BY clause of a query. We guess, nothing like this is mentioned in the documentation. ....

it is not obvious?

If

a) the column to be order by is not an aggregate function
b) grouped by

HOW could it be in the result set to be ordered by????!?!?!?!

think about this please.

You have columns A, B, C

You

select, a, sum(b) from t group by a;

HOW can you possibly order that result set by C??!?!?!?!?! it makes no sense.

A     B       C
----  -----   ------
1     0       1
1     0       2
2     0       2
2     0       3

Ok, using that set of data, grouped by A, sum of B, HOW DO YOU 'ORDER' by C??!?

DISTINCT / GROUP BY with ORDER BY

Roy, March 19, 2007 - 11:58 pm UTC

Hi Tom,

Yes, it is clear now why the queries were not working. Since the ORDER BY is the last operation in the queries, the columns by which we need to sort the data should be present in the result set, unless it is not just a simple SELECT.

Could you please confirm if the solution using the analytic function is fine? I mean, by nesting the ORDER BY query within the GROUP BY / DISTINCT operations, we didn't get the correct result always, since there is an additional sort (on another column) involved. However, with the solution using analytics, can we guarantee that the result will always be in the desired order?

Thanks.

March 20, 2007 - 7:49 am UTC

and what is the "desired order", give us a "for example"

Example

Roy, March 20, 2007 - 8:32 am UTC

Hi Tom,

I quote the example once again here:

SQL> select * from t_videos order by rented_date desc;

TITLE_ID TITLE RENTED_DA
---------- -------------------- ---------
4 TITANIC 06-MAR-07
1 TITANIC 06-MAR-07
5 MEN IN BLACK 04-MAR-07
3 MEN IN BLACK 04-MAR-07
2 TERMINATOR-II 04-MAR-07

SQL> select distinct title from t_videos order by rented_date desc;
select distinct title from t_videos order by rented_date desc
*
ERROR at line 1:
ORA-01791: not a SELECTed expression

SQL> select distinct title from (select * from t_videos order by rented_date desc);

TITLE
--------------------
MEN IN BLACK
TERMINATOR-II
TITANIC

SQL> select title from t_videos group by title order by rented_date desc;
select title from t_videos group by title order by rented_date desc
*
ERROR at line 1:
ORA-00979: not a GROUP BY expression

SQL> select title from (select * from t_videos order by rented_date desc) group by title;

TITLE
--------------------
MEN IN BLACK
TERMINATOR-II
TITANIC

As you can see above, in case of both DISTINCT and GROUP BY queries (that worked), we did not get the result set in the desired order, since the DISTINCT or GROUP BY in the outer query involves SORTING and that destroys the ORDER within the subquery. So, we tried this and it worked as per our desire:

SQL> select title from (select title, row_number() over (partition by title order by rented_date des
c) rno from t_videos order by rented_date desc) where rno = 1;

TITLE
--------------------
TITANIC
MEN IN BLACK
TERMINATOR-II

So, like with GROUP BY, we cannot guarantee that the result set will be sorted (one of interesting discussions here on asktom I went through), with the above also, can we guarantee that the result will be in the desired order always?

Also, could you please let us know if the above query will generally be costly or not? As we can see, in the subquery:

select title, row_number() over (partition by title order by rented_date des
c) rno from t_videos order by rented_date desc

we are doing the ORDER BY twice, once within the OVER() clause and once outside. Why is the outer ORDER BY necessary, if while PARTITIONing itself, we are ORDERing the result?

Thanks.

March 20, 2007 - 8:46 am UTC

ok, tell me, what is the sorted order of:

SQL> select distinct title from t_videos order by rented_date desc;

if the input data is:

TITLE_ID TITLE RENTED_DA
---------- -------------------- ---------
4 TITANIC 07-MAR-07
1 TITANIC 06-MAR-07
5 MEN IN BLACK 07-MAR-07
3 MEN IN BLACK 06-MAR-07

Ok, tell me - which comes first, and why.

so, maybe you really mean something like:

select title, MAX(rented_date)
  from t
 group by title
 order by max(rented_date);

you need to - well - sort of identify the DATE you want to sort with.

you need to

a) build a result set that has all of the information - including the date
b) order by that date.

DISTINCT / GROUP BY with ORDER BY

Roy, March 20, 2007 - 10:14 am UTC

Well,

This was just an example that I took to explain you the real problem scenario. Ours is insurance related.

Basically, we have a table, say, transaction_records, that holds the details of all the transactions for a policy with at least these details:

1. Policy_Number
2. Trans_Type
3. Trans_Date
4. Trans_Amount
5. Trans_Unique

Now, for each combination of (Policy_Number, Trans_Type, Trans_Date), there will be multiple records in the table, each record uniquely identified by the Trans_Unique. The Trans_Unique is a sequence-based column and determines the chronological order in which the records are created.

The Trans_Date is just the actual date on which the transaction should have taken place, but it might not be actually processed on the due date. So, the true chronological order is reflected from ONLY the Trans_Unique field, and Trans_Date plays minimal role here.

So, for a combination (1234, 'Charge','03-MAR-07'), there might be 4 records, generated in the ascending order of Trans_Unique.

Now, I want the DISTINCT combinations of (Policy_Number, Trans_Type, Trans_Date) in the (chronological) order in which they were processed using a single query.

So, if I include Trans_Unique in the SELECT list, I won't get the desired combinations distinctly, right? Yet, I need to ORDER the result BY this field.

Thanks.

March 20, 2007 - 11:13 am UTC

and I'll keep asking the same tired question over and over.

how can you possibly order by this column - if it makes the results "not distinct" that means for sure there is MORE THAN ONE VALUE of trans_unique associated with each grouped record.

WHICH ONE OF THOSE TRANS_UNIQUE values do you want to use?

do you want the max? use max then
do you want the min? use min then
do you want the avg? the stddev? the XXXX?? use that aggregate then.

You have to PICK the trans_unique value you want associated with each grouped set of records - and then - and THEN - you may order by it.

think about this please.

POLICY_NUMBER        TRANS_TYPE       TRANS_DATE        TRANS_UNIQUE
---------------      --------------   -------------     ----------------
123                  X                01-jan-2007       1
456                  X                01-jan-2007       2
123                  X                01-jan-2007       3
456                  X                01-jan-2007       4

ok, now what??? the distinct values are:

123, x, 01-jan-2007
456, x, 01-jan-2007

Now, please tell me which comes "first" and "why" - which trans_unique (each grouped set had TWO TRANS_UNIQUE in this case) is used to sort.

Once you answer that - your ultimate answer will be obvious!!!

DISTINCT / GROUP BY with ORDER BY

Roy, March 20, 2007 - 10:44 am UTC

I guess, your solution would be:

SELECT Policy_Number, Trans_Type, Trans_Date
FROM transaction_records
GROUP BY Policy_Number, Trans_Type, Trans_Date
ORDER BY MAX(Trans_Unique);

Well, if I guessed correctly, and that it works, I think, this is the best solution. Thanks a lot Tom. Didn't really approach the problem from this angle.

Anyways, could you please let me know just for knowledge, how would the following query behave?

SELECT Policy_Number, Trans_Type, Trans_Date
FROM (SELECT Policy_Number, Trans_Type, Trans_Date,
ROW_NUMBER() OVER (PARTITION BY Policy_Number, Trans_Type, Trans_Date
ORDER BY Trans_Unique) rnum
FROM transaction_records
ORDER BY Trans_Unique)
WHERE rnum = 1;

Will the above query fetch me the result set always in the correct ORDER? Since the outer query doesn't involve any SORTING (deliberate or inherent), will this not give me the result in the correct order always?

Thanks.

March 20, 2007 - 11:19 am UTC

you probably meant order by "trans_unique DESC" (from big to small) in the second query.

the order by in the inline view (not the one in the analytic function, that is needed) is not necessary or useful.

and you need to KEEP the trans_unique in the second query so you can actually sort the result set by it after selecting out the rnum = 1 values...

select a, b, c
  from t
 group by a, b, c
 order by max(d)

will return the same results as (assuming that D is unique by A,B,C)

select a, b, c
  from (select a, b, c, d, 
               row_number() over (partition by a,b,c order by d DESC)rn
          from t
       )
 where rn = 1
 order by d

but the group by approach is likely the most sensible.

RE: DISTINCT / GROUP BY with ORDER BY

Stew Ashton, March 20, 2007 - 10:44 am UTC

[As I write, Tom has not followed up, so I dare to contribute.]

Hi Roy,

If you use DISTINCT, you are getting ONE result from potentially SEVERAL records, which means you have more than one Trans_Unique value to choose from: which one do you want to order by? If it's the highest one, use max(Trans_Unique) as Tom said. You can also use min(), avg(), sum(), whatever you want, but you have to choose some ONE thing to order by.

In other words, "you need to - well - sort of identify [what] you want to sort with."

Solution

Roy, March 20, 2007 - 10:56 am UTC

Yes, I am sorry, my solution using analytics, in fact, clashes with the other one. The equivalent solution using analytics should rather be this (I missed out DESC in my last post):

SELECT Policy_Number, Trans_Type, Trans_Date
FROM (SELECT Policy_Number, Trans_Type, Trans_Date,
ROW_NUMBER() OVER (PARTITION BY Policy_Number, Trans_Type, Trans_Date
ORDER BY Trans_Unique DESC) rnum
FROM transaction_records
ORDER BY Trans_Unique DESC)
WHERE rnum = 1;

And, yes, I am rather happy with the MAX() value to sort by, and not anything else. Thanks a lot Stew. Thanks Tom.

March 20, 2007 - 11:19 am UTC

see above, your query is wrong, the order by must be on the OUTER query, not in the inline view.

Got the solution

Roy, March 20, 2007 - 11:53 pm UTC

Thanks a lot Tom. Got the solution.

Solution

Roy, March 21, 2007 - 12:09 am UTC

I beg your pardon, messed up one thing: in fact, I need the transaction records in REVERSE chronological order, and that's why I am happy with MAX() to sort with in DESC order. So, I will go with this solution:

SELECT Policy_Number, Trans_Type, Trans_Date
FROM transaction_records
GROUP BY Policy_Number, Trans_Type, Trans_Date
ORDER BY MAX(Trans_Unique) DESC;

Thanks a lot once again.

Group by to turn rows to columns

Guna, February 18, 2008 - 10:55 am UTC

February 18, 2008 - 1:53 pm UTC

ahh, the infamous EAV model (entity-attribute-value)

funny - you are using it, but you don't know how to query it.

search this site for the word pivot to see examples of pivoting a result set.

Group by to turn rows to columns

Guna, February 18, 2008 - 12:03 pm UTC

Tom, I have a table with values like id, value. The table contents are like
1 - house
2 - iimigrant
1 - apartment
2 - Citizen
1 - Homeless
2 - Resident Alien

(Eg: 1 is ID and house is value)
Could you please help me to group them from rows into columns
1 - house - apartment - homeless
2 - Citizen - Homeless - Resident Alien

Eg: column1 = id, col2 = house, col3 = apartment ....

Since I am on Oracle 10g I tried with Collect function but then couldn't split that into seperate columns that I intend to get (as in Example above).

Thanks a lot.

Group By over DBLink

A reader, June 22, 2012 - 8:12 am UTC

Anytime I have TO_CHAR of a date column in the select that does aggregate function and the table is fetched over a database link, I am getting ORA-00979: not a GROUP BY expression.
Is this a bug or am I doing something wrong?

SQL> SELECT *
  2  FROM V$VERSION;

BANNER                                                                          
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production    
PL/SQL Release 11.2.0.2.0 - Production                                          
CORE 11.2.0.2.0 Production                                                      
TNS for IBM/AIX RISC System/6000: Version 11.2.0.2.0 - Production               
NLSRTL Version 11.2.0.2.0 - Production                                          

SQL> 
SQL> SELECT *
  2  FROM V$VERSION@RMPR;

BANNER                                                                          
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production    
PL/SQL Release 11.2.0.2.0 - Production                                          
CORE 11.2.0.2.0 Production                                                      
TNS for IBM/AIX RISC System/6000: Version 11.2.0.2.0 - Production               
NLSRTL Version 11.2.0.2.0 - Production                                          

SQL> 
SQL> 
SQL> SELECT COUNT(*)
  2  FROM AES_RX_POA_HDR_HIST@RMPR
  3  GROUP BY TO_CHAR(DELDATE, 'YYYYMM');

  COUNT(*)                                                                      
----------                                                                      
       267                                                                      
       669                                                                      

SQL> 
SQL> 
SQL> SELECT TO_CHAR(DELDATE, 'YYYYMM'), COUNT(*)
  2  FROM AES_RX_POA_HDR_HIST@RMPR
  3  GROUP BY TO_CHAR(DELDATE, 'YYYYMM');
SELECT TO_CHAR(DELDATE, 'YYYYMM'), COUNT(*)
*
ERROR at line 1:
ORA-00979: not a GROUP BY expression 
ORA-02063: preceding line from RMPR

June 22, 2012 - 8:24 am UTC

I would strongly encourage you to not do that, do not use to_char to truncate a date, use trunc

trunc(deldate,'mm')

it keeps things sortable
it doesn't use tons of cpu to format a date over and over and over

select to_char(the_date,'yyyymm'), cnt
  from (select trunc(deldate,'mm') the_date, count(*) cnt
          from t@r
         group by trunc(deldate,'mm')
       )
order by the_date
/

I cannot reproduce, but try my query and see what happens:

ops$tkyte%ORA11GR2> select TO_CHAR(DELDATE, 'YYYYMM'), COUNT(*)
  2    FROM t@loopback@ora11gr2
  3   GROUP BY TO_CHAR(DELDATE, 'YYYYMM');

TO_CHA   COUNT(*)
------ ----------
201206          1

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> select to_char(the_date,'yyyymm'), cnt
  2    from (select trunc(deldate,'mm') the_date, count(*) cnt
  3            from t@loopback@ora11gr2
  4           group by trunc(deldate,'mm')
  5         )
  6  order by the_date
  7  /

TO_CHA        CNT
------ ----------
201206          1

Group By over DBLink

A reader, June 22, 2012 - 11:00 am UTC

Thank you for the reply.

FYI
-----

Strange, the first query doesn't work, but the second one works.

SQL> SELECT TRUNC (deldate, 'mm') the_date, COUNT (*) cnt
  2  FROM Anp_RX_POA_HDR_HIST@RMPR
  3  GROUP BY TRUNC (deldate, 'mm');
SELECT TRUNC (deldate, 'mm') the_date, COUNT (*) cnt
*
ERROR at line 1:
ORA-00979: not a GROUP BY expression 
ORA-02063: preceding line from RMPR 


SQL> 
SQL> SELECT TO_CHAR (the_date, 'yyyymm') DAT, cnt
  2  FROM (SELECT TRUNC (deldate, 'mm') the_date, COUNT (*) cnt
  3        FROM ANP_RX_POA_HDR_HIST@RMPR
  4        GROUP BY TRUNC (deldate, 'mm'))
  5  ORDER BY the_date;

DAT                    CNT                                                      
--------------- ----------                                                      
201205                 267                                                      
201206                 669

June 22, 2012 - 4:57 pm UTC

the second one will be more performant in general as well.

It could have been an issue in 11.2.0.2 that was resolved in 11.2.0.3

Group by Clause

Vijay Sekhar Lattala, June 28, 2012 - 2:02 pm UTC

Hi Tom,

This is Vijay from India(Cochin).
I have a query which is taking very long time as it has some aggregate functions and Group By Clause. Could you please help me on tuning the query.
Please find below the query and this is general Oracle Apps HRMS Module EBS query.

SELECT child.resource_id Person_id,
TRUNC (child.start_time) Start_Date,
TRUNC (child.stop_time) Stop_Date,
SUM (1) N,
MIN (child.start_time) Earliest_In,
MAX (child.stop_time) Latest_Out,
SUM(TO_CHAR (child.stop_time, 'HH24') * 60
+ DECODE (TRUNC (child.stop_time),
TRUNC (child.start_time), 0,
1440)
+ TO_CHAR (child.stop_time, 'MI')
- TO_CHAR (child.start_time, 'HH24') * 60
- TO_CHAR (child.start_time, 'MI'))
W
FROM hxc_time_building_blocks child
WHERE child.scope = 'DETAIL'
AND child.object_version_number =
(SELECT MAX (object_version_number)
FROM hxc_time_building_blocks
WHERE Time_Building_block_id =
child.Time_Building_block_id)
GROUP BY child.resource_id,
TRUNC (child.start_time),
TRUNC (child.stop_time)

Please let me know if you need anything else

Regards,
Vijay

June 29, 2012 - 9:56 am UTC

you can try using analytics

select * 
  from (
select child.*, 
       row_number() over (partition by time_building_block_id 
                          order by object_version_number DESC) rn 
  from hxc_time_building_blocks child
       )
 where rn = 1

Call that query Q, you could then:

SELECT child.resource_id Person_id,
               TRUNC (child.start_time) Start_Date,
               TRUNC (child.stop_time) Stop_Date,
               SUM (1) N,
               MIN (child.start_time) Earliest_In,
               MAX (child.stop_time) Latest_Out,
               SUM(TO_CHAR (child.stop_time, 'HH24') * 60
                   + DECODE (TRUNC (child.stop_time),
                        TRUNC (child.start_time), 0,
                        1440)
                   + TO_CHAR (child.stop_time, 'MI')
                   - TO_CHAR (child.start_time, 'HH24') * 60
                   - TO_CHAR (child.start_time, 'MI'))
                  W
          FROM (Q) child
        GROUP BY child.resource_id,
                 TRUNC (child.start_time),
                 TRUNC (child.stop_time)

Group by clause

Sateesh, July 13, 2012 - 4:26 am UTC

Hi Tom,

I am asking this as this question is related to Group by clause. Please can you provide the query for this as this.

How to retrieve the columns that are not in group by clause. For example if I have a table testgroup with 6 columns, I need to select the six columns but use group by only for on column. Below is the query for creating table.

CREATE TABLE testgroup (
id number PRIMARY KEY,
a number,
b number,
c varchar2(45),
d varchar2(45),
e varchar2(45),
f varchar2(45)
)

insert into testgroup values(1,1,2,'c1','d1','e1','f1');
insert into testgroup values(2,3,2,'c2','d2','e2','f2');
insert into testgroup values(3,2,2,'c3','d3','e3','f3');
insert into testgroup values(4,3,3,'c4','d4','e4','f4');
insert into testgroup values(5,4,3,'c5','d5','e5','f5');
insert into testgroup values(6,2,3,'c6','d6','e6','f6');
insert into testgroup values(7,6,4,'c7','d7','e7','f7');
insert into testgroup values(8,3,4,'c8','d8','e8','f8');
insert into testgroup values(9,2,4,'c9','d9','e9','f9');

I want to select the complete row values for max(a) group by b.
The result set should be like:

2,3,2,'c2','d2','e2','f2'
5,4,3,'c5','d5','e5','f5'
7,6,4,'c7','d7','e7','f7'

But for Group by we need keep all the columns in group by that we are selecting. Can you please help me on how to get his done.

Thanks in advance,
Sateesh B.

July 16, 2012 - 3:03 pm UTC

ops$tkyte%ORA11GR2> select max(id) keep ( dense_rank first order by a desc ) id,
  2         max(a) a,
  3         b,
  4         max(c) keep (dense_rank last order by a ) c,
  5         max(d) keep (dense_rank last order by a ) d,
  6         max(e) keep (dense_rank last order by a ) e,
  7         max(f) keep (dense_rank last order by a ) f
  8    from testgroup
  9    group by b
 10    order by 1
 11  /

        ID          A          B C     D     E     F
---------- ---------- ---------- ----- ----- ----- -----
         2          3          2 c2    d2    e2    f2
         5          4          3 c5    d5    e5    f5
         7          6          4 c7    d7    e7    f7

that works if A is unique within B. As you can imagine, if A is repeated multiple times for a given B value, your question doesn't make sense.

http://docs.oracle.com/cd/E14072_01/server.112/e10592/functions063.htm

Live SQL

Share and learn SQL and PL/SQL; free access to the latest version of Oracle Database!

Dev Gym

Classes, workouts and quizzes on Oracle Database technologies. Expertise through exercise!

Group By Clause

Breadcrumb

Question and Answer

You Asked

and Tom said...

Rating

Comments

Just clarifying

Just what I wanted to know

GROUP BY - a pseudo column (back to SQL 102)

GROUP BY and performance

Quick question

Group By Data

Need help on this one...

Ooops...sorry

Group By Month

How to do this Group by

Yes

Distinct

A grouping

Thank you, Tom!!

Interesting; Is there a way to make grouping perform faster

Thank you

Group by cluase

Group by cluase

Distinct is taking long time

SQL

I know!!

distinct and use of PK

distinct and returns other columns ( Column wise distinct and not row wise )

Complilation error

Group by Month starting from 8th...

Group By clause

Group By output where grouped expression only appears once per grouping

:-O You are a legend!

for the query above

GROUP BY and DISTINCT

Having Clause Problem

DISTINCT / GROUP BY with ORDER BY clause

DISTINCT / GROUP BY with ORDER BY

Example

DISTINCT / GROUP BY with ORDER BY

DISTINCT / GROUP BY with ORDER BY

RE: DISTINCT / GROUP BY with ORDER BY

Solution

Got the solution

Solution

Group by to turn rows to columns

Group by to turn rows to columns

Group By over DBLink

Group By over DBLink

Group by Clause

Group by clause

More to Explore

Live SQL

Dev Gym