Skip to Main Content

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Arun.

Asked: November 28, 2007 - 12:37 am UTC

Last updated: July 28, 2011 - 6:29 pm UTC

Version: 10.2.0.3

Viewed 1000+ times

You Asked

Hi Tom,

Is there anyway to get other columns which are not included in group by clause?

For eg., in the code below, have to do max() to get other columns like destinationid,productid,agreementid

SELECT    eda.esid esid,
          eda.tier_rate_definition_itemid tier_rate_defn_itemid,
          eda.valid_from valid_from,
          eda.adjusted_valid_from adj_valid_from,
          max(eda.destinationid) destinationid,
          max(eda.productid) productid,
          max(eda.agreementid) agreementid,
          sum(eda.usage_amt) usage_amt,
          sum(eds.markup_amt) markup_amt,
          sum(eda.surcharge_amt) surcharge_amt,
          sum(eda.no_of_events) no_of_events,
          sum(eda.rated_volume) rated_volume
   FROM
     FROM e_det_amt partition eda
    GROUP BY esid,valid_from,adjusted_valid_from,tier_rate_definition_itemid


Thanks & Regards
R.Arun kumar

and Tom said...

umm,

which destinationid would you LIKE to get. You do realize that you are very possibly getting destinationid from one record, productid from some other record and so on.


so, you would have to tell us WHICH ONE of the records should provide this information.


and if you say "destinationid, productid, agreementid is constant for a given esid,valid_from,adjusted_valid_from,tier_rate_definition_itemid combination" then your approach is valid - or you can just ADD those three columns to the group by.


Rating

  (17 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

vague question ... vague answer (this one) ...

nameless, November 28, 2007 - 10:24 am UTC

Tom Kyte
November 28, 2007 - 11:10 pm UTC

analytics do NOT aggregate

so, no, they would not be useful here.

answer was not vague.

answer was: you need to tell us WHAT IS EXPECTED here.

It is all confusing now ;)

nameless, November 29, 2007 - 12:07 am UTC

answer was: you need to tell us WHAT IS EXPECTED here.

Yes. Though the you would not be me.
The question was not clear ... hence, my own followup was vague (as in, maybe this is what you were looking for, but I cannot be sure)

analytics do NOT aggregate

Yes again. But I didn't say they would.

FIRST/LAST, like SUM, MIN, MAX exist in both incarnations: as aggregate functions and as analytic functions.
Tom Kyte
November 29, 2007 - 8:40 am UTC

but since analytics don't aggregate, they would be entirely "inappropriate" here.

first/last do not exist as aggregates.

aggregates are available as analytics.
not all analytics are available as aggregates.

To Nameless--there is no confusion

suresh, November 29, 2007 - 1:23 am UTC

Tom's response is absolutely correct, not vague and there is no confusion also. The OP's question was 'how to get the 'other' columns which are not included in group by clause. Tom's response is
1) either you have a single row of destinationid,productid,agreementid for "esid,valid_from,adj_valid_from,tier_rate_definition_itemid" combination, in which case no need to use 'any' aggregate function, but to simply add these columns in the group by clause.
2) if data is such that there could be multiple records, then any aggregate function will give 'wrong' result, as max(destinationid) may be a some record, max(productid) may be in other record and so on...

Thomas, November 29, 2007 - 5:46 am UTC

Suppose the additional information needed was known to be
constant per grouping. I've asked myself many times whether
it is better (logical and/or performance-wise) whether to
use MIN() or MAX() on these additional data columns or
to include these columns in the GROUP BY clause. Both will
lead to the same correct result.

What would you suggest?
Tom Kyte
November 29, 2007 - 8:53 am UTC

I'd add to the group by for semantic reasons - you do not want the min/max, you wanted to group by them

and in the event the assumption they were constant is FALSE sometime in the future, it'll be obvious in the output, instead of hidden.

To thomas

karthick, November 29, 2007 - 9:06 am UTC

Constant per grouping? are you expecting a constant to be included like this.

19:47:23 [SYSADM@INLABTST]> SELECT DEPTNO, MIN(SAL), 'A' AS CONS FROM SCOTT.EMP WHERE EMPNO <7698 GROUP BY DEPTNO;

DEPTNO MIN(SAL) C
---------- ---------- -
30 1250 A
20 800 A

if so no min or max or to include in group by is necessory.

but not sure what you mean by constant per group.
Tom Kyte
November 29, 2007 - 6:36 pm UTC

no, they meant there was some value within DEPTNO that was constant

eg:

select d.deptno,d.dname, min(sal) 
  from emp e, dept d
 where e.deptno = d.deptno
 group by d.deptno, d.dname



here d.dname is "constant within the group by - d.deptno"

their question surrounds d.dname - whether to aggregate or group by.

I say group by.

Thomas, November 29, 2007 - 10:02 am UTC

The question was whether constructs like

SELECT empid,MAX(empname) FROM emp GROUP BY empid

or

SELECT empid,empname FROM emp GROUP BY empid,empname

are better style. I know empid is unique, so given an empid the corresponding empname is fixed ("constant"). So Tom unterstood it correctly.

Tom, is there any difference concerning performance (I guess not)? The first statement would have to use grouping and aggregation, whereas the second only uses grouping, but has one more column in the GROUP BY clause.

Tom Kyte
November 29, 2007 - 6:52 pm UTC

in general however it is the difference between what I show above

select d.deptno,d.dname, min(sal) 
  from emp e, dept d
 where e.deptno = d.deptno
 group by d.deptno, d.dname


and 

select d.deptno, min(d.dname), min(sal) 
  from emp e, dept d
 where e.deptno = d.deptno
 group by d.deptno




that is, you always have something to group by, you always have something to aggregate

you just want to know about that third bit -dname in this case - it belongs in the group by.

confusion rules?

nameless, November 29, 2007 - 10:15 am UTC

Tom: first/last do not exist as aggregates.

Then the documentation I pointed at is seriously flawed or I'm reading it wrong ... it even comes complete with an aggregate example and an analytic example.

Suresh: There is no confusion

Apparently there is ;)

I didn't suggest Tom's response was vague; I qualified my own follow-up as vague. It was clear to me what I meant to say, but it may have not appeared so to the OP ... well, quid pro quo.

Apart from the other obvious one, Tom's answer was: which destinationid would you LIKE to get. You do realize that you are very possibly getting destinationid from one record, productid from some other record and so on. so, you would have to tell us WHICH ONE of the records should provide this information.

I hypothetical answer could've been: oh, I didn't realize that ... case closed.

Another hypothetical answer could've been: actually, I do have a method for specifying, on a column by column basis, which of those many records should provide that information. Assuming a solution could've been devised, it might have involved joining back to the original table.

Now, have a read at this paragraph from the documentation I've pointed at:

When you need a value from the first or last row of a sorted group, but the needed value is not the sort key, the FIRST and LAST functions eliminate the need for self-joins or views and enable better performance.

Here you have it, for a specific method/need, one could employ the FIRST/LAST functions.

And if you look at the aggregate example in that docummentation page you'll notice that employee is grouped by department_id but they also bring back salary ... seems to be in the spirit of: Is there anyway to get other columns which are not included in group by clause.

Tom Kyte
November 29, 2007 - 6:57 pm UTC

you are adding the whole KEEP DENSE RANK bit here then.
SELECT department_id,
MIN(salary) KEEP (DENSE_RANK FIRST ORDER BY commission_pct) "Worst",
MAX(salary) KEEP (DENSE_RANK LAST ORDER BY commission_pct) "Best"
   FROM employees
   GROUP BY department_id;


there is a first/last analytic


and I think the documentation is flawed there calling them an aggregate function - they are a "keyword" there - but aggregate? No, the min and max are aggregates, the first and last are modifiers used with dense_rank.


First

Laurent Schneider, December 04, 2007 - 3:35 am UTC

<i>there is a first/last analytic </i>
Well, but with dense_rank too. 

Ex: richest employee of the department
<code> select ename,sal, max(ename) keep (dense_rank first order by sal) over (partition by deptno) RICH_ENAME, max(sal) over (partition by deptno) RICH_SAL from emp;

ENAME             SAL RICH_ENAME   RICH_SAL
---------- ---------- ---------- ----------
CLARK            2450 MILLER           5000
KING             5000 MILLER           5000
MILLER           1300 MILLER           5000
JONES            2975 SMITH            3000
FORD             3000 SMITH            3000
ADAMS            1100 SMITH            3000
SMITH             800 SMITH            3000
SCOTT            3000 SMITH            3000
WARD             1250 JAMES            2850
TURNER           1500 JAMES            2850
ALLEN            1600 JAMES            2850
JAMES             950 JAMES            2850
BLAKE            2850 JAMES            2850
MARTIN           1250 JAMES            2850


I think too that FIRST is not a "function" but a "keyword"</code>

oops

Laurent Schneider, December 04, 2007 - 3:39 am UTC

Sorry I forget the DESC keyword :-(
select ename,sal, 
  max(ename) 
    keep (dense_rank first order by sal desc) 
    over (partition by deptno) RICH_ENAME, 
  max(sal) 
    over (partition by deptno) RICH_SAL 
from emp;

ENAME             SAL RICH_ENAME   RICH_SAL
---------- ---------- ---------- ----------
CLARK            2450 KING             5000
KING             5000 KING             5000
MILLER           1300 KING             5000
JONES            2975 SCOTT            3000
FORD             3000 SCOTT            3000
ADAMS            1100 SCOTT            3000
SMITH             800 SCOTT            3000
SCOTT            3000 SCOTT            3000
WARD             1250 BLAKE            2850
TURNER           1500 BLAKE            2850
ALLEN            1600 BLAKE            2850
JAMES             950 BLAKE            2850
BLAKE            2850 BLAKE            2850
MARTIN           1250 BLAKE            2850


How about WITH

cas, December 06, 2007 - 1:47 am UTC

I sometimes use WITH to avoid very long list in the GROUP BY. The aggregates go into with and the rest gets joined to one or 2 columns from in in the main query. It goes like this:
WITH totals as
(SELECT the_id, SUM(x) as tot_x, SUM(Y) as tot_y
FROM whatever
WHERE something
GROUP BY the_id
)
SELECT lost-of-stuff
FROM totals
INNER JOIN some_table st on st.id = totals.the_id
INNER JOIN some-more-tables

usually I condition what to select in the WITH section so often the WHERE clause is not needed in the main query.
The penalty may be that some tables will need to be read twice - once in the WITH and second time in the main query.
What do you think about this approach?
Tom Kyte
December 10, 2007 - 9:58 am UTC

... The penalty may be that some tables will need to be read twice - once in the
WITH and second time in the main query. ...

that is a rather huge penalty don't you think....


With Group by & Order by find out most densely populated data

Bhasker, May 30, 2008 - 2:32 pm UTC

Hi Tom,
I have a query as below to determine, which day's (date) data is loaded in the collection table.
Sometimes more than one day's data gets loaded to collection. However, the large volume of the data will be
for only one day. I am trying to figure out which day using count & group by using Order by. Then ROWNUM=1.
But it takes longer time to execute. Just looking if there are any other oracle functions to do this better?
As always, your feedback would be very valuable.
There are no indexes on collection table. Table has about 30 million records per day.

SELECT idv_date, NULL,
'2' AS status,
NULL, NULL, NULL
FROM (SELECT /*+ PARALLEL(c,4) */
TRUNC (CAST (local_time AS DATE)) idv_date,
COUNT (*) cnt
FROM Cl1_Collection c
WHERE instance_id BETWEEN 222222 AND 333333
GROUP BY TRUNC (CAST (local_time AS DATE))
ORDER BY cnt DESC, TRUNC (CAST (local_time AS DATE)) ASC)
WHERE ROWNUM = 1

Thanks,
Bhasker
Tom Kyte
May 30, 2008 - 3:18 pm UTC

think about this for a moment please.....

perhaps the thing that loads this table needs to ........ I don't know ... put into a table somewhere what day it loaded.

With loosey goosey data processing things like "However, the large volume of the data will be for only one day." and a 30million row table without any indexes - well, what you see is what you got.


I would say "there isn't any magic oracle feature you should be looking for, but a better design that meets your processing requirements could be called for"



Thanks! any analytic functions?

Bhasker, May 30, 2008 - 6:18 pm UTC

Actually, we're running the process daily. Which will do the load using a single insert statment into Collection table. But on a single day, we might be pulling 2 days back data which might contain older than 2 days data (small amount). So, to figure out which date really has more data is requiring this expensive sql query.
Any other better ways to do it instead of ROWNUM=1 after group & order.

As always, i follow your advice.
Thanks,
Bhasker
Tom Kyte
May 31, 2008 - 6:59 am UTC

yes, by designing into the process things that will make your job/task easier.

Like having the loader register what data was loaded somewhere - it is highly likely your loader KNOWS the data

constant within the group by

Gianluca Trombetta, November 11, 2009 - 5:24 am UTC

I would return on the question:
is better style

SELECT empid,MAX(empname) FROM emp GROUP BY empid

or

SELECT empid,empname FROM emp GROUP BY empid,empname

In my opinion the BEST would be a new aggregate function (e.g. CONST_VAL) that returns the argument passed, raising an exception when not constant within the group.

So we could write:

SELECT empid,CONST_VAL(empname) FROM emp GROUP BY empid

I tried to write this function on my own, but I was not able to obtainan an overloaded user-defined aggregate function (like MIN or MAX) that accepts (and returns)
different datatypes.

Can I hope to have a function like that in some future release?



Thanks (for this site and your columns)
Gianluca Trombetta
Tom Kyte
November 15, 2009 - 12:11 pm UTC

style ??? style?!?!?!?! how does "style" enter into anything.

select x, max(y) from t group by x;

cannot be compared to

select x, y from t group by x,y;

in any way shape or form. not at all - they are two entirely different - as in "not comparable" - queries.



I don't get your point at all - why group by anything. did you want DISTINCT values? then just use DISTINCT

select distinct x,y from t;

I have no idea at all what this "const_val(expression)" would do or when it could be used.


ORA-00935

Sokrates, December 10, 2009 - 4:40 am UTC

SQL> select  count(dummy) dummy from dual order by count(dummy);
select  count(dummy) dummy from dual order by count(dummy)
                                              *
ERROR at line 1:
ORA-00935: group function is nested too deeply


is this a bug or a feature ?
Tom Kyte
December 10, 2009 - 2:57 pm UTC

feature

ops$tkyte%ORA10GR2> select  count(dummy) dummy from dual order by count(dummy);
select  count(dummy) dummy from dual order by count(dummy)
                                              *
ERROR at line 1:
ORA-00935: group function is nested too deeply


ops$tkyte%ORA10GR2> select  count(dummy) dummyx from dual order by count(dummy);

    DUMMYX
----------
         1




Return columns not in GROUP BY

Mark, September 30, 2010 - 2:56 pm UTC

I have been plagued by this problem too. Seems like the SQL language developers dropped the ball on something so simple as returning the value of a different column, even though MAX has already isolated a single record. Here is what I used to find a unique identifier "xID". The combination of MAX RevCode and AnotherValue in the subquery isolated a single record.

SELECT xID,RevCode
FROM MyTable where AnotherValue='1234'
AND revcode = (SELECT MAX(RevCode) FROM dbo.MyTable
WHERE AnotherValue='1234')

Perhaps a future version of SQL will allow me to exempt "xID" from a GROUP BY clause?
I had problems with "WITH". DBA settings wouldn't allow me to use it.
Tom Kyte
September 30, 2010 - 3:26 pm UTC

answered in reverse.

I had problems with "WITH". DBA settings wouldn't allow me to use it.


you are mistaken, WITH requires no privileges, no settings, nothing. It just works - unless you are using a pretty GUI tool written by a third party that doesn't recognize it.

You can use with - no problem.



Now, onto your question, your query:

SELECT xID,RevCode
FROM MyTable where AnotherValue='1234'
AND revcode = (SELECT MAX(RevCode) FROM dbo.MyTable
WHERE AnotherValue='1234')

(dbo.... hmmm, making Oracle sound like sybase or ms - yuck :) )

You can do this with analytics for many many years. analytics get you the 'aggregate without aggregating'




ops$tkyte%ORA11GR2> create or replace view myTable
  2  as
  3  select empno xid, job revcode, deptno anothervalue
  4    from scott.emp;

View created.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> SELECT xID,RevCode
  2    FROM mytable
  3   where AnotherValue=30
  4     AND revcode = (SELECT MAX(RevCode) FROM MyTable WHERE AnotherValue=30)
  5  /

       XID REVCODE
---------- ---------
      7499 SALESMAN
      7521 SALESMAN
      7654 SALESMAN
      7844 SALESMAN

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> select *
  2    from (
  3  select xid, revcode, max(revcode) over () max_revcode
  4    from mytable
  5   where anothervalue = 30
  6         )
  7   where revcode = max_revcode
  8  /

       XID REVCODE   MAX_REVCO
---------- --------- ---------
      7499 SALESMAN  SALESMAN
      7521 SALESMAN  SALESMAN
      7654 SALESMAN  SALESMAN
      7844 SALESMAN  SALESMAN


Sam, May 28, 2011 - 3:50 am UTC

Hi Tom,

How can we write a query which gives the result of sum of each department salary without using group by clause.

Best Regards
Sam
Tom Kyte
May 31, 2011 - 9:42 am UTC

why? what is the 'goal', what is the result set you need?

If the result set is "against the scott.emp table, show me a sum of salary by department", then the only logical response is

select deptno, sum(sal) from emp group by deptno;

Are there other ways? Sure, but they would be dumb to do and not very performant in general.


ops$tkyte%ORA11GR2> @test
ops$tkyte%ORA11GR2> select distinct deptno, sum(sal) over (partition by deptno) sumsal from scott.emp;

    DEPTNO     SUMSAL
---------- ----------
        10       8750
        20      10875
        30       9400

Elapsed: 00:00:00.00
ops$tkyte%ORA11GR2> select deptno, (select sum(sal) from scott.emp where emp.deptno = dept.deptno) from (select distinct deptno from scott.emp) dept;

    DEPTNO (SELECTSUM(SAL)FROMSCOTT.EMPWHEREEMP.DEPTNO=DEPT.DEPTNO)
---------- --------------------------------------------------------
        30                                                     9400
        20                                                    10875
        10                                                     8750

Elapsed: 00:00:00.00

group by cases

A reader, July 26, 2011 - 7:12 am UTC

Hi Tom,

Whats is the reason for the query to behave differently when group by is not specified?. I always thought thatthat when we
select col1,count(*) from t; a group by on col1 is definitely necessary for the query as it would fail to parse even.

select as_of_date,count(*)
from(select sysdate as_of_date from dual where 1=0)

AS_OF_DAT COUNT(*)
--------- ----------
26-JUL-11 0
1 row selected.

select as_of_date,count(*)
from(select sysdate as_of_date from dual where 1=0)
group by as_of_date
-- no rows returned

I am in oracle 10gR2




Tom Kyte
July 28, 2011 - 6:29 pm UTC

select as_of_date, count(*)
from (select sysdate as_of_date from dual where 1=0)

is known to be the same as:

select as_of_date, count(*) from (select 'CONSTANT' as_of_date from dual where 1=0)

which is known to be the same as

select 'CONSTANT', count(*) from dual where 1=0;


through a process called view merging.


ops$tkyte%ORA11GR2> select sysdate, count(*) from dual where 1=0;

SYSDATE     COUNT(*)
--------- ----------
28-JUL-11          0

ops$tkyte%ORA11GR2> select 'constant', count(*) from dual where 1=0;

'CONSTAN   COUNT(*)
-------- ----------
constant          0



you never need to group by a constant - we know it has but one value. So, you are really sort of doing:

select count(*) from dual where 1=0;

and a group by without an aggregate ALWAYS returns at least and at most one row.

When you add a group by - there has to be a row in the group by group - else you get zero rows for that group:

ops$tkyte%ORA11GR2> select 'x', count(*) from dual where 1=0;

'   COUNT(*)
- ----------
x          0

ops$tkyte%ORA11GR2> select 'x', count(*) from dual where 1=0 group by 'x';

no rows selected

ops$tkyte%ORA11GR2>