Skip to Main Content
  • Questions
  • Howto select first value in a group by bunch of rows....

Breadcrumb

XMAS_TREE

The AskTOM team is taking a break over the holiday season, so we're not taking questions or responding to comments, but we might still sneak in a few Office Hours sessions! Thanks for being a member of the AskTOM community.

Question and Answer

Connor McDonald

Thanks for the question, Kim.

Asked: February 19, 2007 - 2:21 pm UTC

Last updated: November 29, 2012 - 7:48 am UTC

Version: 10.2

Viewed 100K+ times! This question is

You Asked

Hi Tom

I have just begun using analytic functions, but have come up short on this:

In a query where I group by a field, I would like to select the first values from a specific row.

I have using something like:

select distinct
a.name
, first_value(c.task) over (partition by a.name order by c.task_code asc) as task_code
, first_value(e.code_subject) over (partition by a.name order by c.task_code asc) as code_subject
from .....


Without distinct, the two first_value() expressions transforms the task and code_subject into the same values for all rows per same name. I would have made a group by name, but this is not supported, so instead I put in distinct.
Normally, when I come across the use of distinct, it is caused by notn joining right/enough/etc. and I think this could also be the case here (wrong tool).

Do I need the distinct, or have I missed something?

Br
Kim

and Tom said...

Analytics are NOT aggregates - they do not "squish out" any rows - they simply are functions that work across rows in a partition.

another way to code what you have would be:

select *
from (select name, task, code_subject,
row_number() over (partition by name order by task_code) rn
from t)
where rn = 1;

Rating

  (17 ratings)

We're not taking comments currently, so please try again later if you want to add a comment.

Comments

Why not use a built-in function ?

Jay, February 20, 2007 - 12:36 pm UTC

The FIRST aggregate function seems to fit your requirement exactly:

select
a.name
, MAX(c.task) KEEP (DENSE_RANK FIRST ORDER BY c.task_code) as task_code
, MAX(e.code_subject) KEEP (DENSE_RANK FIRST ORDER BY c.task_code) as code_subject
from .....
group by a.name

Tom Kyte
February 20, 2007 - 3:13 pm UTC

Why does FIRST take only numeric datatypes ?

Jay, February 20, 2007 - 4:13 pm UTC

The Oracle 10gR2 SQL Reference manual on page 5-68 explains the FIRST function and contains the following:
----------------
This function takes as an argument any numeric datatype or any nonnumeric datatype
that can be implicitly converted to a numeric datatype. The function returns the same
datatype as the numeric datatype of the argument.
----------------

I presume the argument that is being referred to is the argument to the aggregate function.

1. The argument restriction here should be no different than the argument restriction for the aggregate function being used.
2. Which non-numeric datatype can be implicitly converted to a numeric datatype ?
3. It seems to work with character datatypes as well (I suppose the characters are converted to the corresponding numeric codes of the database character set and dates are anyway stored internally as numbers):

SQL> select * from test_tab;

DEPT EMP_NAME SALARY
---------- ---------- ----------
10 MARY 30000
10 JOHN 20000
10 SCOTT 20000
20 BOB 50000
20 BETTY 50000

SQL> select dept,
2 max(emp_name) keep (dense_rank first order by salary) emp_highest_salary
3 from test_tab
4 group by dept;

DEPT EMP_HIGHES
---------- ----------
10 SCOTT
20 BOB

Tom Kyte
February 20, 2007 - 5:26 pm UTC

it is more of a "first modifier to the aggregate function", see the link i have right above.

removing groups based on condition

Raj, February 21, 2007 - 9:14 am UTC

Dear Tom,

How to remove the rows in a group if any one row in the group does not satisfy a condition?

For eg:
test@KAS>create table t(
  2  docid varchar2(15),
  3  doctype_id number(3),
  4  status_id number(3),
  5  delflg char(1))
  6  /

Table created.

test@KAS>
test@KAS>insert into t values('LB0001_20080202', 401, 203, 'N')
  2  /

1 row created.

test@KAS>insert into t values('LB0001_20080102', 401, 204, 'N')
  2  /

1 row created.

test@KAS>insert into t values('LB0001_20070221', 401, 204, 'N')
  2  /

1 row created.

test@KAS>insert into t values('LB0001_20070204', 401, 204, 'N')
  2  /

1 row created.

test@KAS>insert into t values('LB0002_20071202', 401, 201, 'N')
  2  /

1 row created.

test@KAS>insert into t values('LB0002_20070203', 401, 202, 'N')
  2  /

1 row created.

test@KAS>insert into t values('LB0002_20070201', 401, 203, 'N')
  2  /

1 row created.

test@KAS>insert into t values('LB0002_20050205', 401, 204, 'N')
  2  /

1 row created.

test@KAS>insert into t values('LB0003_20080202', 401, 201, 'N')
  2  /

1 row created.

test@KAS>insert into t values('LB0003_20080201', 401, 202, 'N')
  2  /

1 row created.

test@KAS>insert into t values('LB0003_20040204', 401, 201, 'N')
  2  /

1 row created.

test@KAS>insert into t values('LB0003_20040105', 401, 202, 'N')
  2  /

1 row created.

test@KAS>insert into t values('LB0004_20030202', 401, 203, 'N')
  2  /

1 row created.

test@KAS>insert into t values('LB0004_20030103', 401, 203, 'N')
  2  /

1 row created.

test@KAS>insert into t values('LB0004_20020204', 401, 203, 'N')
  2  /

1 row created.

test@KAS>insert into t values('LB0004_20010205', 401, 203, 'N')
  2  /

1 row created.

test@KAS>
test@KAS>select * from t
  2  /

DOCID           DOCTYPE_ID  STATUS_ID D
--------------- ---------- ---------- -
LB0001_20080202        401        203 N
LB0001_20080102        401        204 N
LB0001_20070221        401        204 N
LB0001_20070204        401        204 N

LB0002_20071202        401        201 N
LB0002_20070203        401        202 N
LB0002_20070201        401        203 N
LB0002_20050205        401        204 N

LB0003_20080202        401        201 N
LB0003_20080201        401        202 N
LB0003_20040204        401        201 N
LB0003_20040105        401        202 N

LB0004_20030202        401        203 N
LB0004_20030103        401        203 N
LB0004_20020204        401        203 N
LB0004_20010205        401        203 N

16 rows selected.
(blank line inserted for clarity)


Here I should group by the first 6 characters of DOCID and if the STATUS_ID in any one row in the group is other than 203 or 204 then this group should be excluded from the output. So in this case only LB0001 and LB0004 group rows should be in the resultset.

Could you please help?

Thank you
Tom Kyte
February 21, 2007 - 11:09 am UTC

delete from t
where substr( docid, 1,6 ) in (select substr( docid, 1, 6 )
                                 from t 
                                where status_id not in (203,204))


find the set of docids to remove
then remove them.

Raj, February 21, 2007 - 11:16 am UTC

Dear Tom,

I didnot mean to delete from the table, but to exclude from the partition while using an analytical function.

Thank you

Tom Kyte
February 21, 2007 - 12:25 pm UTC

did not really see any analytics?

but in any case,


select ...
from t
where substr() not in ( select substr() from .... );


use the delete statement to get a positive statement full of rows you want to process.

Eliminate using sum of an expr

Jay, February 21, 2007 - 11:20 am UTC

Raj

Try this:

select substr(docid,1,6)
from t
group by substr(docid,1,6)
having sum( (case when status_id in (203,204) then 0 else 1 end) ) = 0

Raj, February 21, 2007 - 11:42 am UTC

Thank you Jay,

I need to output all records in each group which satisfies the condition,

This is for a content management system, the documents with a status id of 203 and 204 should be exported, but if any one of the document in one group has a status other than 203 and 204, then that group should be excluded from exporting.

The group is decided by the first 6 characters of DOCID. If it is same then they are related documents.

Thank you

Use Analytics to select all records in a group

Jay, February 21, 2007 - 12:59 pm UTC

If all records in the export groups are to be selected, the following SQL might be better (avoids using a subquery with the previous solution):

select docid, doctype_id, status_id, delflg
from (
select t.*, min( (case when status_id in (203,204) then 'Y' else 'N' end) ) over (partition by substr(docid,1,6)) export_group
from t
)
where export_group = 'Y'

Raj, February 22, 2007 - 7:30 am UTC


Thank you Tom and Jay,

I was looking for the answer which Jay had given me.

I was not able to filter the group if the status_id is not in 203 or 204.

Thanks to both of you


Analytics ARE aggregates

A reader, March 01, 2007 - 3:15 pm UTC

at least many analytic functions can also be used as aggregates, first_value can't.

I wish it could, I don't see any cause why.

Can you tell us any ?
Tom Kyte
March 02, 2007 - 1:00 pm UTC

you have "keep dense rank first" functionality for that.

the following shows what first_value would return, how to use group by with that and lastly how to single step it using dense_rank with keep and first:

ops$tkyte%ORA10GR2> select deptno, first_value(ename) over (partition by deptno order by sal desc, ename) fv
  2    from emp;

    DEPTNO FV
---------- ----------
        10 KING
        10 KING
        10 KING
        20 FORD
        20 FORD
        20 FORD
        20 FORD
        20 FORD
        30 BLAKE
        30 BLAKE
        30 BLAKE
        30 BLAKE
        30 BLAKE
        30 BLAKE

14 rows selected.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select deptno, max(fv)
  2    from (
  3  select deptno, first_value(ename) over (partition by deptno order by sal desc, ename ) fv
  4    from emp
  5         )
  6   group by deptno;

    DEPTNO MAX(FV)
---------- ----------
        10 KING
        20 FORD
        30 BLAKE

ops$tkyte%ORA10GR2> select deptno, max(ename) keep (dense_rank first order by sal desc, ename ) fv
  2    from emp
  3   group by deptno;

    DEPTNO FV
---------- ----------
        10 KING
        20 FORD
        30 BLAKE


Aggregate function choice when using FIRST/LAST?

Jimmy, March 17, 2008 - 4:14 pm UTC

Is there any difference in which aggregate function you choose when using FIRST/LAST? It seems to me that MIN, MAX, SUM, and AVG all return the same value, COUNT will always be 1, and VARIANCE and STDDEV will always be 0.

The documentation at:

http://download-east.oracle.com/docs/cd/B19306_01/server.102/b14223/analysis.htm#DWHSG0205

...doesn't explicitly state this, and seems to imply otherwise by using both MIN and MAX in their examples, as if it makes a difference.
Tom Kyte
March 24, 2008 - 9:00 am UTC

not sure what you mean at all. min, max, sum, avg, count do not always return "the same value", no more than first and last do.


min - returns the minimum value of a set of rows in a given window.
first - returns the first value of a set of rows in a given window.
count - returns the count of a set of rows in a given window.

they all work within the SAME set of conditions. Which one you use it drive purely, solely by what you NEED.


You need to learn about "windows" and what is in the current window and what default windows (current row and unbounded preceding - generally is the default window)

ops$tkyte%ORA10GR2> create table t as select * from all_users where rownum <= 5;

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select username,
  2         first_value(username) over (order by user_id) funame,
  3         last_value (username) over (order by user_id) luname
  4    from t;

USERNAME                       FUNAME                         LUNAME
------------------------------ ------------------------------ ------------------------------
BIG_TABLE                      BIG_TABLE                      BIG_TABLE
BIGGER_TABLE                   BIG_TABLE                      BIGGER_TABLE
FB_DEMO                        BIG_TABLE                      FB_DEMO
OPS$TKYTE                      BIG_TABLE                      OPS$TKYTE
B                              BIG_TABLE                      B

ops$tkyte%ORA10GR2> select username,
  2         first_value(username) over (order by user_id rows between current row and unbounded following) funame,
  3         last_value (username) over (order by user_id rows between current row and unbounded following) luname
  4    from t;

USERNAME                       FUNAME                         LUNAME
------------------------------ ------------------------------ ------------------------------
BIG_TABLE                      BIG_TABLE                      B
BIGGER_TABLE                   BIGGER_TABLE                   B
FB_DEMO                        FB_DEMO                        B
OPS$TKYTE                      OPS$TKYTE                      B
B                              B                              B



Look first_value returns a constant, no wait, last_value returns a constant...


No - neither return a constant

ops$tkyte%ORA10GR2> select username,
  2         first_value(username) over (order by user_id range between current row and 2 following) funame,
  3         last_value (username) over (order by user_id range between current row and 2 following) luname
  4    from t;

USERNAME                       FUNAME                         LUNAME
------------------------------ ------------------------------ ------------------------------
BIG_TABLE                      BIG_TABLE                      BIG_TABLE
BIGGER_TABLE                   BIGGER_TABLE                   BIGGER_TABLE
FB_DEMO                        FB_DEMO                        OPS$TKYTE
OPS$TKYTE                      OPS$TKYTE                      OPS$TKYTE
B                              B                              B



they simply return the first/last_value in the current WINDOW.

Re: Aggregate function choice when using FIRST/LAST?

Jimmy, March 25, 2008 - 9:58 am UTC

It seems you didn't read my post. I'm talking about aggregate functions with FIRST and LAST, not the analytical functions FIRST_VALUE and LAST_VALUE.

As stated in my previous post, it seems to me that "keep (dense_rank [first|last] order by...)" only returns a single value, so it doesn't matter if you choose MIN or MAX or SUM, and COUNT seems useless. The documentation doesn't explicitly state this, and seems to imply that it does make a difference since they pair MIN with FIRST and MAX with LAST.

SQL> select deptno,
  2         min(sal) keep (dense_rank last order by hiredate) min_last_sal,
  3         max(sal) keep (dense_rank last order by hiredate) max_last_sal,
  4         sum(sal) keep (dense_rank last order by hiredate) sum_last_sal,
  5         count(sal) keep (dense_rank last order by hiredate) count_last_sal
  6    from emp
  7   group by deptno;

    DEPTNO MIN_LAST_SAL MAX_LAST_SAL SUM_LAST_SAL COUNT_LAST_SAL
---------- ------------ ------------ ------------ --------------
        10         1300         1300         1300              1
        20         1100         1100         1100              1
        30          950          950          950              1


So, is there any difference between MIN, MAX, and SUM when using FIRST or LAST? Or is it just a syntactical formality since you have to be using some aggregate function with a group by?
Tom Kyte
March 26, 2008 - 8:20 am UTC

it seems you gave no example, so sure, it makes it difficult to understand what you say.


...
As stated in my previous post, it seems to me that "keep (dense_rank [first|last] order by...)"

....

you stated nothing like that, that would have been useful - it would have shown you were specifically talking about dense_rank with KEEP.

ops$tkyte%ORA10GR2> select deptno,
  2          min(comm) keep (dense_rank last order by hiredate) min_last_comm,
  3          max(comm) keep (dense_rank last order by hiredate) max_last_comm,
  4          sum(comm) keep (dense_rank last order by hiredate) sum_last_comm,
  5          count(comm) keep (dense_rank last order by hiredate) count_last_comm
  6     from emp
  7    group by deptno;

    DEPTNO MIN_LAST_COMM MAX_LAST_COMM SUM_LAST_COMM COUNT_LAST_COMM
---------- ------------- ------------- ------------- ---------------
        10                                                         0
        20                                                         0
        30                                                         0



so, count(column) is not a constant.

and with repeated data - min, max, sum are definitely NOT the same

ops$tkyte%ORA10GR2> insert into emp (empno,hiredate,comm)
  2  select 1234,sysdate,1000+rownum from all_users;

58 rows created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select deptno,
  2          min(comm) keep (dense_rank last order by hiredate) min_last_comm,
  3          max(comm) keep (dense_rank last order by hiredate) max_last_comm,
  4          sum(comm) keep (dense_rank last order by hiredate) sum_last_comm,
  5          count(comm) keep (dense_rank last order by hiredate) count_last_comm
  6     from emp
  7    group by deptno;

    DEPTNO MIN_LAST_COMM MAX_LAST_COMM SUM_LAST_COMM COUNT_LAST_COMM
---------- ------------- ------------- ------------- ---------------
        10                                                         0
        20                                                         0
        30                                                         0
                    1001          1058         59711              58





Performance of DENSE_RANK FIRST ORDER BY

A reader, July 29, 2008 - 5:50 pm UTC

Why is the SQL that uses "DENSE_RANK FIRST ORDER BY" so much faster than the one that uses row_number?

CREATE TABLE x (
  dept  NUMBER,
  name  VARCHAR2(20),
  sal   NUMBER
);

BEGIN
   FOR i IN 1..1000000 LOOP
      INSERT INTO x VALUES (
         MOD(i, 1000),
         dbms_random.string('A', 20),
         TRUNC(dbms_random.value(10000, 1000000))
      );
   END LOOP;

   COMMIT;
END;
/

set autotrace traceonly
set timing on

SELECT dept, name, sal
FROM   (
        SELECT dept, name, sal,
               row_number() OVER (PARTITION BY dept ORDER BY sal DESC) rn
        FROM   x
       )
WHERE  rn = 1;

1000 rows selected.

Elapsed: 00:00:06.04

Execution Plan
----------------------------------------------------------
Plan hash value: 4121446221

-----------------------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |   989K|    48M|       | 10876   (1)| 00:02:11 |
|*  1 |  VIEW                    |      |   989K|    48M|       | 10876   (1)| 00:02:11 |
|*  2 |   WINDOW SORT PUSHED RANK|      |   989K|    35M|    90M| 10876   (1)| 00:02:11 |
|   3 |    TABLE ACCESS FULL     | X    |   989K|    35M|       |  1079   (2)| 00:00:13 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RN"=1)
   2 - filter(ROW_NUMBER() OVER ( PARTITION BY "DEPT" ORDER BY
              INTERNAL_FUNCTION("SAL") DESC )<=1)

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       4850  consistent gets
          0  physical reads
          0  redo size
      40067  bytes sent via SQL*Net to client
       1111  bytes received via SQL*Net from client
         68  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
       1000  rows processed


SELECT dept, 
       MAX(name) KEEP (DENSE_RANK FIRST ORDER BY sal DESC) name,
       MAX(sal) sal
FROM   x
GROUP  BY dept;

1000 rows selected.

Elapsed: 00:00:01.09

Execution Plan
----------------------------------------------------------
Plan hash value: 2477198225

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   989K|    35M|  1150   (8)| 00:00:14 |
|   1 |  SORT GROUP BY     |      |   989K|    35M|  1150   (8)| 00:00:14 |
|   2 |   TABLE ACCESS FULL| X    |   989K|    35M|  1079   (2)| 00:00:13 |
---------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       4850  consistent gets
          0  physical reads
          0  redo size
      40067  bytes sent via SQL*Net to client
       1111  bytes received via SQL*Net from client
         68  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
       1000  rows processed

set autotrace off
set timing off


Tom Kyte
August 01, 2008 - 10:54 am UTC

you would want to use tkprof, not timing in sqlplus, to see what is going on. for example:

SELECT dept, name, sal
FROM   ( SELECT dept, name, sal, row_number() OVER (PARTITION BY dept ORDER BY sal DESC) rn FROM   x) WHERE  rn = 1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0          1          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch       68      1.32       1.61         26        500          4        1000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       70      1.33       1.62         26        501          4        1000

Rows     Row Source Operation
-------  ---------------------------------------------------
   1000  VIEW  (cr=500 pr=26 pw=26 time=1569382 us)
   5752   WINDOW SORT PUSHED RANK (cr=500 pr=26 pw=26 time=1688642 us)
 100000    TABLE ACCESS FULL X (cr=500 pr=0 pw=0 time=2899972 us)
********************************************************************************
SELECT dept, MAX(name) KEEP (DENSE_RANK FIRST ORDER BY sal DESC) name, MAX(sal) sal FROM   x GROUP  BY dept

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0          1          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch       68      0.69       0.89          0        500          0        1000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       70      0.70       0.91          0        501          0        1000

Rows     Row Source Operation
-------  ---------------------------------------------------
   1000  SORT GROUP BY (cr=500 pr=0 pw=0 time=883211 us)
 100000   TABLE ACCESS FULL X (cr=500 pr=0 pw=0 time=2499976 us)


you can see they have very different plans - one of them is an aggregate, it squishes out rows, it only has to cache the 1000 rows to be returned as it aggregates (eg: only one ROW per dept needs be cached in the sort area, because it is grouping by dept)

the other one needed lots more temp (you can see pw=NN, it actually spilled to disk).

The aggregate IN THIS CASE was more efficient because of the things it could do to the data (keep only one row per dept), the analytic in this case was less efficient because the rows for dept had to be buffered, sorted and then down selected on

first n rows grouped by a column

Jyo, October 28, 2008 - 11:41 am UTC

How do i select the rows from a table that corresponds to first n distinct values of column in that table. For eg

CREATE TABLE TXN_DTLS
(
ID NUMBER(20),
ACC_IDENT VARCHAR2(35),
DTL_ID NUMBER(20),
VALUE_DATE VARCHAR2(8),
STMT_NO NUMBER(6),
SEQ_NO NUMBER(6),
CREDIT number(20,2),
DEBIT number(20,2),
OP_BAL_AMT number(20,2),
CL_BAL_AMT number(20,2)
);

SET DEFINE OFF;

Insert into TXN_DTLS
(ID, ACC_IDENT, DTL_ID, VALUE_DATE, STMT_NO, SEQ_NO, DEBIT, OP_BAL_AMT, CL_BAL_AMT)
Values
(1084, '015004570', 1120, '20080828', 168, 1, 2815000, 49903.62, 49803.62);
Insert into TXN_DTLS
(ID, ACC_IDENT, DTL_ID, VALUE_DATE, STMT_NO, SEQ_NO, DEBIT, OP_BAL_AMT, CL_BAL_AMT)
Values
(1090, '015002284', 1114, '20080828', 168, 2, 10865000, 11603299.39, 743848.62);
Insert into TXN_DTLS
(ID, ACC_IDENT, DTL_ID, VALUE_DATE, STMT_NO, SEQ_NO, DEBIT, OP_BAL_AMT, CL_BAL_AMT)
Values
(1102, '0240001387601', 1164, '20080828', 513, 5, 5.53, 4470174.95, 4461779.16);
Insert into TXN_DTLS
(ID, ACC_IDENT, DTL_ID, VALUE_DATE, STMT_NO, SEQ_NO, DEBIT, OP_BAL_AMT, CL_BAL_AMT)
Values
(1102, '0240001387601', 1164, '20080828', 513, 5, 8.09, 4470174.95, 4461779.16);
Insert into TXN_DTLS
(ID, ACC_IDENT, DTL_ID, VALUE_DATE, STMT_NO, SEQ_NO, DEBIT, OP_BAL_AMT, CL_BAL_AMT)
Values
(1102, '0240001387601', 1164, '20080828', 513, 5, 172, 4470174.95, 4461779.16);
Insert into TXN_DTLS
(ID, ACC_IDENT, DTL_ID, VALUE_DATE, STMT_NO, SEQ_NO, DEBIT, OP_BAL_AMT, CL_BAL_AMT)
Values
(1102, '0240001387601', 1164, '20080828', 513, 5, 650, 4470174.95, 4461779.16);
Insert into TXN_DTLS
(ID, ACC_IDENT, DTL_ID, VALUE_DATE, STMT_NO, SEQ_NO, DEBIT, OP_BAL_AMT, CL_BAL_AMT)
Values
(1199, '0140003608400', 1034, '20080828', 94, 2, 1800, 29062193.29, 23175380.49);
Insert into TXN_DTLS
(ID, ACC_IDENT, DTL_ID, VALUE_DATE, STMT_NO, SEQ_NO, DEBIT, OP_BAL_AMT, CL_BAL_AMT)
Values
(1199, '0140003608400', 1034, '20080828', 94, 2, 10175.3, 29062193.29, 23175380.49);
Insert into TXN_DTLS
(ID, ACC_IDENT, DTL_ID, VALUE_DATE, STMT_NO, SEQ_NO, DEBIT, OP_BAL_AMT, CL_BAL_AMT)
Values
(1199, '0140003608400', 1034, '20080828', 94, 2, 294432, 29062193.29, 23175380.49);
Insert into TXN_DTLS
(ID, ACC_IDENT, DTL_ID, VALUE_DATE, STMT_NO, SEQ_NO, DEBIT, OP_BAL_AMT, CL_BAL_AMT)
Values
(1199, '0140003608400', 1034, '20080828', 94, 2, 446987.29, 29062193.29, 23175380.49);
Insert into TXN_DTLS
(ID, ACC_IDENT, DTL_ID, VALUE_DATE, STMT_NO, SEQ_NO, DEBIT, OP_BAL_AMT, CL_BAL_AMT)
Values
(1200, '0140003608400', 1033, '20080828', 94, 1, 288181.12, 36719544.06, 29062193.29);
Insert into TXN_DTLS
(ID, ACC_IDENT, DTL_ID, VALUE_DATE, STMT_NO, SEQ_NO, DEBIT, OP_BAL_AMT, CL_BAL_AMT)
Values
(1200, '0140003608400', 1033, '20080828', 94, 1, 1349695.45, 36719544.06, 29062193.29);
Insert into TXN_DTLS
(ID, ACC_IDENT, DTL_ID, VALUE_DATE, STMT_NO, SEQ_NO, DEBIT, OP_BAL_AMT, CL_BAL_AMT)
Values
(1200, '0140003608400', 1033, '20080828', 94, 1, 1547138.62, 36719544.06, 29062193.29);
Insert into TXN_DTLS
(ID, ACC_IDENT, DTL_ID, VALUE_DATE, STMT_NO, SEQ_NO, DEBIT, OP_BAL_AMT, CL_BAL_AMT)
Values
(1200, '0140003608400', 1033, '20080828', 94, 1, 2498722.84, 36719544.06, 29062193.29);
COMMIT;

what will be the query to get the rows that corresponds to first 6 distinct ACC_IDENT column values.

i tried

Option -1

SELECT DISTINCT A.* FROM TXN_DTLS A,
(
SELECT ROWNUM ROW_ID, ACC_IDENT FROM (SELECT ACC_IDENT FROM TXN_DTLS WHERE ACC_IDENT IS NOT NULL
AND DEBIT IS NOT NULL GROUP BY ACC_IDENT ORDER BY ACC_IDENT) WHERE ROWNUM<=6
)B
WHERE A.ACC_IDENT=B.ACC_IDENT AND DEBIT IS NOT NULL
ORDER BY A.ACC_IDENT, VALUE_DATE, STMT_NO, SEQ_NO, ID,DEBIT

Option -2

SELECT * FROM
(
SELECT DENSE_RANK() OVER (ORDER BY ACC_IDENT) R,X.* FROM
(
SELECT * FROM
TXN_DTLS
WHERE
DEBIT IS NOT NULL ORDER BY ACC_IDENT, VALUE_DATE, STMT_NO, SEQ_NO, ID,DEBIT
)X
)
WHERE R<=6

help me to get the results with possible queries and which execution plan will perform well.

Tom Kyte
October 28, 2008 - 1:33 pm UTC

...
SELECT * FROM
(
SELECT DENSE_RANK() OVER (ORDER BY ACC_IDENT) R,X.* FROM
(
SELECT * FROM
TXN_DTLS
WHERE
DEBIT IS NOT NULL ORDER BY ACC_IDENT, VALUE_DATE, STMT_NO, SEQ_NO,
ID,DEBIT
)X
)
WHERE R<=6
....

that "ORDER BY ACC_IDENT, VALUE_DATE, STMT_NO, SEQ_NO, ID,DEBIT" is bad for two reasons

a) it is not necessary
b) it might convey to someone that doesn't know better that the data will be sorted by acc_ident, value_date, stmt_no, seq_no, id, debit - it does not have to be.

that order by is totally "not relevant" to the result.


I made it think your table is "big", added an index, and this is probably what you want:

ops$tkyte%ORA10GR2> exec dbms_stats.set_table_stats( user, 'TXN_DTLS', numrows => 1000000, numblks => 100000 );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> @at
ops$tkyte%ORA10GR2> column PLAN_TABLE_OUTPUT format a72 truncate
ops$tkyte%ORA10GR2> set autotrace traceonly explain
ops$tkyte%ORA10GR2> SELECT * FROM
  2  (
  3  SELECT DENSE_RANK() OVER (ORDER BY ACC_IDENT) R, X.*
  4    FROM TXN_DTLS x
  5   WHERE   DEBIT IS NOT NULL
  6  )
  7  WHERE R<=6
  8  order by ACC_IDENT, VALUE_DATE, STMT_NO, SEQ_NO, ID,DEBIT
  9  /

Execution Plan
----------------------------------------------------------
Plan hash value: 1098440754

------------------------------------------------------------------------
| Id  | Operation                      | Name         | Rows  | Bytes |T
------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |              | 50000 |  6933K|
|   1 |  SORT ORDER BY                 |              | 50000 |  6933K|
|*  2 |   VIEW                         |              | 50000 |  6933K|
|*  3 |    WINDOW NOSORT STOPKEY       |              | 50000 |  4882K|
|   4 |     TABLE ACCESS BY INDEX ROWID| TXN_DTLS     | 50000 |  4882K|
|*  5 |      INDEX FULL SCAN           | TXN_DTLS_IDX | 50000 |       |
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("R"<=6)
   3 - filter(DENSE_RANK() OVER ( ORDER BY "ACC_IDENT")<=6)
   5 - filter("DEBIT" IS NOT NULL)

ops$tkyte%ORA10GR2> @atoff
ops$tkyte%ORA10GR2> set autotrace off



use the index to read the table sorted by acc_ident, filtering on the debit is not null bit right in the index - then assign the dense rank as the data flows out and stop after dense rank hits six....


Then sort that tiny bit of data we just retrieved.

A reader, March 02, 2010 - 4:35 am UTC


a couple comments...

Eddie, November 26, 2012 - 7:05 pm UTC

1. use the following instead of the broken link from above:

http://docs.oracle.com/cd/B19306_01/server.102/b14223/analysis.htm#DWHSG0205

2. the review "removing groups based on condition" sounds like a case of Relational Division - using the right terminology is the first step towards finding the right solution...
Tom Kyte
November 29, 2012 - 7:48 am UTC

unless of course "relational division" as a term is meaningless to you in which case saying something like:

a) find the set of docids to remove
b) then remove them.

is generally considered more useful...

last_value() analytical function not working like first_value()

Mahesh, June 24, 2015 - 11:38 am UTC

have written a simple SQL

select deptno
,sal
, ename
, first_value (ename) over ( partition by deptno order by sal , ename ) fv
from emp;
10 1300 MILLER MILLER
10 2450 CLARK MILLER
20 800 SMITH SMITH
20 1100 ADAMS SMITH
20 2975 JONES SMITH
20 3000 FORD SMITH
.........
.........

Its working as expected.
whereas the below behaves differently

select deptno
,sal
, ename
, last_value (ename) over ( partition by deptno order by sal , ename ) fv
from emp;

10 1300 MILLER MILLER
10 2450 CLARK CLARK
20 800 SMITH SMITH
20 1100 ADAMS ADAMS
20 2975 JONES JONES
20 3000 FORD FORD
20 3000 SCOTT SCOTT
30 950 JAMES JAMES
30 1250 MARTIN MARTIN
30 1250 WARD WARD
30 1500 TURNER TURNER

I have used defaulted emp & dept table.Please help ?
Below is the oracle version used
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE 11.2.0.3.0 Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production


Perfect

A reader, December 03, 2016 - 7:11 pm UTC

Thanks a lot

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.