first n rows grouped by a column
Jyo, October 28, 2008 - 11:41 am UTC
How do i select the rows from a table that corresponds to first n distinct values of column in that table. For eg
CREATE TABLE TXN_DTLS
(
ID NUMBER(20),
ACC_IDENT VARCHAR2(35),
DTL_ID NUMBER(20),
VALUE_DATE VARCHAR2(8),
STMT_NO NUMBER(6),
SEQ_NO NUMBER(6),
CREDIT number(20,2),
DEBIT number(20,2),
OP_BAL_AMT number(20,2),
CL_BAL_AMT number(20,2)
);
SET DEFINE OFF;
Insert into TXN_DTLS
(ID, ACC_IDENT, DTL_ID, VALUE_DATE, STMT_NO, SEQ_NO, DEBIT, OP_BAL_AMT, CL_BAL_AMT)
Values
(1084, '015004570', 1120, '20080828', 168, 1, 2815000, 49903.62, 49803.62);
Insert into TXN_DTLS
(ID, ACC_IDENT, DTL_ID, VALUE_DATE, STMT_NO, SEQ_NO, DEBIT, OP_BAL_AMT, CL_BAL_AMT)
Values
(1090, '015002284', 1114, '20080828', 168, 2, 10865000, 11603299.39, 743848.62);
Insert into TXN_DTLS
(ID, ACC_IDENT, DTL_ID, VALUE_DATE, STMT_NO, SEQ_NO, DEBIT, OP_BAL_AMT, CL_BAL_AMT)
Values
(1102, '0240001387601', 1164, '20080828', 513, 5, 5.53, 4470174.95, 4461779.16);
Insert into TXN_DTLS
(ID, ACC_IDENT, DTL_ID, VALUE_DATE, STMT_NO, SEQ_NO, DEBIT, OP_BAL_AMT, CL_BAL_AMT)
Values
(1102, '0240001387601', 1164, '20080828', 513, 5, 8.09, 4470174.95, 4461779.16);
Insert into TXN_DTLS
(ID, ACC_IDENT, DTL_ID, VALUE_DATE, STMT_NO, SEQ_NO, DEBIT, OP_BAL_AMT, CL_BAL_AMT)
Values
(1102, '0240001387601', 1164, '20080828', 513, 5, 172, 4470174.95, 4461779.16);
Insert into TXN_DTLS
(ID, ACC_IDENT, DTL_ID, VALUE_DATE, STMT_NO, SEQ_NO, DEBIT, OP_BAL_AMT, CL_BAL_AMT)
Values
(1102, '0240001387601', 1164, '20080828', 513, 5, 650, 4470174.95, 4461779.16);
Insert into TXN_DTLS
(ID, ACC_IDENT, DTL_ID, VALUE_DATE, STMT_NO, SEQ_NO, DEBIT, OP_BAL_AMT, CL_BAL_AMT)
Values
(1199, '0140003608400', 1034, '20080828', 94, 2, 1800, 29062193.29, 23175380.49);
Insert into TXN_DTLS
(ID, ACC_IDENT, DTL_ID, VALUE_DATE, STMT_NO, SEQ_NO, DEBIT, OP_BAL_AMT, CL_BAL_AMT)
Values
(1199, '0140003608400', 1034, '20080828', 94, 2, 10175.3, 29062193.29, 23175380.49);
Insert into TXN_DTLS
(ID, ACC_IDENT, DTL_ID, VALUE_DATE, STMT_NO, SEQ_NO, DEBIT, OP_BAL_AMT, CL_BAL_AMT)
Values
(1199, '0140003608400', 1034, '20080828', 94, 2, 294432, 29062193.29, 23175380.49);
Insert into TXN_DTLS
(ID, ACC_IDENT, DTL_ID, VALUE_DATE, STMT_NO, SEQ_NO, DEBIT, OP_BAL_AMT, CL_BAL_AMT)
Values
(1199, '0140003608400', 1034, '20080828', 94, 2, 446987.29, 29062193.29, 23175380.49);
Insert into TXN_DTLS
(ID, ACC_IDENT, DTL_ID, VALUE_DATE, STMT_NO, SEQ_NO, DEBIT, OP_BAL_AMT, CL_BAL_AMT)
Values
(1200, '0140003608400', 1033, '20080828', 94, 1, 288181.12, 36719544.06, 29062193.29);
Insert into TXN_DTLS
(ID, ACC_IDENT, DTL_ID, VALUE_DATE, STMT_NO, SEQ_NO, DEBIT, OP_BAL_AMT, CL_BAL_AMT)
Values
(1200, '0140003608400', 1033, '20080828', 94, 1, 1349695.45, 36719544.06, 29062193.29);
Insert into TXN_DTLS
(ID, ACC_IDENT, DTL_ID, VALUE_DATE, STMT_NO, SEQ_NO, DEBIT, OP_BAL_AMT, CL_BAL_AMT)
Values
(1200, '0140003608400', 1033, '20080828', 94, 1, 1547138.62, 36719544.06, 29062193.29);
Insert into TXN_DTLS
(ID, ACC_IDENT, DTL_ID, VALUE_DATE, STMT_NO, SEQ_NO, DEBIT, OP_BAL_AMT, CL_BAL_AMT)
Values
(1200, '0140003608400', 1033, '20080828', 94, 1, 2498722.84, 36719544.06, 29062193.29);
COMMIT;
what will be the query to get the rows that corresponds to first 6 distinct ACC_IDENT column values.
i tried
Option -1
SELECT DISTINCT A.* FROM TXN_DTLS A,
(
SELECT ROWNUM ROW_ID, ACC_IDENT FROM (SELECT ACC_IDENT FROM TXN_DTLS WHERE ACC_IDENT IS NOT NULL
AND DEBIT IS NOT NULL GROUP BY ACC_IDENT ORDER BY ACC_IDENT) WHERE ROWNUM<=6
)B
WHERE A.ACC_IDENT=B.ACC_IDENT AND DEBIT IS NOT NULL
ORDER BY A.ACC_IDENT, VALUE_DATE, STMT_NO, SEQ_NO, ID,DEBIT
Option -2
SELECT * FROM
(
SELECT DENSE_RANK() OVER (ORDER BY ACC_IDENT) R,X.* FROM
(
SELECT * FROM
TXN_DTLS
WHERE
DEBIT IS NOT NULL ORDER BY ACC_IDENT, VALUE_DATE, STMT_NO, SEQ_NO, ID,DEBIT
)X
)
WHERE R<=6
help me to get the results with possible queries and which execution plan will perform well.
October 28, 2008 - 1:33 pm UTC
...
SELECT * FROM
(
SELECT DENSE_RANK() OVER (ORDER BY ACC_IDENT) R,X.* FROM
(
SELECT * FROM
TXN_DTLS
WHERE
DEBIT IS NOT NULL ORDER BY ACC_IDENT, VALUE_DATE, STMT_NO, SEQ_NO,
ID,DEBIT
)X
)
WHERE R<=6
....
that "ORDER BY ACC_IDENT, VALUE_DATE, STMT_NO, SEQ_NO, ID,DEBIT" is bad for two reasons
a) it is not necessary
b) it might convey to someone that doesn't know better that the data will be sorted by acc_ident, value_date, stmt_no, seq_no, id, debit - it does not have to be.
that order by is totally "not relevant" to the result.
I made it think your table is "big", added an index, and this is probably what you want:
ops$tkyte%ORA10GR2> exec dbms_stats.set_table_stats( user, 'TXN_DTLS', numrows => 1000000, numblks => 100000 );
PL/SQL procedure successfully completed.
ops$tkyte%ORA10GR2> @at
ops$tkyte%ORA10GR2> column PLAN_TABLE_OUTPUT format a72 truncate
ops$tkyte%ORA10GR2> set autotrace traceonly explain
ops$tkyte%ORA10GR2> SELECT * FROM
2 (
3 SELECT DENSE_RANK() OVER (ORDER BY ACC_IDENT) R, X.*
4 FROM TXN_DTLS x
5 WHERE DEBIT IS NOT NULL
6 )
7 WHERE R<=6
8 order by ACC_IDENT, VALUE_DATE, STMT_NO, SEQ_NO, ID,DEBIT
9 /
Execution Plan
----------------------------------------------------------
Plan hash value: 1098440754
------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |T
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 50000 | 6933K|
| 1 | SORT ORDER BY | | 50000 | 6933K|
|* 2 | VIEW | | 50000 | 6933K|
|* 3 | WINDOW NOSORT STOPKEY | | 50000 | 4882K|
| 4 | TABLE ACCESS BY INDEX ROWID| TXN_DTLS | 50000 | 4882K|
|* 5 | INDEX FULL SCAN | TXN_DTLS_IDX | 50000 | |
------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("R"<=6)
3 - filter(DENSE_RANK() OVER ( ORDER BY "ACC_IDENT")<=6)
5 - filter("DEBIT" IS NOT NULL)
ops$tkyte%ORA10GR2> @atoff
ops$tkyte%ORA10GR2> set autotrace off
use the index to read the table sorted by acc_ident, filtering on the debit is not null bit right in the index - then assign the dense rank as the data flows out and stop after dense rank hits six....
Then sort that tiny bit of data we just retrieved.