You Asked
1. We receive data in the following format in a collection. We do not have control on this data.
(1, 'abc')
(1, 'def')
(1, 'pqr')
(2, 'xyz')
(2, 'abc')
(3, 'mno')
(4, 'abc')
(4, 'def')
(4, 'pqr')
(4, 'jkl')
2. rec_tab is the intermediate data. We derive the key (k1, k2, k3 etc) based on some logic. The structure is in the form of <record_num, key, value>.
(1, 'k1', 'abc')
(1, 'k2', 'def')
(1, 'k3', 'pqr')
(2, 'k1', 'xyz')
(2, 'k4', 'abc')
(3, 'k2', 'mno')
(4, 'k1', 'abc')
(4, 'k2', 'def')
(4, 'k3', 'pqr')
(4, 'k5', 'jkl')
3. I have to combine the intermediate data based on record_num in the below format. Also based on key value we have to filter records.
e.g. For record_num 4 we do not want to process k5, so (4, 'k5', 'jkl') will be filtered.
(1, 'k1|abc:k2|def:k3|pqr')
(2, 'k1|xyz:k4|abc')
(3, 'k2|mno')
(4, 'k1|abc:k2|def:k3|pqr')
4. Then I have to collect the duplicate record_num. It will be sufficient if we can say duplicate record exists even if we do not identify the duplicate record number.
e.g. in the above case it should return 1 and 4
I want to determine the duplicate record_num while deriving the key. Please suggest. We have control on data type from step 2 onwards.
The below approach starts from step 2:
CREATE OR REPLACE TYPE my_rec AS OBJECT (record_num NUMBER, key VARCHAR2(10), value VARCHAR2(50));
/
CREATE OR REPLACE TYPE rec_tab IS TABLE OF my_rec;
/
CREATE OR REPLACE TYPE dest_rec AS OBJECT (record_num NUMBER, key_val VARCHAR2(100));
/
CREATE OR REPLACE TYPE dest_tab IS TABLE OF dest_rec;
/
declare
lv_rec rec_tab := rec_tab();
lv_dest_tab dest_tab := dest_tab();
lv_tmp_tab dest_tab := dest_tab();
begin
lv_rec.extend;
lv_rec(lv_rec.last) := my_rec(1, 'k1', 'abc');
lv_rec.extend;
lv_rec(lv_rec.last) := my_rec(1, 'k2', 'def');
lv_rec.extend;
lv_rec(lv_rec.last) := my_rec(1, 'k3', 'pqr');
lv_rec.extend;
lv_rec(lv_rec.last) := my_rec(2, 'k1', 'xyz');
lv_rec.extend;
lv_rec(lv_rec.last) := my_rec(2, 'k4', 'abc');
lv_rec.extend;
lv_rec(lv_rec.last) := my_rec(3, 'k2', 'mno');
lv_rec.extend;
lv_rec(lv_rec.last) := my_rec(4, 'k1', 'abc');
lv_rec.extend;
lv_rec(lv_rec.last) := my_rec(4, 'k2', 'def');
lv_rec.extend;
lv_rec(lv_rec.last) := my_rec(4, 'k3', 'pqr');
SELECT dest_rec(record_num, key_val1)
BULK COLLECT INTO lv_dest_tab
FROM
(SELECT record_num, key_val1, COUNT(1) OVER (PARTITION BY key_val1) cnt
FROM (
SELECT record_num, LISTAGG(key_val, ':') WITHIN GROUP (ORDER BY key_val) key_val1
FROM (SELECT record_num, key||'|'||value as key_val FROM (SELECT * FROM TABLE(lv_rec)))
GROUP BY record_num )
) WHERE cnt > 1;
FOR i IN lv_dest_tab.FIRST .. lv_dest_tab.LAST
LOOP
dbms_output.put_line(lv_dest_tab(i).record_num || ', ' || lv_dest_tab(i).key_val);
END LOOP;
end;
/
and Connor said...
Depending on the volume of the incoming data it *might* be quicker to do it all in PLSQL using associative arrays, eg
SQL> set serverout on
SQL> declare
2 type t1 is table of varchar2(4000) index by pls_integer;
3 r1 t1;
4
5 type t2 is table of varchar2(100) index by varchar2(4000);
6 r2 t2;
7
8 idx pls_integer;
9
10 lv_rec rec_tab := rec_tab();
11 lv_dest_tab dest_tab := dest_tab();
12 lv_tmp_tab dest_tab := dest_tab();
13 begin
14 lv_rec.extend;
15 lv_rec(lv_rec.last) := my_rec(1, 'k1', 'abc');
16 lv_rec.extend;
17 lv_rec(lv_rec.last) := my_rec(1, 'k2', 'def');
18 lv_rec.extend;
19 lv_rec(lv_rec.last) := my_rec(1, 'k3', 'pqr');
20 lv_rec.extend;
21 lv_rec(lv_rec.last) := my_rec(2, 'k1', 'xyz');
22 lv_rec.extend;
23 lv_rec(lv_rec.last) := my_rec(2, 'k4', 'abc');
24 lv_rec.extend;
25 lv_rec(lv_rec.last) := my_rec(3, 'k2', 'mno');
26 lv_rec.extend;
27 lv_rec(lv_rec.last) := my_rec(4, 'k1', 'abc');
28 lv_rec.extend;
29 lv_rec(lv_rec.last) := my_rec(4, 'k2', 'def');
30 lv_rec.extend;
31 lv_rec(lv_rec.last) := my_rec(4, 'k3', 'pqr');
32
33 --SELECT dest_rec(record_num, key_val1)
34 --BULK COLLECT INTO lv_dest_tab
35 --FROM
36 --
37 --(SELECT record_num, key_val1, COUNT(1) OVER (PARTITION BY key_val1) cnt
38 --FROM (
39 --SELECT record_num, LISTAGG(key_val, ':') WITHIN GROUP (ORDER BY key_val) key_val1
40 --FROM (SELECT record_num, key||'|'||value as key_val FROM (SELECT * FROM TABLE(lv_rec)))
41 --GROUP BY record_num )
42 --) WHERE cnt > 1;
43
44
45 --FOR i IN lv_dest_tab.FIRST .. lv_dest_tab.LAST
46 --LOOP
47 --dbms_output.put_line(lv_dest_tab(i).record_num || ', ' || lv_dest_tab(i).key_val);
48 --END LOOP;
49
50
51 for i in 1 .. lv_Rec.count loop
52 if r1.exists(lv_Rec(i).record_num) then
53 r1(lv_Rec(i).record_num) := r1(lv_Rec(i).record_num) || lv_Rec(i).key||'|'||lv_Rec(i).value;
54 else
55 r1(lv_Rec(i).record_num) := lv_Rec(i).key||'|'||lv_Rec(i).value;
56 end if;
57 end loop;
58
59 idx := r1.first;
60 loop
61 if r2.exists(r1(idx)) then
62 dbms_output.put_line(idx||' dup with '||r2(r1(idx)));
63 else
64 r2(r1(idx)) := idx;
65 end if;
66 idx := r1.next(idx);
67 exit when idx is null;
68 end loop;
69
70 end;
71 /
4 dup with 1
PL/SQL procedure successfully completed.
but that probably wont scale as well as the SQL option if you've got tens of thousands of rows.
Is this answer out of date? If it is, please let us know via a Comment