Hello All,
It would be greatly appreciated if someone can provide a way out on how to extract distinct group/array of associated (they are not of parent-child relationship or hierarchical) records as shown below. The same table has been created with sample dataset which you will find in the LiveSQL link below -
https://livesql.oracle.com/apex/livesql/s/hodwakb4rv2siftyzv1juvc4i
CREATE TABLE TEST_DATA
(
SRC_REC NUMBER,
TGT_REC NUMBER
);
INSERT INTO TEST_DATA (SRC_REC, TGT_REC) VALUES(100, 101);
INSERT INTO TEST_DATA (SRC_REC, TGT_REC) VALUES(100, 102);
INSERT INTO TEST_DATA (SRC_REC, TGT_REC) VALUES(103, 102);
INSERT INTO TEST_DATA (SRC_REC, TGT_REC) VALUES(103, 104);
INSERT INTO TEST_DATA (SRC_REC, TGT_REC) VALUES(102, 105);
INSERT INTO TEST_DATA (SRC_REC, TGT_REC) VALUES(108, 110);
INSERT INTO TEST_DATA (SRC_REC, TGT_REC) VALUES(107, 106);
INSERT INTO TEST_DATA (SRC_REC, TGT_REC) VALUES(109, 111);
INSERT INTO TEST_DATA (SRC_REC, TGT_REC) VALUES(111, 107);
COMMIT;
Array 1: 100, 101, 102, 103, 104, 105
Array 2: 106, 107, 111
Array 3: 108, 110
This is a form of graph traversal problem. Which is tricky in SQL.
But you can do it using connect by. Or recursive with if you prefer.
For each row, recursively join it to every other row any of the following are true:
- The source = target (and vice-versa)
- The source = source and they have different targets
- The target = target and they have different sources
Include the root row for the source. This will loop back on itself, so you need to add cycle detection.
select src_rec, tgt_rec,
connect_by_root src_rec rt
from test_data
connect by nocycle
prior src_rec = tgt_rec or
prior tgt_rec = src_rec or
( prior src_rec = src_rec and prior tgt_rec <> tgt_rec ) or
( prior tgt_rec = tgt_rec and prior src_rec <> src_rec );
SRC_REC TGT_REC RT
100 101 100
100 102 100
103 102 100
103 104 100
102 105 100
102 105 100
103 102 100
103 104 100
100 102 100
100 101 100
103 102 100
103 104 100
102 105 100
102 105 100
103 102 100
103 104 100
...
This will generate the subgraph for all rows. To convert this to lists:
- Unpivot the source and target columns
- Return the resulting distinct root and values
- Use listagg to get a comma-separated list of the subgraph for each root
- Return the distinct list of these CSVs
Which gives:
with tree ( src, tgt, rt ) as (
select src_rec, tgt_rec,
connect_by_root src_rec rt
from test_data
connect by nocycle
prior src_rec = tgt_rec or
prior tgt_rec = src_rec or
( prior src_rec = src_rec and prior tgt_rec <> tgt_rec ) or
( prior tgt_rec = tgt_rec and prior src_rec <> src_rec )
), vals as (
select distinct rt, val
from tree
unpivot ( val for col in ( src, tgt ) )
)
select distinct listagg ( val , ',' )
within group ( order by val ) grp
from vals
group by rt;
GRP
106,107,109,111
108,110
100,101,102,103,104,105
Warning: you're generating the tree for every row. So on large data sets this likely to take a while!