Skip to Main Content
  • Questions
  • CONNECT BY - multiple start nodes (possible cartesian product)

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Andrew.

Asked: November 08, 2010 - 1:27 pm UTC

Last updated: November 10, 2010 - 8:39 am UTC

Version: 10.2.0.4

Viewed 10K+ times! This question is

You Asked

I could be going mad, but have got unexpected results from use of START WITH...CONNECT BY...PRIOR syntax, and wanted to confirm my understanding in this scenario.

We have a table with a hiearchical data structure, used to hold details of related transactions (parents and children).

Where the ORIGINAL, PARENT and REPLACEMENT columns are NULL, this is the top of the structure – i.e. REFNO 327147 in example below.

A child record can have either the PARENT or REPLACEMENT set, not both.

E.G.
REFNO ORIGINAL PARENT REPLACEMENT
327147
327171 327147 327147
327172 327147 327147
473116 327147 <NULL> 327171
473118 327147 <NULL> 327172
596518 327147 <NULL> 473118
598455 327147 596518
598456 327147 596518
598470 327147 596518

When I write a query to traverse the tree starting from a point where the REPLACEMENT column equals our candidate start points, if there are multiple start points this seems to cause a Cartesian product...

Reproducible with:

create table transactions(REFNO NUMBER(10),ORIGINAL NUMBER(10),PARENT NUMBER(10),REPLACEMENT NUMBER(10));

insert into transactions values(327147,NULL,NULL,NULL);
insert into transactions values(327171,327147,327147,NULL);
insert into transactions values(327172,327147,327147,NULL);
insert into transactions values(473116,327147,NULL,327171);
insert into transactions values(473118,327147,NULL,327172);
insert into transactions values(596518,327147,NULL,473118);
insert into transactions values(598455,327147,596518,NULL);
insert into transactions values(598456,327147,596518,NULL);
insert into transactions values(598470,327147,596518,NULL);

select t.*,CONNECT_BY_ISLEAF leaf_node /* Used to get lowest level leaf node */
from transactions t,(select '327171' start_candidate FROM dual UNION ALL select '327172' start_candidate FROM dual) candids
start with t.replacement = candids.start_candidate connect by NVL(t.replacement,t.parent) = prior t.refno
order by nvl(original,refno),nvl(replacement,parent),refno

This can be seen by tweaking query above to include a "WHERE 1=2" clause for one or other of the in-line view "select ... FROM dual"...
I.E.
i) when only 327171 is returned by in-line view, the query returns 1 record
ii) when only 327172 is returned by in-line view, the query returns 5 records
iii) when both are returned, the query returns 16 records !!!

It seems to be related to my use of NVL, but if I replace this with a CASE statement or a DECODE is still does the same thing.

So, I thought that the DBMS would start with each node specified in the use of "START WITH" clause just once, then separately traverse down the tree walk - i.e. start with 327171 and traverse down the tree structure returning that result set, THEN start with 327172 and traverse down the tree structure returning that result set.

What I seem to have is a munged together tree walk...

HELP !

and Tom said...



You have a cartesian join between T and CANDIDS (start with does NOT 'join', it selects rows from a join - you have a cartesian join between T and candids - you doubled all of the rows)


ops$tkyte%ORA10GR2> select rpad( '*', level, '*') hier, t.*,CONNECT_BY_ISLEAF leaf_node
  2  from transactions t
  3  start with t.replacement in (select '327171' start_candidate FROM dual UNION ALL select '327172' start_candidate FROM dual)
  4  connect by NVL(t.replacement,t.parent) = prior t.refno
  5  /

HIER            REFNO   ORIGINAL     PARENT REPLACEMENT  LEAF_NODE
---------- ---------- ---------- ---------- ----------- ----------
*              473116     327147                 327171          1
*              473118     327147                 327172          0
**             596518     327147                 473118          0
***            598455     327147     596518                      1
***            598456     327147     596518                      1
***            598470     327147     596518                      1

6 rows selected.



is probably what you meant to do.

Is this answer out of date? If it is, please let us know via a Comment