Skip to Main Content
  • Questions
  • CONNECT BY vs recursive Common Table Expressions

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Duke.

Asked: December 12, 2005 - 11:05 am UTC

Last updated: June 23, 2019 - 9:34 pm UTC

Version: 9.2.0

Viewed 10K+ times! This question is

You Asked

Oracle has had the CONNECT BY syntax for some time. Competitors like DB2 and SQL Server 2005 use a "recursive Common Table Expression (CTE)", that is, a recursive named subquery, to achieve similar results:

</code> http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0510rielau/ http://msdn2.microsoft.com/en-us/library/ms175972.aspx <code>

Oracle 10gR2, unless I missed it in the docs, still doesn't allow recursive named queries. Is this because CONNECT BY does at least what recursive queries allow (particularly since the ORA-01436 error has been removed for at least some cases)? In other words, recursive CTE would be no more than a syntax change, something like going from Oracle outer joins to ANSI join syntax?


and we said...

Well, I'll say connect by has been around for some time....


Anyone care to guess what version/year?


Version 2, 1977.


so, yes, connect by is an oldie but goodie. (and recursive queries in DB2 - they don't have the 1436 or NOCYCLE, they can be infinite loops according to the link you provided - careful with that...)


I don't know about you - but I find the understandability of:

1 SELECT name
2 FROM emp
3 START WITH name = 'Goyal'
4 CONNECT BY PRIOR empid = mgrid

in Oracle versus:

1 WITH n(empid, name) AS
2 (SELECT empid, name
3 FROM emp
4 WHERE name = 'Goyal'
5 UNION ALL
6 SELECT nplus1.empid, nplus1.name
7 FROM emp as nplus1, n
8 WHERE n.empid = nplus1.mgrid)
9 SELECT name FROM n;

in DB2 et.al. to be night and day difference. I'm not sure what the motivation to introduce this syntax would be entirely. Given that the databases are different and what works on DB2 doesn't necessarily work on SQLSserver and what works on SQLServer doesn't necessarily work on Oracle and in fact:

What works on Database X
does not necessarily work
on Database Y

is true.... I am not aware of any move to provide this syntax (especially given the fact that the HIERARCHY is busted totally with this "recursive" approach, the Oracle query returns:

1 SELECT level, rpad('*',2*level,'*')||name name
2 FROM emp
3 START WITH name = 'Goyal'
4* CONNECT BY PRIOR empid = mgrid
ops$tkyte@ORA10GR2> /

LEVEL NAME
---------- --------------------
1 **Goyal
2 ****Zander
3 ******McKeough
3 ******Barnes
2 ****Henry
3 ******O'Neil
3 ******Smith
3 ******Shoeman
2 ****Scott

9 rows selected.


whereas the db2 query produces this:

1 WITH n(level, empid, name) AS
2 (SELECT 1, empid, name
3 FROM emp
4 WHERE name = 'Goyal'
5 UNION ALL
6 SELECT n.level + 1, nplus1.empid, nplus1.name
7 FROM emp as nplus1, n
8 WHERE n.empid = nplus1.mgrid)
9 SELECT level, name FROM n;

LEVEL NAME
----------- ----------
1 Goyal
2 Zander
2 Henry
2 Scott
3 McKeough
3 Barnes
3 O'Neil
3 Smith
3 Shoeman

9 record(s) selected


I see we can simulate it on a "case by case basis, sometimes" by writing a UDF connect_by_pos function written in C, we could code this:


1 WITH source(empid, name, salary, mgrid, rownum) AS
2 (SELECT empid, name, salary, mgrid,
3 ROW_NUMBER() OVER(ORDER BY salary)
4 FROM emp),
5 n(empid, name, salary, level, pos) AS
6 (SELECT empid, name, salary, 1,
7 CONNECT_BY_POS('', 0, rownum)
8 FROM source
9 WHERE name = 'Goyal'
10 UNION ALL
11 SELECT nplus1.empid, nplus1.name,
12 nplus1.salary, level + 1,
13 CONNECT_BY_POS(n.pos, level, rownum)
14 FROM source AS nplus1, n
15 WHERE n.empid = nplus1.mgrid)
16 SELECT name, level, salary
17 FROM n
18 ORDER BY pos;

NAME LEVEL SALARY
---------- ----------- -----------
Goyal 1 80000.00
Henry 2 51000.00
Shoeman 3 33000.00
Smith 3 34000.00
O'Neil 3 36000.00
Zander 2 52000.00
Barnes 3 41000.00
McKeough 3 42000.00
Scott 2 53000.00

9 record(s) selected


I'll stick with connect by for now if you don't mind.

Addenda: June 2019

Note that recursive WITH was added in 11.2 of the Oracle database engine, so now you can use CONNECT BY *or* recursive WITH.

Rating

  (9 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Recursive SQL

Mikito Harakiri, December 12, 2005 - 2:17 pm UTC

Ordering siblings can be done by global ordering of path, e.g.

/Goyal/Zander/41000
(that was Barnes salary)
/Goyal/Zander/42000
(that was McKeough salary)

Those strings are easily produced without resorting to UDF.

Each method has its own advantages. One thing that is surprising about DB2 recursive SQL implementation is that they can't calculate transitive closure for arbitrary graphs easily (i.e. without intrroducing UDF). One tiny deviation from ANSI standard -- "union all" instead of "union" is responsible for that.

On the other hand many queries look natural in recursive SQL. E.g.

-- integer generator
with Integers (num) as
( values(1)
union all
select num+1 from Integers
where num < 100
)
select * from Integers

-- decomposition comma separated string into pieces
with decomposed( pref, post ) as (
select '', lst from A
union
select substr(lst, 1, instr(lst,',')),
substr(lst, instr(lst,',')+1, length(lst)) from decomposed
) select pref from decomposed where pref <> ''

Subset queries:
</code> http://forums.oracle.com/forums/message.jspa?messageID=1134599#1134599 <code>


Tom Kyte
December 12, 2005 - 2:30 pm UTC

got to make them fixed length of course. and worry about negative numbers, don't forget to pad out nulls. and so on.


Thanks, but I'll stick with connect by thanks very much. Even if recursive with queries come about, I cannot imagine using them.

Pitfalls

Mikito Harakiri, December 12, 2005 - 3:01 pm UTC

Easily fixed:

Order by composite criteria
<path to parent> (string), <salary> (numeric)

IMO, recursive with feature is much more interesting than the latest xml fad or some obscure SQL spreadsheet functionality that vendors are vigorously chasing nowadays. Unfortunately, vendor's priorities are frequently defined by the interests of RDBMS engine development team, or users demand. Got a developer interested, or users are screaming about some feature -- you have it.


Tom Kyte
December 12, 2005 - 4:12 pm UTC

not so fast.

I want to order by A, B, C, D

a, b, c, d are all numbers. can be negative. and remember, there is no trivial "path to parent" (it was neat to see such of huge page to replicate a couple of pieces of small functionality, i enjoyed that and have that bookmarked for the future)

Umm...

Bob B, December 12, 2005 - 3:53 pm UTC

Question for Mikito ... who *should* Oracle (<insert any other software company>) be developing stuff for if its not something the developers want to add AND not something the users want? That's sort of like saying "Lets build a building that no one want to use and that no architect wants to make."

Bart, December 13, 2005 - 2:33 am UTC

'Lets build a building that no one want to use and that no architect wants to make'.

NB
Unfortunately architects WANT to make buildings that are one of a kind (and can safely and objectively ;) be called ugly).

Unfortunately softwarearchitects are no different. If it isn't something they cook, it can't be tasty. Amazing to see how managers are manipulated to accept and to agree with 'new' concepts and architectures.


Fortunately, Oracle has a unique policy. Truly amazing to see how they implement new features without breaking their philosopy. They must have some extraordinary architects...

Row order shouldn't matter

Matthew, December 21, 2005 - 11:45 am UTC

I think something that should be pointed out is that the examples given appear to be intended for direct human consumption of the result and that they rely on non-standard-compliant behavior. Your result with CONNECT BY:

LEVEL NAME
---------- --------------------
1 **Goyal
2 ****Zander
3 ******McKeough
3 ******Barnes
2 ****Henry
3 ******O'Neil
3 ******Smith
3 ******Shoeman
2 ****Scott

9 rows selected.

Requires the rows to be returned in this specific order for the consumer to infer the real hierarchy from the resultset.

Note that the CTE method in SQL and DB2 returns a set that looks like:

LEVEL NAME
----------- ----------
1 Goyal
2 Zander
2 Henry
2 Scott
3 McKeough
3 Barnes
3 O'Neil
3 Smith
3 Shoeman

9 record(s) selected

This is exactly in following with the RDBMS concept that unless you specify an ordering, the system may return the rows in an arbitrary and unpredictable manner and further, that the order may differ between executions of the query. In effect, the order _shouldn't_ matter unless you tell the query engine that it does. Making the statement "that the HIERARCHY is busted totally" in DB2's result set doesn't seem quite fair since it is compliant with SQL and does not let the programmer imply meaning from the ordering of the rows without specifying that order.

I don't know if Oracle garantees the order of the rows when you use CONNECT BY or if they just happen to come out that way, but if they changed the engine such that in each case, the row for SCOTT and HENRY changes places in the order returned (but all other rows come out in their original positions), then the semantic meaning of the CONNECT BY query changes but the CTE query does not. This is entirely due to taking meaning from the order of the rows which is not compliant with the SQL standard.

Tom Kyte
December 21, 2005 - 7:47 pm UTC

connect by does this though, IT IS a hierarchical query - by design. it was adde in version 2 of Oracle to support BOM (bills of material) explosion...

and there is order SIBLINGS by to order within the hierarchy

Different uses

Matthew, December 22, 2005 - 9:51 am UTC

Thanks for the follow up... I guess this is more of a specific use feature that makes certain tasks much simpler than if you were doing the same thing with a CTE. I think it would be a good thing if Oracle were to add CTEs in a future release, though I agree, if CONNECT BY provides the functionality you're looking for in a query, it is a faster/simpler approach to use for certain cases.

Tom Kyte
December 22, 2005 - 11:03 am UTC

we do have "cte".

we do not have "recursive" ones.

CTE = "WITH clause" or "named subquery"

Duke Ganote, December 27, 2005 - 11:44 am UTC

Not sure why Oracle doesn't just the term "common table expression", but it's the same as:
</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:4423923392083 <code>

Connect BY vs Common Table Expressions

Jim Stokes, May 30, 2008 - 8:03 am UTC

Just a late note - I am dealing with an Object oriented database in which a translation from Oracle to Sql Server is being attempted. The CONNECT BY of Oracle 'connects' by the values given NOT a full recursion of the values given.

Thank you for all of your comments. I have made a point to all-powers-to-be of the difficulties of moving functionality from 1 database to another

Presently a sql server dba , always enjoying oracle

recursive subquery factoring in 11gR2

Duke Ganote, September 01, 2009 - 9:06 pm UTC

New in 11gR2 http://tinyurl.com/l57l8t

Recursive Subquery Factoring

If a subquery_factoring_clause refers to its own query_name in the subquery that defines it, then the subquery_factoring_clause is said to be recursive. A recursive subquery_factoring_clause must contain two query blocks: the first is the anchor member and the second is the recursive member. The anchor member must appear before the recursive member, and it cannot reference query_name. The anchor member can be composed of one or more query blocks combined by the set operators: UNION ALL, UNION, INTERSECT or MINUS. The recursive member must follow the anchor member and must reference query_name exactly once. You must combine the recursive member with the anchor member using the UNION ALL set operator.

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.