Skip to Main Content
  • Questions
  • Connect By with multiple tables - what clauses are needed?

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, John.

Asked: June 07, 2017 - 10:36 am UTC

Last updated: July 08, 2019 - 3:08 am UTC

Version: 11.2.0.4

Viewed 50K+ times! This question is

You Asked

I have a question on something that I always get confused on. Most of the demo's on "Connect By" feature a single table and the "Connect" Clause is simply

Connect By Prior emp_id = mgr_id 


or similar. However I have a situation whereby I need to join several tables together before doing the connect by - because I need to order the result set siblings by a field on a table that is several tables way from the hierarchical structure table.

To produce a simple example consider I have a table which can hold multiple copies of the employee hierarchy, with one of those being flagged as the primary hierarchy.

Create View hierarchy_versions As
  Select 1 version, 'N' primary_hier From dual Union All
  Select 2 version, 'Y' primary_hier From dual;
  
Create View emp_hierarchy As
  Select employee_id, first_name, last_name, manager_id, 1 version From hr.employees Union All
  Select employee_id, first_name, last_name, manager_id, 2 version From hr.employees;


Now if I want to query back my primary hierarchy, my first instinct would be to do this:

Select level, first_name, last_name
  From emp_hierarchy h,
       hierarchy_versions v
 Where h.version = v.version
   And v.primary_hier = 'Y'
Start With h.employee_id = 100
Connect By Prior h.employee_id = h.manager_id;


Because as you can see, I have restricted the hierarchy_versions table to only return one row (primary_hier = 'Y'). However what I find is that gives me completely the wrong results. Instead I have to do this:

Select level, first_name, last_name
  From emp_hierarchy h,
       hierarchy_versions v
 Where h.version = v.version
   And v.primary_hier = 'Y'
Start With h.employee_id = 100
Connect By Prior h.employee_id = h.manager_id
       And Prior h.version = h.version;


It seems as though Oracle doesn't perform the query to get a result set (i.e. all employees in the primary hierarchy) and then start chaining the rows together - instead it seems to chain before the predicates in the where clause are applied. Is that right?

1) So in a multiple table query with a connect by, which tables (or keys thereof) do I need to reference in my connect by clause?

2) If in my example above I didn't have the employee name on the hierarchy table, only on another table, i.e. emp_names, and thus I have to join off to that table to get the names, do I now (theoretically) need to add that into the Connect By?

Create View emp_names As 
  Select employee_id, first_name, last_name From hr.employees;


Select level, n.first_name, n.last_name
  From emp_hierarchy h,
       hierarchy_versions v, 
       emp_names n
 Where h.version = v.version
   And n.employee_id = h.employee_id
   And v.primary_hier = 'Y'
Start With h.employee_id = 100
Connect By Prior h.employee_id = h.manager_id
       And Prior h.version = h.version;


Assume that emp_names table actually holds multiple names - in different languages:

Create or Replace View emp_names As 
  Select employee_id, first_name, last_name, 'UK' lang From hr.employees Union All
  Select employee_id, first_name, last_name || 'O', 'SPANISH' lang From hr.employees;


And I want to view my hierarchy with the Spanish names:

Select level, n.first_name, n.last_name
  From emp_hierarchy h,
       hierarchy_versions v, 
       emp_names n
 Where h.version = v.version
   And n.employee_id = h.employee_id
   And v.primary_hier = 'Y'
   And n.lang='SPANISH'
Start With h.employee_id = 100
Connect By Prior h.employee_id = h.manager_id
       And Prior h.version = h.version;


It again breaks. The fix is to add "And Prior n.lang=n.lang".

So is it where I add any table that is join AND constrained by a predicate (i.e. LANG above), I need to add those columns into the Connect By??

with LiveSQL Test Case:

and Chris said...

So, the order Oracle Database processes connect by is:

A join, if present, is evaluated first, whether the join is specified in the FROM clause or with WHERE clause predicates.

The CONNECT BY condition is evaluated.

Any remaining WHERE clause predicates are evaluated.


https://docs.oracle.com/database/122/SQLRF/Hierarchical-Queries.htm#SQLRF52332

So when you have:

Select level, first_name, last_name
  From emp_hierarchy h,
       hierarchy_versions v
 Where h.version = v.version
   And v.primary_hier = 'Y'
Start With h.employee_id = 100
Connect By Prior h.employee_id = h.manager_id;


First it joins emp_hierarchy to hierarchy_versions. So you have two rows for each employee.

Then it processes the connect by.

At this point you have two rows for employee 100:

EMPLOYEE_ID  FIRST_NAME   LAST_NAME    MANAGER_ID  VERSION  VERSION  PRIMARY_HIER  
100          Steven       King                     1        1        N             
100          Steven       King                     2        2        Y 


And two rows for each of his direct reports:

EMPLOYEE_ID  FIRST_NAME   LAST_NAME    MANAGER_ID  VERSION  VERSION  PRIMARY_HIER
101          Neena        Kochhar      100         1        1        N             
101          Neena        Kochhar      100         2        2        Y             
102          Lex          De Haan      100         1        1        N             
102          Lex          De Haan      100         2        2        Y
... 


You're only joining rows where manager_id equals the previous employee_id. So Neena's non primary row joins to both of Steven's. Same for her primary. So you get four rows for Neena. And so on down the hierarchy. At this point your data looks like:

LEVEL  EMPLOYEE_ID  FIRST_NAME  LAST_NAME  MANAGER_ID  VERSION  VERSION  PRIMARY_HIER  
1      100          Steven      King                   2        2        Y             
1      100          Steven      King                   1        1        N             
2      101          Neena       Kochhar    100         2        2        Y             
2      101          Neena       Kochhar    100         1        1        N             
2      101          Neena       Kochhar    100         2        2        Y             
2      101          Neena       Kochhar    100         1        1        N


Once it's built the tree, the database finally filters out the rows where the primary isn't 'Y'. But Neena still has two rows where primary = 'Y'! This is why you end up doubling all the children down the tree.

If you're not sure which order the operations happen, I find it's helpful to look at the execution plan with rowstats and predicates:

set serveroutput off
alter session set statistics_level = all;
Select level, h.*, v.*
  From emp_hierarchy h,
       hierarchy_versions v
 Where h.version = v.version
   And v.primary_hier = 'Y'
Start With h.employee_id = 100
Connect By Prior h.employee_id = h.manager_id
order by level, employee_id;

select *
from   table ( dbms_xplan.display_cursor (
    null,
    null,
    'ROWSTATS LAST +PREDICATE'
  ) );

PLAN_TABLE_OUTPUT                                                                                    
SQL_ID  40hvc9sm8t0jb, child number 1                                                                
-------------------------------------                                                                
Select level, h.*, v.*   From emp_hierarchy h,                                                       
hierarchy_versions v  Where h.version = v.version    And v.primary_hier                              
= 'Y' Start With h.employee_id = 100 Connect By Prior h.employee_id =                                
h.manager_id order by level, employee_id                                                             
                                                                                                     
Plan hash value: 1579310819                                                                          
                                                                                                     
---------------------------------------------------------------------------------------------------  
| Id  | Operation                                 | Name               | Starts | E-Rows | A-Rows |  
---------------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT                          |                    |      1 |        |    437 |  
|   1 |  SORT ORDER BY                            |                    |      1 |      2 |    437 |  
|*  2 |   FILTER                                  |                    |      1 |        |    437 |  
|*  3 |    CONNECT BY NO FILTERING WITH START-WITH|                    |      1 |        |    874 |  
|*  4 |     HASH JOIN                             |                    |      1 |      4 |    214 |  
|   5 |      VIEW                                 | HIERARCHY_VERSIONS |      1 |      2 |      2 |  
|   6 |       UNION-ALL                           |                    |      1 |        |      2 |  
|   7 |        FAST DUAL                          |                    |      1 |      1 |      1 |  
|   8 |        FAST DUAL                          |                    |      1 |      1 |      1 |  
|   9 |      VIEW                                 | EMP_HIERARCHY      |      1 |    214 |    214 |  
|  10 |       UNION-ALL                           |                    |      1 |        |    214 |  
|  11 |        TABLE ACCESS FULL                  | EMPLOYEES          |      1 |    107 |    107 |  
|  12 |        TABLE ACCESS FULL                  | EMPLOYEES          |      1 |    107 |    107 |  
---------------------------------------------------------------------------------------------------  
                                                                                                     
Predicate Information (identified by operation id):                                                  
---------------------------------------------------                                                  
                                                                                                     
   2 - filter("V"."PRIMARY_HIER"='Y')                                                                
   3 - access("H"."MANAGER_ID"=PRIOR NULL)                                                           
       filter("H"."EMPLOYEE_ID"=100)                                                                 
   4 - access("H"."VERSION"="V"."VERSION")


Here you can see that the join happens first (step 4), then the connect by (step 3) and finally the filtering (step 2). Looking at the A(ctual)-rows also helps you see if you're on the right track. Notice there's 874 rows at step 3. Way more than in the tables! So probably not what you want.

There are several ways to solve this. Checking the previous version matches the current as you've done is one way. This stops both of Neena's rows joining to both of King's. Just those where the version is the same, stopping the doubling up.

Or you could join in a subquery. Then connect by the result of that:

select level,
       v.*
from (
    select h.employee_id,
           h.manager_id
    from emp_hierarchy h,
         hierarchy_versions v
    where h.version        = v.version
      and v.primary_hier   = 'Y'
  ) v
start with  employee_id = 100
connect by  prior employee_id = manager_id
order by 1, 2;

LEVEL  EMPLOYEE_ID  MANAGER_ID  
1      100                      
2      101          100         
2      102          100         
2      114          100         
2      120          100
... 

Rating

  (7 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

I think I get it

John Keymer, June 08, 2017 - 9:24 am UTC

Thanks, very clearly explained, and I think I now understand. So ultimately my last question:

So is it where I add any table that is join AND constrained by a predicate (i.e. LANG above), I need to add those columns into the Connect By??

...is Yes?

In general is there any performance gains or losses likely from using a sub-query and doing the connect by on that (i.e. as per your last example)?
Is there any risk that the optimizer might rewrite the query and we're in the same situation?
Chris Saxon
June 08, 2017 - 3:14 pm UTC

The real problem is that emp_hierarchy duplicates all the rows except for one column. So your connect by is on non-unique values. If you define the view as:

Create or replace View emp_hierarchy As
  Select employee_id, first_name, last_name, manager_id, 2 version From hr.employees;


the problem goes away. So really the rule is ensure that your connect by uniquely identifies a parent row. When a child links to two or more parents you get the unexpected increase. This could be a problem without a join, e.g. when you connect by a subset of a compound primary key.

In general is there any performance gains or losses likely from using a sub-query and doing the connect by on that (i.e. as per your last example)?

If you do the join and filtering inside the subquery then there's less data for the connect by to process. So I'd expect this to be faster in most cases.

Is there any risk that the optimizer might rewrite the query and we're in the same situation?

I guess it's possible. You could always use the no_merge hint to prevent this. Or other non-mergeable constructs.

Kashif, February 22, 2018 - 9:45 am UTC

Can you please tell how can i write a hierarchical query for 2 tables like Employee and Departments from known Schema (HR).
Department as on Level 1 and Employee on Level 2
I have to Populate tree from these 2 tables with Departments and Its Employees.
Chris Saxon
February 22, 2018 - 9:48 am UTC

What exactly are you trying to do? What should the output of the query look like using HR data?

A reader, February 22, 2018 - 10:44 am UTC

Required Result
Executive
King 100
Kochhar 101
De Haan 102

I want to Populate Tree
with Department as its Parent NODE and Employees as Its Child Nodes.
Chris Saxon
February 22, 2018 - 5:01 pm UTC

You could do something like:

with dept_tree (
  nm, employee_id, manager_id, lev
) as (
  select department_name nm, manager_id employee_id, null manager_id, 1 lev
  from   hr.departments d
  where  d.manager_id is not null
  union all 
  select e.last_name, e.employee_id, e.manager_id, d.lev + 1
  from   hr.employees e
  join   dept_tree d
  on     e.manager_id = d.employee_id
) search depth first by employee_id set seq
  select lpad(' ', lev-1, ' ') || nm || '-' || employee_id
  from   dept_tree d;

LPAD('',LEV-1,'')||NM||'-'||EMPLOYEE_ID   
Executive-100                             
 Kochhar-101                              
  Greenberg-108                           
   Faviet-109                             
   Chen-110                               
   Sciarra-111                            
   Urman-112                              
   Popp-113                               
  Whalen-200                              
  Mavris-203                              
  Baer-204                                
  Higgins-205                             
   Gietz-206  


Though this duplicates many employees, because they all appear under the Executive department + their own.

What precisely do you want to see here?

Requirement

Kashif, February 24, 2018 - 7:44 am UTC

Thank you very much for quick responses.

SELECT d.department_id, d.department_name, e.employee_id,
       e.first_name || ' ' || e.last_name
  FROM hr.employees e, hr.departments d
 WHERE e.department_id = d.department_id
 ORDER BY 1, 3

DEPARTMENT_ID DEPARTMENT_NAME                EMPLOYEE_ID ENAME
           10 Administration                         200 Jennifer Whalen
           10 Administration                         209 M Khalid
           20 Marketing                              201 Michael Hartstein
           20 Marketing                              202 Pat Fay
           30 Purchasing                             114 Den Raphaely
           30 Purchasing                             115 Alexander Khoo
           40 Human Resources                        203 Susan Mavris
           50 Shipping                               120 Matthew Weiss

.........


My output should look like this.
Administration
    Jennifer Whalen
    M Khalid
Marketing
    Michael Hartstein
    Pat Fay
Purchasing
    Den Raphaely
    Alexander Khoo
Human Resources
    Susan Mavris
Shipping
    Matthew Weiss


I want to use Hierarchical Query
Chris Saxon
February 26, 2018 - 4:35 pm UTC

If you're looking for a simple list of all the employees in each department, you can get this by unioning the two tables together. Then padding the names based on whether the row is for a department or employee:

with dept_emps as (
  select d.department_id id, d.department_name nm, null emp_id
  from   hr.departments d
  union all
  select e.department_id id, e.first_name || ' ' || e.last_name nm,e.employee_id emp_id
  from   hr.employees e
)
  select case
           when emp_id is not null then
             '    ' || nm
           else nm
         end nm
  from   dept_emps
  order  by id, emp_id nulls first;

NM                      
Administration          
    Jennifer Whalen     
Marketing               
    Michael Hartstein   
    Pat Fay             
Purchasing              
    Den Raphaely        
    Alexander Khoo
... etc. ...

Kashif, February 27, 2018 - 10:22 am UTC

Thanks for you response.
with your help/hints i'm able to create my desired query and output.

CREATE TABLE DEPARTMENTS
   (DEPARTMENT_ID NUMBER(4), 
  DEPARTMENT_NAME VARCHAR2(30));

alter table DEPARTMENTS
  add constraint PK_DEPT_ID primary key (DEPARTMENT_ID);

INSERT INTO departments VALUES (10, 'Administration');
INSERT INTO departments VALUES (20, 'Marketing');
INSERT INTO departments VALUES (30, 'Purchasing');
INSERT INTO departments VALUES (40, 'Human Resources');


CREATE TABLE EMPLOYEES
  (EMPLOYEE_ID NUMBER(6), 
  ENAME VARCHAR2(20),
  DEPARTMENT_ID NUMBER(4),
  CONSTRAINT EMP_DEPT_CODE_FK FOREIGN KEY (DEPARTMENT_ID)
   REFERENCES DEPARTMENTS (DEPARTMENT_ID));

INSERT INTO employees VALUES (200, 'Jennifer', 10);
INSERT INTO employees VALUES (209, 'Khalid', 10);
INSERT INTO employees VALUES (201, 'Michael', 20);
INSERT INTO employees VALUES (202, 'Pat', 20);
INSERT INTO employees VALUES (114, 'Den', 30);
INSERT INTO employees VALUES (115, 'Alexander', 30);
INSERT INTO employees VALUES (116, 'Shelli', 30);
INSERT INTO employees VALUES (117, 'Sigal', 30);
INSERT INTO employees VALUES (118, 'Guy', 30);
INSERT INTO employees VALUES (119, 'Karen', 30);
INSERT INTO employees VALUES (203, 'Susan', 40);


Used the Following hierarchical query to get the desired output.

WITH employ AS
 (SELECT to_char(d.department_id) emp_id, d.department_name nm, NULL dept_id
    FROM departments d
  UNION ALL
  SELECT to_char(e.employee_id), e.ename nm,
         e.department_id
    FROM employees e)
SELECT LEVEL, dept_id, lpad(' ', LEVEL * 4, '.')|| nm NM, emp_id
  FROM employ
 START WITH dept_id IS NULL
CONNECT BY dept_id = PRIOR emp_id;


LEVEL DEPT_ID NM                             EMP_ID
----- ------- ------------------------------ ---------
    1         ... Administration             10
    2      10 ....... Jennifer               200
    2      10 ....... Khalid                 209
    1         ... Marketing                  20
    2      20 ....... Michael                201
    2      20 ....... Pat                    202
    1         ... Purchasing                 30
    2      30 ....... Den                    114
    2      30 ....... Alexander              115
    2      30 ....... Shelli                 116
    2      30 ....... Sigal                  117
    2      30 ....... Guy                    118
    2      30 ....... Karen                  119
    1         ... Human Resources            40
    2      40 ....... Susan                  203


From this Query I've successfully populate the Tree Object in Oracle Forms 12c.
Thanks once again.
Chris Saxon
February 27, 2018 - 11:03 am UTC

I don't understand why you need connect by? You seem to have everything you need in the union?

with employ as
 (select 1 lev, to_char(d.department_id) emp_id, d.department_name nm, department_id dept_id
    from departments d
  union all
  select 2 lev, to_char(e.employee_id), e.ename nm,
         e.department_id
    from employees e)
select lev, 
       case when lev = 2 then dept_id end dept, 
       lpad(' ', lev * 4, '.')|| nm nm, emp_id
  from employ e
order  by dept_id, lev, emp_id;

LEV   DEPT     NM                    EMP_ID   
    1   <null> ... Administration    10       
    2       10 ....... Jennifer      200      
    2       10 ....... Khalid        209      
    1   <null> ... Marketing         20       
    2       20 ....... Michael       201      
    2       20 ....... Pat           202      
    1   <null> ... Purchasing        30       
    2       30 ....... Den           114      
    2       30 ....... Alexander     115      
    2       30 ....... Shelli        116      
    2       30 ....... Sigal         117      
    2       30 ....... Guy           118      
    2       30 ....... Karen         119      
    1   <null> ... Human Resources   40       
    2       40 ....... Susan         203 

dubbele records

sonja, December 13, 2018 - 12:49 pm UTC

Perfect uitgelegd, had ik hetzelfde probleem nu opgelost.
Chris Saxon
December 13, 2018 - 1:37 pm UTC

Thanks (I think!) ;)

Joining Hierarchy table to flat table

SJ, June 26, 2019 - 10:56 am UTC

My requirement is slightly twisted where I have to compare values of Parent Child relationship with that of flat table having hierarchies values.

Oracle DB version is as below
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
PL/SQL Release 12.1.0.2.0 - Production
"CORE 12.1.0.2.0 Production"
TNS for Linux: Version 12.1.0.2.0 - Production
NLSRTL Version 12.1.0.2.0 - Production

With the help of below query, I'm now able to get the level in hierarchy

select lpad(' ','level,*5,' ')||knvh.kunnr,knvh.hkunnr,level,kna1.name1,kna1.ktokd
from src_customer_knvh knvh,src_customer_kna1 kna1
where knvh.kunnr=kna1.kunnr
connect by prior knvh.kunnr=knvh.hkunnr
start with knvh.hkunnr is null
order siblings by knvh.hkunnr;

Requirement is to compare side by side Source(SAP Table data) with Target(Worksheet data)
But the structure of Target worksheet is flat like
Id,Name,Division,Div_Desc,Subsidiary,Subsidiary_Desc,Corp,Corp_Desc.

In Mapping between Source to Target is as below
Kunnr => id,
Its respective kunnr => Division.
Division kunnr => Subsidiary and
Subsidiary kunnr => Corp

I build the query but it gives no data output
Query ::
select lpad(' ',level*4,' ')||knvh.kunnr,knvh.hkunnr,level,kna1.name1,kna1.ktokd,decode(ltrim(lpad(' ',level*4,' ')||knvh.kunnr,'0'),tgt.corp,tgt.corp||tgt.corp_desc) tgt
from src_customer_knvh knvh,src_customer_kna1 kna1,tgt_customer tgt
where knvh.kunnr=kna1.kunnr
and ltrim(knvh.kunnr,'0')=tgt.id
connect by prior knvh.kunnr = knvh.hkunnr
start with knvh.hkunnr is null order siblings by knvh.hkunnr;

Can you please tell me where did I go wrong or give me corrected query.

In the mean while, I created a view based on first query which is as below
select lpad(' ',level*4,' ')||knvh.kunnr as KUNNR,knvh.hkunnr as HKUNNR,level sa,kna1.name1 as NAME1,kna1.ktokd as KTODK
from src_customer_knvh knvh, src_customer_kna1 kna1
where knvh.kunnr=kna1.kunnr
connect by prior knvh.kunnr=knvh.hkunnr
start with knvh.hkunnr is null order siblings by knvh.hkunnr;

How do I join a Hierarchieral View to a flat target table.

Let me know if I need to post sample/test data

Connor McDonald
July 08, 2019 - 3:08 am UTC

Please ask a new question complete with setup scripts.

More to Explore

PL/SQL demos

Check out more PL/SQL tutorials on our LiveSQL tool.

PL/SQL docs

PL/SQL reference manual from the Oracle documentation library