I have a question on something that I always get confused on. Most of the demo's on "Connect By" feature a single table and the "Connect" Clause is simply
Connect By Prior emp_id = mgr_id
or similar. However I have a situation whereby I need to join several tables together before doing the connect by - because I need to order the result set siblings by a field on a table that is several tables way from the hierarchical structure table.
To produce a simple example consider I have a table which can hold multiple copies of the employee hierarchy, with one of those being flagged as the primary hierarchy.
Create View hierarchy_versions As
Select 1 version, 'N' primary_hier From dual Union All
Select 2 version, 'Y' primary_hier From dual;
Create View emp_hierarchy As
Select employee_id, first_name, last_name, manager_id, 1 version From hr.employees Union All
Select employee_id, first_name, last_name, manager_id, 2 version From hr.employees;
Now if I want to query back my primary hierarchy, my first instinct would be to do this:
Select level, first_name, last_name
From emp_hierarchy h,
hierarchy_versions v
Where h.version = v.version
And v.primary_hier = 'Y'
Start With h.employee_id = 100
Connect By Prior h.employee_id = h.manager_id;
Because as you can see, I have restricted the hierarchy_versions table to only return one row (primary_hier = 'Y'). However what I find is that gives me completely the wrong results. Instead I have to do this:
Select level, first_name, last_name
From emp_hierarchy h,
hierarchy_versions v
Where h.version = v.version
And v.primary_hier = 'Y'
Start With h.employee_id = 100
Connect By Prior h.employee_id = h.manager_id
And Prior h.version = h.version;
It seems as though Oracle doesn't perform the query to get a result set (i.e. all employees in the primary hierarchy)
and then start chaining the rows together - instead it seems to chain
before the predicates in the where clause are applied. Is that right?
1) So in a multiple table query with a connect by, which tables (or keys thereof) do I need to reference in my connect by clause?
2) If in my example above I didn't have the employee name on the hierarchy table, only on another table, i.e. emp_names, and thus I have to join off to that table to get the names, do I now (theoretically) need to add that into the Connect By?
Create View emp_names As
Select employee_id, first_name, last_name From hr.employees;
Select level, n.first_name, n.last_name
From emp_hierarchy h,
hierarchy_versions v,
emp_names n
Where h.version = v.version
And n.employee_id = h.employee_id
And v.primary_hier = 'Y'
Start With h.employee_id = 100
Connect By Prior h.employee_id = h.manager_id
And Prior h.version = h.version;
Assume that emp_names table actually holds multiple names - in different languages:
Create or Replace View emp_names As
Select employee_id, first_name, last_name, 'UK' lang From hr.employees Union All
Select employee_id, first_name, last_name || 'O', 'SPANISH' lang From hr.employees;
And I want to view my hierarchy with the Spanish names:
Select level, n.first_name, n.last_name
From emp_hierarchy h,
hierarchy_versions v,
emp_names n
Where h.version = v.version
And n.employee_id = h.employee_id
And v.primary_hier = 'Y'
And n.lang='SPANISH'
Start With h.employee_id = 100
Connect By Prior h.employee_id = h.manager_id
And Prior h.version = h.version;
It again breaks. The fix is to add "And Prior n.lang=n.lang".
So is it where I add any table that is join AND constrained by a predicate (i.e. LANG above), I need to add those columns into the Connect By??
So, the order Oracle Database processes connect by is:
A join, if present, is evaluated first, whether the join is specified in the FROM clause or with WHERE clause predicates.
The CONNECT BY condition is evaluated.
Any remaining WHERE clause predicates are evaluated. https://docs.oracle.com/database/122/SQLRF/Hierarchical-Queries.htm#SQLRF52332 So when you have:
Select level, first_name, last_name
From emp_hierarchy h,
hierarchy_versions v
Where h.version = v.version
And v.primary_hier = 'Y'
Start With h.employee_id = 100
Connect By Prior h.employee_id = h.manager_id;
First it joins emp_hierarchy to hierarchy_versions. So you have two rows for each employee.
Then it processes the connect by.
At this point you have two rows for employee 100:
EMPLOYEE_ID FIRST_NAME LAST_NAME MANAGER_ID VERSION VERSION PRIMARY_HIER
100 Steven King 1 1 N
100 Steven King 2 2 Y
And two rows for each of his direct reports:
EMPLOYEE_ID FIRST_NAME LAST_NAME MANAGER_ID VERSION VERSION PRIMARY_HIER
101 Neena Kochhar 100 1 1 N
101 Neena Kochhar 100 2 2 Y
102 Lex De Haan 100 1 1 N
102 Lex De Haan 100 2 2 Y
...
You're only joining rows where manager_id equals the previous employee_id. So Neena's non primary row joins to both of Steven's. Same for her primary. So you get four rows for Neena. And so on down the hierarchy. At this point your data looks like:
LEVEL EMPLOYEE_ID FIRST_NAME LAST_NAME MANAGER_ID VERSION VERSION PRIMARY_HIER
1 100 Steven King 2 2 Y
1 100 Steven King 1 1 N
2 101 Neena Kochhar 100 2 2 Y
2 101 Neena Kochhar 100 1 1 N
2 101 Neena Kochhar 100 2 2 Y
2 101 Neena Kochhar 100 1 1 N
Once it's built the tree, the database finally filters out the rows where the primary isn't 'Y'. But Neena still has two rows where primary = 'Y'! This is why you end up doubling all the children down the tree.
If you're not sure which order the operations happen, I find it's helpful to look at the execution plan with rowstats and predicates:
set serveroutput off
alter session set statistics_level = all;
Select level, h.*, v.*
From emp_hierarchy h,
hierarchy_versions v
Where h.version = v.version
And v.primary_hier = 'Y'
Start With h.employee_id = 100
Connect By Prior h.employee_id = h.manager_id
order by level, employee_id;
select *
from table ( dbms_xplan.display_cursor (
null,
null,
'ROWSTATS LAST +PREDICATE'
) );
PLAN_TABLE_OUTPUT
SQL_ID 40hvc9sm8t0jb, child number 1
-------------------------------------
Select level, h.*, v.* From emp_hierarchy h,
hierarchy_versions v Where h.version = v.version And v.primary_hier
= 'Y' Start With h.employee_id = 100 Connect By Prior h.employee_id =
h.manager_id order by level, employee_id
Plan hash value: 1579310819
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 437 |
| 1 | SORT ORDER BY | | 1 | 2 | 437 |
|* 2 | FILTER | | 1 | | 437 |
|* 3 | CONNECT BY NO FILTERING WITH START-WITH| | 1 | | 874 |
|* 4 | HASH JOIN | | 1 | 4 | 214 |
| 5 | VIEW | HIERARCHY_VERSIONS | 1 | 2 | 2 |
| 6 | UNION-ALL | | 1 | | 2 |
| 7 | FAST DUAL | | 1 | 1 | 1 |
| 8 | FAST DUAL | | 1 | 1 | 1 |
| 9 | VIEW | EMP_HIERARCHY | 1 | 214 | 214 |
| 10 | UNION-ALL | | 1 | | 214 |
| 11 | TABLE ACCESS FULL | EMPLOYEES | 1 | 107 | 107 |
| 12 | TABLE ACCESS FULL | EMPLOYEES | 1 | 107 | 107 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("V"."PRIMARY_HIER"='Y')
3 - access("H"."MANAGER_ID"=PRIOR NULL)
filter("H"."EMPLOYEE_ID"=100)
4 - access("H"."VERSION"="V"."VERSION")
Here you can see that the join happens first (step 4), then the connect by (step 3) and finally the filtering (step 2). Looking at the A(ctual)-rows also helps you see if you're on the right track. Notice there's 874 rows at step 3. Way more than in the tables! So probably not what you want.
There are several ways to solve this. Checking the previous version matches the current as you've done is one way. This stops both of Neena's rows joining to both of King's. Just those where the version is the same, stopping the doubling up.
Or you could join in a subquery. Then connect by the result of that:
select level,
v.*
from (
select h.employee_id,
h.manager_id
from emp_hierarchy h,
hierarchy_versions v
where h.version = v.version
and v.primary_hier = 'Y'
) v
start with employee_id = 100
connect by prior employee_id = manager_id
order by 1, 2;
LEVEL EMPLOYEE_ID MANAGER_ID
1 100
2 101 100
2 102 100
2 114 100
2 120 100
...