Skip to Main Content
  • Questions
  • Identify cursors/query with more than x joins involved

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, Hans.

Asked: September 05, 2023 - 7:59 am UTC

Last updated: September 14, 2023 - 1:07 pm UTC

Version: oracle 19c

Viewed 100+ times

You Asked

Hello,

I'd like to write a query based on dba_source/all_identifiers/all_dependencies to find all the queries/cursors that have more than x joins involved. Can you help me ?

Regards
Hans




and Chris said...

I don't know of an easy way to do this.

A quick hack: You can use PL/scope to count the number of tables in each statement and subtract one.

For example:

alter session set plscope_settings = 'identifiers:all';

create or replace procedure p as
begin
  for rws in ( 
    select * from hr.employees e
    join   hr.departments d
    on     d.department_id = e.department_id
    join   hr.jobs j
    on     j.job_id = e.job_id
    join   hr.locations l
    on     l.location_id = d.location_id
  ) loop
    null;
  end loop;
end p;
/

select src.name, sql_id, count(*) - 1 as joins,
       st.text
from   all_identifiers idt
join   all_source src
on     idt.line = src.line
and    idt.object_name = src.name
and    idt.owner = src.owner
join   all_statements st
on     idt.usage_context_id = st.usage_id
and    st.object_name = idt.object_name
and    st.object_type = idt.object_type 
and    st.owner = idt.owner
where  1=1
and    idt.usage = 'REFERENCE'
and    idt.type = 'TABLE'
group  by src.name, sql_id, st.text
having count(*) - 1 >= 3;

NAME       SQL_ID             JOINS
---------- ------------- ----------
TEXT                                                                                                                                                                              
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
P          5k5a7a3a09pv5          3 
SELECT * FROM HR.EMPLOYEES E JOIN HR.DEPARTMENTS D ON D.DEPARTMENT_ID = E.DEPARTMENT_ID JOIN HR.JOBS J ON J.JOB_ID = E.JOB_ID JOIN HR.LOCATIONS L ON L.LOCATION_ID = D.LOCATION_ID


But this has several limitations, including if the query:

Accesses a view, any joins within this are hidden
Uses set operations (union/minus/intersect) these are counted as joins; which they aren't

An alternative is to check the cursor cache for current plans and search for join operations (NESTED LOOP, HASH JOIN, MERGE JOIN) within these. Ensure you exclude SYS queries when doing this!

Adaptive plans make this tricky. If you have these enabled, you need to count the join operations in the plan and subtract the number of STATISTICS COLLECTOR steps.

For example:

exec p();

select s.sql_id, s.plan_hash_value, s.child_number, 
       count( case when operation <> 'STATISTICS COLLECTOR' then 1 end ) 
       - count ( case when operation = 'STATISTICS COLLECTOR' then 1 end ) joins, 
       s.sql_text
from   v$sql_plan p
join   v$sql s
on     s.sql_id = p.sql_id
and    s.plan_hash_value = p.plan_hash_value
and    s.child_number = p.child_number 
where  operation in ( 'HASH JOIN', 'NESTED LOOPS', 'MERGE JOIN', 'STATISTICS COLLECTOR' )
and    parsing_schema_name <> 'SYS'
and    not exists (
  select * from v$sql_plan n
  where  s.sql_id = n.sql_id
  and    n.object_owner = 'SYS'
)
group  by s.sql_id, s.sql_text, s.plan_hash_value, s.child_number
having count(*) - count ( case when operation = 'STATISTICS COLLECTOR' then 1 end ) >= 3;

SQL_ID        PLAN_HASH_VALUE CHILD_NUMBER      JOINS SQL_TEXT                                                                                                                                                                           
------------- --------------- ------------ ---------- -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
5k5a7a3a09pv5      4184759747            0          4 SELECT * FROM HR.EMPLOYEES E JOIN HR.DEPARTMENTS D ON D.DEPARTMENT_ID = E.DEPARTMENT_ID JOIN HR.JOBS J ON J.JOB_ID = E.JOB_ID JOIN HR.LOCATIONS L ON L.LOCATION_ID = D.LOCATION_ID 


Again, this has limitations including:

Plans may not be cached
The optimizer can remove unnecessary joins in a query; so the plan has fewer joins than were written in the statement.
The nested nested loops operation

The query above demonstrates the second and third of these issues. It's an adaptive plan, so we need to avoid double-counting the possible joins.

And the nested loop version of the join actually does two joins to get the employee data. One to read the EMP_DEPARTMENT_IX index, another to get the table data itself. Even though the query only has one join to this table!

This is why the dictionary query above returns 4 instead of 3 joins.

You can see this in operations 4, 5, 12, & 13 in the plan below:

explain plan for 
SELECT * FROM HR.EMPLOYEES E JOIN HR.DEPARTMENTS D ON D.DEPARTMENT_ID = E.DEPARTMENT_ID 
JOIN HR.JOBS J ON J.JOB_ID = E.JOB_ID JOIN HR.LOCATIONS L ON L.LOCATION_ID = D.LOCATION_ID;

select * 
from   dbms_xplan.display( format => 'ADAPTIVE' );

---------------------------------------------------------------------------------------------------------
|   Id  | Operation                         | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|     0 | SELECT STATEMENT                  |                   |   106 | 18232 |    12   (9)| 00:00:01 |
|  *  1 |  HASH JOIN                        |                   |   106 | 18232 |    12   (9)| 00:00:01 |
|     2 |   TABLE ACCESS FULL               | JOBS              |    19 |   627 |     3   (0)| 00:00:01 |
|  *  3 |   HASH JOIN                       |                   |   106 | 14734 |     9  (12)| 00:00:01 |
|-    4 |    NESTED LOOPS                   |                   |   106 | 14734 |     9  (12)| 00:00:01 |
|-    5 |     NESTED LOOPS                  |                   |       |       |            |          |
|-    6 |      STATISTICS COLLECTOR         |                   |       |       |            |          |
|     7 |       MERGE JOIN                  |                   |    27 |  1890 |     6  (17)| 00:00:01 |
|     8 |        TABLE ACCESS BY INDEX ROWID| DEPARTMENTS       |    27 |   567 |     2   (0)| 00:00:01 |
|     9 |         INDEX FULL SCAN           | DEPT_LOCATION_IX  |    27 |       |     1   (0)| 00:00:01 |
|  * 10 |        SORT JOIN                  |                   |    23 |  1127 |     4  (25)| 00:00:01 |
|    11 |         TABLE ACCESS FULL         | LOCATIONS         |    23 |  1127 |     3   (0)| 00:00:01 |
|- * 12 |      INDEX RANGE SCAN             | EMP_DEPARTMENT_IX |       |       |            |          |
|-   13 |     TABLE ACCESS BY INDEX ROWID   | EMPLOYEES         |     4 |   276 |     3   (0)| 00:00:01 |
|    14 |    TABLE ACCESS FULL              | EMPLOYEES         |   107 |  7383 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------


As this shows, even if your query contains N joins, the optimizer may transform this into M joins. This could be more or less than N!

So why exactly do you want to find queries with more than X joins?

Rating

  (1 rating)

Comments

hans, September 13, 2023 - 6:56 pm UTC

Thank you for your time to help me.


We've got some legacy code (over 30 years old), which one I would like to refactor. I noticed that there are queries with more than 20 joins. I want to start with the queries with the most joins and try to see whether I can rewrite them as a semi-join to improve the performance.





Chris Saxon
September 14, 2023 - 1:07 pm UTC

see whether I can rewrite them as a semi-join to improve the performance.

The optimizer is incredibly smart nowadays. There's a good chance it's already doing whatever optimization you're hoping for by switching to semi-joins.

The main exceptions are if you can:

Remove columns from the top-level select - this may enable index-only scans or omitting tables from the plan completely
Remove many references to the same table in one query - typically by using analytic functions.

More to Explore

SQL

The Oracle documentation contains a complete SQL reference.