Thanks for the question, Rob.

Asked: March 10, 2011 - 3:54 pm UTC

Last updated: June 03, 2013 - 2:35 pm UTC

Version: 10.2.0

Viewed 10K+ times! This question is

You Asked

Tom,

I hate it when people start off questions to celebrities with "I love everything you do and you are my hero" or the like, so I will resist my urges to write similar and jump into my question.

My question is about correlated subqueries, something I was taught about in my Oracle Database SQL Fundamentals course I took a few weeks ago. I've been writing SQL for a long time and have gone to great lengths to avoid correlated subqueries through the use of joins. It's more difficult to write the query, but the performance gains are astronomical; executing a query once instead of once per row in the result set.

Is it ever correct, from a performance standpoint or a "do the right thing" standpoint, to use a correlated subquery? Is the only reason to use one because they are easier for novices to write?

Thanks!

Rob

and Tom said...

I hate it when people start off questions to celebrities with "I love everything you do and you are my hero" or the like, so I will resist my urges to write similar and jump into my question.

thank you, I do appreciate that.

Correlated subqueries are not anything to be afraid of. The optimizer recognizes them and knows it can either

a) leave them "as is"
b) rewrite them as a join.

ops$tkyte%ORA11GR2> create table t1 as select * from all_objects;

Table created.

Elapsed: 00:00:04.40
ops$tkyte%ORA11GR2> create table t2 as select * from all_users;

Table created.

Elapsed: 00:00:00.04
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> alter table t2 add constraint t2_pk primary key(username);

Table altered.

Elapsed: 00:00:00.03
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> set autotrace traceonly explain
ops$tkyte%ORA11GR2> select count(*) from t1 where exists (select null from t2 where t2.username = t1.owner);
Elapsed: 00:00:00.05

Execution Plan
----------------------------------------------------------
Plan hash value: 3614317411

-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |    34 |   293   (1)| 00:00:04 |
|   1 |  SORT AGGREGATE       |       |     1 |    34 |            |          |
|*  2 |   HASH JOIN RIGHT SEMI|       | 67674 |  2246K|   293   (1)| 00:00:04 |
|   3 |    INDEX FULL SCAN    | T2_PK |    44 |   748 |     1   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL  | T1    | 67674 |  1123K|   291   (1)| 00:00:04 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T2"."USERNAME"="T1"."OWNER")

Note
-----
   - dynamic sampling used for this statement (level=2)

ops$tkyte%ORA11GR2> select count(*) from t1, t2 where t1.owner = t2.username;
Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 386051149

-----------------------------------------------------------------------------
| Id  | Operation           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |       |     1 |    34 |   293   (1)| 00:00:04 |
|   1 |  SORT AGGREGATE     |       |     1 |    34 |            |          |
|*  2 |   HASH JOIN         |       | 67674 |  2246K|   293   (1)| 00:00:04 |
|   3 |    INDEX FULL SCAN  | T2_PK |    44 |   748 |     1   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| T1    | 67674 |  1123K|   291   (1)| 00:00:04 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T1"."OWNER"="T2"."USERNAME")

Note
-----
   - dynamic sampling used for this statement (level=2)

see how it rewrote it as a join using a correlated subquery - not really different from doing the join ourselves?

Fear not the correlated subquery.

http://www.oracle.com/technetwork/issue-archive/2006/06-may/o36asktom-099107.html

Rating

(8 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Differing Database Versions?

Rob Ogilvie, March 10, 2011 - 4:42 pm UTC

Tom,

I appreciate your prompt response and insight. Unfortunately, it leaves me with a further question. I implemented the "No Correlated Subqueries" rule when our database (10gR1) was running on a Sun T2000 (many *slow* cores) and found rewriting many NOT IN correlated subqueries to utilize more complicated join syntax resulted in many orders of magnitude worth of speed improvement.

Was I likely looking at something else I was doing that improved the speed, or was the 10gR1 optimizer perhaps confused by my particular queries?

I apologize for not having examples ready; this was legacy code that has been rewritten with the "No Correlated Subqueries" rule imposed and after my SQL course I am considering relaxing that rule.

Thank you again for your time!

Rob

March 10, 2011 - 4:51 pm UTC

not in correlated subqueries???

that sounds like a mistake. correlated subqueries are generally used with EXISTS and NOT EXISTS.

????

Rob Ogilvie, March 10, 2011 - 5:02 pm UTC

Tom,

We had been using NOT IN (correlated subquery) and NOT IN (subquery) in (very) old code and my brief research to verify I am using the proper terminology here pointed me to a page that stated:

"Prior to Oracle10g there was a bug that caused a huge execution difference between EXISTS and IN. Starting in 10g release 2 and beyond, Oracle will automatically rewrite an IN subquery to use the EXISTS clause." (Reference dba-oracle.com)

I believe now that was the cause of our slowness and I feel edificated and confident we can, indeed, start using correlated (and other) subqueries when they make sense from a code readability standpoint without as much emphasis on performance concerns.

Thanks again!

Rob

March 10, 2011 - 5:13 pm UTC

please don't reference that site.

but - funny enough - if it were entirely correct (it isn't) - then it would be telling you to USE a correlated subquery since most all EXISTS queries are in fact correlated (they are not generally useful otherwise)!!!

NOT IN correlated subquery doesn't sound right. Demonstrate to me an "on purpose not in correlated subquery".

This is a correlated subquery, we reference DEPT.DEPTNO in the subquery - that is the 'correlated' part:

select * from dept where NOT EXISTS (select null from emp where emp.deptno = dept.deptno );

This is a not in subquery - we would NOT generally correlate it - we generate a 'set' - we do not reference DEPT in the subquery. Note: very very very important to have the where deptno is not null - without it, performance will be very very bad AND the answer would be different from above!!!!)

select * from dept where deptno NOT IN (select deptno from emp WHERE DEPTNO IS NOT NULL);

Perhaps you were using non-correlated NOT IN's that the optimizer DID NOT KNOW could not return a null. If they can (NOT IN's only) return NULL - then performance can be pretty bad AND you probably get the wrong answer if there ever is a null.

ops$tkyte%ORA11GR2> connect scott/tiger
Connected.
scott%ORA11GR2> select * from dept where not exists (select null from emp where emp.deptno = dept.deptno);

    DEPTNO DNAME          LOC
---------- -------------- -------------
        40 OPERATIONS     BOSTON

Elapsed: 00:00:00.03
scott%ORA11GR2> select * from dept where deptno NOT IN (select deptno from emp where deptno IS NOT NULL);

    DEPTNO DNAME          LOC
---------- -------------- -------------
        40 OPERATIONS     BOSTON

Elapsed: 00:00:00.00
scott%ORA11GR2> select * from dept where deptno NOT IN (select deptno from emp);
    DEPTNO DNAME          LOC
---------- -------------- -------------
        40 OPERATIONS     BOSTON

Elapsed: 00:00:00.01

<b>they look the same, but the ARE NOT</b>


scott%ORA11GR2> update emp set deptno = null where rownum = 1;

1 row updated.

Elapsed: 00:00:00.00
scott%ORA11GR2> select * from dept where deptno NOT IN (select deptno from emp);

no rows selected

Elapsed: 00:00:00.00
<b>no data!!!</b>

scott%ORA11GR2> select * from dept where deptno NOT IN (select deptno from emp where deptno IS NOT NULL);

    DEPTNO DNAME          LOC
---------- -------------- -------------
        40 OPERATIONS     BOSTON

Elapsed: 00:00:00.01
scott%ORA11GR2> select * from dept where not exists (select null from emp where emp.deptno = dept.deptno);

    DEPTNO DNAME          LOC
---------- -------------- -------------
        40 OPERATIONS     BOSTON

Elapsed: 00:00:00.00
scott%ORA11GR2>

Rob Ogilvie, March 10, 2011 - 5:36 pm UTC

Tom,

I apologize again for not having better examples, but you have more than answered my original query and have provided enough enlightenment and topic of thought to keep me busy for a while (and better prepared for the SQL exam).

It is funny you mention that site as being unreferencable; I have oft found myself skipping over their Google results in favor of anything else, questioning the validity of information (and writing style). Their pictures are a bit creepy, too...

Again, I appreciate your responses!

Rob

query result confusion

reader, February 26, 2013 - 2:40 pm UTC

Hi Tom,

I have a big query and have a big doubt:).

I will post you that whole query by editing with some changed tables and column name later.
The query looks like this.

select column1 ,(subquery) column2,......... like this many columns
from tables
some normal and or condition
some couter joins;

-- it return 12 records

The query looks like this.

but when I am changing the query as below

select column1 from
(
select column1 ,(subquery) column2,......... like this many columns
from 5 tables
some normal and or condition
some couter joins
);
--it returns 12 records

when I am changing the query as below

select column1,column2 from
(
select column1 ,(subquery) column2,......... like this many columns
from 5 tables
some normal and or condition
some couter joins
);
--it returns 9 records ,which is very strange in nature.(I have not used distinct clause but only added another column)

I have never seen such strange things.

Could you please tell ,if what ever I stasted above is true then would it be a bug. (version is 9i R2) .
or it is possible to get this kind of varying result.

Thanks and regards,
not mentioning name.

February 26, 2013 - 5:04 pm UTC

please contact support with this one - that would definitely be a bug if what you describe above is true. The projection on a table should not change the cardinality of the result set.

thanks

reader, February 27, 2013 - 4:43 am UTC

thank you Tom for your view.

will take care of that(support things).

Terminology

S, February 27, 2013 - 2:53 pm UTC

" The projection on a table should not change the cardinality of the result set."

Projection on a relation can definitely change the cardinality of the resulting relation because it removes duplicate tuples. I'm not sure "projection on a table" is even defined.

S

February 28, 2013 - 7:51 am UTC

http://docs.oracle.com/cd/E11882_01/server.112/e25789/sqllangu.htm#sthref827

projection is the set of columns projected, resulting from, the table (whatever is in the from list the projection is on)

More on Terminology

S, February 28, 2013 - 8:43 am UTC

The term projection has been defined in the context of relational model in 1969 (to mean removing duplicates). If you wanted to say that Oracle is not a relational RDBMS, which I agree with, then I have no case. If you wanted Oracle to be considered a relational RDBMS than the definition of "projection on a table" in that document is careless and misleading use of terminology and should be corrected rather than repeated.

S

a complex query for me to solve

Gunjan, June 02, 2013 - 10:38 am UTC

I need the record when user updated his password for last time. The problem however is i need data only for users I have specified in "IN" clause

select user_id, password, date_changed, pwd_changed_by
from AAA
where user_id in (1, 219648, 9047740073)

above query would get me all password changes for these users. But I only need last record with max(changed)

please guide

June 03, 2013 - 2:35 pm UTC

no create
no inserts
no look......

hints for you, read about row_number or keep dense_rank.

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:122801500346829407

shows you both approaches.

More to Explore

Performance

Get all the information about database performance in the Database Performance guide.

Breadcrumb

Question and Answer