Skip to Main Content
  • Questions
  • Why is this query syntatically correct?

Breadcrumb

May 4th

Question and Answer

Tom Kyte

Thanks for the question, Douglas.

Asked: April 18, 2011 - 1:13 pm UTC

Last updated: April 19, 2011 - 7:11 pm UTC

Version: XE 11.2.0.2.0

Viewed 1000+ times

You Asked

In exploring correlated subqueries, I goofed and entered a query by mistake against the HR schema in XE 11.2.0.2.0.

I meant to enter the following:

select employee_id, department_id, salary
from employees e1
where salary >
( select avg ( salary )
from employees
where department_id = e1.department_id )
order by department_id, salary desc
/

But instead I entered:

SQL> select employee_id, department_id, salary
2 from employees e1
3 where salary >
4 ( select avg ( salary )
5 from departments
6 where department_id = e1.department_id )
7 order by department_id, salary desc
8 /

no rows selected

SQL>

How is this subquery syntactically correct? There is no SALARY column in the DEPARTMENTS table.

SQL> select avg ( salary )
2 from departments
3 /
select avg ( salary )
*
ERROR at line 1:
ORA-00904: "SALARY": invalid identifier


SQL>

and Tom said...

This is a frequently asked question - and yes, it is absolutely syntactically correct.

What you have coded is referred to as a correlated subquery.

You've probably seen queries such as:


select * from emp where exists (select null from dept where dept.deptno = emp.deptno);

and you've had no problem with that - the EMP.DEPTNO was "natural", expected, normal.


Well, you just coded a query a lot like that, your query is the same as:

select e1.employee_id, e1.department_id, e1.salary
from employees e1
where e1.salary >
( select avg ( E1.salary )
from departments
where department_id = e1.department_id )
order by e1.department_id, e1.salary desc


Maybe this will give you a justification to ALWAYS qualify columns :) that way - you *know* where they are coming from (or at least where you EXPECTED them to come from!!)

Rating

  (3 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

A reader, April 19, 2011 - 8:28 am UTC

Tom,
In addition to having unexpected query output, can reference to a column from a different table potentially generate a bad query plan?

Thanks...

Tom Kyte
April 19, 2011 - 6:54 pm UTC

but hold up here.

there IS NOT ANY UNEXPECTED RESULT.

there is ONLY THE LACK OF KNOWING HOW IT WORKS.

this is just a correlated subquery, nothing more, nothing less. very normal, very expected, very much used all of the time. Nothing abnormal is going on here. Every thing is not only expected, but very much anticipated. It is working as designed. 100%


It will only generate a bad plan if it isn't what you mean to do in the first place. in that case, the plan won't be a plan that actually answers your question...

Does it matter?

Harel, April 19, 2011 - 12:10 pm UTC

"In addition to having unexpected query output, can reference to a column from a different table
potentially generate a bad query plan?"

I wouldn't care about bad execution plans when the result is wrong. It shouldn't pass QA :)

A clarification

Douglas Latto, April 19, 2011 - 2:26 pm UTC

With all due respect, I don't think you answered my question. Your rewrite of my query also returned "no rows," because it retained the error I told you about of specifying DEPARTMENTS in the subquery when I really should have specified EMPLOYEES. I knew that. I could see that salary must have come from the EMPLOYEES table.

I didn't understand how the subquery worked when EMPLOYEES was not in the set of tables the subquery was retrieving FROM.

I think I get it now. A correlated subquery executes the subquery once for each row returned by the outer query, using one or more values returned by the outer query, in the subquery. In effect, Oracle executes a join of each row of the outer query, with the subquery. Since I mistakenly specified DEPARTMENTS in the subquery, each execution of the subquery did not return all of the salary values of the EMPLOYEES belonging to that department, but instead returned the one row that represented that department in the DEPARTMENTS table. The join of the one row being processed by the outer query, with the one row returned by the subquery, returned... one row! It then computed the average salary from the one row, which turned out to be the same as the salary. Since no employee's salary was greater than itself, the complete query returned no rows.

I see and agree with your point that all columns should be fully qualified for any query that specifies multiple tables - but that wasn't the problem here.

Tom Kyte
April 19, 2011 - 7:11 pm UTC

with all due respect, your question was:

"How is this subquery syntactically correct? There is no SALARY column in the DEPARTMENTS table. "

and that - that was answered IN FULL. I was explaining the concept, the syntax, the way SQL itself WORKS.

If, after reading this, you still don't understand "how" it worked - you have not understood corelated subqueries at all yes.

A correlated subquery executes the subquery once for each
row returned by the outer query, using one or more values returned by the outer
query, in the subquery.


that is a conceptual way to think of it, however, that is not the physical way it works.

select * from emp where exists ( select Null from dept where dept.deptno = emp.deptno)

might be able to be rewritten as:

select emp.* from emp, dept where emp.depto = dept.deptno;

assuming the constraints and all are in place - we don't HAVE to fire the query off.





I see and agree with your point that all columns should be fully qualified for
any query that specifies multiple tables - but that wasn't the problem here.


then what, pray tell, was the problem????????????????????

seriously - if not that, what WAS the problem?


the problem was simply:

a) you wrote a valid, correct correlated subquery
b) you didn't realize that you did
c) it didn't work the way you thought it should
d) it could all be avoided by using explicit references - correlation names - everywhere


you had an obvious bug in your query, that was the basic problem.
it would have been easily identified if you used correlation names every where, every time.