Home>Question Details



-- Thanks for the question regarding "Drawbacks of queries having many "selects" within the main Select clause", version 10.2

Submitted on 31-Mar-2009 5:00 Central time zone
Last updated 14-Apr-2009 16:22

You Asked

Hello Tom,

I have to tune some queries that are written as shown in the example below:

select (select ....) as "col1",
    col_2,
    col_3,
    (select ....) as "col_4",
    (select ....) as "col_5",
from (select (select...), a,b, ...)
  ,table_1
  ,table_2

etc...

Do you have some guidelines regarding this way of writing queries ? Are there any drawbacks ?

Regards,
Stephane

and we said...

sure, there are drawbacks.

Now, just to be clear, there are two 'select' constructs you are using here

a) scalar subqueries - a single row, single column query that you use in place of a "column", it looks like a column or function.

b) an inline view - the from (select ... from (select .... ) )


The inline views - great, fine, no problem - we can make them disappear when we optimize.

The scalar subqueries - they should be the exception - not the rule.

You should be JOINING in general. Not using scalar subqueries.

if you have a query

select (select t2.a from t2 where t2.c = t1.c), 
       col_2,
       col_3,
       (select t2.b from t2 where t2.c = t1.c),
       (select t3.x from t3 where t3.d = t1.d),
  from .....


You should just JOIN (outer join if need be) to T2 and T3, it would be:


select t2.a, col_2, col_3, t2.b, t3.x 
  from ...., T2, T3
 where ....
   and t2.c(+) = t1.c
   and t3.d(+) = t1.d  -- outer join IF and ONLY IF necessary




Even if the query was something like:

select (select count(*) from t2 where t2.c = t1.c), 
       col_2,
       col_3,
       (select max(b) from t2 where t2.c = t1.c),
       (select sum(x) from t3 where t3.d = t1.d),
  from .....



you would just


select t2.a, col_2, col_3, t2.b, t3.x 
  from ...., (select count(*) a, max(b) b, c from t2 group by c) T2, 
             (select sum(x), d from t3 group by d) T3
 where ....
   and t2.c(+) = t1.c
   and t3.d(+) = t1.d  -- outer join IF and ONLY IF necessary



Scalar subqueries can sometimes be useful in "fast return" queries (optimize for initial response time). If you have a "large" query (it would return a TON of data if you fetched the last row) but you get the first 25, then the next 25 and so on (paging through the results for example) - you might use a scalar subquery to get the first 25 rows really fast - and then run the scalar subqueries say 25 times to fill in the blanks.

If you got ALL of the rows this way, it would be very slow, but since you are interested in getting 25 and stopping, getting 25 and stopping - it might be the most efficient way of getting the first 25.


eg:

ops$tkyte%ORA10GR2> /*
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> drop table t1;
ops$tkyte%ORA10GR2> drop table t2;
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create table t1
ops$tkyte%ORA10GR2> as
ops$tkyte%ORA10GR2> select * from all_users;
ops$tkyte%ORA10GR2> alter table t1 add constraint t1_pk primary key(username);
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create table t2
ops$tkyte%ORA10GR2> as
ops$tkyte%ORA10GR2> select * from all_objects;
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> alter table t2 add constraint t2_pk primary key(object_id);
ops$tkyte%ORA10GR2> create index t2_idx on t2(owner);
ops$tkyte%ORA10GR2> */
ops$tkyte%ORA10GR2> @trace
ops$tkyte%ORA10GR2> alter session set events '10046 trace name context forever, level 
12';

Session altered.

ops$tkyte%ORA10GR2> declare
  2      cursor c1 is select t1.username, count(t2.object_id) cnt
  3                     from t1, t2
  4                    where t1.username = t2.owner(+)
  5                    group by t1.username
  6                    order by t1.username;
  7
  8      cursor c2 is select t1.username,
  9                         (select count(t2.object_id) from t2 where t2.owner = 
t1.username) cnt
 10                     from t1
 11                    order by username;
 12
 13      type array is table of c1%rowtype index by binary_integer;
 14      l_data array;
 15
 16  begin
 17      open c1;
 18      fetch c1 bulk collect into l_data limit 10;
 19      close c1;
 20      open c2;
 21      fetch c2 bulk collect into l_data limit 10;
 22      close c2;
 23  end;
 24  /

PL/SQL procedure successfully completed.



Now, you get the same answer, but tkprof shows in this case:

SELECT T1.USERNAME, COUNT(T2.OBJECT_ID) CNT
FROM
 T1, T2 WHERE T1.USERNAME = T2.OWNER(+) GROUP BY T1.USERNAME ORDER BY
  T1.USERNAME


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        2      0.00       0.00          0          0          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch        2      0.09       0.09          0       1394          0          20
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        6      0.09       0.09          0       1394          0          20

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 494     (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
     10  SORT GROUP BY (cr=697 pr=0 pw=0 time=50132 us)
  30112   HASH JOIN OUTER (cr=697 pr=0 pw=0 time=64103 us)
     41    INDEX FULL SCAN T1_PK (cr=1 pr=0 pw=0 time=155 us)(object id 171564)
  50193    TABLE ACCESS FULL T2 (cr=696 pr=0 pw=0 time=50359 us)
********************************************************************************
SELECT T1.USERNAME, (SELECT COUNT(T2.OBJECT_ID)
FROM
 T2 WHERE T2.OWNER = T1.USERNAME) CNT FROM T1 ORDER BY USERNAME


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        2      0.00       0.00          0          0          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          0         28          0          20
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        6      0.00       0.00          0         28          0          20

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 494     (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
     10  SORT AGGREGATE (cr=13 pr=0 pw=0 time=379 us)
    578   INDEX RANGE SCAN T2_IDX (cr=13 pr=0 pw=0 time=686 us)(object id 171567)
     10  INDEX FULL SCAN T1_PK (cr=1 pr=0 pw=0 time=50 us)(object id 171564)

Reviews    
5 stars wow   March 31, 2009 - 11am Central time zone
Reviewer: A reader 
still like your rationale style for proofing things ;) many thanks


5 stars subqueries   March 31, 2009 - 2pm Central time zone
Reviewer: A reader 
Tom;

The perfect explanation i am looking for! 

I used to make that mistake with scalar subqueries all the time until Tom explains to do it this 
way. they are slow...

No one can explain things bettter than Tom.

Excellent.


Followup   April 1, 2009 - 8am Central time zone:

... they are slow...
...

No, it is better said:

sometimes they are slower than a join, sometimes they are faster than a join, it depends. It depends on the context and the conditions.


I gave an example above whereby a scalar subquery just might be the best thing ever (initial response time). Other times - they are not necessarily the best thing ever (they are very 'procedural' like)


Here is another example - if you are going to call PLSQL from SQL, I recommend always using a scalar subquery, that is, instead of:

select f(x)
from t
where g(y) = ?;


code:

select (select f(x) from dual)
from t
where (select g(y) from dual) = ?;


search this site for
http://asktom.oracle.com/pls/ask/search?p_string=%22scalar+subquery+caching%22


This is true for all releases - even 11g with PLSQL FUNCTION result caching .
5 stars Tom .. You rae the Best   March 31, 2009 - 4pm Central time zone
Reviewer: sam from USA
That gives perfect explanation


4 stars Scalar subqueries: Drawbacks of queries having many "selects" within the main Select clause   April 1, 2009 - 3am Central time zone
Reviewer: Stéphane Vercellotti from Geneva, Switzerland
Hello Tom,

JOINING...what else ! It reminds me of a joke: "A SQL query walks into a bar and sees two 
tables..."

In addition to your detailed and very useful answer, I would also add the fact that using these 
scalar subqueries can make it sometimes quite difficult to understand the queries.

Regards,
Stephane


4 stars Why not to use scalar subquery   April 1, 2009 - 3am Central time zone
Reviewer: Jaromir D.B. Nemec from EU
Quote: The scalar subqueries - they should be the exception - not the rule. 

Please, can you elaborate the reasons why the scalar subquery should be the exception. It this 
because the optimizer is (currently) forced to perform only specific execution plan (as NL), or 
execution plan is less readable, or some other reason?

Thanks,

Jaromir D.B. Nemec


Followup   April 1, 2009 - 9am Central time zone:

select (select from t2 where...)
from t1

is processed much like:


for x in (select * from t1)
loop
  select * from t2 where .....


there is scalar subquery caching which can change that slightly, but in general, you'll be running a query inside of another query. You want to use SET processing and currently the optimizer does not roll/merge the scalar subquery into the outer query.
5 stars Scalar Queries   April 1, 2009 - 5am Central time zone
Reviewer: Arvind Mishra from Pune, India
Very nice example!!


5 stars subquery   April 1, 2009 - 10am Central time zone
Reviewer: A reader 
Tom:

<<<sometimes they are slower than a join, sometimes they are faster than a join, it depends. It 
depends on the context and the conditions.>>

Does this mean i have to write the query both ways and see which one runs faster and then decide 
which method to use. Other than using subquery for calling pl/sql from sql do you see a cases to 
use subquery instead of a join.

Also, does the above apply to all databases in general or it is specific to Oracle.


Followup   April 1, 2009 - 4pm Central time zone:

or, you can try to mentally picture "what would have to take place in order to do something"

that is what I do.


That is why my chapter on sql tuning starts with:

...
This was probably the hardest chapter of the book to write. That is not because the material is all that complex. It¿s because I know what people want, and I know what can be delivered. What people want is the ten-step process for tuning any query. What can be delivered is knowledge about how queries are processed, which you can use and apply as you develop queries.

If there were a ten-step, or even a million-step, process by which any query (or a large percentage of queries) could be tuned, we would write a program to do it. Sure, there are actually many programs that try to do this, such as Oracle Enterprise Manager with its tuning pack, SQL Navigator, and others. What they do is primarily recommend indexing schemes to tune a query, suggest materialized views, and offer to add hints to the query to try other access plans. They show you different query plans for the same statement and allow you to pick one. These tuning tools use a very limited set of rules that sometimes can suggest that index or set of indexes you really should have thought of during your design. They offer ¿rule of thumb¿ (what I generally call ROT, since the acronym and the word it maps to are so appropriate for each other) SQL optimizations. If these were universally applicable, the optimizer would do them routinely.

In fact, the CBO does tuning already. It rewrites our queries all of the time. If there were an N-step process to tuning a query¿to writing efficient SQL¿ the optimizer would incorporate it all, and we would not be having a discussion on this topic. It is like the search for the Holy Grail. Maybe someday the software will be sophisticated enough to take our SQL, understand the question being asked, and process the question rather than syntax.

This chapter will provide the foundation knowledge you need in order to begin thinking about how queries could be processed. Once you understand query processing, you can make the query do what you think might be best. Writing efficient SQL is no different from writing efficient C, Java, or even English. It takes an understanding of how things work and what is available. If you don¿t know some technique is available, you¿ll never be able to optimize to use it.
....



and since each database implements things differently - something that is fast on database 1 might be not so fast on database 2.

If they were not different, there would only be one.

3 stars Why not... more like "when not to use"...the caveat   April 1, 2009 - 10am Central time zone
Reviewer: Duke Ganote from Township of Bataiva, County of Clermont, State of Ohio, US of A
Yep, they're logically equivalent, but implemented differently ( at least through version X... ) 
and always subject to change!

BTW, I concocted this example to help remember the terminology of "subquery factoring", "scalar 
subquery" and "inline view":

WITH
factored_subquery AS -- ANSI SQL-99: "Common Table Expression"
( SELECT 1 AS scalar_value
    FROM DUAL
)
SELECT ( SELECT 1 
           FROM dual 
          WHERE ROWNUM = 1 ) as scalar_subquery
     , scalar_value
  FROM 
( SELECT * 
    FROM factored_subquery ) /* as */ inline_view
/



Followup   April 1, 2009 - 4pm Central time zone:

you might consider adding just "subquery" as well ;)

and of course correlated subquery


4 stars Scalar subqueries and all_rows mode   April 1, 2009 - 1pm Central time zone
Reviewer: James Su from Toronto
Dear Tom,
The optimizer mode is ALL_ROWS not FIRST_ROWS in your example, does the cost of scalar subquery 
happen only when you fetch the rows? I remember in your other posts you said scalar subqueries are 
good for smaller sets (a few hundreds of rows) and you didn't said how they work when fetching some 
rows from a big set.


Followup   April 1, 2009 - 4pm Central time zone:

the scalar subqueries are typically executed as you retrieve the rows. I say typically because "it depends", if you push them way down into an inline view or a withsubquery, all bets are off.

scalar subqueries are good for small sets, whether that be a small set because you only fetch 25 rows out of a million, or it is small because - well - it is small.
5 stars scalar   April 1, 2009 - 7pm Central time zone
Reviewer: Sam 
Tom:

Would you also use joins on this type of query. If T1 has one million records and T2 has 100 
records and you want the records from T1 that exist in T2.

SELECT order_no from T1 
 WHERE order_no IN (select distinct order_no from T2)

Are you scanning every record in T1 and checking if it exists in T2.

Would you do a natural join instead or it would be the same.

SELECT order_no from T1, T2 where t1.order_no=t2.order_no


Followup   April 2, 2009 - 9am Central time zone:

... Would you also use joins on this type of query. If T1 has one million records
and T2 has 100 records and you want the records from T1 that exist in T2.
...

of course not, that screams for IN. You wrote "that exist in", you don't want data from T2, you would not join, you would use in or exists (we don't care which, we consider them the same).


http://www.oracle.com/technology/oramag/oracle/06-may/o36asktom.html



Your two queries are not equivalent by the way. Not even close, they return different result sets.

You never need the distinct in the subquery - using it just implies "i don't really understand what I'm doing"

But that you did use it, implies to me that order_no is NOT unique in T2 - so joining T1 to T2 would give you a completely different result than using IN.


but never use distinct/unique in the IN subquery - not only is it not necessary, but it makes you look bad ;)


I do remember in the early 1990's - when using Sybase SQLServer, you had to do that because they unrolled an IN to a join without considering that there could be duplicates (eg: they processed the query entirely wrong) - but - you would never use distinct/unique in the subquery like that.
5 stars Really good article   April 13, 2009 - 5pm Central time zone
Reviewer: Manuel Vidigal from Portugal


5 stars query   April 13, 2009 - 11pm Central time zone
Reviewer: A reader 
<<but never use distinct/unique in the IN subquery - not only is it not necessary, but it makes you 
look bad ;) >>

Why does that make me look bad? is it bad coding - result should be the same with "distinct" or 
without "distinct"


Followup   April 14, 2009 - 11am Central time zone:

It makes you look bad because you obviously thought "I need to distinct that data in order for the result to be correct" (why else would you have put it there?). Whether the end result is the same is quite simply "not relevant", you are performing (in your mind) an operation you apparently think you need to.

But you don't.

select * from dept where dept.deptno in (select emp.deptno from emp);


emp.deptno is not unique.
IN does not require, desire, need, want to have a "distinct" there.

If you use distinct in the subquery, it would appear you believe you do need it - which is absolutely incorrect. It would not change the result and the only possible thing that could happen is the optimizer does the extra distinct step when it doesn't need to. That is - nothing good could come from it, only bad could come from it.


It is sort of like coding:
select * 
  from t1, t2
 where t1.key = t2.key(+) 
   and t2.val > 5;



It would appear to demonstrate a lack of understanding of SQL and the set operations it performs and how it performs them.


Sort of like my pet peeve - when people code:

select count(1) from ......

I immediately downgrade my opinion of their level of expertise. Why would you want to count non-null evaluations of the expression "1" - you apparently want to count the ROWS in the result set, so just use count(*), count(*) says what you mean - count the rows. Count(1) says "count the number of occurrences of the number 1 that are not null".


It is like reading "IM speak" in an email from anyone over the age of 18 - it knocks points off of their IQ score.
4 stars   April 14, 2009 - 12pm Central time zone
Reviewer: A reader from Oregon,USA
Hi Tom,

In reponse to the difference between count(1) and count(*), you said 
"...count(*) says what you mean - count the rows. Count(1) says "count the number of occurrences of 
the number 1 that are not null"..."

I could not understand the statement regarding count(1)..

Here is some data which i tried to test for both count(*) and count(1) :-

create table test_table (a number, b number);

insert into test_table values ( 1, 2);

insert into test_table values (1,1);

insert into test_table values (2,1);

insert into test_table values (3,1);

insert into test_table values (4,2);

insert into test_table values (5,0);

insert into test_table values (6,1);

Insert into test_table values (NULL,NULL);

COMMIT;

SQL> SELECT COUNT(1) FROM TEST_TABLE;

  COUNT(1)
----------
         8


SQL> SELECT COUNT(*) FROM TEST_TABLE;

  COUNT(*)
----------
         8

Both queries are yielding the same result..

I could not understand what you meant? Can you please elaborate further ? 

Thanks


Followup   April 14, 2009 - 2pm Central time zone:

did you want to

a) count the number of times the expression '1' evaluated to be not null in test_table

OR

b) count the number of rows in test_table

what were you trying to do? Because a) is done by select count(1) from test_table whereas b) is accomplished by select count(*) from test_table.


Why would you want to count the number of times the expression '1' is not null? That doesn't seem to make any sense?

Now, sometimes counting the number of rows in a set makes sense - and thankfully they gave us a way to do that - via select count(*)



I don't care if they return the same answer - one of them says you want to do one thing, the other says you want to do another thing.

You probably always mean to do (b) (as I cannot see the business case for counting non-null evaluations of the expression '1' - can you?)


But if you phrase it as (a) - with count(1), it just looks like you might not have read a SQL manual.

Yes, being a bit harsh, but as I said - this count(1) is a pet peeve of mine. Did you know we had a bug filed in older releases of Oracle whereby the kernel group had to rewrite internally "select count(1)" as "select count(*)" to avoid the performance penalty, ugh.


http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions032.htm#SQLRF00624

3 stars Count 2   April 14, 2009 - 4pm Central time zone
Reviewer: Duke Ganote from Amelia, Ohio USA
COUNT(*) and COUNT(1) have the same result, but COUNT(1) is just "semantically slippery" -- for the 
sake of a (falsely) perceived performance limitation.  It makes no more sense than COUNT(2)...

SQL> select count(*), count(1), count(2), count(a) from test_table;


COUNT(*)   COUNT(1)   COUNT(2)   COUNT(A)
-------- ---------- ---------- ----------
       8          8          8          7


Followup   April 14, 2009 - 4pm Central time zone:

actually, if we all started using 42 - I could see the humor in that, that would be OK :)

select count(42) from t;


it will result in the ultimate answer...
5 stars Ultimate answer?   April 15, 2009 - 3am Central time zone
Reviewer: Martijn from NLD
MHO%xe> select count(42) from dual;

 COUNT(42)
----------
         1


I really don't get it anymore, it has become too philosophical: am I getting the ultimate answer or am I asking the ultimate question here? ;-)


Write a Review
 


All information and materials provided here are provided "as-is"; Oracle disclaims all express and implied warranties, including, the implied warranties of merchantability or fitness for a particular use. Oracle shall not be liable for any damages, including, direct, indirect, incidental, special or consequential damages for loss of profits, revenue, data or data use, incurred by you or any third party in connection with the use of this information or these materials.

About Oracle | Legal Notices and Terms of Use | Privacy Statement