Thanks for the question.

Asked: March 31, 2009 - 5:00 am UTC

Last updated: September 30, 2022 - 4:18 am UTC

Version: 10.2

Viewed 10K+ times! This question is

You Asked

Hello Tom,

I have to tune some queries that are written as shown in the example below:

select (select ....) as "col1",
col_2,
col_3,
(select ....) as "col_4",
(select ....) as "col_5",
from (select (select...), a,b, ...)
,table_1
,table_2

etc...

Do you have some guidelines regarding this way of writing queries ? Are there any drawbacks ?

Regards,
Stephane

and Tom said...

sure, there are drawbacks.

Now, just to be clear, there are two 'select' constructs you are using here

a) scalar subqueries - a single row, single column query that you use in place of a "column", it looks like a column or function.

b) an inline view - the from (select ... from (select .... ) )

The inline views - great, fine, no problem - we can make them disappear when we optimize.

The scalar subqueries - they should be the exception - not the rule.

You should be JOINING in general. Not using scalar subqueries.

if you have a query

select (select t2.a from t2 where t2.c = t1.c), 
       col_2,
       col_3,
       (select t2.b from t2 where t2.c = t1.c),
       (select t3.x from t3 where t3.d = t1.d),
  from .....

You should just JOIN (outer join if need be) to T2 and T3, it would be:

select t2.a, col_2, col_3, t2.b, t3.x 
  from ...., T2, T3
 where ....
   and t2.c(+) = t1.c
   and t3.d(+) = t1.d  -- outer join IF and ONLY IF necessary

Even if the query was something like:

select (select count(*) from t2 where t2.c = t1.c), 
       col_2,
       col_3,
       (select max(b) from t2 where t2.c = t1.c),
       (select sum(x) from t3 where t3.d = t1.d),
  from .....

you would just

select t2.a, col_2, col_3, t2.b, t3.x 
  from ...., (select count(*) a, max(b) b, c from t2 group by c) T2, 
             (select sum(x), d from t3 group by d) T3
 where ....
   and t2.c(+) = t1.c
   and t3.d(+) = t1.d  -- outer join IF and ONLY IF necessary

Scalar subqueries can sometimes be useful in "fast return" queries (optimize for initial response time). If you have a "large" query (it would return a TON of data if you fetched the last row) but you get the first 25, then the next 25 and so on (paging through the results for example) - you might use a scalar subquery to get the first 25 rows really fast - and then run the scalar subqueries say 25 times to fill in the blanks.

If you got ALL of the rows this way, it would be very slow, but since you are interested in getting 25 and stopping, getting 25 and stopping - it might be the most efficient way of getting the first 25.

eg:

ops$tkyte%ORA10GR2> /*
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> drop table t1;
ops$tkyte%ORA10GR2> drop table t2;
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create table t1
ops$tkyte%ORA10GR2> as
ops$tkyte%ORA10GR2> select * from all_users;
ops$tkyte%ORA10GR2> alter table t1 add constraint t1_pk primary key(username);
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create table t2
ops$tkyte%ORA10GR2> as
ops$tkyte%ORA10GR2> select * from all_objects;
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> alter table t2 add constraint t2_pk primary key(object_id);
ops$tkyte%ORA10GR2> create index t2_idx on t2(owner);
ops$tkyte%ORA10GR2> */
ops$tkyte%ORA10GR2> @trace
ops$tkyte%ORA10GR2> alter session set events '10046 trace name context forever, level 12';

Session altered.

ops$tkyte%ORA10GR2> declare
  2      cursor c1 is select t1.username, count(t2.object_id) cnt
  3                     from t1, t2
  4                    where t1.username = t2.owner(+)
  5                    group by t1.username
  6                    order by t1.username;
  7
  8      cursor c2 is select t1.username,
  9                         (select count(t2.object_id) from t2 where t2.owner = t1.username) cnt
 10                     from t1
 11                    order by username;
 12
 13      type array is table of c1%rowtype index by binary_integer;
 14      l_data array;
 15
 16  begin
 17      open c1;
 18      fetch c1 bulk collect into l_data limit 10;
 19      close c1;
 20      open c2;
 21      fetch c2 bulk collect into l_data limit 10;
 22      close c2;
 23  end;
 24  /

PL/SQL procedure successfully completed.

Now, you get the same answer, but tkprof shows in this case:

SELECT T1.USERNAME, COUNT(T2.OBJECT_ID) CNT
FROM
 T1, T2 WHERE T1.USERNAME = T2.OWNER(+) GROUP BY T1.USERNAME ORDER BY
  T1.USERNAME


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        2      0.00       0.00          0          0          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch        2      0.09       0.09          0       1394          0          20
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        6      0.09       0.09          0       1394          0          20

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 494     (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
     10  SORT GROUP BY (cr=697 pr=0 pw=0 time=50132 us)
  30112   HASH JOIN OUTER (cr=697 pr=0 pw=0 time=64103 us)
     41    INDEX FULL SCAN T1_PK (cr=1 pr=0 pw=0 time=155 us)(object id 171564)
  50193    TABLE ACCESS FULL T2 (cr=696 pr=0 pw=0 time=50359 us)
********************************************************************************
SELECT T1.USERNAME, (SELECT COUNT(T2.OBJECT_ID)
FROM
 T2 WHERE T2.OWNER = T1.USERNAME) CNT FROM T1 ORDER BY USERNAME


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        2      0.00       0.00          0          0          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          0         28          0          20
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        6      0.00       0.00          0         28          0          20

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 494     (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
     10  SORT AGGREGATE (cr=13 pr=0 pw=0 time=379 us)
    578   INDEX RANGE SCAN T2_IDX (cr=13 pr=0 pw=0 time=686 us)(object id 171567)
     10  INDEX FULL SCAN T1_PK (cr=1 pr=0 pw=0 time=50 us)(object id 171564)

Rating

(21 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

wow

A reader, March 31, 2009 - 11:19 am UTC

still like your rationale style for proofing things ;) many thanks

subqueries

A reader, March 31, 2009 - 2:42 pm UTC

Tom;

The perfect explanation i am looking for!

I used to make that mistake with scalar subqueries all the time until Tom explains to do it this way. they are slow...

No one can explain things bettter than Tom.

Excellent.

April 01, 2009 - 8:21 am UTC

... they are slow...
...

No, it is better said:

sometimes they are slower than a join, sometimes they are faster than a join, it depends. It depends on the context and the conditions.

I gave an example above whereby a scalar subquery just might be the best thing ever (initial response time). Other times - they are not necessarily the best thing ever (they are very 'procedural' like)

Here is another example - if you are going to call PLSQL from SQL, I recommend always using a scalar subquery, that is, instead of:

select f(x)
from t
where g(y) = ?;

code:

select (select f(x) from dual)
from t
where (select g(y) from dual) = ?;

search this site for
http://asktom.oracle.com/pls/ask/search?p_string=%22scalar+subquery+caching%22

This is true for all releases - even 11g with PLSQL FUNCTION result caching .

Tom .. You rae the Best

sam, March 31, 2009 - 4:33 pm UTC

That gives perfect explanation

Scalar subqueries: Drawbacks of queries having many "selects" within the main Select clause

Stéphane Vercellotti, April 01, 2009 - 3:08 am UTC

Hello Tom,

JOINING...what else ! It reminds me of a joke: "A SQL query walks into a bar and sees two tables..."

In addition to your detailed and very useful answer, I would also add the fact that using these scalar subqueries can make it sometimes quite difficult to understand the queries.

Regards,
Stephane

Why not to use scalar subquery

Jaromir D.B. Nemec, April 01, 2009 - 3:35 am UTC

Quote: The scalar subqueries - they should be the exception - not the rule.

Please, can you elaborate the reasons why the scalar subquery should be the exception. It this because the optimizer is (currently) forced to perform only specific execution plan (as NL), or execution plan is less readable, or some other reason?

Thanks,

Jaromir D.B. Nemec

April 01, 2009 - 9:26 am UTC

select (select from t2 where...)
from t1

is processed much like:

for x in (select * from t1)
loop
select * from t2 where .....

there is scalar subquery caching which can change that slightly, but in general, you'll be running a query inside of another query. You want to use SET processing and currently the optimizer does not roll/merge the scalar subquery into the outer query.

Scalar Queries

Arvind Mishra, April 01, 2009 - 5:35 am UTC

Very nice example!!

subquery

A reader, April 01, 2009 - 10:27 am UTC

Tom:

<<<sometimes they are slower than a join, sometimes they are faster than a join, it depends. It depends on the context and the conditions.>>

Does this mean i have to write the query both ways and see which one runs faster and then decide which method to use. Other than using subquery for calling pl/sql from sql do you see a cases to use subquery instead of a join.

Also, does the above apply to all databases in general or it is specific to Oracle.

April 01, 2009 - 4:14 pm UTC

or, you can try to mentally picture "what would have to take place in order to do something"

that is what I do.

That is why my chapter on sql tuning starts with:

...
This was probably the hardest chapter of the book to write. That is not because the material is all that complex. It¿s because I know what people want, and I know what can be delivered. What people want is the ten-step process for tuning any query. What can be delivered is knowledge about how queries are processed, which you can use and apply as you develop queries.

If there were a ten-step, or even a million-step, process by which any query (or a large percentage of queries) could be tuned, we would write a program to do it. Sure, there are actually many programs that try to do this, such as Oracle Enterprise Manager with its tuning pack, SQL Navigator, and others. What they do is primarily recommend indexing schemes to tune a query, suggest materialized views, and offer to add hints to the query to try other access plans. They show you different query plans for the same statement and allow you to pick one. These tuning tools use a very limited set of rules that sometimes can suggest that index or set of indexes you really should have thought of during your design. They offer ¿rule of thumb¿ (what I generally call ROT, since the acronym and the word it maps to are so appropriate for each other) SQL optimizations. If these were universally applicable, the optimizer would do them routinely.

In fact, the CBO does tuning already. It rewrites our queries all of the time. If there were an N-step process to tuning a query¿to writing efficient SQL¿ the optimizer would incorporate it all, and we would not be having a discussion on this topic. It is like the search for the Holy Grail. Maybe someday the software will be sophisticated enough to take our SQL, understand the question being asked, and process the question rather than syntax.

This chapter will provide the foundation knowledge you need in order to begin thinking about how queries could be processed. Once you understand query processing, you can make the query do what you think might be best. Writing efficient SQL is no different from writing efficient C, Java, or even English. It takes an understanding of how things work and what is available. If you don¿t know some technique is available, you¿ll never be able to optimize to use it.
....

and since each database implements things differently - something that is fast on database 1 might be not so fast on database 2.

If they were not different, there would only be one.

Why not... more like "when not to use"...the caveat

Duke Ganote, April 01, 2009 - 10:36 am UTC

Yep, they're logically equivalent, but implemented differently ( at least through version X... ) and always subject to change!

BTW, I concocted this example to help remember the terminology of "subquery factoring", "scalar subquery" and "inline view":

WITH
factored_subquery AS -- ANSI SQL-99: "Common Table Expression"
( SELECT 1 AS scalar_value
FROM DUAL
)
SELECT ( SELECT 1
FROM dual
WHERE ROWNUM = 1 ) as scalar_subquery
, scalar_value
FROM
( SELECT *
FROM factored_subquery ) /* as */ inline_view
/

April 01, 2009 - 4:15 pm UTC

you might consider adding just "subquery" as well ;)

and of course correlated subquery

Scalar subqueries and all_rows mode

James Su, April 01, 2009 - 1:53 pm UTC

Dear Tom,
The optimizer mode is ALL_ROWS not FIRST_ROWS in your example, does the cost of scalar subquery happen only when you fetch the rows? I remember in your other posts you said scalar subqueries are good for smaller sets (a few hundreds of rows) and you didn't said how they work when fetching some rows from a big set.

April 01, 2009 - 4:45 pm UTC

the scalar subqueries are typically executed as you retrieve the rows. I say typically because "it depends", if you push them way down into an inline view or a withsubquery, all bets are off.

scalar subqueries are good for small sets, whether that be a small set because you only fetch 25 rows out of a million, or it is small because - well - it is small.

scalar

Sam, April 01, 2009 - 7:32 pm UTC

Tom:

Would you also use joins on this type of query. If T1 has one million records and T2 has 100 records and you want the records from T1 that exist in T2.

SELECT order_no from T1
WHERE order_no IN (select distinct order_no from T2)

Are you scanning every record in T1 and checking if it exists in T2.

Would you do a natural join instead or it would be the same.

SELECT order_no from T1, T2 where t1.order_no=t2.order_no

April 02, 2009 - 9:28 am UTC

... Would you also use joins on this type of query. If T1 has one million records
and T2 has 100 records and you want the records from T1 that exist in T2.
...

of course not, that screams for IN. You wrote "that exist in", you don't want data from T2, you would not join, you would use in or exists (we don't care which, we consider them the same).

https://www.oracle.com/technetwork/issue-archive/2006/06-may/o36asktom-099107.html

Your two queries are not equivalent by the way. Not even close, they return different result sets.

You never need the distinct in the subquery - using it just implies "i don't really understand what I'm doing"

But that you did use it, implies to me that order_no is NOT unique in T2 - so joining T1 to T2 would give you a completely different result than using IN.

but never use distinct/unique in the IN subquery - not only is it not necessary, but it makes you look bad ;)

I do remember in the early 1990's - when using Sybase SQLServer, you had to do that because they unrolled an IN to a join without considering that there could be duplicates (eg: they processed the query entirely wrong) - but - you would never use distinct/unique in the subquery like that.

Really good article

Manuel Vidigal, April 13, 2009 - 5:49 pm UTC

query

A reader, April 13, 2009 - 11:17 pm UTC

<<but never use distinct/unique in the IN subquery - not only is it not necessary, but it makes you look bad ;) >>

Why does that make me look bad? is it bad coding - result should be the same with "distinct" or without "distinct"

April 14, 2009 - 11:17 am UTC

It makes you look bad because you obviously thought "I need to distinct that data in order for the result to be correct" (why else would you have put it there?). Whether the end result is the same is quite simply "not relevant", you are performing (in your mind) an operation you apparently think you need to.

But you don't.

select * from dept where dept.deptno in (select emp.deptno from emp);

emp.deptno is not unique.
IN does not require, desire, need, want to have a "distinct" there.

If you use distinct in the subquery, it would appear you believe you do need it - which is absolutely incorrect. It would not change the result and the only possible thing that could happen is the optimizer does the extra distinct step when it doesn't need to. That is - nothing good could come from it, only bad could come from it.

It is sort of like coding:

select * 
  from t1, t2
 where t1.key = t2.key(+) 
   and t2.val > 5;

It would appear to demonstrate a lack of understanding of SQL and the set operations it performs and how it performs them.

Sort of like my pet peeve - when people code:

select count(1) from ......

I immediately downgrade my opinion of their level of expertise. Why would you want to count non-null evaluations of the expression "1" - you apparently want to count the ROWS in the result set, so just use count(*), count(*) says what you mean - count the rows. Count(1) says "count the number of occurrences of the number 1 that are not null".

It is like reading "IM speak" in an email from anyone over the age of 18 - it knocks points off of their IQ score.

A reader, April 14, 2009 - 12:35 pm UTC

Hi Tom,

In reponse to the difference between count(1) and count(*), you said 
"...count(*) says what you mean - count the rows. Count(1) says "count the number of occurrences of the number 1 that are not null"..."

I could not understand the statement regarding count(1)..

Here is some data which i tried to test for both count(*) and count(1) :-

create table test_table (a number, b number);

insert into test_table values ( 1, 2);

insert into test_table values (1,1);

insert into test_table values (2,1);

insert into test_table values (3,1);

insert into test_table values (4,2);

insert into test_table values (5,0);

insert into test_table values (6,1);

Insert into test_table values (NULL,NULL);

COMMIT;

SQL> SELECT COUNT(1) FROM TEST_TABLE;

  COUNT(1)
----------
         8


SQL> SELECT COUNT(*) FROM TEST_TABLE;

  COUNT(*)
----------
         8

Both queries are yielding the same result..

I could not understand what you meant? Can you please elaborate further ? 

Thanks

April 14, 2009 - 2:11 pm UTC

did you want to

a) count the number of times the expression '1' evaluated to be not null in test_table

OR

b) count the number of rows in test_table

what were you trying to do? Because a) is done by select count(1) from test_table whereas b) is accomplished by select count(*) from test_table.

Why would you want to count the number of times the expression '1' is not null? That doesn't seem to make any sense?

Now, sometimes counting the number of rows in a set makes sense - and thankfully they gave us a way to do that - via select count(*)

I don't care if they return the same answer - one of them says you want to do one thing, the other says you want to do another thing.

You probably always mean to do (b) (as I cannot see the business case for counting non-null evaluations of the expression '1' - can you?)

But if you phrase it as (a) - with count(1), it just looks like you might not have read a SQL manual.

Yes, being a bit harsh, but as I said - this count(1) is a pet peeve of mine. Did you know we had a bug filed in older releases of Oracle whereby the kernel group had to rewrite internally "select count(1)" as "select count(*)" to avoid the performance penalty, ugh.

http://docs.oracle.com/docs/cd/B19306_01/server.102/b14200/functions032.htm#SQLRF00624

Count 2

Duke Ganote, April 14, 2009 - 4:07 pm UTC

COUNT(*) and COUNT(1) have the same result, but COUNT(1) is just "semantically slippery" -- for the sake of a (falsely) perceived performance limitation.  It makes no more sense than COUNT(2)...

SQL> select count(*), count(1), count(2), count(a) from test_table;


COUNT(*)   COUNT(1)   COUNT(2)   COUNT(A)
-------- ---------- ---------- ----------
       8          8          8          7

April 14, 2009 - 4:22 pm UTC

actually, if we all started using 42 - I could see the humor in that, that would be OK :)

select count(42) from t;

it will result in the ultimate answer...

Ultimate answer?

Martijn, April 15, 2009 - 3:46 am UTC

MHO%xe> select count(42) from dual;

 COUNT(42)
----------
         1

I really don't get it anymore, it has become too philosophical: am I getting the ultimate answer or am I asking the ultimate question here? ;-)

Scalar subqueries - when to use

Pradeep, March 30, 2010 - 1:57 am UTC

Hello Tom,

I am a little confused about when to use scalar sub-queries. I executed the example from your book "Effective Oracle by Design" and this is what I get.
Oracle version is
Oracle Database 10g Express Edition Release 10.2.0.1.0:

SQL> ;
  1  select username, user_id, created,
  2  (select count(*) from all_constraints where owner = all_users.username) cons,
  3  (select count(*) from all_tables where owner=username) tables
  4* from all_users
SQL> /

18 rows selected.

Elapsed: 00:00:01.43
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      64489  consistent gets
          0  physical reads
          0  redo size
       1291  bytes sent via SQL*Net to client
        395  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         18  rows processed


SQL> select u.username, u.user_id, u.created, a.concnt, b.tabcnt
  2  from
  3  all_users u,
  4  (select owner, count(CONSTRAINT_NAME) concnt from all_constraints group by owner) a,
  5  (select owner, count(TABLE_NAME) tabcnt from all_tables group by owner) b
  6  where u.username = a.owner (+)
  7  and u.username = b.owner (+);

18 rows selected.

Elapsed: 00:00:00.15

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      19089  consistent gets
          0  physical reads
          0  redo size
       1304  bytes sent via SQL*Net to client
        395  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
         18  rows processed

In contrast to your example in your book, I get lesser consistent gets in the outer-joined query than when I use scalar subquery. Even when I put the filter on created date, I get 22000 consistent gets for the scalar subquery as opposed to 19000 for outer joined query.

Also, when I executed your cursor example which you gave above, even without the limit clause, the scalar-subquery gives me lesser consistent gets than the outer-joined query.

So, my question is when should I be using the scalar subquery and when should I use outer joins? Should I just run both, check the plan and statistics and use whichever is better?

Thanks a lot.

April 05, 2010 - 10:36 am UTC

my examples were all about initial response time versus total query throughput.

If you want the first row as fast as possible - scalar subqueries might benefit you.

If you are going for ALL ROWS, then big bulk operations are.

You are getting all rows, and want to optimize for all rows, therefore, scalar subqueries probably do not make any sense for you.

I have examples such as "paging through a result set" where I get the 10 or 25 rows of interest from some base table and then apply the scalar subquery to just those 10 or 25 rows - preventing us from having to join to all of those tables while trying to FIND the 10 or 25 rows of interest.

I encourage you to re-read that section. What I tried to outline was "how they work". if you get that - you should be able to close your eyes, imagine the work the server would do - and have a reasonable guess beforehand as to which would make more sense.

Fix links?

A reader, January 17, 2011 - 10:38 am UTC

I would like to follow the links in Tom's responses to other articles, but they are broken. Help!

Jess, September 20, 2012 - 11:48 am UTC

Hi Tom,

When you have a query with numerous inline views, each doing 'select (*) from x where y', it is usually considered pretty bad form. The norm (and the way I learned it) seems to be to only select the columns you actually need, so fair enough. This can help with not needing to change the code later if the table changes for a column you don't care about.

But doesn't optimizer look at it and pick out only the columns that are being used anyway (as opposed to reading all the columns including those not used in the query)? If so, there shouldn't be negative performance implications. But if so, why is it considered to be so bad across the board?

Thank you.

September 26, 2012 - 12:04 pm UTC

This can help with not needing to change the code later if the table
changes for a column you don't care about.

that is the least of the reasons. The most important reasons are:

a) documentation, you get what you need and nothing more, if you are selecting it, there must be a reason for it

b) performance, if you select * and don't need *, you are making it harder for the optimizer to optimize. select just what you want and you may well find you get a much better plan!

But doesn't optimizer look at it and pick out only the columns that are being
used anyway

it depends - it depends on what can and cannot be merged. Usually it can - but you cannot count on it in all cases. and it depends on the final select list too.

it is best to select what you need.

A reader, September 26, 2012 - 2:51 pm UTC

Thanks Tom!

Scalar query in GROUP BY

Asim, September 28, 2022 - 4:49 pm UTC

Till version 21c scalar queries and column aliases are not allowed in GROUP BY, but recently I came to know that in upcoming 23c , column aliases will be allowed in GROUP BY, see

https://technology.amis.nl/database/oracle-database-23c-new-feature-highlights-release-february-2023/

My question is, in 23c, if I give a column alias to a scalar query in select list, can then I use that alias in group by? Which means I have actually used scalar query in group by.

September 29, 2022 - 1:53 am UTC

When 23c comes out, you'll be able to test it.

No good doing that test now internally, because the functionality may change before release date.

What about 23c beta

Asim, September 29, 2022 - 2:33 am UTC

Isnt the 23c beta version available from october 2022? Today 29 sep 2022, only 1 day left

September 30, 2022 - 4:18 am UTC

Yes ... if you're a beta customer ... which of course means signing non-disclosure etc etc

Breadcrumb

Question and Answer