wow
March 31, 2009 - 11am Central time zone
Reviewer: A reader
still like your rationale style for proofing things ;) many thanks
subqueries
March 31, 2009 - 2pm Central time zone
Reviewer: A reader
Tom;
The perfect explanation i am looking for!
I used to make that mistake with scalar subqueries all the time until Tom explains to do it this
way. they are slow...
No one can explain things bettter than Tom.
Excellent.
Followup April 1, 2009 - 8am Central time zone:
... they are slow...
...
No, it is better said:
sometimes they are slower than a join, sometimes they are faster than a join, it depends. It depends on the context and the conditions.
I gave an example above whereby a scalar subquery just might be the best thing ever (initial response time). Other times - they are not necessarily the best thing ever (they are very 'procedural' like)
Here is another example - if you are going to call PLSQL from SQL, I recommend always using a scalar subquery, that is, instead of:
select f(x)
from t
where g(y) = ?;
code:
select (select f(x) from dual)
from t
where (select g(y) from dual) = ?;
search this site for
http://asktom.oracle.com/pls/ask/search?p_string=%22scalar+subquery+caching%22
This is true for all releases - even 11g with PLSQL FUNCTION result caching .
Tom .. You rae the Best
March 31, 2009 - 4pm Central time zone
Reviewer: sam from USA
That gives perfect explanation
Scalar subqueries: Drawbacks of queries having many "selects" within the main Select clause
April 1, 2009 - 3am Central time zone
Reviewer: Stéphane Vercellotti from Geneva, Switzerland
Hello Tom,
JOINING...what else ! It reminds me of a joke: "A SQL query walks into a bar and sees two
tables..."
In addition to your detailed and very useful answer, I would also add the fact that using these
scalar subqueries can make it sometimes quite difficult to understand the queries.
Regards,
Stephane
Why not to use scalar subquery
April 1, 2009 - 3am Central time zone
Reviewer: Jaromir D.B. Nemec from EU
Quote: The scalar subqueries - they should be the exception - not the rule.
Please, can you elaborate the reasons why the scalar subquery should be the exception. It this
because the optimizer is (currently) forced to perform only specific execution plan (as NL), or
execution plan is less readable, or some other reason?
Thanks,
Jaromir D.B. Nemec
Followup April 1, 2009 - 9am Central time zone:
select (select from t2 where...)
from t1
is processed much like:
for x in (select * from t1)
loop
select * from t2 where .....
there is scalar subquery caching which can change that slightly, but in general, you'll be running a query inside of another query. You want to use SET processing and currently the optimizer does not roll/merge the scalar subquery into the outer query.
Scalar Queries
April 1, 2009 - 5am Central time zone
Reviewer: Arvind Mishra from Pune, India
Very nice example!!
subquery
April 1, 2009 - 10am Central time zone
Reviewer: A reader
Tom:
<<<sometimes they are slower than a join, sometimes they are faster than a join, it depends. It
depends on the context and the conditions.>>
Does this mean i have to write the query both ways and see which one runs faster and then decide
which method to use. Other than using subquery for calling pl/sql from sql do you see a cases to
use subquery instead of a join.
Also, does the above apply to all databases in general or it is specific to Oracle.
Followup April 1, 2009 - 4pm Central time zone:
or, you can try to mentally picture "what would have to take place in order to do something"
that is what I do.
That is why my chapter on sql tuning starts with:
...
This was probably the hardest chapter of the book to write. That is not because the material is all that complex. It¿s because I know what people want, and I know what can be delivered. What people want is the ten-step process for tuning any query. What can be delivered is knowledge about how queries are processed, which you can use and apply as you develop queries.
If there were a ten-step, or even a million-step, process by which any query (or a large percentage of queries) could be tuned, we would write a program to do it. Sure, there are actually many programs that try to do this, such as Oracle Enterprise Manager with its tuning pack, SQL Navigator, and others. What they do is primarily recommend indexing schemes to tune a query, suggest materialized views, and offer to add hints to the query to try other access plans. They show you different query plans for the same statement and allow you to pick one. These tuning tools use a very limited set of rules that sometimes can suggest that index or set of indexes you really should have thought of during your design. They offer ¿rule of thumb¿ (what I generally call ROT, since the acronym and the word it maps to are so appropriate for each other) SQL optimizations. If these were universally applicable, the optimizer would do them routinely.
In fact, the CBO does tuning already. It rewrites our queries all of the time. If there were an N-step process to tuning a query¿to writing efficient SQL¿ the optimizer would incorporate it all, and we would not be having a discussion on this topic. It is like the search for the Holy Grail. Maybe someday the software will be sophisticated enough to take our SQL, understand the question being asked, and process the question rather than syntax.
This chapter will provide the foundation knowledge you need in order to begin thinking about how queries could be processed. Once you understand query processing, you can make the query do what you think might be best. Writing efficient SQL is no different from writing efficient C, Java, or even English. It takes an understanding of how things work and what is available. If you don¿t know some technique is available, you¿ll never be able to optimize to use it.
....
and since each database implements things differently - something that is fast on database 1 might be not so fast on database 2.
If they were not different, there would only be one.
Why not... more like "when not to use"...the caveat
April 1, 2009 - 10am Central time zone
Reviewer: Duke Ganote from Township of Bataiva, County of Clermont, State of Ohio, US of A
Yep, they're logically equivalent, but implemented differently ( at least through version X... )
and always subject to change!
BTW, I concocted this example to help remember the terminology of "subquery factoring", "scalar
subquery" and "inline view":
WITH
factored_subquery AS -- ANSI SQL-99: "Common Table Expression"
( SELECT 1 AS scalar_value
FROM DUAL
)
SELECT ( SELECT 1
FROM dual
WHERE ROWNUM = 1 ) as scalar_subquery
, scalar_value
FROM
( SELECT *
FROM factored_subquery ) /* as */ inline_view
/
Followup April 1, 2009 - 4pm Central time zone:
you might consider adding just "subquery" as well ;)
and of course correlated subquery
Scalar subqueries and all_rows mode
April 1, 2009 - 1pm Central time zone
Reviewer: James Su from Toronto
Dear Tom,
The optimizer mode is ALL_ROWS not FIRST_ROWS in your example, does the cost of scalar subquery
happen only when you fetch the rows? I remember in your other posts you said scalar subqueries are
good for smaller sets (a few hundreds of rows) and you didn't said how they work when fetching some
rows from a big set.
Followup April 1, 2009 - 4pm Central time zone:
the scalar subqueries are typically executed as you retrieve the rows. I say typically because "it depends", if you push them way down into an inline view or a withsubquery, all bets are off.
scalar subqueries are good for small sets, whether that be a small set because you only fetch 25 rows out of a million, or it is small because - well - it is small.
scalar
April 1, 2009 - 7pm Central time zone
Reviewer: Sam
Tom:
Would you also use joins on this type of query. If T1 has one million records and T2 has 100
records and you want the records from T1 that exist in T2.
SELECT order_no from T1
WHERE order_no IN (select distinct order_no from T2)
Are you scanning every record in T1 and checking if it exists in T2.
Would you do a natural join instead or it would be the same.
SELECT order_no from T1, T2 where t1.order_no=t2.order_no
Followup April 2, 2009 - 9am Central time zone:
... Would you also use joins on this type of query. If T1 has one million records
and T2 has 100 records and you want the records from T1 that exist in T2.
...
of course not, that screams for IN. You wrote "that exist in", you don't want data from T2, you would not join, you would use in or exists (we don't care which, we consider them the same).
http://www.oracle.com/technology/oramag/oracle/06-may/o36asktom.html
Your two queries are not equivalent by the way. Not even close, they return different result sets.
You never need the distinct in the subquery - using it just implies "i don't really understand what I'm doing"
But that you did use it, implies to me that order_no is NOT unique in T2 - so joining T1 to T2 would give you a completely different result than using IN.
but never use distinct/unique in the IN subquery - not only is it not necessary, but it makes you look bad ;)
I do remember in the early 1990's - when using Sybase SQLServer, you had to do that because they unrolled an IN to a join without considering that there could be duplicates (eg: they processed the query entirely wrong) - but - you would never use distinct/unique in the subquery like that.
Really good article
April 13, 2009 - 5pm Central time zone
Reviewer: Manuel Vidigal from Portugal
query
April 13, 2009 - 11pm Central time zone
Reviewer: A reader
<<but never use distinct/unique in the IN subquery - not only is it not necessary, but it makes you
look bad ;) >>
Why does that make me look bad? is it bad coding - result should be the same with "distinct" or
without "distinct"
Followup April 14, 2009 - 11am Central time zone:
It makes you look bad because you obviously thought "I need to distinct that data in order for the result to be correct" (why else would you have put it there?). Whether the end result is the same is quite simply "not relevant", you are performing (in your mind) an operation you apparently think you need to.
But you don't.
select * from dept where dept.deptno in (select emp.deptno from emp);
emp.deptno is not unique.
IN does not require, desire, need, want to have a "distinct" there.
If you use distinct in the subquery, it would appear you believe you do need it - which is absolutely incorrect. It would not change the result and the only possible thing that could happen is the optimizer does the extra distinct step when it doesn't need to. That is - nothing good could come from it, only bad could come from it.
It is sort of like coding:
select *
from t1, t2
where t1.key = t2.key(+)
and t2.val > 5;
It would appear to demonstrate a lack of understanding of SQL and the set operations it performs and how it performs them.
Sort of like my pet peeve - when people code:
select count(1) from ......
I immediately downgrade my opinion of their level of expertise. Why would you want to count non-null evaluations of the expression "1" - you apparently want to count the ROWS in the result set, so just use count(*), count(*) says what you mean - count the rows. Count(1) says "count the number of occurrences of the number 1 that are not null".
It is like reading "IM speak" in an email from anyone over the age of 18 - it knocks points off of their IQ score.

April 14, 2009 - 12pm Central time zone
Reviewer: A reader from Oregon,USA
Hi Tom,
In reponse to the difference between count(1) and count(*), you said
"...count(*) says what you mean - count the rows. Count(1) says "count the number of occurrences of
the number 1 that are not null"..."
I could not understand the statement regarding count(1)..
Here is some data which i tried to test for both count(*) and count(1) :-
create table test_table (a number, b number);
insert into test_table values ( 1, 2);
insert into test_table values (1,1);
insert into test_table values (2,1);
insert into test_table values (3,1);
insert into test_table values (4,2);
insert into test_table values (5,0);
insert into test_table values (6,1);
Insert into test_table values (NULL,NULL);
COMMIT;
SQL> SELECT COUNT(1) FROM TEST_TABLE;
COUNT(1)
----------
8
SQL> SELECT COUNT(*) FROM TEST_TABLE;
COUNT(*)
----------
8
Both queries are yielding the same result..
I could not understand what you meant? Can you please elaborate further ?
Thanks
Followup April 14, 2009 - 2pm Central time zone:
did you want to
a) count the number of times the expression '1' evaluated to be not null in test_table
OR
b) count the number of rows in test_table
what were you trying to do? Because a) is done by select count(1) from test_table whereas b) is accomplished by select count(*) from test_table.
Why would you want to count the number of times the expression '1' is not null? That doesn't seem to make any sense?
Now, sometimes counting the number of rows in a set makes sense - and thankfully they gave us a way to do that - via select count(*)
I don't care if they return the same answer - one of them says you want to do one thing, the other says you want to do another thing.
You probably always mean to do (b) (as I cannot see the business case for counting non-null evaluations of the expression '1' - can you?)
But if you phrase it as (a) - with count(1), it just looks like you might not have read a SQL manual.
Yes, being a bit harsh, but as I said - this count(1) is a pet peeve of mine. Did you know we had a bug filed in older releases of Oracle whereby the kernel group had to rewrite internally "select count(1)" as "select count(*)" to avoid the performance penalty, ugh.
http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions032.htm#SQLRF00624
Count 2
April 14, 2009 - 4pm Central time zone
Reviewer: Duke Ganote from Amelia, Ohio USA
COUNT(*) and COUNT(1) have the same result, but COUNT(1) is just "semantically slippery" -- for the
sake of a (falsely) perceived performance limitation. It makes no more sense than COUNT(2)...
SQL> select count(*), count(1), count(2), count(a) from test_table;
COUNT(*) COUNT(1) COUNT(2) COUNT(A)
-------- ---------- ---------- ----------
8 8 8 7
Followup April 14, 2009 - 4pm Central time zone:
actually, if we all started using 42 - I could see the humor in that, that would be OK :)
select count(42) from t;
it will result in the ultimate answer...
Ultimate answer?
April 15, 2009 - 3am Central time zone
Reviewer: Martijn from NLD
MHO%xe> select count(42) from dual;
COUNT(42)
----------
1
I really don't get it anymore, it has become too philosophical: am I getting the ultimate answer or am I asking the ultimate question here? ;-)
|