Skip to Main Content
  • Questions
  • Performance tuning with where statement

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Keith.

Asked: October 25, 2001 - 10:51 am UTC

Last updated: May 29, 2012 - 6:43 am UTC

Version: 8i

Viewed 1000+ times

You Asked

I have been looking fro documentation on which statement within the WHERE clause is executed first within any SQL Statement. As you know - depening on which one is executed would tell users which of the high degrees of complexity to use first. For example,

Select name, address, email
FROM USER_PROFILE
WHERE NAME like 'K%' AND
USER_ID > 100 ;

In this case the user id is the primary key and would be indexed rather than the name which is just a regular varchar2 field. If the second statement within the WHERE clause is executed first - then the statement would go faster than if the first statement was executed.

Can you please let me know what you think.
Thank you - Keith

and Tom said...

I think -- this is the job of the optimizer!

The order of evaluation of the where clause in a SQL statment is purposely "undefined". We can (and in fact do) change the order of evaluation whenever we feel like it.

The optimizer looks at the predicate, analyzes the available access paths, chooses the appropriate indexes and executes the query. You just write the query, it does the rest.

See
</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:1299201885045 <code>
for example -- there is a query that depending on the optimizer mode and settings evaluates the predicate differently.

So, just put them in any order, the optimizer will take care of the rest. This is true for the CBO (which is the only mode you should be running in in 8i). With the RBO, there is occasionally some advantage to putting the most selective predicates LAST.


Rating

  (9 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Right Indexes

Andre Whittick Nasser, October 25, 2001 - 10:30 pm UTC

Would you comment on what indexes would be appropriate to set up to improve performance in the statement above ?

How would composite indexes work in this case ?

Can you give some examples ?

Thanks again !

Tom Kyte
October 26, 2001 - 8:10 am UTC

Well, it depends on the skew of the data of course.  Looking at the predicate we have:

name like 'K%' and user_id > 100;

An index on any of

name
user_id
name, user_id
user_id, name

would/could be used.  However, if a large pct of the rows in this table satisfied "user_id > 100", the indexes with user_id on the leading edge are not very efficient.  Same would hold true for "name like 'K%'".

Only if one of the two was selective would it be efficient, and then that column should be on the leading edge in this case (it is NOT true that the most selective items should be on the leading edge in general!!!)




ops$tkyte@ORA717DEV.US.ORACLE.COM> create table t1
  2  as
  3  select object_name name, object_id user_id, created from all_objects
  4  /

Table created.

ops$tkyte@ORA717DEV.US.ORACLE.COM> create index t1_idx1 on t1(name,user_id);

Index created.

ops$tkyte@ORA717DEV.US.ORACLE.COM> create index t1_idx2 on t1(user_id,name);

Index created.

ops$tkyte@ORA717DEV.US.ORACLE.COM> 
ops$tkyte@ORA717DEV.US.ORACLE.COM> create table t2
  2  as
  3  select decode( mod(rownum,2), 0, 'K', 'A' ) || substr(name,2) name,
  4             200-user_id user_id,
  5             created
  6    from t1
  7  /

Table created.

ops$tkyte@ORA717DEV.US.ORACLE.COM> create index t2_idx1 on t2(name,user_id);

Index created.

ops$tkyte@ORA717DEV.US.ORACLE.COM> create index t2_idx2 on t2(user_id,name);

Index created.

ops$tkyte@ORA717DEV.US.ORACLE.COM> 
ops$tkyte@ORA717DEV.US.ORACLE.COM> 
ops$tkyte@ORA717DEV.US.ORACLE.COM> select sum( case when name like 'K%' then 1 else 0 end ) K_cnt,
  2             sum( case when user_id > 100  then 1 else 0 end ) hundred_cnt,
  3             count(*) cnt
  4    from t1
  5  /

     K_CNT HUNDRED_CNT        CNT
---------- ----------- ----------
        12       16871      16965

ops$tkyte@ORA717DEV.US.ORACLE.COM> select sum( case when name like 'K%' then 1 else 0 end ) K_cnt,
  2             sum( case when user_id > 100  then 1 else 0 end ) hundred_cnt,
  3             count(*) cnt
  4    from t2
  5  /

     K_CNT HUNDRED_CNT        CNT
---------- ----------- ----------
      8482          93      16965

ops$tkyte@ORA717DEV.US.ORACLE.COM> 
ops$tkyte@ORA717DEV.US.ORACLE.COM> alter session set sql_trace=true;

Session altered.

ops$tkyte@ORA717DEV.US.ORACLE.COM> select /*+ index( t1 t1_idx1 ) */ count(created)
  2    from t1
  3   where name like 'K%' and user_id > 100;

COUNT(CREATED)
--------------
            12

ops$tkyte@ORA717DEV.US.ORACLE.COM> select /*+ index( t1 t1_idx2 ) */ count(created)
  2    from t1
  3   where name like 'K%' and user_id > 100;

COUNT(CREATED)
--------------
            12

ops$tkyte@ORA717DEV.US.ORACLE.COM> 
ops$tkyte@ORA717DEV.US.ORACLE.COM> select /*+ index( t2 t2_idx1 ) */ count(created)
  2    from t2
  3   where name like 'K%' and user_id > 100;

COUNT(CREATED)
--------------
            38

ops$tkyte@ORA717DEV.US.ORACLE.COM> select /*+ index( t2 t2_idx2 ) */ count(created)
  2    from t2
  3   where name like 'K%' and user_id > 100;

COUNT(CREATED)
--------------
            38

Now, tkprof will show us:


select /*+ index( t1 t1_idx1 ) */ count(created)
  from t1
 where name like 'K%' and user_id > 100

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.01          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          0          2          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.00       0.01          0          2          0           1

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE
     12   INDEX RANGE SCAN (object id 21913)



********************************************************************************

select /*+ index( t1 t1_idx2 ) */ count(created)
  from t1
 where name like 'K%' and user_id > 100

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.06       0.09         95         96          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.07       0.09         95         96          0           1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 52

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE
     12   INDEX RANGE SCAN (object id 21914)

<b>Here, name is very selective (12 out of 17,000 start with K). So, when we use the index on (NAME,USER_ID), we do very very little IO.  It is very fast to identify the rows where name starts with K, look to see if the user_id is > 100 and then access the table using t1_idx1.

Now, when we use t1_idx2, we see it is somewhat (lots) less efficient.  USER_ID is not very selective when it comes to the predicate "user_id > 100".  Virtually EVERY row satisfies that -- hence we end up reading the entire index in this case.  We get the same answer, we just had to do 45 times the work

Onto the next case...</b>

select /*+ index( t2 t2_idx1 ) */ count(created)
  from t2
 where name like 'K%' and user_id > 100

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.03       0.03          0         51          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.04       0.03          0         51          0           1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 52

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE
     38   INDEX RANGE SCAN (object id 21916)

********************************************************************************

select /*+ index( t2 t2_idx2 ) */ count(created)
  from t2
 where name like 'K%' and user_id > 100

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          2          3          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.00       0.00          2          3          0           1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 52

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE
     38   INDEX RANGE SCAN (object id 21917)

<b>Reversing the selectivity of the data (with regards to this particular predicate) reverses the situation.  When name like 'K%' is not very selective (we made about 1/2 of the rows start with K), we find we do more work then the other query.  Its simply the converse</b>

So, what did we learn from this?  Actually not as much as you think.  We learned that for this SPECIFIC PREDICATE we could over analyze the data and come up with the perfect index.  However, this is not the only predicate we must be concerned with, there will be others.  You really have to look at all of the questions you will be asking of this table to figure out the proper indexing scheme. 

Perfect !!! Too bad I cannot rate you 10

Andre Whittick Nasser, October 26, 2001 - 9:26 am UTC

Brilliant explanation!

Being more of a production, nuts-and-bolts guy, I am sometimes somewhat confused as to this kind of subtleness.

No doubt I will bookmark this page !

Thanks a lot !

Denis Quinn, October 26, 2001 - 9:39 am UTC

Tom, I know this is not the place to ask questions, but...
Where do I go to ask a question??
I can't find any place to ask a question on your site.

Thank you

excellent!!! -- but when to go for bitmap indexes

nk, October 26, 2001 - 12:49 pm UTC

that is a brilliant answer!!!

i also heard about bitmap indexes..
can you tell me with an example when to go for bit map indexes and their benefit over
normal indexes

Tom Kyte
October 26, 2001 - 3:11 pm UTC

Even better, I can point you to two references

</code> http://docs.oracle.com/docs/cd/A81042_01/DOC/server.816/a76994/indexes.htm#97322

and my book -- click through on 
http://asktom.oracle.com <code>
both cover this topic in detail.

Great explanation Tom :)

Chris, November 14, 2002 - 2:05 am UTC

Thx it give me enlighment on indexing policy...:)



two different sum clause in a single query

dmv, May 07, 2004 - 6:36 am UTC

Tom,
   we wants to sum overall salary as well as sum of salary for the dept no = 10. 

we have written one query. kindly check it up, is it OK?
because, our table contains, 8 million records and also it will keep on grows in future. so, kindly check it up and advice us.

Records details are :
----------------------
SQL> SELECT EMPNO, DEPTNO, SAL FROM EMP;

     EMPNO     DEPTNO        SAL
---------- ---------- ----------
      7839         10       5000
      7698         30       2850
      7782         10       2450
      7566         20       2975
      7654         30       1250
      7499         30       1600
      7844         30       1500
      7900         30        950
      7521         30       1250
      7902         20       3000
      7369         20        800
      7788         20       3000
      7876         20       1100
      7934         10       1300

14 rows selected.

QUERY :
--------
SQL> select * 
  2  FROM (SELECT SUM(SAL) FROM EMP), (SELECT SUM(SAL) FROM EMP WHERE DEPTNO = 10);


  SUM(SAL)   SUM(SAL)
---------- ----------
     29025       8750

Elapsed: 00:00:01.09

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   NESTED LOOPS
   2    1     VIEW
   3    2       SORT (AGGREGATE)
   4    3         TABLE ACCESS (BY INDEX ROWID) OF 'EMP'
   5    4           INDEX (RANGE SCAN) OF 'EMP_DEPTNO_IDX' (NON-UNIQUE
          )

   6    1     VIEW
   7    6       SORT (AGGREGATE)
   8    7         TABLE ACCESS (FULL) OF 'EMP'


            In this execution plan, whether VIEW will affect the performance of this query? 

is there any simple query is possible in 9i?

Thanks
DMV 

Tom Kyte
May 07, 2004 - 8:36 am UTC

scott@ORA9IR2> select sum(sal), sum(decode(deptno,10,sal)) from emp;

SUM(SAL) SUM(DECODE(DEPTNO,10,SAL))
---------- --------------------------
29025 8750



Doubts about order of columns in an index

Carlos, May 07, 2004 - 9:45 am UTC

Hi Tom,

I become a little bit confused with your answer on this topic of choosing the order of columns in an index based on their selectivity, and your discussion about the "myth" that most discriminating elements should be first in your "Expert One-on-One" book : can you explain me the difference between these two cases? Because on this topic you recommend to put the most selectivity column first, and in the book you say that it doesn´t matter. Can you clarify me this?

Thanks


Tom Kyte
May 07, 2004 - 11:37 am UTC

can you point out where exactly I said something contrary to what is in the book?




I mean, i said:

Only if one of the two was selective would it be efficient, and then that column
should be on the leading edge in this case (it is NOT true that the most
selective items should be on the leading edge in general!!!)


here - but only for a very specific case -- the "where x like something and y > somethingelse". big range scans. if you think about what would have to happen here -- selectivity could play a large role.

the order of columns in an index is 100% driven by HOW you use the index.

Skip scanning

reader, May 08, 2004 - 12:32 pm UTC

In 9i, does skip scanning index feature help in cases like this, as the CBO will still use the index for col refered in WHRERE clause that may not be the leading edge of the index? Thanks.

Tom Kyte
May 10, 2004 - 7:18 am UTC

question as phrased doesn't make sense to me?

in cases like "what"

shall we care about the selectivity when doing composite index?

A reader, May 29, 2012 - 1:46 am UTC

Hi tom,

Actually i am also a little bit confused.

In below post, you said the selectivity does not matter.
http://asktom.oracle.com/pls/apex/f?p=100:11:0::NO::P11_QUESTION_ID:47238391968929

But here you recommend to consider the selectivity.

So please kindly confirm whether my below understanding is through:
1. for equal conditions like a=? and b=?, index(a,b) and index(b,a) does the same
only difference is we want to compress the index, in which case selective one should go first
and meanwhile, we can utilize 'index skip scan' somehow
2. for non-equal condition like a=? and b>?
we do need to consider the selectivity:
1. most satisfy b>?, index(a,b) to locate those a=? quickly then do a range scan, rather than range scan most if it is index(b,a)
2. most satisfy a=?, index(b,a) to locate those b>? firstly then filter out those a<>?


Tom Kyte
May 29, 2012 - 6:43 am UTC

I think you are talking about this where clause:

here - but only for a very specific case -- the "where x like something and y > somethingelse".
big range scans. if you think about what would have to happen here -- selectivity could play a
large role.



it would be nice to have some context in the future. If I guessed right - I'll start by repeating what I told you on that other posting:

close you eyes and IMAGINE the work that needs to be performed in each case (and if you cannot do that, study indexes a bit more until you can - seriously, unless you can visualize them, you'll never be able to optimally use them!


here we are using TWO range operators. here - we do not have an "=". here you do have to imagine how much work, in general would be done by the database.

Which of the two parts of the predicate are "more selective" - since we KNOW that we have to process every entry in the index where EITHER OF:

where x like ?

OR

where y >= ?

if x like ? would generate 1,000,000 hits in the index and y >= ? would generate 50, I'd like to see y first in the index.

If the numbers were reversed - I like to see x first.


1) correct
2) for a=? and b>? we want a first, see the other posting you reference for why again. if the where clause is a=? and b>? the index on (a,b) will never generate more work than an index on (b,a) and furthermore will in general result in less work being performed.

an index on (b,a) will NEVER generate LESS work and in general would cause MORE work to be performed.



selectivity only comes into play when you have all range operations - no equals are involved.

More to Explore

Performance

Get all the information about database performance in the Database Performance guide.