Question and Answer

Tom Kyte

Thanks for the question, Milo.

Asked: February 09, 2007 - 12:03 pm UTC

Last updated: July 17, 2012 - 9:26 am UTC

Version: 8.1.7

Viewed 10K+ times

You Asked

Many web applications (including "Ask Tom") have an advanced search screen where the user can specify criteria in a number of fields. There are quite a few ways to build the SQL that goes with such a search screen, and I was wondering if you could give your input on the methods below. I'm particularly interested in the impact each method would have on caching, parsing, and execution plans. Also, if you know of a better way that is not listed below, please share it.

In the examples below, a search for books has many fields, but I'm only considering the search by "title" to simplify things. If a title is specified, it has to be an exact match. If no title is specified, all books are returned.

1) Native dynamic SQL using hardcoded values. (Creates a potentially infinite number of SQL statements.)
l_sql := 'INSERT INTO SEARCH_RESULTS SELECT book_id FROM BOOK WHERE 1=1';
IF ( p_title IS NOT NULL ) THEN
 l_sql := l_sql || ' AND title = ''' || p_title || '''';
END IF;
EXECUTE IMMEDIATE l_sql;


2) Native dynamic SQL using bind variables. (Creates one SQL statement per combination of criteria.)
l_sql := 'INSERT INTO SEARCH_RESULTS SELECT book_id FROM BOOK WHERE 1=1';
IF ( p_title IS NOT NULL ) THEN
 l_sql := l_sql || ' AND title = ?';
ELSE
 -- Criteria will not contribute to the search.
 -- It is only here to ensure that the bind variable is used.
 l_sql := l_sql || ' OR ? = NULL';
END IF;
EXECUTE IMMEDIATE l_sql USING p_title;


3) DBMS_SQL using bind variables. (Creates one SQL statement per combination of specified criteria.)
l_sql := 'INSERT INTO SEARCH_RESULTS SELECT book_id FROM BOOK WHERE 1=1';
IF ( p_title IS NOT NULL ) THEN
 l_parm_index := l_parm_index+1;
 l_parms(l_parm_index) := p_title;
 l_sql := l_sql || ' AND title = :parm' || l_parm_index;
END IF;
l_cursor := dbms_sql.open_cursor;
dbms_sql.parse( l_cursor, l_sql, dbms_sql.native );
IF ( l_parms.COUNT > 0 ) THEN
 FOR l_index IN l_parms.FIRST .. l_parms.LAST LOOP
  dbms_sql.bind_variable( l_cursor, ':parm' || l_index, l_parms(l_index) );
 END LOOP;
END IF;
l_result := dbms_sql.execute( l_cursor );
dbms_sql.close_cursor( l_cursor );


4) Native dynamic SQL using "Application Context". (Creates one SQL statement per combination of specified criteria.)
l_sql := 'INSERT INTO SEARCH_RESULTS SELECT book_id FROM BOOK WHERE 1=1';
IF ( p_title IS NOT NULL ) THEN

 DBMS_SESSION.set_context(
  namespace=>'CTX_SEARCH',
  attribute=>'p_title',
  value=>p_title );

 l_sql := l_sql || ' AND title = SYS_CONTEXT(''CTX_SEARCH'',''p_title'')';

END IF;
EXECUTE IMMEDIATE l_sql;


5) Native dynamic SQL using 'cleverer' bind variables. (Creates one SQL statement, period.)
l_sql := 'INSERT INTO SEARCH_RESULTS SELECT book_id FROM BOOK WHERE 1=1';
l_sql := l_sql || ' AND ( ? IS NULL OR title = ? )';
-- Need to bind every parameter twice since it is used twice.
EXECUTE IMMEDIATE l_sql USING p_title, p_title;


6) DBMS_SQL using 'cleverer' bind variables. (Creates one SQL statement, period.)
l_sql := 'INSERT INTO SEARCH_RESULTS SELECT book_id FROM BOOK WHERE 1=1';
l_parms(1) := p_title;
l_sql := l_sql || ' AND ( :parm1 IS NULL OR title = :parm1 )';
l_cursor := dbms_sql.open_cursor;
dbms_sql.parse( l_cursor, l_sql, dbms_sql.native );
FOR l_index IN l_parms.FIRST .. l_parms.LAST LOOP
 dbms_sql.bind_variable( l_cursor, ':parm' || l_index, l_parms(l_index) );
END LOOP;
l_result := dbms_sql.execute( l_cursor );
dbms_sql.close_cursor( l_cursor );


7) DBMS_SQL using "Application Context". (Creates one SQL statement, period.)
l_sql := 'INSERT INTO SEARCH_RESULTS SELECT book_id FROM BOOK WHERE 1=1';
DBMS_SESSION.set_context(
 namespace=>'CTX_SEARCH',
 attribute=>'p_title',
 value=>p_title );
l_sql := l_sql || ' AND ( SYS_CONTEXT(''CTX_SEARCH'',''p_title'') IS NULL OR title = SYS_CONTEXT(''CTX_SEARCH'',''p_title'') )';
EXECUTE IMMEDIATE l_sql;


and Tom said...

See
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1288401763279

for how I accomplish this.


#1) NEVER NEVER NEVER do that. Not only does it not use bind variables - thus killing the system - it is subject to SQL Injection big time.
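To make the risk concrete, here is what method #1 does with a hypothetical malicious input (the value below is invented purely for illustration):

```sql
-- Suppose the caller passes:
--   p_title => ''' OR ''1''=''1'     (i.e. the string:  ' OR '1'='1  )
-- After concatenation, method #1 executes:
--   INSERT INTO SEARCH_RESULTS SELECT book_id FROM BOOK
--   WHERE 1=1 AND title = '' OR '1'='1'
-- The trailing OR '1'='1' makes the predicate true for every row --
-- and a less benign payload could do far worse than return too many books.
```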

#2) No, not necessary.

#3) will be applicable in 11g when we can convert a dbms_sql 'cursor' to a ref cursor - but right now, if I don't have to use dbms_sql, I won't. I'd rather have a ref cursor

#4) my approach

#5) nope, leads to the worst possible plan being used for ANY set of inputs

#6) see #5

#7) see #3, but would still just do #4


Comments

why?

Milo van der Leij, February 09, 2007 - 1:37 pm UTC

Thanks for the answer. I assumed the best way was #3 or #4, but I am surprised you picked SYS_CONTEXT over DBMS_SQL.

While I understand that native dynamic SQL is faster than DBMS_SQL, I'm assuming that SYS_CONTEXT also has some overhead associated with it. Is DBMS_SQL that bad?

Also, I understand that there is such a thing as "bind variable peeking". Can Oracle "peek" into the result of the SYS_CONTEXT call, or do you lose a potential benefit here? (Or does "bind variable peeking" not occur with DBMS_SQL either?)

And finally, for the sake of argument, consider doing SELECT rather than INSERT statements. Being on Oracle 8i, I can't bulk select using native dynamic SQL. In that case, would you consider using DBMS_SQL instead since it allows for bulk data retrieval?

That's why.

Milo van der Leij, February 12, 2007 - 12:53 pm UTC

To save you the trouble, and to have all the answers in one place, I will answer my own questions.

Yes, DBMS_SQL is that bad. From Expert One-on-One: "On simple queries, where the processing of the query itself could be ignored, native dynamic SQL is almost twice as fast at fetching the data as DBMS_SQL."

From the "CURSOR" thread you linked: "sys_context is treated as a bind variable (but no bind variable peeking)".
There are reports elsewhere that DBMS_SQL also doesn't do bind variable peeking: http://www.db-nemec.com/HappyNewPeek.html
And finally, again from Expert One-on-One when comparing DBMS_SQL with native dynamic SQL: "DBMS_SQL will be used when (...) You will be fetching or inserting thousands of rows and can employ array processing."

Note: one other reason to use DBMS_SQL is that, on Oracle 8i, SYS_CONTEXT values are limited to 255 characters. (4000 characters in 9i).

Tom's latest position + caveats with App Context

Milo, July 14, 2009 - 10:53 am UTC

I see that in the July 2009 edition of Oracle Magazine you have changed your position on this ( https://asktom.oracle.com/Misc/oramag/on-popularity-and-natural-selection.html ), now recommending a variation on #2 that uses "(1=1 OR ? IS NULL)" instead of "OR ? = NULL". I'm glad that people are still trying to find the best way to solve this problem, as I have to believe it's a common one.

In addition to the Application Context downsides you mention, Dominic Brooks points out that messing with the Application Context's value between binding and fetching has consequences that you may or may not expect. ( http://orastory.wordpress.com/2009/07/14/gotcha-application-contexts ). This seems like another reason someone might not want to use Application Contexts.

Finally I'd like to add that since we've moved to 11g we are now using solution #3, using DBMS_SQL to build and bind to the statement, then convert it to a ref cursor, and then use a Dynamic SQL select (with bulk collect limit n) to retrieve the data. We feel it gives us the most flexibility and power without feeling like we're using any kind of hack.
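The 11g flow described here might look like the following (a sketch only; it assumes a p_title parameter in scope, and the bind name and fetch-loop body are illustrative):

```sql
DECLARE
  l_sql   VARCHAR2(4000) := 'SELECT book_id FROM book WHERE 1=1';
  l_cur   INTEGER := DBMS_SQL.OPEN_CURSOR;
  l_rc    SYS_REFCURSOR;
  l_ids   DBMS_SQL.NUMBER_TABLE;
  l_dummy INTEGER;
BEGIN
  -- Build the statement and bind only the criteria that were supplied.
  IF p_title IS NOT NULL THEN
    l_sql := l_sql || ' AND title = :b_title';
  END IF;
  DBMS_SQL.PARSE( l_cur, l_sql, DBMS_SQL.NATIVE );
  IF p_title IS NOT NULL THEN
    DBMS_SQL.BIND_VARIABLE( l_cur, ':b_title', p_title );
  END IF;
  l_dummy := DBMS_SQL.EXECUTE( l_cur );

  -- 11g: hand the executed DBMS_SQL cursor over to native dynamic SQL.
  l_rc := DBMS_SQL.TO_REFCURSOR( l_cur );
  LOOP
    FETCH l_rc BULK COLLECT INTO l_ids LIMIT 100;
    -- process l_ids here ...
    EXIT WHEN l_rc%NOTFOUND;
  END LOOP;
  CLOSE l_rc;  -- after conversion, close the ref cursor; the DBMS_SQL
               -- cursor number is no longer valid and must not be closed separately
END;
```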
Tom Kyte
July 15, 2009 - 11:14 am UTC

... without feeling like we're using any kind of hack. ...

not sure why (1=1 or ? is null) is a 'hack'. Seems to me that the pure native dynamic sql approach is easier to code than the dbms_sql one. You have to inspect the inputs once to build the query (which the (1=1 or ? is null) does) and then inspect them all again to selectively bind.
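Applied to the book-title example, the Oracle Magazine variant might look like this (a sketch; the bind name is invented). Every criterion contributes exactly one predicate and one bind, whether or not it was supplied, so you inspect the inputs once and always bind the same list:

```sql
l_sql := 'INSERT INTO search_results SELECT book_id FROM book WHERE 1=1';
IF p_title IS NOT NULL THEN
  l_sql := l_sql || ' AND title = :b_title';
ELSE
  -- Tautology: always true, so the optimizer can ignore it,
  -- but the bind variable is still consumed.
  l_sql := l_sql || ' AND (1=1 OR :b_title IS NULL)';
END IF;
EXECUTE IMMEDIATE l_sql USING p_title;
```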

Milo van der Leij, July 15, 2009 - 4:40 pm UTC

Maybe not a "hack" by definition (if we could even agree on a definition) but it's definitely "clever", which is not the same as "smart", and it relies on what appears to be undocumented behavior. The PL/SQL Language Reference says that PL/SQL uses "Short-Circuit Evaluation", but I can't find anything that says that SQL also does that.

If my colleagues ask "will that always work in all cases, even when we upgrade to Oracle 12?", I wouldn't be able to answer "yes" with 100% certainty. I'm putting more value on trusting the solution than on performance of the solution.
Tom Kyte
July 15, 2009 - 4:54 pm UTC

... and it relies on what appears to be undocumented behavior....

not at all?


select *
from t
where (1=1 or :x is null)
and (1=1 or :y is null)
/


what is undocumented about that sql syntax?
We rely on the optimizer to optimize things - to recognize "tautologies" - all of the time.

... I'm putting more value on trusting the solution than on performance of the solution...

that doesn't seem to compute to me? When you upgrade to version 12, are you sure that the optimizer won't look at something like:


with p
as
(select :x x, :y y from dual)
select * 
  from t, p
 where t.hiredate > p.y;


and go nutso for *some reason*?

It is a non-equijoin, a cartesian join of sorts. It cannot bind peek like "where t.hiredate > :y" can. It has far less opportunity to get the right estimated cardinality.

Milo van der Leij, July 15, 2009 - 5:45 pm UTC

You're right, the SQL syntax is well documented. However, the optimizer's behavior is not. I'm simply saying that
select * from t

has a smaller chance of "going nutso" than
select * from t where (1=1 or :x is null)

Tom Kyte
July 16, 2009 - 11:11 am UTC

agreed - but

with p as (select ... from dual)
select * from t, p

has a much larger chance of going nutso than

select * from t where (1=1 or :x is null)


does. Adding that non-equijoin in there - in a complex query of any sort - is infinitely more complex to optimize....

Query using sys context going for full scan

Lal, July 11, 2012 - 7:33 am UTC

Tom,

I have the following query which is executed dynamically and uses sys_context.

PKG_DB_UTILITY is a user-defined package with SET_VALUE and GET_VALUE functions; SET_VALUE uses SYS_CONTEXT to set the value in the session.
This query does a full scan when the package with sys_context is used, and uses an index when literal values are used.

The following are the details of the execution
--Code to set the values
begin
PKG_DB_UTILITY.SET_PARAMETER('comp_code','XX');
Pkg_Db_Utility.set_parameter('phone_number','9999999999999999');
ENd;

SELECT /*+ GATHER_PLAN_STATISTICS */ comp_code,RES_NUMBER
from CONTACT_DTLS
where comp_code = Pkg_Db_Utility.get_value('comp_code')
AND (PHONE_NO LIKE Pkg_Db_Utility.get_value('phone_number')
or DEST_PHONE_NO like PKG_DB_UTILITY.GET_VALUE('phone_number')
or CELL_PHONE_NO like PKG_DB_UTILITY.GET_VALUE('phone_number')
);

The query returned one row, but went for a full scan as shown by the execution plan below
select * FROM table(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ADVANCED ALLSTATS LAST'));
------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers | Reads |
------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 14319 (100)| | 1 |00:00:20.70 | 50901 | 50897 |
|* 1 | TABLE ACCESS FULL| CONTACT_DTLS | 1 | 116K| 5012K| 14319 (4)| 00:02:52 | 1 |00:00:20.70 | 50901 | 50897 |
------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter((("PHONE_NO" LIKE "PKG_DB_UTILITY"."GET_VALUE"('phone_number') OR "DEST_PHONE_NO" LIKE
"PKG_DB_UTILITY"."GET_VALUE"('phone_number') OR "CELL_PHONE_NO" LIKE "PKG_DB_UTILITY"."GET_VALUE"('phone_number')) AND
"comp_code"="PKG_DB_UTILITY"."GET_VALUE"('comp_code')))

Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "comp_code"[VARCHAR2,6], "RES_NUMBER"[VARCHAR2,15]

But when literals are used, the query does an index scan, as shown below

SELECT /*+ GATHER_PLAN_STATISTICS */ comp_code, RES_NUMBER
from CONTACT_DTLS
where comp_code = PKG_DB_UTILITY.GET_VALUE('comp_code')
and (PHONE_NO like '9999999999999999' --Pkg_Db_Utility.get_value('phone_number')
or DEST_PHONE_NO like '9999999999999999' --PKG_DB_UTILITY.GET_VALUE('phone_number')
or CELL_PHONE_NO like '9999999999999999' --PKG_DB_UTILITY.GET_VALUE('phone_number')
);
select * FROM table(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ADVANCED ALLSTATS LAST'));
---------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers | Reads |
---------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 13 (100)| | 1 |00:00:00.77 | 10 | 7 |
|* 1 | TABLE ACCESS BY INDEX ROWID | CONTACT_DTLS | 1 | 15 | 660 | 13 (0)| 00:00:01 | 1 |00:00:00.77 | 10 | 7 |
| 2 | BITMAP CONVERSION TO ROWIDS | | 1 | | | | | 1 |00:00:00.75 | 9 | 6 |
| 3 | BITMAP OR | | 1 | | | | | 1 |00:00:00.75 | 9 | 6 |
| 4 | BITMAP CONVERSION FROM ROWIDS| | 1 | | | | | 0 |00:00:00.11 | 3 | 2 |
|* 5 | INDEX RANGE SCAN | CONTACT_DTLS_IDX11 | 1 | | | 3 (0)| 00:00:01 | 0 |00:00:00.11 | 3 | 2 |
| 6 | BITMAP CONVERSION FROM ROWIDS| | 1 | | | | | 0 |00:00:00.60 | 3 | 2 |
|* 7 | INDEX RANGE SCAN | CONTACT_DTLS_IDX10 | 1 | | | 3 (0)| 00:00:01 | 0 |00:00:00.60 | 3 | 2 |
| 8 | BITMAP CONVERSION FROM ROWIDS| | 1 | | | | | 1 |00:00:00.04 | 3 | 2 |
|* 9 | INDEX RANGE SCAN | CONTACT_DTLS_IDX9 | 1 | | | 3 (0)| 00:00:01 | 1 |00:00:00.04 | 3 | 2 |
---------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("comp_code"="PKG_DB_UTILITY"."GET_VALUE"('comp_code'))
5 - access("DEST_PHONE_NO"='9999999999999999')
7 - access("CELL_PHONE_NO"='9999999999999999')
9 - access("PHONE_NO"='9999999999999999')

Details on the table and index stats.
-------------------------------------
select NUM_ROWS,BLOCKS,SAMPLE_SIZE
from DBA_TABLES
where TABLE_NAME='CONTACT_DTLS';

NUM_ROWS BLOCKS SAMPLE_SIZE
-------------------------------
817963 50900 245389

select INDEX_NAME ,LEAF_BLOCKS,DISTINCT_KEYS,clustering_factor,NUM_ROWS
from DBA_INDEXES
where TABLE_NAME='CONTACT_DTLS';

INDEX_NAME LEAF_BLOCKS DISTINCT_KEYS clustering_factor NUM_ROWS
--------------------------------------------------------------------------------------------
CONTACT_DTLS_IDX9 3328 177347 749538 760979
CONTACT_DTLS_IDX10 3378 174861 763452 774363
CONTACT_DTLS_IDX11 2449 139253 594789 601896

select INDEX_NAME,COLUMN_NAME,COLUMN_POSITION
from DBA_IND_COLUMNS
where TABLE_NAME='CONTACT_DTLS' ORDER BY 1,3
INDEX_NAME COLUMN_NAME COLUMN_POSITION
-------------------------------------------------------
CONTACT_DTLS_IDX9 PHONE_NO 1
CONTACT_DTLS_IDX10 CELL_PHONE_NO 1
CONTACT_DTLS_IDX11 DEST_PHONE_NO 1

Doubts
------
Why is the estimated row count so high for the query using the package?
The statistics are almost correct (the table has about 8 lakh rows) and the selectivity of the indexed columns is also good: 177,347 distinct values out of 817,963 rows.
I understand that if sys_context is used the optimizer won't peek the value. But even without peeking, shouldn't those statistics be sufficient to choose an index scan?
The same query uses an index scan in a different environment with the same package using sys_context.
Can you help me figure out the issue?
Thanks in advance for your precious time.

Tom Kyte
July 12, 2012 - 5:08 pm UTC

I see no use of sys_context here *at all*

consider using this technique to invoke your plsql called from sql:
http://www.oracle.com/technetwork/issue-archive/2011/11-sep/o51asktom-453438.html


when the optimizer is given a query of the form:

c1 like ????? or c2 like ???? or c3 like ?????

the statistics are not going to matter too much. Think about it for a second - what kind of estimate would you come up with - you can have ANY statistic in the world you want - please estimate that row count.

you cannot. You don't know if ???? is going to be "%", or "123456789". You will have to guess.

We guess.


Now, if I told you the query was:

c1 like 12345 or c2 like 12345 or c3 like 12345

you could come up with a pretty darn good estimate couldn't you.


The issue is we have to guess at the estimated cardinalities - we have to. You might consider using a first_rows(1) hint if your goal is to retrieve the first record as fast as possible.
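The technique in that article wraps the PL/SQL call in a scalar subquery, so scalar subquery caching can collapse the per-row function executions; whether the plan itself changes is a separate question. A sketch against the query above (same names as the original post):

```sql
SELECT /*+ GATHER_PLAN_STATISTICS */ comp_code, res_number
  FROM contact_dtls
 WHERE comp_code = (SELECT pkg_db_utility.get_value('comp_code')    FROM dual)
   AND (   phone_no      LIKE (SELECT pkg_db_utility.get_value('phone_number') FROM dual)
        OR dest_phone_no LIKE (SELECT pkg_db_utility.get_value('phone_number') FROM dual)
        OR cell_phone_no LIKE (SELECT pkg_db_utility.get_value('phone_number') FROM dual) );
```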

Query using sys context going for full scan

Lal, July 13, 2012 - 6:13 am UTC

Tom,
Thanks very much for the quick response.

This is just part of one search criterion; there are other criteria as well, and that is why this is implemented as a dynamic query using sys_context.

I went through the article on scalar subquery caching. In this case the PL/SQL is executed only once, for a hard-coded value, to get the parameter value set using sys_context.
Will scalar subquery caching help if the package is executed only once in this query? (PKG_DB_UTILITY.GET_VALUE('phone_number') is executed only once.)

SELECT /*+ GATHER_PLAN_STATISTICS  */ comp_code,RES_NUMBER
 from CONTACT_DTLS
where comp_code = Pkg_Db_Utility.get_value('comp_code')
  AND (PHONE_NO LIKE Pkg_Db_Utility.get_value('phone_number')
      or DEST_PHONE_NO like PKG_DB_UTILITY.GET_VALUE('phone_number')
      or CELL_PHONE_NO like PKG_DB_UTILITY.GET_VALUE('phone_number')     
   );


When I change the LIKE condition to '=' it uses the index. So would that mean dynamic queries using sys_context and a LIKE clause need to be hinted to make use of the index?
Another approach, I think, would be to use the USING clause instead of sys_context, so that hard parsing is avoided and a better plan is chosen.

Another doubt: why does the same query use an index scan in a different database, for a different client, with similar data, stats, and database version (11.2)?
There, too, the estimated cardinality values are wrong, but the index is used.

We are not paginating the results. Will first_rows(1) hint help if pagination is not done?

Once again thanks very much for your time and help. I am learning a lot from you.
Tom Kyte
July 16, 2012 - 3:11 pm UTC

how do you know the package is only executed once? To me, I would expect it to be executed three times per row for the full scan.

... When i change the like condition to '=' its using the index. ...

please think about this, I gave you a huge hint above...


when the optimizer is given a query of the form:

c1 like ????? or c2 like ???? or c3 like ?????

the statistics are not going to matter too much. Think about it for a second - what kind of estimate would you come up with - you can have ANY statistic in the world you want - please estimate that row count.


So, I asked you to think like an optimizer - what sort of estimated cardinality would you come up with for: where c1 like ??? or c2 like ??? or c3 like ??? ?

it would be pretty bad wouldn't it - anything from 0 to the number of rows in the table is the possible answer!!! So, you would almost certainly tend to overestimate.


Now, let's say I asked to you estimate:

where c1 = ?? or c2 = ?? or c3 = ??

and I told you:

there are N rows in the table
there are X distinct values of C1
there are Y distinct values of C2
there are Z distinct values of C3

Now, would you be able to guess a bit better??? Probably - think about it, start running the math through your head (start thinking like the optimizer - if you do that, you'll be way ahead of the game). Think of how much better a cardinality estimate would be with equals as opposed to like!!!!
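Running that math with the statistics posted above (the 5%-per-unknown-LIKE-pattern guess is the classic textbook figure and is version-dependent, so treat these numbers as illustrative):

```sql
-- LIKE with an unpeeked pattern, guessed at ~5% selectivity each:
--   1 - (1 - 0.05)^3   = 0.142625
--   817,963 * 0.142625 = ~116,600 rows   -- the 116K E-Rows in the full-scan plan
--
-- Equality, using the distinct-key counts from DBA_INDEXES:
--   1/177,347 + 1/174,861 + 1/139,253 = ~0.0000185
--   817,963 * 0.0000185 = ~15 rows      -- the E-Rows in the literal (index) plan
```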


If you know the estimated cardinality of the result set and the optimizer is having a really hard time guessing it - using first_rows(n) in this case would be a clear way to tell it what the estimated number of rows will be, yes.

cases when bind variables are not peeked

Lal, July 17, 2012 - 8:31 am UTC

Tom,
Thanks very much for the response.

I know that for queries having bind variable values set using sys_context, bind variable peeking does not happen.

Are there any other special cases where the bind variable values won't be peeked?

Also, if histograms are not collected, would the bind variable value be peeked?

I was under the impression that the first_rows(1) hint is helpful only if we need part of the result set first, as for pagination of results. But based on your explanation I understand that the first_rows(1) hint can be used to facilitate the optimizer to use indexes. Is that correct?

If yes, can this be enabled at the system level for an OLTP database? Any issues with that?
Tom Kyte
July 17, 2012 - 9:26 am UTC

Are there any other special cases where the bind variable values wont be
peeked?


if the client does not support deferred optimization.

A program typically goes like this to execute a sql statement:

prepareStatement (this does the parse)
bind
bind
bind
execute (this runs the sql)
close


well, if we optimized during the parse - we would not have the binds to peek at - so we need to sort of "two step" the parse process:

prepareStatement (this does the parse UP TO but not including optimization)
bind
bind
bind
execute (this will optimize if necessary and then run the sql)
close

Some clients only support the former (old clients at this point in time; most should now be doing the latter)


Also if histograms are not collected, would bind variable value be peeked?


sure, they can be. If you have partitions, for example, the binds might be peeked to figure out which partition the query will hit, and then local statistics used to optimize.



can be used to facilitate the optimizer to use indexes.


totally FALSE, entirely false.

it can be used to facilitate the returning of the first rows as fast as possible - which may involve indexes or not.

You know your result set is going to be very small, constrained. You know the size of it approximately. Using first_rows(n) will tell us that information.


I would not set it at the system level, no. Let the applications do it - perhaps as a parameter they use upon starting up.

I am against setting init.ora parameters unless you have to. And you don't have to here.
