Skip to Main Content

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Peter.

Asked: September 20, 2004 - 7:48 am UTC

Last updated: July 12, 2005 - 7:51 am UTC

Version: 9.2.5

Viewed 1000+ times

You Asked

Tom
We all know what you think about binding ;-} and how it will solve all ills (!) I can see that is true in the OLTP world or where we can 'pre-can' web queries, but in data warehouses we are up against 'never the same query twice' - one guy want's to know total sales last quarter, the next wants to know if we sell more beer on a Wednesday than on a Thursday. The use of front end query tools does not help us either as we can not modify the sql that is gerenrated for user queries.

So what do we do? Is binding the holy grail we should try to seek, or do we just rely on other Oracle features such as Dimensions, star transforms, materilized view query rewrite, bitmap indexes, partitioning etc to give the best user performance?

My question is : Do you have any tips to achieve binding in DWH or should we not even try to develop ways to do this?

By the way our dataload process (PL/SQL) is bind, bind, bind ;-)

I still have Effective Oracle by Design on my wish list - perhaps the answer I want is there



and Tom said...

Here is a snipped from that book explaining this in a little more detail:


(2)There Are Exceptions to Every Rule

In this section, we've seen mathematical, scientific, reproducible proof that as a general rule, you should use bind variables:

o Bind variables are a sound method to reduce the number of latches (read that word as locks) your application will use when parsing queries.

o Soft parsing uses significantly less CPU time than hard parsing, and bind variables are the way to achieve soft parsing.

o Stringing literals into your SQL statements, rather than using bind variables, opens your system to the SQL injection security bug.

o Stringing literals into your SQL statements can cause the statement to fail if the user inputs some unanticipated characters such as quotation marks.

o No matter what environment you are using-PL/SQL, Java and JDBC, or some other language-using bind variables is not only at least as fast, if not faster, than not using bind variables, but the code is also easier to write.

In short, if you need to execute hundreds or thousands or more SQL statements per minute or second, use bind variables in place of literals that change in the SQL. The effect of not using them can be quite dramatic. All it takes is one bad query in your system that fails to use a bind variable, but is executed frequently differing only in the literal values, to bring it to its knees.

Does that mean that you should always use bind variables in every circumstance?

No, those kinds of rules of thumb are dumb. For every rule of thumb out there, there are cases where they do not hold true. In some cases on systems that measure queries per second (which we've been focusing on in the examples here), you don't want use bind variables. On a system where you are measuring seconds per query (a data warehouse, for example), rather than per second, bind variables are something you may actually want to avoid using.

(3)Queries-per-Second Systems
....
discussion snipped
....

(3)Seconds-per-Query Systems

In what I'll loosely refer to as data warehouses, instead of running say 1,000 statements per second, they do something like take an average of 100 seconds to run a single query. In these systems, the queries are few but big (they ask large questions). Here, the overhead of the parse time is a tiny fraction of the overall execution time. Even if you have thousands of users, they are not waiting behind each other to parse queries, but rather are waiting for the queries to finish getting the answer.

In these systems, using bind variables may be counterproductive. Here, the runtimes for the queries are lengthy-in seconds, minutes, hours, or more. The goal is to get the best query optimization plan possible to reduce the runtime, not to execute as many of OLTP, one-tenth-second queries as possible. Since the optimizer's goal is different, the rules change.

As explained in the previous section (considering the example with the status = `N' query against the RECORDS_TO_BE_PROCESSED table), sometimes using a bind variable forces the optimizer to come up with the best generic plan, which actually might be the worst plan for the specific query. In a system where the queries take considerable time to execute, bind variables remove information the optimizer could have used to come up with a superior plan. In fact, some data warehouse-specific features are defeated by using bind variables. For example, Oracle supports a star transformation feature for data warehouses that can greatly reduce the time a query takes (we discuss this feature briefly in Chapter 6, the CBO). However, one restriction that precludes star transformation is having queries that contain bind variables.

In general, when you are working with a seconds-per-query system, use of bind variables may well be the performance inhibitor!


Rating

  (3 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Fascinating

Dan Kefford, September 20, 2004 - 1:28 pm UTC

Alex... I'll take "You learn something new every day" for 800.

Thanks

Peter, September 21, 2004 - 3:53 am UTC

Just as I hoped for, thanks Tom.

Now to get my wife to buy your book as an anniversary present {;-]

Tom Kyte
September 21, 2004 - 7:43 am UTC

I can think of better things to get personally...

Automatic Binding using PL/SQL - Can we stop it

Jeff Plumb, July 11, 2005 - 9:40 pm UTC

Is it possible to stop PL/SQL from using bind variables. I have a query that passes in a date range and queries the table based on that date range. If it is a small date range then the index would be useful, but with a large date range a full tablescan would be preferred. With my routine, the index is always used regardless. Here is an example:

create table delivery
(load_id number(10) primary key
,delivery_date date not null
,weight number(10) not null
);

create index delivery_i1 on delivery (delivery_date);


declare
i number;
begin
for i in 1..250000
loop
insert into delivery values (i,sysdate-i,i);
end loop;
end;
/

analyze table delivery compute statistics;

VARIABLE p_from_date VARCHAR2(8)
VARIABLE p_to_date VARCHAR2(8)

begin
:p_from_date := '20050615';
:p_to_date := '20050711';
end;
/

set autot traceonly
select *
from delivery
where delivery_date between to_date(:p_from_date,'YYYYMMDD') and to_date(:p_to_date,'YYYYMMDD');
set autot off

begin
:p_from_date := '13000615';
:p_to_date := '20050711';
end;
/

set autot traceonly

select *
from delivery
where delivery_date between to_date(:p_from_date,'YYYYMMDD') and to_date(:p_to_date,'YYYYMMDD');

select *
from delivery
where delivery_date between to_date('20050615','YYYYMMDD') and to_date('20050711','YYYYMMDD');

select *
from delivery
where delivery_date between to_date('13000615','YYYYMMDD') and to_date('20050711','YYYYMMDD');

set autot off
spool off


Now the results I get are as follows:

PL/SQL procedure successfully completed.


26 rows selected.


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=8 Card=625 Bytes=9375)
1 0 FILTER
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'DELIVERY' (Cost=8 Card=625 Bytes=9375)
3 2 INDEX (RANGE SCAN) OF 'DELIVERY_I1' (NON-UNIQUE) (Cost=6 Card=1125)




Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
1154 bytes sent via SQL*Net to client
651 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
26 rows processed

PL/SQL procedure successfully completed.


249999 rows selected.


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=8 Card=625 Bytes=9375)
1 0 FILTER
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'DELIVERY' (Cost=8 Card=625 Bytes=9375)
3 2 INDEX (RANGE SCAN) OF 'DELIVERY_I1' (NON-UNIQUE) (Cost=6 Card=1125)




Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
5972 consistent gets
0 physical reads
0 redo size
5185646 bytes sent via SQL*Net to client
28140 bytes received via SQL*Net from client
2501 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
249999 rows processed
26 rows selected.


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=27 Bytes=405)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'DELIVERY' (Cost=4 Card=27 Bytes=405)
2 1 INDEX (RANGE SCAN) OF 'DELIVERY_I1' (NON-UNIQUE) (Cost=3 Card=27)




Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
1154 bytes sent via SQL*Net to client
651 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
26 rows processed


249999 rows selected.


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=414 Card=250000 Bytes=3750000)
1 0 TABLE ACCESS (FULL) OF 'DELIVERY' (Cost=414 Card=250000 Bytes=3750000)




Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
2903 consistent gets
0 physical reads
0 redo size
5185646 bytes sent via SQL*Net to client
28140 bytes received via SQL*Net from client
2501 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
249999 rows processed


So when I don't use bind variable and pass in a date range that selects most of the table a full scan is performed. However with the bind variables, the index is used and this performs badly in comparison.

I am using this in a data warehouse environment. I have a procedure that accepts the from and to dates as parameters and then executes the query. How can force the optimizer to look at the values to decide what plan to execute?




Tom Kyte
July 12, 2005 - 7:51 am UTC

how many times a day do you do this particular query.


if this is frequent, suggest the use of ref cursors


if ( p_end_date - p_start_date > l_threshold )
then
open l_cursor for select /*+ full */ .... from where ...
else
open l_cursor for select .... from where ....
end if;


if infrequent, ref cursor again, but dynamic sql


open l_cursor for 'select ... from ... where dt1 between to_date( '||.....



More to Explore

VLDB

If you are new to partitioning, check out Connor McDonald's introduction series here.

VLDB

Documentation set on VLDB and Partitioning.