You Asked
Hi Tom,
We are trying to use oracle as our database for a kind of ad-hoc DSS system. In the book, Expert Oracle, you pointed that not using bind variable for executing query is one of the cause to fail the project. For our project, we need to generate SQL queries which are heavily dynamic and rarely one query be the same as another. Is Oracle the right kind of tool for such project? Most of the queries apply to the schema which are somewhat modified form of star-join but at the same time they may refer to another star-join. Besides, the queries are applied on large set of data and performance is a critical factor for the success of the project. Unlike most OLAP tools which support single value for each dimension, we need to supply more than one value or even the combination of values for one or more dimensions and their attributes.
Thanks in advance.
Deepak.
and Tom said...
Here is an excerpt from Effective Oracle By Design, where I elaborated on this a bit more (and in Expert Oracle: Database Development, I'll be spending yet more time on this - that is under development)
Bottom line: for the data warehouse query, no binds is OK and preferred. For 99.99999999999% of the queries your system runs however, binds are the way to go...
<quote>
There Are Exceptions to Every Rule
In this section, weve seen mathematical, scientific, reproducible proof that as a general rule, you should use bind variables:
· Bind variables are a sound method to reduce the number of latches (read that word as locks) your application will use when parsing queries.
· Soft parsing uses significantly less CPU time than hard parsing, and bind variables are the way to achieve soft parsing.
· Stringing literals into your SQL statements, rather than using bind variables, opens your system to the SQL injection security bug.
· Stringing literals into your SQL statements can cause the statement to fail if the user inputs some unanticipated characters such as quotation marks.
· No matter what environment you are usingPL/SQL, Java and JDBC, or some other languageusing bind variables is not only at least as fast, if not faster, than not using bind variables, but the code is also easier to write.
In short, if you need to execute hundreds or thousands or more SQL statements per minute or second, use bind variables in place of literals that change in the SQL. The effect of not using them can be quite dramatic. All it takes is one bad query in your system that fails to use a bind variable, but is executed frequently differing only in the literal values, to bring it to its knees.
Does that mean that you should always use bind variables in every circumstance? No, those kinds of rules of thumb are dumb. For every rule of thumb out there, there are cases where they do not hold true. In some cases on systems that measure queries per second (which weve been focusing on in the examples here), you dont want use bind variables. On a system where you are measuring seconds per query (a data warehouse, for example), rather than per second, bind variables are something you may actually want to avoid using.
Queries-per-Second Systems
As youve seen, in a typical system, where a Statspack report tells you that you execute tens of queries or more per second on average, you should use bind variables. But I have seen binding taken to an extreme, when developers seek and destroy all occurrences of literals in queries wherever they exist (this is also an unfortunate side effect of cursor_sharing=-FORCE, a session or init.ora setting). The problem with this approach is that you could be removing from the optimizer crucial information that it would have used to generate an optimal plan at runtime.
For example, suppose we have a table that has a STATUS column with the value Y (to indicate the record was processed) or N (to indicate the record was not processed). Most of the records in such a table would be in the processed state, with a small percentage not processed. We have an application that must pick off the unprocessed (N) records, process them, and update their status. Therefore, we have the following simple query in our code:
select * from records_to_be_processed where status = 'N';
Now, suppose that someone imposed the rule that bind variables are mandatory, so we must change that N into a bind variable. The problem we discover after doing so is that performance of our application just went down the tubes; what used to take seconds now takes minutes. The reason is simple: our query stopped using an index and is now full-scanning a rather large table. Consider this simple simulation:
ops$tkyte@ORA920> create table records_to_be_processed
2 as
3 select decode( mod(rownum,100), 0, 'N', 'Y' ) processed, a.*
4 from all_objects a;
Table created.
ops$tkyte@ORA920> create index processed_idx on records_to_be_processed(processed);
Index created.
ops$tkyte@ORA920> analyze table records_to_be_processed compute statistics
2 for table
3 for all indexes
4 for all indexed columns
5 /
Table analyzed.
ops$tkyte@ORA920> variable processed varchar2(1);
ops$tkyte@ORA920> exec :processed := 'N'
PL/SQL procedure successfully completed.
ops$tkyte@ORA920> set autotrace traceonly explain
ops$tkyte@ORA920> select *
2 from records_to_be_processed
3 where processed = 'N';
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=97)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'RECORDS_TO_BE_PROCESSED'
(Cost=2 Card=1 Bytes=97)
2 1 INDEX (RANGE SCAN) OF 'PROCESSED_IDX' (NON-UNIQUE)
(Cost=1 Card=1)
ops$tkyte@ORA920> select *
2 from records_to_be_processed
3 where processed = :processed;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
(Cost=46 Card=16220 Bytes=1573340)
1 0 TABLE ACCESS (FULL) OF 'RECORDS_TO_BE_PROCESSED'
(Cost=46 Card=16220 Bytes=1573340)
When we remove the fact that processed = N from the query and just generically say where processed = :some value, the optimizer must make some assumptions. Here, it assumes that half of the table on average will be returned, since the statistics indicate there are only two values. In this case, the index should be avoided. In general, it did the right thing, but specifically, it did the wrong thing. It should be noted however, that there is a change that autotrace is lying to us here however! There is a chance, due to bind variable peeking, that the optimizer will rethink that plan when asked to really execute the query and it would use an index. Well visit that in more detail below in the section on Bind Variable Peeking.
But, in this case, the only correct solution here is to put the literal back. No matter how many times we execute that query in our application, and no matter how many copies of our application we invoke, the query will always be where processed = N. The value is a true constant that never changes.
If this query sometimes used processed = N, sometimes used processed = Y, and sometimes used processed = Something else, using a bind variable would be the correct approach. In short, you need to use bind variables only when the value supplied to the query changes at runtime.
Another way to look at this is if you are using static SQLSQL that is known fully at compile timeyou do not need to use a bind variable; the values in the query can never change, no matter how many times you execute the query. On the other hand, if you are using truly dynamic SQL, whereby the query is dynamically constructed at runtime, you should use bind variables for those columns for which inputs change. For example, suppose you have some Java/JDBC code that looks like this:
PreparedStatement pstat =
conn.prepareStatement
("select ename, empno "+
"from emp " +
"where job = 'CLERK' " +
"and ename like '" + ename_like + "'" );
That will result in a query like this being generated at runtime:
select ename, empno
from emp
where job = 'CLERK'
and ename like '%A%'
At first glance, it might seem like you should bind both the literal CLERK and %A%. In fact, only %A% needs to be bound. No matter how many times that query is executed by that application or how many times you run that application, the predicate where job = CLERK is constant. Only the predicate against ENAME is variable, so only it needs to be bound. Here is the correct code in Java/JDBC:
PreparedStatement pstat =
conn.prepareStatement
("select ename, empno "+
"from emp " +
"where job = 'CLERK' " +
"and ename like ?" );
pstat.setString( 1, ename_like );
It accomplishes the same goal, but properly uses bind variables where necessary.
The same concept applies in PL/SQL stored procedures as in any other language: If the literal in the query is invariant at runtime, it need not be bound. Consider this query:
for x in ( select *
from emp
where job = 'CLERK'
and ename like p_ename ) ...
You do not want to replace the literal CLERK with a bind variable. Again, no matter how many times you run that query, the predicate will always be where job = CLERK.
In PL/SQL, the only time you need to worry about bind variables is when you use dynamic SQL and build a query at runtime. In other languages that support only dynamic SQL, such as Java with JDBC and VB with ODBC, you need to take care to properly bind each query when and where appropriate. This is yet another reason to consider using stored procedures: They actually make it very hard to misuse or skip using bind variables! Bear in mind, the developers still must bind in their call to the stored procedure, they cannot skip it all together. But by using stored procedures, you will reduce the number of times they must bind typically a single stored procedure call will execute many different SQL statements each of which must be bound. They will bind once to the stored procedure and PL/SQL will take it from there.
Seconds-per-Query Systems
In what Ill loosely refer to as data warehouses, instead of running say 1,000 statements per second, they do something like take an average of 100 seconds to run a single query. In these systems, the queries are few but big (they ask large questions). Here, the overhead of the parse time is a tiny fraction of the overall execution time. Even if you have thousands of users, they are not waiting behind each other to parse queries, but rather are waiting for the queries to finish getting the answer.
In these systems, using bind variables may be counterproductive. Here, the runtimes for the queries are lengthyin seconds, minutes, hours, or more. The goal is to get the best query optimization plan possible to reduce the runtime, not to execute as many of OLTP, one-tenth-second queries as possible. Since the optimizers goal is different, the rules change.
As explained in the previous section (considering the example with the status = N query against the RECORDS_TO_BE_PROCESSED table), sometimes using a bind variable forces the optimizer to come up with the best generic plan, which actually might be the worst plan for the specific query. In a system where the queries take considerable time to execute, bind variables remove information the optimizer could have used to come up with a superior plan. In fact, some data warehouse-specific features are defeated by using bind variables. For example, Oracle supports a star transformation feature for data warehouses that can greatly reduce the time a query takes (we discuss this feature briefly in Chapter 6, the CBO). However, one restriction that precludes star transformation is having queries that contain bind variables.
In general, when you are working with a seconds-per-query system, use of bind variables may well be the performance inhibitor!
</quote>
Rating
(3 ratings)
Is this answer out of date? If it is, please let us know via a Comment