Using Oracle as DBMS for analytical processing

Question and Answer

Tom Kyte

Thanks for the question, Deepak.

Asked: December 15, 2005 - 2:25 pm UTC

Last updated: December 19, 2005 - 12:17 pm UTC

Version: 10gR2


You Asked

Hi Tom,

We are trying to use Oracle as the database for a kind of ad-hoc DSS system. In your book Expert Oracle, you pointed out that not using bind variables when executing queries is one of the causes of project failure. For our project, we need to generate SQL queries that are heavily dynamic; rarely is one query the same as another. Is Oracle the right kind of tool for such a project? Most of the queries run against a schema that is a somewhat modified form of a star join, but they may also refer to another star join. Moreover, the queries run against large sets of data, and performance is a critical factor for the success of the project. Unlike most OLAP tools, which support a single value for each dimension, we need to supply more than one value, or even a combination of values, for one or more dimensions and their attributes.

Thanks in advance.

Deepak.


and Tom said...

Here is an excerpt from Effective Oracle by Design, where I elaborated on this a bit more (and in Expert Oracle: Database Development, which is under development, I'll be spending yet more time on this).

Bottom line: for the data warehouse query, no binds is OK and preferred. For 99.99999999999% of the queries your system runs, however, binds are the way to go...

<quote>

There Are Exceptions to Every Rule

In this section, we've seen mathematical, scientific, reproducible proof that, as a general rule, you should use bind variables:

· Bind variables are a sound method to reduce the number of latches (read that word as locks) your application will use when parsing queries.
· Soft parsing uses significantly less CPU time than hard parsing, and bind variables are the way to achieve soft parsing.
· Stringing literals into your SQL statements, rather than using bind variables, opens your system to the SQL injection security bug (see the sketch after this list).
· Stringing literals into your SQL statements can cause the statement to fail if the user inputs unanticipated characters, such as quotation marks.
· No matter what environment you are using—PL/SQL, Java and JDBC, or some other language—code that uses bind variables is at least as fast as, and usually faster than, code without them, and it is easier to write as well.
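
To make the injection and quoting bullets concrete, here is a hypothetical sketch in dynamic PL/SQL (the procedure is invented; EMP is the same demo table used in the JDBC examples below): the concatenating version breaks on a name like O'BRIEN and will execute whatever a malicious caller splices in, while the bound version treats any input as plain data.

create or replace procedure count_by_ename( p_ename in varchar2 )
as
    l_cnt number;
begin
    -- Unsafe: the input becomes part of the SQL text. A value such as
    -- O'BRIEN breaks the statement, and a crafted value can inject SQL.
    execute immediate
        'select count(*) from emp where ename = ''' || p_ename || ''''
        into l_cnt;

    -- Safe: the input is a bind value; quotes are just data, and every
    -- call shares one parsed statement.
    execute immediate
        'select count(*) from emp where ename = :x'
        into l_cnt using p_ename;
end;
/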


In short, if you need to execute hundreds or thousands or more SQL statements per minute or second, use bind variables in place of literals that change in the SQL. The effect of not using them can be quite dramatic. All it takes is one bad query in your system that fails to use a bind variable, but is executed frequently with differing literal values, to bring the system to its knees.
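
As a quick illustration of that parse cost (a minimal sketch against DUAL, not a benchmark from the book): run this with sql_trace enabled, and tkprof will report roughly 1,000 hard parses for the literal form but a single parse for the bound form.

declare
    l_cnt number;
begin
    for i in 1 .. 1000
    loop
        -- 1,000 distinct statement texts: a hard parse on every iteration
        execute immediate
            'select count(*) from dual where dummy = ''' || i || ''''
            into l_cnt;

        -- one shared statement text: parsed once, then simply reused
        execute immediate
            'select count(*) from dual where dummy = :x'
            into l_cnt using to_char(i);
    end loop;
end;
/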

Does that mean that you should always use bind variables in every circumstance? No; those kinds of rules of thumb are dumb. For every rule of thumb out there, there are cases where it does not hold true. On systems that measure queries per second (which is what we've been focusing on in the examples here), you do want bind variables. But on a system where you measure seconds per query (a data warehouse, for example), bind variables are something you may actually want to avoid using.



Queries-per-Second Systems

As you've seen, in a typical system, where a Statspack report tells you that you execute tens of queries or more per second on average, you should use bind variables. But I have seen binding taken to an extreme, with developers seeking out and destroying all occurrences of literals in queries wherever they exist (this is also an unfortunate side effect of cursor_sharing=FORCE, a session or init.ora setting). The problem with this approach is that you could be removing from the optimizer crucial information that it would have used to generate an optimal plan at runtime.
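
For reference, here is a small sketch of what that setting does (the rewritten text is the form you would see in V$SQL): with cursor_sharing=FORCE, Oracle replaces every literal with a system-generated bind, whether you wanted that literal bound or not.

alter session set cursor_sharing = force;

-- These arrive as two different statement texts...
select count(*) from emp where empno = 7369;
select count(*) from emp where empno = 7499;

-- ...but both are rewritten to, and shared as, the single statement:
-- select count(*) from emp where empno = :"SYS_B_0"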

For example, suppose we have a table that has a PROCESSED column with the value Y (to indicate the record was processed) or N (to indicate the record was not processed). Most of the records in such a table would be in the processed state, with only a small percentage not processed. We have an application that must pick off the unprocessed (N) records, process them, and update their status. Therefore, we have the following simple query in our code:

select * from records_to_be_processed where processed = 'N';

Now, suppose that someone imposed the rule that bind variables are mandatory, so we must change that ‘N’ into a bind variable. The problem we discover after doing so is that performance of our application just went down the tubes; what used to take seconds now takes minutes. The reason is simple: our query stopped using an index and is now full-scanning a rather large table. Consider this simple simulation:

ops$tkyte@ORA920> create table records_to_be_processed
2 as
3 select decode( mod(rownum,100), 0, 'N', 'Y' ) processed, a.*
4 from all_objects a;
Table created.

ops$tkyte@ORA920> create index processed_idx on records_to_be_processed(processed);
Index created.

ops$tkyte@ORA920> analyze table records_to_be_processed compute statistics
2 for table
3 for all indexes
4 for all indexed columns
5 /
Table analyzed.

ops$tkyte@ORA920> variable processed varchar2(1);
ops$tkyte@ORA920> exec :processed := 'N'
PL/SQL procedure successfully completed.

ops$tkyte@ORA920> set autotrace traceonly explain
ops$tkyte@ORA920> select *
2 from records_to_be_processed
3 where processed = 'N';

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=97)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'RECORDS_TO_BE_PROCESSED'
(Cost=2 Card=1 Bytes=97)
2 1 INDEX (RANGE SCAN) OF 'PROCESSED_IDX' (NON-UNIQUE)
(Cost=1 Card=1)

ops$tkyte@ORA920> select *
2 from records_to_be_processed
3 where processed = :processed;

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
(Cost=46 Card=16220 Bytes=1573340)
1 0 TABLE ACCESS (FULL) OF 'RECORDS_TO_BE_PROCESSED'
(Cost=46 Card=16220 Bytes=1573340)

When we remove the fact that processed = 'N' from the query and just generically say where processed = :some_value, the optimizer must make some assumptions. Here, it assumes that, on average, half of the table will be returned, since the statistics indicate there are only two values. In that case, the index should be avoided. In general, it did the right thing, but specifically, it did the wrong thing. It should be noted, however, that there is a chance that autotrace is lying to us here! Due to bind variable peeking, the optimizer may rethink that plan when asked to really execute the query, and it might use the index after all. We'll visit that in more detail below, in the discussion of bind variable peeking.


But in this case, the only correct solution is to put the literal back. No matter how many times we execute that query in our application, and no matter how many copies of the application we invoke, the query will always be where processed = 'N'. The value is a true constant that never changes.

If this query sometimes used processed = ‘N’, sometimes used processed = ‘Y’, and sometimes used processed = ‘Something else’, using a bind variable would be the correct approach. In short, you need to use bind variables only when the value supplied to the query changes at runtime.

Another way to look at this is if you are using static SQL—SQL that is known fully at compile time—you do not need to use a bind variable; the values in the query can never change, no matter how many times you execute the query. On the other hand, if you are using truly dynamic SQL, whereby the query is dynamically constructed at runtime, you should use bind variables for those columns for which inputs change. For example, suppose you have some Java/JDBC code that looks like this:

PreparedStatement pstat =
    conn.prepareStatement(
        "select ename, empno " +
        "from emp " +
        "where job = 'CLERK' " +
        "and ename like '" + ename_like + "'" );

That will result in a query like this being generated at runtime:

select ename, empno
from emp
where job = 'CLERK'
and ename like '%A%'

At first glance, it might seem like you should bind both the literal ‘CLERK’ and ‘%A%’. In fact, only ‘%A%’ needs to be bound. No matter how many times that query is executed by that application or how many times you run that application, the predicate where job = ‘CLERK’ is constant. Only the predicate against ENAME is variable, so only it needs to be bound. Here is the correct code in Java/JDBC:


PreparedStatement pstat =
    conn.prepareStatement(
        "select ename, empno " +
        "from emp " +
        "where job = 'CLERK' " +
        "and ename like ?" );

pstat.setString( 1, ename_like );

It accomplishes the same goal, but properly uses bind variables where necessary.

The same concept applies in PL/SQL stored procedures as in any other language: If the literal in the query is invariant at runtime, it need not be bound. Consider this query:

for x in ( select *
from emp
where job = 'CLERK'
and ename like p_ename ) ...

You do not want to replace the literal ‘CLERK’ with a bind variable. Again, no matter how many times you run that query, the predicate will always be where job = ‘CLERK’.

In PL/SQL, the only time you need to worry about bind variables is when you use dynamic SQL and build a query at runtime. In other languages that support only dynamic SQL, such as Java with JDBC and VB with ODBC, you need to take care to properly bind each query when and where appropriate. This is yet another reason to consider using stored procedures: they make it very hard to misuse or skip bind variables. Bear in mind that developers must still bind in their call to the stored procedure; they cannot skip it altogether. But stored procedures reduce the number of times they must bind: typically, a single stored procedure call executes many different SQL statements, each of which must be bound. The developers bind once to the stored procedure, and PL/SQL takes it from there.
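
As a minimal sketch of that point (the procedure and table names here are invented for illustration, not from the book): the caller supplies a single bind, and every static statement inside the procedure is bound automatically by PL/SQL.

create or replace procedure flag_customer( p_cust_id in number )
as
begin
    -- p_cust_id is automatically a bind variable in each static statement
    update orders
       set status = 'BILLED'
     where cust_id = p_cust_id;

    insert into audit_log( cust_id, action )
    values ( p_cust_id, 'BILLED' );
end;
/

From JDBC, the call itself is the only bind the developer writes: conn.prepareCall("begin flag_customer(?); end;") with a single set of the parameter.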




Seconds-per-Query Systems


In what I'll loosely refer to as data warehouses, instead of running, say, 1,000 statements per second, the system takes something like an average of 100 seconds to run a single query. In these systems, the queries are few but big (they ask large questions). Here, the parse time is a tiny fraction of the overall execution time. Even if you have thousands of users, they are not waiting behind each other to parse queries; rather, they are waiting for the queries to finish getting the answer.

In these systems, using bind variables may be counterproductive. Here, the runtimes for the queries are lengthy—in seconds, minutes, hours, or more. The goal is to get the best query optimization plan possible in order to reduce the runtime, not to execute as many one-tenth-second OLTP-style queries as possible. Since the goal is different, the rules change.

As explained in the previous section (consider the example with the processed = 'N' query against the RECORDS_TO_BE_PROCESSED table), sometimes using a bind variable forces the optimizer to come up with the best generic plan, which might actually be the worst plan for the specific query. In a system where the queries take considerable time to execute, bind variables remove information the optimizer could have used to come up with a superior plan. In fact, some data warehouse-specific features are defeated by the use of bind variables. For example, Oracle supports a star transformation feature for data warehouses that can greatly reduce the time a query takes (we discuss this feature briefly in Chapter 6, on the CBO). However, one restriction that precludes star transformation is having queries that contain bind variables.
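
To make that concrete, here is a sketch of the kind of warehouse query involved (the star schema names are hypothetical, and star transformation additionally assumes the usual prerequisites, such as bitmap indexes on the fact table's foreign key columns): written with literals, the optimizer may consider the transformation and can use the actual values against column statistics; written with binds (t.fiscal_year = :y), it cannot.

alter session set star_transformation_enabled = true;

select t.fiscal_quarter, p.product_name, sum(f.sales_amount)
  from sales_fact f, time_dim t, product_dim p
 where f.time_id    = t.time_id
   and f.product_id = p.product_id
   and t.fiscal_year  = 2005       -- literal: transformation is possible
   and p.product_name = 'WIDGET'   -- literal: histograms can be used
 group by t.fiscal_quarter, p.product_name;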


In general, when you are working with a seconds-per-query system, use of bind variables may well be the performance inhibitor!
</quote>


Comments

Using constants

Steve Kiteley, December 16, 2005 - 3:33 am UTC

If we declare a PL/SQL variable as a constant...

c_value CONSTANT VARCHAR2(6) := 'AVALUE';

And then use it in a query...

SELECT...
INTO...
FROM...
WHERE a_column = c_value;

Is the parser clever enough to generate a specific execution path i.e. treat it as

SELECT...
INTO...
FROM...
WHERE a_column = 'AVALUE';

or will it generate a generic one as discussed above?



Tom Kyte
December 16, 2005 - 8:36 am UTC

PLSQL won't, SQL *might*, maybe.

It'll bind peek if possible, but if you have:


drop table t;

create table t as select 99 id, a.* from all_objects a;
update t set id = 1 where rownum = 1;
create index t_idx on t(id);

exec dbms_stats.gather_table_stats( user, 'T', method_opt=>'for all indexed columns' );

alter session set sql_trace=true;

declare
    l_id_99 constant number := 99;
    l_id_1  constant number := 1;
    l_cnt   number;
begin
    -- 99 is peeked first for this statement text: expect a full scan for both
    select count(object_type) into l_cnt from t bind_99_first where id = l_id_99;
    select count(object_type) into l_cnt from t bind_99_first where id = l_id_1;

    -- 1 is peeked first for this statement text: expect an index range scan for both
    select count(object_type) into l_cnt from t bind_1_first where id = l_id_1;
    select count(object_type) into l_cnt from t bind_1_first where id = l_id_99;
end;
/



For example, you can expect to see EITHER a full scan OR an index range scan, depending on which query gets executed first (even though the queries look "different", PL/SQL starting with 9.2.0.4 or 9.2.0.5 "normalizes" the queries on us).

tkprof says:

SELECT COUNT(OBJECT_TYPE) FROM T BIND_99_FIRST WHERE ID = :B1

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 59 (recursive depth: 1)

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=703 pr=0 pw=0 time=14462 us)
49270 TABLE ACCESS FULL T (cr=703 pr=0 pw=0 time=49365 us)
--------------------------------------------------------------------------------
SELECT COUNT(OBJECT_TYPE) FROM T BIND_99_FIRST WHERE ID = :B1

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 59 (recursive depth: 1)

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=703 pr=0 pw=0 time=7750 us)
1 TABLE ACCESS FULL T (cr=703 pr=0 pw=0 time=7704 us)
--------------------------------------------------------------------------------
SELECT COUNT(OBJECT_TYPE) FROM T BIND_1_FIRST WHERE ID = :B1

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 59 (recursive depth: 1)

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=3 pr=0 pw=0 time=125 us)
1 TABLE ACCESS BY INDEX ROWID T (cr=3 pr=0 pw=0 time=98 us)
1 INDEX RANGE SCAN T_IDX (cr=2 pr=0 pw=0 time=69 us)(object id 52144)
--------------------------------------------------------------------------------
SELECT COUNT(OBJECT_TYPE) FROM T BIND_1_FIRST WHERE ID = :B1

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 59 (recursive depth: 1)

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=796 pr=0 pw=0 time=38758 us)
49270 TABLE ACCESS BY INDEX ROWID T (cr=796 pr=0 pw=0 time=147872 us)
49270 INDEX RANGE SCAN T_IDX (cr=98 pr=0 pw=0 time=49309 us)(object id 52144)


99.99999999999%

A reader, December 16, 2005 - 5:04 am UTC

Do you know how much 99.99999999999% is?

SQL> select (100-99.99999999999) * 1000 * 1000 * 1000 * 100 from dual;

(100-99.99999999999)*1000*1000*1000*100
---------------------------------------
                                      1

I suspect 99.99999999999% is a bit too much. 

Tom Kyte
December 16, 2005 - 8:42 am UTC

100 minus 99.99999999999?

I think you meant:

ops$tkyte@ORA10GR2> set numformat 99999999999999999999.99999
ops$tkyte@ORA10GR2> select (99.99999999999/100) * 1000 * 1000 * 1000 * 100 from dual;

(99.99999999999/100)*1000*1000*1000*100
---------------------------------------
                      99999999999.99000


that sounds about right to me :)


Yes, I was using hyperbole. 

Using Oracle as DBMS for analytical processing

Deepak Bajracharya, December 19, 2005 - 11:12 am UTC

The response was good enough that we are quite convinced that using bind variables is not a 100% bulletproof technique and, as with everything, requires sound judgement. So far as I know, Oracle is heavily used for OLTP applications and has very good transaction throughput. When creating a database, DBCA provides an option for a data warehouse. I am curious: how do those settings make Oracle better for a data warehouse? Also, is the Oracle database the appropriate tool for a data mart? I have heard about Oracle Express in the past, but I don't have much idea about it. Does the current version of Oracle (10gR2) make Oracle Express obsolete? Thanks in advance.

Tom Kyte
December 19, 2005 - 12:17 pm UTC

Oracle Express is now part of the database (we moved the functionality of an external analytical engine into the core database itself).

DBCA sets various parameters differently for a data warehouse, that is the most significant difference.
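
For illustration only (these are common warehouse-leaning settings, not a verbatim list of what DBCA's warehouse template sets), the differences run along these lines: favor large scans, parallelism, and star queries over high-concurrency OLTP defaults.

-- Illustrative, not the actual DBCA template
alter system set star_transformation_enabled   = true scope=spfile;
alter system set pga_aggregate_target          = 1G   scope=spfile;
alter system set db_file_multiblock_read_count = 128  scope=spfile;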
