Thanks for the question, John .

Asked: September 08, 2005 - 11:11 am UTC

Last updated: November 28, 2008 - 5:12 pm UTC

Version: 9.2.0

Viewed 1000+ times

You Asked

Tom,

During the 05' Hotsos symposium you made a very puzzling statement about histograms that I still can't believe what I heard. Did I hear you correctly when you said that the first SQL statement that uses a particular histogram will determine the query plan for ALL subsequent SQL statements that use that histogram? If this is true, why use a histogram? The purpose of using a histogram is to give the optimizer enough information to determine the best query plan, but with histograms this is true only with the first SQL statement! With the rest SQL statements, the optimizer will be flying blindly! For instance, if the first SQL statement uses an index, then a subsequent SQL statement that uses the same histogram but in this case perform a full table scan will be forced to use the same execution plan that calls for index usage. This can't be the way it works. Please tell me that I misunderstood you.

Is this the way it works? If it is, then are histograms worthless? Is this a flaw that will be fixed in future releases?

Thanks,

John Cantu

and Tom said...

That is with bind variables.

In a transaction system, you execute hundreds/thousands of queries per second. Therefore you MUST use bind variables and reduce the number of soft parses as much as possible.

I would say yes, in a transactional system, histograms are not excessively useful.

however, when you get to the reporting/data warehouse side of things, they become most useful. In these cases (especially the data warehouse) you do not necessarily use bind variables (sure in the hundreds of "normal" queries you absolutely will - but not the 'big' query)

But even in a transactional system -- not *everything* is bound, you still have many literals in many queries and these will use the histograms once again upon the first hard parse to find "the best overall general plan" for that query.

Rating

(35 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Shift of paradigm

Mikito Harakiri, September 08, 2005 - 3:23 pm UTC

I can believe that bind variables and cursor sharing idea were important 20 years ago when processors were virtually impotent and they had dwarf memory. Why parsing a cursor today is still expensive? Does it have to be that way? What about cursor sharing, is it still important with today's RAM standards?

September 08, 2005 - 6:01 pm UTC

You'd be wrong if you believe that.

remember, it is called *a shared pool*.

it is not "mikito's private pool"

the expense of parsing relates to the latching of shared data structures.

hard parsing is horrible.
soft parsing, only slightly less so.

You know, in 20 years, a row lock is still a row lock, we wait for it. Doesn't matter how fast the CPU's are, how much ram you got - serialization devices are still serialization devices.

Did you know that for a simple insert statement that is hard parsed -- about 4/5th's the time (single user mode, gets much worse in multi-user mode) is spent *parsing* your sql versus actually executing it.

Tell me, do you compile your subroutines - when the user picks them from a menu - dynamically link them in - run it and throw it away??

I doubt it.

So, that is what parsing is, you are compiling a program.

Alberto Dell'Era, September 08, 2005 - 6:17 pm UTC

I've personally seen a powerful Sun E16k, 8 CPUs, basically blocked and using only 10-20% of its cpu power (almost all on parsing) - and spending all its time waiting on "latch free".

The character ":" was almost never seen in v$sql.sql_text.

To see is to believe :)

Peter Nybo Rasmussen, September 09, 2005 - 9:56 am UTC

Tom, you said:
<quote>
In a transaction system, you execute hundreds/thousands of queries per second. Therefore you MUST use bind variables and reduce the number of soft parses as much as possible.
</quote>

I believe what you meant was: ... use bind variables and reduce the number of HARD parses ...

Also, your clear distinction between transactional and reporting/data warehouse systems is not always possible, e.g. some systems are mainly "transactional", but nevertheless have (few or many) "reporting/data warehouse" queries. To take full advantage of both bind variables and histograms I would like to suggest the following (I know you are a great fan of rules of thumb :-)

1) In general, just use bind variables, BUT ...

2) Do not use bind variables for columns with massively skewed data, at least not if there are only few distinct values (usually the case). Then the extra parsing penalty will be minimal (e.g. just ONE extra hard parse for a yes/no column), but the optimizer can use different plans (based on the histograms) for each distinct value.

3) ... or maybe even simpler: Do not use bind variables for columns with few distinct values !

Why not ?

September 09, 2005 - 11:08 am UTC

Nope, I very much purposely said:

therefore you must BIND and reduce the number of soft parses as much as possible.

It was very much on purpose. the use of binds is what reduces the hard parses. YOU the programmer can reduce the number of soft parses you do.

I put it easier:

if you are going to be executing a bunch of queries per second (transactional application) BIND.

if you are a once a month report run by a single user, maybe not bind some of the values.

if you are a data warehouse user - don't bind that DW query

It all comes back to math, a single unbound statement executed hundreds or more times is all it takes to knock you out.

Peter Nybo Rasmussen, September 09, 2005 - 12:49 pm UTC

Tom,
sorry, I misunderstood the little "and" word.

But you did not really answer my other point. I was not suggesting an unbound statement as a whole, just not binding columns with skewed distribution and few distinct values (e.g. a "yes/no" column, a "status" column, etc). That would mean almost no extra parsing, but potential great benefits from different execution plans ?

September 09, 2005 - 12:55 pm UTC

I would only use that if bind variable peeking was becoming an issue -- as a fix for a specific problem, rather than in general.

Peter Nybo Rasmussen, September 09, 2005 - 1:23 pm UTC

Exactly, but why not in general ?
Why would I ever (talking performance, not security) bind a "yes/no" column if distribution is skewed and I have both "yes" queries and "no" queries ?

September 09, 2005 - 1:41 pm UTC

becuase the yes/no case is the worst case and typically isn't the general case.

IF (at that place in the application you ALWAYS use "Y")
then
by all means, don't bind it, you only bind that which varies
end if

(same for "N")

but to have at that place "sometimes Y, sometimes N", seems to me you are a report and the "if you are a report, maybe you don't bind" comes into play.

But again, I would only do that for bind peeking issues - if the bind works fine, go for it.

John Cantu, September 10, 2005 - 12:26 pm UTC

Tom,

Thanks for the answer; however, I have a couple of questions concerning some of your statements.

The first statement:

"In these cases (especially the data warehouse) you do not
necessarily use bind variables (sure in the hundreds of "normal" queries you absolutely will - but not the 'big' query)"

Are you recommending that if we have developed a data warehouse application using bind variables everywhere, we should redesign those applications so that they don't use bind variables with the big queries?

"But even in a transactional system -- not *everything* is bound, you still have many literals in many queries and these will use the histograms once again upon the first hard parse to find "the best overall general plan" for that query."

Our application is using bind variables everywhere! We don't use literals in queries because Oracle experts have recommended the use of bind variables over literals. I don't remember them stating stipulations. Sigh... It is very hard to keep a good reputation as a database engineer with the rules and recommendations constantly changing.

September 10, 2005 - 12:52 pm UTC

If you have queries that take "a long time" to run and would benefit from the optimizer having access to the literal values, THEN not binding in that case is OK, something to be considered.

Remember, we are talking about queries that take minutes or hours to run - you are not going to be hard parsing them like crazy. They take too long to run to be a problem (and if you are constructing them dynamically - ad-hoc like, they are likely unique anyway - bind or not).

In a transactional system - if you have a query like:

select *
from t
where x = ?
and y = ?
and status = 'CLOSED';

it is perfectly OK to "not bind" status='CLOSED' - because the value for 'CLOSED' will never ever change.

You bind those values that change from execution to execution.

You do not bind those values that do not change from execution to execution.

The rules haven't really changed.

cursor_sharing and bind variable

Enayet, September 11, 2005 - 8:29 am UTC

okay, i do know that bind variable is useful to reduce hard parsing; thus improved performance. i also know that oracle introduced a new initialisation parameter: cursor_sharing.

questions:

1. can we use cursor_sharing=similar and use literal instead of bind variable? my reason of using literal is: if we use bind variable oracle will not able to determine an effective query execution plan, since it does not know the value of the bind varaible yet. Is not it better to use letral combined with cursor_sharing parameter?

2. histogram is very obscure to me. can you enlighten me ...

September 11, 2005 - 8:44 am UTC

1) you really have a query that you execute alot - in a transactional application - and the plan should be generated based on the inputs each time? (for each unqiue set of inputs?)

If you use cursor sharing similar, you might (stress might) reduce the number of plans generated - then again, you might have exactly the same number of plans.

Say "x" is skewed and "where x = ?" could result in different plans for different X's.

Say your query is "select * from t where x = ?".

Enabling cursor sharing similar will result in exactly as many child cursors you had before - becaues each of "x=5", "x=6", "x=7", ... and so on will generate their own plan, their own child cursor -- when similar is used.

2) a distribution of data.....

between the values of:

0 and 10 there are 10 rows
11 and 22 there are 2 rows
23 and 30 there are 100000 rows
31 and 40 there are 423 rows
......

for example

Peter Nybo Rasmussen, September 11, 2005 - 8:37 am UTC

"You bind those values that change from execution to execution.
You do not bind those values that do not change from execution to execution."

How about a slight modification:

You bind those values that change a lot (many distinct values).
You do not bind those values that change only a little (few distinct values).

This would allow different execution plans based on histograms when appropiate, with no significant extra parsing. I find it easier to think about distinct values rather than considering "transactional" or "datawarehose".

September 11, 2005 - 8:45 am UTC

nope, cause you'd have to sit there and think for every bind value "Hmm, are there 'few' or are there 'many', or what - and will it stay 'few' forever"

Most developers don't care, nor do they know that information off hand.

Why are histograms organized in ranges only?

Roman Filipsky, September 12, 2005 - 5:45 am UTC

Hello,
we found that the histograms as they are implemented in Oracle are usualy almost worthless (maybe it's just the case with our app). We think that the histograms based on distinct values would be much more effective:
Say there are 1mio different values in a column. Out of these there are 200 values which occur very often. So the histogram should put these 200 values aside and use full scan for them and index access for all others. With histograms as they are implemented we usualy encounter the situation that one of those 200 values drags another value with it thus causing full scan for the value although access via index would be better by far. On the contrary the histogram might collect only values for which index makes sense while full scan would be better for the rest.

Regards, Roman

September 12, 2005 - 7:35 am UTC

and you really want in the dictionary 1,000,000 rows with counts... "ouch" comes to mind.

I agree they do not work 100% of the time - but I wouldn't want to have a distinct set of values either.

Range based histograms

Roman Filipsky, September 12, 2005 - 7:56 am UTC

Hello Tom,
I think you got me wrong. I meant only those 200 values would be kept and not the other ones. These values we are sure are to be looked up using FTS and the rest using indexes.
The opposite type of histogram could store only those values which are to be looked up using indexes and the rest using FTS.
I know that these types of histograms are not absolutely generic but in many cases they would perform much better than current histograms.
Also, partitioned tables. Data in different partitions may vary a lot (consider SMS traffic, for certain period of time you have huge traffic to some number (TV show) while for most of the time there is almost no traffic at all). Having just global histograms is not helpful here.
I think that Oracle might consider adding new types of histograms for certain cases.

September 12, 2005 - 8:04 am UTC

well, we already have local histograms for partitioned tables and use them...

But remember, histograms are not about "use index, don't use index", they are about getting the right cardinality. If we kept "exceptions", we wouldn't understand the norm - and what happens when you have 300 exceptions, not just 255 and so on?

Range based histograms

Roman Filipsky, September 12, 2005 - 8:16 am UTC

Local histograms: My mistake, there is view user_part_histograms.
Set based histograms: With growing number of distinct entries the histogram could turn back into normal range based histogram. But in many cases this would not be neccesary. Also histograms could be combined, say 200 most outstanding values could be kept separately and the rest could be range based.
I am writing this because the histograms as they are implemented now dont realy work well for us. I was just thinking about how to make them behave better in certain (although maybe not that rare) cases.

Best regards, Roman

September 12, 2005 - 8:34 am UTC

Don't get me wrong, I'm interested - but what might work really well for you would be dynamic sampling - have you looked at that? It'll happen at hard parse time and can "fix" many issues like this - even more complex ones, such as cross column correlations (when x = 5, you will get 10 distinct y values -- when x = 6, you will get 10000 distinct y values sorts of issues)

Dynamic sampling

Roman Filipsky, September 12, 2005 - 8:51 am UTC

I did not want to use dynamic sampling as I was afraid of potential performance problems. I will do some tests now.
Now this is not related directly to histograms but I still think it might go here:
(10g) I suspect that dbms_stats.gather_table_stats(...,estimate_percent=>0.01,block_sample=>true)
walks more than 0.01% of blocks. When I looked at the SQL it generated (via TOAD, session browser), it looked like: select ... from <table> sample 7.xx ...
Since I know that the table is very big I want to limit I/O. Do you think I should set something else (besides those two arguments) to force Oracle to read not more than 0.01% of table blocks?

September 12, 2005 - 9:05 am UTC

0.01??? how big are these tables.

Dynamic sampling

Roman Filipsky, September 12, 2005 - 9:22 am UTC

Usualy more than 300GB (3 billion records is not unusual). I always load data into some stage tables, then gather stats on them and then do exchange with some selected partition. I understand that partition-level information is not enough and so I have to gather GLOBAL stats when substantial amount of partitions are exchanged.
Maybe there are ways to propagate partition-level stats into global stats without resampling the table itself. Maybe I dont need the global stats at all. Could you please help me here?
To be honest I do this primarily to avoid dynamic sampling at runtime when joining to other tables which have stats gathered. If only these large tables were queried than the stats would not be needed (queries usualy work fined without stats on these tables).

Best Regards, Roman

September 12, 2005 - 10:23 am UTC

Given the size of your objects - the relatively small price to pay for dynamic sampling at hard parse time may well pay for itself many times over in the form of a much better plan.

You can use dbms_stats to SET the statistics yourself if you like (global stats). the optimizer cares not where they come from - just that they are there.

And if your queries ALWAYS eliminate down to a single partition, local stats are used to optimize them. global stats are used for queries that cross more than one partition or for when the partition is not known at parse time.

Dynamic sampling

Roman Filipsky, September 12, 2005 - 11:39 am UTC

So if I hardly have queries that target only a single partition (most of the time query spans tens or even hundreds of partitions) then it makes no sense to have partition level statistics. Only global stats which I set manualy should be sufficient? There are also one to 4 local indexes on the tables. If I dont have local stats, only global ones, would it affect execution plans (if more than 99% of queries affect multiple partitions)?

Regards, Roman

September 12, 2005 - 2:21 pm UTC

when queries can touch more than one partition, global stats are used.

only when the query is known at parse time will local statistics be used.

However, you can use local statistics to more "accurately" set the global statistics.

On "cursor_sharing and bind variable " response

Bipul, September 13, 2005 - 11:27 am UTC

Hi Tom.

I am a bit puzzled. In the review you said

"If you use cursor sharing similar, you might (stress might) reduce the number of
plans generated - then again, you might have exactly the same number of plans."

What are the factors that influences the number of plans generated ? I thought, if the cursor_sharing is set to similar, then all literals will be replaced by ":SYS_B_?" by optimizer. This is what we see in our database.

"Say your query is "select * from t where x = ?".

Enabling cursor sharing similar will result in exactly as many child cursors you
had before - becaues each of "x=5", "x=6", "x=7", ... and so on will generate
their own plan, their own child cursor -- when similar is used."

I thought the query for different value of x will generate same child cursor under cursor_sharing similar.

I did a quick test and it shows :

I have a table x_bk with just one column Y. Y has only two distinct values.

ALTER SESSION SET cursor_sharing='EXACT';

Then excute the following queries

SELECT/*cur_test_exact*/ 1 FROM X_bk WHERE y ='OLD';
SELECT/*cur_test_exact*/ 1 FROM X_bk WHERE y ='NEW';
SELECT/*cur_test_exact*/ 1 FROM X_bk WHERE y ='NEW1';
SELECT/*cur_test_exact*/ 1 FROM X_bk WHERE y ='NEW2';

And check the v$sql. It shows that there is one cursor for ach of the query. This is as expected.

SELECT sql_text, sql_id,fetches, executions, child_number FROM v$sql WHERE sql_text LIKE '%cur_test_exact%';

Result of the query
----------------------
SQL_TEXT SQL_ID FETCHES EXECUTIONS CHILD_NUMBER
SELECT/*cur_test_exact*/ 1 FROM X_bk WHERE y ='OLD' gby4vpkxftpdu 1 1 0
SELECT/*cur_test_exact*/ 1 FROM X_bk WHERE y ='NEW' 89qgxth63a4vu 1 1 0
SELECT/*cur_test_exact*/ 1 FROM X_bk WHERE y ='NEW1' fm8bmcv9h70qm 1 1 0
SELECT/*cur_test_exact*/ 1 FROM X_bk WHERE y ='NEW2' cf052wn6rbbvw 1 1 0

Then I changed the cursor_sharing to SIMILAR.

ALTER SESSION SET cursor_sharing='SIMILAR';

And execute the following queries.

SELECT/*cur_test_sim*/ 1 FROM X_bk WHERE y ='NEW2';
SELECT/*cur_test_sim*/ 1 FROM X_bk WHERE y ='NEW1';
SELECT/*cur_test_sim*/ 1 FROM X_bk WHERE y ='NEW';
SELECT/*cur_test_sim*/ 1 FROM X_bk WHERE y ='OLD'

And take a look at the v$sql.

SELECT sql_text, sql_id,fetches, executions, child_number FROM v$sql WHERE sql_text LIKE '%cur_test_sim%';

Result of the query
--------------------

SQL_TEXT SQL_ID FETCHES EXECUTIONS CHILD_NUMBER
SELECT/*cur_test_sim*/ :"SYS_B_0" FROM X_bk WHERE y =:"SYS_B_1" 327g25h6znrq8 4 4 0

There is only one cursor for all the four queries with different value of y. And this is what I expected. Am I misinterpreting your review or something is missing ?

Thanks
bipul

September 13, 2005 - 1:00 pm UTC

when the bound value "COULD" change a plan (when the optimize says "different values could cause different plans") that bind is marked UNSAFE and its value is added to the signature of the generated plan when using cursor sharing = similar.  

So, if you have skewed data - and a predicate on it, you could end up with a child cursor PER BIND value, for example:

ops$tkyte@ORA9IR2> drop table t;

Table dropped.

ops$tkyte@ORA9IR2> alter system flush shared_pool;

System altered.

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create table t
  2  as
  3  select object_id, rpad('*',20,'*') data
  4    from all_objects
  5   union all
  6  select 0, rpad('*',20,'*') data
  7    from all_objects
  8  /

Table created.

<b>object id = 0, lots of values...</b>

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create index t_idx on t(object_id);

Index created.

ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T', method_opt => 'for all indexed columns size 254', cascade => true );

PL/SQL procedure successfully completed.

ops$tkyte@ORA9IR2> set linesize 121
ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> select * from t where object_id = 55;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=16)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=2 Card=1 Bytes=16)
   2    1     INDEX (RANGE SCAN) OF 'T_IDX' (NON-UNIQUE) (Cost=1 Card=1)



ops$tkyte@ORA9IR2> select * from t where object_id = 0;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=26 Card=30418 Bytes=486688)
   1    0   TABLE ACCESS (FULL) OF 'T' (Cost=26 Card=30418 Bytes=486688)


<b>optimizer would do different things based on different binds for object id, so object id is considered "not safe"</b>


ops$tkyte@ORA9IR2> set autotrace off
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> alter session set cursor_sharing=similar;

Session altered.

ops$tkyte@ORA9IR2> set autotrace traceonly statistics;
ops$tkyte@ORA9IR2> select * from t where object_id= 55;


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
        453  bytes sent via SQL*Net to client
        503  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

ops$tkyte@ORA9IR2> select * from t where object_id= 0;

30659 rows selected.


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       2274  consistent gets
          0  physical reads
          0  redo size
     400999  bytes sent via SQL*Net to client
      22976  bytes received via SQL*Net from client
       2045  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      30659  rows processed

ops$tkyte@ORA9IR2> set autotrace off
ops$tkyte@ORA9IR2> select sql_text from v$sql where sql_text like 'select * from t where object_id=%';

SQL_TEXT
-------------------------------------------------------------------------------------------------------------------------
select * from t where object_id= :"SYS_B_0"
select * from t where object_id= :"SYS_B_0"

<b>two child cursors, in fact:</b>

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> declare
  2          l_data t.data%type;
  3  begin
  4          for x in ( select distinct object_id from t )
  5          loop
  6                  execute immediate 'select data from t where object_id = '
  7                          || x.object_id || ' and rownum = 1'
  8                  into l_data;
  9          end loop;
 10  end;
 11  /

PL/SQL procedure successfully completed.

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select count(*) from v$sql where sql_text like 'select data from t where object_id = %';

  COUNT(*)
----------
      5010

<b>I had so many, the shared pool did not hold them all....</b>

Got it now

Bipul, September 13, 2005 - 1:35 pm UTC

Hi Tom.

Thanks for the explanation. I knew I was missing something, but wasn;t sure what. I guess my test case was not appropriate. I see this behaviour [i.e. several child cursor for same sql_text under cursor_sharing similar] on our live db and was not sure why it happens. And also I got now what "unsafe" bind value means. I read it so many times on other "tips" site, but never understood what it meant.

Thanks again! And btw, when is your new book coming out?

Regards
bipul

September 13, 2005 - 4:08 pm UTC

September 19th is when it should start hitting the stores.

Dynamic sampling

Andrew Max, September 13, 2005 - 2:08 pm UTC

To Roman Filipsky:

You wrote:
"I did not want to use dynamic sampling as I was afraid of potential performance problems".

In fact, appropriate and proper use of dynamic sampling will improve performance -- after all, I believe if it was a useless or harmful feature -- then Oracle would not implement it at all :)

Indeed, there are situations where any pre-built statistics (I mean statistics gathered before statement parsing) can not help CBO in generating an optimal execution plan.
For example, it may turn out that some columns which are referenced in WHERE clause are correlated.

Consider this as an example:

SQL> select * from v$version where rownum = 1;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.1.0.3.0 - Prod

SQL> create table correlated(EmpNo int, EmpDept int, EmpData int);

Table created.

SQL> begin
  2   insert into correlated
  3   select 1, 2, rownum from all_objects
  4    where rownum <= 20000;
  5   insert into correlated
  6   select 2, 3, rownum from all_objects
  7    where rownum <= 20000;
  8   insert into correlated
  9   select 3, 1, rownum from all_objects
 10    where rownum <= 20000;
 11  end;
 12  /

PL/SQL procedure successfully completed.

SQL> create index idx_cor on correlated(EmpNo, EmpDept);

Index created.

SQL> begin
  2   dbms_stats.gather_table_stats(
  3    ownname => user,
  4    tabname => 'CORRELATED',
  5    cascade => true);
  6  end;
  7  /

PL/SQL procedure successfully completed.

Now, let's issue query which selects all rows where EmpNo = 1 and EmpDept = 2. When query is parsed, CBO will realize that first predicate selectivity is 1/3 (that's because EmpNo = 1 condition is true for every third row in our table). The same is for second predicate, EmpDept = 2 -- selectivity is 1/3 as well.
These two predicates are AND-ed together -- therefore, CBO will simply multiply selectivity estimates to compute total selectivity:

SP = SP1 * SP2 = 1/9

Let's go further and see what CBO will do:

SQL> set autotrace traceonly
SQL>
SQL> select * from correlated t
  2   where EmpNo = 1 and EmpDept = 2;

20000 rows selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=31 Card=6649 Bytes=66490)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'CORRELATED' (TABLE) (Cost=31 Card=6649 Bytes=66490)
   2    1     INDEX (RANGE SCAN) OF 'IDX_COR' (INDEX) (Cost=16 Card=6667)




Statistics
----------------------------------------------------------
        196  recursive calls
          0  db block gets
       2782  consistent gets
          0  physical reads
          0  redo size
     296783  bytes sent via SQL*Net to client
      15159  bytes received via SQL*Net from client
       1335  SQL*Net roundtrips to/from client
          4  sorts (memory)
          0  sorts (disk)
      20000  rows processed

Look at that Card=6649 estimation and recall that our table have 60000 rows in it. So, we can compute "expected" cardinality as 60000 * 1/9 = 6667. Very close to CBO estimation.

But had CBO "guessed" right? Is that plan really optimal? For both question -- the answer is "no".
In fact, our query returned 20000 rows. That's because columns are strongly correlated: EmpDept = 2 is true only if EmpNo = 1 and vice versa. Therefore an actual selectivity for both predicates is 1/3, very different from 1/9.

I believe that no one kind of pre-built statistics (even above-discussed histograms) can help CBO here. But dynamic sampling, "on-the-fly" by its nature, gives us a chance here:

SQL> select /*+ DYNAMIC_SAMPLING(t 10) */ * from correlated t
  2   where EmpNo = 1 and EmpDept = 2;

20000 rows selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=42 Card=19947 Bytes=199470)
   1    0   TABLE ACCESS (FULL) OF 'CORRELATED' (TABLE) (Cost=42 Card=19947 Bytes=199470)




Statistics
----------------------------------------------------------
          5  recursive calls
          0  db block gets
       1677  consistent gets
          0  physical reads
          0  redo size
     296783  bytes sent via SQL*Net to client
      15159  bytes received via SQL*Net from client
       1335  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      20000  rows processed

That's much better. Card=19947 is not 100% accurate estimation, but very close to it. Execution plan also changed, now Oracle is using FULL SCAN -- for query which returns every third row from the table it's definitely the best choice. Look at "consistent gets" statistics after all...

So, the way I see it, dynamic sampling is great feature when used properly.

Hope I didn't miss somewhere...

Kind regards ;)

John Cantu, September 14, 2005 - 1:05 pm UTC

Does the "9I BIND VARIABLE PEEKING" NOT work the way it is suppose to work? If it did work, then we there wouldn't be a problem with histograms/bind variables. I remember when I attended the 03' HOTSOS symposium, most of the panel experts agreed that the new "9i Peaking" was the solution to being able to use histograms with bind variables. Were they misinformed by Oracle?

September 14, 2005 - 1:38 pm UTC

I don't know what you mean?

What misinformation?

I was at the 2003, 2004, 2005 hotsos conference - I'm not sure what you are alluding to.

Bind variable peeking works like this:

the first time a query is HARD PARSED, the bind variable values that are supplied with that query are taken into consideration. That is:

select * from t where x = :x;

when hard parsed with a bind value of 12345 will be parsed as if it were

select * from t where x = 12345;

histograms, all statistics in fact, that are available -- will be used to optimize that query.

The next time the query

select * from t where x = :x;

is soft parsed, it will use the existing plan (else, it would be a hard parse again!)

That is what bind variable peeking is designed to do. Rather than the optimizer say ":x is very generic, we'll just use heuristics with the stats to figure out what the generic cardinality would be", bind variable peeking makes the assumption that the bind variable values passed in with the first hard parse are "representative" of the types of inputs that will be passed in and makes the plan based on that.

folklore

Mikito Harakiri, September 14, 2005 - 3:34 pm UTC

This is known in database circles as "The reasonable first user assumption". Imagine the first user comes to work, bang the head into the keyboard, the value "fdgdbvdf" gets into the entry field, the query with this bind variable is submitted, and all the rest of the users have wonderful performance experience.

Bind Variable Peeking worthless?

John Cantu, September 14, 2005 - 7:00 pm UTC

Well, that is the problem. It shouldn't assume. It should find out instead. If it didn't make that assumption, then this bind peeking would be the answer to histograms not working with bind variables in 8i. But instead, Oracle has just given the impression that they have fixed a problem that is still a problem, and that is, histograms still don't work with bind variables in 9i.

What good is it that it works great for the first one? What if the first SQL to be executed performs a full table scan on a 500 MB table because the query returns 99% of the table data, but the rest of the queries should have used an index since it only returns back 1kb of data. Now, we have every query performing 500 MB I/O operations per query.

Do you agree that bind variable peeking is mostly worthless?

September 14, 2005 - 8:30 pm UTC

No, not at all.

The assumption is - at that point in the program, the binds are "representative" (you are a transactional system).

If it did not assume, it would hard parse, hard parse is "bad".

It is an attempt to alleviate issues rising from a very complex issue. Where did Oracle give the impression they "fixed" as in totally solved a problem?

I look at this as an incremental refinement.

John Cantu, September 15, 2005 - 12:57 pm UTC

Where did Oracle give the impression they "fixed" as in totally solved a problem?

The impression came from the statements and excitement some of the panel personel gave when they said that histograms now work with bind variables in 9i. It would have been nice if they had also given all its shortcomings.

Thank you, Tom :)

September 15, 2005 - 1:05 pm UTC

but they *do* work with transactional applications, this is sort of my point.

The assumption made by the developers holds "this query at this point in the application will have representative binds sent to it and we should use them to optimize"

Histograms - concepts question

AR, October 26, 2005 - 12:53 pm UTC

Tom,
My understanding is that histograms are essentially frequency tables. So as long as the number of buckets remains the same while collecting statistics; the histograms of two schemas with identical data on two very different databases should remain the same. Essentially, the contents of dba_tab_histogram for tables under this schema will be identical. Is that correct?

To elaborate : a) If I were to export a schema, with a few tables, from one database to another (not necessarily identical databases). b) Then were to delete schema stats on both databases. c) Then were to collect schema stats with say the command - execute dbms_stats.gather_schema_stats(ownname=>'SCHEMA',cascade=>TRUE,degree=>4,method_opt=>'FOR ALL INDEXED COLUMNS SIZE 10');

d) Then were to query dba_histograms on both databases with :

select lpad(TABLE_NAME,15), lpad(COLUMN_NAME, 25), endpoint_number, endpoint_value from dba_histograms where owner='SCHEMA_NAME' and table_name in ('TABLE_NAME') order by table_name,column_name;

--> Should I not see the identical results on both the databases?

If not, why not? I am seeing somewhat different histograms on the 2 databases and I don't understand why.

Thank you for your time.

October 27, 2005 - 3:26 am UTC

If you compute, I would probably concur (but frankly, would need to set up a couple of tests - look at the queries used to generate histograms and verify), if you estimate - I would say "no, not necessarily, not at all - it is a sample"

did you sample?

Clarification

Dave Thompson, October 27, 2005 - 4:21 am UTC

Tom,

Could you clarify the following point you made earlier in this thread?

'However, you can use local statistics to more "accurately" set the global
statistics. '

October 27, 2005 - 6:52 am UTC

you can read the local partition stats, aggregate them up, and set global stats yourself if you need.

(eg: if we haven't done that already - we aggregate only when all partitions have stats)

Cardinality Incorrect with Histograms

Steve Mangan, November 08, 2005 - 5:37 am UTC

Tom

Can you explain the following :

We gathered stats on a table and histograms on some columns as shown below.

Oracle9i Enterprise Edition Release 9.2.0.6.0 - Production

begin
dbms_stats.gather_table_stats
(ownname=>'ONST_ASS_HISTORY',
tabname=>'JOBEVENTSHISTORY');

dbms_stats.gather_table_stats
(ownname=>'ONST_ASS_HISTORY',
tabname=>'JOBEVENTSHISTORY',
method_opt=>'FOR COLUMNS WORKLIST_ID SIZE 254');

dbms_stats.gather_table_stats
(ownname=>'ONST_ASS_HISTORY',
tabname=>'JOBEVENTSHISTORY',
method_opt=>'FOR COLUMNS JOB_REFERENCE SIZE 254');

dbms_stats.gather_table_stats
(ownname=>'ONST_ASS_HISTORY',
tabname=>'JOBEVENTSHISTORY',
method_opt=>'FOR COLUMNS JOB_STATUS SIZE 254');

dbms_stats.gather_table_stats
(ownname=>'ONST_ASS_HISTORY',
tabname=>'JOBEVENTSHISTORY',
method_opt=>'FOR COLUMNS ACTUAL_START_TIME SIZE 254');

dbms_stats.gather_table_stats
(ownname=>'ONST_ASS_HISTORY',
tabname=>'JOBEVENTSHISTORY',
method_opt=>'FOR COLUMNS JOB_STATUS_REASON_CODE SIZE 254');

dbms_stats.gather_table_stats
(ownname=>'ONST_ASS_HISTORY',
tabname=>'JOBEVENTSHISTORY',
method_opt=>'FOR COLUMNS DW_MOP_WORKHISTORY_DATE_CLOSED SIZE 254');

end;
/

Then we checked the cardinality estimates for various queries.

CORRECT
--------
select count(*)
from onst_ass_history.jobeventshistory jeh

COUNT(*)
643502

Operation Object Name Rows Bytes Cost Object Node In/Out PStart PStop

SELECT STATEMENT Optimizer Mode=CHOOSE 1 12
SORT AGGREGATE 1
PARTITION RANGE ALL 1 17
INDEX FAST FULL SCAN ONST_ASS_HISTORY.JOBEVENTSHISTORY_UK 643 K 12 1 17

CORRECT
--------
select count(*)
from onst_ass_history.jobeventshistory jeh
where 1=1
AND jeh.job_status = 'SITE VISIT'

COUNT(*)
176761

Operation Object Name Rows Bytes Cost Object Node In/Out PStart PStop

SELECT STATEMENT Optimizer Mode=CHOOSE 1 1804
SORT AGGREGATE 1 10
PARTITION RANGE ALL 1 17
TABLE ACCESS FULL ONST_ASS_HISTORY.JOBEVENTSHISTORY 176 K 1 M 1804 1 17

CORRECT
--------
select count(*)
from onst_ass_history.jobeventshistory jeh
where 1=1
AND jeh.actual_start_time >= TO_DATE('01-OCT-2005','DD-MON-YYYY')
AND jeh.actual_start_time < TO_DATE('31-OCT-2005','DD-MON-YYYY') + 1

COUNT(*)
28109

Operation Object Name Rows Bytes Cost Object Node In/Out PStart PStop

SELECT STATEMENT Optimizer Mode=CHOOSE 1 1903
SORT AGGREGATE 1 4
PARTITION RANGE ALL 1 17
TABLE ACCESS FULL ONST_ASS_HISTORY.JOBEVENTSHISTORY 28 K 109 K 1903 1 17

INCORRECT
----------
select count(*)
from onst_ass_history.jobeventshistory jeh
where 1=1
AND jeh.job_status = 'SITE VISIT'
AND jeh.actual_start_time >= TO_DATE('01-OCT-2005','DD-MON-YYYY')
AND jeh.actual_start_time < TO_DATE('31-OCT-2005','DD-MON-YYYY') + 1

COUNT(*)
27425

Operation Object Name Rows Bytes Cost Object Node In/Out PStart PStop

SELECT STATEMENT Optimizer Mode=CHOOSE 1 1907
SORT AGGREGATE 1 14
PARTITION RANGE ALL 1 17
TABLE ACCESS FULL ONST_ASS_HISTORY.JOBEVENTSHISTORY 7 K 105 K 1907 1 17

Why does the Optimiser think there will be 7K rows when there are 27K. How can we use stats to get a closer estimate.

This is part of a bigger SQL statement tuning problem.

Thanks

Steve

November 08, 2005 - 10:05 pm UTC

how many unique values in each of those 3 columns

(I cannot wait for Jonathan's book to become generally available - I'm going to say "get this book" a lot :)

Further Column Info

Steve Mangan, November 09, 2005 - 4:09 am UTC

Tom

Here is the extra info you asked for :

SELECT t.owner,
t.table_name,
t.column_name,
t.data_type,
t.nullable,
t.num_distinct,
t.num_nulls,
t.num_buckets,
t.last_analyzed,
t.sample_size
FROM all_tab_columns t
WHERE t.owner = 'ONST_ASS_HISTORY'
AND t.table_name = 'JOBEVENTSHISTORY'
AND t.column_name in ('JOB_STATUS','ACTUAL_START_TIME')

OWNER TABLE_NAME COLUMN_NAME DATA_TYPE NULLABLE NUM_DISTINCT NUM_NULLS NUM_BUCKETS LAST_ANALYZED SAMPLE_SIZE
ONST_ASS_HISTORY JOBEVENTSHISTORY JOB_STATUS VARCHAR2 Y 7 0 6 08/11/2005 10:21:00 643502
ONST_ASS_HISTORY JOBEVENTSHISTORY ACTUAL_START_TIME DATE Y 97348 457445 254 08/11/2005 10:21:35 186057

I think Jonathan's Cost-Based book will be on my Christmas list :-)

November 10, 2005 - 5:04 pm UTC

so for job_status - we know "exactly"

how about actual_start_time - what are the low/high values - is this data skewed?

So much for my indexes!

James, March 15, 2006 - 12:41 am UTC

Well I created an index on the following column:

DOOR_NUM COUNT(*)
---------- ----------
0 23
1 1
2 5872
3 1913
4 29733
5 5652
1237

And then I created a histogram on this data as it is quite skewed, knowing full well that I *didn't* want to use the index on door_num=4.

In fact, I have about 5 other indexes I have created on columns with skewed data...

Now, I used bind variables almost everywhere when I build queries (as we execute 1,000's of select statements a minute) and sometimes massive BITMAP CONVERSION (TO ROWIDS) with large range scans occur, causing my query to generate 20 times the constistant gets a full table scan would generate.

Now, what would you do here Tom?

1. Drop the indexes
2. "Unbind" the columns where I want my histograms to kick in.
3. Use approriate index/no_index hints when I detect what the bind is going to be anyway? (Although if I "unbind" my skewed columns the CBO shouold do this anyway.)

Personally I kind of like 2, as the skewed columns are not the most popular columns to query on, so maybe I could take the "hit" of more hard parsing/filling up the SGA...

I will need to monitor hard parses and the SGA if I try this...

Regards

March 15, 2006 - 4:36 pm UTC

If I had six values - not binding is something to consider.

If I new door number 4 was the culprit, binding one of two queries would be an option:

if ( door is number 4 )
then
open c for select * from t door4 where door_num = :door;
else
open c for select * from t door_not4 where door_num = :door;

bind variable peeking would get me my two plans.

How to avoid soft parse in PL/SQL

Michael Dimitriadis, June 02, 2006 - 4:50 am UTC

I know I have skewed values for a column which is selected in a PL/SQL-procedure through comparism with a PL/SQL-variable. I have histograms but the first time the procedure is executed the database determines through bind peeking the execution plan for the SELECT-statement which will not change for all subsequent calls to the same procedure.
Is there any other way but using EXECUTE IMMEDIATE or DBMS_SQL to avoid that static execution plan in PL/SQL for such statements?

June 02, 2006 - 10:47 am UTC

You would have to use literals - which implies dynamic sql. And that is it...

Make really sure you want to do that - if this skewed column has thousands of values - you probably DO NOT.

When not to use binding

A reader, June 05, 2006 - 9:33 am UTC

Tom,

From the 2 posting above (So much for my indexes! and How to avoid soft parse in PL/SQL).......

What additional things shouuld one look for when looking at the option of binding/not binding. The 2 things I get from the posting above for not binding are
1) The higher the number of distinct values, be away from bind variables.
2) If data skewness is preset, don't use bind variables.

Thank you

June 05, 2006 - 9:42 am UTC

Nope, I would put it like this:

going to execute many queries per second? (OLTP) bind

going to execute queries that take many seconds? (Warehouse) nobind

When not to use binding

A reader, June 05, 2006 - 1:53 pm UTC

Tom,

>>going to execute queries that take many seconds? (Warehouse) nobind

1) If it takes many seconds to execute a query in an OLTP system should we not use bind variables?

In an OLTP system, you would have situations where data skeweness is present and also columns with many distinct values. In such situations
2) How would we measure data skewness and is there a cut off value before we decide to use/not use bind variables?
3) Is there a cut off value for the number of distinct values before we decide to use/not use bind variables?

Thank you for your time.

June 05, 2006 - 2:29 pm UTC

1) it comes down to math - are you in a system that is parsing many queries per second (OLTP), then you will almost certainly be binding virtually every query (else you WILL NOT be parsing many queries per second!!)

2) when I see that "hey, this just ain't working in real life" in most cases (eg: in reaction to a performance problem)

3) no, it is all about "queries per second vs seconds per query"

Bind Peeking and Histogram

Su Baba, August 15, 2006 - 1:19 pm UTC

In the following example, I was expecting that bind peeking would allow Oracle to use histogram on table x thus use the index on column id, but the result indicates otherwise. Can you please explain? Thanks.

select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - Prod
PL/SQL Release 10.2.0.2.0 - Production
CORE 10.2.0.2.0 Production
TNS for 32-bit Windows: Version 10.2.0.2.0 - Production
NLSRTL Version 10.2.0.2.0 - Production

DROP TABLE x;

CREATE TABLE x (
id NUMBER NOT NULL,
col2 VARCHAR2(30)
);

INSERT INTO x
SELECT 1, object_name FROM all_objects
WHERE rownum <= 10000;

INSERT INTO x
SELECT 2, object_name FROM all_objects
WHERE rownum <= 20;

INSERT INTO x
SELECT 3, object_name FROM all_objects
WHERE rownum <= 1;

commit;

col table_name for a12
col column_name for a12

SELECT table_name, column_name, num_distinct, num_buckets,
density, histogram
FROM user_tab_columns
WHERE table_name = 'X';

TABLE_NAME COLUMN_NAME NUM_DISTINCT NUM_BUCKETS DENSITY HISTOGRAM
------------ ------------ ------------ ----------- ---------- ---------
X ID NONE
X COL2 NONE

select id, count(*) from x group by id;

ID COUNT(*)
---------- ----------
1 10000
2 20
3 1

CREATE INDEX x_n1 ON x(id);
CREATE INDEX x_n2 ON x(col2);

BEGIN
dbms_stats.gather_table_stats(
user,
'X',
cascade => true,
estimate_percent => null,
method_opt => 'FOR COLUMNS id size 3'
);
END;
/

SELECT table_name, column_name, num_distinct, num_buckets,
density, histogram
FROM user_tab_columns
WHERE table_name = 'X';

TABLE_NAME COLUMN_NAME NUM_DISTINCT NUM_BUCKETS DENSITY HISTOGRAM
------------ ------------ ------------ ----------- ---------- ---------
X ID 3 3 .000049895 FREQUENCY
X COL2 NONE

VAR x NUMBER
exec :x := 3

set autotrace traceonly

SELECT * FROM x
WHERE id = :x;

Execution Plan
----------------------------------------------------------

---------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
---------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3340 | 66800 | 12 (9)|
|* 1 | TABLE ACCESS FULL| X | 3340 | 66800 | 12 (9)|
---------------------------------------------------------------

ALTER SYSTEM FLUSH SHARED_POOL;

SELECT * FROM x
WHERE id = :x;

Execution Plan
----------------------------------------------------------

---------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
---------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3340 | 66800 | 12 (9)|
|* 1 | TABLE ACCESS FULL| X | 3340 | 66800 | 12 (9)|
---------------------------------------------------------------

August 15, 2006 - 2:35 pm UTC

explain plan

a) does not bind peek
b) assumes the bind is ALWAYS a string
c) always does a hard parse (so dynamic sampling can cause confusion as well)

I'll use tkprof with explain=u/p (to get explain plan) and include the row source operation (the TRUTH, what happened - available also in v$sql_plan) to demonstrate a and b.

ops$tkyte%ORA10GR2> create table t
  2  ( pk      varchar2(10) primary key,
  3    skewed  number,
  4    data    char(20)
  5  )
  6  /

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> insert into t
  2  select rownum, 99, 'x'
  3    from all_objects
  4  /

50013 rows created.

ops$tkyte%ORA10GR2> update t set skewed = 1 where rownum = 1;

1 row updated.

ops$tkyte%ORA10GR2> create index t_idx on t(skewed);

Index created.

ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T', method_opt=>'for all indexed columns size 254' );


PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> variable pk number
ops$tkyte%ORA10GR2> variable skewed number
ops$tkyte%ORA10GR2> exec :pk := 1

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> exec :skewed := 1

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> alter session set sql_trace=true;

Session altered.

ops$tkyte%ORA10GR2> select * from t where pk = :pk;

PK             SKEWED DATA
---------- ---------- --------------------
1                  99 x

ops$tkyte%ORA10GR2> select * from t where skewed = :skewed;

PK             SKEWED DATA
---------- ---------- --------------------
224                 1 x



Now, tkprof will show:

select * from t where pk = :pk

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  TABLE ACCESS FULL T (cr=249 pr=0 pw=0 time=505 us)


Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   MODE: ALL_ROWS
      1   TABLE ACCESS   MODE: ANALYZED (BY INDEX ROWID) OF 'T' (TABLE)
      0    INDEX   MODE: ANALYZED (UNIQUE SCAN) OF 'SYS_C006513' (INDEX
               (UNIQUE))

<b>row source operation is "the truth", we full scanned - why?  because we compared a string (primary key in database) to a number (bind variable) and when comparing a string to a number, we to_number(string) - invalidating the use of an index on the string.

but, explain plan (execution plan), it does not see bind variable types (in this case, it couldn't, the bind just isn't there!  In sqplus, pro*c, oci, whatever - explain plan is consistent - you don't bind to those statements, so it just assumes "string" for the bind type</b>
********************************************************************************
select * from t where skewed = :skewed

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  TABLE ACCESS BY INDEX ROWID T (cr=4 pr=0 pw=0 time=61 us)
      1   INDEX RANGE SCAN T_IDX (cr=3 pr=0 pw=0 time=57 us)(object id 55124)


Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   MODE: ALL_ROWS
      1   TABLE ACCESS   MODE: ANALYZED (FULL) OF 'T' (TABLE)

<b>Now we have the opposite - reality says "index range scan", because the optimizer peeked the bind and optimized "where skewed=1" which uses an index.

explain plan optimized "where skewed=????", the optimizer said "two values for skewed, 50,000 rows - where skewed = ???? therefore returns an average of 25,000 records - we will full scan</b>


use v$sql_plan and/or tkprof with the row source operation if you are

a) having statistics that would cause bind peeking to have an effect, eg: histograms, local statistics on partitions..

b) binding things other than strings (or at least use to_number(), to_date() in the explain plan)

where are bind varible peeking values?

Cristian, January 29, 2007 - 9:49 am UTC

hi tom,
we experience performance problems, probably caused by bind peeking. Does the optimizer save samewhere the values it uses
with bind variable peeking where calculates the execution plan during the hard parse?

January 31, 2007 - 1:27 pm UTC

no, it does not.

histogram

Samuel, August 25, 2007 - 8:58 am UTC

your above statement: having statistics that would cause bind peeking to have an effect, eg: histograms, local statistics on partitions..

(1) Does it mean that for oracle to do bind peeking, we should gather histograms?

(2) our database version is 9.2.0.7. we use bind variables in our code. This is oltp and reporting database as well. We do not gather histograms as we use bind variables. We have some important tables partitioned (Range-list)recently. Do you think we should start gathering histograms as well.

Thanks.

August 27, 2007 - 4:10 pm UTC

1) no, Oracle will bind peek - but the question is "so what if it does, if there are not any statistics that would change its mind"

If you want bind peeking to affect the selected plan, you need to have statistics in place that could affect the plans.

2) no, are you experiencing poor performance?

Histograms or Not

Matt, February 07, 2008 - 5:49 am UTC

Tom,

We're seeing intermittent poor performance due to bind variable peeking and different skews of data.

Looking in metalink at note Note:377847.1

"When bind variables are peeked. The parse engine makes a decision as to the 'safety' of these peeked values for creating plans based upon whether it is felt that different values could produce different plans.

The usual reason for such different plans is the presence of histogram column statistics on the column in question when using the Cost Based Optimizer (CBO). If there are histograms on the column that the bind is being compared with, then the bind value is deemed to be unsafe because there is potential that different values could produce a different explain plan. If the bind is unsafe then multiple children are created for each set of different bound values so that different plans can be associated with them. This occurs in a few scenarios but the most common is with histogram stats on an equality predicate. "

I have a testcase where we have a table with a million records and a report_id column which is heavily skewed 90% to a single value. This column currently has no histogram on it, but we're seeing access to this table differ for the same SQL quite frequently - sometimes using a poor plan (index) for the large set.

Based on the metalink note we assumed by adding a histogram, that the optimizer would say "Hey I know about this column, its unsafe to bind peek because of the skew" so I'll create extra child cursors (bind_mismatch="Y") and a new plan for each set of different bound values.

This doesn't appear to work in practise, (Oracle 10g R2 10.2.0.3).

February 07, 2008 - 8:39 am UTC

that is not the way it works in 10g - bind peeking is done ONCE at hard parse time and a plan is chosen, that plan will be used.

https://www.oracle.com/technetwork/issue-archive/2008/08-jan/o18asktom-090158.html

describes this in some detail.

Hisogram for IN cluase

virtualdba, July 23, 2008 - 5:11 am UTC

Hi Tom,

   I have conducted a test case based on the following.

Table A Contains (table_name, tablespace_name) Index on tablespace_name (IDX_A). Histogram on column tablespace_name size 50.

Table A has the following data
Tablespace         Count(*)
TBS_ODS_LRG_DAT    80
SYSAUX             3440     <--- Highest count
WORK_32            208

set autotrace traceonly

1) Use index
SQL> select * from a where tablespace_name='TBS_ODS_LRG_DAT';

80 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2623187831

--------------------------------------------------------------------------------                                                                                                 -----
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time                                                                                                      |
--------------------------------------------------------------------------------                                                                                                 -----
|   0 | SELECT STATEMENT            |       |   627 | 16302 |    39   (0)| 00:00                                                                                                 :01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| A     |   627 | 16302 |    39   (0)| 00:00                                                                                                 :01 |
|*  2 |   INDEX RANGE SCAN          | IDX_A |   627 |       |     2   (0)| 00:00                                                                                                 :01 |
--------------------------------------------------------------------------------                                                                                                 -----

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("TABLESPACE_NAME"='TBS_ODS_LRG_DAT')


2) Use FTS
SQL> select * from a
  2  where tablespace_name='SYSAUX';

3440 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1928539813

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  3362 | 87412 |    50   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| A    |  3362 | 87412 |    50   (2)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("TABLESPACE_NAME"='SYSAUX')

3) Use index
SQL> select * from a
  2  where tablespace_name ='F_MV_CUST';

16 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2623187831

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |   627 | 16302 |    39   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| A     |   627 | 16302 |    39   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IDX_A |   627 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("TABLESPACE_NAME"='F_MV_CUST')


4) Use FTS
SQL>  select * from a
  2  where tablespace_name in ('SYSAUX','TBS_ODS_LRG_DAT');

3520 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1928539813

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  3990 |   101K|    50   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| A    |  3990 |   101K|    50   (2)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("TABLESPACE_NAME"='SYSAUX' OR
              "TABLESPACE_NAME"='TBS_ODS_LRG_DAT')

 
SQL> select * from a
  2  where  tablespace_name in ('TBS_ODS_LRG_DAT','F_MV_CUST');

96 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1928539813

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  1255 | 32630 |    50   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| A    |  1255 | 32630 |    50   (2)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("TABLESPACE_NAME"='F_MV_CUST' OR
              "TABLESPACE_NAME"='TBS_ODS_LRG_DAT')


   Question :
   a. Is the histogram useless when we use the IN clause ? The optimizer will always perform the full table scan on table A. 
   
   Hope you understand my question. Thanks in advance.

Regards
SH

July 24, 2008 - 10:13 am UTC

a) huh??????? did you see the different estimated card= values? If histograms were 'useless', they would have been the same.

and it will not always perform a full scan - it totally depends....... The clustering factor has more to do with this than anything.

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1032431852141

A reader, November 26, 2008 - 6:32 pm UTC

Hi Tom,

As per metalink Note:72539.1, I tested to see how optimizer behave if histogram is created. However, in the example which I tried (column a contained values 1..9, then value 1000 which repeated 90000 times) , I couldn't see the difference at all. Optimizer used the index all the time. Could you please take a look and explain why? I have used 10.2.0.4 in my test.

Thanks

November 28, 2008 - 5:12 pm UTC

how about this

you give us a tiny example. just the creates, the inserts and the sample output...

so we can see what you saw.

Perhaps you just asked for that column and the optimizer said "I'll use the index as if it were a table" to avoid full scanning the table.

You give us no clue HOW the index was used, to see if it was used "wrong"

Here is my example:

ops$tkyte%ORA10GR2> create table t ( x int, data char(20)  default 'x' );

Table created.

ops$tkyte%ORA10GR2> insert into t select level, 'x' from dual connect by level <= 9;

9 rows created.

ops$tkyte%ORA10GR2> insert into t select 1000, 'x' from dual connect by level <= 90000;

90000 rows created.

ops$tkyte%ORA10GR2> create index t_idx on t(x);

Index created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats(user,'T', method_opt=>'for columns x size 254' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> @at
ops$tkyte%ORA10GR2> column PLAN_TABLE_OUTPUT format a72 truncate
ops$tkyte%ORA10GR2> set autotrace traceonly explain
ops$tkyte%ORA10GR2> select * from t where x = 1;

Execution Plan
----------------------------------------------------------
Plan hash value: 470836197

------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU
------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     1 |    24 |     2   (0
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |     1 |    24 |     2   (0
|*  2 |   INDEX RANGE SCAN          | T_IDX |     1 |       |     1   (0
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("X"=1)

<b>used index range scan+table access by index rowid - just one row...</b>

ops$tkyte%ORA10GR2> select * from t where x = 1000;

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time
------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 90001 |  2109K|    86   (4)| 00:00:02
|*  1 |  TABLE ACCESS FULL| T    | 90001 |  2109K|    86   (4)| 00:00:02
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("X"=1000)

<b>said to itself "hey, this gets a lot of data, just full scan...</b>

ops$tkyte%ORA10GR2> select x from t where x = 1000;

Execution Plan
----------------------------------------------------------
Plan hash value: 3163761342

------------------------------------------------------------------------
| Id  | Operation            | Name  | Rows  | Bytes | Cost (%CPU)| Time
------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       | 90001 |   263K|    42   (5)| 00:0
|*  1 |  INDEX FAST FULL SCAN| T_IDX | 90001 |   263K|    42   (5)| 00:0
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("X"=1000)
<b>
Now this too says "lots of rows", but - it knows it can avoid the WIDE table for the SKINNY index - so it does so. It did not read the index like an index, but read the index as if it were a skinny table - that is all</b>

More to Explore

Performance

Get all the information about database performance in the Database Performance guide.

Breadcrumb

Question and Answer

You Asked

and Tom said...

Rating

Comments

Shift of paradigm

cursor_sharing and bind variable

Why are histograms organized in ranges only?

Range based histograms

Range based histograms

Dynamic sampling

Dynamic sampling

Dynamic sampling

On "cursor_sharing and bind variable " response

Got it now

Dynamic sampling

folklore

Bind Variable Peeking worthless?

Histograms - concepts question

Clarification

Cardinality Incorrect with Histograms

Further Column Info

So much for my indexes!

How to avoid soft parse in PL/SQL

When not to use binding

When not to use binding

Bind Peeking and Histogram

where are bind varible peeking values?

histogram

Histograms or Not

Hisogram for IN cluase

More to Explore

Performance

Are histograms worthless?

Breadcrumb

Question and Answer

You Asked

and Tom said...

Rating

Comments

Shift of paradigm

cursor_sharing and bind variable

Why are histograms organized in ranges only?

Range based histograms

Range based histograms

Dynamic sampling

Dynamic sampling

Dynamic sampling

On "cursor_sharing and bind variable " response

Got it now

Dynamic sampling

folklore

Bind Variable Peeking worthless?

Histograms - concepts question

Clarification

Cardinality Incorrect with Histograms

Further Column Info

So much for my indexes!

How to avoid soft parse in PL/SQL

When not to use binding

When not to use binding

Bind Peeking and Histogram

where are bind varible peeking values?

histogram

Histograms or Not

Hisogram for IN cluase

More to Explore

Performance