Question and Answer

Tom Kyte

Thanks for the question.

Asked: May 27, 2001 - 11:42 am UTC

Last updated: June 07, 2013 - 1:45 pm UTC

Version: 8.1.5

Viewed 10K+ times

You Asked

Tom

Can you give a simple example of how to create a histogram, to see the performance of a query?



and Tom said...

Sure, run a script like this:

create table t
as
select 1 X, rpad('*',4000,'*') data from all_objects a
/
insert into t
select 2, rpad('*',4000,'*') from all_objects a where rownum = 1
/
create index t_idx on t(x)
/

alter session set sql_trace=true;

select count(data) from t t1 where x = 1;
select count(data) from t t1 where x = 2;

analyze table t compute statistics
for table
for all indexes
for all indexed columns;

select count(data) from t t2 where x = 1;
select count(data) from t t2 where x = 2;



(the for all indexed columns got the histograms for us on the indexed column X)...
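On releases where ANALYZE is deprecated for statistics gathering, a roughly equivalent DBMS_STATS call would be the following sketch (parameter choices here are an assumption, not part of the original answer):

```sql
-- Hypothetical DBMS_STATS equivalent of the ANALYZE above.
-- 'FOR ALL INDEXED COLUMNS' requests histograms on indexed columns (here, X);
-- cascade => true gathers the index statistics as well.
begin
    dbms_stats.gather_table_stats(
        ownname    => user,
        tabname    => 'T',
        method_opt => 'FOR ALL INDEXED COLUMNS',
        cascade    => true
    );
end;
/
```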

Now the tkprof shows (your mileage WILL vary on this)

select count(data)
from
t t1 where x = 1


call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0          0
Execute      1      0.00       0.00          0          0          0          0
Fetch        2      4.46      29.99      30847      30864          0          1
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total        4      4.46      29.99      30847      30864          0          1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 43984

   Rows  Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE
  30802   TABLE ACCESS BY INDEX ROWID T
  30803    INDEX RANGE SCAN (object id 123779)


select count(data)
from
t t1 where x = 2


call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0          0
Execute      1      0.00       0.00          0          0          0          0
Fetch        2      0.00       0.00          1          3          0          1
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total        4      0.00       0.00          1          3          0          1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 43984

   Rows  Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE
      1   TABLE ACCESS BY INDEX ROWID T
      2    INDEX RANGE SCAN (object id 123779)


that shows the optimizer (without any stats at this point) used the index. For "x=1" this was a very bad idea -- we were reading almost every row anyway (all but 1); it would be much faster to full scan. The optimizer did not realize that the indexed column was heavily skewed. So we gathered stats and then:

select count(data)
from
t t2 where x = 1


call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.01          0          0          0          0
Execute      1      0.00       0.00          0          0          0          0
Fetch        2      2.03      11.06      22744      30807         52          1
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total        4      2.03      11.07      22744      30807         52          1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 43984

   Rows  Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE
  30802   TABLE ACCESS FULL T

********************************************************************************

Ahh, that's much better: half the CPU time, and much better on the elapsed time as well. Note that an index will still be used when appropriate:

select count(data)
from
t t2 where x = 2


call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.01       0.00          0          0          0          0
Execute      1      0.00       0.00          0          0          0          0
Fetch        2      0.00       0.00          0          3          0          1
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total        4      0.01       0.00          0          3          0          1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 43984

   Rows  Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE
      1   TABLE ACCESS BY INDEX ROWID T
      2    INDEX RANGE SCAN (object id 123779)


Followup to comment one:

With a bind variable for X, the optimizer would use heuristics (rules) to decide whether it wanted to use the index in general, and would come up with ONE plan that both queries would share. You would obviate the usefulness of histograms in this case.

This is not a reason to NOT USE bind variables -- it simply points out one case where you might consider NOT using them.


Rating: (93 ratings)

Comments

A reader, May 27, 2001 - 11:30 pm UTC

What will the CBO do if a bind variable is used for x?

Data Skewness

A reader, December 03, 2003 - 5:06 pm UTC

Tom,

What is the best way to measure data skewness? For example, assume we have an index on column COL1 of a table. If the optimizer uses the index for certain values and does a full-table scan for other values, then the approach would be:
1. Check the data skewness. Is that correct?
2. If the column in question is an integer column, how do we do that?
3. If the column in question is a non-integer column, how do we do that?

The reason I am asking is that when I generate a histogram for an integer column, I am able to use ENDPOINT_NUMBER and ENDPOINT_VALUE to determine the data distribution, by finding the difference in ENDPOINT_NUMBER between the previous and current row (after sorting on ENDPOINT_NUMBER).
However, for non-numeric columns, only the ENDPOINT_NUMBER column shows meaningful numbers; ENDPOINT_VALUE shows a cryptic number and ENDPOINT_ACTUAL_VALUE is NULL.

I request a methodology to determine data skewness in general (for both numeric and non-numeric columns). The textbooks merely state that we need to check the data skewness, but do not provide an example or a way to determine it. An example from you in this regard would be of great help.
Also, please provide an example and a clear explanation of the SIZE clause of the ANALYZE command, because I have not been able to find a clear explanation in the Oracle documentation or in any performance tuning book I have read so far.

Tom Kyte
December 03, 2003 - 7:00 pm UTC

select column, count(*) from t group by column;

look at the results, or maybe

select min(cnt), max(cnt), avg(cnt), stddev(cnt)
from ( select column, count(*) cnt from t group by column )


to see if there is skewness -- min far from max indicates skewed.
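Once a column has been identified as skewed this way, a histogram can be requested on just that column. A sketch (table and column names are placeholders, not from the thread):

```sql
-- Gather a histogram only on the column found to be skewed;
-- SIZE 254 asks for up to 254 buckets (the maximum here).
begin
    dbms_stats.gather_table_stats(
        ownname    => user,
        tabname    => 'T',
        method_opt => 'FOR COLUMNS COL1 SIZE 254'
    );
end;
/
```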

what is the best way to get skewness

reader, February 20, 2004 - 12:36 pm UTC

Tom, in 9i is WIDTH_BUCKET function useful to generate info about skewed data? If so, any example you have? Thanks.

Tom Kyte
February 20, 2004 - 1:30 pm UTC

it is a tool you can use to measure "skewedness".   for example:


ops$tkyte@ORA920PC> select min(object_id), max(object_id), count(object_id), wb
  2    from (
  3  select object_id,
  4         width_bucket( object_id,
  5                       (select min(object_id)-1 from all_objects),
  6                       (select max(object_id)+1 from all_objects), 5 ) wb
  7    from all_objects
  8         )
  9   group by wb
 10  /
 
MIN(OBJECT_ID) MAX(OBJECT_ID) COUNT(OBJECT_ID)         WB
-------------- -------------- ---------------- ----------
             3           8639             7972          1
          8640          17277             8638          2
         17278          25915             8619          3
         25916          34528             4751          4
         35159          43191              238          5
 

shows that if I bucketize into 5 buckets, most of my data is "low" 1,2,3.  object_ids over 26k are "few and far between"

it is an analysis tool you can use to see what you have. 

Can you please explain the above in a little more detail, for my tiny brain?

A reader, February 21, 2004 - 3:35 am UTC

"
shows that if I bucketize into 5 buckets, most of my data is "low" 1,2,3.
object_ids over 26k are "few and far between"
"
I did not understand what is low 1,2,3. where how to see it is low.
Similarly where/how to see
object_ids over 26k are "few and far between". few must be the count *

Tom Kyte
February 21, 2004 - 11:09 am UTC

ops$tkyte@ORA920PC> select min(object_id), max(object_id), count(object_id), wb
  2    from (
  3  select object_id,
  4         width_bucket( object_id,
  5                       (select min(object_id)-1 from all_objects),
  6                       (select max(object_id)+1 from all_objects), 5 ) wb
  7    from all_objects
  8         )
  9   group by wb
 10  /

<b>I took the range:

low object_id-1 .. hi object_id + 1

and broke it into 5 equi-sized buckets (well, I didn't, the database did).  In this example, that was the range

2..43191.

Now those ranges are roughly the min and max object ids in the following report:</b>

MIN(OBJECT_ID) MAX(OBJECT_ID) COUNT(OBJECT_ID)         WB
-------------- -------------- ---------------- ----------
             3           8639             7972          1
          8640          17277             8638          2
         17278          25915             8619          3
         25916          34528             4751          4
         35159          43191              238          5
 

<b>I say roughly because I do not have a continuous set here -- 34529, for example, "has gone missing" -- it is not in the set.

So, now that we have those five ranges, we assign a bucket to each row in the result set based on which range they fall into.  

As you can see, range 1, 2, 3 have about 8k members each.  Most of my data is "low".  Most of my data has an object_id less than 26k.

When you look at buckets 4 and 5 here -- there is only 5k rows between them. 

Hence, the data is skewed,  most of it resides in the first 3 buckets.

do with that information "what you will" (eg: if those facts are un-interesting to you -- that is OK.  this function is useful to someone analyzing the data -- to them, this would be fascinating perhaps)</b> 
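To see which bucket a single value would land in, WIDTH_BUCKET can be called directly. A small illustration using the range from the example above (2..43192, i.e. min-1 to max+1):

```sql
-- bucket width is (43192 - 2) / 5 = 8638;
-- (30000 - 2) / 8638 = 3.47, so 30000 falls in bucket 4
select width_bucket( 30000, 2, 43192, 5 ) as bucket
  from dual;
```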

Histogram

Peter Rhoades, February 21, 2004 - 2:02 pm UTC

Good example. However, our DBA runs a script that creates Histograms on EVERY column that's in EVERY index in EVERY table--whether it makes sense to have the Histogram stats or not. I think that's crazy but he says it "doesn't hurt." If this is not a new thread, what do you think? Can too many, needless histograms mess things up?

Tom Kyte
February 21, 2004 - 2:16 pm UTC

I typically start with the equivalent of "table", "index" and "all indexed columns" myself. If you have the cycles to compute the histograms, they can be helpful (with bind variable peeking and such).

Could you skip most/many of them? Probably.
Would it change your system? Stats would take a little less time/resources.

I'm not opposed to what they are doing; it does more work than you need, in theory. They might consider "auto" or "skewonly" in 9i.
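The "auto"/"skewonly" options Tom mentions are METHOD_OPT settings for DBMS_STATS. A sketch (schema-level call; parameter values are illustrative assumptions):

```sql
-- SKEWONLY: the database collects histograms only on columns
-- whose data distribution it finds to be skewed.
begin
    dbms_stats.gather_schema_stats(
        ownname    => user,
        method_opt => 'FOR ALL COLUMNS SIZE SKEWONLY'
    );
end;
/
```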

what about bind variable peeking ?

Alberto Dell'Era, February 21, 2004 - 3:45 pm UTC

Imagine that a column has skewed values.

As an extreme case, all values are unique except one, say 0, which occurs n times, where n is big enough to make a fts cheaper than an index scan (we have an index on this column).

What if the app is using bind variables (and so i assume it's an OLTP application) and the app developers thought "well, we don't compute histograms, so the plan will always be the same, and will be an index scan. That's ok, since we bind 0 only in very very rare occasions, and we can afford to pay for a (very) suboptimal index scan in this very very rare occasion."

Now, the aforementioned DBA builds histograms since "they don't hurt", and immediately after 0 is bound ... voila', full table scans forever (ok, until the death of the shared sql stmt in the shared pool).

Isn't this an example of "hurting histograms" ?
Wouldn't it be better to avoid collecting them, if the app designers didn't ask for them ?

Alberto

Tom Kyte
February 21, 2004 - 4:29 pm UTC

the aforementioned DBA would have been building histograms since the beginning of time and you would have caught this in test/development since you always test/develop on real live sized result sets (or you spend lots of money on coffee and soda right before/during a rollout -- because you won't be getting any sleep)

I'll have to assume that what is "standard operating procedure" in production is also in test/dev -- else, well, you have bigger things to worry about than this.


My point is -- "if this is what they always do, from the get go, I don't see anything inherently evil in it"

Sure, anytime you walk into a production system and say "Hey, we are going to make this change" and just do it -- you stand a risk of "something going wrong".

The app designers in my experience don't even know what a histogram is. They would not know to ask for that which they do not know about.



understood

Alberto Dell'Era, February 21, 2004 - 5:17 pm UTC

Thanks for your prompt answer. Well, I assumed the worst scenario since "it doesn't hurt" is exactly the same phrase that a DBA said during a nightly rollout of mine, before making a change to the prod db, and the Day After [pun intended] we experienced an incredible loss of performance ...

Got a Pavlovian Reflex to that phrase.

thanks
Alberto

PS What about automatically removing trailing empty lines from reviews/questions posts, like mine before ? I normally pay attention and get rid of them myself but sometimes I overlook them ...

why it's not good to generate histograms always

A reader, February 21, 2004 - 9:26 pm UTC

Hi

Recently we generated histograms for our application and most queries were deadly slow; when we got rid of the histograms, performance came back, except for some queries.

My question is: why do queries run slow with histograms? Aren't we giving more information to the optimizer in order to get a better plan? It seems to be the other way round -- the more information you give, the worse the CBO performs!

Tom Kyte
February 22, 2004 - 9:28 am UTC

it depends -- and it really depends on your system characteristics.

Many times people have OLTP systems that have "open ended queries" that they fetch the first 10, 100 or so rows from and never get to the bottom. These (unless hinted as "first rows" or run on a system where thought was given to the optimizer_index_* init.ora parameters) can perform poorly in BOTH cases "apparently" (apparently perform poorly, but not really since the optimizer will optimize for throughput in such a system -- not initial response time. give it more info and it'll come up with plans that better optimize throughput response times at the expense of initial responses times. tell it first rows and it'll change that.)

But the bottom line here was -- you introduced "a rather large fundamental change into the system" and would have to expect some ramifications.

Do this -- develop a system from day one with histograms generated.

Then, take them away

And then report back "what happened". You will find pretty much any major league "change in the way we do business" is going to have such an effect. For example -- in the above thread with regards to the DBA that always gets table/index/column stats -- if you want to kill that system, stop gathering histograms and see what happens.

you said

A reader, February 22, 2004 - 4:25 am UTC

hi

You said that the DBA should always generate histograms in development, to catch bind variable peeking effects.

I always thought we should not always generate histograms, but should instead identify the columns with skewed data through analysis?!?!

As someone else already said, always generating histograms is deadly; I got burnt as well.

Tom Kyte
February 22, 2004 - 9:38 am UTC

no, no I did not.

where did I say "the dba should always generate histograms"?

generating histograms "as a new operating procedure after the code is in place" is what is known as a "bad idea (tm)"

NOT generating histograms "as a new operating procedure after the code is in place" is what is known as a "bad idea (tm)"


what I said was "do what you do, but do what you do consistently". don't use one method in test/dev and another DIFFERENT method in production. You most likely got burnt either:

o testing in a database with little or no data
o testing in a database with stats gathered one way in test and a different way in prod

else, you could not "get burned", as you would have discovered this stuff right from the get go in test/dev.


What I propose is "consistency".

histogram's effect on performance

Ryan Gaffuri, March 02, 2004 - 10:09 pm UTC

there is a debate on oracle-l about excessive use of histograms negatively affecting performance. Cary Millsap posted the following:

"I don't have access to the data to back this up, but I have seen PARSE call response time move from over a minute to less than a second by eliminating histograms from the equation. I think when I last saw this, it was an Oracle8 system."

Do you have any comments on this? How often do you recommend gathering histograms? I have been using 'FOR ALL COLUMNS' by default recently with no adverse effects on performance.


Tom Kyte
March 03, 2004 - 9:16 am UTC


and it could go the other way as well. histograms are just *data*. in the above, i would have to guess that histograms caused the optimizer to consider "more stuff", more join paths perhaps (but it is hard to say, it is really rare to have parse times of a minute).

the big thing with histograms many times is the amount of resources they take to compute in the first place. I start with "none" or "for all indexed columns" -- the key thing is to be consistent (in dev, test, and prod). In order to avoid surprises.

Why HISTOGRAM?

Jayesh, March 23, 2004 - 8:46 pm UTC

Can you please explain why we should go for histograms? What is the real purpose of using histograms? And what will the high and low values be when using a varchar column?


stored procedure and histograms

Nikhil G Mishra, July 11, 2004 - 10:06 am UTC

Tom,

All the variables in queries within stored procedures are bind variables.
Does that mean histograms can't help when we have a poorly performing query in a stored procedure involving highly skewed values? Because with bind variables, the CBO will generate a "one size fits all" explain plan based on its best-guess estimates?

nikhil

Tom Kyte
July 11, 2004 - 10:56 am UTC

it'll use bind variable peeking in 9i -- to determine the best fit plan.

describe to me the nature of a stored procedure that should sometimes full scan and sometimes not? frequency of run? who runs it? how does that fit in to the system (if interactive, it would give wildly different response times -- end users would deem it "unreliable").

height and width based

pooja, July 12, 2004 - 8:27 am UTC

hi tom,
this is very useful and now I am able to do histograms, but I have a little confusion about height-based and width-based histograms. Please explain with a simple example.

In the CBO, are the statistics applied based on the predicate in the SQL statement and on the I/O, memory and CPU reads? How does it calculate the statistics? Please explain this also.

Tom Kyte
July 12, 2004 - 11:27 am UTC

I'll recommend to you the performance guide, a freely available document part of the Oracle documentation set.....

Variable response times

Matt, July 12, 2004 - 7:17 pm UTC

Above at:

http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:707586567563#20833867221653

You say:

"describe to me the nature of a stored procedure that should sometimes full scan
and sometimes not? frequency of run? who runs it? how does that fit in to the
system (if interactive, it would give wildly different response times -- end
users would deem it "unreliable")."

We have a PL/SQL batch process which generates a report on data owned by particular users of the system. This data is highly skewed: 99.5% of the data is owned by one user and the remaining 0.5% is split between 5 other users.

For some reason the report always worked fine when generating data for the 0.5%; however, the 99.5% case was extremely slow.

This report is run rarely (it was generated at the end of a data migration process to allow users to verify their data). It was run manually. However, I can see an upcoming requirement to run this report more often -- and its performance is "unreliable".

What (general - I'll post the detail another time when you are accepting questions) options for tuning this report were available to me?

In the end I 10046-traced the report and noted lots of index range scans and nested loop joins. I verified the stats (and histograms) -- dbms_stats with "for all columns size auto", "cascade => true". However, the optimizer failed to pick a full scan. Due to time pressure I slotted lots of "full(t)" hints into the query and improved the report's execution time from 4+ hours to 15 minutes -- for all cases of data.

Obviously you would need more information to comment, but what other approaches are available to get the optimizer to do what it is supposed to?....yes I will post this when I see the accepting questions button.

Best Regards,

Tom Kyte
July 12, 2004 - 11:45 pm UTC

a report that is run rarely against known skewed data can certainly use "no binds".

dynamic sql via ref cursors might be the answer.
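A minimal sketch of that approach -- building the statement with a literal so the optimizer can use the histogram for each distinct value. (The procedure, table, and column names here are made up for illustration; they are not from the thread.)

```sql
create or replace procedure run_report( p_owner in varchar2,
                                        p_rc    out sys_refcursor )
as
begin
    -- concatenate the literal instead of binding it, so each
    -- distinct owner value can get its own optimized plan
    open p_rc for
        'select * from report_data where owner = ''' ||
        replace( p_owner, '''', '''''' ) || '''';
end;
/
```

(With literals, guard against SQL injection -- here by doubling any embedded quotes; validating the input against a known list of owners would be safer still.)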

Thanks

Matt, July 13, 2004 - 3:06 am UTC

The report uses dynamic sql and ref cursors. Currently the skewed data columns are not literals, though.

I will give this approach a whirl and see whether I get a good run time.

How intelligent is the CBO now?

A reader, July 28, 2004 - 9:39 pm UTC

Hi Tom, one hard question. I had been trying to test this myself, but I could not work it out, so I'll ask you.

Does the CBO use histogram info ONLY when the query does not use bind variables (i.e. a literal such as ='ADM'), or is there some exception -- for example at soft parse time -- that allows it to take advantage of histogram info?

Thanks Tom

Tom Kyte
July 29, 2004 - 7:31 am UTC

that is not true, there is bind variable peeking going on at hard parse time too (search for peeking on this site to read more)


during the soft parse, the plan is in place -- literal or not - the histograms would not be consulted.
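Using the demo table t from the original answer, a sketch of what that means in SQL*Plus (behavior assumed for 9i with histograms in place):

```sql
variable x number

exec :x := 1
-- first execution: HARD parse -- the optimizer peeks at :x = 1,
-- consults the histogram, and would likely choose a full scan
select count(data) from t where x = :x;

exec :x := 2
-- second execution: SOFT parse -- the plan already exists in the
-- shared pool and is reused; the histogram is not consulted again
select count(data) from t where x = :x;
```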

A reader, July 29, 2004 - 10:09 am UTC

Thanks Tom
From
OracleWorld Copenhagen, Database Track
BIND VARIABLES AND CURSOR SHARING - NEW DIRECTIONS IN ORACLE9I
Bjørn Engsig, Miracle A/S

"BIND VARIABLE PEEKING IN ORACLE9I In Oracle9i, the optimizer will peek at actual values of bind variables if these are available. By doing this, the optimizer can use generated table/index- or column-level statistics when bind variables are used. This will benefit both applications that themselves are written to use bind variables, and applications that have had literals replaced by bind variables using the cursor_sharing parameter. The tests that we have conducted show that the bind variable peeking works as expected: the optimizer will make its decision based on actual values of bind variables. However, once an execution plan is generated by the optimizer, no further peeking into bind variables will take place; neither from the same session or from a different session parsing the same SQL statement. In both cases, the same SQL statement and associated execution plan is found in the library cache."

Tom Kyte
July 29, 2004 - 1:00 pm UTC

that is correct -- that is in line with what I said. during hard parse, they would be used, during soft parse -- not.

How to get literal from endpoint_value

A reader, July 30, 2004 - 2:17 pm UTC

Hi Tom, please
1) Could you please tell me how to get readable values from the dba_histograms view -- for example 'NY' instead of 4.12433214, something friendlier to analyze? (And why is the ENDPOINT_ACTUAL_VALUE column not always populated?)
2) Do you agree with the Oracle documentation when it says to use SIZE AUTO? I think it is something like this:
DBMS_STATS.GATHER_SCHEMA_STATS( OWNNAME=>'ADM', ESTIMATE_PERCENT=>100, METHOD_OPT=>'FOR ALL COLUMNS SIZE AUTO' );
3) What is the real difference between
DBMS_STATS.GATHER_SCHEMA_STATS( OWNNAME=>'ADM', ESTIMATE_PERCENT=>100, METHOD_OPT=>'FOR ALL COLUMNS SIZE AUTO' );
and
DBMS_STATS.GATHER_SCHEMA_STATS( OWNNAME=>'ADM', ESTIMATE_PERCENT=>100, METHOD_OPT=>'FOR ALL COLUMNS SIZE SKEWONLY' );
4) How do I know whether a column has a height-based or value-based histogram?

Thanks Tom

Tom Kyte
July 30, 2004 - 6:03 pm UTC

1) you'll want to search for:

endpoint_value endpoint_actual_value

on this site, but:


ops$tkyte@ORA9IR2> create table t ( x varchar2(26) );
 
Table created.
 
ops$tkyte@ORA9IR2> insert into t values ( 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' );
 
1 row created.
 
ops$tkyte@ORA9IR2> insert into t values ( 'XXXX' );
 
1 row created.
 
ops$tkyte@ORA9IR2> insert into t values ( 'YYYY' );
 
1 row created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create or replace function hexstr( p_number in number ) return varchar2
  2  as
  3          l_str long := to_char(p_number,'fm'||rpad('x',50,'x'));
  4          l_return varchar2(4000);
  5  begin
  6          while ( l_str is not null )
  7          loop
  8                  l_return := l_return || chr(to_number(substr(l_str,1,2),'xx'));
  9                  l_str := substr( l_str, 3 );
 10          end loop;
 11          return l_return;
 12  end;
 13  /
 
Function created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> analyze table t compute statistics for all columns;
 
Table analyzed.
 
ops$tkyte@ORA9IR2> column column_name format a10
ops$tkyte@ORA9IR2> column ENDPOINT_ACTUAL_VALUE format a20
ops$tkyte@ORA9IR2> select endpoint_number, endpoint_value, endpoint_actual_value,
  2         to_char(endpoint_value,rpad('x',32,'x') ),
  3         substr( hexstr( endpoint_value ), 1, 30 ) ev
  4  from dba_tab_histograms where table_name = 'T' and owner = user;
 
ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALU
--------------- -------------- --------------------
TO_CHAR(ENDPOINT_VALUE,RPAD('X',3 EV
--------------------------------- ------------------------------
              0     3.3884E+35
   41424344454657fd6de9c0ccc00000 ABCDEFW?m???
 
              1     4.6393E+35
   59595958ffffefab8e7525d2400000 YYYX???u%?@



it is not as simple as "a low bound"....


2) I'm thinking you don't need histograms too often, and given how expensive they are to compute -- "for all columns" is hugely expensive -- I might look at doing it just for the columns deemed necessary.


3) search for skewonly, we've written about this before.

4) depends on the cardinality -- searching for the terms in #1 will lead you to some discussions on that. 

A reader, July 30, 2004 - 6:13 pm UTC

Thanks Tom

endpoint_actual_value

A reader, August 17, 2004 - 10:13 pm UTC

I gathered histograms on a table with

method_opt=>'for all columns size 254'.

When I query *_tab_histograms for a specific column in that table, I get my 254 rows, but the ENDPOINT_ACTUAL_VALUE for all of them is null.

Why is this? This is throwing off the cardinality estimates for queries involving where column like 'blah%'

Thanks

endpoint_actual_value

A reader, August 17, 2004 - 11:05 pm UTC

OK the endpoint_value (the number in scientific notation) is probably the hex representation of my string or something.

anyway, i am trying to understand the following

select * from table
where col like 'A%' or col like 'B%'

col has a UNIQUE index, it has histograms using 254 buckets, the no. of distinct values of col is 20000

The above query works fine, the cardinality estimate for each of the CONCATENATION steps is close to reality.

But when I add another OR predicate like

select * from table
where col like 'A%' or col like 'B%' or col like 'C%'

the query tanks, the plan shows a CONCATENATION of 3 INDEX RANGE SCANS, each with a cardinality of ONE! The total card for the query is 12 but the query returns 2000 rows.

I tried to create a simple, concise test case for you

create table t (i varchar2(6) unique);

insert into t select dbms_random.string('x',6) from all_objects where rownum<=20000;

exec dbms_stats.gather_table_stats(null,'t',method_opt=>'for all columns size 254')

select * from t
where i like 'A%' or i like 'B%' or i like 'C%'

I guess since the table is so skinny, it chooses a FTS for this instead of the 3 individual range scans and a concatenation, so I am unable to reproduce what I am seeing in my actual table as I mention above.

Help? Thanks


Tom Kyte
August 18, 2004 - 7:58 am UTC

when you have more values than buckets (254 in your case), you have to expect "guestimation" to kick in -- you cannot represent the actual cardinality in the data dictionary. you get ranges and within each range -- you get estimates.

throw in a "like" or two and you really guess. (your C values could be crossing many "buckets" there)

but why would 2000 rows out of 20,000 via an index be necessarily "a bad thing (tm)"?


and are you really sure you want histograms on every column? especially columns with lots and lots of distinct values like this one?
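One way to see whether a histogram can faithfully represent a column is to compare the number of buckets with the number of distinct values, straight from the dictionary views (a sketch; the table name is a placeholder):

```sql
select column_name, num_distinct, num_buckets
  from user_tab_columns
 where table_name = 'T';
-- when num_distinct exceeds num_buckets (254 max), per-value
-- cardinality inside a bucket is estimated, not exact
```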


Histograms

A reader, August 18, 2004 - 10:50 am UTC

<quote>
but why would 2000 rows out of 20,000 via an index be necessarily "a bad thing (tm)"?
</quote>

It's not bad; the query performs fine.

select acctno,blah,blah from <view>
where acctno like 'A%' or acctno like 'B%' or acctno like 'C%'

performs great. Returns 2000 rows from possible 20000. Fine.

As soon as I change this to

select acctno,blah,blah from <view>
where acctno like 'A%' or acctno like 'B%' or acctno like 'C%'
order by acctno

it tanks (I posted this in another review recently also, sorry, maybe we can continue this discussion here)


"and are you really sure you want histograms on every column? especially columns with lots and lots of distinct values like this one?"

OK I just deleted stats on the 3 tables involved in the view and re-gathered them using dbms_stats 'for all indexed columns' and now the query is great even with the order by!

So, seems like the histograms on the UK (acctno) were actually hurting the performance!

Questions

1. I didn't know that having histograms can actually be detrimental to queries. Does this make sense? I thought that, if anything, they could only help?

2. I noticed that the overall cardinality of the query with the good performance is 10, but the actual rows returned is 2000.

The performance is good, so I don't really care, but if the CBO's cardinality estimate is so far off, is there a problem lurking out there that might adversely affect other queries using these tables?

Thanks

Tom Kyte
August 18, 2004 - 10:59 am UTC

need to *see the plans*

for histograms it depends on the number of values and buckets. when you have 254 buckets but 2000 values -- well, you don't have a bucket/value so it can be mislead at that point.



Histograms

A reader, August 18, 2004 - 11:31 am UTC

Darn it, now no matter what I do, I cannot reproduce the "bad plan" that was resulting in the poor performance with the order by.

I deleted all stats on the 3 tables using

dbms_stats.delete_table_stats(null,'table_name')

re-gathered them with the 254-bucket histograms and everything is still good!

This is what frustrates me sometimes about the CBO, everything is too magical and automatic, it is hard to reproduce stuff.

Oh well, I will keep an eye on it and see if the problem reoccurs.

Thanks

Automagical parameters

Ronald, October 29, 2004 - 9:41 am UTC

Problem with CBO and gather_stats:

sql> @desc tshi
Elapsed: 00:00:00.00
Datatypes for Table tshi

Column Name                Data Type  Data Length  Nullable  Default  Comments
-------------------------- ---------- -----------  --------  -------  --------
SHI_NUM                    NUMBER     (4,0)        not null
POL_NUM                    VARCHAR2   10           not null
REASN_CODE                 VARCHAR2   3            not null
ACCT_MNE_CD                VARCHAR2   8            not null
SUSP_DT                    DATE       7            not null
TRXN_CD                    VARCHAR2   8            not null
SUSP_AMT                   NUMBER     (13,2)       not null
CRCY_CODE                  VARCHAR2   2            not null
TRXN_ID                    VARCHAR2   15           null
UNDO_TRXN_ID               VARCHAR2   15           null
SUSP_REASN                 VARCHAR2   3            null
POL_STAT_CD                VARCHAR2   1            null
FREZ_CODE                  VARCHAR2   1            null
PD_TO_DT                   DATE       7            null
MODE_PREM                  NUMBER     (11,2)       null
PMT_MODE                   VARCHAR2   5            null
Elapsed: 00:00:00.01

Indexes on tshi
Index Name     Is Unique  Columns
-------------  ---------  --------------------------------
SHI_PK         Yes        POL_NUM, SHI_NUM, SUSP_DT
SHI_IDX1       No         TRXN_ID
I_SHI_POL_FK   No         POL_NUM
Elapsed: 00:00:00.04

sql> select min(TRXN_ID), max(TRXN_ID), count(TRXN_ID), wb
2 from ( select TRXN_ID,
3 width_bucket( to_number(TRXN_ID), (select to_number(min(TRXN_ID)) from TSHI),
4 (select to_number(max(TRXN_ID)) from TSHI), 10) wb
5 from TSHI) group by wb ;

MIN(TRXN_ID) MAX(TRXN_ID) COUNT(TRXN_ID) WB
--------------- --------------- -------------- ----------
001304030000001 020405190002259 26300 1
041304030000001 060405190002259 26300 3
061304030000001 080405190002259 26300 4
081304030000001 098405190002259 23670 5
200406011220485 200406011220486 2 11
021304030000001 040405190002259 26300 2
200206040000008 200406011220484 4623660 10
0

8 rows selected.

Elapsed: 00:00:30.04

cas@chkp.world> select count(*) from tshi ;

COUNT(*)
----------
5290940

1 row selected.

SQL and autotrace plans:

sql> select *
from tshi
where undo_trxn_id is null
and trxn_id >= '200304150000003'
and pol_num = '2870000013'
order by trxn_id, shi_num ;

<<run forever>>

Plan 1:
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=1 Bytes=100)
1 0 SORT (ORDER BY) (Cost=7 Card=1 Bytes=100)
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'TSHI' (Cost=5 Card=1 Bytes=100)
3 2 INDEX (RANGE SCAN) OF 'SHI_IDX1' (NON-UNIQUE) (Cost=3 Card=1)


Plan 2:
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=8 Card=1 Bytes=100)
1 0 SORT (ORDER BY) (Cost=8 Card=1 Bytes=100)
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'TSHI' (Cost=6 Card=1 Bytes=100)
3 2 INDEX (RANGE SCAN) OF 'SHI_PK' (UNIQUE) (Cost=3 Card=22)

Plan 3:
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=1 Bytes=75)
1 0 SORT (ORDER BY) (Cost=7 Card=1 Bytes=75)
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'TSHI' (Cost=5 Card=1 Bytes=75)
3 2 INDEX (RANGE SCAN) OF 'I_SHI_POL_FK' (NON-UNIQUE) (Cost=3 Card=16)

SHI_NUM POL_NUM REA ACCT_MNE SUSP_DT TRXN_CD SUSP_AMT CR TRXN_ID
---------- ---------- --- -------- ------------------ -------- ---------- -- ---------------
UNDO_TRXN_ID SUS P F PD_TO_DT MODE_PREM PMT_M
--------------- --- - - ------------------ ---------- -----
880 2870000013 301 SMMKKY 15-Apr-03 12:00 XOOREZ 1500 04 200
304150000003
6 0 04-Apr-03 12:00 750 01

881 2870000013 108 SMMKKY 15-Apr-03 12:00 XOOREZ -1500 04 200
304150000004
6 0 04-Apr-03 12:00 750 01


2 rows selected.

Elapsed: 00:00:00.46


Test results:
(tshi /w) (autotrace plan)
1) no stats ----> Plan 3
2) analyze + estimate ----> Plan 3
3) analyze + compute ----> Plan 1
4) dbms_stats + estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE + method_opt => 'FOR ALL INDEXED COLUMNS SIZE AUTO' ----> Plan 1
5) dbms_stats + method_opt => 'FOR ALL COLUMNS SIZE 5' ----> Plan 2

Question:
1) Why does the CBO tend to choose the bad index? From the predicate itself, shouldn't an equality condition be more selective than a range condition? (Very confusing to me.)
2) What is the correct (i.e. less risky) bucket size? The automagical value does not seem to be doing well; should it just be 254 all the way, for the sake of simplicity?
3) Do you have a post here on how to search for answers? Yes, almost all the answers are here already, but my search words are simply too common to locate what I need. Where do magical keywords like "show_space" and "skewonly" come from?

Tom Kyte
October 29, 2004 - 10:43 am UTC

can we see a tkprof? with so few rows, "runs forever" doesn't compute to me

how many buckets did "auto" actually come up with?

do you use binds in the "real world"?

Re: Automagical parameters

Ronald, October 29, 2004 - 2:10 pm UTC

the "run forever" just means it takes so long that I never bothered to wait for it to finish.

the "auto" produced 2 buckets.

yes we do use binds, but that makes the story longer:
with bind variables,
the CBO seems to always pick SHI_PK --
why is that?

and if I substitute '100304150000003' for '200304150000003',
it magically picks I_SHI_POL_FK under the different sets of statistics.

I would very much like to know why.

does this have anything to do with "peeking"? (or halloween)

Re: Re

Ronald Chan, November 01, 2004 - 12:11 pm UTC

The "auto" option always produced 2 buckets.

How should I parameterize gather_stats so that it safely helps the optimizer make decisions?

Histograms

Ronald, November 02, 2004 - 4:14 am UTC

Trace output from when the query uses the wrong index and runs for an hour and 15 minutes:

ora_3376.trc:

Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production
ORACLE_HOME = /test/oracle/product/9.2.0.4
System name: HP-UX
Node name: hkgxd09
Release: B.11.00
Version: U
Machine: 9000/800
Instance name: tun8
Redo thread mounted by this instance: 1
Oracle process number: 11
Unix process pid: 3376, image: oracle@hkgxd09 (TNS V1-V3)
*** 2004-11-02 11:14:58.848
*** SESSION ID:(17.1593) 2004-11-02 11:14:58.848
QUERY
SELECT * FROM TSHI
WHERE POL_NUM = '2870000013'
AND TRXN_ID >= '200304150000003'
AND UNDO_TRXN_ID IS NULL
ORDER BY TRXN_ID, SHI_NUM
***************************************
PARAMETERS USED BY THE OPTIMIZER
********************************
OPTIMIZER_FEATURES_ENABLE = 9.2.0
OPTIMIZER_MODE/GOAL = Choose
_OPTIMIZER_PERCENT_PARALLEL = 101
HASH_AREA_SIZE = 131072
HASH_JOIN_ENABLED = TRUE
HASH_MULTIBLOCK_IO_COUNT = 0
SORT_AREA_SIZE = 65536
OPTIMIZER_SEARCH_LIMIT = 5
PARTITION_VIEW_ENABLED = FALSE
_ALWAYS_STAR_TRANSFORMATION = FALSE
_B_TREE_BITMAP_PLANS = TRUE
STAR_TRANSFORMATION_ENABLED = TRUE
_COMPLEX_VIEW_MERGING = TRUE
_PUSH_JOIN_PREDICATE = TRUE
PARALLEL_BROADCAST_ENABLED = TRUE
OPTIMIZER_MAX_PERMUTATIONS = 2000
OPTIMIZER_INDEX_CACHING = 0
_SYSTEM_INDEX_CACHING = 0
OPTIMIZER_INDEX_COST_ADJ = 100
OPTIMIZER_DYNAMIC_SAMPLING = 1
_OPTIMIZER_DYN_SMP_BLKS = 32
QUERY_REWRITE_ENABLED = FALSE
QUERY_REWRITE_INTEGRITY = ENFORCED
_INDEX_JOIN_ENABLED = TRUE
_SORT_ELIMINATION_COST_RATIO = 0
_OR_EXPAND_NVL_PREDICATE = TRUE
_NEW_INITIAL_JOIN_ORDERS = TRUE
ALWAYS_ANTI_JOIN = CHOOSE
ALWAYS_SEMI_JOIN = CHOOSE
_OPTIMIZER_MODE_FORCE = TRUE
_OPTIMIZER_UNDO_CHANGES = FALSE
_UNNEST_SUBQUERY = TRUE
_PUSH_JOIN_UNION_VIEW = TRUE
_FAST_FULL_SCAN_ENABLED = TRUE
_OPTIM_ENHANCE_NNULL_DETECTION = TRUE
_ORDERED_NESTED_LOOP = TRUE
_NESTED_LOOP_FUDGE = 100
_NO_OR_EXPANSION = FALSE
_QUERY_COST_REWRITE = TRUE
QUERY_REWRITE_EXPRESSION = TRUE
_IMPROVED_ROW_LENGTH_ENABLED = TRUE
_USE_NOSEGMENT_INDEXES = FALSE
_ENABLE_TYPE_DEP_SELECTIVITY = TRUE
_IMPROVED_OUTERJOIN_CARD = TRUE
_OPTIMIZER_ADJUST_FOR_NULLS = TRUE
_OPTIMIZER_CHOOSE_PERMUTATION = 0
_USE_COLUMN_STATS_FOR_FUNCTION = TRUE
_SUBQUERY_PRUNING_ENABLED = TRUE
_SUBQUERY_PRUNING_REDUCTION_FACTOR = 50
_SUBQUERY_PRUNING_COST_FACTOR = 20
_LIKE_WITH_BIND_AS_EQUALITY = FALSE
_TABLE_SCAN_COST_PLUS_ONE = TRUE
_SORTMERGE_INEQUALITY_JOIN_OFF = FALSE
_DEFAULT_NON_EQUALITY_SEL_CHECK = TRUE
_ONESIDE_COLSTAT_FOR_EQUIJOINS = TRUE
_OPTIMIZER_COST_MODEL = CHOOSE
_GSETS_ALWAYS_USE_TEMPTABLES = FALSE
DB_FILE_MULTIBLOCK_READ_COUNT = 32
_NEW_SORT_COST_ESTIMATE = TRUE
_GS_ANTI_SEMI_JOIN_ALLOWED = TRUE
_CPU_TO_IO = 0
_PRED_MOVE_AROUND = TRUE
***************************************
BASE STATISTICAL INFORMATION
***********************
Table stats Table: TSHI Alias: TSHI
TOTAL :: CDN: 5301581 NBLKS: 77544 AVG_ROW_LEN: 88
-- Index stats
INDEX NAME: I_SHI_POL_FK COL#: 2
TOTAL :: LVLS: 2 #LB: 18481 #DK: 324526 LB/K: 1 DB/K: 1 CLUF: 357471
INDEX NAME: SHI_IDX1 COL#: 9
TOTAL :: LVLS: 2 #LB: 17940 #DK: 4595967 LB/K: 1 DB/K: 1 CLUF: 4400837
INDEX NAME: SHI_PK COL#: 2 1 5
TOTAL :: LVLS: 2 #LB: 27726 #DK: 5301581 LB/K: 1 DB/K: 1 CLUF: 405855
_OPTIMIZER_PERCENT_PARALLEL = 100
***************************************
SINGLE TABLE ACCESS PATH
Column: POL_NUM Col#: 2 Table: TSHI Alias: TSHI
NDV: 324526 NULLS: 0 DENS: 3.0814e-06
NO HISTOGRAM: #BKT: 1 #VAL: 2
Column: UNDO_TRXN_ Col#: 10 Table: TSHI Alias: TSHI
NDV: 2916 NULLS: 5287899 DENS: 3.4294e-04
NO HISTOGRAM: #BKT: 1 #VAL: 2
Column: TRXN_ID Col#: 9 Table: TSHI Alias: TSHI
NDV: 4595967 NULLS: 549056 DENS: 2.1758e-07
NO HISTOGRAM: #BKT: 1 #VAL: 2
TABLE: TSUSPENSE_HISTORIES ORIG CDN: 5301581 ROUNDED CDN: 1 CMPTD CDN: 0
Access path: tsc Resc: 7459 Resp: 1865
Access path: index (equal)
Index: I_SHI_POL_FK
TABLE: TSHI
RSC_CPU: 0 RSC_IO: 5
IX_SEL: 0.0000e+00 TB_SEL: 3.0814e-06
Access path: index (scan)
Index: SHI_IDX1
TABLE: TSHI
RSC_CPU: 0 RSC_IO: 4
IX_SEL: 2.2177e-07 TB_SEL: 2.2177e-07
Skip scan: ss-sel 0 andv 9989
ss cost 9989
index io scan cost 0
Access path: index (scan)
Index: SHI_PK
TABLE: TSHI
RSC_CPU: 0 RSC_IO: 5
IX_SEL: 3.0814e-06 TB_SEL: 3.0814e-06
Access path: index (equal)
Index: I_SHI_POL_FK
TABLE: TSHI
RSC_CPU: 0 RSC_IO: 3
IX_SEL: 3.0814e-06 TB_SEL: 3.0814e-06
Access path: index (index-only)
Index: SHI_IDX1
TABLE: TSHI
RSC_CPU: 0 RSC_IO: 3
IX_SEL: 2.2177e-07 TB_SEL: 2.2177e-07
SORT resource Sort statistics
Sort width: 3 Area size: 43008 Max Area size: 43008 Degree: 1
Blocks to Sort: 1 Row size: 21 Rows: 2
Initial runs: 1 Merge passes: 1 IO Cost / pass: 2
Total IO sort cost: 2
Total CPU sort cost: 0
Total Temp space used: 0
BEST_CST: 4.00 PATH: 4 Degree: 1
***************************************
OPTIMIZER STATISTICS AND COMPUTATIONS
***************************************
GENERAL PLANS
***********************
Join order[1]: TSHI [TSHI]
ORDER BY sort
SORT resource Sort statistics
Sort width: 3 Area size: 43008 Max Area size: 43008 Degree: 1
Blocks to Sort: 1 Row size: 93 Rows: 1
Initial runs: 1 Merge passes: 1 IO Cost / pass: 2
Total IO sort cost: 2
Total CPU sort cost: 0
Total Temp space used: 0
Best so far: TABLE#: 0 CST: 6 CDN: 1 BYTES: 75
****** Recost for parallel table scan *******
***************************************
SINGLE TABLE ACCESS PATH
TABLE: TSHI ORIG CDN: 5301581 ROUNDED CDN: 1 CMPTD CDN: 0
Access path: tsc Resc: 7459 Resp: 1865
BEST_CST: 1865.00 PATH: 2 Degree: 4
***********************
Join order[1]: TSHI [TSHI]
ORDER BY sort
SORT resource Sort statistics
Sort width: 3 Area size: 43008 Max Area size: 43008 Degree: 1
Blocks to Sort: 1 Row size: 93 Rows: 1
Initial runs: 1 Merge passes: 1 IO Cost / pass: 2
Total IO sort cost: 2
Total CPU sort cost: 0
Total Temp space used: 0
SORT response Sort statistics
Sort width: 3 Area size: 43008 Max Area size: 43008 Degree: 4
Blocks to Sort: 1 Row size: 93 Rows: 1
Initial runs: 1 Merge passes: 1 IO Cost / pass: 2
Total IO sort cost: 2
Total CPU sort cost: 0
SORT resource Sort statistics
Sort width: 3 Area size: 43008 Max Area size: 43008 Degree: 1
Blocks to Sort: 1 Row size: 93 Rows: 1
Initial runs: 1 Merge passes: 1 IO Cost / pass: 2
Total IO sort cost: 2
Total CPU sort cost: 0
Total Temp space used: 0
Final:
CST: 6 CDN: 1 RSC: 6 RSP: 6 BYTES: 75
IO-RSC: 6 IO-RSP: 6 CPU-RSC: 0 CPU-RSP: 0

2) "auto" created 2 buckets

3) yes, we use binds in the "real world",
but I just wanted to know how to create "reliable" statistics for this case.

Tom Kyte
May 11, 2012 - 7:30 am UTC


Which Tables to have histogram?

A reader, February 10, 2005 - 6:14 am UTC

Tom,
I have created histograms on some big tables (10 to 40 million rows) and benefited. I do not want to create histograms on all tables. Is there a way I can find which tables would be the best candidates for histograms? And is there a way to quantify how much I would benefit, performance-wise, apart from trying it in a stress/test environment?
Thanks,

Tom Kyte
February 11, 2005 - 2:58 am UTC


do you have some knowledge of your data?

if so, you might know the candidate columns (which ones are skewed) and whether you place predicates on them at all.

if you do not, you can query:

</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:30966075177614 <code>

and see if a column you predicate on frequently is "skewed" and may benefit from this.

Helpful

A reader, February 13, 2005 - 1:07 pm UTC

Tom,

Your reply & the link were very helpful. I can decide about the bucket width easily with this. However, I have not found whether AUTO or SKEWONLY addresses this effectively. Any comments?

Also, I am more concerned with deciding which tables to collect histograms on than with the other details once a table is chosen. Histograms take a long time to collect, even on big servers -- at least for big tables. And since all my tables are MONITORED, I do not know when the stats will actually be gathered -- it could be right in the core business hours of my DSS. So I prefer to be very selective in collecting histograms. Any input or guidelines?
Thanks,

Tom Kyte
February 13, 2005 - 4:20 pm UTC

this is simply a query for you to visualize the data, nothing more, nothing less.

monitoring does NOT mean you do not "know" when... you still call dbms_stats.gather_... yourself. it'll only happen during core business hours if YOU make it so (so don't do that)


auto and skewonly do what they do -- dbms_stats peeks at the data, sort of like I did, and figures out whether it wants a histogram.
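As a rough sketch of the idea behind SKEWONLY (illustrative only; Oracle's actual decision logic is internal and more involved): look at a column's value frequencies and keep a histogram only when the distribution is visibly non-uniform.

```python
# Illustrative sketch of a SKEWONLY-style decision (not Oracle's code):
# keep a histogram only if the column's value distribution is skewed.
from collections import Counter

def looks_skewed(values, threshold=2.0):
    """Flag a column as skewed when its most common value occurs
    'threshold' times more often than a uniform distribution predicts."""
    counts = Counter(values)
    uniform = len(values) / len(counts)   # expected rows per value if uniform
    return max(counts.values()) > threshold * uniform

print(looks_skewed([1, 2, 3, 4] * 25))       # uniform -> False
print(looks_skewed([1] * 97 + [2, 3, 4]))    # heavily skewed -> True
```

SIZE AUTO layers one more input on top of this kind of check: whether the column has actually been used in predicates, so uniform or never-filtered columns are skipped.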

How to delete histogram stats?

Jim, April 29, 2005 - 6:16 pm UTC

How do I delete histogram stats related to a particular column from data dictionary? Thanks.

Tom Kyte
April 29, 2005 - 6:48 pm UTC

ops$tkyte@ORA10GR1> create table t as select * from all_users;
 
Table created.
 
ops$tkyte@ORA10GR1>
ops$tkyte@ORA10GR1> exec dbms_stats.gather_table_stats( user, 'T', method_opt => 'for all columns size 254' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA10GR1>
ops$tkyte@ORA10GR1> select column_name, count(*) from user_tab_histograms where table_name = 'T' group by column_name;
 
COLUMN_NAME                      COUNT(*)
------------------------------ ----------
CREATED                                22
USERNAME                               32
USER_ID                                32
 
ops$tkyte@ORA10GR1>
ops$tkyte@ORA10GR1> exec dbms_stats.delete_column_stats( user, 'T', 'USERNAME' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA10GR1> select column_name, count(*) from user_tab_histograms where table_name = 'T' group by column_name;
 
COLUMN_NAME                      COUNT(*)
------------------------------ ----------
CREATED                                22
USER_ID                                32
 

Not getting the same output

Jagjeet Singh, November 05, 2005 - 10:16 am UTC

Hi,

 I am not getting the same result.
 Can you pls. look into it.

SQL> drop table t;

Table dropped.

SQL> create table t as select * from dba_objects;

Table created.

SQL> update t set object_type = 'TABLE';

5795 rows updated.

SQL> update t set object_type = 'INDEX' where rownum = 1;

1 row updated.

SQL> create index ind_t on t (object_Type) ;

Index created.

SQL> analyze table t compute statistics;

Table analyzed.


select * 
from
 t where object_Type = 'TABLE'


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch      388      0.12       0.11          0        684          0        5794
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      390      0.12       0.12          0        684          0        5794

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 23  

Rows     Row Source Operation
-------  ---------------------------------------------------
   5794  TABLE ACCESS FULL T (cr=684 r=0 w=0 time=61709 us)

*********************************************

select * 
from
 t where object_type = 'INDEX' 


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.01       0.01          0        312          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.02       0.01          0        312          0           1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 23  

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  TABLE ACCESS FULL T (cr=312 r=0 w=0 time=10384 us)

**********************************************


 

Tom Kyte
November 06, 2005 - 8:02 am UTC

well, you didn't do what I did - I gathered histograms, you did not. (the question was all about *histograms*)

here is the example updated to use dbms_stats properly:


ops$tkyte@ORA10GR2> create table t
  2  as
  3  select 99 id, a.*
  4    from all_objects a;

Table created.

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> update t set id = 1 where rownum = 1;

1 row updated.

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> create index t_idx on t(id);

Index created.

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> begin
  2      dbms_stats.gather_table_stats
  3      ( user,
  4       'T',
  5        method_opt=>'for all indexed columns size 254',
  6        cascade => true );
  7  end;
  8  /

PL/SQL procedure successfully completed.

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> set autotrace traceonly explain
ops$tkyte@ORA10GR2> select * from t where id = 1;

Execution Plan
----------------------------------------------------------
Plan hash value: 470836197

------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     1 |    96 |     2   (0)| 00:
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |     1 |    96 |     2   (0)| 00:
|*  2 |   INDEX RANGE SCAN          | T_IDX |     1 |       |     1   (0)| 00:
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID"=1)

ops$tkyte@ORA10GR2> select * from t where id = 99;

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 49996 |  4687K|   165   (3)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| T    | 49996 |  4687K|   165   (3)| 00:00:02 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("ID"=99)

ops$tkyte@ORA10GR2> set autotrace off
 

Same path but diff. LIOs

Jagjeet Singh, November 06, 2005 - 3:33 am UTC

Both queries use the same path but show different
block gets.


Tom Kyte
November 06, 2005 - 8:29 am UTC

That is due to the way blocks are gotten from the cache and the number of rows returned.

....
Fetch 388 0.12 0.11 0 684 0 5794
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 390 0.12 0.12 0 684 0 5794

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 23

Rows Row Source Operation
------- ---------------------------------------------------
5794 TABLE ACCESS FULL T (cr=684 r=0 w=0 time=61709 us


SQL*Plus array fetches 15 rows at a time. There are about 310 blocks in this table (based on the LIOs from the second query, which got all of the data in one call). So you have 18-19 rows per block (5794/310).


When you fetched rows 1..15, we went to the buffer cache got block 1 and copied the 15 rows off of it and sent them back.

When you fetched 16..30, we went to the buffer cache and got block 1 again to get the last 3 or 4 rows, then got block 2 from the buffer cache and got say 12 rows and sent them back.

When you fetched 31..45, we went to the buffer cache to get block 2 again, got 6 or 7 remaining rows and then got block 3 to get 8 or 9 more and sent them back.


We read pretty much every block twice from the buffer cache (increase the arraysize in SQL*Plus and you'll see a decrease in IO).

Then, you full scanned the table to get a single row - we read block 1, didn't find it, read block 2, didn't find it, read block 3 - say we found it, then we continued on reading every block (not finding anymore data) and returned the single row - we read each block once.
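The arithmetic behind that (a back-of-the-envelope sketch; the exact figure depends on how rows straddle blocks): every fetch call re-visits the block the previous fetch stopped in, so consistent gets land near blocks plus fetch calls.

```python
# Back-of-envelope model of consistent gets under array fetching
# (illustrative; actual numbers vary with row/block layout).
from math import ceil

rows, blocks, arraysize = 5794, 310, 15

fetch_calls = ceil(rows / arraysize)      # 387 fetch calls to drain the data
# Each fetch re-reads the block it left off in, so roughly:
estimated_gets = blocks + fetch_calls     # ~697, close to the 684 observed

full_scan_gets = blocks                   # one pass, each block read once
print(fetch_calls, estimated_gets, full_scan_gets)
```

Raising the arraysize shrinks `fetch_calls`, which is exactly why the LIOs drop toward the block count.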

Re:

Jagjeet Singh, November 06, 2005 - 10:02 am UTC

 Thanks for this explanation.


 o  I used "analyze table t compute statistics", which I understood to be equivalent to

   "analyze table t compute statistics for table for all indexes for
   all indexed columns"

    and which generates histograms. Do you mean the number of buckets is insufficient?

 o  As you said, the difference in LIOs comes from the way the blocks are fetched
    from the cache for the two queries.

   In the first query, Oracle scans rows 1-15 (as per the arraysize), sends them
   back, and then picks up block 1 again for its remaining rows -- that's why
   there are two gets per block.

   For the second query, Oracle picks up a block, scans the whole of it, and then
   picks up the next block. In this case there is no 15-row limit per visit.

   Is that because Oracle does not need to send the data back over SQL*Net --
   is that correct?

   If that is the case, I tried this in a procedure using two diff. cursors, like:

          create procedure .. as
          begin
          for i in ( First query ) loop
          null;
          end loop;
          for ii in ( Second Query ) loop
          ... same ..

   Just accessing the data -- nothing to send back -- but there is still a diff. in LIOs.


 o  One more case -- without histograms --

SQL> create table t as select 1 id,
  2  a.* from dba_objects a ;

Table created.

SQL> create index ind_t on t ( id );

Index created.

SQL> analyze table t compute statistics
  2  for table for all indexes;

Table analyzed.

SQL> set autot traceonly 
SQL> select * from t where id = 1 ;

8195 rows selected.


----------------------------------------------------------
0    SELECT STATEMENT Optimizer=CHOOSE (Cost=74 Card=82 Bytes=7298)
1  0   TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=74 Card=82 BY..
2  1     INDEX (RANGE SCAN) OF 'IND_T' (NON-UNIQUE) (Cost=71 Card=33)



Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1582  consistent gets   <-----


SQL> select /*+FULL(t)*/ * from t where id = 1 ;

8195 rows selected.


Execution Plan
----------------------------------------------------------
0   SELECT STATEMENT Optimizer=CHOOSE (Cost=75 Card=82 Bytes=7298)
1  0   TABLE ACCESS (FULL) OF 'T' (Cost=75 Card=82 Bytes=7298)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        967  consistent gets <-------
 

 I have not created histograms, but the CBO still has the
 information that this table has 8195 rows and there is
 only one distinct value. Why is the CBO still choosing an index scan?

 
SQL> select num_rows,distinct_keys from user_indexes where index_name
 = 'IND_T' ;


  NUM_ROWS DISTINCT_KEYS
---------- -------------
      8195             1


Thanks,
Js 

Tom Kyte
November 06, 2005 - 10:26 am UTC

1) it is not equivalent to that.  


ops$tkyte@ORA10GR2> create table t as select object_id, object_name, object_type from all_objects;

Table created.

ops$tkyte@ORA10GR2> create index t_idx on t(object_id);

Index created.

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> analyze table t compute statistics;

Table analyzed.

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> select column_name, count(*) from user_tab_histograms where table_name = 'T' group by column_name;

COLUMN_NAME            COUNT(*)
-------------------- ----------
OBJECT_ID                     2
OBJECT_TYPE                   2
OBJECT_NAME                   2

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> analyze table t delete statistics;

Table analyzed.

ops$tkyte@ORA10GR2> analyze table t compute statistics
  2  for table
  3  for all indexes
  4  for all indexed columns;

Table analyzed.

ops$tkyte@ORA10GR2> select column_name, count(*) from user_tab_histograms where table_name = 'T' group by column_name;

COLUMN_NAME            COUNT(*)
-------------------- ----------
OBJECT_ID                    76

very different.

2) I do not understand why you expect two different queries to have the same IO's in the first place?  

3) it did not have that information, you specifically say "don't look at the columns".  You skipped all column level information.
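On point 3, the Card=82 in that earlier plan is consistent with a fixed default selectivity being applied when column statistics are absent (1% for an equality predicate here; the defaults are version-dependent, and this arithmetic is my reconstruction rather than something stated in the thread):

```python
# Reconstruction of the Card=82 estimate: with no column statistics,
# the optimizer falls back to a fixed default selectivity instead of
# using num_rows/distinct_keys. (Defaults are version-dependent.)
num_rows = 8195
default_eq_selectivity = 0.01        # assumed 1% default for "col = literal"

estimated_card = round(num_rows * default_eq_selectivity)
print(estimated_card)                # 82, matching the autotrace plan

# With column stats (1 distinct value known), the estimate would be:
print(round(num_rows / 1))           # 8195
```

An 82-row estimate makes the index range scan look cheap; the true 8195 rows would have tipped the cost toward the full scan.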

 

Why is this coming out in BOLD?

Jagjeet Singh, November 06, 2005 - 10:04 am UTC

I do not know why my followup is in BOLD?

Tom Kyte
November 06, 2005 - 10:29 am UTC

there is an unbalanced tag on the page somewhere, don't worry about it.

I use hash-B and hash-b for the bold-on and bold-off tags - someone posted something with a hash-B in it

Now I know why the other post is in bold :) ... On to the histograms.

Vladimir Sadilovskiy, November 12, 2005 - 2:33 pm UTC

Tom,

I have a table that usually contains a single value in a column.

create table t1 as select * from all_objects;
insert /*+ append */ into t1 select * from t1;
...
update t1 set object_name='30_CHARACTERS_STRING_PPPPPPPPP';
exec dbms_stats.gather_table_stats(null,tabname=>'t1',method_opt=>'for all columns size auto');

I have this simple query that is extracted from a bigger one:

select * from t1
where object_name != :1;

Now let's explain a series of them with different input:

explain plan for
select * from t1
where object_name != '1';

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |   495K|    47M|  3821   (8)|
|*  1 |  TABLE ACCESS FULL   | T1          |   495K|    47M|  3821   (8)|
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("T1"."OBJECT_NAME"<>'1')

explain plan for
select * from t1
where object_name != 'A';

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |   495K|    47M|  3821   (8)|
|*  1 |  TABLE ACCESS FULL   | T1          |   495K|    47M|  3821   (8)|
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("T1"."OBJECT_NAME"<>'A')

explain plan for
select * from t1
where object_name != 'ABC';

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |   495K|    47M|  3821   (8)|
|*  1 |  TABLE ACCESS FULL   | T1          |   495K|    47M|  3821   (8)|
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("T1"."OBJECT_NAME"<>'ABC')

explain plan for
select * from t1
where object_name != 'ABCDEF';

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |   495K|    47M|  3821   (8)|
|*  1 |  TABLE ACCESS FULL   | T1          |   495K|    47M|  3821   (8)|
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("T1"."OBJECT_NAME"<>'ABCDEF')

explain plan for
select * from t1
where object_name != 'ABCDEFG';

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |     1 |   100 |  3703   (5)|
|*  1 |  TABLE ACCESS FULL   | T1          |     1 |   100 |  3703   (5)|
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("T1"."OBJECT_NAME"<>'ABCDEFG')
SQL> 

Please take a look at the estimated cardinality. Our problem is that joining to such a table with this predicate usually leads to a merge join Cartesian, so it's not an academic question.

Q1. Do you know why the value in the inequality predicate so drastically changes the cardinality estimate? Especially '1' and 'A' -- they seem similar.
Q2. I've made a series of additional tests and found that the boundary length and content of the value, beyond which the cardinality drops to 1, depend on the value stored in the table.
Q3. What would be your workaround in a case where you don't know the cause?

Thanks as always for your thoughts.

- Vladimir 

Tom Kyte
November 12, 2005 - 4:37 pm UTC

you don't have histograms, you did size auto - and without any pre-existing queries - the dbms_stats won't get any.

the magic is when you crossed 6 characters (suggested reading for you, based on your last two questions, is Jonathan Lewis's new book - see the home page for a direct link to it).

I'm not sure what you mean about 1 and A - they seem to behave the same as the rest?
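One way to picture the 6-character effect (my illustration of the commonly described behavior that basic column statistics encode a string as a number built from its leading bytes; this sketch is not from the thread and not Oracle's exact encoding): once two strings share a long enough prefix, their numeric representations collide and the optimizer can no longer tell them apart.

```python
# Sketch of mapping a string to a single comparable number for column
# statistics, with only the leading bytes kept significant (illustrative;
# modeled on the commonly described Oracle behavior, not its exact code).

SIGNIFICANT_BYTES = 6  # assumed number of bytes that survive the encoding

def endpoint_value(s):
    """Map a string's first few bytes to one comparable integer."""
    b = s.encode("ascii")[:SIGNIFICANT_BYTES].ljust(SIGNIFICANT_BYTES, b"\x00")
    return int.from_bytes(b, "big")

# Values differing within the first 6 characters stay distinguishable:
print(endpoint_value("ABCDEF") == endpoint_value("ABCDEX"))          # False
# Values differing only after the 6th character collide:
print(endpoint_value("aa.bb.c") == endpoint_value("aa.bb.cccccc"))   # True
```

Under such an encoding, a literal whose first 6 characters match the stored value looks like a perfect hit on a single-valued column (estimate: nearly all rows excluded by `!=`, i.e. cardinality 1), while a literal differing inside the prefix looks like a different value entirely, which matches the estimates in the plans above.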

Bah! Probably something in the text again. Sorry for that.

A reader, November 12, 2005 - 2:34 pm UTC


Tom Kyte
November 12, 2005 - 4:38 pm UTC

hash-mark B turns on bolding
hash-mark b turns it off

the 10053 traces have hash-mark B's in them.

Not that easy...

Vladimir Sadilovskiy, November 12, 2005 - 6:25 pm UTC

Tom,

It's just a coincidence that I provided the tests with auto size. I accept that this might have brought in a different issue.

But here is an example of the same behavior with histograms.

Could you please elaborate on "size auto - and without any pre-existing queries - the dbms_stats won't get any"?

Thanks,

- Vladimir

SQL> update t1 set object_name = 'aa.bb.cccccc';

495568 rows updated.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(null,'T1',method_opt=>'for all columns size 254');

PL/SQL procedure successfully completed.

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

-------------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |   495K|    38M|  3883   (8)|
|*  1 |  TABLE ACCESS FULL   | T1          |   495K|    38M|  3883   (8)|
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------


PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
   1 - filter("T1"."OBJECT_NAME"<>'aa.bb.')

12 rows selected.

SQL> SQL> 
2 rows deleted.

SQL>   2    3  
Explained.

SQL> SQL> 
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

-------------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |     1 |    82 |  3766   (5)|
|*  1 |  TABLE ACCESS FULL   | T1          |     1 |    82 |  3766   (5)|
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------


PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
   1 - filter("T1"."OBJECT_NAME"<>'aa.bb.c')

12 rows selected.

SQL> SQL> 
2 rows deleted.

SQL>   2    3  
Explained.

SQL> SQL> 
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

-------------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |     1 |    82 |  3766   (5)|
|*  1 |  TABLE ACCESS FULL   | T1          |     1 |    82 |  3766   (5)|
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------


PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
   1 - filter("T1"."OBJECT_NAME"<>'aa.bb.cccccc.dddddddddddddd')

12 rows selected.

SQL> SQL> 

Tom Kyte
November 12, 2005 - 7:28 pm UTC

...
Could you please elaborate on "size auto - and without any pre-existing queries 
- the dbms_stats won't get any"?...

size auto looks at the historical set of predicates used against the table to figure out which columns might benefit from histograms.


ops$tkyte@ORA9IR2> create table t
  2  as
  3  select 99 id, a.* from all_objects a;

Table created.

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> update t set id = 1 where rownum = 1;

1 row updated.

ops$tkyte@ORA9IR2> create index t_idx on t(id);

Index created.

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T', method_opt => 'for all columns size auto', cascade=>true );

PL/SQL procedure successfully completed.

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> select * from t where id = 99;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=45 Card=15342 Bytes=1472832)
   1    0   TABLE ACCESS (FULL) OF 'T' (Cost=45 Card=15342 Bytes=1472832)



ops$tkyte@ORA9IR2> select * from t where id = 1;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=45 Card=15342 Bytes=1472832)
   1    0   TABLE ACCESS (FULL) OF 'T' (Cost=45 Card=15342 Bytes=1472832)



ops$tkyte@ORA9IR2> set autotrace off
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T', method_opt => 'for all columns size auto' );

PL/SQL procedure successfully completed.

ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> select * from t where id = 99;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=45 Card=30683 Bytes=2945568)
   1    0   TABLE ACCESS (FULL) OF 'T' (Cost=45 Card=30683 Bytes=2945568)



ops$tkyte@ORA9IR2> select * from t where id = 1;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=96)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=2 Card=1 Bytes=96)
   2    1     INDEX (RANGE SCAN) OF 'T_IDX' (NON-UNIQUE) (Cost=1 Card=1)



ops$tkyte@ORA9IR2> set autotrace off




you are seeing the 6-to-7 character transition again.  The column information stored for the histogram is only 6 bytes long.  strange things will happen if the first 6 bytes are the "same"


You will very much enjoy Jonathan's book - it will explain a lot.
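The 6-byte effect can be illustrated with a toy sketch (the `endpoint_key` helper below is hypothetical, not Oracle's actual storage format): if only a short prefix of each string survives into the column statistics, values that agree in that prefix become indistinguishable to the optimizer.

```python
def endpoint_key(value: str, prefix_bytes: int = 6) -> bytes:
    # toy model: keep only the first few bytes of the string, the way
    # a histogram endpoint effectively does for long character values
    return value.encode()[:prefix_bytes]

# 'aa.bb.cccccc' and 'aa.bb.c' collide - they share their first 6 bytes,
# so in this model the statistics cannot tell them apart
same = endpoint_key('aa.bb.cccccc') == endpoint_key('aa.bb.c')

# strings that differ within the first 6 bytes stay distinct
different = endpoint_key('aa.bb.c') != endpoint_key('aa.bc.c')
```

This is why Vladimir's nearly-identical `'aa.bb.…'` literals above all produced the same shape of estimate: they only diverge after the prefix.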
 

Thanks Tom. That was very informative.

Vladimir Sadilovskiy, November 12, 2005 - 9:02 pm UTC


Reader, December 21, 2005 - 3:30 am UTC

Oracle generates 2 buckets for each column if I execute this
command.

SQL> analyze table t compute statistics;

Table analyzed.

SQL> select count(*),column_name from user_Tab_histograms where
table_name = 'T' group by column_name;

  COUNT(*) COLUMN_NAME
---------- --------------------------------------------------
         2 CREATED
         2 DATA_OBJECT_ID
         2 GENERATED
         2 LAST_DDL_TIME
         2 OBJECT_ID
         2 OBJECT_NAME
         2 OBJECT_TYPE
         2 OWNER
         2 SECONDARY
         2 STATUS
         2 SUBOBJECT_NAME
         2 TEMPORARY
         2 TIMESTAMP

13 rows selected.

And if I use this command ...

SQL> analyze table t compute statistics for table for all columns;

Table analyzed.

SQL> select count(*),column_name from user_Tab_histograms where
table_name = 'T' group by column_name;

  COUNT(*) COLUMN_NAME
---------- --------------------------------------------------
        76 CREATED
        75 DATA_OBJECT_ID
         2 GENERATED
        76 LAST_DDL_TIME
        76 OBJECT_ID
        76 OBJECT_NAME
        24 OBJECT_TYPE
        11 OWNER
         1 SECONDARY
         2 STATUS
         2 SUBOBJECT_NAME
         2 TEMPORARY
        76 TIMESTAMP

I am confused ... what does the number of buckets depend on?

Regards.  

Tom Kyte
December 21, 2005 - 7:26 am UTC

first and foremost - stop using analyze to gather stats, start using dbms_stats.

Second, analyze table t compute statistics is equivalent to:

analyze table t
compute statistics
for table
for all columns SIZE 1
for all indexes;




ops$tkyte@ORA9IR2> create table t as select * from scott.emp;

Table created.

ops$tkyte@ORA9IR2> create index t_idx on t(ename);

Index created.

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> analyze table t compute statistics;

Table analyzed.

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select table_name, num_rows from user_tables where table_name = 'T';

TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T                                      14

ops$tkyte@ORA9IR2> select column_name, count(*) from user_tab_histograms where table_name = 'T' group by column_name;

COLUMN_NAME            COUNT(*)
-------------------- ----------
COMM                          2
DEPTNO                        2
EMPNO                         2
ENAME                         2
HIREDATE                      2
JOB                           2
MGR                           2
SAL                           2

8 rows selected.

ops$tkyte@ORA9IR2> select index_name, num_rows from user_indexes where table_name= 'T';

INDEX_NAME                       NUM_ROWS
------------------------------ ----------
T_IDX                                  14

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> analyze table t delete statistics;

Table analyzed.

ops$tkyte@ORA9IR2> select table_name, num_rows from user_tables where table_name = 'T';

TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T

ops$tkyte@ORA9IR2> select column_name, count(*) from user_tab_histograms where table_name = 'T' group by column_name;

no rows selected

ops$tkyte@ORA9IR2> select index_name, num_rows from user_indexes where table_name= 'T';

INDEX_NAME                       NUM_ROWS
------------------------------ ----------
T_IDX

ops$tkyte@ORA9IR2> analyze table t
  2  compute statistics
  3  for table
  4  for all columns SIZE 1
  5  for all indexes;

Table analyzed.

ops$tkyte@ORA9IR2> select table_name, num_rows from user_tables where table_name = 'T';

TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T                                      14

ops$tkyte@ORA9IR2> select column_name, count(*) from user_tab_histograms where table_name = 'T' group by column_name;

COLUMN_NAME            COUNT(*)
-------------------- ----------
COMM                          2
DEPTNO                        2
EMPNO                         2
ENAME                         2
HIREDATE                      2
JOB                           2
MGR                           2
SAL                           2

8 rows selected.

ops$tkyte@ORA9IR2> select index_name, num_rows from user_indexes where table_name= 'T';

INDEX_NAME                       NUM_ROWS
------------------------------ ----------
T_IDX                                  14

ops$tkyte@ORA9IR2>
 

Re:

Reader, December 21, 2005 - 7:40 am UTC


Thanks Tom for quick response.

And

analyze table t compute statistics for table for all columns;

What size does this command use for the columns?

Tom Kyte
December 21, 2005 - 7:57 am UTC


you can research it further

http://docs.oracle.com/docs/cd/B10501_01/server.920/a96540/statements_46a.htm#2076689

the default is size 75 - but a suggestion: use dbms_stats, that is the way to go; pretend analyze table t compute... doesn't exist for now.

Got it default is 75 -- thanks

Reader, December 21, 2005 - 7:56 am UTC


Analyze table

A reader, December 21, 2005 - 8:51 am UTC

Tom,
This is not a follow-up question but more of a supporting incident for what you said about not using analyze table.

This happened several months ago so I do not remember all the table names. I was working with Oracle support to resolve an issue of EM running some very expensive queries against the 10gr1 database. In an OWC session, the Oracle support analyst asked me to analyze some tables using analyze table...command. After analyzing the tables, the problem did not go away. When I checked in the last_analyzed date in dba_tables, the dates were totally weird for those tables. We tried many times but no effect. Then we used the dbms_stats to analyze these tables and the problem was resolved and the weird dates were replaced by the correct ones. Though the analyze table command was populating the statistics, it was missing something which dbms_stats fixed.

After that I have stopped using analyze table completely even for small adhoc test cases on 10g databases.

Tom Kyte
December 21, 2005 - 7:26 pm UTC

analyze and dbms_stats can and will result in different numbers in the dictionary. The optimizer is built expecting the dbms_stats numbers. food for thought.

Fixing plan for table having skewed data

Narendra, March 24, 2006 - 6:31 am UTC

Hi Tom,

In the following query's EXPLAIN PLAN, I am getting CARD=9 whereas the rows returned is 1.

Following are necessary details:
SQL> select * from v$version ;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.1.0.4.0 - 64bi
PL/SQL Release 10.1.0.4.0 - Production
CORE    10.1.0.4.0      Production
TNS for Solaris: Version 10.1.0.4.0 - Production
NLSRTL Version 10.1.0.4.0 - Production

SQL> desc product
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 PRODUCT_KEY                               NOT NULL NUMBER(10)
 DESCRIPTION                                        VARCHAR2(30)
 FULL_DESCRIPTION                                   VARCHAR2(50)
 PRODUCT_TYPE                                       VARCHAR2(20)
 BRAND                                              VARCHAR2(20)
 PRODUCT_CATEGORY                                   VARCHAR2(20)
 AGE_CATEGORY                                       VARCHAR2(20)
 DEPARTMENT                                         VARCHAR2(15)

PRODUCT_KEY is PRIMARY KEY and has a UNIQUE index on it.

Following is distribution of data for BRAND column.
SQL> select brand, count(*) from product group by brand ;

BRAND                  COUNT(*)
-------------------- ----------
Astro                         4
Big Studios                  24
Coda                          1
Dordor                        2
Duff                          1
Galore                        1
Leavenworth                   1
Little Guys                   2
MKF Studios                  24
Nagazoo                       6
Parabuster Inc.              29

BRAND                  COUNT(*)
-------------------- ----------
Sani                          3
Solo                          6
Wild Age                      9
Wolf                         28

15 rows selected.

Now, when I execute following query,
SQL> select * from product where brand = 'Duff';

PRODUCT_KEY DESCRIPTION
----------- ------------------------------
FULL_DESCRIPTION                                   PRODUCT_TYPE
-------------------------------------------------- --------------------
BRAND                PRODUCT_CATEGORY     AGE_CATEGORY         DEPARTMENT
-------------------- -------------------- -------------------- ---------------
        141 Beer
                                                   FOOD
Duff                 Alcohol              over 21              Beverage



Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2 Card=9 Bytes=684)

   1    0   TABLE ACCESS (FULL) OF 'PRODUCT' (TABLE) (Cost=2 Card=9 Bytes=684)

The plan shows FULL table access step has CARD=9, whereas query returns only single row.

I tried collecting statistics using 
SQL> exec dbms_stats.gather_table_stats(ownname => user, tabname => 'PRODUCT', method_opt => 'FOR ALL INDEXED COLUMNS SIZE 254', cascade => true);

PL/SQL procedure successfully completed.

However, still no change in plan.
SQL> select * from product where brand = 'Duff';

PRODUCT_KEY DESCRIPTION
----------- ------------------------------
FULL_DESCRIPTION                                   PRODUCT_TYPE
-------------------------------------------------- --------------------
BRAND                PRODUCT_CATEGORY     AGE_CATEGORY         DEPARTMENT
-------------------- -------------------- -------------------- ---------------
        141 Beer
                                                   FOOD
Duff                 Alcohol              over 21              Beverage



Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2 Card=9 Bytes=684)

   1    0   TABLE ACCESS (FULL) OF 'PRODUCT' (TABLE) (Cost=2 Card=9 Bytes=684)

But when I do following:
SQL> create table product_test as select * from product ;

Table created.

And fire same query on new table:
SQL> select * from product_test where brand = 'Duff';

PRODUCT_KEY DESCRIPTION
----------- ------------------------------
FULL_DESCRIPTION                                   PRODUCT_TYPE
-------------------------------------------------- --------------------
BRAND                PRODUCT_CATEGORY     AGE_CATEGORY         DEPARTMENT
-------------------- -------------------- -------------------- ---------------
        141 Beer
                                                   FOOD
Duff                 Alcohol              over 21              Beverage



Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2 Card=1 Bytes=114)

   1    0   TABLE ACCESS (FULL) OF 'PRODUCT_TEST' (TABLE) (Cost=2 Card=1 Bytes=114)

The CARDINALITY is correct.

How can I fix the problem of incorrect cardinality on PRODUCT table? This table is part of VIDEO5 schema (which is part of ORACLE DISCOVERER) and PRODUCT table contains 141 rows. 

Tom Kyte
March 24, 2006 - 9:56 am UTC

Histograms would "correct" the estimate, the estimate right now is:


ops$tkyte@ORA10GR2> select 141/15 from dual;
 
    141/15
----------
       9.4


(141 rows, 15 distinct values).  However, do you really NEED to fix it - 1 or 9 - is it making a material difference somewhere (if you use binds, unlikely for this table that you want to have histograms) 
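The arithmetic here - equality selectivity of 1/NDV when no histogram exists - can be checked with a quick sketch (a toy model of the estimate, not the optimizer itself):

```python
def card_without_histogram(num_rows: int, num_distinct: int) -> int:
    # with only high/low/NDV available, an equality predicate is
    # assumed to hit 1/NDV of the rows
    return round(num_rows / num_distinct)

# 141 rows, 15 distinct BRAND values -> the Card=9 seen in the plan
estimate = card_without_histogram(141, 15)
```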

For study

Narendra, March 26, 2006 - 3:56 am UTC

Tom,

I was (obviously) not doing any tuning in this case. The idea of probing was just to know
1. Why estimates were incorrect ?
2. On creating separate table (without PK) why did it get "corrected"?


Tom Kyte
March 26, 2006 - 8:26 am UTC

why obviously?  If you have to state "this was obviously" - that means "it was not" (else it would be obvious and you would have no need to state "obviously" :)



I told you why the estimates were "incorrect" and even showed you the math. (just divide)

On the other table, you gathered statistics differently one would presume.  You have different statistics. 

As a bit of magic, I can show this (how different statistics affect the optimizer)

I put your aggregated data into a table T1 and created T2 with your data as follows:

ops$tkyte@ORA10GR2> create table t2
  2  as
  3  with data
  4  as
  5  (select level l from dual connect by level <= 29)
  6  select rownum id, t1.nm
  7    from t1, data
  8   where data.l <= t1.cnt
  9  /

Table created.

ops$tkyte@ORA10GR2> select nm, count(*) from t2 group by nm;

NM                               COUNT(*)
------------------------------ ----------
Astro                                   4
Duff                                    1
Big Studios                            24
Leavenworth                             1
Sani                                    3
Wild Age                                9
Solo                                    6
Galore                                  1
Nagazoo                                 6
Parabuster Inc.                        29
Coda                                    1
Dordor                                  2
Little Guys                             2
MKF Studios                            24
Wolf                                   28

15 rows selected.

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T2' );

PL/SQL procedure successfully completed.

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> set autotrace traceonly explain
ops$tkyte@ORA10GR2> select * from t2 where nm = 'Duff';

Execution Plan
----------------------------------------------------------
Plan hash value: 1513984157

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     9 |   117 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T2   |     9 |   117 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("NM"='Duff')

<b>exactly what you saw - and for the reasons I stated - DIVIDE 141/15 ~ 9.  About nine rows.  Why?  Because of the lack of information, the optimizer just knows the HIGH and LOW values for NM and the number of distinct values:</b>

ops$tkyte@ORA10GR2> set autotrace off
ops$tkyte@ORA10GR2> select column_name, count(*) from user_tab_histograms where table_name = 'T2' group by column_name;

COLUMN_NAME            COUNT(*)
-------------------- ----------
NM                            2
ID                            2

<b>but, and here is the "magic", we just regather stats and since we've run a query that puts a predicate on a column - the stats Oracle gathers using AUTO change:</b>

ops$tkyte@ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T2' );

PL/SQL procedure successfully completed.

ops$tkyte@ORA10GR2> select column_name, count(*) from user_tab_histograms where table_name = 'T2' group by column_name;

COLUMN_NAME            COUNT(*)
-------------------- ----------
NM                           15
ID                            2

<b>Oracle now has a rather complete histogram of the values in NM and will therefore make a better "guess"</b>

ops$tkyte@ORA10GR2> set autotrace traceonly explain
ops$tkyte@ORA10GR2> select * from t2 where nm = 'Duff';

Execution Plan
----------------------------------------------------------
Plan hash value: 1513984157

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    13 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T2   |     1 |    13 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("NM"='Duff')

ops$tkyte@ORA10GR2> set autotrace off


<b>Now, this DOES NOT MEAN run out and gather histograms on everything - in fact, quite the opposite.  In many/most systems (transactional for example), you probably don't want them at all.  In a warehouse (where the use of bind variables for the big queries would not be prevalent), you may well want them</b>

But this explains "why" - you had different statistics for the two different tables. 
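To see both estimates side by side, here is a small sketch (a model of the idea, not Oracle's costing code): a frequency histogram carries per-value counts, so the estimate for 'Duff' drops from rows/NDV down to the actual bucket count.

```python
from collections import Counter

# the BRAND distribution from the example above
histogram = Counter({'Astro': 4, 'Big Studios': 24, 'Coda': 1,
                     'Dordor': 2, 'Duff': 1, 'Galore': 1,
                     'Leavenworth': 1, 'Little Guys': 2,
                     'MKF Studios': 24, 'Nagazoo': 6,
                     'Parabuster Inc.': 29, 'Sani': 3, 'Solo': 6,
                     'Wild Age': 9, 'Wolf': 28})

num_rows = sum(histogram.values())        # 141
ndv = len(histogram)                      # 15 distinct brands

estimate_without = round(num_rows / ndv)  # only high/low/NDV known -> 9
estimate_with = histogram['Duff']         # frequency bucket count -> 1
```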

Thanks ("Obviously...")

Narendra, March 27, 2006 - 4:01 am UTC

Tom,

When I gathered statistics using METHOD_OPT => 'FOR ALL COLUMNS SIZE SKEWONLY', the cardinality in the query plan got corrected. I feel the reason for not being able to achieve that earlier was gathering statistics on INDEXED COLUMNS only. BTW, I NEVER gathered statistics on the NEW table after creating it with the CTAS statement. So can I assume that CTAS ALWAYS keeps table stats up-to-date?

p.s. That "Obvious" was too much...:)

Thanks.

Tom Kyte
March 27, 2006 - 9:58 am UTC

create table as select gathers no statistics, nothing would be "up to date"





Histograms on system tables.

Raju, August 14, 2006 - 3:47 am UTC

Hi Tom,

We have noticed several system tables where
dba_tab_col_statistics reports HEIGHT BALANCED histograms where bucket_cnt >
distcnt (count of distinct values). It was my understanding that a similar
problem was fixed in 10.2.0.1 (dbms_stats method_opt size n generates a
FREQUENCY histogram where n >= NDV). However, even more perplexing is why do we
have many height balanced histogram buckets on a column with only 1 distinct
value?
SQL > select ENDPOINT_NUMBER,ENDPOINT_VALUE from dba_histograms where
table_name = 'INDPART$' and column_name = 'TS#';

ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
217 79

So, some questions?
1. in sys.hist_head$, how is bucket_cnt calculated?
2. what determines how many buckets a column gets?

Thanks,
Raju


Tom Kyte
August 14, 2006 - 11:16 am UTC

bucket cnt is not calculated, it is "set" - you either decide the number of buckets (size on the method opt) or you let us use any algorithm we feel like to do it "auto".

and you don't need/want to be querying hist_head$, just use the normal views like the rest of us :)

frequency based histograms - number of distinct values < 255
"height balanced" - number of distinct values >= 255

Now, example please? I can definitely reproduce your findings, by modifying data after gathering of course.

So, got example?
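The frequency vs. height-balanced threshold Tom states can be sketched as follows (a toy rule-of-thumb helper, not dbms_stats internals):

```python
def histogram_type(num_distinct: int) -> str:
    # a frequency histogram fits when every distinct value can get its
    # own bucket (fewer than 255 distinct values); otherwise buckets
    # must each cover a range of values - height balanced
    return 'FREQUENCY' if num_distinct < 255 else 'HEIGHT BALANCED'
```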



Histogram

A Reader, September 22, 2006 - 7:54 pm UTC

hi tom

Oracle 9i
DW Environment
Monthly Data Load
120 Million rows in Fact

we create histograms for all indexed columns with a 10% sample size

one of the indexed columns is the Month-Year to which the data belongs - this column is not that skewed, as roughly the same # of rows gets inserted each month.


The warehouse has 120 million rows for approx 72 months, when we do

select * from fact_table

it does a full scan, even though the selectivity is approx 1/72

i was wondering if the 10% sample size is generating histograms which are hurting more than helping, as with a selectivity of approx 1/72 indexes should be useful

2 questions

a) do you think removing the histograms on such
columns will be helpful ?
b) do we also need to remove the stats of the index based
on this column ?


Thanks very much

Tom Kyte
September 24, 2006 - 1:26 pm UTC

... we create hsitograms for all indexed columns with 10% sample size
....

why? why not the other columns? why all indexed columns? why not "columns we know to be skewed and that we know we use in some predicate and that we know the optimizer needs to be told about"

since, well, histograms cost a lot to compute...

Histrogram

Rinku, September 24, 2006 - 2:04 pm UTC

Excellent description about Histogram.

Thanks Tom & All...


A Reader, September 24, 2006 - 8:29 pm UTC

hi tom,

if i am not mistaken - you have mentioned a couple of different times that starting with indexed columns for creating histograms is not a bad starting point - and to me it stands to reason: the fact that one decided to create indexes on these columns is a good indication that they often get used in predicates,

i know this is an approximation - but nevertheless a good starting point. The other issue is manageability. A decent sized app can easily have 2000 columns (100 tables with 20 columns each) - to tell precisely which columns to create histograms on can be quite a task - with high maintenance as well, as the schema evolves. (As a suggestion, it would be a rather useful addition to the oracle create table DDL syntax to be able to earmark columns for which to create histograms - on the column clause.)

My point is - if we do fall back on this rather common paradigm of creating histograms on indexed columns - a good best practice could be :

"IF YOU CREATE HISTOGRAMS FOR ALL INDEXED COLUMNS (COZ YOU ARE LAZY NOT TO ANALYZE WHICH ONES REALLY NEEN EM :-) AND
IF YOU ARE ANALYZING A SMALL % OF ROWS - EG 10 % - THEN MAKE SURE YOU DELETE THE COLUMN STATS FOR COLUMN WHICH YOU KNOW FOR A FACT ARE NOT SKEWED"

the basic reason, i think, we need to do this is:

if a table has 72 M rows - say 1 M for each month - in such a case a predicate such as month_year = '01-2006'
has a selectivity of 1/72. And with that kind of selectivity i feel it should use the index - and it's the histograms built with just 10% of the rows that are confusing the system. Do you agree?

Does all this even remotely make sense - will be grateful to hear your take.

Regards,

Tom Kyte
September 25, 2006 - 2:25 am UTC

can you point to a place where I said that - so I can update it.


In a data warehouse, you don't index in many cases - so it doesn't really hold true (indexes are good for getting a few rows from a big table, and warehouses are many times characterized by NOT doing that)

to gather histograms on all indexed columns to me would be the wrong (inefficient) approach.

maybe you should not gather any. I don't fall back on "common paradigms", those are also known as ROT (rule of thumb)

you would not need to remove the ones you know are not skewed (this is funny - you say you don't have the time/energy/motivation to discover which columns you SHOULD gather on, but you are knowledgeable about the ones you don't need them on?? that would imply you do know the skewed columns by definition - it is the set of columns left over after you take these away). Since they are not skewed, the histograms will reflect that.

No, it doesn't make sense to me to say "it is too much work to understand our data"


Dumb question... How do I NOT create histograms when using DBMS_STATS to gather stats?

Philip Moore, September 26, 2006 - 2:36 pm UTC

Hi Tom,

This may be an EXTREMELY dumb question - but I can't find the answer in the 9i documentation or on askTom...

How do I explicitly NOT create histograms when using DBMS_STATS with 9.2.0.6? What is the "METHOD_OPT" parameter I should use?

I really need to analyze my tables with DBMS_STATS - I just don't want histograms :)

Thanks for your help in advance, sir.


Tom Kyte
September 26, 2006 - 5:04 pm UTC

size 1

if you have one bucket, only thing we know is "high and low and number distinct" values.



statistics & HIST_HEAD$

Antonio Dell'Elce, September 27, 2006 - 8:50 am UTC

Dear Tom,

I am investigating possible performance improvements on
on a 10.1.0.2.0 instance, looking at the ADDM report:

FINDING 1: 88% impact (36798 seconds)
-------------------------------------
SQL statements consuming significant database time were found.

RECOMMENDATION 1: SQL Tuning, 60% benefit (25021 seconds)
ACTION: Investigate the SQL statement with SQL_ID "b6usrg82hwsa3" for
possible performance improvements.
RELEVANT OBJECT: SQL statement with SQL_ID b6usrg82hwsa3
call dbms_stats.gather_database_stats_job_proc ( )

(only first lines reported)

and at v$segment_statistics:

OWNER OBJ SUBNAME SNAME VALUE
---------- ------------------------------ --------------- -------------------- ----------------
SYS HIST_HEAD$ db block changes 107,039,200
STAT_OWN INTERVALLI SYS_SUBP1119 physical reads 80,407,080
STAT_OWN INTERVALLI SYS_SUBP1121 physical reads 80,335,637
STAT_OWN INTERVALLI SYS_SUBP1117 physical reads 68,966,192
STAT_OWN INTERVALLI SYS_SUBP1118 physical reads 55,163,908
STAT_OWN INTERVALLI SYS_SUBP1120 physical reads 43,452,888
SYS I_WRI$_OPTSTAT_H_OBJ#_ICOL#_ST db block changes 11,185,920
SYS HISTGRM$ db block changes 10,908,512

I have come to believe that statistics collection is not working at its best ... how do you suggest I investigate this?

Thanks,
Antonio




Tom Kyte
September 27, 2006 - 3:27 pm UTC

is this job running during your time of "peak usage"? this is the auto-scheduled job that should be running "off peak" - is it running when you are "doing useful work", or "when you are asleep and no one is really doing stuff"?

Antonio, September 29, 2006 - 12:19 pm UTC

I verified the schedules and it is on the default schedule
(nights and weekends), but the system is "operational"
the whole day (it collects data from other systems which
are active 24hrs)

Tom Kyte
September 30, 2006 - 7:50 am UTC

ok, big question: should you be gathering stats so frequently...

you cannot really change what this routine does.

Find queries using histograms

Ravi, October 26, 2007 - 4:45 pm UTC

Tom,

How do we find a query is using histograms or not? Only way is through trace?

Thanks
Ravi
Tom Kyte
October 29, 2007 - 12:25 pm UTC

query user_tab_histograms - if they have them, the query was optimized with them in mind.

that is, every query that accesses a table with histograms "used" them - whether they make any difference to the plan or not

Histogram is not working ?

Roberto Veiga, February 17, 2009 - 3:09 pm UTC

Tom, I think I have a problem.
The user is complaining about performance.

I have a query that uses bind variables.

We are investigating the query and we've noticed the plan changes according to the values that are passed to the query.

We have already collected statistics and histograms. But sometimes the performance is good and sometimes it is bad.

I don't know what to say to the user, because this process has to have good performance and one minute makes a lot of difference.

Here are my tests:

** *******************************************************************
** ORIGINAL QUERY AND ORIGINAL ACCESS PLAN
** *******************************************************************

SELECT RAES.RAES_USR_ANT_ATM, RAES_DAT_ANT_ATM, RAES.RAES_STA_LBR_RMS, RAES.RAES_CTRA_AFL_CDS_COD, RAES.RAES_DAT_CLC, RAES.RAES_DAT_PGT, VPR.PGT_VAL_BRT VAL_BRT, RAES.RAES_VAL_LQD, RAES.RAES_VAL_BLQ_ALR, RAES.RAES_ADMS_ADME_ADM_COD, RAES.RAES_RMS_ANO_EMO, RAES.RAES_RMS_LOCP_COD, RAES.RAES_RMS_TIP_NUM_RMS, RAES.RAES_RMS_NUM_PRG_RMS, RAES.RAES_DAT_COR, RAES.RAES_PRM_COB RAES_PRM_COB_AUX, RAES.RAES_PGT_COD, RAES.RAES_ADMS_FRMS_COD, RAES.RAES_MOE_COD, RAES.RAES_NUM_PRL, RAES.RAES_ADMS_SRV_COD, VPR.GUI_ADME_ADM_COD GUI_ADME_ADM_COD, VPR.GUI_EQP_NUM GUI_EQP_NUM, VPR.GUI_COD GUI_COD, VPR.GUI_DTA_ETG GUI_DTA_ETG, VPR.GUI_QTD_TOT_IFD GUI_QTD_TOT_IFD, VPR.GUI_VAL_TOT_IFD GUI_VAL_TOT_IFD, C620040P(RAES.RAES_STA_LBR_RMS, RAES.RAES_PRM_COB, RAES.RAES_DAT_PGT, RAES.RAES_PGT_COD) RAES_STATUS, RAES_CTRA_ADMS_ADME_ADM_COD, RAES_CTRA_ADMS_FRMS_COD, RAES_CTRA_ADMS_SRV_COD, RAES_CTRA_MOE_COD FROM RMS_ADM_EMP_SRV RAES, V_RAES_PGT VPR
WHERE RAES.RAES_STA_LBR_RMS < 4
AND NVL(RAES.RAES_VAL_PGO_PRL, 0) > 0
AND NOT VPR.RAES_RMS_TIP_NUM_RMS = 7
AND NOT VPR.RAES_FRMP_COD = 7
AND RAES.RAES_STA_RAES + 0 >= 2
AND RAES.RAES_PGT_COD + 0 IS NULL
AND RAES.RAES_RMS_ANO_EMO = VPR.RAES_RMS_ANO_EMO
AND RAES.RAES_RMS_LOCP_COD = VPR.RAES_RMS_LOCP_COD
AND RAES.RAES_RMS_TIP_NUM_RMS = VPR.RAES_RMS_TIP_NUM_RMS
AND RAES.RAES_RMS_NUM_PRG_RMS = VPR.RAES_RMS_NUM_PRG_RMS
AND RAES.RAES_ADMS_ADME_ADM_COD = VPR.RAES_ADMS_ADME_ADM_COD
AND RAES.RAES_ADMS_SRV_COD = VPR.RAES_ADMS_SRV_COD
AND RAES.RAES_ADMS_FRMS_COD = VPR.RAES_ADMS_FRMS_COD
AND RAES.RAES_MOE_COD = VPR.RAES_MOE_COD
AND RAES.RAES_NUM_PRL = VPR.RAES_NUM_PRL
AND RAES.RAES_ADMS_ADME_ADM_COD = 5
AND RAES.RAES_ADMS_FRMS_COD = 2
AND RAES.RAES_ADMS_SRV_COD = 1
AND RAES.RAES_MOE_COD = 'R$'
AND RAES.RAES_DAT_PGT = '12/16/2008'
AND RAES.RAES_PGT_GRD = 'N'
AND ( ('S' = 'S' AND VPR.PGT_CDS_COD_REL = 104938 )
OR ('S' != 'S' AND VPR.PGT_CDS_COD = 104938 ) );


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3546104366

------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 179 | 49 (0)| 00:00:01 | | |
|* 1 | TABLE ACCESS BY GLOBAL INDEX ROWID| RMS_ADM_EMP_SRV | 1 | 102 | 1 (0)| 00:00:01 | ROWID | ROWID |
| 2 | NESTED LOOPS | | 1 | 179 | 49 (0)| 00:00:01 | | |
|* 3 | TABLE ACCESS BY INDEX ROWID | SNAP$_RAES_PGT | 13 | 1001 | 46 (3)| 00:00:01 | | |
|* 4 | INDEX RANGE SCAN | SNAP$RAES_AFL_DAT_PGT_I | 218 | | 1 (0)| 00:00:01 | | |
|* 5 | INDEX RANGE SCAN | RAES_RMS_FK_I | 1 | | 1 (0)| 00:00:01 | | |
------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("RAES"."RAES_PGT_GRD"='N' AND "RAES"."RAES_ADMS_FRMS_COD"=2 AND "RAES"."RAES_ADMS_SRV_COD"=1 AND
"RAES"."RAES_PGT_COD"+0 IS NULL AND "RAES"."RAES_DAT_PGT"='12/16/2008' AND NVL("RAES"."RAES_VAL_PGO_PRL",0)>0 AND
"RAES"."RAES_ADMS_ADME_ADM_COD"=5 AND "RAES"."RAES_STA_LBR_RMS"<4 AND "RAES"."RAES_STA_RAES"+0>=2 AND
"RAES"."RAES_MOE_COD"='R$' AND "RAES"."RAES_NUM_PRL"="RAES_NUM_PRL")
3 - filter("RAES_ADMS_SRV_COD"=1 AND "RAES_ADMS_FRMS_COD"=2 AND "RAES_RMS_TIP_NUM_RMS"<>7 AND "RAES_FRMP_COD"<>7
AND "RAES_ADMS_ADME_ADM_COD"=5 AND "RAES_MOE_COD"='R$' AND ("RAES_RMS_TIP_NUM_RMS"=1 OR "RAES_RMS_TIP_NUM_RMS"=2 OR
"RAES_RMS_TIP_NUM_RMS"=3 OR "RAES_RMS_TIP_NUM_RMS"=4 OR "RAES_RMS_TIP_NUM_RMS"=5 OR "RAES_RMS_TIP_NUM_RMS"=6 OR
"RAES_RMS_TIP_NUM_RMS"=7 OR "RAES_RMS_TIP_NUM_RMS"=8 OR "RAES_RMS_TIP_NUM_RMS"=9))
4 - access("PGT_CDS_COD_REL"=104938)
5 - access("RAES"."RAES_RMS_NUM_PRG_RMS"="RAES_RMS_NUM_PRG_RMS" AND
"RAES"."RAES_RMS_TIP_NUM_RMS"="RAES_RMS_TIP_NUM_RMS" AND "RAES"."RAES_RMS_ANO_EMO"="RAES_RMS_ANO_EMO" AND
"RAES"."RAES_RMS_LOCP_COD"="RAES_RMS_LOCP_COD")
filter("RAES"."RAES_RMS_TIP_NUM_RMS"<>7)

29 rows selected.



** *******************************************************************
** TESTING VALUES ON COLUMN RAES_ADMS_SRV_COD
** *******************************************************************

-- -------------------------------------------------------------------
-- DISTINCT VALUES FOR COLUMN RAES_ADMS_SRV_COD
-- -------------------------------------------------------------------

SELECT DISTINCT RAES_ADMS_SRV_COD FROM RMS_ADM_EMP_SRV ORDER BY 1;

RAES_ADMS_SRV_COD
-----------------
1
2
3
4
6
7
16
22
23
24
25
26
27
30
31
32

16 rows selected.


-- -------------------------------------------------------------------
-- BUCKETS/VALUES FOR COLUMN RAES_ADMS_SRV_COD
-- -------------------------------------------------------------------

select min(RAES_ADMS_SRV_COD), max(RAES_ADMS_SRV_COD), count(RAES_ADMS_SRV_COD), wb
from (
select RAES_ADMS_SRV_COD,
width_bucket( RAES_ADMS_SRV_COD,
(select min(RAES_ADMS_SRV_COD)-1 from prod_dba.RMS_ADM_EMP_SRV),
(select max(RAES_ADMS_SRV_COD)+1 from prod_dba.RMS_ADM_EMP_SRV), 50 ) wb
from prod_dba.RMS_ADM_EMP_SRV
)
group by wb
order by wb
/

MIN(RAES_ADMS_SRV_COD)|MAX(RAES_ADMS_SRV_COD)|COUNT(RAES_ADMS_SRV_COD)| WB
----------------------|----------------------|------------------------|----------
1| 1| 11692690| 2
2| 2| 36778973| 4
3| 3| 3309| 5
4| 4| 5125| 7
6| 6| 3431| 10
7| 7| 2111| 11
16| 16| 4731| 25
22| 22| 43| 34
23| 23| 5| 35
24| 24| 8851| 37
25| 25| 537676| 38
26| 26| 1387| 40
27| 27| 797| 41
30| 30| 227446| 46
31| 31| 3285855| 47
32| 32| 1473147| 49

16 rows selected.

-- -------------------------------------------------------------------
-- TESTING RAES_ADMS_SRV_COD = 1
-- -------------------------------------------------------------------

SELECT RAES.RAES_USR_ANT_ATM, RAES_DAT_ANT_ATM, RAES.RAES_STA_LBR_RMS, RAES.RAES_CTRA_AFL_CDS_COD, RAES.RAES_DAT_CLC, RAES.RAES_DAT_PGT, VPR.PGT_VAL_BRT VAL_BRT, RAES.RAES_VAL_LQD, RAES.RAES_VAL_BLQ_ALR, RAES.RAES_ADMS_ADME_ADM_COD, RAES.RAES_RMS_ANO_EMO, RAES.RAES_RMS_LOCP_COD, RAES.RAES_RMS_TIP_NUM_RMS, RAES.RAES_RMS_NUM_PRG_RMS, RAES.RAES_DAT_COR, RAES.RAES_PRM_COB RAES_PRM_COB_AUX, RAES.RAES_PGT_COD, RAES.RAES_ADMS_FRMS_COD, RAES.RAES_MOE_COD, RAES.RAES_NUM_PRL, RAES.RAES_ADMS_SRV_COD, VPR.GUI_ADME_ADM_COD GUI_ADME_ADM_COD, VPR.GUI_EQP_NUM GUI_EQP_NUM, VPR.GUI_COD GUI_COD, VPR.GUI_DTA_ETG GUI_DTA_ETG, VPR.GUI_QTD_TOT_IFD GUI_QTD_TOT_IFD, VPR.GUI_VAL_TOT_IFD GUI_VAL_TOT_IFD, C620040P(RAES.RAES_STA_LBR_RMS, RAES.RAES_PRM_COB, RAES.RAES_DAT_PGT, RAES.RAES_PGT_COD) RAES_STATUS, RAES_CTRA_ADMS_ADME_ADM_COD, RAES_CTRA_ADMS_FRMS_COD, RAES_CTRA_ADMS_SRV_COD, RAES_CTRA_MOE_COD FROM RMS_ADM_EMP_SRV RAES, V_RAES_PGT VPR
WHERE RAES.RAES_STA_LBR_RMS < 4
AND NVL(RAES.RAES_VAL_PGO_PRL, 0) > 0
AND NOT VPR.RAES_RMS_TIP_NUM_RMS = 7
AND NOT VPR.RAES_FRMP_COD = 7
AND RAES.RAES_STA_RAES + 0 >= 2
AND RAES.RAES_PGT_COD + 0 IS NULL
AND RAES.RAES_RMS_ANO_EMO = VPR.RAES_RMS_ANO_EMO
AND RAES.RAES_RMS_LOCP_COD = VPR.RAES_RMS_LOCP_COD
AND RAES.RAES_RMS_TIP_NUM_RMS = VPR.RAES_RMS_TIP_NUM_RMS
AND RAES.RAES_RMS_NUM_PRG_RMS = VPR.RAES_RMS_NUM_PRG_RMS
AND RAES.RAES_ADMS_ADME_ADM_COD = VPR.RAES_ADMS_ADME_ADM_COD
AND RAES.RAES_ADMS_SRV_COD = VPR.RAES_ADMS_SRV_COD
AND RAES.RAES_ADMS_FRMS_COD = VPR.RAES_ADMS_FRMS_COD
AND RAES.RAES_MOE_COD = VPR.RAES_MOE_COD
AND RAES.RAES_NUM_PRL = VPR.RAES_NUM_PRL
AND RAES.RAES_ADMS_ADME_ADM_COD = 5
AND RAES.RAES_ADMS_FRMS_COD = 2
AND RAES.RAES_ADMS_SRV_COD = 1 -- <<
AND RAES.RAES_MOE_COD = 'R$'
AND RAES.RAES_DAT_PGT = '12/16/2008'
AND RAES.RAES_PGT_GRD = 'N'
AND ( ('S' = 'S' AND VPR.PGT_CDS_COD_REL = 104938 )
OR ('S' != 'S' AND VPR.PGT_CDS_COD = 104938 ) );


PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3546104366

------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 179 | 49 (0)| 00:00:01 | | |
|* 1 | TABLE ACCESS BY GLOBAL INDEX ROWID| RMS_ADM_EMP_SRV | 1 | 102 | 1 (0)| 00:00:01 | ROWID | ROWID |
| 2 | NESTED LOOPS | | 1 | 179 | 49 (0)| 00:00:01 | | |
|* 3 | TABLE ACCESS BY INDEX ROWID | SNAP$_RAES_PGT | 13 | 1001 | 46 (3)| 00:00:01 | | |
|* 4 | INDEX RANGE SCAN | SNAP$RAES_AFL_DAT_PGT_I | 218 | | 1 (0)| 00:00:01 | | |
|* 5 | INDEX RANGE SCAN | RAES_RMS_FK_I | 1 | | 1 (0)| 00:00:01 | | |
------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("RAES"."RAES_PGT_GRD"='N' AND "RAES"."RAES_ADMS_FRMS_COD"=2 AND "RAES"."RAES_ADMS_SRV_COD"=1 AND
"RAES"."RAES_PGT_COD"+0 IS NULL AND "RAES"."RAES_DAT_PGT"='12/16/2008' AND NVL("RAES"."RAES_VAL_PGO_PRL",0)>0 AND
"RAES"."RAES_ADMS_ADME_ADM_COD"=5 AND "RAES"."RAES_STA_LBR_RMS"<4 AND "RAES"."RAES_STA_RAES"+0>=2 AND
"RAES"."RAES_MOE_COD"='R$' AND "RAES"."RAES_NUM_PRL"="RAES_NUM_PRL")
3 - filter("RAES_ADMS_SRV_COD"=1 AND "RAES_ADMS_FRMS_COD"=2 AND "RAES_RMS_TIP_NUM_RMS"<>7 AND "RAES_FRMP_COD"<>7
AND "RAES_ADMS_ADME_ADM_COD"=5 AND "RAES_MOE_COD"='R$' AND ("RAES_RMS_TIP_NUM_RMS"=1 OR "RAES_RMS_TIP_NUM_RMS"=2 OR
"RAES_RMS_TIP_NUM_RMS"=3 OR "RAES_RMS_TIP_NUM_RMS"=4 OR "RAES_RMS_TIP_NUM_RMS"=5 OR "RAES_RMS_TIP_NUM_RMS"=6 OR
"RAES_RMS_TIP_NUM_RMS"=7 OR "RAES_RMS_TIP_NUM_RMS"=8 OR "RAES_RMS_TIP_NUM_RMS"=9))
4 - access("PGT_CDS_COD_REL"=104938)
5 - access("RAES"."RAES_RMS_NUM_PRG_RMS"="RAES_RMS_NUM_PRG_RMS" AND
"RAES"."RAES_RMS_TIP_NUM_RMS"="RAES_RMS_TIP_NUM_RMS" AND "RAES"."RAES_RMS_ANO_EMO"="RAES_RMS_ANO_EMO" AND
"RAES"."RAES_RMS_LOCP_COD"="RAES_RMS_LOCP_COD")
filter("RAES"."RAES_RMS_TIP_NUM_RMS"<>7)

29 rows selected.


-- -------------------------------------------------------------------
-- TESTING RAES_ADMS_SRV_COD = 4
-- -------------------------------------------------------------------

** >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< **
** >>>>>>>>>>>>>>>>>>>>>>>>>>> CHANGE ON ACCESS PLAN <<<<<<<<<<<<<<<<<<<<<<<<<< **
** >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< **

SELECT RAES.RAES_USR_ANT_ATM, RAES_DAT_ANT_ATM, RAES.RAES_STA_LBR_RMS, RAES.RAES_CTRA_AFL_CDS_COD, RAES.RAES_DAT_CLC, RAES.RAES_DAT_PGT, VPR.PGT_VAL_BRT VAL_BRT, RAES.RAES_VAL_LQD, RAES.RAES_VAL_BLQ_ALR, RAES.RAES_ADMS_ADME_ADM_COD, RAES.RAES_RMS_ANO_EMO, RAES.RAES_RMS_LOCP_COD, RAES.RAES_RMS_TIP_NUM_RMS, RAES.RAES_RMS_NUM_PRG_RMS, RAES.RAES_DAT_COR, RAES.RAES_PRM_COB RAES_PRM_COB_AUX, RAES.RAES_PGT_COD, RAES.RAES_ADMS_FRMS_COD, RAES.RAES_MOE_COD, RAES.RAES_NUM_PRL, RAES.RAES_ADMS_SRV_COD, VPR.GUI_ADME_ADM_COD GUI_ADME_ADM_COD, VPR.GUI_EQP_NUM GUI_EQP_NUM, VPR.GUI_COD GUI_COD, VPR.GUI_DTA_ETG GUI_DTA_ETG, VPR.GUI_QTD_TOT_IFD GUI_QTD_TOT_IFD, VPR.GUI_VAL_TOT_IFD GUI_VAL_TOT_IFD, C620040P(RAES.RAES_STA_LBR_RMS, RAES.RAES_PRM_COB, RAES.RAES_DAT_PGT, RAES.RAES_PGT_COD) RAES_STATUS, RAES_CTRA_ADMS_ADME_ADM_COD, RAES_CTRA_ADMS_FRMS_COD, RAES_CTRA_ADMS_SRV_COD, RAES_CTRA_MOE_COD FROM RMS_ADM_EMP_SRV RAES, V_RAES_PGT VPR
WHERE RAES.RAES_STA_LBR_RMS < 4
AND NVL(RAES.RAES_VAL_PGO_PRL, 0) > 0
AND NOT VPR.RAES_RMS_TIP_NUM_RMS = 7
AND NOT VPR.RAES_FRMP_COD = 7
AND RAES.RAES_STA_RAES + 0 >= 2
AND RAES.RAES_PGT_COD + 0 IS NULL
AND RAES.RAES_RMS_ANO_EMO = VPR.RAES_RMS_ANO_EMO
AND RAES.RAES_RMS_LOCP_COD = VPR.RAES_RMS_LOCP_COD
AND RAES.RAES_RMS_TIP_NUM_RMS = VPR.RAES_RMS_TIP_NUM_RMS
AND RAES.RAES_RMS_NUM_PRG_RMS = VPR.RAES_RMS_NUM_PRG_RMS
AND RAES.RAES_ADMS_ADME_ADM_COD = VPR.RAES_ADMS_ADME_ADM_COD
AND RAES.RAES_ADMS_SRV_COD = VPR.RAES_ADMS_SRV_COD
AND RAES.RAES_ADMS_FRMS_COD = VPR.RAES_ADMS_FRMS_COD
AND RAES.RAES_MOE_COD = VPR.RAES_MOE_COD
AND RAES.RAES_NUM_PRL = VPR.RAES_NUM_PRL
AND RAES.RAES_ADMS_ADME_ADM_COD = 5
AND RAES.RAES_ADMS_FRMS_COD = 2
AND RAES.RAES_ADMS_SRV_COD = 4 -- <<
AND RAES.RAES_MOE_COD = 'R$'
AND RAES.RAES_DAT_PGT = '12/16/2008'
AND RAES.RAES_PGT_GRD = 'N'
AND ( ('S' = 'S' AND VPR.PGT_CDS_COD_REL = 104938 )
OR ('S' != 'S' AND VPR.PGT_CDS_COD = 104938 ) );

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3292825880

------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 179 | 9 (0)| 00:00:01 | | |
|* 1 | TABLE ACCESS BY GLOBAL INDEX ROWID| RMS_ADM_EMP_SRV | 1 | 102 | 1 (0)| 00:00:01 | ROWID | ROWID |
| 2 | NESTED LOOPS | | 1 | 179 | 9 (0)| 00:00:01 | | |
| 3 | INLIST ITERATOR | | | | | | | |
|* 4 | TABLE ACCESS BY INDEX ROWID | SNAP$_RAES_PGT | 1 | 77 | 8 (0)| 00:00:01 | | |
|* 5 | INDEX RANGE SCAN | SNAP$_RAES_PGT_PK | 185 | | 2 (0)| 00:00:01 | | |
|* 6 | INDEX RANGE SCAN | RAES_RMS_FK_I | 1 | | 1 (0)| 00:00:01 | | |
------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("RAES"."RAES_ADMS_SRV_COD"=4 AND "RAES"."RAES_PGT_GRD"='N' AND "RAES"."RAES_ADMS_FRMS_COD"=2 AND
"RAES"."RAES_PGT_COD"+0 IS NULL AND "RAES"."RAES_DAT_PGT"='12/16/2008' AND NVL("RAES"."RAES_VAL_PGO_PRL",0)>0
AND "RAES"."RAES_ADMS_ADME_ADM_COD"=5 AND "RAES"."RAES_STA_LBR_RMS"<4 AND "RAES"."RAES_STA_RAES"+0>=2 AND
"RAES"."RAES_MOE_COD"='R$' AND "RAES"."RAES_NUM_PRL"="RAES_NUM_PRL")
4 - filter("PGT_CDS_COD_REL"=104938 AND "RAES_FRMP_COD"<>7)
5 - access(("RAES_RMS_TIP_NUM_RMS"=1 OR "RAES_RMS_TIP_NUM_RMS"=2 OR "RAES_RMS_TIP_NUM_RMS"=3 OR
"RAES_RMS_TIP_NUM_RMS"=4 OR "RAES_RMS_TIP_NUM_RMS"=5 OR "RAES_RMS_TIP_NUM_RMS"=6 OR "RAES_RMS_TIP_NUM_RMS"=7 OR
"RAES_RMS_TIP_NUM_RMS"=8 OR "RAES_RMS_TIP_NUM_RMS"=9) AND "RAES_ADMS_ADME_ADM_COD"=5 AND "RAES_ADMS_SRV_COD"=4
AND "RAES_ADMS_FRMS_COD"=2 AND "RAES_MOE_COD"='R$')
filter("RAES_RMS_TIP_NUM_RMS"<>7 AND "RAES_MOE_COD"='R$')
6 - access("RAES"."RAES_RMS_NUM_PRG_RMS"="RAES_RMS_NUM_PRG_RMS" AND
"RAES"."RAES_RMS_TIP_NUM_RMS"="RAES_RMS_TIP_NUM_RMS" AND "RAES"."RAES_RMS_ANO_EMO"="RAES_RMS_ANO_EMO" AND
"RAES"."RAES_RMS_LOCP_COD"="RAES_RMS_LOCP_COD")
filter("RAES"."RAES_RMS_TIP_NUM_RMS"<>7)

31 rows selected.



Tom Kyte
February 17, 2009 - 4:37 pm UTC

it is *because* the histograms are there and are working that you are seeing this

http://asktom.oracle.com/Misc/tuning-with-sqltracetrue.html
http://asktom.oracle.com/Misc/sqltracetrue-part-two.html
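To see the plan an execution actually used (rather than what explain plan predicts), you can pull it from the cursor cache - a sketch, available in 10g and up:

```sql
-- Show the plan of the last statement executed in this session
select * from table( dbms_xplan.display_cursor );

-- Or, for a specific cursor, by sql_id and child number
select * from table( dbms_xplan.display_cursor( '&sql_id', 0 ) );
```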

The First example in this page against 10204

Yoav, March 06, 2009 - 6:00 pm UTC

Hi Tom,
I ran the first example on this page against Oracle 10.2.0.4 and got totally different results.
Actually, performance was better before gathering statistics: it took Oracle 0.56 sec to return an answer before gathering stats, and 3.33 sec after gathering stats.
I did not find any benefit from histograms on 10.2.0.4 in this case. (Could it be related to dynamic sampling?)
Could you please verify it?
If this is indeed the case, could you please give an example where I would benefit from histograms in 10gR2?

Thanks

SQL*Plus: Release 10.2.0.4.0 - Production on Sat Mar 7 00:24:08 2009
Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> alter session set optimizer_mode='ALL_ROWS';

Session altered.

SQL> show parameter optimizer_mode

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_mode                       string      ALL_ROWS

SQL> create table t
  2  as
  3  select 1 X, rpad('*',4000,'*') data from all_objects a;

Table created.

SQL> insert into t
  2  select 2, rpad('*',4000,'*') from all_objects a where rownum = 1;

1 row created.

SQL> commit;

Commit complete.

SQL> create index t_idx on t(x);

Index created.

SQL> alter session set sql_trace=true;

Session altered.

SQL> select count(data) from t t1 where x = 1;

COUNT(DATA)
-----------
      53130

1 row selected.

SQL> select count(data) from t t1 where x = 2;

COUNT(DATA)
-----------
          1

1 row selected.

SQL> alter system flush shared_pool;

System altered.


SQL> alter system flush buffer_cache;

System altered.

SQL> exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'T',method_opt=>'FOR ALL COLUMNS SIZE AUTO', cascade=>true);


PL/SQL procedure successfully completed.


SQL> select count(data) from t t2 where x = 1;

COUNT(DATA)
-----------
      53130

1 row selected.


SQL> select count(data) from t t2 where x = 2;

COUNT(DATA)
-----------
          1

1 row selected.

SQL> select sysdate from dual;

SYSDATE
---------
07-MAR-09

1 row selected.

===========================================================

TKPROF shows, before collecting statistics:

select count(data)
from  t t1 where x = 1


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.00          0          2          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.58       0.56          0     118851          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.59       0.56          0     118853          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 13

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=118851 pr=0 pw=0 time=565898 us)
  53130   TABLE ACCESS FULL T (cr=118851 pr=0 pw=0 time=584474 us)
  

select count(data)
from
 t t1 where x = 2


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.00          0          2          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          0          4          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.01       0.00          0          6          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 13

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=4 pr=0 pw=0 time=39 us)
      1   TABLE ACCESS BY INDEX ROWID T (cr=4 pr=0 pw=0 time=31 us)
      1    INDEX RANGE SCAN T_IDX (cr=2 pr=0 pw=0 time=13 us)(object id 10977844)
      
      
AFTER COLLECTING STATISTICS:

select count(data)
from
 t t2 where x = 1


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      1.16       3.33      22770     118851          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      1.16       3.33      22770     118851          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 13

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=118851 pr=22770 pw=0 time=3337004 us)
  53130   TABLE ACCESS FULL T (cr=118851 pr=22770 pw=0 time=3932313 us)

***********************************************************

select count(data)
from
 t t2 where x = 2


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          0          4          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.00       0.00          0          4          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 13

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=4 pr=0 pw=0 time=112 us)
      1   TABLE ACCESS BY INDEX ROWID T (cr=4 pr=0 pw=0 time=98 us)
      1    INDEX RANGE SCAN T_IDX (cr=2 pr=0 pw=0 time=71 us)(object id 10977844)

***********************************************************





Tom Kyte
March 07, 2009 - 12:15 pm UTC

ummm, please apply a little bit of critical reading to what you see above and see if you cannot figure this out yourself?


do you see that they use the same plans? (yes, they do)
do you see that before and after statistics - they are identical in nature (yes, they are)

do you see that in between, you flushed the buffer cache and caused the second one to do A TON OF physical IO?

??????


Look - anytime you have the same exact plan, well, you have the SAME EXACT PLAN. If the plans are the same - they do the same thing.

They did exactly the same thing, you flushed the cache for whatever reason (why would you do that???? that never happens in real life - I hate that command)

Guidelines for better sql statement

Roberto Veiga, May 31, 2009 - 9:00 am UTC

Tom, I am facing a lot of problems with bind variables, because the access plan keeps changing and a critical performance problem occurs.
My customer is very disappointed about this "feature" of Oracle and is asking me what to do to avoid this behavior.

We have already tried removing histograms and the situation became worse.

I have proposed that we focus on the most critical process and add some hints to freeze the plan.

But the processes work with a lot of procedures with a lot of statements, so I need to define a method to focus only on the queries most likely to have this problem.

My points are:

-Statements with Bind Variables
-Columns with Skewed distribution
-Statements joining more than 5 tables
-Tables with many rows (500k or more)

Could this be a good start for identifying queries that may have this problem?

Questions on histograms and how they are useful

A reader, April 13, 2010 - 12:11 pm UTC

Hi Tom - We have a current prod environment where we run full 100% stats on IOT tables. For other tables, we run the following command

SYS.DBMS_STATS.GATHER_SCHEMA_STATS (
OwnName => '<owner>'
,estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE
,degree => DBMS_STATS.AUTO_DEGREE);

How will the histograms be generated for this? A few weeks back, we found that having histograms on two columns - a system-generated GUID column and a typecode column - contributed to slowness in some queries. So every day we now delete the stats on those two columns. I want to get away from this and change the stats script to do something better. What other options do I have?
Tom Kyte
April 14, 2010 - 7:58 am UTC

read about method_opt and what its defaults are on your system (they change by release). It controls which histograms are/are not generated.
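For example, a gather call that controls histograms explicitly (rather than relying on the default) might look like this - a sketch, with T and SKEWED_COL as placeholder names:

```sql
begin
   dbms_stats.gather_table_stats(
      ownname    => user,
      tabname    => 'T',
      -- no histograms on anything except the one known-skewed column:
      method_opt => 'for all columns size 1 for columns SKEWED_COL size 254',
      cascade    => true );
end;
/
```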

A reader, April 14, 2010 - 10:34 am UTC

Our database is Oracle 10.2.0.4.0. Before changing method_opt, how can we find out which histograms are having a negative impact? Can we check the skewness and determine whether the histograms will be good or bad? We have a lot of tables with histograms on system-generated GUIDs. Can I use a scientific method to determine whether the histograms on the GUIDs are OK to be deleted?
Tom Kyte
April 14, 2010 - 3:47 pm UTC

none of them should be in this case - you'd have to give me an example.

If there were a universal check that anyone could run to find "bad histograms" (no such thing), then we would do that as part of gather and remove them already. There is no such thing as a "bad histogram"


You'd have to give me an example whereby your histograms are causing an issue, demonstrate with before and after plans
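If the goal is simply to stop re-creating histograms on those two columns at every gather, one option - in 11g and later, so forward-looking for this 10.2 questioner - is to set a per-table statistics preference once, instead of deleting column stats daily. A sketch with placeholder table/column names:

```sql
begin
   dbms_stats.set_table_prefs(
      ownname => user,
      tabname => 'T',
      pname   => 'METHOD_OPT',
      -- histograms as usual, except never on the two named columns:
      pvalue  => 'for all columns size auto for columns GUID_COL size 1 TYPECODE_COL size 1' );
end;
/
```

Subsequent gathers of T (including those run by the automatic stats job) would then honor that preference.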

histograms + estimate_percent?

Xenofon, April 28, 2010 - 4:50 am UTC

Hi Tom,

I am currently reading "Cost-Based Oracle Fundamentals" by Jonathan Lewis - great book, but "hard bread" (tough going), as Germans would say.

One question I have concerning histograms is: even JL seems to always use the default "estimate_percent" setting. But does this make sense, when you certainly have a non-uniform value distribution and you want to collect histograms to get around that problem?

Say the CBO chooses to scan 10% of the table. Certainly it will get a good estimate for the most frequent values, but what about the rare ones? What about the values it did not even find in that 10%? Which bucket are they assigned to, and how are predicates on those values handled during hard parse?

Many thanks in advance
Tom Kyte
April 28, 2010 - 8:30 am UTC

there are two types of histograms - those created when we think there are fewer than 255 distinct values and those created when we know there are more than 255.

When there are fewer than 255 - we have a frequency histogram - a bucket per value. If we don't observe some value in the table - it will of course not have a bucket, and when it doesn't have a bucket, doesn't have a value - we "make up an estimate".


If you estimate, yes, you might miss an observation and we'll have to 'guess'. But it is not always viable to compute to get "perfection" - especially in the type of system a histogram is useful for (warehouse/read-mostly/report system), since the segments can be very large.

play with this:

/*
drop table t;

create table t (x,y)
as
select level, rpad('*',500,'*') from dual connect by level <= 20
union all
select mod(level,10)+21, rpad('*',500,'*') from dual connect by level <= 100000
/
select x, count(*) from t group by x order by x;
*/

exec dbms_stats.delete_table_stats( user, 'T' )
exec dbms_stats.gather_table_stats(user,'T',method_opt=>'for columns x size 254', estimate_percent=>1);
declare
    c sys_refcursor;
    l_rec t%rowtype;
begin
    for i in 1 .. 30
    loop
        delete from plan_table;
        execute immediate 'explain plan for select * from t where x = ' || i ;
        for x in ( select plan_table_output from table(dbms_xplan.display()) where plan_table_output like '|*  1 |  TABLE ACCESS FULL| T %' )
        loop
            dbms_output.put_line( to_char( i, '999' ) || ') ' || x.plan_table_output );
        end loop;
    end loop;
end;
/


run the block a couple of times and see how the estimated row counts vary over time. Use estimate_percent=>100 to see the get "fixed" at a value - a perfect value.

Histogram on 11GR2

Rajeshwaran, Jeyabal, May 29, 2011 - 9:30 pm UTC

Tom:

We know that Histograms in 10G are limited to 32 bytes of string.

http://rajeshwaranbtech.blogspot.com/2011/05/histograms-on-varchar232.html

But the 11G optimizer looks pretty interesting and is able to handle up to 34 bytes of data.

http://rajeshwaranbtech.blogspot.com/2011/05/histograms-on-varchar232-11gr2.html

Looks interesting to me, thought of sharing this with you.


Tom Kyte
May 31, 2011 - 10:04 am UTC

http://www.oracle.com/technetwork/issue-archive/2011/11-jan/o11asktom-195084.html

It is still 32 bytes (that article is using 11gR2)

In your example, if you drop the index - the estimated cardinalities are inaccurate in all cases. That shows it was not the histograms on the column that helped the optimizer out in that case, but rather information gleaned from the index (remember - creating an index on a non-empty table has an implicit compute statistics associated with it since 10g, so you had detailed index statistics).

It was the index information, not the histograms.

If you run your example in 10g in fact, you'll see different cardinality estimates when the index does/does not exist. I observed it estimating 10 rows with the index and 505 rows without in 10g. Not exactly the same as 11g, but not exactly different either :)

Histogram on 11GR2

Rajeshwaran, Jeyabal, May 31, 2011 - 12:27 pm UTC

Tom:

Even after dropping the index, the estimated cardinality matches the actual cardinality for 34 bytes of string data.

http://rajeshwaranbtech.blogspot.com/2011/05/histograms-on-varchar232-11gr2.html?showComment=1306862661122#c706576398309682476
Tom Kyte
May 31, 2011 - 1:10 pm UTC

umm, your example sort of proves the point. The estimated cardinality without the index is 315. With it, it is 1. That is the POINT - the histogram isn't working past 32 bytes.

ops$tkyte%ORA11GR2> drop table t purge;

Table dropped.

ops$tkyte%ORA11GR2> create table t
  2  nologging
  3  as
  4  select rownum as x,
  5        rpad('*',28,'*')||to_char(rownum,'fm00000') as y,
  6        sysdate as z,
  7        a.*
  8  from all_objects a;

Table created.

ops$tkyte%ORA11GR2> create index t_ind on t(y) nologging;

Index created.

ops$tkyte%ORA11GR2> begin
  2     dbms_stats.gather_table_stats(
  3     ownname =>user,
  4     tabname =>'T',
  5     estimate_percent=> dbms_stats.auto_sample_size,
  6     method_opt=>'for all indexed columns size 254' );
  7  end;
  8  /

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> select column_name,count(*)
  2  from user_tab_histograms
  3  where table_name ='T'
  4  group by column_name;

COLUMN_NAME                      COUNT(*)
------------------------------ ----------
Y                                     255


ops$tkyte%ORA11GR2> variable y varchar2(40);
ops$tkyte%ORA11GR2> exec :y := rpad('*',28,'*')||to_char(55,'fm00000');

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> set autotrace traceonly explain statistics;
ops$tkyte%ORA11GR2> select x,y,z
  2  from t
  3  where y = :y;


Execution Plan
----------------------------------------------------------
Plan hash value: 1376202287

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     1 |    56 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |     1 |    56 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T_IND |     1 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("Y"=:Y)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads
          0  redo size
        567  bytes sent via SQL*Net to client
        420  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

ops$tkyte%ORA11GR2> set autotrace off
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> drop table t purge;

Table dropped.

ops$tkyte%ORA11GR2> create table t
  2  nologging
  3  as
  4  select rownum as x,
  5        rpad('*',28,'*')||to_char(rownum,'fm00000') as y,
  6        sysdate as z,
  7        a.*
  8  from all_objects a;

Table created.

ops$tkyte%ORA11GR2> begin
  2     dbms_stats.gather_table_stats(
  3     ownname =>user,
  4     tabname =>'T',
  5     estimate_percent=> dbms_stats.auto_sample_size,
  6     method_opt=>'for all indexed columns size 254' );
  7  end;
  8  /

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> select column_name,count(*)
  2  from user_tab_histograms
  3  where table_name ='T'
  4  group by column_name;

no rows selected

ops$tkyte%ORA11GR2> variable y varchar2(40);
ops$tkyte%ORA11GR2> exec :y := rpad('*',28,'*')||to_char(55,'fm00000');

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> set autotrace traceonly explain statistics;
ops$tkyte%ORA11GR2> select x,y,z
  2  from t
  3  where y = :y;


Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   723 | 29643 |   420   (1)| 00:00:06 |
|*  1 |  TABLE ACCESS FULL| T    |   723 | 29643 |   420   (1)| 00:00:06 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("Y"=:Y)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       1511  consistent gets
          0  physical reads
          0  redo size
        567  bytes sent via SQL*Net to client
        420  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

ops$tkyte%ORA11GR2> set autotrace off


Histogram on 11GR2

Rajeshwaran, Jeyabal, May 31, 2011 - 2:05 pm UTC

Tom:

So you agree that histograms improved to use 34 bytes in Oracle 11gR2?



Histogram on 11GR2

Rajeshwaran, Jeyabal, May 31, 2011 - 9:52 pm UTC

Tom:

I was looking at the script that you used above (two postings above), specifically the second "create table t" statement.

http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:707586567563#3475409700346643210

ops$tkyte%ORA11GR2> begin
  2     dbms_stats.gather_table_stats(
  3     ownname =>user,
  4     tabname =>'T',
  5     estimate_percent=> dbms_stats.auto_sample_size,
  6     method_opt=>'for all indexed columns size 254' );
  7  end;
  8  /

PL/SQL procedure successfully completed.


Since we don't have an index on table T, I would specify the method_opt as 'for all columns size 254' and NOT as 'for all indexed columns size 254'.

And Here is what I see after making method_opt=>'for all columns size 254'

rajesh@ORA11GR2> drop table t purge;

Table dropped.

Elapsed: 00:00:00.40
rajesh@ORA11GR2>
rajesh@ORA11GR2> create table t
  2  nologging
  3  as
  4  select rownum as x,
  5        rpad('*',28,'*')||to_char(rownum,'fm00000') as y,
  6        sysdate as z,
  7        a.*
  8  from all_objects a;

Table created.

Elapsed: 00:00:07.30
rajesh@ORA11GR2>
rajesh@ORA11GR2> begin
  2     dbms_stats.gather_table_stats(
  3     ownname =>user,
  4     tabname =>'T',
  5     estimate_percent=>dbms_stats.auto_sample_size,
  6     method_opt=>'for all columns size 254');
  7  end;
  8  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.42
rajesh@ORA11GR2>
rajesh@ORA11GR2> select column_name,count(*)
  2  from user_tab_histograms
  3  where table_name ='T'
  4  and column_name ='Y'
  5  group by column_name;

COLUMN_NAME          COUNT(*)
-------------------- --------
Y                         255

Elapsed: 00:00:00.00
rajesh@ORA11GR2>
rajesh@ORA11GR2> variable y varchar2(40);
rajesh@ORA11GR2> exec :y := rpad('*',28,'*')||to_char(55,'fm00000');

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.00
rajesh@ORA11GR2>
rajesh@ORA11GR2> set autotrace traceonly explain statistics;
rajesh@ORA11GR2>
rajesh@ORA11GR2> select x,y,z
  2  from t
  3  where y = :y;

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    47 |   416   (1)| 00:00:05 |
|*  1 |  TABLE ACCESS FULL| T    |     1 |    47 |   416   (1)| 00:00:05 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("Y"=:Y)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       1498  consistent gets
          0  physical reads
          0  redo size
        567  bytes sent via SQL*Net to client
        415  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

rajesh@ORA11GR2>
rajesh@ORA11GR2> set autotrace off;
rajesh@ORA11GR2>
rajesh@ORA11GR2>
rajesh@ORA11GR2>


You see, the estimated cardinality matches the actual cardinality even for 33 bytes of string data. This clearly proves that the 11gR2 optimizer has improved.
Tom Kyte
June 01, 2011 - 8:40 am UTC

you are correct in that my dbms_stats was wrong.


You are incorrect about the 32/33/34 byte thing.


Look at the substr(1,32) of your bind. It is:

****************************0005

(the last character is gone). That, coupled with the NDV (number of distinct values) is more than enough to allow the optimizer to come up with the right answer.


The article I've pointed you to clearly shows the optimizer STOPS at 32 characters. Your example does not prove that it goes beyond.

You'll find it extremely difficult to prove something is right with a test case - most often you can use them to prove something is not right or not always right. In this case, my previous article shows it stops at 32 characters. Your test case does not prove that it goes beyond 32 characters. It only shows that with your specific data and the way it is constructed - you "got lucky".


I can show you it doesn't use 33 bytes. Let's construct the test a bit differently:

create table t
nologging
as
select rownum as x,
      rpad('*',28,'*')||'0005'||mod(rownum,10) as y,
      sysdate as z,
      a.*
from all_objects a;
begin
   dbms_stats.gather_table_stats(
   ownname =>user,
   tabname =>'T',
   estimate_percent=> dbms_stats.auto_sample_size,
   method_opt=>'for columns Y size 254' );
end;
/


Now, the first 32 bytes are constant - but the 33rd byte is one of ten values. Hence, it will retrieve 10% of the data.

AND there are only ten values in this column - if they fit into a histogram column - there would be 10 buckets, everything would be Perfectly guessable...

but, the optimizer guesses wrong:
ops$tkyte%ORA11GR2> select column_name,count(*)
  2  from user_tab_histograms
  3  where table_name ='T'
  4  group by column_name;

COLUMN_NAME                      COUNT(*)
------------------------------ ----------
Y                                       1

ops$tkyte%ORA11GR2> variable y varchar2(40);

ops$tkyte%ORA11GR2> exec :y := rpad('*',28,'*')||'0005' || '8';

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> set serveroutput off
ops$tkyte%ORA11GR2> set termout off
ops$tkyte%ORA11GR2> select * from TABLE(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  bw1r5zgh6zr7x, child number 0
-------------------------------------
select x,y,substr(y,1,32), z from t where y = :y

Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |   420 (100)|          |
|*  1 |  TABLE ACCESS FULL| T    | 72241 |  3950K|   420   (1)| 00:00:06 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("Y"=:Y)


18 rows selected.



It guessed every row in the table, not 10% of the table. However, if we back off the length of the string to 32 - just by removing one byte:

ops$tkyte%ORA11GR2> create table t
  2  nologging
  3  as
  4  select rownum as x,
  5        rpad('*',28,'*')||'005'||mod(rownum,10) as y,
  6        sysdate as z,
  7        a.*
  8  from all_objects a;

Table created.

ops$tkyte%ORA11GR2> begin
  2     dbms_stats.gather_table_stats(
  3     ownname =>user,
  4     tabname =>'T',
  5     estimate_percent=> dbms_stats.auto_sample_size,
  6     method_opt=>'for columns Y size 254' );
  7  end;
  8  /

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> select column_name,count(*)
  2  from user_tab_histograms
  3  where table_name ='T'
  4  group by column_name;

COLUMN_NAME                      COUNT(*)
------------------------------ ----------
Y                                      10

ops$tkyte%ORA11GR2> variable y varchar2(40);
ops$tkyte%ORA11GR2> exec :y := rpad('*',28,'*')||'005' || '8';

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> set serveroutput off
ops$tkyte%ORA11GR2> set termout off
ops$tkyte%ORA11GR2> select * from TABLE(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  bw1r5zgh6zr7x, child number 0
-------------------------------------
select x,y,substr(y,1,32), z from t where y = :y

Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |   416 (100)|          |
|*  1 |  TABLE ACCESS FULL| T    |  7044 |   378K|   416   (1)| 00:00:05 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("Y"=:Y)


18 rows selected.


bingo - 10% guess - because the histogram is now useful. (and exact same thing happens in 10g as well)


This *shows* 32 bytes. Yours 'worked' because your first 32 bytes were discriminating enough.
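The difference between the two test cases can be simulated outside the database. Below is a sketch in Python, where the 32-byte cutoff is taken as an assumption from the article referenced above (this models the idea, not Oracle's actual code):

```python
# Simulate (as an assumption) the optimizer keeping only the first
# 32 bytes of a string when building histogram endpoints.
N = 72000  # roughly the number of rows in ALL_OBJECTS here

# Rajeshwaran's data: 28 stars plus a 5-digit sequence ('fm00000'):
reader_y = ['*' * 28 + '%05d' % i for i in range(1, N + 1)]

# Tom's counter-example: 28 stars + '0005' + one of ten trailing digits:
tom_y = ['*' * 28 + '0005' + str(i % 10) for i in range(1, N + 1)]

def distinct_prefixes(values, limit=32):
    # how many distinct values survive truncation to `limit` bytes
    return len({v[:limit] for v in values})

print(distinct_prefixes(reader_y))  # 7201 -- still highly discriminating
print(distinct_prefixes(tom_y))     # 1    -- every value collapses together
```

With the reader's data, truncation to 32 bytes still leaves thousands of distinct endpoint values, so the estimate comes out right; with Tom's construction every value truncates to the same 32 bytes and the histogram becomes useless, exactly as the two plans show.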



Test cases are excellent at proving something is NOT (always) true. My test case here shows that it doesn't use 33 bytes; if it did, both of the estimates above would have been the same.

Histogram on 11GR2

Rajeshwaran, Jeyabal, June 01, 2011 - 11:38 am UTC

Thanks for Teaching me again and again Tom!

execution plan change

Reader, June 15, 2011 - 6:12 pm UTC


We are on 10.2.0.4.0. We gather histograms and our application uses bind variables. The SQL execution plan sometimes changes, causing "unacceptable" performance. This causes business impact, as reports time out. Will the SQL plan management feature in 11g address this issue? If so, we can propose to the business that upgrading to 11gR2 is the only option. Thanks.
Tom Kyte
June 17, 2011 - 1:23 pm UTC

what you are seeing is bind variable peeking. You have a query that would be optimized differently depending on which bind input is used to optimize the query

http://www.google.com/search?q=site%3Atkyte.blogspot.com+bind+variable+peeking

One solution could be to NOT gather histograms. You'll get a consistent plan that way. If we do not have statistics that would cause two different plans - you won't get plans that flip flop depending on who parsed them.

You can use sql profiles if you have access to that.

You can use a query plan baseline right now - in 10g. You would create a baseline for that query, and in that baseline would be the "good" plan. We would use that plan.

execution plan change

Reader, June 18, 2011 - 9:51 am UTC


Yes, we do create a SQL profile for that query, but only after the fact, once it has caused business impact because a report timed out. Assuming we are on 11gR2, does the 11g "SQL plan management" feature help to not accept the "bad" plan resulting from a plan change due to bind variable peeking/histograms? Thanks.
Tom Kyte
June 20, 2011 - 9:49 am UTC

plan baselines already exist in 10g, they are the part of sql plan management you are interested in. sql plan management = 10g baselines PLUS plan evolution (and plan evolution could actually evolve the plan you don't want into the baseline under some circumstances)

so you already have what you need. plan baselines. as stated above.



Histograms on columns having NDV's greater than 2200

sri, June 27, 2011 - 10:42 am UTC

Hi Tom,
I created a table with 1 partition(range) and 1 subpartition(list).
insert into above_table
select * from another_table
where cola='0000012345' and rownum <= 21999.

generated histograms on cola and endpoint number is 21999. Fair.

But if I insert with ..and rownum <= 22000, the endpoint number comes up as something like 5324...
Please advise.


Tom Kyte
June 27, 2011 - 11:41 am UTC

cola has one value - where do you have NDV's greater than 2200???

give full example to reproduce with and explain what the issue is (not what you see, but what material side effect you are experiencing)

How dbms_stats determines column order

shu, June 30, 2011 - 2:39 pm UTC

I'm running dbms_stats.gather_table_stats on a large table (800GB, 70 columns) and I saw the process touch a lot of columns seemingly at random. Is there a way to find out the order, to help me determine how long it will continue to run?

The METHOD_OPT, i used is "col1 size skewonly, col2 size skewonly,col3 size skewonly,col4 size skewonly,col5 size skewonly ... ... "
DB is 11.1

Thanks
Tom Kyte
July 01, 2011 - 8:59 am UTC

you should be able to monitor that via v$session_longops

A reader, November 03, 2011 - 11:58 am UTC

Hi Tom,


create table t as 
select * from all_objects;

create index idx_owner on t(owner);

SQL> select object_id from t where owner='ORDDATA';

 OBJECT_ID
----------
     58236
     58237
     58238
     58239
     58260

Elapsed: 00:00:00.31

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2 Card=5 Bytes=150
          )

   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'T' (TABLE) (Cost=2 Card=
          5 Bytes=150)

   2    1     INDEX (RANGE SCAN) OF 'IDX_OWNER' (INDEX) (Cost=1 Card=5
          )



Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads
          0  redo size
        449  bytes sent via SQL*Net to client
        363  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          5  rows processed




I did not gather statistics on the table, so there are NO histograms:
select * from all_histograms where table_name like 'T'; 
----NULL--

but the CBO estimated the cardinality = 5 exactly.
How was that possible?

Thanks 


Tom Kyte
November 03, 2011 - 3:07 pm UTC

A reader, November 08, 2011 - 2:36 pm UTC

Hi Tom,

SQL> execute dbms_stats.gather_table_stats('HR','T');

I have 2 questions

1. After executing the above procedure, histograms were not collected on column ID. Why?

2. What is the 'deciding factor' by which Oracle decides to collect histograms?

Thanks Tom

 

Tom Kyte
November 08, 2011 - 4:53 pm UTC

1) how would I know - I have NO context here at all. what is HR.T. why do you think histograms should be on column ID (id columns are many times unique - not skewed).

2) need context here - and I have zero

A reader, November 08, 2011 - 5:41 pm UTC

Sorry Tom,

select id, count(*) from t
group by id;

        ID   COUNT(*)
---------- ----------
         1          1
        50        100
       999       9999

It's just a temp table I created myself.


Tom Kyte
November 09, 2011 - 7:13 am UTC

give the full example and show your stats - like num rows and such.

Also, did you query the table before gathering statistics?

If the table was a true temporary table - no idea if that is the case or not due to the total lack of any useful information - then it could be that dbms_stats (since it commits) wipes out the data:

ops$tkyte%ORA11GR2> create global temporary table t on commit delete rows
  2  as
  3  select case when rownum = 1 then 1
  4              when rownum <= 51 then 50
  5                          else 999
  6              end id, a.*
  7    from all_objects a
  8   where 1=0
  9  /

Table created.

ops$tkyte%ORA11GR2> insert into t
  2  select case when rownum = 1 then 1
  3              when rownum <= 51 then 50
  4                          else 999
  5              end id, a.*
  6    from all_objects a
  7   where rownum <= 1050
  8  /

1050 rows created.

ops$tkyte%ORA11GR2> select id, count(*) from t group by id;

        ID   COUNT(*)
---------- ----------
         1          1
       999        999
        50         50

ops$tkyte%ORA11GR2> exec dbms_stats.gather_table_stats( user, 'T' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> select num_rows from user_tables where table_name = 'T';

  NUM_ROWS
----------
         0




Or, it is a real table posing as a temporary table, it could be for the reason documented here:

http://www.oracle.com/technetwork/issue-archive/2006/06-may/o36asktom-099107.html

in the section "Why Does My Plan Change?"

for example:



ops$tkyte%ORA11GR2> create table t
  2  as
  3  select case when rownum = 1 then 1
  4              when rownum <= 51 then 50
  5                          else 999
  6              end id, a.*
  7    from all_objects a
  8   where rownum <= 1050
  9  /

Table created.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> select id, count(*) from t group by id;

        ID   COUNT(*)
---------- ----------
         1          1
       999        999
        50         50

ops$tkyte%ORA11GR2> exec dbms_stats.gather_table_stats( user, 'T' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> select histogram from user_tab_columns where table_name = 'T' and column_name = 'ID';

HISTOGRAM
---------------
NONE

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> select * from t where id = 1;

        ID OWNER                          OBJECT_NAME
---------- ------------------------------ ------------------------------
SUBOBJECT_NAME                  OBJECT_ID DATA_OBJECT_ID OBJECT_TYPE
------------------------------ ---------- -------------- -------------------
CREATED   LAST_DDL_ TIMESTAMP           STATUS  T G S  NAMESPACE
--------- --------- ------------------- ------- - - - ----------
EDITION_NAME
------------------------------
         1 SYS                            ICOL$
                                       20              2 TABLE
05-SEP-10 05-SEP-10 2010-09-05:15:39:55 VALID   N N N          1



ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> exec dbms_stats.gather_table_stats( user, 'T' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> select histogram from user_tab_columns where table_name = 'T' and column_name = 'ID';

HISTOGRAM
---------------
FREQUENCY
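As a side note on what FREQUENCY means here: the histogram stores one bucket per distinct value, with a cumulative row count (the ENDPOINT_NUMBER seen in user_tab_histograms). A small sketch of that bookkeeping in Python, assuming the 1/50/999 distribution built above (an illustration of the idea, not Oracle's internal representation):

```python
# Build frequency-histogram-style (value, cumulative count) endpoints
# from the skewed ID column created in the example above.
from collections import Counter

data = [1] * 1 + [50] * 50 + [999] * 999   # 1050 rows, heavily skewed

counts = Counter(data)
endpoints, running = [], 0
for value in sorted(counts):
    running += counts[value]
    endpoints.append((value, running))     # (ENDPOINT_VALUE, ENDPOINT_NUMBER)

print(endpoints)  # [(1, 1), (50, 51), (999, 1050)]
```

The per-value row count is recovered by subtracting adjacent ENDPOINT_NUMBERs, which is what makes frequency histograms so accurate for skewed equality predicates.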



To the "Reader" above...

A reader, November 08, 2011 - 7:35 pm UTC


Do you have the explain plan? It is possibly for one of the following two reasons:

1. It is a small table and hence does not need an index.
2. Since you requested the entire data set, it has rightly done a full table scan.

Cheers.



Tom Kyte
November 09, 2011 - 7:14 am UTC

1) that wouldn't matter, you don't gather histograms exclusively on indexed columns.

You gather histograms in order to get the correct estimated cardinalities - that includes columns used in the where clause that are not indexed as well.

2) that wasn't the question :) the question was "why no histograms, the data is skewed"

A reader, November 09, 2011 - 8:00 am UTC

Perfect -Many thanks Tom ,
your article answered my question :
http://www.oracle.com/technetwork/issue-archive/2006/06-may/o36asktom-099107.html

Many thanks !!



Value Based Histogram-When to Incorporate

Jim Cox, November 30, 2011 - 5:35 pm UTC

Hi Tom


i am just starting into looking at Value Based Histograms

Excerpt from above where you said:

*****

Followup February 21, 2004 - 2pm Central time zone:

I typically start with the equivalent of "table", "index" and "all indexed columns" myself. If you
have the cycles to compute the histograms, they can be helpful (with bind variable peeking and
such).......

*****

I was wondering on just where to start if I wanted to incorporate a Value Base Histogram on a particular table and column ?
I am already doing what you stated above, but I am not sure just when to add a Value Based Histogram

If I have some slow running sql, would i run a test before and then after i incorporate the histogram to see if it speeds up the sql or is there some other method i should pursue to determine if any slow sql would benefit from the histogram ?

Thanks
Jim
Tom Kyte
December 06, 2011 - 9:49 am UTC

You'll naturally get a value-based (frequency) histogram if the column has 254 or fewer distinct values (just use size 254). You'll get a height-balanced histogram if you have more than 254 distinct values.

I don't see a point in not using size 254 for everything - if there are fewer than 254 distinct values, that's all we'll store.

Value Based Histogram-When to Incorporate-Thanks

Jim Cox, December 06, 2011 - 4:08 pm UTC

Thanks Tom

Jim

A Reader

awais, December 28, 2011 - 1:15 pm UTC

DB is 10.2.0.2.0

Hi Tom, I am gathering stats with a histogram for all columns with size 254.


exec dbms_stats.gather_table_stats('INV','MTL_MATERIAL_TRANSACTIONS',method_opt=>'for all columns size 254')


As you can see, there are 24 organization ids, each with its number of occurrences, in the table MTL_MATERIAL_TRANSACTIONS:

select organization_id,count(organization_id)
from mtl_material_transactions m
group by organization_id
/


ORGANIZATION_ID COUNT(ORGANIZATION_ID)
--------------- ----------------------
             84                 969421
            102                3817258
            124                1812622
            125                5219763
            143                1714215 <----
            145                3891003
            168                1133484
            169                3086792
            170                1501886
            171                2531388
            172                1004050
            173                2697835
            184                 798113
            185                3242713
            204                 187333
            205                1016991
            224                 656003
            225                2274042
            242                    255
            243                     32
            264                  71905
            265                 398933
            284                      5
            285                      5


As I understand it, after gathering stats with a histogram, the difference between the previous and current cumulative frequencies gives the number of rows a bucket contains. From above, the count for organization id 143 is 1714215, but from the histogram below, 143 would contain 1933 - 1688 = 245 rows, which contradicts the actual value of 1714215. What am I missing in the stats gathering? Please clarify.

select column_name, endpoint_number, endpoint_value, endpoint_actual_value
from user_tab_histograms h
where h.table_name = 'MTL_MATERIAL_TRANSACTIONS' and column_name = 'ORGANIZATION_ID'
/
COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE
------------------ --------------- -------------- ---------------------
ORGANIZATION_ID 699 102
ORGANIZATION_ID 939 124
ORGANIZATION_ID 1688 125
ORGANIZATION_ID 1933 143<-----
ORGANIZATION_ID 2495 145
ORGANIZATION_ID 2660 168
ORGANIZATION_ID 3126 169
ORGANIZATION_ID 3324 170
ORGANIZATION_ID 3690 171
ORGANIZATION_ID 3822 172
ORGANIZATION_ID 4222 173
ORGANIZATION_ID 4321 184
ORGANIZATION_ID 4795 185
ORGANIZATION_ID 4822 204
ORGANIZATION_ID 4954 205
ORGANIZATION_ID 5048 224
ORGANIZATION_ID 5371 225
ORGANIZATION_ID 5383 264
ORGANIZATION_ID 139 84
ORGANIZATION_ID 5442 265
Tom Kyte
December 29, 2011 - 11:09 am UTC

245 is more than close enough to the right answer - we get worried when estimated cardinalities are off by orders of magnitude (x10); this small difference is not worrisome.

What is the optimizer estimating in plans for that predicate now - something close to reality, or something very far away?

Awais a Reader

Awais, December 30, 2011 - 3:37 am UTC

You said 245 is more than close enough to the right answer...

1) Why doesn't the bucket contain the actual row count of 1714215 for organization id 143?

You said we get worried when estimated cardinalities are off by orders of magnitude (x10), this small difference is not worrisome...

2) I could not understand that; please bear with me.




You said "what is the optimizer estimating in plans for that predicate now - something close to reality, or something very far away."

Here i go



select 
   *  
 from user_tab_statistics  
where table_name='MTL_MATERIAL_TRANSACTIONS'
/

GLOBAL_STATS   YES
OBJECT_TYPE   TABLE
STALE_STATS   NO
USER_STATS   NO
TABLE_NAME   MTL_MATERIAL_TRANSACTIONS
SAMPLE_SIZE   38093598
NUM_ROWS   38093598<-------------------------
AVG_ROW_LEN   252
BLOCKS    1899357
LAST_ANALYZED   12/27/2011 9:36:08 PM
AVG_SPACE_FREELIST_BLOCKS 0
NUM_FREELIST_BLOCKS  0
EMPTY_BLOCKS   0
AVG_SPACE   0
CHAIN_CNT   0
PARTITION_POSITION 
PARTITION_NAME 
STATTYPE_LOCKED 
SUBPARTITION_NAME 
AVG_CACHED_BLOCKS 
AVG_CACHE_HIT_RATIO 
SUBPARTITION_POSITION 

select *  
 from user_tab_col_statistics  
where table_name='MTL_MATERIAL_TRANSACTIONS' and column_name='ORGANIZATION_ID'
/

TABLE_NAME MTL_MATERIAL_TRANSACTIONS
COLUMN_NAME ORGANIZATION_ID
NUM_DISTINCT 20<-------------------------Distinct values are 20 within stats due to last analyzed on 27th Dec
LOW_VALUE C155
HIGH_VALUE C20342
DENSITY  1.32189925035054E-8
NUM_NULLS 0
NUM_BUCKETS 20
LAST_ANALYZED 12/27/2011 9:36:08 PM
SAMPLE_SIZE 5448
GLOBAL_STATS YES
USER_STATS NO
AVG_COL_LEN 4
HISTOGRAM FREQUENCY

select 
    a.column_name,  
    apps.display_raw(a.low_value,b.data_type)  as low_val,  
    apps.display_raw(a.high_value,b.data_type) as high_val,  
    b.data_type  
 from 
    user_tab_col_statistics a, user_tab_cols b  
 where 
    a.table_name='MTL_MATERIAL_TRANSACTIONS' and
    b.column_name='ORGANIZATION_ID' and
    a.table_name=b.table_name and 
    a.column_name=b.column_name  

COLUMN_NAME ORGANIZATION_ID
LOW_VAL  84
HIGH_VAL 265
DATA_TYPE NUMBER


By default, for columns that do not have histograms (statistics that relate to data skew), the statistic called density contains the answer for the
selectivity calculation for an equality condition. In my simple example, stats exist so the selectivity would be for an equality as 1/NDV (# distinct values) for the column(s) used in the WHERE clause.

Cardinality = selectivity * rows in table
     = 1/20   * 38093598
     = 0.05   * 38093598
     = 1904680<--------------------------

Organization Id Already Indexed.


SQL> select /*+ gather_plan_statistics abc*/ count(*)
  2    from mtl_material_transactions
  3   where organization_id=143
  4  /

  COUNT(*)
----------
   1722507

SQL> SELECT * FROM table(dbms_xplan.display_cursor(NULL,NULL,'ALLSTATS LAST'))
  2  /

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
SQL_ID  813dmq90jmyjy, child number 0
-------------------------------------
select /*+ gather_plan_statistics abc*/ count(*)   from mtl_material_transactions  where organizatio

Plan hash value: 1164751366

----------------------------------------------------------------------------------------------------
| Id  | Operation         |Name                         |Starts|E-Rows |A-Rows |   A-Time  | Buffers | Reads  |
----------------------------------------------------------------------------------------------------
|   1 |  SORT AGGREGATE   |                              |   1 |     1 |     1 |00:00:44.28|    5895 |   5840 |
|*  2 |   INDEX RANGE SCAN| MTL_MATERIAL_TRANSACTIONS_N26|   1 |  1713K| 1722K |00:00:44.83|    589
               
PLAN_TABLE_OUTPUT       
----------------------------------------------------------------------------------------------------

         
Predicate Information (identified by operation id):    
---------------------------------------------------   
         
   2 - access("ORGANIZATION_ID"=143)     
         
         
18 rows selected.       
  

As I have calculated, the cardinality is 1904680, but the optimizer is estimating 1713000. What's wrong with my cardinality calculation? That's why I am bothered that there may be an issue with the histogram stats.
Tom Kyte
December 30, 2011 - 11:28 am UTC

sorry, i thought 143 was the actual row count.


that last bit of information there shows the optimizer is guessing 1713k rows and we are getting 1722k rows - that is very close. What is the issue? The optimizer is getting the proper estimated cardinality here?

stop looking at and trying to interpret buckets and endpoints and the like, look at what the optimizer is actually guessing. And it seems to be guessing correctly?

Awais

Awais, December 31, 2011 - 9:49 am UTC

Again same two silly question tom please.

I agree the optimizer's guess is perfect, but what about the typical cardinality calculation? Are the statement below and my cardinality calculation wrong?
---------------------------------------------------------
"By default, for columns that do not have histograms (statistics that relate to data skew), the statistic called density contains the answer for the
selectivity calculation for an equality condition. In my simple example, stats exist so the selectivity would be for an equality as 1/NDV (# distinct values) for the column(s) used in the WHERE clause. "


Cardinality = selectivity * rows in table
= 1/20 * 38093598
= 0.05 * 38093598
= 1904680<------

----------------------------------------------------------

If I stop trying to interpret the buckets, then how does Oracle interpret them? I 100% agree that what the optimizer does should be adhered to, but what is the backend picture for such skewed column data? How does Oracle interpret the bucket rows from the histogram?



Tom Kyte
December 31, 2011 - 10:38 am UTC

If you want to go down that path, I recommend the last two books by Jonathan Lewis.

http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=jonathan+lewis+oracle&x=0&y=0




but yes, without histograms, cardinality calculations on data with skewed values is obviously going to be wrong - that is why we have histograms in the first place.
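For what it's worth, the frequency histogram shown earlier is enough to reproduce the optimizer's estimate. Below is a back-of-the-envelope sketch in Python (an illustration of the arithmetic, not Oracle's actual code), using the SAMPLE_SIZE (5448), NUM_ROWS (38093598) and the two ENDPOINT_NUMBERs bracketing value 143 (1688 and 1933) from the listings above:

```python
# Numbers copied from the user_tab_col_statistics / user_tab_histograms
# listings above; the formulas are illustrative, not Oracle's source.
sample_size = 5448        # SAMPLE_SIZE for ORGANIZATION_ID
num_rows    = 38093598    # NUM_ROWS for the table

# Naive 1/NDV estimate (what the reader computed by hand):
naive = round(1 / 20 * num_rows)
print(naive)              # 1904680 -- ignores the skew entirely

# Frequency-histogram estimate: bucket rows / sample size, scaled up.
prev_endpoint, endpoint_143 = 1688, 1933    # cumulative ENDPOINT_NUMBERs
bucket_rows = endpoint_143 - prev_endpoint  # 245 sampled rows for 143
histogram_estimate = round(bucket_rows / sample_size * num_rows)
print(histogram_estimate)  # 1713093 -- the 1713K E-Rows in the plan
```

The 1/NDV formula only applies when there is no histogram; once a frequency histogram exists, the per-bucket fraction of the sample drives the estimate, which is why the plan shows 1713K rather than 1904680.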

Awais

Awais, January 02, 2012 - 3:43 am UTC

Tom, but why is my cardinality calculation wrong compared to the optimizer's cardinality?

Cardinality = selectivity * rows in table
= 1/20 * 38093598
= 0.05 * 38093598
= 1904680<------

Tom Kyte
January 02, 2012 - 8:39 am UTC

because you are not doing what the optimizer did. I don't go down that route, I look at the output, I don't try to reverse engineer all of the time. I care about the results.


If you want to go down that path, I recommend the last two books by Jonathan Lewis.

http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=jonathan+lewis+oracle&x=0&y=0

Progress Meter

Hemanshu Sheth, January 03, 2012 - 8:54 am UTC

Is there any function, usable as a select projection, that returns the total row count of a table, so it could be used as below to show the progress of a table scan?

select ((rownum/rowcount())+(1/rowcount()))*100 from large_table

rowcount() doesn't exist in Oracle.
In FoxPro, rownum was recno() and rowcount() was reccount().

Regards
Hemanshu

Tom Kyte
January 03, 2012 - 11:24 am UTC

In order to get the row count, you'd have to - well - count all of the rows, which means you wouldn't need a slider so much, because you'd know that getting the last row will take just twice as long as getting the first (you would have already hit the last row in order to get the first row).


Oracle does a bit more, data-management wise, than FoxPro did. We do not store a row count anywhere.

There is NUM_ROWS from user_tables - however that is only as current as your last statistics gathering. It would tend to give you a fairly good approximation in many cases.
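A rough sketch of how one might use that approximation for a progress meter (Python just for illustration; `rows_processed` would come from your own counter and `num_rows` from USER_TABLES):

```python
def progress_pct(rows_processed, num_rows):
    """Approximate percent complete. NUM_ROWS is only as fresh as the
    last statistics gathering, so cap at 100 in case it is stale."""
    return min(100.0, 100.0 * rows_processed / num_rows)

print(progress_pct(250_000, 1_000_000))  # 25.0
```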

Progress Meter

Hemanshu Sheth, January 04, 2012 - 10:10 am UTC

I was not trying to compare with FoxPro. Oracle is way ahead in data management. But it was nice for a FoxPro developer to have a slider option to show a progress meter while the rows were being processed from a large DBF.

I thought that Oracle might be storing a current record count (excluding deletions) for each table, irrespective of whether stats are being gathered or not. That way, basic performance steps like analyzing/stats gathering would not have to be done explicitly.

Any way thanks for your comments.

character limit

D, January 16, 2012 - 10:56 am UTC

Does Oracle only use the first few characters of a column value to form the histogram?

eg on 10.2.0.4:
SQL> exec dbms_stats.gather_table_stats(user, 'MYTAB', method_opt=>'for columns status size skewonly', degree=>12, cascade=>false, estimate_percent=>100);

PL/SQL procedure successfully completed.

SQL> select status, count(*) from MYTAB group by status;

STATUS                                               COUNT(*)
-------------------------------------------------- ----------
IGNORE                                                  62020
PROCESSED                                            10662504

status is indexed (leading edge).


SQL> explain plan for select * from MYTAB where status = 'PROCESSING';

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
Plan hash value: 2976737173

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |    10M|  2511M|   568K  (1)| 01:53:46 |
|*  1 |  TABLE ACCESS FULL| MYTAB |    10M|  2511M|   568K  (1)| 01:53:46 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("STATUS"='PROCESSING')

13 rows selected.

SQL> explain plan for select * from MYTAB where status = 'NEW';

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------
Plan hash value: 2344747963

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              | 31010 |  7479K| 17533   (1)| 00:03:31 |
|   1 |  TABLE ACCESS BY INDEX ROWID| MYTAB        | 31010 |  7479K| 17533   (1)| 00:03:31 |
|*  2 |   INDEX RANGE SCAN          | IDX_MYTAB_03 | 31010 |       |   358   (1)| 00:00:05 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("STATUS"='NEW')

14 rows selected.

SQL> explain plan for select * from MYTAB where status = 'IGNORE';

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2344747963

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              | 62020 |    14M| 35063   (1)| 00:07:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| MYTAB        | 62020 |    14M| 35063   (1)| 00:07:01 |
|*  2 |   INDEX RANGE SCAN          | IDX_MYTAB_03 | 62020 |       |   713   (1)| 00:00:09 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("STATUS"='IGNORE')

14 rows selected.

SQL>


In the table there are a few transient statuses: PROCESSING and NEW.
When stats are gathered in the overnight process, there are usually no PROCESSING/NEW rows.

With this setup, it gets PROCESSING badly wrong (i.e. it thinks PROCESSED and PROCESSING have the same cardinality) and makes up a guess for NEW.

Tom Kyte
January 17, 2012 - 3:06 pm UTC

yes

http://www.oracle.com/technetwork/issue-archive/2011/11-jan/o11asktom-195084.html


but it is 32 characters, so it does not affect you.
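To see why the character limit is harmless here: the status values diverge well before character 32 (a toy check; the 32-character figure is from the answer above):

```python
PREFIX_LEN = 32  # character limit discussed above

def histogram_key(value):
    # Only the leading characters of a string participate in the histogram.
    return value[:PREFIX_LEN]

# 'PROCESSED' and 'PROCESSING' differ at the 8th character, well inside
# the 32-character prefix, so the histogram can distinguish them.
print(histogram_key('PROCESSED') != histogram_key('PROCESSING'))  # True
```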


when stats are gathered in the overnight process,

A bad idea, isn't it? Think about it - why bother even gathering stats if you are doing it when the table has data that doesn't represent reality? It sounds like you want to either gather or set stats once (when the table is 'good', representative) and LOCK those stats - don't gather anymore.

(you might have to update some column stats to set new high values for dates/sequences if applicable over time)

what you are seeing is an issue with missing values in a histogram. It'll clump the guess in with something close. It is because you are gathering stats when the data is not representative of reality.
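That "clump the guess in with something close" behavior can be sketched roughly like this (my understanding of a common heuristic for values missing from a frequency histogram - roughly half the count of the least popular value; the exact formula is version-dependent). The numbers mirror the NEW/IGNORE estimates in the demo that follows:

```python
def freq_histogram_estimate(histogram, value):
    # Value present in the histogram: use its observed count.
    if value in histogram:
        return histogram[value]
    # Value missing: a common heuristic is half the count of the
    # least popular value in the histogram (version-dependent).
    return min(histogram.values()) // 2

hist = {'IGNORE': 70025, 'PROCESSED': 10925831}
print(freq_histogram_estimate(hist, 'NEW'))     # 35012 - a made-up guess
print(freq_histogram_estimate(hist, 'IGNORE'))  # 70025 - observed
```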

Consider:


ops$tkyte%ORA11GR2> create table t ( status varchar2(20), data varchar2(20) );

Table created.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> insert into t (status, data)
  2  (
  3  select * from (
  4  with data as (select level l from dual connect by level <= 3316),
  5  big_data as (select rownum r from data, data)
  6  select case when rownum <= 62020 then 'IGNORE' else 'PROCESSED' end, 'xxxxxxx'
  7    from big_data
  8  )
  9  )
 10  /

10995856 rows created.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> exec dbms_stats.gather_table_stats( user, 'T', method_opt => 'for columns status size 254' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> set autotrace traceonly explain
ops$tkyte%ORA11GR2> select * from t where status = 'PROCESSED';

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    10M|   187M|  9650   (2)| 00:01:56 |
|*  1 |  TABLE ACCESS FULL| T    |    10M|   187M|  9650   (2)| 00:01:56 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("STATUS"='PROCESSED')

ops$tkyte%ORA11GR2> select * from t where status = 'PROCESSING';

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    10M|   187M|  9650   (2)| 00:01:56 |
|*  1 |  TABLE ACCESS FULL| T    |    10M|   187M|  9650   (2)| 00:01:56 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("STATUS"='PROCESSING')

ops$tkyte%ORA11GR2> select * from t where status = 'NEW';

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 35012 |   615K|  9641   (2)| 00:01:56 |
|*  1 |  TABLE ACCESS FULL| T    | 35012 |   615K|  9641   (2)| 00:01:56 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("STATUS"='NEW')

ops$tkyte%ORA11GR2> select * from t where status = 'IGNORE';

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 70025 |  1230K|  9641   (2)| 00:01:56 |
|*  1 |  TABLE ACCESS FULL| T    | 70025 |  1230K|  9641   (2)| 00:01:56 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("STATUS"='IGNORE')

ops$tkyte%ORA11GR2> set autotrace off
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> drop table t;

Table dropped.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> create table t ( status varchar2(20), data varchar2(20) );

Table created.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> insert into t (status, data)
  2  (
  3  select * from (
  4  with data as (select level l from dual connect by level <= 3316),
  5  big_data as (select rownum r from data, data)
  6  select case when rownum <= 62020 then 'IGNORE'
  7              when rownum <= 62400 then 'PROCESSING'
  8              when rownum <= 62800 then 'NEW'
  9              else 'PROCESSED'
 10          end, 'xxxxxxx'
 11    from big_data
 12  )
 13  )
 14  /

10995856 rows created.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> exec dbms_stats.gather_table_stats( user, 'T', method_opt => 'for columns status size 254' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> set autotrace traceonly explain
ops$tkyte%ORA11GR2> select * from t where status = 'PROCESSED';

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    10M|   187M|  9580   (2)| 00:01:55 |
|*  1 |  TABLE ACCESS FULL| T    |    10M|   187M|  9580   (2)| 00:01:55 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("STATUS"='PROCESSED')

ops$tkyte%ORA11GR2> select * from t where status = 'PROCESSING';

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  1969 | 35442 |  9571   (2)| 00:01:55 |
|*  1 |  TABLE ACCESS FULL| T    |  1969 | 35442 |  9571   (2)| 00:01:55 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("STATUS"='PROCESSING')

ops$tkyte%ORA11GR2> select * from t where status = 'NEW';

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   985 | 17730 |  9571   (2)| 00:01:55 |
|*  1 |  TABLE ACCESS FULL| T    |   985 | 17730 |  9571   (2)| 00:01:55 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("STATUS"='NEW')

ops$tkyte%ORA11GR2> select * from t where status = 'IGNORE';

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 66952 |  1176K|  9571   (2)| 00:01:55 |
|*  1 |  TABLE ACCESS FULL| T    | 66952 |  1176K|  9571   (2)| 00:01:55 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("STATUS"='IGNORE')

ops$tkyte%ORA11GR2> set autotrace off



Notice in the first pass - it gets them wrong - because there were no observations of the values you went looking for when the stats were gathered.

However, in the second case - when we gathered stats (and you should just use size 254, you know it is skewed, just go for it) on representative data - we get really nice estimates.

Note: I didn't index; I was only interested in the cardinalities.

Why would you gather for "all columns" vs. "all indexed columns"?

Robert, May 09, 2012 - 9:31 am UTC

Tom,

What would be the rationale for gathering histograms for "all columns" vs. "all indexed columns"?

"all INDEXED columns" makes sense... But if a column has skewed data values but is not indexed, Oracle has no way to get to that data, or to know where on disk any particular value is, other than to blindly read data blocks, does it?

So why gather histogram stats on non-indexed columns?

Thanks!

Robert.
Tom Kyte
May 10, 2012 - 7:20 am UTC

select * from t where indexed_column in (select y from t2 where unindexed_column = 'something');


Now, if it is known that the subquery returns 100 records - maybe the query will use a nested loops approach and an index on the indexed_column

If it is known that the subquery returns 1,000,000 records - maybe the query will use a hash join approach.


Estimated cardinalities are crucial in order to get correct plans - and we need statistics on columns used in where clauses - period. It really doesn't matter if the column is indexed or not - estimated cardinalities are vital to getting the correct plan.

Sajan Varughese, May 10, 2012 - 10:35 am UTC

Where are you creating the histogram? I thought you would use dbms_stats to generate histograms. I am not seeing that you are doing it.

Tom Kyte
May 10, 2012 - 3:24 pm UTC

what part of this page are you talking about?

way back when I first wrote this, it was accepted to use ANALYZE (no more) to gather stats.

way back then I did:

analyze table t compute statistics
for table
for all indexes
for all indexed columns;


that gathered table stats, all index stats, and histograms on all indexed columns.

That was back in the day - today you would use

dbms_stats.gather_table_stats( user, 'T', method_opt=>'for all indexed columns', cascade=>true );

of course

Sajan Varughese, May 11, 2012 - 7:19 am UTC

thanks

I totally get it now

Robert, May 11, 2012 - 3:11 pm UTC

Thank you Tom,

Of course...I see it now.
Thank you for the explanation!

compute statistics on all columns size skewonly

Robert, May 11, 2012 - 3:20 pm UTC

Tom,

1. When an INDEX is built/rebuilt (10.2 and up), Oracle automatically *COMPUTES* all statistics for the index, correct? ... in other words, if an index is built/rebuilt, there is no reason to collect stats on that index as a separate step afterwards, correct?

2. Would you say under most circumstances, for the sake of the CBO, if possible, we should do "COMPUTE statistics.... on ALL COLUMNS size SKEWONLY".... or even "... on ALL COLUMNS size 254"?

3. Since gathering stats appears not to block/lock the table, assuming we have the CPU to spare, what is wrong with letting COMPUTE statistics (e.g. size 254) run for a large table, during normal operations, even though it might take 2 weeks to complete? ... is there any reason not to do this?

Thanks,

Robert.
Tom Kyte
May 12, 2012 - 12:37 am UTC

1) correct

2) incorrect. In a reporting system - you should use the default METHOD_OPT - see
http://www.oracle.com/technetwork/issue-archive/2006/06-may/o36asktom-099107.html
"Why Does My Plan Change?"

In a transactional system with bind variables - you want to be a little more judicious in most cases about when and how histograms are generated.

3) Forget CPU - do you want to let it consume that much I/O? In general, it comes at a cost.

In 11g, dbms_stats was basically rewritten under the covers. You'll find that if you go with the defaults - the gathering of statistics will tend to be a compute and will tend to be faster than it used to be in 10g and before (much faster in most cases)

In 10g and before, using an estimate percent was common - in order to get valid statistics in a reasonable amount of time.

Remember - if it took weeks to gather - that means for two weeks you were running on non-representative statistics - and when it finished - you were still using non-representative statistics because one presumes some things might have changed in two weeks :)

all columns vs. all indexed columns

Markus, May 17, 2013 - 2:17 pm UTC

Hi Tom,

A third party vendor instructed (or even bothered) us to gather histograms for indexed columns only (method_opt=>'for all indexed columns size auto') by setting table preferences on all their tables using dbms_stats.set_schema_prefs.

They argue using the default of gathering histograms for all columns (method_opt=>'for all columns size auto') would lead to negative performance issues.

How could this be? How can having more histograms and cardinalities (as of calculating value distribution for all columns) lead to bad performance?

Oracle Version = 11.2 EE

Cheers, Markus

Tom Kyte
May 21, 2013 - 2:29 pm UTC

You gather histograms on columns that have skewed cardinality and that are used to select rows - regardless of whether they are indexed or not.


consider for example:


select * 
  from big_table
 where indexed_column in (select something
                            from skewed_table
                           where unindexed_skewed_column = 42 )


Ok, the plans we could probably consider would be:

a) full scan skewed_table to retrieve something, then use an index on big_table.indexed_column to retrieve the rows.

b) full scan skewed_table and full scan big_table to perform a semi join.


now, which one is better? What if where unindexed_skewed_column = 42 returns a ton of rows? What if where unindexed_skewed_column = 42 returns one row?

what if we don't *know* what where unindexed_skewed_column = 42 might return?



You generate histograms on columns you use in where clauses to select rows and those columns contain data that is skewed in nature.


whether those columns are indexed is not relevant.

all columns vs. all indexed columns

Markus, June 04, 2013 - 10:28 am UTC

yes, skewed columns should have their histograms. but why should i generate histograms on indexed columns only?

it's unlikely that only indexed columns are skewed. it's unlikely that they (the 3rd party vendor) use only indexed columns in their predicates.

it's more likely that they get bad performance because of missing histograms on unindexed but skewed columns.

isn't it?
Tom Kyte
June 04, 2013 - 2:19 pm UTC

but why should i generate
histograms on indexed columns only?


in general, you don't. It (method_opt=>'for all indexed columns') is really good for demos, but in real life it is not very useful. I wish it did not exist.

sometimes your indexed columns are skewed
sometimes they aren't
sometimes your unindexed columns are skewed
sometimes they aren't

sometimes your unindexed columns are in predicates
sometimes they are not.


what you want histograms on at most would be

a) skewed columns
b) referenced in predicates


If you just let method_opt default, you'll get that after the database sees some of your queries.

see "Why Does My Plan Change?"

http://www.oracle.com/technetwork/issue-archive/2006/06-may/o36asktom-099107.html

all columns vs. all indexed columns

Markus, June 06, 2013 - 9:24 am UTC

I found an official explanation from the vendor:

<quote>
The reason for using FOR ALL INDEXED COLUMNS SIZE AUTO as the preferred value,
is that we have extensive experience from many client sides, showing that
statistic/histograms on all columns appear to have caused flawed decisions by
the Oracle optimizer for our app. In these investigated cases, it has been seen
that removing the histograms from all columns and ‘only’ having histograms on
the indexed columns (on the tables involved in the query), changed the
Optimizers decision for the better. So these experiences, together with the
knowledge that by far the majority of queries from our app has predicates on
indexed columns, and that in most cases bind variables are used and that bind
peeking is disabled, have led to this general recommendation.
</quote>

Note: We had to set "_optim_peek_user_binds = false" on their request (Version 11.2.0.3 EE).

Does this sound reasonable for you?
Tom Kyte
June 06, 2013 - 1:53 pm UTC

nope,

if I were running a query:

select * 
  from really_big_table rbt, another_really_big_table arbt
 where rbt.unindex_column = 5
   and arbt.unindex_column = 42
   and rbt.something = arbt.something


would you not like a histogram on those unindexed columns if they were skewed? Otherwise, all you would know is a) the number of rows in the table and b) the number of distinct values of the column.


suppose rbt was 100,000,000 rows and its unindexed column has 1,000 distinct values. So, we'll get about 100,000 rows from that rowsource.

further, suppose arbt has 100,000,000 rows and its unindexed column has 10,000 distinct values. So, we'll get about 10,000 rows from that rowsource.


Now, we'd likely come up with full scan and hash arbt into memory, full scan rbt, hash join them. We'd use arbt as the first table to scan and hash since we assume it would be smaller.


But wouldn't it have been cool to know that rbt.unindex_column = 5 returns just 100 rows? Then we would full scan rbt, and use nested loops to join to arbt.
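The guesses in that example are just rows divided by the number of distinct values again (a sketch; the table names and counts are from the hypothetical example above):

```python
def equality_estimate(num_rows, num_distinct):
    # No histogram: assume uniform data, i.e. rows / distinct values.
    return num_rows // num_distinct

print(equality_estimate(100_000_000, 1_000))   # 100000 rows expected from rbt
print(equality_estimate(100_000_000, 10_000))  # 10000 rows expected from arbt
```

A histogram on rbt's unindexed column is what would let the optimizer see 100 actual rows instead of the uniform 100,000 guess.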




the goal: getting the optimizer to get the right estimated cardinality so we get the right plan.


or what about the new feature in 11g - extended statistics - where you have a query like:

select * from t where (a+b)/2 > 50;

the DBA might well gather stats on (a+b)/2 simply to get the right estimated cardinality - and there isn't any index on (a+b)/2 (in fact, in the case of many extended statistics there are no indexes on the column groups/expressions). Here is a feature for gathering stats on unindexed information in general.


You want histograms on columns that you use in predicates. If you only use indexed columns in predicates - fine, but please realize that you are gathering more histograms than you need (you only need them on skewed columns - those primary keys, what a waste to have a histogram on them, isn't it?)....


some indexed columns need them
some unindexed columns need them

some indexed columns DO NOT need them
some unindexed columns DO NOT need them



all columns vs. all indexed columns

Markus, June 07, 2013 - 1:41 pm UTC

thanks a lot for your opinion.

by the way. does "_optim_peek_user_binds = false" negatively influence or even turn off adaptive cursor sharing?

cheers, markus

Tom Kyte
June 07, 2013 - 1:45 pm UTC

thanks a lot for your opinion.


actually, not an opinion, a verifiable fact.

gathering histograms on a primary key is a waste of resources (fact - the data is never skewed)

not gathering histograms on a skewed, unindexed column can easily lead to the optimizer incorrectly estimating cardinalities - and if you have a bad plan, it is due to bad cardinality estimates 99.99999999999999% of the time (fact)




by the way. does "_optim_peek_user_binds = false" negatively influence or even
turn off adaptive cursor sharing?


It would render it pretty much meaningless, yes. Without bind peeking enabled, the optimizer would always come up with the same plan regardless - there would be nothing to adapt the plan to.

probably just a common way to go

orel, June 07, 2013 - 9:15 pm UTC

Hello Tom, Markus,

I think the point of this software company is just to give some insight to help people deal with their software on Oracle.
I don't think this means you have to follow every recommendation about statistics. For example, Oracle provides an "auto" way to collect statistics, which doesn't mean it is the proper way to go with your data.

I think you are more or less in the same place as Wolfgang Breitling, who used to deal with PeopleSoft software and wrote, a few years ago, some valuable articles about optimizing queries just by using statistics.
http://www.centrexcc.com/

I really think you just have to check with your software company that you can deal with statistics yourself without losing support. And then, hopefully, you will be able to collect what you know to be best for your data.
