
May 27, 2001 - 11pm Central time zone
Reviewer: A reader
What the CBO will do if the bind variable used for the x?
Data Skewness
December 3, 2003 - 5pm Central time zone
Reviewer: A reader from NJ, USA
Tom,
What is the best way to get the data skewness. For example, assume that we have an index on a table
on column COL1. If it uses indexes for certain values and does a full-table scan for other values,
then the approach would be :
1. Check the data skewness. Is that correct?
2. If the column in question is a integer column, how do we do we do that
3. If the column in question is a non-integer column, how do we do that
The reason I am asking this question is that when I generate a histogram for an integer column, I
am able to use the ENDPOINT_NUMBER and ENDPOINT_VALUE to determine the data distribution by finding
the difference in ENDPOINT_NUMBER between the previous and current row (after we sort on
ENDPOINT_NUMBER).
However, for non-numeric columns, only ENDPOINT_NUMBER column shows the numbers. The ENDPOINT_VALUE
shows some cryptic number and the ENDPOINT_ACTUAL_VALUE is NULL.
I request you to provide me a methodology to determine data skewness in general (both for numeric
and non-numeric columns). The text books merely state that we need to check the data skewness, but
does not provide an example or way to determine that. An example from you in this regard will be of
great help
Also, please provide me an example and a clear explanation of the SIZE clause in ANALYZE command
because I am not able to find clear explanation in Oracle documentation or in any performance
tuning books I read so far.
Followup December 3, 2003 - 7pm Central time zone:
select column, count(*) from t group by column;
look at the results, or maybe
select min(cnt), max(cnt), avg(cnt), stddev(cnt)
from ( select column, count(*) cnt from t group by column )
to see if there is skewness -- min far from max indicates skewed.
what is the best way to get skewness
February 20, 2004 - 12pm Central time zone
Reviewer: reader
Tom, in 9i is WIDTH_BUCKET function useful to generate info about skewed data? If so, any example
you have? Thanks.
Followup February 20, 2004 - 1pm Central time zone:
it is a tool you can use to measure "skewedness". for example:
ops$tkyte@ORA920PC> select min(object_id), max(object_id), count(object_id), wb
2 from (
3 select object_id,
4 width_bucket( object_id,
5 (select min(object_id)-1 from all_objects),
6 (select max(object_id)+1 from all_objects), 5 ) wb
7 from all_objects
8 )
9 group by wb
10 /
MIN(OBJECT_ID) MAX(OBJECT_ID) COUNT(OBJECT_ID) WB
-------------- -------------- ---------------- ----------
3 8639 7972 1
8640 17277 8638 2
17278 25915 8619 3
25916 34528 4751 4
35159 43191 238 5
shows that if I bucketize into 5 buckets, most of my data is "low" 1,2,3. object_ids over 26k are
"few and far between"
it is an analysis tool you can use to see what you have.
Can you please explain above ,little detail for my tiny brain
February 21, 2004 - 3am Central time zone
Reviewer: A reader
"
shows that if I bucketize into 5 buckets, most of my data is "low" 1,2,3.
object_ids over 26k are "few and far between"
"
I did not understand what is low 1,2,3. where how to see it is low.
Similarly where/how to see
object_ids over 26k are "few and far between". few must be the count *
Followup February 21, 2004 - 11am Central time zone:
ops$tkyte@ORA920PC> select min(object_id), max(object_id), count(object_id), wb
2 from (
3 select object_id,
4 width_bucket( object_id,
5 (select min(object_id)-1 from all_objects),
6 (select max(object_id)+1 from all_objects), 5 ) wb
7 from all_objects
8 )
9 group by wb
10 /
I took the range:
low object_id-1 .. hi object_id + 1
and broken it into 5 equi-sized buckets (well, I didn't, the database did). In this example, that
was the range
2..43191.
Now those ranges are roughly the min and max object ids in the following report:
MIN(OBJECT_ID) MAX(OBJECT_ID) COUNT(OBJECT_ID) WB
-------------- -------------- ---------------- ----------
3 8639 7972 1
8640 17277 8638 2
17278 25915 8619 3
25916 34528 4751 4
35159 43191 238 5
I say roughly cause I do not have a continous set here -- 34529 for example "has gone missing" --
not in the set.
So, now that we have those five ranges, we assign a bucket to each row in the result set based on
which range they fall into.
As you can see, range 1, 2, 3 have about 8k members each. Most of my data is "low". Most of my
data has an object_id less than 26k.
When you look at buckets 4 and 5 here -- there is only 5k rows between them.
Hence, the data is skewed, most of it resides in the first 3 buckets.
do with that information "what you will" (eg: if those facts are un-interesting to you -- that is
OK. this function is useful to someone analyzing the data -- to them, this would be fascinating
perhaps)
Histogram
February 21, 2004 - 2pm Central time zone
Reviewer: Peter Rhoades from Berkeley California
Good example. However, our DBA runs a script that creates Histograms on EVERY column that's in
EVERY index in EVERY table--whether it makes sense to have the Histogram stats or not. I think
that's crazy but he says it "doesn't hurt." If this is not a new thread, what do you think? Can
too many, needless histograms mess things up?
Followup February 21, 2004 - 2pm Central time zone:
I typically start with the equivalent of "table", "index" and "all indexed columns" myself. If you
have the cycles to compute the histograms, they can be helpful (with bind variable peeking and
such).
Could you skip most/many of them? probably.
Would it change your systems? stats would take a little less time/resources.
I'm not opposed to what they are doing, it does more work then you need in theory. they might
consider "auto" or "skewonly" in 9i.
what about bind variable peeking ?
February 21, 2004 - 3pm Central time zone
Reviewer: Alberto Dell'Era from Milan, Italy
Imagine that a column has skewed values.
As an extreme case, all values are unique except one, say 0, which occurs n times, where n is big
enough to make a fts cheaper than an index scan (we have an index on this column).
What if the app is using bind variables (and so i assume it's an OLTP application) and the app
developers thought "well, we don't compute histograms, so the plan will always be the same, and
will be an index scan. That's ok, since we bind 0 only in very very rare occasions, and we can
afford to pay for a (very) suboptimal index scan in this very very rare occasion."
Now, the aforementioned DBA builds histograms since "they don't hurt", and immediately after 0 is
bound ... voila', full table scans forever (ok, until the death of the shared sql stmt in the
shared pool).
Isn't this an example of "hurting histograms" ?
Wouldn't it be better to avoid collecting them, if the app designers didn't ask for them ?
Alberto
Followup February 21, 2004 - 4pm Central time zone:
the aforementioned DBA would have been building histograms since the beginning of time and you
would have caught this in test/development since you always test/develop on real live sized result
sets (or you spend lots of money on coffee and soda right before/during a rollout -- because you
won't be getting any sleep)
I'll have to assume that what is "standard operating procedure" in production is also in test/dev
-- else, well, you have bigger things to worry about than this.
My point is -- "if this is what they always do, from the get go, I don't see anything inheritly
evil in it"
Sure, anytime you walk into a production system and say "Hey, we are going to make this change" and
just do it -- you stand a risk of "something going wrong".
The app designers in my experience don't even know what a histogram is. They would not know to ask
for that which they do not know about.
understood
February 21, 2004 - 5pm Central time zone
Reviewer: Alberto Dell'Era from Milan, Italy
Thanks for your prompt answer. Well, I assumed the worst scenario since "it doesn't hurt" is
exactly the same phrase that a DBA said during a nightly rollout of mine, before making a change to
the prod db, and the Day After [pun intended] we experienced an incredible loss of performance ...
Got a Pavlovian Reflex to that phrase.
thanks
Alberto
PS What about automatically removing trailing empty lines from reviews/questions posts, like mine
before ? I normally pay attention and get rid of them myself but sometimes I overlook them ...
why it's not good to generate histograms always
February 21, 2004 - 9pm Central time zone
Reviewer: A reader
Hi
Recently we generated histograms for our application and most queries was deadly slow, when we got
rid of histograms performance went back again except some queries.
My question is why queries run slow with histograms? Arent we giving more information to the
optimizer in order to get a better plan? It seems the other way round, the more information you
give the worse CBO performs!
Followup February 22, 2004 - 9am Central time zone:
it depends -- and it really depends on your system characteristics.
Many times people have OLTP systems that have "open ended queries" that they fetch the first 10,
100 or so rows from and never get to the bottom. These (unless hinted as "first rows" or run on a
system where thought was given to the optimizer_index_* init.ora parameters) can perform poorly in
BOTH cases "apparently" (apparently perform poorly, but not really since the optimizer will
optimize for throughput in such a system -- not initial response time. give it more info and it'll
come up with plans that better optimize throughput response times at the expense of initial
responses times. tell it first rows and it'll change that.)
But the bottom line here was -- you introduced "a rather large fundemental change into the system"
and would have to expect some ramifications.
Do this -- develop a system from day one with histograms generated.
Then, take them away
And then report back "what happened". You will find pretty much any major league "change in the
way we do business" is going to have such an effect. For example -- in the above thread with
regards to the DBA that always gets table/index/column stats -- if you want to kill that system,
stop gathering histograms and see what happens.
you said
February 22, 2004 - 4am Central time zone
Reviewer: A reader
hi
You said that DBA should always generate histograms deelopment to catch bind variable peeking
effects.
I always thought we should not generate histograms always and identify those columns with skewed
data with analysist?!?!
As someone else already said, generating histograms always is deadly, I got burnt as well
Followup February 22, 2004 - 9am Central time zone:
no, no I did not.
where did I say "dba should always generate histograms"
generating histograms "as a new operating procedure after the code is in place" is what is known as
a "bad idea (tm)"
NOT generating histograms "as a new operating proceudre after the code is in place" is what is
knowns as a "bad idea(tm)"
what I said was "do what you do, but do what you do consistently. don't use one method in test/dev
and another DIFFERENT method. You most likely got burnt either
o testing in a database with little or no data
o testing in a database with stats gathered one way and a different way in prod
else, you could not "get burned" as you would have discovered this stuff right from the get go in
test/dev.
What I propose is "consistency".
histogram's effect on performance
March 2, 2004 - 10pm Central time zone
Reviewer: Ryan Gaffuri from Tysons Corner, VA
there is a debate on oracle-l about excessive use of histogram's negatively effecting performance.
Carrie Milsap posted the following:
"I don't have access to the data to back this up, but I have seen PARSE call response time move
from over a minute to less than a second by eliminating histograms from the equation. I think when
I last saw this, it was an Oracle8 system."
Do you have any comments on this? How often do you recommend gathering performance? I have been
using 'FOR ALL COLUMNS' by default recently with no adverse affects on performance.
Followup March 3, 2004 - 9am Central time zone:
and it could go the other way as well. histograms are just *data*. in the above, i would have to
guess that histograms caused the optimizer to consider "more stuff", more join paths perhaps (but
it is hard to say, it is really rare to have parse times of a minute).
the big thing with histograms many times is the amount of resources they take to compute in the
first place. I start with "none" or "for all indexed columns" -- the key thing is to be consistent
(in dev, test, and prod). In order to avoid surprises.
Why HISTOGRAM?
March 23, 2004 - 8pm Central time zone
Reviewer: Jayesh from India
Can you please explain why should we go for histogram? What is real purpose of using histograms?
What will be high and low value when using a varchar column?
stored procedure and histograms
July 11, 2004 - 10am Central time zone
Reviewer: Nikhil G Mishra from Hyderabad, India
Tom,
All the variables in queries within stored procedures, are bind variables.
Does that mean , histograms cant help in cases we have a poorly performing query in the stored
procedure involving highly skewed values ? Because with bind variables, the CBO will generate a
"fit all" explain plan based on its best guess estimates ?
nikhil
Followup July 11, 2004 - 10am Central time zone:
it'll use bind variable peeking in 9i -- to determine the best fit plan.
describe to me the nature of a stored procedure that should sometimes full scan and sometimes not?
frequency of run? who runs it? how does that fit in to the system (if interactive, it would give
wildly different response times -- end users would deem it "unreliable").
height and width based
July 12, 2004 - 8am Central time zone
Reviewer: pooja from India
hi tom,
this is very useful and now im able to do histograms
but i have a little bit confusion about height and width based histograms. please explain me with
a simple example.
In CBO, the statistics are calculated based on the predicate in the SQL statement and on the IO,
MEMORY and CPU reads..but how it will calculate statistics...
please expalin this also
Followup July 12, 2004 - 11am Central time zone:
I'll recommend to you the performance guide, a freely available document part of the Oracle
documentation set.....
Variable response times
July 12, 2004 - 7pm Central time zone
Reviewer: Matt from Australia
Above at:
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:707586567563#20833867221653
You say:
"describe to me the nature of a stored procedure that should sometimes full scan
and sometimes not? frequency of run? who runs it? how does that fit in to the
system (if interactive, it would give wildly different response times -- end
users would deem it "unreliable")."
We have a PL/SQL batch process which generates a report on data owned by particular users of the
system. This data is highly skewed. 99.5% of the data is owned by one user the remaining .5% is
split between 5 other users.
For some reason the whole report always worked fine when accessing generating data for the .5%,
however, the 99.5% was extremely slow.
This report is run rarely (it was a report generated at the end of a data migration process to
allow user to verify their data). It was manually run. However, I can see an upcoming requirement
to run this report more often - and it's performance is "unreliable".
What (general - I'll post the detail another time when you are accepting questions) options for
tuning this report were available to me?
In the end I 10046 traced the report and noted lots of index range scans and nested loop joins. I
verified the stats (and histograms) - (dbms_stats "for all columns size auto" "cascade = true").
However, the optimizer failed to pick a full scan. Due to time pressure I've slotted lots of
"full(t)" hints in the query and improved the reports execution time from 4+ hours to 15 minutes -
for all cases of data.
Obviously you would need more information to comment, but what other approaches are available to
get the optimizer to do what it is supposed to?....yes I will post this when I see the accepting
questions button.
Best Regards,
Followup July 12, 2004 - 11pm Central time zone:
a report that is run rarely against known skewed data can certainly use "no binds".
dynamic sql via ref cursors might be the answer.
Thanks
July 13, 2004 - 3am Central time zone
Reviewer: Matt from Australia
The report uses dynamic sql and ref cursors. Currently the skewed data columns are not literals,
though.
I will give this approach a whirl and see whether I get a good a run time.
How intellinget is CBO now
July 28, 2004 - 9pm Central time zone
Reviewer: A reader
Hi Tom, one hard question,
I had been trying to test it,
but I didn't get it, so I ask you.
Now, the CBO, ONLY use histogram info, when he have not binded variables like ='ADM'
or there is some exception that allows you for example i don't know in the soft parse to take
advantage of histogram info.
Thanks Tom
Followup July 29, 2004 - 7am Central time zone:
that is not true, there is bind variable peeking going on at hard parse time too (search for
peeking on this site to read more)
during the soft parse, the plan is in place -- literal or not - the histograms would not be
consulted.

July 29, 2004 - 10am Central time zone
Reviewer: A reader
Thanks Tom
From
OracleWorld Copenhagen, Database Track
BIND VARIABLES AND CURSOR SHARING – NEW DIRECTIONS IN
ORACLE9I
Bjørn Engsig, Miracle A/S
"BIND VARIABLE PEEKING IN ORACLE9I In Oracle9i, the optimizer will peek at actual values of bind
variables if these are available. By doing this, the optimizer can use generated table/index- or
column-level statistics when bind variables are used. This will benefit both applications that
themselves are written to use bind variables, and applications that have had literals replaced by
bind variables using the cursor_sharing parameter. The tests that we have conducted show that the
bind variable peeking works as expected: the optimizer will make its decision based on actual
values of bind variables. However, once an execution plan is generated by the optimizer, no further
peeking into bind variables will take place; neither from the same session or from a different
session parsing the same SQL statement. In both cases, the same SQL statement and associated
execution plan is found in the library cache."
Followup July 29, 2004 - 1pm Central time zone:
that is correct -- that is in line with what I said. during hard parse, they would be used, during
soft parse -- not.
How to get literal from endpoint_value
July 30, 2004 - 2pm Central time zone
Reviewer: A reader
Hi Tom, please
1) Could you please tell me how to get the values from dba_histograms view,
for example instead of 4.12433214 to get 'NY'
something more friendly to analyze (why ENDPOINT_ACTUAL_VALUE column is not always filled)
2) Do you agree with oracle documentation when it says to use size auto
I think is something like this
DBMS_STATS.GATHER_SCHEMA_STATS( OWNNAME=>'ADM', ESTIMATE_PERCENT=>100,METHOD_OPT=>'FOR ALL COLUMNS
SIZE AUTO ');
3) Which is the real different between
DBMS_STATS.GATHER_SCHEMA_STATS( OWNNAME=>'ADM', ESTIMATE_PERCENT=>100,METHOD_OPT=>'FOR ALL COLUMNS
SIZE AUTO ');
and
DBMS_STATS.GATHER_SCHEMA_STATS( OWNNAME=>'ADM', ESTIMATE_PERCENT=>100,METHOD_OPT=>'FOR ALL COLUMNS
SIZE SKEWONLY ');
4) How I know the column is heigh or value based histogram
Thanks Tom
Followup July 30, 2004 - 6pm Central time zone:
1) you'll want to search for:
endpoint_value endpoint_actual_value
on this site, but:
ops$tkyte@ORA9IR2> create table t ( x varchar2(26) );
Table created.
ops$tkyte@ORA9IR2> insert into t values ( 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' );
1 row created.
ops$tkyte@ORA9IR2> insert into t values ( 'XXXX' );
1 row created.
ops$tkyte@ORA9IR2> insert into t values ( 'YYYY' );
1 row created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create or replace function hexstr( p_number in number ) return varchar2
2 as
3 l_str long := to_char(p_number,'fm'||rpad('x',50,'x'));
4 l_return varchar2(4000);
5 begin
6 while ( l_str is not null )
7 loop
8 l_return := l_return || chr(to_number(substr(l_str,1,2),'xx'));
9 l_str := substr( l_str, 3 );
10 end loop;
11 return l_return;
12 end;
13 /
Function created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> analyze table t compute statistics for all columns;
Table analyzed.
ops$tkyte@ORA9IR2> column column_name format a10
ops$tkyte@ORA9IR2> column ENDPOINT_ACTUAL_VALUE format a20
ops$tkyte@ORA9IR2> select endpoint_number, endpoint_value, endpoint_actual_value,
2 to_char(endpoint_value,rpad('x',32,'x') ),
3 substr( hexstr( endpoint_value ), 1, 30 ) ev
4 from dba_tab_histograms where table_name = 'T' and owner = user;
ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALU
--------------- -------------- --------------------
TO_CHAR(ENDPOINT_VALUE,RPAD('X',3 EV
--------------------------------- ------------------------------
0 3.3884E+35
41424344454657fd6de9c0ccc00000 ABCDEFW�m���
1 4.6393E+35
59595958ffffefab8e7525d2400000 YYYX��龜u%�@
it is not as simple as "a low bound"....
2) I'm thinking you don't need histograms too often and given how expensive they are to compute --
for all columns is hugely expensive, so I might be looking at doing it just for the columns deemed
necessary.
3) search for skewonly, we've written about this before.
4) depends on the cardinality -- searching for the terms in #1 will lead you to some discussions on
that.

July 30, 2004 - 6pm Central time zone
Reviewer: A reader
Thanks Tom
endpoint_actual_value
August 17, 2004 - 10pm Central time zone
Reviewer: A reader
I gathered histograms on a table with
method_opt=>'for all columns size 254'.
When I query *_tab_histograms for a specific column in that table, I get my 254 rows, but the
ENDPOINT_ACTUAL_VALUE for all of them is null.
Why is this? This is throwing off the cardinality estimates for queries involving where column like
'blah%'
Thanks
endpoint_actual_value
August 17, 2004 - 11pm Central time zone
Reviewer: A reader
OK the endpoint_value (the number in scientific notation) is probably the hex representation of my
string or something.
anyway, i am trying to understand the following
select * from table
where col like 'A%' or col like 'B%'
col has a UNIQUE index, it has histograms using 254 buckets, the no. of distinct values of col is
20000
The above query works fine, the cardinality estimate for each of the CONCATENATION steps is close
to reality.
But when I add another OR predicate like
select * from table
where col like 'A%' or col like 'B%' or col like 'C%'
the query tanks, the plan shows a CONCATENATION of 3 INDEX RANGE SCANS, each with a cardinality of
ONE! The total card for the query is 12 but the query returns 2000 rows.
I tried to create a simple, concise test case for you
create table t (i varchar2(6) unique);
insert into t select dbms_random.string('x',6) from all_objects where rownum<=20000;
exec dbms_stats.gather_table_stats(null,'t',method_opt=>'for all columns size 254')
select * from t
where i like 'A%' or i like 'B%' or i like 'C%'
I guess since the table is so skinny, it chooses a FTS for this instead of the 3 individual range
scans and a concatenation, so I am unable to reproduce what I am seeing in my actual table as I
mention above.
Help? Thanks
Followup August 18, 2004 - 7am Central time zone:
when you have more values than buckets (254 in your case), you have to expect "guestimation" to
kick in -- you cannot represent the actual cardinality in the data dictionary. you get ranges and
within each range -- you get estimates.
throw in a "like" or two and you really guess. (your C values could be crossing many "buckets"
there)
but why would 2000 rows out of 20,000 via an index be necessarily "a bad thing (tm)"?
and are you really sure you want histograms on every column? especially columns with lots and lots
of distinct values like this one?
Histograms
August 18, 2004 - 10am Central time zone
Reviewer: A reader
<quote>
but why would 2000 rows out of 20,000 via an index be necessarily "a bad thing (tm)"?
<quote>
Its not bad, the query performs fine
select acctno,blah,blah from <view>
where acctno like 'A%' or acctno like 'B%' or acctno like 'C%'
performs great. Returns 2000 rows from possible 20000. Fine.
As soon as I change this to
select acctno,blah,blah from <view>
where acctno like 'A%' or acctno like 'B%' or acctno like 'C%'
order by acctno
it tanks (I posted this in another review recently also, sorry, maybe we can continue this
discussion here)
"and are you really sure you want histograms on every column? especially columns with lots and
lots of distinct values like this one?"
OK I just deleted stats on the 3 tables involved in the view and re-gathered them using dbms_stats
'for all indexed columns' and now the query is great even with the order by!
So, seems like the histograms on the UK (acctno) were actually hurting the performance!
Questions
1. I didnt know that having histograms can actually be detrimental to queries? Does this make
sense? I thought that , if anything, they can only help?
2. I noticed that the overall cardinality of the query with the good performance is 10, but the
actual rows returned is 2000.
The performance is good, so I dont really care, but if the CBO estimated card is so much off, is
there a problem lurking out there that might adversely affect other queries using these tables?
Thanks
Followup August 18, 2004 - 10am Central time zone:
need to *see the plans*
for histograms it depends on the number of values and buckets. when you have 254 buckets but 2000
values -- well, you don't have a bucket/value so it can be mislead at that point.
Histograms
August 18, 2004 - 11am Central time zone
Reviewer: A reader
Darn it, now no matter what I do, I cannot reproduce the "bad plan" that was resulting in the poor
performance with the order by.
I deleted all stats on the 3 tables using
dbms_stats.delete_table_stats(null,'table_name')
re-gathered them with the 254-bucket histograms and everything is still good!
This is what frustrates me sometimes about the CBO, everything is too magical and automatic, it is
hard to reproduce stuff.
Oh well, I will keep an eye on it and see if the problem reoccurs.
Thanks
Automagical parameters
October 29, 2004 - 9am Central time zone
Reviewer: Ronald from Fortress Hill, Hong Kong
Problem with CBO and gather_stats:
sql> @desc tshi
Elapsed: 00:00:00.00
Datatypes for Table tshi
Data Data
Column Name Type Length Nullable Dafault COMMENTS
-------------------------- -------------- ---------- -------- -------- ---------
SHI_NUM NUMBER (4,0) not null
POL_NUM VARCHAR2 10 not null
REASN_CODE VARCHAR2 3 not null
ACCT_MNE_CD VARCHAR2 8 not null
SUSP_DT DATE 7 not null
TRXN_CD VARCHAR2 8 not null
SUSP_AMT NUMBER (13,2) not null
CRCY_CODE VARCHAR2 2 not null
TRXN_ID VARCHAR2 15 null
UNDO_TRXN_ID VARCHAR2 15 null
SUSP_REASN VARCHAR2 3 null
POL_STAT_CD VARCHAR2 1 null
FREZ_CODE VARCHAR2 1 null
PD_TO_DT DATE 7 null
MODE_PREM NUMBER (11,2) null
PMT_MODE VARCHAR2 5 null
Elapsed: 00:00:00.01
Indexes on tshi
Index Is
Name Unique columns
------------------------------------------------------------ ------
--------------------------------
SHI_PK Yes POL_NUM, SHI_NUM, SUSP_DT
SHI_IDX1 No TRXN_ID
I_SHI_POL_FK No POL_NUM
Elapsed: 00:00:00.04
sql> select min(TRXN_ID), max(TRXN_ID), count(TRXN_ID), wb
2 from ( select TRXN_ID,
3 width_bucket( to_number(TRXN_ID), (select to_number(min(TRXN_ID)) from TSHI),
4 (select to_number(max(TRXN_ID)) from TSHI), 10) wb
5 from TSHI) group by wb ;
MIN(TRXN_ID) MAX(TRXN_ID) COUNT(TRXN_ID) WB
--------------- --------------- -------------- ----------
001304030000001 020405190002259 26300 1
041304030000001 060405190002259 26300 3
061304030000001 080405190002259 26300 4
081304030000001 098405190002259 23670 5
200406011220485 200406011220486 2 11
021304030000001 040405190002259 26300 2
200206040000008 200406011220484 4623660 10
0
8 rows selected.
Elapsed: 00:00:30.04
cas@chkp.world> select count(*) from tshi ;
COUNT(*)
----------
5290940
1 row selected.
SQL and autotrace plans:
sql> select *
from tshi
where undo_trxn_id is null
and trxn_id >= '200304150000003'
and pol_num = '2870000013'
order by trxn_id, shi_num ;
<<run forever>>
Plan 1:
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=1 Bytes=100)
1 0 SORT (ORDER BY) (Cost=7 Card=1 Bytes=100)
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'TSHI' (Cost=5 Card=1 Bytes=100)
3 2 INDEX (RANGE SCAN) OF 'SHI_IDX1' (NON-UNIQUE) (Cost=3 Card=1)
Plan 2:
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=8 Card=1 Bytes=100)
1 0 SORT (ORDER BY) (Cost=8 Card=1 Bytes=100)
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'TSHI' (Cost=6 Card=1 Bytes=100)
3 2 INDEX (RANGE SCAN) OF 'SHI_PK' (UNIQUE) (Cost=3 Card=22)
Plan 3:
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=1 Bytes=75)
1 0 SORT (ORDER BY) (Cost=7 Card=1 Bytes=75)
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'TSHI' (Cost=5 Card=1 Bytes=75)
3 2 INDEX (RANGE SCAN) OF 'I_SHI_POL_FK' (NON-UNIQUE) (Cost=3 Card=16)
SHI_NUM POL_NUM REA ACCT_MNE SUSP_DT TRXN_CD SUSP_AMT CR TRXN_ID
---------- ---------- --- -------- ------------------ -------- ---------- -- ---------------
UNDO_TRXN_ID SUS P F PD_TO_DT MODE_PREM PMT_M
--------------- --- - - ------------------ ---------- -----
880 2870000013 301 SMMKKY 15-Apr-03 12:00 XOOREZ 1500 04 200
304150000003
6 0 04-Apr-03 12:00 750 01
881 2870000013 108 SMMKKY 15-Apr-03 12:00 XOOREZ -1500 04 200
304150000004
6 0 04-Apr-03 12:00 750 01
2 rows selected.
Elapsed: 00:00:00.46
Test results:
(tshi /w) (autotrace plan)
1) no stats ----> Plan 3
2) analyze + estimate ----> Plan 3
3) analyze + compute ----> Plan 1
4) dbms_stats + estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE + method_opt => 'FOR ALL INDEXED
COLUMNS SIZE AUTO' ----> Plan 1
5) dbms_stats + method_opt => 'FOR ALL COLUMNS SIZE 5' ----> Plan 2
Question:
1) Why the CBO tends to choose the bad index? (since from the predicate itself should a equal
condition more selective than that of a range condition??? (very confusing to me.))
2) What is the correct (->less risk) bucket size? seemed the automagical values is not doing good,
should it be just 254 all the way for the sake of simplicity of it?
3) do you actually have a post here on how to search for answers, yes=there are almost all the
answers here already, but the words are simply too common to locate what I need, want to know where
magical keywords like show_space, skewonly from?
Followup October 29, 2004 - 10am Central time zone:
can we see a tkprof . with so few rows "runs forever" doesn't compute to me
how many buckets did "auto" actually come up with.
do you use binds in the "real world"
Re: Automagical parameters
October 29, 2004 - 2pm Central time zone
Reviewer: Ronald from Fortress Hill, Hong Kong
the "run forever" is just it take so longer I never bother to wait till finish.
the "auto" produced 2 buckets
yes we did, but it makes the story longer,
with bind variable
the CBO seemed always always pick SHI_PK
why is it so???
and if c/200304150000003/100304150000003/
it magically picks I_SHI_POL_FK for different stats analyzed
I am very much wanted to know why???
does this has anything to do with peeking? (or halloween)
Re: Re
November 1, 2004 - 12pm Central time zone
Reviewer: Ronald Chan from Fortress Hill, Hong Kong
The "auto" always produced 2 buckets
how to parameter gather_stats in order to safely help the optimizer make decisions.
Histograms
November 2, 2004 - 4am Central time zone
Reviewer: Ronald from Fortress Hill, Hong Kong
Trace output for when the query use wrong index and run for 1:15 hour
ora_3376.trc:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production
ORACLE_HOME = /test/oracle/product/9.2.0.4
System name: HP-UX
Node name: hkgxd09
Release: B.11.00
Version: U
Machine: 9000/800
Instance name: tun8
Redo thread mounted by this instance: 1
Oracle process number: 11
Unix process pid: 3376, image: oracle@hkgxd09 (TNS V1-V3)
*** 2004-11-02 11:14:58.848
*** SESSION ID:(17.1593) 2004-11-02 11:14:58.848
QUERY
SELECT * FROM TSHI
WHERE POL_NUM = '2870000013'
AND TRXN_ID >= '200304150000003'
AND UNDO_TRXN_ID IS NULL
ORDER BY TRXN_ID, SHI_NUM
***************************************
PARAMETERS USED BY THE OPTIMIZER
********************************
OPTIMIZER_FEATURES_ENABLE = 9.2.0
OPTIMIZER_MODE/GOAL = Choose
_OPTIMIZER_PERCENT_PARALLEL = 101
HASH_AREA_SIZE = 131072
HASH_JOIN_ENABLED = TRUE
HASH_MULTIBLOCK_IO_COUNT = 0
SORT_AREA_SIZE = 65536
OPTIMIZER_SEARCH_LIMIT = 5
PARTITION_VIEW_ENABLED = FALSE
_ALWAYS_STAR_TRANSFORMATION = FALSE
_B_TREE_BITMAP_PLANS = TRUE
STAR_TRANSFORMATION_ENABLED = TRUE
_COMPLEX_VIEW_MERGING = TRUE
_PUSH_JOIN_PREDICATE = TRUE
PARALLEL_BROADCAST_ENABLED = TRUE
OPTIMIZER_MAX_PERMUTATIONS = 2000
OPTIMIZER_INDEX_CACHING = 0
_SYSTEM_INDEX_CACHING = 0
OPTIMIZER_INDEX_COST_ADJ = 100
OPTIMIZER_DYNAMIC_SAMPLING = 1
_OPTIMIZER_DYN_SMP_BLKS = 32
QUERY_REWRITE_ENABLED = FALSE
QUERY_REWRITE_INTEGRITY = ENFORCED
_INDEX_JOIN_ENABLED = TRUE
_SORT_ELIMINATION_COST_RATIO = 0
_OR_EXPAND_NVL_PREDICATE = TRUE
_NEW_INITIAL_JOIN_ORDERS = TRUE
ALWAYS_ANTI_JOIN = CHOOSE
ALWAYS_SEMI_JOIN = CHOOSE
_OPTIMIZER_MODE_FORCE = TRUE
_OPTIMIZER_UNDO_CHANGES = FALSE
_UNNEST_SUBQUERY = TRUE
_PUSH_JOIN_UNION_VIEW = TRUE
_FAST_FULL_SCAN_ENABLED = TRUE
_OPTIM_ENHANCE_NNULL_DETECTION = TRUE
_ORDERED_NESTED_LOOP = TRUE
_NESTED_LOOP_FUDGE = 100
_NO_OR_EXPANSION = FALSE
_QUERY_COST_REWRITE = TRUE
QUERY_REWRITE_EXPRESSION = TRUE
_IMPROVED_ROW_LENGTH_ENABLED = TRUE
_USE_NOSEGMENT_INDEXES = FALSE
_ENABLE_TYPE_DEP_SELECTIVITY = TRUE
_IMPROVED_OUTERJOIN_CARD = TRUE
_OPTIMIZER_ADJUST_FOR_NULLS = TRUE
_OPTIMIZER_CHOOSE_PERMUTATION = 0
_USE_COLUMN_STATS_FOR_FUNCTION = TRUE
_SUBQUERY_PRUNING_ENABLED = TRUE
_SUBQUERY_PRUNING_REDUCTION_FACTOR = 50
_SUBQUERY_PRUNING_COST_FACTOR = 20
_LIKE_WITH_BIND_AS_EQUALITY = FALSE
_TABLE_SCAN_COST_PLUS_ONE = TRUE
_SORTMERGE_INEQUALITY_JOIN_OFF = FALSE
_DEFAULT_NON_EQUALITY_SEL_CHECK = TRUE
_ONESIDE_COLSTAT_FOR_EQUIJOINS = TRUE
_OPTIMIZER_COST_MODEL = CHOOSE
_GSETS_ALWAYS_USE_TEMPTABLES = FALSE
DB_FILE_MULTIBLOCK_READ_COUNT = 32
_NEW_SORT_COST_ESTIMATE = TRUE
_GS_ANTI_SEMI_JOIN_ALLOWED = TRUE
_CPU_TO_IO = 0
_PRED_MOVE_AROUND = TRUE
***************************************
BASE STATISTICAL INFORMATION
***********************
Table stats Table: TSHI Alias: TSHI
TOTAL :: CDN: 5301581 NBLKS: 77544 AVG_ROW_LEN: 88
-- Index stats
INDEX NAME: I_SHI_POL_FK COL#: 2
TOTAL :: LVLS: 2 #LB: 18481 #DK: 324526 LB/K: 1 DB/K: 1 CLUF: 357471
INDEX NAME: SHI_IDX1 COL#: 9
TOTAL :: LVLS: 2 #LB: 17940 #DK: 4595967 LB/K: 1 DB/K: 1 CLUF: 4400837
INDEX NAME: SHI_PK COL#: 2 1 5
TOTAL :: LVLS: 2 #LB: 27726 #DK: 5301581 LB/K: 1 DB/K: 1 CLUF: 405855
_OPTIMIZER_PERCENT_PARALLEL = 100
***************************************
SINGLE TABLE ACCESS PATH
Column: POL_NUM Col#: 2 Table: TSHI Alias: TSHI
NDV: 324526 NULLS: 0 DENS: 3.0814e-06
NO HISTOGRAM: KT: 1 #VAL: 2
Column: UNDO_TRXN_ Col#: 10 Table: TSHI Alias: TSHI
NDV: 2916 NULLS: 5287899 DENS: 3.4294e-04
NO HISTOGRAM: KT: 1 #VAL: 2
Column: TRXN_ID Col#: 9 Table: TSHI Alias: TSHI
NDV: 4595967 NULLS: 549056 DENS: 2.1758e-07
NO HISTOGRAM: KT: 1 #VAL: 2
TABLE: TSUSPENSE_HISTORIES ORIG CDN: 5301581 ROUNDED CDN: 1 CMPTD CDN: 0
Access path: tsc Resc: 7459 Resp: 1865
Access path: index (equal)
Index: I_SHI_POL_FK
TABLE: TSHI
RSC_CPU: 0 RSC_IO: 5
IX_SEL: 0.0000e+00 TB_SEL: 3.0814e-06
Access path: index (scan)
Index: SHI_IDX1
TABLE: TSHI
RSC_CPU: 0 RSC_IO: 4
IX_SEL: 2.2177e-07 TB_SEL: 2.2177e-07
Skip scan: ss-sel 0 andv 9989
ss cost 9989
index io scan cost 0
Access path: index (scan)
Index: SHI_PK
TABLE: TSHI
RSC_CPU: 0 RSC_IO: 5
IX_SEL: 3.0814e-06 TB_SEL: 3.0814e-06
Access path: index (equal)
Index: I_SHI_POL_FK
TABLE: TSHI
RSC_CPU: 0 RSC_IO: 3
IX_SEL: 3.0814e-06 TB_SEL: 3.0814e-06
Access path: index (index-only)
Index: SHI_IDX1
TABLE: TSHI
RSC_CPU: 0 RSC_IO: 3
IX_SEL: 2.2177e-07 TB_SEL: 2.2177e-07
SORT resource Sort statistics
Sort width: 3 Area size: 43008 Max Area size: 43008 Degree: 1
Blocks to Sort: 1 Row size: 21 Rows: 2
Initial runs: 1 Merge passes: 1 IO Cost / pass: 2
Total IO sort cost: 2
Total CPU sort cost: 0
Total Temp space used: 0
BEST_CST: 4.00 PATH: 4 Degree: 1
***************************************
OPTIMIZER STATISTICS AND COMPUTATIONS
***************************************
GENERAL PLANS
***********************
Join order[1]: TSHI [TSHI]
ORDER BY sort
SORT resource Sort statistics
Sort width: 3 Area size: 43008 Max Area size: 43008 Degree: 1
Blocks to Sort: 1 Row size: 93 Rows: 1
Initial runs: 1 Merge passes: 1 IO Cost / pass: 2
Total IO sort cost: 2
Total CPU sort cost: 0
Total Temp space used: 0
Best so far: TABLE#: 0 CST: 6 CDN: 1 BYTES: 75
****** Recost for parallel table scan *******
***************************************
SINGLE TABLE ACCESS PATH
TABLE: TSHI ORIG CDN: 5301581 ROUNDED CDN: 1 CMPTD CDN: 0
Access path: tsc Resc: 7459 Resp: 1865
BEST_CST: 1865.00 PATH: 2 Degree: 4
***********************
Join order[1]: TSHI [TSHI]
ORDER BY sort
SORT resource Sort statistics
Sort width: 3 Area size: 43008 Max Area size: 43008 Degree: 1
Blocks to Sort: 1 Row size: 93 Rows: 1
Initial runs: 1 Merge passes: 1 IO Cost / pass: 2
Total IO sort cost: 2
Total CPU sort cost: 0
Total Temp space used: 0
SORT response Sort statistics
Sort width: 3 Area size: 43008 Max Area size: 43008 Degree: 4
Blocks to Sort: 1 Row size: 93 Rows: 1
Initial runs: 1 Merge passes: 1 IO Cost / pass: 2
Total IO sort cost: 2
Total CPU sort cost: 0
SORT resource Sort statistics
Sort width: 3 Area size: 43008 Max Area size: 43008 Degree: 1
Blocks to Sort: 1 Row size: 93 Rows: 1
Initial runs: 1 Merge passes: 1 IO Cost / pass: 2
Total IO sort cost: 2
Total CPU sort cost: 0
Total Temp space used: 0
Final:
CST: 6 CDN: 1 RSC: 6 RSP: 6 BYTES: 75
IO-RSC: 6 IO-RSP: 6 CPU-RSC: 0 CPU-RSP: 0
2) "auto" created 2 buckets
3) yes we use bind in "real world"
but just wanted to know how to create "reliable" statistics for this case.
Which Tables to have histogram?
February 10, 2005 - 6am Central time zone
Reviewer: A reader from MD, USA
Tom,
I have created histograms on some big tables(10 to 40 million rows) and benefited. I do not want to
create histograms on all tables. Is there a way I can find which tables will be best candidates for
creating histograms? Is there a way to quantify how much I will get benefited, performacnewise -
apart from doing it in stress/test environment?
Thanks,
Followup February 11, 2005 - 2am Central time zone:
do you have some knowledge of your data?
if so, you might know the candidate columns (which ones are skewed) and whether you place
predicates on them at all.
if you do not, you can query:
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:30966075177614
and see if a column you predicate on frequently is "skewed" and may benefit from this.
Helpful
February 13, 2005 - 1pm Central time zone
Reviewer: A reader from MD, USA
Tom,
Your reply & the link was very helpful. I can decide about the bucket width easily with this.
However, AUTO or SKEWONLY - I have found to address this effectively. Any comments?
Also, I am concerned with deciding what tables to collect histograms on, rather than the other
details once we decide what table. Histograms take long time to collect, even on big servers -
atleast for big tables. And since all my tables are MONITORED, I do not know when actually the
stats willbe gathered - it could be right in the core business hour of my DSS. So, I prefer to be
very selective in collecting histograms. Any input, guidelines?
Thanks,
Followup February 13, 2005 - 4pm Central time zone:
this is simple a query for you to visualize the data, nothing more, nothing less.
monitoring does NOT mean you do not "know" when... you still call dbms_stats.gather_.... it'll
only happen during core business hours if YOU make it so (so don't do that)
auto and skewonly do what they do -- dbms_stats peeks at the data sort of like I did and figures
out if it wants to.
How to delete histogram stats?
April 29, 2005 - 6pm Central time zone
Reviewer: Jim
How do I delete histogram stats related to a particular column from data dictionary? Thanks.
Followup April 29, 2005 - 6pm Central time zone:
ops$tkyte@ORA10GR1> create table t as select * from all_users;
Table created.
ops$tkyte@ORA10GR1>
ops$tkyte@ORA10GR1> exec dbms_stats.gather_table_stats( user, 'T', method_opt => 'for all columns
size 254' );
PL/SQL procedure successfully completed.
ops$tkyte@ORA10GR1>
ops$tkyte@ORA10GR1> select column_name, count(*) from user_tab_histograms where table_name = 'T'
group by column_name;
COLUMN_NAME COUNT(*)
------------------------------ ----------
CREATED 22
USERNAME 32
USER_ID 32
ops$tkyte@ORA10GR1>
ops$tkyte@ORA10GR1> exec dbms_stats.delete_column_stats( user, 'T', 'USERNAME' );
PL/SQL procedure successfully completed.
ops$tkyte@ORA10GR1> select column_name, count(*) from user_tab_histograms where table_name = 'T'
group by column_name;
COLUMN_NAME COUNT(*)
------------------------------ ----------
CREATED 22
USER_ID 32
Not getting the same output
November 5, 2005 - 10am Central time zone
Reviewer: Jagjeet Singh from Delhi, India
Hi,
I am not getting the same result.
Can you pls. look into it.
SQL> drop table t;
Table dropped.
SQL> create table t as select * from dba_objects;
Table created.
SQL> update t set object_type = 'TABLE';
5795 rows updated.
SQL> update t set object_type = 'INDEX' where rownum = 1;
1 row updated.
SQL> create index ind_t on t (object_Type) ;
Index created.
SQL> analyze table t compute statistics;
Table analyzed.
select *
from
t where object_Type = 'TABLE'
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 388 0.12 0.11 0 684 0 5794
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 390 0.12 0.12 0 684 0 5794
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 23
Rows Row Source Operation
------- ---------------------------------------------------
5794 TABLE ACCESS FULL T (cr=684 r=0 w=0 time=61709 us)
*********************************************
select *
from
t where object_type = 'INDEX'
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.01 0.01 0 312 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.02 0.01 0 312 0 1
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 23
Rows Row Source Operation
------- ---------------------------------------------------
1 TABLE ACCESS FULL T (cr=312 r=0 w=0 time=10384 us)
**********************************************
Followup November 6, 2005 - 8am Central time zone:
well, you didn't do what I did - I gathered histograms, you did not. (the question was all about
*histograms*)
here is the example updated to use dbms_stats properly:
ops$tkyte@ORA10GR2> create table t
2 as
3 select 99 id, a.*
4 from all_objects a;
Table created.
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> update t set id = 1 where rownum = 1;
1 row updated.
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> create index t_idx on t(id);
Index created.
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> begin
2 dbms_stats.gather_table_stats
3 ( user,
4 'T',
5 method_opt=>'for all indexed columns size 254',
6 cascade => true );
7 end;
8 /
PL/SQL procedure successfully completed.
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> set autotrace traceonly explain
ops$tkyte@ORA10GR2> select * from t where id = 1;
Execution Plan
----------------------------------------------------------
Plan hash value: 470836197
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 96 | 2 (0)| 00:
| 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 96 | 2 (0)| 00:
|* 2 | INDEX RANGE SCAN | T_IDX | 1 | | 1 (0)| 00:
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID"=1)
ops$tkyte@ORA10GR2> select * from t where id = 99;
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 49996 | 4687K| 165 (3)| 00:00:02 |
|* 1 | TABLE ACCESS FULL| T | 49996 | 4687K| 165 (3)| 00:00:02 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("ID"=99)
ops$tkyte@ORA10GR2> set autotrace off
Same path but diff. LIOs
November 6, 2005 - 3am Central time zone
Reviewer: Jagjeet Singh from Delhi, India
Both queries using the same path but diff.
block gets.
Followup November 6, 2005 - 8am Central time zone:
That is due to the way blocks are gotten from the cache and the number of rows returned.
....
Fetch 388 0.12 0.11 0 684 0 5794
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 390 0.12 0.12 0 684 0 5794
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 23
Rows Row Source Operation
------- ---------------------------------------------------
5794 TABLE ACCESS FULL T (cr=684 r=0 w=0 time=61709 us
SQLPlus array fetches 15 rows at a time. There are about 310 blocks in this table (based on the
LIO's from the second query which got all of the data in one call). So, you have 18-19 rows per
block (5794/310).
When you fetched rows 1..15, we went to the buffer cache got block 1 and copied the 15 rows off of
it and sent them back.
When you fetched 16..30, we went to the buffer cache and got block 1 again to get the last 3 or 4
rows, then got block 2 from the buffer cache and got say 12 rows and sent them back.
When you fetched 31..45, we went to the buffer cache to get block 2 again, got 6 or 7 remaining
rows and then got block 3 to get 8 or 9 more and sent them back.
We read every block two times pretty much from the buffer cache (increase the arraysize in sqlplus
and you'll see a decrease in IO).
Then, you full scanned the table to get a single row - we read block 1, didn't find it, read block
2, didn't find it, read block 3 - say we found it, then we continued on reading every block (not
finding anymore data) and returned the single row - we read each block once.
Re:
November 6, 2005 - 10am Central time zone
Reviewer: Jagjeet Singh from Delhi, India
Thanks for this explaination.
o I used "analyze table t compute statistics" which is equal of
"analyze table t compute statistics for table for all indexes for
all indexd columns".
and it generates histograms. Do you mean no. of buckets is insuff.
o As you said diff. in LIO because of the way of getting the block from
cache for two queries.
In first query oracle scans 1-15 [ as per arraysize ] and send them
back and again picks the block 1 for rest records. that's why two
gets for per block.
For second query, Oracle picks the 1 block and scans it and then
picks the next block. In this case, oracle is scanning the whole
block, why there is no limitation with 15 rows.
Is this because oracle does not need to send data back using sqlnet
is it correct ?
If this is the case, I tried this with procedure using two diff. cursors. like ..
create procedure .. as
begin
for i in ( First query ) loop
null;
end loop;
for ii in ( Second Query ) loop
... same ..
Just accessing the data. -- Nothing to send back ---- But still there is diff. in LIO's.
o One more case -- without histograms ---
SQL> create table t as select 1 id,
2 a.* from dba_objects a ;
Table created.
SQL> create index ind_t on t ( id );
Index created.
SQL> analyze table t compute statistics
2 for table for all indexes;
Table analyzed.
SQL> set autot traceonly
SQL> select * from t where id = 1 ;
8195 rows selected.
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=74 Card=82 Bytes=7298)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=74 Card=82 BY..
2 1 INDEX (RANGE SCAN) OF 'IND_T' (NON-UNIQUE) (Cost=71 Card=33)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
1582 consistent gets <-----
SQL> select /*+FULL(t)*/ * from t where id = 1 ;
8195 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=75 Card=82 Bytes=7298)
1 0 TABLE ACCESS (FULL) OF 'T' (Cost=75 Card=82 Bytes=7298)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
967 consistent gets <-------
I have not created histograms but still CBO has this
information that this table has 8195 records and there is
only one distinct value. But still CBO is choosing index scan?
SQL> select num_rows,distinct_keys from user_indexes where index_name
= 'IND_T' ;
NUM_ROWS DISTINCT_KEYS
---------- -------------
8195 1
Thanks,
Js
Followup November 6, 2005 - 10am Central time zone:
1) it is not equivalent to that.
ops$tkyte@ORA10GR2> create table t as select object_id, object_name, object_type from all_objects;
Table created.
ops$tkyte@ORA10GR2> create index t_idx on t(object_id);
Index created.
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> analyze table t compute statistics;
Table analyzed.
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> select column_name, count(*) from user_tab_histograms where table_name = 'T'
group by column_name;
COLUMN_NAME COUNT(*)
-------------------- ----------
OBJECT_ID 2
OBJECT_TYPE 2
OBJECT_NAME 2
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> analyze table t delete statistics;
Table analyzed.
ops$tkyte@ORA10GR2> analyze table t compute statistics
2 for table
3 for all indexes
4 for all indexed columns;
Table analyzed.
ops$tkyte@ORA10GR2> select column_name, count(*) from user_tab_histograms where table_name = 'T'
group by column_name;
COLUMN_NAME COUNT(*)
-------------------- ----------
OBJECT_ID 76
very different.
2) I do not understand why you expect two different queries to have the same IO's in the first
place?
3) it did not have that information, you specifically say "don't look at the columns". You skipped
all column level information.
Why this is comming in BOLD
November 6, 2005 - 10am Central time zone
Reviewer: Jagjeet Singh from Delhi, India
I do not know why my followup is in BOLD ?
Followup November 6, 2005 - 10am Central time zone:
there is an unbalanced <b> tag on the page somewhere, don't worry about it.
I use hash-B and hash-b for <b> and </b> tags - someone posted something with a hash-B in it
Now I know why the other post is in bold :) ... On to the histograms.
November 12, 2005 - 2pm Central time zone
Reviewer: Vladimir Sadilovskiy from MA,US
Tom,
I have a table that usually contains a sinle value in a column.
create table t1 as select * from all_objects;
insert /*+ append */ into t1 select * ftom t1;
...
update t1 set object_name='30_CHARACTERS_STRING_PPPPPPPPP';
dbms_stats.gather_table_stats(null,tabname=>'t1',method_opt=> 'for all columns size auto');
I have this simple query that is extracted from a bigger one:
select * from t1
where object_name != :1;
Now lets explain series of them with different imput:
explain plan for
select * from t1
where object_name != '1';
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 495K| 47M| 3821 (8)|
|* 1 | TABLE ACCESS FULL | T1 | 495K| 47M| 3821 (8)|
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("T1"."OBJECT_NAME"<>'1')
explain plan for
select * from t1
where object_name != 'A';
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 495K| 47M| 3821 (8)|
|* 1 | TABLE ACCESS FULL | T1 | 495K| 47M| 3821 (8)|
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("T1"."OBJECT_NAME"<>'A')
explain plan for
select * from t1
where object_name != 'ABC';
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 495K| 47M| 3821 (8)|
|* 1 | TABLE ACCESS FULL | T1 | 495K| 47M| 3821 (8)|
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("T1"."OBJECT_NAME"<>'ABC')
explain plan for
select * from t1
where object_name != 'ABCDEF';
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 495K| 47M| 3821 (8)|
|* 1 | TABLE ACCESS FULL | T1 | 495K| 47M| 3821 (8)|
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("T1"."OBJECT_NAME"<>'ABCDEF')
explain plan for
select * from t1
where object_name != 'ABCDEFG';
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 100 | 3703 (5)|
|* 1 | TABLE ACCESS FULL | T1 | 1 | 100 | 3703 (5)|
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("T1"."OBJECT_NAME"<>'ABCDEFG')
SQL>
Please, take a look at estimated cardinality. Our problem that joining to such table with this
predicate usually leads to merge join Cartesian. So, it's not an academic question.
Q1. Do you know why the value for inequality predicate such drasticly changes cardinality
expectations? Especially '1' and 'A'. They seem to be similar.
Q2. I've made series of additional tests and found that the boundary length of the value and its
content after which the cardinality drops to 1 depends on the value in the table.
Q3. What would be your workaround in case you don't know the cause?
Thanks as always for your thoughts.
- Vladimir
Followup November 12, 2005 - 4pm Central time zone:
you don't have histograms, you did size auto - and without any pre-existing queries - the
dbms_stats won't get any.
the magic is when you crossed 6 characters (suggest reading for you based on your last two
questions is Jonathan Lewis's new book - see home page for a direct link to it.
I'm not sure what you mean by 1 and A as they seem to be the same as the rest?
Bah! Probably something in the text again. Sorry for that.
November 12, 2005 - 2pm Central time zone
Reviewer: A reader
Followup November 12, 2005 - 4pm Central time zone:
hash-mark B turns on bolding
hash-mark b turns it off
the 10053 traces have hash-mark B's in them.
Not that easy...
November 12, 2005 - 6pm Central time zone
Reviewer: Vladimir Sadilovskiy from MA,US
Tom,
It's just a coincidence I provided the tests with auto size. I can accept that migh brough in a
different issue.
But, here is an example of the same behavior with histograms.
Could you please elaborate on "size auto - and without any pre-existing queries - the dbms_stats
won't get any"?
Thanks,
- Vladimir
SQL> update t1 set object_name = 'aa.bb.cccccc';
commit;
495568 rows updated.
SQL>
Commit complete.
SQL> exec dbms_stats.gather_table_stats(null,'T1',method_opt=>'for all columns size 254');
PL/SQL procedure successfully completed.
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 495K| 38M| 3883 (8)|
|* 1 | TABLE ACCESS FULL | T1 | 495K| 38M| 3883 (8)|
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
1 - filter("T1"."OBJECT_NAME"<>'aa.bb.')
12 rows selected.
SQL> SQL>
2 rows deleted.
SQL> 2 3
Explained.
SQL> SQL>
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 82 | 3766 (5)|
|* 1 | TABLE ACCESS FULL | T1 | 1 | 82 | 3766 (5)|
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
1 - filter("T1"."OBJECT_NAME"<>'aa.bb.c')
12 rows selected.
SQL> SQL>
2 rows deleted.
SQL> 2 3
Explained.
SQL> SQL>
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 82 | 3766 (5)|
|* 1 | TABLE ACCESS FULL | T1 | 1 | 82 | 3766 (5)|
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
1 - filter("T1"."OBJECT_NAME"<>'aa.bb.cccccc.dddddddddddddd')
12 rows selected.
SQL> SQL>
Followup November 12, 2005 - 7pm Central time zone:
...
Could you please elaborate on "size auto - and without any pre-existing queries
- the dbms_stats won't get any"?...
size auto looks at the historical set of predicates used to figure out what columns might benefit
from histgrams.
ops$tkyte@ORA9IR2> create table t
2 as
3 select 99 id, a.* from all_objects a;
Table created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> update t set id = 1 where rownum = 1;
1 row updated.
ops$tkyte@ORA9IR2> create index t_idx on t(id);
Index created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T', method_opt => 'for all columns
size auto', cascade=>true );
PL/SQL procedure successfully completed.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> select * from t where id = 99;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=45 Card=15342 Bytes=1472832)
1 0 TABLE ACCESS (FULL) OF 'T' (Cost=45 Card=15342 Bytes=1472832)
ops$tkyte@ORA9IR2> select * from t where id = 1;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=45 Card=15342 Bytes=1472832)
1 0 TABLE ACCESS (FULL) OF 'T' (Cost=45 Card=15342 Bytes=1472832)
ops$tkyte@ORA9IR2> set autotrace off
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T', method_opt => 'for all columns
size auto' );
PL/SQL procedure successfully completed.
ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> select * from t where id = 99;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=45 Card=30683 Bytes=2945568)
1 0 TABLE ACCESS (FULL) OF 'T' (Cost=45 Card=30683 Bytes=2945568)
ops$tkyte@ORA9IR2> select * from t where id = 1;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=96)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=2 Card=1 Bytes=96)
2 1 INDEX (RANGE SCAN) OF 'T_IDX' (NON-UNIQUE) (Cost=1 Card=1)
ops$tkyte@ORA9IR2> set autotrace off
you are seeing the 6 to 7 transition again. The column information is 6 bytes long only. strange
things will happen if the first 6 bytes are the "same"
You will very much enjoy Jonathan's book - it will explain a lot.
Thanks Tom. That was very informative.
November 12, 2005 - 9pm Central time zone
Reviewer: Vladimir Sadilovskiy from MA,US

December 21, 2005 - 3am Central time zone
Reviewer: Reader from Ind
Oracle generating 2 buckets for each column if I am executing this
command.
SQL> analyze table t compute statistics;
Table analyzed.
SQL> select count(*),column_name from user_Tab_histograms where
table_name = 'T' group by column_name;
COUNT(*) COLUMN_NAME
---------- --------------------------------------------------
2 CREATED
2 DATA_OBJECT_ID
2 GENERATED
2 LAST_DDL_TIME
2 OBJECT_ID
2 OBJECT_NAME
2 OBJECT_TYPE
2 OWNER
2 SECONDARY
2 STATUS
2 SUBOBJECT_NAME
2 TEMPORARY
2 TIMESTAMP
13 rows selected.
And If use this command ...
SQL> analyze table t compute statistics for table for all columns;
Table analyzed.
SQL> select count(*),column_name from user_Tab_histograms where
table_name = 'T' group by column_name;
COUNT(*) COLUMN_NAME
---------- --------------------------------------------------
76 CREATED
75 DATA_OBJECT_ID
2 GENERATED
76 LAST_DDL_TIME
76 OBJECT_ID
76 OBJECT_NAME
24 OBJECT_TYPE
11 OWNER
1 SECONDARY
2 STATUS
2 SUBOBJECT_NAME
2 TEMPORARY
76 TIMESTAMP
I am confused .. No of buckets is dependent upon what ?
Regards.
Followup December 21, 2005 - 7am Central time zone:
first and foremost - stop using analyze to gather stats, start using dbms_stats.
Second, analyze table t compute statististics is
analyze table t
compute statistics
for table
for all columns SIZE 1
for all indexes;
ops$tkyte@ORA9IR2> create table t as select * from scott.emp;
Table created.
ops$tkyte@ORA9IR2> create index t_idx on t(ename);
Index created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> analyze table t compute statistics;
Table analyzed.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select table_name, num_rows from user_tables where table_name = 'T';
TABLE_NAME NUM_ROWS
------------------------------ ----------
T 14
ops$tkyte@ORA9IR2> select column_name, count(*) from user_tab_histograms where table_name = 'T'
group by column_name;
COLUMN_NAME COUNT(*)
-------------------- ----------
COMM 2
DEPTNO 2
EMPNO 2
ENAME 2
HIREDATE 2
JOB 2
MGR 2
SAL 2
8 rows selected.
ops$tkyte@ORA9IR2> select index_name, num_rows from user_indexes where table_name= 'T';
INDEX_NAME NUM_ROWS
------------------------------ ----------
T_IDX 14
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> analyze table t delete statistics;
Table analyzed.
ops$tkyte@ORA9IR2> select table_name, num_rows from user_tables where table_name = 'T';
TABLE_NAME NUM_ROWS
------------------------------ ----------
T
ops$tkyte@ORA9IR2> select column_name, count(*) from user_tab_histograms where table_name = 'T'
group by column_name;
no rows selected
ops$tkyte@ORA9IR2> select index_name, num_rows from user_indexes where table_name= 'T';
INDEX_NAME NUM_ROWS
------------------------------ ----------
T_IDX
ops$tkyte@ORA9IR2> analyze table t
2 compute statistics
3 for table
4 for all columns SIZE 1
5 for all indexes;
Table analyzed.
ops$tkyte@ORA9IR2> select table_name, num_rows from user_tables where table_name = 'T';
TABLE_NAME NUM_ROWS
------------------------------ ----------
T 14
ops$tkyte@ORA9IR2> select column_name, count(*) from user_tab_histograms where table_name = 'T'
group by column_name;
COLUMN_NAME COUNT(*)
-------------------- ----------
COMM 2
DEPTNO 2
EMPNO 2
ENAME 2
HIREDATE 2
JOB 2
MGR 2
SAL 2
8 rows selected.
ops$tkyte@ORA9IR2> select index_name, num_rows from user_indexes where table_name= 'T';
INDEX_NAME NUM_ROWS
------------------------------ ----------
T_IDX 14
ops$tkyte@ORA9IR2>
Re:
December 21, 2005 - 7am Central time zone
Reviewer: Reader from Ind
Thanks Tom for quick response.
And
analyze table t compute statistics for table for all columns;
This command using what size for columns ..
Got it default is 75 -- thanks
December 21, 2005 - 7am Central time zone
Reviewer: Reader from ind
Analyze table
December 21, 2005 - 8am Central time zone
Reviewer: A reader
Tom,
This is not a follow up question but more of supporting incident to what you said about not using
analyze table.
This happened several months ago so I do not remember all the table names. I was working with
Oracle support to resolve an issue of EM running some very expensive queries against the 10gr1
database. In an OWC session, the Oracle support analyst asked me to analyze some tables using
analyze table...command. After analyzing the tables, the problem did not go away. When I checked in
the last_analyzed date in dba_tables, the dates were totally weird for those tables. We tried many
times but no effect. Then we used the dbms_stats to analyze these tables and the problem was
resolved and the weird dates were replaced by the correct ones. Though the analyze table command
was populating the statistics, it was missing something which dbms_stats fixed.
After that I have stopped using analyze table completely even for small adhoc test cases on 10g
databases.
Followup December 21, 2005 - 7pm Central time zone:
analyze and dbms_stats can and will result in different numbers in the dictionary. The optimizer
is built expecting the dbms_stats numbers. food for thought.
Fixing plan for table having skewed data
March 24, 2006 - 6am Central time zone
Reviewer: Narendra from India
Hi Tom,
In following query in EXPLAIN PLAN, I am getting CARD=9 where as row returned is 1.
Following are necessary details:
SQL> select * from v$version ;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.1.0.4.0 - 64bi
PL/SQL Release 10.1.0.4.0 - Production
CORE 10.1.0.4.0 Production
TNS for Solaris: Version 10.1.0.4.0 - Production
NLSRTL Version 10.1.0.4.0 - Production
SQL> desc product
Name Null? Type
----------------------------------------- -------- ----------------------------
PRODUCT_KEY NOT NULL NUMBER(10)
DESCRIPTION VARCHAR2(30)
FULL_DESCRIPTION VARCHAR2(50)
PRODUCT_TYPE VARCHAR2(20)
BRAND VARCHAR2(20)
PRODUCT_CATEGORY VARCHAR2(20)
AGE_CATEGORY VARCHAR2(20)
DEPARTMENT VARCHAR2(15)
PRODUCT_KEY is PRIMARY KEY and has a UNIQUE index on it.
Following is distribution of data for BRAND column.
SQL> select brand, count(*) from product group by brand ;
BRAND COUNT(*)
-------------------- ----------
Astro 4
Big Studios 24
Coda 1
Dordor 2
Duff 1
Galore 1
Leavenworth 1
Little Guys 2
MKF Studios 24
Nagazoo 6
Parabuster Inc. 29
BRAND COUNT(*)
-------------------- ----------
Sani 3
Solo 6
Wild Age 9
Wolf 28
15 rows selected.
Now, when I execute following query,
SQL> select * from product where brand = 'Duff';
PRODUCT_KEY DESCRIPTION
----------- ------------------------------
FULL_DESCRIPTION PRODUCT_TYPE
-------------------------------------------------- --------------------
BRAND PRODUCT_CATEGORY AGE_CATEGORY DEPARTMENT
-------------------- -------------------- -------------------- ---------------
141 Beer
FOOD
Duff Alcohol over 21 Beverage
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2 Card=9 Bytes=684)
1 0 TABLE ACCESS (FULL) OF 'PRODUCT' (TABLE) (Cost=2 Card=9 Bytes=684)
The plan shows FULL table access step has CARD=9, whereas query returns only single row.
I tried collecting statistics using
SQL> exec dbms_stats.gather_table_stats(ownname => user, tabname => 'PRODUCT', method_opt => 'FOR
ALL INDEXED COLUMNS SIZE 254', cascade => true);
PL/SQL procedure successfully completed.
However, still no change in plan.
SQL> select * from product where brand = 'Duff';
PRODUCT_KEY DESCRIPTION
----------- ------------------------------
FULL_DESCRIPTION PRODUCT_TYPE
-------------------------------------------------- --------------------
BRAND PRODUCT_CATEGORY AGE_CATEGORY DEPARTMENT
-------------------- -------------------- -------------------- ---------------
141 Beer
FOOD
Duff Alcohol over 21 Beverage
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2 Card=9 Bytes=684)
1 0 TABLE ACCESS (FULL) OF 'PRODUCT' (TABLE) (Cost=2 Card=9 Bytes=684)
But when I do following:
SQL> create table product_test as select * from product ;
Table created.
And fire same query on new table:
SQL> select * from product_test where brand = 'Duff';
PRODUCT_KEY DESCRIPTION
----------- ------------------------------
FULL_DESCRIPTION PRODUCT_TYPE
-------------------------------------------------- --------------------
BRAND PRODUCT_CATEGORY AGE_CATEGORY DEPARTMENT
-------------------- -------------------- -------------------- ---------------
141 Beer
FOOD
Duff Alcohol over 21 Beverage
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2 Card=1 Bytes=114)
1 0 TABLE ACCESS (FULL) OF 'PRODUCT_TEST' (TABLE) (Cost=2 Card=1 Bytes=114)
The CARDINALITY is correct.
How can I fix the problem of incorrect cardinality on PRODUCT table? This table is part of VIDEO5
schema (which is part of ORACLE DISCOVERER) and PRODUCT table contains 141 rows.
Followup March 24, 2006 - 9am Central time zone:
Histograms would "correct" the estimate, the estimate right now is:
ops$tkyte@ORA10GR2> select 141/15 from dual;
141/15
----------
9.4
(141 rows, 15 distinct values). However, do you really NEED to fix it - 1 or 9 - is it making a
material difference somewhere (if you use binds, unlikely for this table that you want to have
histograms)
For study
March 26, 2006 - 3am Central time zone
Reviewer: Narendra from India
Tom,
I was (obviously) not doing any tuning in this case. The idea of probing was just to know
1. Why estimates were incorrect ?
2. On creating separate table (without PK) why did it get "corrected"?
Followup March 26, 2006 - 8am Central time zone:
why obviously? If you have to state "this was obviously" - that means "it was not" (else it would
be obvious and you would have no need to state "obviously" :)
I told you why the estimates were "incorrect" and even showed you the math. (just divide)
On the other table, you gathered statistics differently one would presume. You have different
statistics.
As a bit of magic, I can show this (how different statistics affect the opimizer)
I put your aggregated data into a table T1 and created T2 with your data as follows:
ops$tkyte@ORA10GR2> create table t2
2 as
3 with data
4 as
5 (select level l from dual connect by level <= 29)
6 select rownum id, t1.nm
7 from t1, data
8 where data.l <= t1.cnt
9 /
Table created.
ops$tkyte@ORA10GR2> select nm, count(*) from t2 group by nm;
NM COUNT(*)
------------------------------ ----------
Astro 4
Duff 1
Big Studios 24
Leavenworth 1
Sani 3
Wild Age 9
Solo 6
Galore 1
Nagazoo 6
Parabuster Inc. 29
Coda 1
Dordor 2
Little Guys 2
MKF Studios 24
Wolf 28
15 rows selected.
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T2' );
PL/SQL procedure successfully completed.
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> set autotrace traceonly explain
ops$tkyte@ORA10GR2> select * from t2 where nm = 'Duff';
Execution Plan
----------------------------------------------------------
Plan hash value: 1513984157
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9 | 117 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T2 | 9 | 117 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("NM"='Duff')
exactly what you saw - and for the reasons I stated - DIVIDE 141/15 ~ 9. About nine rows. Why?
Because of the lack of information, the optimizer just knows the HIGH and LOW values for NM and the
number of distinct values:
ops$tkyte@ORA10GR2> set autotrace off
ops$tkyte@ORA10GR2> select column_name, count(*) from user_tab_histograms where table_name = 'T2'
group by column_name;
COLUMN_NAME COUNT(*)
-------------------- ----------
NM 2
ID 2
but, and here is the "magic", we just regather stats and since we've run a query that puts a
predicate on a column - the stats Oracle gathers using AUTO change:
ops$tkyte@ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T2' );
PL/SQL procedure successfully completed.
ops$tkyte@ORA10GR2> select column_name, count(*) from user_tab_histograms where table_name = 'T2'
group by column_name;
COLUMN_NAME COUNT(*)
-------------------- ----------
NM 15
ID 2
Oracle now has a rather complete histogram of the values in NM and will therefore make a better
"guess"
ops$tkyte@ORA10GR2> set autotrace traceonly explain
ops$tkyte@ORA10GR2> select * from t2 where nm = 'Duff';
Execution Plan
----------------------------------------------------------
Plan hash value: 1513984157
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T2 | 1 | 13 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("NM"='Duff')
ops$tkyte@ORA10GR2> set autotrace off
Now, this DOES NOT MEAN run out and gather histograms on everything - in fact, quite the
opposite. In many/most systems (transactional for example), you probably don't want them at all.
In a warehouse (where the use of bind variables for the big queries would not be prevalent), you
may well want them
But this explains "why" - you had different statistics for the two different tables.
Thanks ("Obviously...")
March 27, 2006 - 4am Central time zone
Reviewer: Narendra from India
Tom,
When I gathered statistics using METHOD_OPT as 'FOR ALL COLUMNS SIZE SKEWONLY', the cardinality in
query plan got corrected. I feel reason for not being able to achieve that earlier was due to
gathering statistics on INDEXED COLUMNS only. BTW, I NEVER gathered statistics on NEW table after
creating it with CTAS statement. So can I assume that CTAS ALWAYS keeps table stats up-to-date?
p.s. That "Obvious" was too much...:)
Thanks.
Followup March 27, 2006 - 9am Central time zone:
create table as select gathers no statistics, nothing would be "up to date"
Histograms on system tables.
August 14, 2006 - 3am Central time zone
Reviewer: Raju from India
Hi Tom,
We have noticed several system tables where
dba_tab_col_statistics reports HEIGHT BALANCED histograms where bucket_cnt >
distcnt (count of distinct values). It was my understanding that a similar
problem was fixed in 10.2.0.1 (dbms_stats method_opt size n generates a
FREQUENCY histogram where n >= NDV). However, even more perplexing is why do we
have many height balanced histogram buckets on a column with only 1 distinct
value?
SQL > select ENDPOINT_NUMBER,ENDPOINT_VALUE from dba_histograms where
table_name = 'INDPART$' and column_name = 'TS#';
ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
217 79
So, some questions?
1. in sys.hist_head$, how is bucket_cnt calculated?
2. what determines how many buckets a column gets?
Thanks,
Raju
Followup August 14, 2006 - 11am Central time zone:
bucket cnt is not calculated, it is "set" - you either decide the number of buckets (size on the
method opt) or you let us use any algorithm we feel like to do it "auto".
and you don't need/want to be querying hist_head$, just use the normal views like the rest of us :)
frequency based histograms - number of distinct values < 255
"height balanced" - number of distinct values >= 255
Now, example please? I can definitely reproduce your findings, by modifying data after gathering
of course.
So, got example?
Histogram
September 22, 2006 - 7pm Central time zone
Reviewer: A Reader from USA
hi tom
Oracle 9i
DW Environment
Monthly Data Load
120 Million rows in Fact
we create hsitograms for all indexed columns with 10% sample size
one of the indexed columns is Month-Year to which data belongs - this column is not that skewed as
each month roughly same # of rows get inserted.
The warehouse has 120 million rows for approx 72 months, when we do
select * from fact_table
it does a full scan, even though the selectivity is approx 1/72
i was wondering if the 10% samplesize is generating histograms which are more hurting then helping
as with a selectivity of 1/72 indexes should be useful
2 questions
a) do you think removing the histograms on this (such)
columns will be helpful ?
b) do we also need to remove the stats of the index based
on this column ?
Thanks very much
Followup September 24, 2006 - 1pm Central time zone:
... we create hsitograms for all indexed columns with 10% sample size
....
why? why not the other columns? why all indexed columns? why not "columns we know to be skewed
and that we know we use in some predicate and that we know the optimizer needs to be told about"
since, well, histograms cost alot to compute...
Histrogram
September 24, 2006 - 2pm Central time zone
Reviewer: Rinku from India
Excellent description about Histogram.
Thanks Tom & All...

September 24, 2006 - 8pm Central time zone
Reviewer: A Reader from US
hi tom,
if i am not mistaken - you have mentioned a couple of diff times that starting with indexed columns
for creating histograms is not a bad starting point - and to me it stands to reason - the fact that
one decided to create indexes on these columns is a good indication that these often get used in
predicates,
i know this is an approximation - but nevertheless a good starting point. The other issue is of
manageabilty. A decent sized app can easilt have 2000 columns (100 tables with 20 columns each) -
to tell precisely which colsumns to create histograms can be quite a task - with high maintenance
as well, as the schema evolves. (As a suggestion it would be a rather useful addition to oracle
create table DDL syntax to be able to earmark columns for which to create histograms - on the
column clause).
My point is - if we do fall back on this rather common paradigm of creating histograms on indexed
columns - a good best practice could be :
"IF YOU CREATE HISTOGRAMS FOR ALL INDEXED COLUMNS (COZ YOU ARE LAZY NOT TO ANALYZE WHICH ONES
REALLY NEEN EM :-) AND
IF YOU ARE ANALYZING A SMALL % OF ROWS - EG 10 % - THEN MAKE SURE YOU DELETE THE COLUMN STATS FOR
COLUMN WHICH YOU KNOW FOR A FACT ARE NOT SKEWED"
the basic reason, i think we need to do is
if a table has 72 M rows - say 1 M for each month - in such as case a predicate suct as month_year
= '01-2006' -
has a selectivity of of 1/72. And with that kind of selectivity i feel it should use the index -
and its the histograms build with just 10% of the rows that are confusing the system. Do you agree
?
Does all this even remotely make sense - will be graeful to hear your take.
Regards,
Followup September 25, 2006 - 2am Central time zone:
can you point to a place where I said that - so I can update it.
In a data warehouse, you don't index in many cases - so it doesn't really hold true (indexes are
good for getting a few rows from a big table, warehouses are many times characterized for NOT doing
that)
to gather histograms on all indexed columns to me would be the wrong (inefficient) approach.
maybe you should not gather any. I don't fall back on "common paradigms", those are also known as
ROT (rule of thumb)
you would not need to remove the ones you know are not skewed (this is funny, you say you don't
have the time/energy/motivation to discover what ones you SHOULD gather on, but you are
knowledgable about the ones you don't need them on?? that would imply you do know the skewed
columns by definition - it is the set of columns left over after you take these away). Since they
are not skewed, the histograms will reflect that.
No, it doesn't make sense to me to say "it is too much work to understand our data"
test
September 25, 2006 - 7am Central time zone
Reviewer: A reader
test
Dumb question... How do I NOT create histograms when using DBMS_STATS to gather stats?
September 26, 2006 - 2pm Central time zone
Reviewer: Philip Moore from Cincinnati, OH USA
Hi Tom,
This may be an EXTREMELY dumb question - but I can't find the answer in the 9i documentation or on
askTom...
How do I explicitly NOT create histograms when using DBMS_STATS with 9.2.0.6? What is the
"METHOD_OPT" parameter I should use?
I really need to analyze my tables with DBMS_STATS - I just don't want histograms :)
Thanks for your help in advance, sir.
Followup September 26, 2006 - 5pm Central time zone:
size 1
if you have one bucket, only thing we know is "high and low and number distinct" values.
statistics & HIST_HEAD$
September 27, 2006 - 8am Central time zone
Reviewer: Antonio Dell'Elce from Italy
Dear Tom,
I am investigating possible performance improvements on
on a 10.1.0.2.0 instance, looking at the ADDM report:
FINDING 1: 88% impact (36798 seconds)
-------------------------------------
SQL statements consuming significant database time were found.
RECOMMENDATION 1: SQL Tuning, 60% benefit (25021 seconds)
ACTION: Investigate the SQL statement with SQL_ID "b6usrg82hwsa3" for
possible performance improvements.
RELEVANT OBJECT: SQL statement with SQL_ID b6usrg82hwsa3
call dbms_stats.gather_database_stats_job_proc ( )
(only first lines reported)
and at v$segment_statistics:
OWNER OBJ SUBNAME SNAME VALUE
---------- ------------------------------ --------------- -------------------- ----------------
SYS HIST_HEAD$ db block changes 107,039,200
STAT_OWN INTERVALLI SYS_SUBP1119 physical reads 80,407,080
STAT_OWN INTERVALLI SYS_SUBP1121 physical reads 80,335,637
STAT_OWN INTERVALLI SYS_SUBP1117 physical reads 68,966,192
STAT_OWN INTERVALLI SYS_SUBP1118 physical reads 55,163,908
STAT_OWN INTERVALLI SYS_SUBP1120 physical reads 43,452,888
SYS I_WRI$_OPTSTAT_H_OBJ#_ICOL#_ST db block changes 11,185,920
SYS HISTGRM$ db block changes 10,908,512
I came to believe that statistics collections are not working at best, ... how do you suggest I can
investigate this?
Thanks,
Antonio
Followup September 27, 2006 - 3pm Central time zone:
is this job running during your time of "peak usage", this is the auto scheduled job that should be
running "off peak", is it running when you are "doing useful work" or "when you are asleep and no
one is really doing stuff"

September 29, 2006 - 12pm Central time zone
Reviewer: Antonio
I verified the schedules and it is on the default schedule
(night and weekends) but while the system is "operational"
on the whole day (it collects data from other systems which
are 24hrs active)
Followup September 30, 2006 - 7am Central time zone:
ok, big question: should you be gathering stats so frequently...
you cannot really change what this routine does.
Find queries using histograms
October 26, 2007 - 4pm Central time zone
Reviewer: Ravi from Chicago, IL
Tom,
How do we find a query is using histograms or not? Only way is through trace?
Thanks
Ravi
Followup October 29, 2007 - 12pm Central time zone:
query user_tab_histograms - if they have them, the query was optimized with them in mind.
that is, every query that accesses a table with histograms "used" them - whether they make any difference to the plan or not
Histogram is not working ?
February 17, 2009 - 3pm Central time zone
Reviewer: Roberto Veiga from Brazil
Tom, I think I have a problem.
The user is complaining about performance.
I have a query that uses bind variables.
We are investigating the query and we've noticed the plan changes according with the values that
are passed to the query.
We have already collected statistcs and histograms. But sometimes the performance is good and
sometimes is bad.
I dont know what to say to the user because this process have to have a good performance and one
minute makes a lot of difference.
Here is my tests:
** *******************************************************************
** ORIGINAL QUERY AND ORIGINAL ACCESS PLAN
** *******************************************************************
SELECT RAES.RAES_USR_ANT_ATM, RAES_DAT_ANT_ATM, RAES.RAES_STA_LBR_RMS, RAES.RAES_CTRA_AFL_CDS_COD,
RAES.RAES_DAT_CLC, RAES.RAES_DAT_PGT, VPR.PGT_VAL_BRT VAL_BRT, RAES.RAES_VAL_LQD,
RAES.RAES_VAL_BLQ_ALR, RAES.RAES_ADMS_ADME_ADM_COD, RAES.RAES_RMS_ANO_EMO, RAES.RAES_RMS_LOCP_COD,
RAES.RAES_RMS_TIP_NUM_RMS, RAES.RAES_RMS_NUM_PRG_RMS, RAES.RAES_DAT_COR, RAES.RAES_PRM_COB
RAES_PRM_COB_AUX, RAES.RAES_PGT_COD, RAES.RAES_ADMS_FRMS_COD, RAES.RAES_MOE_COD, RAES.RAES_NUM_PRL,
RAES.RAES_ADMS_SRV_COD, VPR.GUI_ADME_ADM_COD GUI_ADME_ADM_COD, VPR.GUI_EQP_NUM GUI_EQP_NUM,
VPR.GUI_COD GUI_COD, VPR.GUI_DTA_ETG GUI_DTA_ETG, VPR.GUI_QTD_TOT_IFD GUI_QTD_TOT_IFD,
VPR.GUI_VAL_TOT_IFD GUI_VAL_TOT_IFD, C620040P(RAES.RAES_STA_LBR_RMS, RAES.RAES_PRM_COB,
RAES.RAES_DAT_PGT, RAES.RAES_PGT_COD) RAES_STATUS, RAES_CTRA_ADMS_ADME_ADM_COD,
RAES_CTRA_ADMS_FRMS_COD, RAES_CTRA_ADMS_SRV_COD, RAES_CTRA_MOE_COD FROM RMS_ADM_EMP_SRV RAES,
V_RAES_PGT VPR
WHERE RAES.RAES_STA_LBR_RMS < 4
AND NVL(RAES.RAES_VAL_PGO_PRL, 0) > 0
AND NOT VPR.RAES_RMS_TIP_NUM_RMS = 7
AND NOT VPR.RAES_FRMP_COD = 7
AND RAES.RAES_STA_RAES + 0 >= 2
AND RAES.RAES_PGT_COD + 0 IS NULL
AND RAES.RAES_RMS_ANO_EMO = VPR.RAES_RMS_ANO_EMO
AND RAES.RAES_RMS_LOCP_COD = VPR.RAES_RMS_LOCP_COD
AND RAES.RAES_RMS_TIP_NUM_RMS = VPR.RAES_RMS_TIP_NUM_RMS
AND RAES.RAES_RMS_NUM_PRG_RMS = VPR.RAES_RMS_NUM_PRG_RMS
AND RAES.RAES_ADMS_ADME_ADM_COD = VPR.RAES_ADMS_ADME_ADM_COD
AND RAES.RAES_ADMS_SRV_COD = VPR.RAES_ADMS_SRV_COD
AND RAES.RAES_ADMS_FRMS_COD = VPR.RAES_ADMS_FRMS_COD
AND RAES.RAES_MOE_COD = VPR.RAES_MOE_COD
AND RAES.RAES_NUM_PRL = VPR.RAES_NUM_PRL
AND RAES.RAES_ADMS_ADME_ADM_COD = 5
AND RAES.RAES_ADMS_FRMS_COD = 2
AND RAES.RAES_ADMS_SRV_COD = 1
AND RAES.RAES_MOE_COD = 'R$'
AND RAES.RAES_DAT_PGT = '12/16/2008'
AND RAES.RAES_PGT_GRD = 'N'
AND ( ('S' = 'S' AND VPR.PGT_CDS_COD_REL = 104938 )
OR ('S' != 'S' AND VPR.PGT_CDS_COD = 104938 ) );
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
--------------------------
Plan hash value: 3546104366
----------------------------------------------------------------------------------------------------
--------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------
--------------------------
| 0 | SELECT STATEMENT | | 1 | 179 | 49 (0)|
00:00:01 | | |
|* 1 | TABLE ACCESS BY GLOBAL INDEX ROWID| RMS_ADM_EMP_SRV | 1 | 102 | 1 (0)|
00:00:01 | ROWID | ROWID |
| 2 | NESTED LOOPS | | 1 | 179 | 49 (0)|
00:00:01 | | |
|* 3 | TABLE ACCESS BY INDEX ROWID | SNAP$_RAES_PGT | 13 | 1001 | 46 (3)|
00:00:01 | | |
|* 4 | INDEX RANGE SCAN | SNAP$RAES_AFL_DAT_PGT_I | 218 | | 1 (0)|
00:00:01 | | |
|* 5 | INDEX RANGE SCAN | RAES_RMS_FK_I | 1 | | 1 (0)|
00:00:01 | | |
----------------------------------------------------------------------------------------------------
--------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RAES"."RAES_PGT_GRD"='N' AND "RAES"."RAES_ADMS_FRMS_COD"=2 AND
"RAES"."RAES_ADMS_SRV_COD"=1 AND
"RAES"."RAES_PGT_COD"+0 IS NULL AND "RAES"."RAES_DAT_PGT"='12/16/2008' AND
NVL("RAES"."RAES_VAL_PGO_PRL",0)>0 AND
"RAES"."RAES_ADMS_ADME_ADM_COD"=5 AND "RAES"."RAES_STA_LBR_RMS"<4 AND
"RAES"."RAES_STA_RAES"+0>=2 AND
"RAES"."RAES_MOE_COD"='R$' AND "RAES"."RAES_NUM_PRL"="RAES_NUM_PRL")
3 - filter("RAES_ADMS_SRV_COD"=1 AND "RAES_ADMS_FRMS_COD"=2 AND "RAES_RMS_TIP_NUM_RMS"<>7 AND
"RAES_FRMP_COD"<>7
AND "RAES_ADMS_ADME_ADM_COD"=5 AND "RAES_MOE_COD"='R$' AND ("RAES_RMS_TIP_NUM_RMS"=1
OR "RAES_RMS_TIP_NUM_RMS"=2 OR
"RAES_RMS_TIP_NUM_RMS"=3 OR "RAES_RMS_TIP_NUM_RMS"=4 OR "RAES_RMS_TIP_NUM_RMS"=5 OR
"RAES_RMS_TIP_NUM_RMS"=6 OR
"RAES_RMS_TIP_NUM_RMS"=7 OR "RAES_RMS_TIP_NUM_RMS"=8 OR "RAES_RMS_TIP_NUM_RMS"=9))
4 - access("PGT_CDS_COD_REL"=104938)
5 - access("RAES"."RAES_RMS_NUM_PRG_RMS"="RAES_RMS_NUM_PRG_RMS" AND
"RAES"."RAES_RMS_TIP_NUM_RMS"="RAES_RMS_TIP_NUM_RMS" AND
"RAES"."RAES_RMS_ANO_EMO"="RAES_RMS_ANO_EMO" AND
"RAES"."RAES_RMS_LOCP_COD"="RAES_RMS_LOCP_COD")
filter("RAES"."RAES_RMS_TIP_NUM_RMS"<>7)
29 rows selected.
** *******************************************************************
** TESTING VALUES ON COLUMN RAES_ADMS_SRV_COD
** *******************************************************************
-- -------------------------------------------------------------------
-- DISTINCT VALUES FOR COLUMN RAES_ADMS_SRV_COD
-- -------------------------------------------------------------------
SELECT DISTINCT RAES_ADMS_SRV_COD FROM RMS_ADM_EMP_SRV ORDER BY 1;
RAES_ADMS_SRV_COD
-----------------
1
2
3
4
6
7
16
22
23
24
25
26
27
30
31
32
16 rows selected.
-- -------------------------------------------------------------------
-- BUCKETS/VALUES FOR COLUMN RAES_ADMS_SRV_COD
-- -------------------------------------------------------------------
select min(RAES_ADMS_SRV_COD), max(RAES_ADMS_SRV_COD), count(RAES_ADMS_SRV_COD), wb
from (
select RAES_ADMS_SRV_COD,
width_bucket( RAES_ADMS_SRV_COD,
(select min(RAES_ADMS_SRV_COD)-1 from prod_dba.RMS_ADM_EMP_SRV),
(select max(RAES_ADMS_SRV_COD)+1 from prod_dba.RMS_ADM_EMP_SRV), 50 ) wb
from prod_dba.RMS_ADM_EMP_SRV
)
group by wb
order by wb
/
MIN(RAES_ADMS_SRV_COD)|MAX(RAES_ADMS_SRV_COD)|COUNT(RAES_ADMS_SRV_COD)| WB
----------------------|----------------------|------------------------|----------
1| 1| 11692690| 2
2| 2| 36778973| 4
3| 3| 3309| 5
4| 4| 5125| 7
6| 6| 3431| 10
7| 7| 2111| 11
16| 16| 4731| 25
22| 22| 43| 34
23| 23| 5| 35
24| 24| 8851| 37
25| 25| 537676| 38
26| 26| 1387| 40
27| 27| 797| 41
30| 30| 227446| 46
31| 31| 3285855| 47
32| 32| 1473147| 49
16 rows selected.
-- -------------------------------------------------------------------
-- TESTING RAES_ADMS_SRV_COD = 1
-- -------------------------------------------------------------------
SELECT RAES.RAES_USR_ANT_ATM, RAES_DAT_ANT_ATM, RAES.RAES_STA_LBR_RMS, RAES.RAES_CTRA_AFL_CDS_COD,
RAES.RAES_DAT_CLC, RAES.RAES_DAT_PGT, VPR.PGT_VAL_BRT VAL_BRT, RAES.RAES_VAL_LQD,
RAES.RAES_VAL_BLQ_ALR, RAES.RAES_ADMS_ADME_ADM_COD, RAES.RAES_RMS_ANO_EMO, RAES.RAES_RMS_LOCP_COD,
RAES.RAES_RMS_TIP_NUM_RMS, RAES.RAES_RMS_NUM_PRG_RMS, RAES.RAES_DAT_COR, RAES.RAES_PRM_COB
RAES_PRM_COB_AUX, RAES.RAES_PGT_COD, RAES.RAES_ADMS_FRMS_COD, RAES.RAES_MOE_COD, RAES.RAES_NUM_PRL,
RAES.RAES_ADMS_SRV_COD, VPR.GUI_ADME_ADM_COD GUI_ADME_ADM_COD, VPR.GUI_EQP_NUM GUI_EQP_NUM,
VPR.GUI_COD GUI_COD, VPR.GUI_DTA_ETG GUI_DTA_ETG, VPR.GUI_QTD_TOT_IFD GUI_QTD_TOT_IFD,
VPR.GUI_VAL_TOT_IFD GUI_VAL_TOT_IFD, C620040P(RAES.RAES_STA_LBR_RMS, RAES.RAES_PRM_COB,
RAES.RAES_DAT_PGT, RAES.RAES_PGT_COD) RAES_STATUS, RAES_CTRA_ADMS_ADME_ADM_COD,
RAES_CTRA_ADMS_FRMS_COD, RAES_CTRA_ADMS_SRV_COD, RAES_CTRA_MOE_COD FROM RMS_ADM_EMP_SRV RAES,
V_RAES_PGT VPR
WHERE RAES.RAES_STA_LBR_RMS < 4
AND NVL(RAES.RAES_VAL_PGO_PRL, 0) > 0
AND NOT VPR.RAES_RMS_TIP_NUM_RMS = 7
AND NOT VPR.RAES_FRMP_COD = 7
AND RAES.RAES_STA_RAES + 0 >= 2
AND RAES.RAES_PGT_COD + 0 IS NULL
AND RAES.RAES_RMS_ANO_EMO = VPR.RAES_RMS_ANO_EMO
AND RAES.RAES_RMS_LOCP_COD = VPR.RAES_RMS_LOCP_COD
AND RAES.RAES_RMS_TIP_NUM_RMS = VPR.RAES_RMS_TIP_NUM_RMS
AND RAES.RAES_RMS_NUM_PRG_RMS = VPR.RAES_RMS_NUM_PRG_RMS
AND RAES.RAES_ADMS_ADME_ADM_COD = VPR.RAES_ADMS_ADME_ADM_COD
AND RAES.RAES_ADMS_SRV_COD = VPR.RAES_ADMS_SRV_COD
AND RAES.RAES_ADMS_FRMS_COD = VPR.RAES_ADMS_FRMS_COD
AND RAES.RAES_MOE_COD = VPR.RAES_MOE_COD
AND RAES.RAES_NUM_PRL = VPR.RAES_NUM_PRL
AND RAES.RAES_ADMS_ADME_ADM_COD = 5
AND RAES.RAES_ADMS_FRMS_COD = 2
AND RAES.RAES_ADMS_SRV_COD = 1 -- <<
AND RAES.RAES_MOE_COD = 'R$'
AND RAES.RAES_DAT_PGT = '12/16/2008'
AND RAES.RAES_PGT_GRD = 'N'
AND ( ('S' = 'S' AND VPR.PGT_CDS_COD_REL = 104938 )
OR ('S' != 'S' AND VPR.PGT_CDS_COD = 104938 ) );
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
-----------------------------
Plan hash value: 3546104366
----------------------------------------------------------------------------------------------------
--------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------
--------------------------
| 0 | SELECT STATEMENT | | 1 | 179 | 49 (0)|
00:00:01 | | |
|* 1 | TABLE ACCESS BY GLOBAL INDEX ROWID| RMS_ADM_EMP_SRV | 1 | 102 | 1 (0)|
00:00:01 | ROWID | ROWID |
| 2 | NESTED LOOPS | | 1 | 179 | 49 (0)|
00:00:01 | | |
|* 3 | TABLE ACCESS BY INDEX ROWID | SNAP$_RAES_PGT | 13 | 1001 | 46 (3)|
00:00:01 | | |
|* 4 | INDEX RANGE SCAN | SNAP$RAES_AFL_DAT_PGT_I | 218 | | 1 (0)|
00:00:01 | | |
|* 5 | INDEX RANGE SCAN | RAES_RMS_FK_I | 1 | | 1 (0)|
00:00:01 | | |
----------------------------------------------------------------------------------------------------
--------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RAES"."RAES_PGT_GRD"='N' AND "RAES"."RAES_ADMS_FRMS_COD"=2 AND
"RAES"."RAES_ADMS_SRV_COD"=1 AND
"RAES"."RAES_PGT_COD"+0 IS NULL AND "RAES"."RAES_DAT_PGT"='12/16/2008' AND
NVL("RAES"."RAES_VAL_PGO_PRL",0)>0 AND
"RAES"."RAES_ADMS_ADME_ADM_COD"=5 AND "RAES"."RAES_STA_LBR_RMS"<4 AND
"RAES"."RAES_STA_RAES"+0>=2 AND
"RAES"."RAES_MOE_COD"='R$' AND "RAES"."RAES_NUM_PRL"="RAES_NUM_PRL")
3 - filter("RAES_ADMS_SRV_COD"=1 AND "RAES_ADMS_FRMS_COD"=2 AND "RAES_RMS_TIP_NUM_RMS"<>7 AND
"RAES_FRMP_COD"<>7
AND "RAES_ADMS_ADME_ADM_COD"=5 AND "RAES_MOE_COD"='R$' AND ("RAES_RMS_TIP_NUM_RMS"=1
OR "RAES_RMS_TIP_NUM_RMS"=2 OR
"RAES_RMS_TIP_NUM_RMS"=3 OR "RAES_RMS_TIP_NUM_RMS"=4 OR "RAES_RMS_TIP_NUM_RMS"=5 OR
"RAES_RMS_TIP_NUM_RMS"=6 OR
"RAES_RMS_TIP_NUM_RMS"=7 OR "RAES_RMS_TIP_NUM_RMS"=8 OR "RAES_RMS_TIP_NUM_RMS"=9))
4 - access("PGT_CDS_COD_REL"=104938)
5 - access("RAES"."RAES_RMS_NUM_PRG_RMS"="RAES_RMS_NUM_PRG_RMS" AND
"RAES"."RAES_RMS_TIP_NUM_RMS"="RAES_RMS_TIP_NUM_RMS" AND
"RAES"."RAES_RMS_ANO_EMO"="RAES_RMS_ANO_EMO" AND
"RAES"."RAES_RMS_LOCP_COD"="RAES_RMS_LOCP_COD")
filter("RAES"."RAES_RMS_TIP_NUM_RMS"<>7)
29 rows selected.
-- -------------------------------------------------------------------
-- TESTING RAES_ADMS_SRV_COD = 4
-- -------------------------------------------------------------------
** >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< **
** >>>>>>>>>>>>>>>>>>>>>>>>>>> CHANGE ON ACCESS PLAN <<<<<<<<<<<<<<<<<<<<<<<<<< **
** >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< **
SELECT RAES.RAES_USR_ANT_ATM, RAES_DAT_ANT_ATM, RAES.RAES_STA_LBR_RMS, RAES.RAES_CTRA_AFL_CDS_COD,
RAES.RAES_DAT_CLC, RAES.RAES_DAT_PGT, VPR.PGT_VAL_BRT VAL_BRT, RAES.RAES_VAL_LQD,
RAES.RAES_VAL_BLQ_ALR, RAES.RAES_ADMS_ADME_ADM_COD, RAES.RAES_RMS_ANO_EMO, RAES.RAES_RMS_LOCP_COD,
RAES.RAES_RMS_TIP_NUM_RMS, RAES.RAES_RMS_NUM_PRG_RMS, RAES.RAES_DAT_COR, RAES.RAES_PRM_COB
RAES_PRM_COB_AUX, RAES.RAES_PGT_COD, RAES.RAES_ADMS_FRMS_COD, RAES.RAES_MOE_COD, RAES.RAES_NUM_PRL,
RAES.RAES_ADMS_SRV_COD, VPR.GUI_ADME_ADM_COD GUI_ADME_ADM_COD, VPR.GUI_EQP_NUM GUI_EQP_NUM,
VPR.GUI_COD GUI_COD, VPR.GUI_DTA_ETG GUI_DTA_ETG, VPR.GUI_QTD_TOT_IFD GUI_QTD_TOT_IFD,
VPR.GUI_VAL_TOT_IFD GUI_VAL_TOT_IFD, C620040P(RAES.RAES_STA_LBR_RMS, RAES.RAES_PRM_COB,
RAES.RAES_DAT_PGT, RAES.RAES_PGT_COD) RAES_STATUS, RAES_CTRA_ADMS_ADME_ADM_COD,
RAES_CTRA_ADMS_FRMS_COD, RAES_CTRA_ADMS_SRV_COD, RAES_CTRA_MOE_COD FROM RMS_ADM_EMP_SRV RAES,
V_RAES_PGT VPR
WHERE RAES.RAES_STA_LBR_RMS < 4
AND NVL(RAES.RAES_VAL_PGO_PRL, 0) > 0
AND NOT VPR.RAES_RMS_TIP_NUM_RMS = 7
AND NOT VPR.RAES_FRMP_COD = 7
AND RAES.RAES_STA_RAES + 0 >= 2
AND RAES.RAES_PGT_COD + 0 IS NULL
AND RAES.RAES_RMS_ANO_EMO = VPR.RAES_RMS_ANO_EMO
AND RAES.RAES_RMS_LOCP_COD = VPR.RAES_RMS_LOCP_COD
AND RAES.RAES_RMS_TIP_NUM_RMS = VPR.RAES_RMS_TIP_NUM_RMS
AND RAES.RAES_RMS_NUM_PRG_RMS = VPR.RAES_RMS_NUM_PRG_RMS
AND RAES.RAES_ADMS_ADME_ADM_COD = VPR.RAES_ADMS_ADME_ADM_COD
AND RAES.RAES_ADMS_SRV_COD = VPR.RAES_ADMS_SRV_COD
AND RAES.RAES_ADMS_FRMS_COD = VPR.RAES_ADMS_FRMS_COD
AND RAES.RAES_MOE_COD = VPR.RAES_MOE_COD
AND RAES.RAES_NUM_PRL = VPR.RAES_NUM_PRL
AND RAES.RAES_ADMS_ADME_ADM_COD = 5
AND RAES.RAES_ADMS_FRMS_COD = 2
AND RAES.RAES_ADMS_SRV_COD = 4 -- <<
AND RAES.RAES_MOE_COD = 'R$'
AND RAES.RAES_DAT_PGT = '12/16/2008'
AND RAES.RAES_PGT_GRD = 'N'
AND ( ('S' = 'S' AND VPR.PGT_CDS_COD_REL = 104938 )
OR ('S' != 'S' AND VPR.PGT_CDS_COD = 104938 ) );
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
-------------------------
Plan hash value: 3292825880
----------------------------------------------------------------------------------------------------
--------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
| Pstart| Pstop |
----------------------------------------------------------------------------------------------------
--------------------
| 0 | SELECT STATEMENT | | 1 | 179 | 9 (0)|
00:00:01 | | |
|* 1 | TABLE ACCESS BY GLOBAL INDEX ROWID| RMS_ADM_EMP_SRV | 1 | 102 | 1 (0)|
00:00:01 | ROWID | ROWID |
| 2 | NESTED LOOPS | | 1 | 179 | 9 (0)|
00:00:01 | | |
| 3 | INLIST ITERATOR | | | | |
| | |
|* 4 | TABLE ACCESS BY INDEX ROWID | SNAP$_RAES_PGT | 1 | 77 | 8 (0)|
00:00:01 | | |
|* 5 | INDEX RANGE SCAN | SNAP$_RAES_PGT_PK | 185 | | 2 (0)|
00:00:01 | | |
|* 6 | INDEX RANGE SCAN | RAES_RMS_FK_I | 1 | | 1 (0)|
00:00:01 | | |
----------------------------------------------------------------------------------------------------
--------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RAES"."RAES_ADMS_SRV_COD"=4 AND "RAES"."RAES_PGT_GRD"='N' AND
"RAES"."RAES_ADMS_FRMS_COD"=2 AND
"RAES"."RAES_PGT_COD"+0 IS NULL AND "RAES"."RAES_DAT_PGT"='12/16/2008' AND
NVL("RAES"."RAES_VAL_PGO_PRL",0)>0
AND "RAES"."RAES_ADMS_ADME_ADM_COD"=5 AND "RAES"."RAES_STA_LBR_RMS"<4 AND
"RAES"."RAES_STA_RAES"+0>=2 AND
"RAES"."RAES_MOE_COD"='R$' AND "RAES"."RAES_NUM_PRL"="RAES_NUM_PRL")
4 - filter("PGT_CDS_COD_REL"=104938 AND "RAES_FRMP_COD"<>7)
5 - access(("RAES_RMS_TIP_NUM_RMS"=1 OR "RAES_RMS_TIP_NUM_RMS"=2 OR "RAES_RMS_TIP_NUM_RMS"=3 OR
"RAES_RMS_TIP_NUM_RMS"=4 OR "RAES_RMS_TIP_NUM_RMS"=5 OR "RAES_RMS_TIP_NUM_RMS"=6 OR
"RAES_RMS_TIP_NUM_RMS"=7 OR
"RAES_RMS_TIP_NUM_RMS"=8 OR "RAES_RMS_TIP_NUM_RMS"=9) AND "RAES_ADMS_ADME_ADM_COD"=5
AND "RAES_ADMS_SRV_COD"=4
AND "RAES_ADMS_FRMS_COD"=2 AND "RAES_MOE_COD"='R$')
filter("RAES_RMS_TIP_NUM_RMS"<>7 AND "RAES_MOE_COD"='R$')
6 - access("RAES"."RAES_RMS_NUM_PRG_RMS"="RAES_RMS_NUM_PRG_RMS" AND
"RAES"."RAES_RMS_TIP_NUM_RMS"="RAES_RMS_TIP_NUM_RMS" AND
"RAES"."RAES_RMS_ANO_EMO"="RAES_RMS_ANO_EMO" AND
"RAES"."RAES_RMS_LOCP_COD"="RAES_RMS_LOCP_COD")
filter("RAES"."RAES_RMS_TIP_NUM_RMS"<>7)
31 rows selected.
The First example in this page against 10204
March 6, 2009 - 6pm Central time zone
Reviewer: Yoav
Hi Tom,
I run the first example in this page againt oracle 10204 , and got totally different results.
Actually the performance are better before gathering statistics : Its took to oracle 0.56 sec to
return an answer befor gathering stats. and 3.33 sec after gathering stats .
I did not find any benift using histograms against 10204 - in this case. (It may be related to
dynamic sampling ?)
could you please verify it ?
If indeed this is the case could you please give an example where i could benit from histograms in
10gr2 ?
Thanks
SQL*Plus: Release 10.2.0.4.0 - Production on Sat Mar 7 00:24:08 2009
Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> alter session set optimizer_mode='ALL_ROWS';
Session altered.
SQL> show parameter optimizer_mode
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
optimizer_mode string ALL_ROWS
SQL> create table t
as
select 1 X, rpad('*',4000,'*') data from all_objects a 2 3 ;
Table created.
SQL> insert into t
2 select 2, rpad('*',4000,'*') from all_objects a where rownum = 1;
1 row created.
SQL> commit;
Commit complete.
SQL> create index t_idx on t(x);
Index created.
SQL> alter session set sql_trace=true;
Session altered.
SQL> select count(data) from t t1 where x = 1;
COUNT(DATA)
-----------
53130
1 row selected.
SQL> select count(data) from t t1 where x = 2;
COUNT(DATA)
-----------
1
1 row selected.
SQL> alter system flush shared_pool;
System altered.
SQL> alter system flush buffer_cache;
System altered.
SQL> exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'T',method_opt=>'FOR ALL COLUMNS
SIZE AUTO', cascade=>true);
PL/SQL procedure successfully completed.
SQL> select count(data) from t t2 where x = 1;
COUNT(DATA)
-----------
53130
1 row selected.
SQL> select count(data) from t t2 where x = 2;
COUNT(DATA)
-----------
1
1 row selected.
SQL> select sysdate from dual;
SYSDATE
---------
07-MAR-09
1 row selected.
===========================================================
TKPROF shows , befor collecting statsistics:
select count(data)
from t t1 where x = 1
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.00 0 2 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.58 0.56 0 118851 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.59 0.56 0 118853 0
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 13
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=118851 pr=0 pw=0 time=565898 us)
53130 TABLE ACCESS FULL T (cr=118851 pr=0 pw=0 time=584474 us)
select count(data)
from
t t1 where x = 2
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.00 0 2 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.00 0 4 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.01 0.00 0 6 0 1
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 13
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=4 pr=0 pw=0 time=39 us)
1 TABLE ACCESS BY INDEX ROWID T (cr=4 pr=0 pw=0 time=31 us)
1 INDEX RANGE SCAN T_IDX (cr=2 pr=0 pw=0 time=13 us)(object id 10977844)
AFTER COLLECTING STATISTICS:
select count(data)
from
t t2 where x = 1
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 1.16 3.33 22770 118851 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 1.16 3.33 22770 118851 0 1
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 13
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=118851 pr=22770 pw=0 time=3337004 us)
53130 TABLE ACCESS FULL T (cr=118851 pr=22770 pw=0 time=3932313 us)
***********************************************************
select count(data)
from
t t2 where x = 2
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.00 0 4 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.00 0.00 0 4 0 1
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 13
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=4 pr=0 pw=0 time=112 us)
1 TABLE ACCESS BY INDEX ROWID T (cr=4 pr=0 pw=0 time=98 us)
1 INDEX RANGE SCAN T_IDX (cr=2 pr=0 pw=0 time=71 us)(object id 10977844)
***********************************************************
Followup March 7, 2009 - 12pm Central time zone:
ummm, please apply a little bit of critical reading to what you see above and see if you cannot figure this out yourself?
do you see that they use the same plans? (yes, they do)
do you see that before and after statistics - they are identical in nature (yes, they are)
do you see that in between, you flushed the buffer cache and caused the second one to do A TON OF physical IO?
??????
Look - anytime you have the same exact plan, well, you have the SAME EXACT PLAN. If the plans are the same - they do - the same thing.
They did exactly the same thing, you flushed the cache for whatever reason (why would you do that???? that never happens in real life - I hate that command)
Guidelines for better sql statement
May 31, 2009 - 9am Central time zone
Reviewer: Roberto Veiga from BRAZIL, Sao Paulo
Tom I am facing a lot of problems with bind variables because the access plan is changing and a
critical performance problem occurs.
My customer is very disapointed about this "feature" of Oracle and he is asking me what to do to
avoid this behavior.
We have already tried to remove histograms and the situation become worse.
I have proposed to him to focus on the most critical process and put some hints to freeze the plan.
But the processes work with a lot of procedures with a lot of statements. So I need to define a
method to focus only in the queries with most change to have this problem.
My points are:
-Statements with Bind Variables
-Columns with Skewed distribution
-Statements with more than 5 tables join
-Tables with many rows (500k or more)
This could be a good start to identify queries that can have problem?
|