Now I know why the other post is in bold :) ... On to the histograms.
Vladimir Sadilovskiy, November 12, 2005 - 2:33 pm UTC
Tom,
I have a table that usually contains a single value in a column.
create table t1 as select * from all_objects;
insert /*+ append */ into t1 select * from t1;
...
update t1 set object_name='30_CHARACTERS_STRING_PPPPPPPPP';
exec dbms_stats.gather_table_stats(null,tabname=>'t1',method_opt=>'for all columns size auto');
I have this simple query that is extracted from a bigger one:
select * from t1
where object_name != :1;
Now let's explain a series of them with different input:
explain plan for
select * from t1
where object_name != '1';
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 495K| 47M| 3821 (8)|
|* 1 | TABLE ACCESS FULL | T1 | 495K| 47M| 3821 (8)|
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("T1"."OBJECT_NAME"<>'1')
explain plan for
select * from t1
where object_name != 'A';
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 495K| 47M| 3821 (8)|
|* 1 | TABLE ACCESS FULL | T1 | 495K| 47M| 3821 (8)|
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("T1"."OBJECT_NAME"<>'A')
explain plan for
select * from t1
where object_name != 'ABC';
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 495K| 47M| 3821 (8)|
|* 1 | TABLE ACCESS FULL | T1 | 495K| 47M| 3821 (8)|
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("T1"."OBJECT_NAME"<>'ABC')
explain plan for
select * from t1
where object_name != 'ABCDEF';
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 495K| 47M| 3821 (8)|
|* 1 | TABLE ACCESS FULL | T1 | 495K| 47M| 3821 (8)|
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("T1"."OBJECT_NAME"<>'ABCDEF')
explain plan for
select * from t1
where object_name != 'ABCDEFG';
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 100 | 3703 (5)|
|* 1 | TABLE ACCESS FULL | T1 | 1 | 100 | 3703 (5)|
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("T1"."OBJECT_NAME"<>'ABCDEFG')
SQL>
Please, take a look at the estimated cardinality. Our problem is that joining to such a table with this predicate usually leads to a merge join Cartesian. So it's not an academic question.
Q1. Do you know why the value in an inequality predicate so drastically changes the cardinality estimate? Especially '1' and 'A' - they seem similar.
Q2. I've made a series of additional tests and found that the boundary length and content of the value, after which the cardinality drops to 1, depend on the value stored in the table.
Q3. What would be your workaround in case you don't know the cause?
Thanks as always for your thoughts.
- Vladimir
November 12, 2005 - 4:37 pm UTC
you don't have histograms - you used size auto, and without any pre-existing queries dbms_stats won't gather any.
the magic is when you crossed 6 characters (suggested reading for you, based on your last two questions, is Jonathan Lewis's new book - see the home page for a direct link to it).
I'm not sure what you mean by 1 and A as they seem to be the same as the rest?
Bah! Probably something in the text again. Sorry for that.
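To confirm that size auto left the column without a histogram, a quick sketch against the dictionary (table name T1 taken from the example above; a column with only 2 endpoints has just min/max recorded - i.e. no real histogram):

```sql
-- count histogram endpoints per column; 2 = no real histogram,
-- just the low and high values
select column_name, count(*) as endpoints
  from user_tab_histograms
 where table_name = 'T1'
 group by column_name;
```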
A reader, November 12, 2005 - 2:34 pm UTC
November 12, 2005 - 4:38 pm UTC
hash-mark B turns on bolding
hash-mark b turns it off
the 10053 traces have hash-mark B's in them.
Not that easy...
Vladimir Sadilovskiy, November 12, 2005 - 6:25 pm UTC
Tom,
It's just a coincidence that I provided the tests with auto size. I can accept that it might have brought in a different issue.
But, here is an example of the same behavior with histograms.
Could you please elaborate on "size auto - and without any pre-existing queries - the dbms_stats won't get any"?
Thanks,
- Vladimir
SQL> update t1 set object_name = 'aa.bb.cccccc';
495568 rows updated.
SQL> commit;
Commit complete.
SQL>
SQL> exec dbms_stats.gather_table_stats(null,'T1',method_opt=>'for all columns size 254');
PL/SQL procedure successfully completed.
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 495K| 38M| 3883 (8)|
|* 1 | TABLE ACCESS FULL | T1 | 495K| 38M| 3883 (8)|
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
1 - filter("T1"."OBJECT_NAME"<>'aa.bb.')
12 rows selected.
SQL> SQL>
2 rows deleted.
SQL> 2 3
Explained.
SQL> SQL>
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 82 | 3766 (5)|
|* 1 | TABLE ACCESS FULL | T1 | 1 | 82 | 3766 (5)|
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
1 - filter("T1"."OBJECT_NAME"<>'aa.bb.c')
12 rows selected.
SQL> SQL>
2 rows deleted.
SQL> 2 3
Explained.
SQL> SQL>
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 82 | 3766 (5)|
|* 1 | TABLE ACCESS FULL | T1 | 1 | 82 | 3766 (5)|
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
1 - filter("T1"."OBJECT_NAME"<>'aa.bb.cccccc.dddddddddddddd')
12 rows selected.
SQL> SQL>
November 12, 2005 - 7:28 pm UTC
...
Could you please elaborate on "size auto - and without any pre-existing queries
- the dbms_stats won't get any"?...
size auto looks at the historical set of predicates used against the table to figure out which columns might benefit from histograms.
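That "historical set of predicates" is recorded in the sys.col_usage$ dictionary table. A hedged sketch for peeking at it (undocumented internals - the column names below are as seen in 9i/10g and could change between releases; requires SELECT on SYS objects):

```sql
-- which columns of T have appeared in predicates, and how
select c.name as column_name,
       u.equality_preds, u.equijoin_preds, u.range_preds, u.like_preds
  from sys.col_usage$ u, sys.obj$ o, sys.col$ c
 where o.obj#    = u.obj#
   and c.obj#    = u.obj#
   and c.intcol# = u.intcol#
   and o.name    = 'T';
```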
ops$tkyte@ORA9IR2> create table t
2 as
3 select 99 id, a.* from all_objects a;
Table created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> update t set id = 1 where rownum = 1;
1 row updated.
ops$tkyte@ORA9IR2> create index t_idx on t(id);
Index created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T', method_opt => 'for all columns size auto', cascade=>true );
PL/SQL procedure successfully completed.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> select * from t where id = 99;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=45 Card=15342 Bytes=1472832)
1 0 TABLE ACCESS (FULL) OF 'T' (Cost=45 Card=15342 Bytes=1472832)
ops$tkyte@ORA9IR2> select * from t where id = 1;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=45 Card=15342 Bytes=1472832)
1 0 TABLE ACCESS (FULL) OF 'T' (Cost=45 Card=15342 Bytes=1472832)
ops$tkyte@ORA9IR2> set autotrace off
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T', method_opt => 'for all columns size auto' );
PL/SQL procedure successfully completed.
ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> select * from t where id = 99;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=45 Card=30683 Bytes=2945568)
1 0 TABLE ACCESS (FULL) OF 'T' (Cost=45 Card=30683 Bytes=2945568)
ops$tkyte@ORA9IR2> select * from t where id = 1;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=96)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=2 Card=1 Bytes=96)
2 1 INDEX (RANGE SCAN) OF 'T_IDX' (NON-UNIQUE) (Cost=1 Card=1)
ops$tkyte@ORA9IR2> set autotrace off
you are seeing the 6 to 7 character transition again. The column information is only 6 bytes long - strange things will happen if the first 6 bytes are the "same".
You will very much enjoy Jonathan's book - it will explain a lot.
Thanks Tom. That was very informative.
Vladimir Sadilovskiy, November 12, 2005 - 9:02 pm UTC
Reader, December 21, 2005 - 3:30 am UTC
Oracle generates 2 buckets for each column if I execute this command:
SQL> analyze table t compute statistics;
Table analyzed.
SQL> select count(*),column_name from user_Tab_histograms where
table_name = 'T' group by column_name;
COUNT(*) COLUMN_NAME
---------- --------------------------------------------------
2 CREATED
2 DATA_OBJECT_ID
2 GENERATED
2 LAST_DDL_TIME
2 OBJECT_ID
2 OBJECT_NAME
2 OBJECT_TYPE
2 OWNER
2 SECONDARY
2 STATUS
2 SUBOBJECT_NAME
2 TEMPORARY
2 TIMESTAMP
13 rows selected.
And if I use this command ...
SQL> analyze table t compute statistics for table for all columns;
Table analyzed.
SQL> select count(*),column_name from user_Tab_histograms where
table_name = 'T' group by column_name;
COUNT(*) COLUMN_NAME
---------- --------------------------------------------------
76 CREATED
75 DATA_OBJECT_ID
2 GENERATED
76 LAST_DDL_TIME
76 OBJECT_ID
76 OBJECT_NAME
24 OBJECT_TYPE
11 OWNER
1 SECONDARY
2 STATUS
2 SUBOBJECT_NAME
2 TEMPORARY
76 TIMESTAMP
I am confused... what does the number of buckets depend upon?
Regards.
December 21, 2005 - 7:26 am UTC
first and foremost - stop using analyze to gather stats, start using dbms_stats.
Second, analyze table t compute statistics is equivalent to:
analyze table t
compute statistics
for table
for all columns SIZE 1
for all indexes;
ops$tkyte@ORA9IR2> create table t as select * from scott.emp;
Table created.
ops$tkyte@ORA9IR2> create index t_idx on t(ename);
Index created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> analyze table t compute statistics;
Table analyzed.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select table_name, num_rows from user_tables where table_name = 'T';
TABLE_NAME NUM_ROWS
------------------------------ ----------
T 14
ops$tkyte@ORA9IR2> select column_name, count(*) from user_tab_histograms where table_name = 'T' group by column_name;
COLUMN_NAME COUNT(*)
-------------------- ----------
COMM 2
DEPTNO 2
EMPNO 2
ENAME 2
HIREDATE 2
JOB 2
MGR 2
SAL 2
8 rows selected.
ops$tkyte@ORA9IR2> select index_name, num_rows from user_indexes where table_name= 'T';
INDEX_NAME NUM_ROWS
------------------------------ ----------
T_IDX 14
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> analyze table t delete statistics;
Table analyzed.
ops$tkyte@ORA9IR2> select table_name, num_rows from user_tables where table_name = 'T';
TABLE_NAME NUM_ROWS
------------------------------ ----------
T
ops$tkyte@ORA9IR2> select column_name, count(*) from user_tab_histograms where table_name = 'T' group by column_name;
no rows selected
ops$tkyte@ORA9IR2> select index_name, num_rows from user_indexes where table_name= 'T';
INDEX_NAME NUM_ROWS
------------------------------ ----------
T_IDX
ops$tkyte@ORA9IR2> analyze table t
2 compute statistics
3 for table
4 for all columns SIZE 1
5 for all indexes;
Table analyzed.
ops$tkyte@ORA9IR2> select table_name, num_rows from user_tables where table_name = 'T';
TABLE_NAME NUM_ROWS
------------------------------ ----------
T 14
ops$tkyte@ORA9IR2> select column_name, count(*) from user_tab_histograms where table_name = 'T' group by column_name;
COLUMN_NAME COUNT(*)
-------------------- ----------
COMM 2
DEPTNO 2
EMPNO 2
ENAME 2
HIREDATE 2
JOB 2
MGR 2
SAL 2
8 rows selected.
ops$tkyte@ORA9IR2> select index_name, num_rows from user_indexes where table_name= 'T';
INDEX_NAME NUM_ROWS
------------------------------ ----------
T_IDX 14
ops$tkyte@ORA9IR2>
Re:
Reader, December 21, 2005 - 7:40 am UTC
Thanks Tom for quick response.
And
analyze table t compute statistics for table for all columns;
What size does this command use for the columns?
Got it - the default is 75 -- thanks
Reader, December 21, 2005 - 7:56 am UTC
Analyze table
A reader, December 21, 2005 - 8:51 am UTC
Tom,
This is not a follow up question but more of supporting incident to what you said about not using analyze table.
This happened several months ago so I do not remember all the table names. I was working with Oracle support to resolve an issue of EM running some very expensive queries against the 10gr1 database. In an OWC session, the Oracle support analyst asked me to analyze some tables using analyze table...command. After analyzing the tables, the problem did not go away. When I checked in the last_analyzed date in dba_tables, the dates were totally weird for those tables. We tried many times but no effect. Then we used the dbms_stats to analyze these tables and the problem was resolved and the weird dates were replaced by the correct ones. Though the analyze table command was populating the statistics, it was missing something which dbms_stats fixed.
After that I have stopped using analyze table completely even for small adhoc test cases on 10g databases.
December 21, 2005 - 7:26 pm UTC
analyze and dbms_stats can and will result in different numbers in the dictionary. The optimizer is built expecting the dbms_stats numbers. food for thought.
Fixing plan for table having skewed data
Narendra, March 24, 2006 - 6:31 am UTC
Hi Tom,
In the following query's EXPLAIN PLAN, I am getting CARD=9 whereas the number of rows returned is 1.
Following are necessary details:
SQL> select * from v$version ;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.1.0.4.0 - 64bi
PL/SQL Release 10.1.0.4.0 - Production
CORE 10.1.0.4.0 Production
TNS for Solaris: Version 10.1.0.4.0 - Production
NLSRTL Version 10.1.0.4.0 - Production
SQL> desc product
Name Null? Type
----------------------------------------- -------- ----------------------------
PRODUCT_KEY NOT NULL NUMBER(10)
DESCRIPTION VARCHAR2(30)
FULL_DESCRIPTION VARCHAR2(50)
PRODUCT_TYPE VARCHAR2(20)
BRAND VARCHAR2(20)
PRODUCT_CATEGORY VARCHAR2(20)
AGE_CATEGORY VARCHAR2(20)
DEPARTMENT VARCHAR2(15)
PRODUCT_KEY is PRIMARY KEY and has a UNIQUE index on it.
Following is distribution of data for BRAND column.
SQL> select brand, count(*) from product group by brand ;
BRAND COUNT(*)
-------------------- ----------
Astro 4
Big Studios 24
Coda 1
Dordor 2
Duff 1
Galore 1
Leavenworth 1
Little Guys 2
MKF Studios 24
Nagazoo 6
Parabuster Inc. 29
BRAND COUNT(*)
-------------------- ----------
Sani 3
Solo 6
Wild Age 9
Wolf 28
15 rows selected.
Now, when I execute following query,
SQL> select * from product where brand = 'Duff';
PRODUCT_KEY DESCRIPTION
----------- ------------------------------
FULL_DESCRIPTION PRODUCT_TYPE
-------------------------------------------------- --------------------
BRAND PRODUCT_CATEGORY AGE_CATEGORY DEPARTMENT
-------------------- -------------------- -------------------- ---------------
141 Beer
FOOD
Duff Alcohol over 21 Beverage
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2 Card=9 Bytes=684)
1 0 TABLE ACCESS (FULL) OF 'PRODUCT' (TABLE) (Cost=2 Card=9 Bytes=684)
The plan shows FULL table access step has CARD=9, whereas query returns only single row.
I tried collecting statistics using
SQL> exec dbms_stats.gather_table_stats(ownname => user, tabname => 'PRODUCT', method_opt => 'FOR ALL INDEXED COLUMNS SIZE 254', cascade => true);
PL/SQL procedure successfully completed.
However, still no change in plan.
SQL> select * from product where brand = 'Duff';
PRODUCT_KEY DESCRIPTION
----------- ------------------------------
FULL_DESCRIPTION PRODUCT_TYPE
-------------------------------------------------- --------------------
BRAND PRODUCT_CATEGORY AGE_CATEGORY DEPARTMENT
-------------------- -------------------- -------------------- ---------------
141 Beer
FOOD
Duff Alcohol over 21 Beverage
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2 Card=9 Bytes=684)
1 0 TABLE ACCESS (FULL) OF 'PRODUCT' (TABLE) (Cost=2 Card=9 Bytes=684)
But when I do following:
SQL> create table product_test as select * from product ;
Table created.
And fire same query on new table:
SQL> select * from product_test where brand = 'Duff';
PRODUCT_KEY DESCRIPTION
----------- ------------------------------
FULL_DESCRIPTION PRODUCT_TYPE
-------------------------------------------------- --------------------
BRAND PRODUCT_CATEGORY AGE_CATEGORY DEPARTMENT
-------------------- -------------------- -------------------- ---------------
141 Beer
FOOD
Duff Alcohol over 21 Beverage
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2 Card=1 Bytes=114)
1 0 TABLE ACCESS (FULL) OF 'PRODUCT_TEST' (TABLE) (Cost=2 Card=1 Bytes=114)
The CARDINALITY is correct.
How can I fix the problem of incorrect cardinality on PRODUCT table? This table is part of VIDEO5 schema (which is part of ORACLE DISCOVERER) and PRODUCT table contains 141 rows.
March 24, 2006 - 9:56 am UTC
Histograms would "correct" the estimate, the estimate right now is:
ops$tkyte@ORA10GR2> select 141/15 from dual;
141/15
----------
9.4
(141 rows, 15 distinct values). However, do you really NEED to fix it - 1 or 9 - is it making a material difference somewhere? (if you use binds, it is unlikely that you want histograms on this table)
For study
Narendra, March 26, 2006 - 3:56 am UTC
Tom,
I was (obviously) not doing any tuning in this case. The idea of probing was just to know:
1. Why were the estimates incorrect?
2. On creating a separate table (without the PK), why did it get "corrected"?
March 26, 2006 - 8:26 am UTC
why obviously? If you have to state "this was obviously" - that means "it was not" (else it would be obvious and you would have no need to state "obviously" :)
I told you why the estimates were "incorrect" and even showed you the math. (just divide)
On the other table, you gathered statistics differently one would presume. You have different statistics.
As a bit of magic, I can show this (how different statistics affect the optimizer).
I put your aggregated data into a table T1 and created T2 with your data as follows:
ops$tkyte@ORA10GR2> create table t2
2 as
3 with data
4 as
5 (select level l from dual connect by level <= 29)
6 select rownum id, t1.nm
7 from t1, data
8 where data.l <= t1.cnt
9 /
Table created.
ops$tkyte@ORA10GR2> select nm, count(*) from t2 group by nm;
NM COUNT(*)
------------------------------ ----------
Astro 4
Duff 1
Big Studios 24
Leavenworth 1
Sani 3
Wild Age 9
Solo 6
Galore 1
Nagazoo 6
Parabuster Inc. 29
Coda 1
Dordor 2
Little Guys 2
MKF Studios 24
Wolf 28
15 rows selected.
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T2' );
PL/SQL procedure successfully completed.
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> set autotrace traceonly explain
ops$tkyte@ORA10GR2> select * from t2 where nm = 'Duff';
Execution Plan
----------------------------------------------------------
Plan hash value: 1513984157
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9 | 117 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T2 | 9 | 117 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("NM"='Duff')
<b>exactly what you saw - and for the reasons I stated - DIVIDE 141/15 ~ 9. About nine rows. Why? Because of the lack of information, the optimizer just knows the HIGH and LOW values for NM and the number of distinct values:</b>
ops$tkyte@ORA10GR2> set autotrace off
ops$tkyte@ORA10GR2> select column_name, count(*) from user_tab_histograms where table_name = 'T2' group by column_name;
COLUMN_NAME COUNT(*)
-------------------- ----------
NM 2
ID 2
<b>but, and here is the "magic", we just regather stats and since we've run a query that puts a predicate on a column - the stats Oracle gathers using AUTO change:</b>
ops$tkyte@ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T2' );
PL/SQL procedure successfully completed.
ops$tkyte@ORA10GR2> select column_name, count(*) from user_tab_histograms where table_name = 'T2' group by column_name;
COLUMN_NAME COUNT(*)
-------------------- ----------
NM 15
ID 2
<b>Oracle now has a rather complete histogram of the values in NM and will therefore make a better "guess"</b>
ops$tkyte@ORA10GR2> set autotrace traceonly explain
ops$tkyte@ORA10GR2> select * from t2 where nm = 'Duff';
Execution Plan
----------------------------------------------------------
Plan hash value: 1513984157
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T2 | 1 | 13 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("NM"='Duff')
ops$tkyte@ORA10GR2> set autotrace off
<b>Now, this DOES NOT MEAN run out and gather histograms on everything - in fact, quite the opposite. In many/most systems (transactional for example), you probably don't want them at all. In a warehouse (where the use of bind variables for the big queries would not be prevalent), you may well want them</b>
But this explains "why" - you had different statistics for the two different tables.
Thanks ("Obviously...")
Narendra, March 27, 2006 - 4:01 am UTC
Tom,
When I gathered statistics using METHOD_OPT as 'FOR ALL COLUMNS SIZE SKEWONLY', the cardinality in the query plan got corrected. I believe the reason I could not achieve that earlier was that I gathered statistics on INDEXED COLUMNS only. BTW, I NEVER gathered statistics on the NEW table after creating it with the CTAS statement. So can I assume that CTAS ALWAYS keeps table stats up-to-date?
p.s. That "Obvious" was too much...:)
Thanks.
March 27, 2006 - 9:58 am UTC
create table as select gathers no statistics, nothing would be "up to date"
Histograms on system tables.
Raju, August 14, 2006 - 3:47 am UTC
Hi Tom,
We have noticed several system tables where dba_tab_col_statistics reports HEIGHT BALANCED histograms with bucket_cnt > distcnt (the count of distinct values). It was my understanding that a similar problem was fixed in 10.2.0.1 (dbms_stats method_opt size n generates a FREQUENCY histogram where n >= NDV). Even more perplexing: why do we have many height-balanced histogram buckets on a column with only 1 distinct value?
SQL > select ENDPOINT_NUMBER,ENDPOINT_VALUE from dba_histograms where
table_name = 'INDPART$' and column_name = 'TS#';
ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
217 79
So, some questions:
1. in sys.hist_head$, how is bucket_cnt calculated?
2. what determines how many buckets a column gets?
Thanks,
Raju
August 14, 2006 - 11:16 am UTC
bucket cnt is not calculated, it is "set" - you either decide the number of buckets (the SIZE in the method_opt) or you let us use whatever algorithm we feel like for "auto".
and you don't need/want to be querying hist_head$, just use the normal views like the rest of us :)
frequency based histograms - number of distinct values < 255
"height balanced" - number of distinct values >= 255
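In 10g the dictionary reports the type directly - a sketch (using the HISTOGRAM column of user_tab_col_statistics; table name T assumed):

```sql
-- HISTOGRAM shows NONE, FREQUENCY or HEIGHT BALANCED per column
select column_name, num_distinct, num_buckets, histogram
  from user_tab_col_statistics
 where table_name = 'T';
```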
Now, example please? I can definitely reproduce your findings, by modifying data after gathering of course.
So, got example?
Histogram
A Reader, September 22, 2006 - 7:54 pm UTC
hi tom
Oracle 9i
DW Environment
Monthly Data Load
120 Million rows in Fact
we create histograms for all indexed columns with a 10% sample size
one of the indexed columns is the Month-Year to which the data belongs - this column is not that skewed, as roughly the same # of rows gets inserted each month.
The warehouse has 120 million rows for approx 72 months; when we do
select * from fact_table
it does a full scan, even though the selectivity is approx 1/72
i was wondering if the 10% sample size is generating histograms which are more hurting than helping, as with a selectivity of 1/72 an index should be useful
2 questions
a) do you think removing the histograms on such columns would be helpful?
b) do we also need to remove the stats of the index based on this column?
Thanks very much
September 24, 2006 - 1:26 pm UTC
... we create histograms for all indexed columns with 10% sample size ...
why? why not the other columns? why all indexed columns? why not "columns we know to be skewed, that we know we use in some predicate, and that we know the optimizer needs to be told about"?
since, well, histograms cost a lot to compute...
Histogram
Rinku, September 24, 2006 - 2:04 pm UTC
Excellent description of histograms.
Thanks Tom & All...
A Reader, September 24, 2006 - 8:29 pm UTC
hi tom,
if i am not mistaken - you have mentioned a couple of different times that starting with indexed columns for creating histograms is not a bad starting point - and to me it stands to reason - the fact that one decided to create indexes on these columns is a good indication that they often get used in predicates.
i know this is an approximation - but nevertheless a good starting point. The other issue is manageability. A decent sized app can easily have 2000 columns (100 tables with 20 columns each) - to tell precisely which columns need histograms can be quite a task - with high maintenance as well, as the schema evolves. (As a suggestion, it would be a rather useful addition to Oracle's create table DDL syntax to be able to earmark columns for which to create histograms - on the column clause.)
My point is - if we do fall back on this rather common paradigm of creating histograms on indexed columns - a good best practice could be:
"IF YOU CREATE HISTOGRAMS FOR ALL INDEXED COLUMNS (BECAUSE YOU ARE TOO LAZY TO ANALYZE WHICH ONES REALLY NEED THEM :-) AND IF YOU ARE ANALYZING A SMALL % OF ROWS - E.G. 10% - THEN MAKE SURE YOU DELETE THE COLUMN STATS FOR COLUMNS WHICH YOU KNOW FOR A FACT ARE NOT SKEWED"
the basic reason, i think, we need to do this:
if a table has 72 M rows - say 1 M for each month - in such a case a predicate such as month_year = '01-2006'
has a selectivity of 1/72. And with that kind of selectivity i feel it should use the index - and it's the histograms built with just 10% of the rows that are confusing the system. Do you agree?
Does all this even remotely make sense - will be grateful to hear your take.
Regards,
September 25, 2006 - 2:25 am UTC
can you point to a place where I said that - so I can update it.
In a data warehouse, you don't index in many cases - so it doesn't really hold true (indexes are good for getting a few rows from a big table, warehouses are many times characterized for NOT doing that)
to gather histograms on all indexed columns to me would be the wrong (inefficient) approach.
maybe you should not gather any. I don't fall back on "common paradigms", those are also known as ROT (rule of thumb)
you would not need to remove the ones you know are not skewed (this is funny - you say you don't have the time/energy/motivation to discover which columns you SHOULD gather on, but you are knowledgeable about the ones you don't need them on?? that would imply you do know the skewed columns by definition - they are the set of columns left over after you take these away). Since they are not skewed, the histograms will reflect that.
No, it doesn't make sense to me to say "it is too much work to understand our data"
Dumb question... How do I NOT create histograms when using DBMS_STATS to gather stats?
Philip Moore, September 26, 2006 - 2:36 pm UTC
Hi Tom,
This may be an EXTREMELY dumb question - but I can't find the answer in the 9i documentation or on askTom...
How do I explicitly NOT create histograms when using DBMS_STATS with 9.2.0.6? What is the "METHOD_OPT" parameter I should use?
I really need to analyze my tables with DBMS_STATS - I just don't want histograms :)
Thanks for your help in advance, sir.
September 26, 2006 - 5:04 pm UTC
size 1
if you have one bucket, the only thing we know is the high value, the low value, and the number of distinct values.
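So a minimal sketch, assuming a table T:

```sql
-- size 1 => one bucket per column: only low value, high value and
-- number of distinct values are kept - no histogram
exec dbms_stats.gather_table_stats( user, 'T', method_opt => 'for all columns size 1' );
```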
statistics & HIST_HEAD$
Antonio Dell'Elce, September 27, 2006 - 8:50 am UTC
Dear Tom,
I am investigating possible performance improvements on a 10.1.0.2.0 instance, looking at the ADDM report:
FINDING 1: 88% impact (36798 seconds)
-------------------------------------
SQL statements consuming significant database time were found.
RECOMMENDATION 1: SQL Tuning, 60% benefit (25021 seconds)
ACTION: Investigate the SQL statement with SQL_ID "b6usrg82hwsa3" for
possible performance improvements.
RELEVANT OBJECT: SQL statement with SQL_ID b6usrg82hwsa3
call dbms_stats.gather_database_stats_job_proc ( )
(only first lines reported)
and at v$segment_statistics:
OWNER OBJ SUBNAME SNAME VALUE
---------- ------------------------------ --------------- -------------------- ----------------
SYS HIST_HEAD$ db block changes 107,039,200
STAT_OWN INTERVALLI SYS_SUBP1119 physical reads 80,407,080
STAT_OWN INTERVALLI SYS_SUBP1121 physical reads 80,335,637
STAT_OWN INTERVALLI SYS_SUBP1117 physical reads 68,966,192
STAT_OWN INTERVALLI SYS_SUBP1118 physical reads 55,163,908
STAT_OWN INTERVALLI SYS_SUBP1120 physical reads 43,452,888
SYS I_WRI$_OPTSTAT_H_OBJ#_ICOL#_ST db block changes 11,185,920
SYS HISTGRM$ db block changes 10,908,512
I have come to believe that statistics collection is not working at its best, ... how do you suggest I investigate this?
Thanks,
Antonio
September 27, 2006 - 3:27 pm UTC
is this job running during your time of "peak usage"? this is the auto-scheduled job that should be running "off peak" - is it running when you are "doing useful work", or when you are asleep and no one is really doing stuff?
Antonio, September 29, 2006 - 12:19 pm UTC
I verified the schedules and it is on the default schedule
(nights and weekends), but the system is "operational"
around the clock (it collects data from other systems that
are active 24 hours a day).
September 30, 2006 - 7:50 am UTC
OK, big question: should you be gathering stats so frequently...
you cannot really change what this routine does.
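If you decide the automatic nightly gathering itself is inappropriate for your 24x7 workload, one option in 10g is to disable the automatic job and schedule your own gathering at a time of your choosing (a sketch; GATHER_STATS_JOB is the name of the automatic statistics job in that release):

begin
   dbms_scheduler.disable('GATHER_STATS_JOB');   -- stop the automatic gather
end;
/

You would then create your own DBMS_SCHEDULER job calling DBMS_STATS with the parameters and schedule you want.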
Find queries using histograms
Ravi, October 26, 2007 - 4:45 pm UTC
Tom,
How do we find out whether a query is using histograms or not? Is the only way through a trace?
Thanks
Ravi
October 29, 2007 - 12:25 pm UTC
query user_tab_histograms - if the table's columns have histograms, the query was optimized with them in mind.
that is, every query that accesses a table with histograms "used" them - whether they make any difference to the plan or not
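To see which tables and columns actually have histograms, you could query the dictionary directly. A column with only min/max statistics shows exactly two endpoint rows in USER_TAB_HISTOGRAMS, so more than two endpoints indicates a real histogram:

select table_name, column_name, count(*) endpoints
  from user_tab_histograms
 group by table_name, column_name
having count(*) > 2
 order by table_name, column_name;

Any query touching one of the listed tables was optimized with those histograms in mind.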
Histogram is not working ?
Roberto Veiga, February 17, 2009 - 3:09 pm UTC
Tom, I think I have a problem.
The user is complaining about performance.
I have a query that uses bind variables.
We are investigating the query and we've noticed the plan changes according to the values that are passed to the query.
We have already collected statistics and histograms, but sometimes the performance is good and sometimes it is bad.
I don't know what to say to the user, because this process has to have good performance and one minute makes a lot of difference.
Here are my tests:
** *******************************************************************
** ORIGINAL QUERY AND ORIGINAL ACCESS PLAN
** *******************************************************************
SELECT RAES.RAES_USR_ANT_ATM, RAES_DAT_ANT_ATM, RAES.RAES_STA_LBR_RMS, RAES.RAES_CTRA_AFL_CDS_COD, RAES.RAES_DAT_CLC, RAES.RAES_DAT_PGT, VPR.PGT_VAL_BRT VAL_BRT, RAES.RAES_VAL_LQD, RAES.RAES_VAL_BLQ_ALR, RAES.RAES_ADMS_ADME_ADM_COD, RAES.RAES_RMS_ANO_EMO, RAES.RAES_RMS_LOCP_COD, RAES.RAES_RMS_TIP_NUM_RMS, RAES.RAES_RMS_NUM_PRG_RMS, RAES.RAES_DAT_COR, RAES.RAES_PRM_COB RAES_PRM_COB_AUX, RAES.RAES_PGT_COD, RAES.RAES_ADMS_FRMS_COD, RAES.RAES_MOE_COD, RAES.RAES_NUM_PRL, RAES.RAES_ADMS_SRV_COD, VPR.GUI_ADME_ADM_COD GUI_ADME_ADM_COD, VPR.GUI_EQP_NUM GUI_EQP_NUM, VPR.GUI_COD GUI_COD, VPR.GUI_DTA_ETG GUI_DTA_ETG, VPR.GUI_QTD_TOT_IFD GUI_QTD_TOT_IFD, VPR.GUI_VAL_TOT_IFD GUI_VAL_TOT_IFD, C620040P(RAES.RAES_STA_LBR_RMS, RAES.RAES_PRM_COB, RAES.RAES_DAT_PGT, RAES.RAES_PGT_COD) RAES_STATUS, RAES_CTRA_ADMS_ADME_ADM_COD, RAES_CTRA_ADMS_FRMS_COD, RAES_CTRA_ADMS_SRV_COD, RAES_CTRA_MOE_COD FROM RMS_ADM_EMP_SRV RAES, V_RAES_PGT VPR
WHERE RAES.RAES_STA_LBR_RMS < 4
AND NVL(RAES.RAES_VAL_PGO_PRL, 0) > 0
AND NOT VPR.RAES_RMS_TIP_NUM_RMS = 7
AND NOT VPR.RAES_FRMP_COD = 7
AND RAES.RAES_STA_RAES + 0 >= 2
AND RAES.RAES_PGT_COD + 0 IS NULL
AND RAES.RAES_RMS_ANO_EMO = VPR.RAES_RMS_ANO_EMO
AND RAES.RAES_RMS_LOCP_COD = VPR.RAES_RMS_LOCP_COD
AND RAES.RAES_RMS_TIP_NUM_RMS = VPR.RAES_RMS_TIP_NUM_RMS
AND RAES.RAES_RMS_NUM_PRG_RMS = VPR.RAES_RMS_NUM_PRG_RMS
AND RAES.RAES_ADMS_ADME_ADM_COD = VPR.RAES_ADMS_ADME_ADM_COD
AND RAES.RAES_ADMS_SRV_COD = VPR.RAES_ADMS_SRV_COD
AND RAES.RAES_ADMS_FRMS_COD = VPR.RAES_ADMS_FRMS_COD
AND RAES.RAES_MOE_COD = VPR.RAES_MOE_COD
AND RAES.RAES_NUM_PRL = VPR.RAES_NUM_PRL
AND RAES.RAES_ADMS_ADME_ADM_COD = 5
AND RAES.RAES_ADMS_FRMS_COD = 2
AND RAES.RAES_ADMS_SRV_COD = 1
AND RAES.RAES_MOE_COD = 'R$'
AND RAES.RAES_DAT_PGT = '12/16/2008'
AND RAES.RAES_PGT_GRD = 'N'
AND ( ('S' = 'S' AND VPR.PGT_CDS_COD_REL = 104938 )
OR ('S' != 'S' AND VPR.PGT_CDS_COD = 104938 ) );
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3546104366
------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 179 | 49 (0)| 00:00:01 | | |
|* 1 | TABLE ACCESS BY GLOBAL INDEX ROWID| RMS_ADM_EMP_SRV | 1 | 102 | 1 (0)| 00:00:01 | ROWID | ROWID |
| 2 | NESTED LOOPS | | 1 | 179 | 49 (0)| 00:00:01 | | |
|* 3 | TABLE ACCESS BY INDEX ROWID | SNAP$_RAES_PGT | 13 | 1001 | 46 (3)| 00:00:01 | | |
|* 4 | INDEX RANGE SCAN | SNAP$RAES_AFL_DAT_PGT_I | 218 | | 1 (0)| 00:00:01 | | |
|* 5 | INDEX RANGE SCAN | RAES_RMS_FK_I | 1 | | 1 (0)| 00:00:01 | | |
------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RAES"."RAES_PGT_GRD"='N' AND "RAES"."RAES_ADMS_FRMS_COD"=2 AND "RAES"."RAES_ADMS_SRV_COD"=1 AND
"RAES"."RAES_PGT_COD"+0 IS NULL AND "RAES"."RAES_DAT_PGT"='12/16/2008' AND NVL("RAES"."RAES_VAL_PGO_PRL",0)>0 AND
"RAES"."RAES_ADMS_ADME_ADM_COD"=5 AND "RAES"."RAES_STA_LBR_RMS"<4 AND "RAES"."RAES_STA_RAES"+0>=2 AND
"RAES"."RAES_MOE_COD"='R$' AND "RAES"."RAES_NUM_PRL"="RAES_NUM_PRL")
3 - filter("RAES_ADMS_SRV_COD"=1 AND "RAES_ADMS_FRMS_COD"=2 AND "RAES_RMS_TIP_NUM_RMS"<>7 AND "RAES_FRMP_COD"<>7
AND "RAES_ADMS_ADME_ADM_COD"=5 AND "RAES_MOE_COD"='R$' AND ("RAES_RMS_TIP_NUM_RMS"=1 OR "RAES_RMS_TIP_NUM_RMS"=2 OR
"RAES_RMS_TIP_NUM_RMS"=3 OR "RAES_RMS_TIP_NUM_RMS"=4 OR "RAES_RMS_TIP_NUM_RMS"=5 OR "RAES_RMS_TIP_NUM_RMS"=6 OR
"RAES_RMS_TIP_NUM_RMS"=7 OR "RAES_RMS_TIP_NUM_RMS"=8 OR "RAES_RMS_TIP_NUM_RMS"=9))
4 - access("PGT_CDS_COD_REL"=104938)
5 - access("RAES"."RAES_RMS_NUM_PRG_RMS"="RAES_RMS_NUM_PRG_RMS" AND
"RAES"."RAES_RMS_TIP_NUM_RMS"="RAES_RMS_TIP_NUM_RMS" AND "RAES"."RAES_RMS_ANO_EMO"="RAES_RMS_ANO_EMO" AND
"RAES"."RAES_RMS_LOCP_COD"="RAES_RMS_LOCP_COD")
filter("RAES"."RAES_RMS_TIP_NUM_RMS"<>7)
29 rows selected.
** *******************************************************************
** TESTING VALUES ON COLUMN RAES_ADMS_SRV_COD
** *******************************************************************
-- -------------------------------------------------------------------
-- DISTINCT VALUES FOR COLUMN RAES_ADMS_SRV_COD
-- -------------------------------------------------------------------
SELECT DISTINCT RAES_ADMS_SRV_COD FROM RMS_ADM_EMP_SRV ORDER BY 1;
RAES_ADMS_SRV_COD
-----------------
1
2
3
4
6
7
16
22
23
24
25
26
27
30
31
32
16 rows selected.
-- -------------------------------------------------------------------
-- BUCKETS/VALUES FOR COLUMN RAES_ADMS_SRV_COD
-- -------------------------------------------------------------------
select min(RAES_ADMS_SRV_COD), max(RAES_ADMS_SRV_COD), count(RAES_ADMS_SRV_COD), wb
from (
select RAES_ADMS_SRV_COD,
width_bucket( RAES_ADMS_SRV_COD,
(select min(RAES_ADMS_SRV_COD)-1 from prod_dba.RMS_ADM_EMP_SRV),
(select max(RAES_ADMS_SRV_COD)+1 from prod_dba.RMS_ADM_EMP_SRV), 50 ) wb
from prod_dba.RMS_ADM_EMP_SRV
)
group by wb
order by wb
/
MIN(RAES_ADMS_SRV_COD)|MAX(RAES_ADMS_SRV_COD)|COUNT(RAES_ADMS_SRV_COD)| WB
----------------------|----------------------|------------------------|----------
1| 1| 11692690| 2
2| 2| 36778973| 4
3| 3| 3309| 5
4| 4| 5125| 7
6| 6| 3431| 10
7| 7| 2111| 11
16| 16| 4731| 25
22| 22| 43| 34
23| 23| 5| 35
24| 24| 8851| 37
25| 25| 537676| 38
26| 26| 1387| 40
27| 27| 797| 41
30| 30| 227446| 46
31| 31| 3285855| 47
32| 32| 1473147| 49
16 rows selected.
-- -------------------------------------------------------------------
-- TESTING RAES_ADMS_SRV_COD = 1
-- -------------------------------------------------------------------
SELECT RAES.RAES_USR_ANT_ATM, RAES_DAT_ANT_ATM, RAES.RAES_STA_LBR_RMS, RAES.RAES_CTRA_AFL_CDS_COD, RAES.RAES_DAT_CLC, RAES.RAES_DAT_PGT, VPR.PGT_VAL_BRT VAL_BRT, RAES.RAES_VAL_LQD, RAES.RAES_VAL_BLQ_ALR, RAES.RAES_ADMS_ADME_ADM_COD, RAES.RAES_RMS_ANO_EMO, RAES.RAES_RMS_LOCP_COD, RAES.RAES_RMS_TIP_NUM_RMS, RAES.RAES_RMS_NUM_PRG_RMS, RAES.RAES_DAT_COR, RAES.RAES_PRM_COB RAES_PRM_COB_AUX, RAES.RAES_PGT_COD, RAES.RAES_ADMS_FRMS_COD, RAES.RAES_MOE_COD, RAES.RAES_NUM_PRL, RAES.RAES_ADMS_SRV_COD, VPR.GUI_ADME_ADM_COD GUI_ADME_ADM_COD, VPR.GUI_EQP_NUM GUI_EQP_NUM, VPR.GUI_COD GUI_COD, VPR.GUI_DTA_ETG GUI_DTA_ETG, VPR.GUI_QTD_TOT_IFD GUI_QTD_TOT_IFD, VPR.GUI_VAL_TOT_IFD GUI_VAL_TOT_IFD, C620040P(RAES.RAES_STA_LBR_RMS, RAES.RAES_PRM_COB, RAES.RAES_DAT_PGT, RAES.RAES_PGT_COD) RAES_STATUS, RAES_CTRA_ADMS_ADME_ADM_COD, RAES_CTRA_ADMS_FRMS_COD, RAES_CTRA_ADMS_SRV_COD, RAES_CTRA_MOE_COD FROM RMS_ADM_EMP_SRV RAES, V_RAES_PGT VPR
WHERE RAES.RAES_STA_LBR_RMS < 4
AND NVL(RAES.RAES_VAL_PGO_PRL, 0) > 0
AND NOT VPR.RAES_RMS_TIP_NUM_RMS = 7
AND NOT VPR.RAES_FRMP_COD = 7
AND RAES.RAES_STA_RAES + 0 >= 2
AND RAES.RAES_PGT_COD + 0 IS NULL
AND RAES.RAES_RMS_ANO_EMO = VPR.RAES_RMS_ANO_EMO
AND RAES.RAES_RMS_LOCP_COD = VPR.RAES_RMS_LOCP_COD
AND RAES.RAES_RMS_TIP_NUM_RMS = VPR.RAES_RMS_TIP_NUM_RMS
AND RAES.RAES_RMS_NUM_PRG_RMS = VPR.RAES_RMS_NUM_PRG_RMS
AND RAES.RAES_ADMS_ADME_ADM_COD = VPR.RAES_ADMS_ADME_ADM_COD
AND RAES.RAES_ADMS_SRV_COD = VPR.RAES_ADMS_SRV_COD
AND RAES.RAES_ADMS_FRMS_COD = VPR.RAES_ADMS_FRMS_COD
AND RAES.RAES_MOE_COD = VPR.RAES_MOE_COD
AND RAES.RAES_NUM_PRL = VPR.RAES_NUM_PRL
AND RAES.RAES_ADMS_ADME_ADM_COD = 5
AND RAES.RAES_ADMS_FRMS_COD = 2
AND RAES.RAES_ADMS_SRV_COD = 1 -- <<
AND RAES.RAES_MOE_COD = 'R$'
AND RAES.RAES_DAT_PGT = '12/16/2008'
AND RAES.RAES_PGT_GRD = 'N'
AND ( ('S' = 'S' AND VPR.PGT_CDS_COD_REL = 104938 )
OR ('S' != 'S' AND VPR.PGT_CDS_COD = 104938 ) );
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3546104366
------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 179 | 49 (0)| 00:00:01 | | |
|* 1 | TABLE ACCESS BY GLOBAL INDEX ROWID| RMS_ADM_EMP_SRV | 1 | 102 | 1 (0)| 00:00:01 | ROWID | ROWID |
| 2 | NESTED LOOPS | | 1 | 179 | 49 (0)| 00:00:01 | | |
|* 3 | TABLE ACCESS BY INDEX ROWID | SNAP$_RAES_PGT | 13 | 1001 | 46 (3)| 00:00:01 | | |
|* 4 | INDEX RANGE SCAN | SNAP$RAES_AFL_DAT_PGT_I | 218 | | 1 (0)| 00:00:01 | | |
|* 5 | INDEX RANGE SCAN | RAES_RMS_FK_I | 1 | | 1 (0)| 00:00:01 | | |
------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RAES"."RAES_PGT_GRD"='N' AND "RAES"."RAES_ADMS_FRMS_COD"=2 AND "RAES"."RAES_ADMS_SRV_COD"=1 AND
"RAES"."RAES_PGT_COD"+0 IS NULL AND "RAES"."RAES_DAT_PGT"='12/16/2008' AND NVL("RAES"."RAES_VAL_PGO_PRL",0)>0 AND
"RAES"."RAES_ADMS_ADME_ADM_COD"=5 AND "RAES"."RAES_STA_LBR_RMS"<4 AND "RAES"."RAES_STA_RAES"+0>=2 AND
"RAES"."RAES_MOE_COD"='R$' AND "RAES"."RAES_NUM_PRL"="RAES_NUM_PRL")
3 - filter("RAES_ADMS_SRV_COD"=1 AND "RAES_ADMS_FRMS_COD"=2 AND "RAES_RMS_TIP_NUM_RMS"<>7 AND "RAES_FRMP_COD"<>7
AND "RAES_ADMS_ADME_ADM_COD"=5 AND "RAES_MOE_COD"='R$' AND ("RAES_RMS_TIP_NUM_RMS"=1 OR "RAES_RMS_TIP_NUM_RMS"=2 OR
"RAES_RMS_TIP_NUM_RMS"=3 OR "RAES_RMS_TIP_NUM_RMS"=4 OR "RAES_RMS_TIP_NUM_RMS"=5 OR "RAES_RMS_TIP_NUM_RMS"=6 OR
"RAES_RMS_TIP_NUM_RMS"=7 OR "RAES_RMS_TIP_NUM_RMS"=8 OR "RAES_RMS_TIP_NUM_RMS"=9))
4 - access("PGT_CDS_COD_REL"=104938)
5 - access("RAES"."RAES_RMS_NUM_PRG_RMS"="RAES_RMS_NUM_PRG_RMS" AND
"RAES"."RAES_RMS_TIP_NUM_RMS"="RAES_RMS_TIP_NUM_RMS" AND "RAES"."RAES_RMS_ANO_EMO"="RAES_RMS_ANO_EMO" AND
"RAES"."RAES_RMS_LOCP_COD"="RAES_RMS_LOCP_COD")
filter("RAES"."RAES_RMS_TIP_NUM_RMS"<>7)
29 rows selected.
-- -------------------------------------------------------------------
-- TESTING RAES_ADMS_SRV_COD = 4
-- -------------------------------------------------------------------
** >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< **
** >>>>>>>>>>>>>>>>>>>>>>>>>>> CHANGE ON ACCESS PLAN <<<<<<<<<<<<<<<<<<<<<<<<<< **
** >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< **
SELECT RAES.RAES_USR_ANT_ATM, RAES_DAT_ANT_ATM, RAES.RAES_STA_LBR_RMS, RAES.RAES_CTRA_AFL_CDS_COD, RAES.RAES_DAT_CLC, RAES.RAES_DAT_PGT, VPR.PGT_VAL_BRT VAL_BRT, RAES.RAES_VAL_LQD, RAES.RAES_VAL_BLQ_ALR, RAES.RAES_ADMS_ADME_ADM_COD, RAES.RAES_RMS_ANO_EMO, RAES.RAES_RMS_LOCP_COD, RAES.RAES_RMS_TIP_NUM_RMS, RAES.RAES_RMS_NUM_PRG_RMS, RAES.RAES_DAT_COR, RAES.RAES_PRM_COB RAES_PRM_COB_AUX, RAES.RAES_PGT_COD, RAES.RAES_ADMS_FRMS_COD, RAES.RAES_MOE_COD, RAES.RAES_NUM_PRL, RAES.RAES_ADMS_SRV_COD, VPR.GUI_ADME_ADM_COD GUI_ADME_ADM_COD, VPR.GUI_EQP_NUM GUI_EQP_NUM, VPR.GUI_COD GUI_COD, VPR.GUI_DTA_ETG GUI_DTA_ETG, VPR.GUI_QTD_TOT_IFD GUI_QTD_TOT_IFD, VPR.GUI_VAL_TOT_IFD GUI_VAL_TOT_IFD, C620040P(RAES.RAES_STA_LBR_RMS, RAES.RAES_PRM_COB, RAES.RAES_DAT_PGT, RAES.RAES_PGT_COD) RAES_STATUS, RAES_CTRA_ADMS_ADME_ADM_COD, RAES_CTRA_ADMS_FRMS_COD, RAES_CTRA_ADMS_SRV_COD, RAES_CTRA_MOE_COD FROM RMS_ADM_EMP_SRV RAES, V_RAES_PGT VPR
WHERE RAES.RAES_STA_LBR_RMS < 4
AND NVL(RAES.RAES_VAL_PGO_PRL, 0) > 0
AND NOT VPR.RAES_RMS_TIP_NUM_RMS = 7
AND NOT VPR.RAES_FRMP_COD = 7
AND RAES.RAES_STA_RAES + 0 >= 2
AND RAES.RAES_PGT_COD + 0 IS NULL
AND RAES.RAES_RMS_ANO_EMO = VPR.RAES_RMS_ANO_EMO
AND RAES.RAES_RMS_LOCP_COD = VPR.RAES_RMS_LOCP_COD
AND RAES.RAES_RMS_TIP_NUM_RMS = VPR.RAES_RMS_TIP_NUM_RMS
AND RAES.RAES_RMS_NUM_PRG_RMS = VPR.RAES_RMS_NUM_PRG_RMS
AND RAES.RAES_ADMS_ADME_ADM_COD = VPR.RAES_ADMS_ADME_ADM_COD
AND RAES.RAES_ADMS_SRV_COD = VPR.RAES_ADMS_SRV_COD
AND RAES.RAES_ADMS_FRMS_COD = VPR.RAES_ADMS_FRMS_COD
AND RAES.RAES_MOE_COD = VPR.RAES_MOE_COD
AND RAES.RAES_NUM_PRL = VPR.RAES_NUM_PRL
AND RAES.RAES_ADMS_ADME_ADM_COD = 5
AND RAES.RAES_ADMS_FRMS_COD = 2
AND RAES.RAES_ADMS_SRV_COD = 4 -- <<
AND RAES.RAES_MOE_COD = 'R$'
AND RAES.RAES_DAT_PGT = '12/16/2008'
AND RAES.RAES_PGT_GRD = 'N'
AND ( ('S' = 'S' AND VPR.PGT_CDS_COD_REL = 104938 )
OR ('S' != 'S' AND VPR.PGT_CDS_COD = 104938 ) );
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3292825880
------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 179 | 9 (0)| 00:00:01 | | |
|* 1 | TABLE ACCESS BY GLOBAL INDEX ROWID| RMS_ADM_EMP_SRV | 1 | 102 | 1 (0)| 00:00:01 | ROWID | ROWID |
| 2 | NESTED LOOPS | | 1 | 179 | 9 (0)| 00:00:01 | | |
| 3 | INLIST ITERATOR | | | | | | | |
|* 4 | TABLE ACCESS BY INDEX ROWID | SNAP$_RAES_PGT | 1 | 77 | 8 (0)| 00:00:01 | | |
|* 5 | INDEX RANGE SCAN | SNAP$_RAES_PGT_PK | 185 | | 2 (0)| 00:00:01 | | |
|* 6 | INDEX RANGE SCAN | RAES_RMS_FK_I | 1 | | 1 (0)| 00:00:01 | | |
------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RAES"."RAES_ADMS_SRV_COD"=4 AND "RAES"."RAES_PGT_GRD"='N' AND "RAES"."RAES_ADMS_FRMS_COD"=2 AND
"RAES"."RAES_PGT_COD"+0 IS NULL AND "RAES"."RAES_DAT_PGT"='12/16/2008' AND NVL("RAES"."RAES_VAL_PGO_PRL",0)>0
AND "RAES"."RAES_ADMS_ADME_ADM_COD"=5 AND "RAES"."RAES_STA_LBR_RMS"<4 AND "RAES"."RAES_STA_RAES"+0>=2 AND
"RAES"."RAES_MOE_COD"='R$' AND "RAES"."RAES_NUM_PRL"="RAES_NUM_PRL")
4 - filter("PGT_CDS_COD_REL"=104938 AND "RAES_FRMP_COD"<>7)
5 - access(("RAES_RMS_TIP_NUM_RMS"=1 OR "RAES_RMS_TIP_NUM_RMS"=2 OR "RAES_RMS_TIP_NUM_RMS"=3 OR
"RAES_RMS_TIP_NUM_RMS"=4 OR "RAES_RMS_TIP_NUM_RMS"=5 OR "RAES_RMS_TIP_NUM_RMS"=6 OR "RAES_RMS_TIP_NUM_RMS"=7 OR
"RAES_RMS_TIP_NUM_RMS"=8 OR "RAES_RMS_TIP_NUM_RMS"=9) AND "RAES_ADMS_ADME_ADM_COD"=5 AND "RAES_ADMS_SRV_COD"=4
AND "RAES_ADMS_FRMS_COD"=2 AND "RAES_MOE_COD"='R$')
filter("RAES_RMS_TIP_NUM_RMS"<>7 AND "RAES_MOE_COD"='R$')
6 - access("RAES"."RAES_RMS_NUM_PRG_RMS"="RAES_RMS_NUM_PRG_RMS" AND
"RAES"."RAES_RMS_TIP_NUM_RMS"="RAES_RMS_TIP_NUM_RMS" AND "RAES"."RAES_RMS_ANO_EMO"="RAES_RMS_ANO_EMO" AND
"RAES"."RAES_RMS_LOCP_COD"="RAES_RMS_LOCP_COD")
filter("RAES"."RAES_RMS_TIP_NUM_RMS"<>7)
31 rows selected.
February 17, 2009 - 4:37 pm UTC
The First example in this page against 10204
Yoav, March 06, 2009 - 6:00 pm UTC
Hi Tom,
I ran the first example on this page against Oracle 10.2.0.4 and got totally different results.
Actually, the performance was better before gathering statistics: it took Oracle 0.56 sec to return an answer before gathering stats, and 3.33 sec after gathering stats.
I did not find any benefit in using histograms against 10.2.0.4 - in this case. (Could it be related to dynamic sampling?)
Could you please verify it?
If indeed this is the case, could you please give an example where I could benefit from histograms in 10gR2?
Thanks
SQL*Plus: Release 10.2.0.4.0 - Production on Sat Mar 7 00:24:08 2009
Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> alter session set optimizer_mode='ALL_ROWS';
Session altered.
SQL> show parameter optimizer_mode
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
optimizer_mode string ALL_ROWS
SQL> create table t
  2  as
  3  select 1 X, rpad('*',4000,'*') data from all_objects a;
Table created.
SQL> insert into t
2 select 2, rpad('*',4000,'*') from all_objects a where rownum = 1;
1 row created.
SQL> commit;
Commit complete.
SQL> create index t_idx on t(x);
Index created.
SQL> alter session set sql_trace=true;
Session altered.
SQL> select count(data) from t t1 where x = 1;
COUNT(DATA)
-----------
53130
1 row selected.
SQL> select count(data) from t t1 where x = 2;
COUNT(DATA)
-----------
1
1 row selected.
SQL> alter system flush shared_pool;
System altered.
SQL> alter system flush buffer_cache;
System altered.
SQL> exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'T',method_opt=>'FOR ALL COLUMNS SIZE AUTO', cascade=>true);
PL/SQL procedure successfully completed.
SQL> select count(data) from t t2 where x = 1;
COUNT(DATA)
-----------
53130
1 row selected.
SQL> select count(data) from t t2 where x = 2;
COUNT(DATA)
-----------
1
1 row selected.
SQL> select sysdate from dual;
SYSDATE
---------
07-MAR-09
1 row selected.
===========================================================
TKPROF shows, before collecting statistics:
select count(data)
from t t1 where x = 1
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.00 0 2 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.58 0.56 0 118851 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.59 0.56 0 118853 0
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 13
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=118851 pr=0 pw=0 time=565898 us)
53130 TABLE ACCESS FULL T (cr=118851 pr=0 pw=0 time=584474 us)
select count(data)
from
t t1 where x = 2
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.00 0 2 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.00 0 4 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.01 0.00 0 6 0 1
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 13
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=4 pr=0 pw=0 time=39 us)
1 TABLE ACCESS BY INDEX ROWID T (cr=4 pr=0 pw=0 time=31 us)
1 INDEX RANGE SCAN T_IDX (cr=2 pr=0 pw=0 time=13 us)(object id 10977844)
AFTER COLLECTING STATISTICS:
select count(data)
from
t t2 where x = 1
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 1.16 3.33 22770 118851 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 1.16 3.33 22770 118851 0 1
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 13
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=118851 pr=22770 pw=0 time=3337004 us)
53130 TABLE ACCESS FULL T (cr=118851 pr=22770 pw=0 time=3932313 us)
***********************************************************
select count(data)
from
t t2 where x = 2
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.00 0 4 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.00 0.00 0 4 0 1
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 13
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=4 pr=0 pw=0 time=112 us)
1 TABLE ACCESS BY INDEX ROWID T (cr=4 pr=0 pw=0 time=98 us)
1 INDEX RANGE SCAN T_IDX (cr=2 pr=0 pw=0 time=71 us)(object id 10977844)
***********************************************************
March 07, 2009 - 12:15 pm UTC
ummm, please apply a little bit of critical reading to what you see above and see if you cannot figure this out yourself?
do you see that they use the same plans? (yes, they do)
do you see that before and after statistics - they are identical in nature (yes, they are)
do you see that in between, you flushed the buffer cache and caused the second one to do A TON OF physical IO?
??????
Look - anytime you have the same exact plan, well, you have the SAME EXACT PLAN. If the plans are the same, they do the same thing.
They did exactly the same thing, you flushed the cache for whatever reason (why would you do that???? that never happens in real life - I hate that command)
Guidelines for better sql statement
Roberto Veiga, May 31, 2009 - 9:00 am UTC
Tom, I am facing a lot of problems with bind variables because the access plan changes and a critical performance problem occurs.
My customer is very disappointed about this "feature" of Oracle and is asking me what to do to avoid this behavior.
We have already tried to remove histograms, and the situation became worse.
I have proposed to him that we focus on the most critical processes and put in some hints to freeze the plans.
But the processes work with a lot of procedures containing a lot of statements, so I need to define a method to focus only on the queries with the most chance of having this problem.
My points are:
-Statements with Bind Variables
-Columns with Skewed distribution
-Statements with more than 5 tables join
-Tables with many rows (500k or more)
Would this be a good start to identify queries that can have this problem?
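As a concrete starting point (a sketch against V$SQL that you would refine for your environment), statements whose plans have flip-flopped show up as a single SQL_ID with more than one plan hash value in the shared pool:

select sql_id,
       count(distinct plan_hash_value) plan_count,
       sum(executions) total_execs
  from v$sql
 group by sql_id
having count(distinct plan_hash_value) > 1
 order by plan_count desc;

Cross-checking that list against your criteria (bind variables, skewed columns, many-table joins, large tables) narrows the work to the statements most likely to be affected.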
Questions on histograms and how they are useful
A reader, April 13, 2010 - 12:11 pm UTC
Hi Tom - We have a current prod environment where we run full 100% stats on IOT tables. For other tables, we run the following command
SYS.DBMS_STATS.GATHER_SCHEMA_STATS (
OwnName => '<owner>'
,estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE
,degree => DBMS_STATS.AUTO_DEGREE);
How will the histograms be generated by this? A few weeks back, we found that having histograms on two columns, a system-generated GUID column and a typecode column, contributed to slowness on some queries. So every day now we delete the stats on those two columns. I want to get away from this and change the stats script to do something better. What other options do I have?
April 14, 2010 - 7:58 am UTC
read about method_opt and what its defaults are on your system (they change by release). It controls which histograms are and are not generated.
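In 10g you can inspect and change the database-wide default with the DBMS_STATS parameter API (a sketch; SET_PARAM affects all subsequent gathers, so test before changing it in production):

-- see the current default method_opt
select dbms_stats.get_param('METHOD_OPT') from dual;

-- change the database-wide default, e.g. to suppress all histograms
begin
   dbms_stats.set_param('METHOD_OPT', 'FOR ALL COLUMNS SIZE 1');
end;
/

Alternatively, keep SIZE AUTO in general and name the two problem columns explicitly with SIZE 1 in the method_opt of the gather call, instead of deleting their statistics every day.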
A reader, April 14, 2010 - 10:34 am UTC
Our database is Oracle 10.2.0.4.0. Before changing method_opt, how can we find out which histograms are having a negative impact? Can we check the skewness and determine whether the histograms will be good or bad? We have a lot of tables that have histograms on system-generated GUIDs. Can I use a scientific method to determine whether the histograms on the GUIDs are OK to be deleted?
April 14, 2010 - 3:47 pm UTC
none of them should be in this case - you'd have to give me an example.
If there were a universal check that anyone could run to find "bad histograms" (no such thing), then we would do that as part of gather and remove them already. There is no such thing as a "bad histogram"
You'd have to give me an example whereby your histograms are causing an issue, and demonstrate it with before-and-after plans.
histograms + estimate_percent?
Xenofon, April 28, 2010 - 4:50 am UTC
Hi Tom,
I am currently reading "Cost-Based Oracle Fundamentals" by Jonathan Lewis - a great book, but "hard bread," as Germans would say.
One question I have concerning histograms: even JL seems to always use the default "estimate_percent" setting. But does this make sense when you know you have a non-uniform value distribution and you want to collect histograms to get around exactly that problem?
Say the gather chooses to scan 10% of the table. Certainly it will get a good estimate for the most frequent values, but what about the rare ones? What about the values it did not even find in that 10%? Which bucket are they assigned to, and how are predicates on them handled during a hard parse?
Many thanks in advance
April 28, 2010 - 8:30 am UTC
there are two types of histograms - those used when we think there are fewer than 255 distinct values, and those used when we know there are more than 255.
When there are fewer than 255, we have a frequency histogram - a bucket per value. If we don't observe some value in the table, it will of course not have a bucket, and when a value doesn't have a bucket, we "make up an estimate".
If you estimate, yes, you might miss an observation and we'll have to 'guess'. But it is not always viable to compute to get "perfection" - especially in the type of system a histogram is useful for (warehouse/read mostly/reporting system), since the segments can be very large.
play with this:
/*
drop table t;
create table t (x,y)
as
select level, rpad('*',500,'*') from dual connect by level <= 20
union all
select mod(level,10)+21, rpad('*',500,'*') from dual connect by level <= 100000
/
select x, count(*) from t group by x order by x;
*/
exec dbms_stats.delete_table_stats( user, 'T' )
exec dbms_stats.gather_table_stats(user,'T',method_opt=>'for columns x size 254', estimate_percent=>1);
declare
c sys_refcursor;
l_rec t%rowtype;
begin
for i in 1 .. 30
loop
delete from plan_table;
execute immediate 'explain plan for select * from t where x = ' || i ;
for x in ( select plan_table_output from table(dbms_xplan.display()) where plan_table_output like '|* 1 | TABLE ACCESS FULL| T %' )
loop
dbms_output.put_line( to_char( i, '999' ) || ') ' || x.plan_table_output );
end loop;
end loop;
end;
/
run the block a couple of times and see how the estimated row counts vary over time. Use estimate_percent=>100 to see them get "fixed" at a value - a perfect value.
Histogram on 11GR2
Rajeshwaran, Jeyabal, May 29, 2011 - 9:30 pm UTC
May 31, 2011 - 10:04 am UTC
http://www.oracle.com/technetwork/issue-archive/2011/11-jan/o11asktom-195084.html
It is still 32 bytes (that article is using 11gR2).
In your example, if you drop the index - the estimated cardinalities are inaccurate in all cases. That shows it was not the histograms on the column that helped the optimizer out in that case, but rather information gleaned from the index (remember - index creates on non-empty tables have an implicit compute statistics associated with them since 10g so you had detailed index statistics).
It was the index information, not the histograms.
If you run your example in 10g in fact, you'll see different cardinality estimates when the index does/does not exist. I observed it estimating 10 rows with the index and 505 rows without in 10g. Not exactly the same as 11g, but not exactly different either :)
Histogram on 11GR2
Rajeshwaran, Jeyabal, May 31, 2011 - 12:27 pm UTC
May 31, 2011 - 1:10 pm UTC
umm, your example sort of proves the point. The estimated cardinality without the index is 315. With it, it is 1. That is the POINT - the histogram isn't working past 32 bytes.
ops$tkyte%ORA11GR2> drop table t purge;
Table dropped.
ops$tkyte%ORA11GR2> create table t
2 nologging
3 as
4 select rownum as x,
5 rpad('*',28,'*')||to_char(rownum,'fm00000') as y,
6 sysdate as z,
7 a.*
8 from all_objects a;
Table created.
ops$tkyte%ORA11GR2> create index t_ind on t(y) nologging;
Index created.
ops$tkyte%ORA11GR2> begin
2 dbms_stats.gather_table_stats(
3 ownname =>user,
4 tabname =>'T',
5 estimate_percent=> dbms_stats.auto_sample_size,
6 method_opt=>'for all indexed columns size 254' );
7 end;
8 /
PL/SQL procedure successfully completed.
ops$tkyte%ORA11GR2> select column_name,count(*)
2 from user_tab_histograms
3 where table_name ='T'
4 group by column_name;
COLUMN_NAME                      COUNT(*)
------------------------------ ----------
Y                                     255
ops$tkyte%ORA11GR2> variable y varchar2(40);
ops$tkyte%ORA11GR2> exec :y := rpad('*',28,'*')||to_char(55,'fm00000');
PL/SQL procedure successfully completed.
ops$tkyte%ORA11GR2> set autotrace traceonly explain statistics;
ops$tkyte%ORA11GR2> select x,y,z
2 from t
3 where y = :y;
Execution Plan
----------------------------------------------------------
Plan hash value: 1376202287
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 56 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 56 | 4 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | T_IND | 1 | | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("Y"=:Y)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
567 bytes sent via SQL*Net to client
420 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
ops$tkyte%ORA11GR2> set autotrace off
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> drop table t purge;
Table dropped.
ops$tkyte%ORA11GR2> create table t
2 nologging
3 as
4 select rownum as x,
5 rpad('*',28,'*')||to_char(rownum,'fm00000') as y,
6 sysdate as z,
7 a.*
8 from all_objects a;
Table created.
ops$tkyte%ORA11GR2> begin
2 dbms_stats.gather_table_stats(
3 ownname =>user,
4 tabname =>'T',
5 estimate_percent=> dbms_stats.auto_sample_size,
6 method_opt=>'for all indexed columns size 254' );
7 end;
8 /
PL/SQL procedure successfully completed.
ops$tkyte%ORA11GR2> select column_name,count(*)
2 from user_tab_histograms
3 where table_name ='T'
4 group by column_name;
no rows selected
ops$tkyte%ORA11GR2> variable y varchar2(40);
ops$tkyte%ORA11GR2> exec :y := rpad('*',28,'*')||to_char(55,'fm00000');
PL/SQL procedure successfully completed.
ops$tkyte%ORA11GR2> set autotrace traceonly explain statistics;
ops$tkyte%ORA11GR2> select x,y,z
2 from t
3 where y = :y;
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 723 | 29643 | 420 (1)| 00:00:06 |
|* 1 | TABLE ACCESS FULL| T | 723 | 29643 | 420 (1)| 00:00:06 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("Y"=:Y)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1511 consistent gets
0 physical reads
0 redo size
567 bytes sent via SQL*Net to client
420 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
ops$tkyte%ORA11GR2> set autotrace off
Histogram on 11GR2
Rajeshwaran, Jeyabal, May 31, 2011 - 2:05 pm UTC
Tom:
So you agree that Histograms improved to 34 Bytes in Oracle 11GR2?
Histogram on 11GR2
Rajeshwaran, Jeyabal, May 31, 2011 - 9:52 pm UTC
Tom:
I was looking at the script that you used above (two postings above), specifically the second "create table T" statement:
http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:707586567563#3475409700346643210
ops$tkyte%ORA11GR2> begin
2 dbms_stats.gather_table_stats(
3 ownname =>user,
4 tabname =>'T',
5 estimate_percent=> dbms_stats.auto_sample_size,
6 method_opt=>'for all indexed columns size 254' );
7 end;
8 /
PL/SQL procedure successfully completed.
Since we don't have an index on table 'T', I would specify
method_opt as
'for all columns size 254' and NOT as
'for all indexed columns size 254'.
And here is what I see after
making method_opt=>'for all columns size 254':
rajesh@ORA11GR2> drop table t purge;
Table dropped.
Elapsed: 00:00:00.40
rajesh@ORA11GR2>
rajesh@ORA11GR2> create table t
2 nologging
3 as
4 select rownum as x,
5 rpad('*',28,'*')||to_char(rownum,'fm00000') as y,
6 sysdate as z,
7 a.*
8 from all_objects a;
Table created.
Elapsed: 00:00:07.30
rajesh@ORA11GR2>
rajesh@ORA11GR2> begin
2 dbms_stats.gather_table_stats(
3 ownname =>user,
4 tabname =>'T',
5 estimate_percent=>dbms_stats.auto_sample_size,
6 method_opt=>'for all columns size 254');
7 end;
8 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:01.42
rajesh@ORA11GR2>
rajesh@ORA11GR2> select column_name,count(*)
2 from user_tab_histograms
3 where table_name ='T'
4 and column_name ='Y'
5 group by column_name;
COLUMN_NAME COUNT(*)
-------------------- --------
Y 255
Elapsed: 00:00:00.00
rajesh@ORA11GR2>
rajesh@ORA11GR2> variable y varchar2(40);
rajesh@ORA11GR2> exec :y := rpad('*',28,'*')||to_char(55,'fm00000');
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.00
rajesh@ORA11GR2>
rajesh@ORA11GR2> set autotrace traceonly explain statistics;
rajesh@ORA11GR2>
rajesh@ORA11GR2> select x,y,z
2 from t
3 where y = :y;
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 47 | 416 (1)| 00:00:05 |
|* 1 | TABLE ACCESS FULL| T | 1 | 47 | 416 (1)| 00:00:05 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("Y"=:Y)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1498 consistent gets
0 physical reads
0 redo size
567 bytes sent via SQL*Net to client
415 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
rajesh@ORA11GR2>
rajesh@ORA11GR2> set autotrace off;
rajesh@ORA11GR2>
rajesh@ORA11GR2>
rajesh@ORA11GR2>
You see the estimated cardinality matches the actual cardinality even for 33 bytes of string data. This clearly proves that the 11GR2 optimizer has improved.
June 01, 2011 - 8:40 am UTC
you are correct in that my dbms_stats was wrong.
You are incorrect about the 32/33/34 byte thing.
Look at the substr(1,32) of your bind. It is:
****************************0005
(the last character is gone). That, coupled with the NDV (number of distinct values) is more than enough to allow the optimizer to come up with the right answer.
The article I've pointed you to clearly shows the optimizer STOPS at 32 characters. Your example does not prove that it goes beyond.
You'll find it extremely difficult to prove something is right with a test case - most often you can use them to prove something is not right or not always right. In this case, my previous article shows it stops at 32 characters. Your test case does not prove that it goes beyond 32 characters. It only shows that with your specific data and the way it is constructed - you "got lucky".
I can show you it doesn't use 33 bytes. Let's construct the test a bit differently:
create table t
nologging
as
select rownum as x,
rpad('*',28,'*')||'0005'||mod(rownum,10) as y,
sysdate as z,
a.*
from all_objects a;
begin
dbms_stats.gather_table_stats(
ownname =>user,
tabname =>'T',
estimate_percent=> dbms_stats.auto_sample_size,
method_opt=>'for columns Y size 254' );
end;
/
Now, the first 32 bytes are constant - but the 33rd byte is one of ten values. Hence, it will retrieve 10% of the data.
AND there are only ten values in this column - if they fit into a histogram, there would be 10 buckets and everything would be perfectly guessable...
but, the optimizer guesses wrong:
ops$tkyte%ORA11GR2> select column_name,count(*)
2 from user_tab_histograms
3 where table_name ='T'
4 group by column_name;
COLUMN_NAME COUNT(*)
------------------------------ ----------
Y 1
ops$tkyte%ORA11GR2> variable y varchar2(40);
ops$tkyte%ORA11GR2> exec :y := rpad('*',28,'*')||'0005' || '8';
PL/SQL procedure successfully completed.
ops$tkyte%ORA11GR2> set serveroutput off
ops$tkyte%ORA11GR2> set termout off
ops$tkyte%ORA11GR2> select * from TABLE(dbms_xplan.display_cursor);
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID bw1r5zgh6zr7x, child number 0
-------------------------------------
select x,y,substr(y,1,32), z from t where y = :y
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 420 (100)| |
|* 1 | TABLE ACCESS FULL| T | 72241 | 3950K| 420 (1)| 00:00:06 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("Y"=:Y)
18 rows selected.
It guessed every row in the table, not 10% of the table. However, if we back off the length of the string to 32 - just by removing one byte:
ops$tkyte%ORA11GR2> create table t
2 nologging
3 as
4 select rownum as x,
5 rpad('*',28,'*')||'005'||mod(rownum,10) as y,
6 sysdate as z,
7 a.*
8 from all_objects a;
Table created.
ops$tkyte%ORA11GR2> begin
2 dbms_stats.gather_table_stats(
3 ownname =>user,
4 tabname =>'T',
5 estimate_percent=> dbms_stats.auto_sample_size,
6 method_opt=>'for columns Y size 254' );
7 end;
8 /
PL/SQL procedure successfully completed.
ops$tkyte%ORA11GR2> select column_name,count(*)
2 from user_tab_histograms
3 where table_name ='T'
4 group by column_name;
COLUMN_NAME COUNT(*)
------------------------------ ----------
Y 10
ops$tkyte%ORA11GR2> variable y varchar2(40);
ops$tkyte%ORA11GR2> exec :y := rpad('*',28,'*')||'005' || '8';
PL/SQL procedure successfully completed.
ops$tkyte%ORA11GR2> set serveroutput off
ops$tkyte%ORA11GR2> set termout off
ops$tkyte%ORA11GR2> select * from TABLE(dbms_xplan.display_cursor);
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID bw1r5zgh6zr7x, child number 0
-------------------------------------
select x,y,substr(y,1,32), z from t where y = :y
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 416 (100)| |
|* 1 | TABLE ACCESS FULL| T | 7044 | 378K| 416 (1)| 00:00:05 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("Y"=:Y)
18 rows selected.
bingo - 10% guess - because the histogram is now useful. (and exact same thing happens in 10g as well)
This *shows* 32 bytes. Yours 'worked' because your first 32 bytes was discriminating enough.
Test cases are excellent at proving something is NOT (always) true. My test case here shows that it doesn't use 33 bytes - if it did, both of the estimates above would have been the same.
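The truncation effect is easy to picture outside the database. A toy sketch in plain Python (NOT Oracle's actual endpoint encoding - the real histogram stores numeric endpoints derived from the leading bytes of the string) of why the 33-byte data collapses to a single histogram value while the 32-byte data yields ten distinct ones:

```python
# Toy model of the 32-character histogram limit: endpoints are computed from
# only a prefix of the string, so values differing past it look identical.
PREFIX = 32  # limit described in the article referenced above

def histogram_ndv(values, prefix=PREFIX):
    """Number of distinct values the histogram can actually 'see'."""
    return len({v[:prefix] for v in values})

pad = '*' * 28
bad  = [pad + '0005' + str(i % 10) for i in range(100)]  # 33 chars, differ at byte 33
good = [pad + '005'  + str(i % 10) for i in range(100)]  # 32 chars, differ at byte 32

print(histogram_ndv(bad))   # 1  -- all values collapse together
print(histogram_ndv(good))  # 10 -- ten distinct endpoints, a useful histogram
```

This mirrors the two SQL cases above: one bucket (useless) versus ten buckets (a 10% guess).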
Histogram on 11GR2
Rajeshwaran, Jeyabal, June 01, 2011 - 11:38 am UTC
Thanks for Teaching me again and again Tom!
execution plan change
Reader, June 15, 2011 - 6:12 pm UTC
we are on 10.2.0.4.0. We gather histograms and our application uses bind variables. The SQL execution plan sometimes changes, causing "unacceptable" performance. This causes business impact, as reports time out. Will the SQL Plan Management feature in 11g address this issue? If so, we can propose to business that upgrading to 11gR2 is the only option. Thanks.
June 17, 2011 - 1:23 pm UTC
what you are seeing is bind variable peeking. You have a query that would be optimized differently depending on which bind input is used to optimize the query
http://www.google.com/search?q=site%3Atkyte.blogspot.com+bind+variable+peeking
One solution could be to NOT gather histograms. You'll get a consistent plan that way. If we do not have statistics that would cause two different plans - you won't get plans that flip flop depending on who parsed them.
You can use sql profiles if you have access to that.
You can use a query plan baseline right now - in 10g. You would create a baseline for that query, and in that baseline would be the "good" plan. We would use that plan.
execution plan change
Reader, June 18, 2011 - 9:51 am UTC
Yes, we do create a SQL profile for that query, but that is after the fact - it has already caused business impact because the report timed out. Assuming we are on 11gR2, does the 11g "SQL plan management" feature help us not accept the "bad" plan arising from a plan change due to bind variable peeking/histograms? Thanks.
June 20, 2011 - 9:49 am UTC
plan baselines already exist in 10g, they are the part of sql plan management you are interested in. sql plan management = 10g baselines PLUS plan evolution (and plan evolution could actually evolve the plan you don't want into the baseline under some circumstances)
so you already have what you need. plan baselines. as stated above.
Histograms on columns having NDV's greater than 2200
sri, June 27, 2011 - 10:42 am UTC
Hi Tom,
I created a table with 1 partition(range) and 1 subpartition(list).
insert into above_table
select * from another_table
where cola='0000012345' and rownum <= 21999.
I generated histograms on cola, and the endpoint number is 21999. Fair.
But if I insert with ... and rownum <= 22000, the endpoint number comes up as something like 5324...
Please advise.
June 27, 2011 - 11:41 am UTC
cola has one value - where do you have NDV's greater than 2200???
give full example to reproduce with and explain what the issue is (not what you see, but what material side effect you are experiencing)
how dbms_stats determin column orders
shu, June 30, 2011 - 2:39 pm UTC
I'm doing a dbms_stats.gather_table_stats on a large table (800GB, 70 columns). I saw the process touch a lot of columns, seemingly at random. Is there a way to find out the order, to help me determine how long it will continue to run?
The METHOD_OPT I used is "col1 size skewonly, col2 size skewonly, col3 size skewonly, col4 size skewonly, col5 size skewonly ... ... "
DB is 11.1
Thanks
July 01, 2011 - 8:59 am UTC
you should be able to monitor that via v$session_longops
A reader, November 03, 2011 - 11:58 am UTC
Hi Tom,
create table t as
select * from all_objects;
create index idx_owner on t(owner);
SQL> select object_id from t where owner='ORDDATA';
OBJECT_ID
----------
58236
58237
58238
58239
58260
Elapsed: 00:00:00.31
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2 Card=5 Bytes=150
)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T' (TABLE) (Cost=2 Card=
5 Bytes=150)
2 1 INDEX (RANGE SCAN) OF 'IDX_OWNER' (INDEX) (Cost=1 Card=5
)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
449 bytes sent via SQL*Net to client
363 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
5 rows processed
I did not gather statistics on the table, so there are NO histograms:
select * from all_histograms where table_name like 'T';
----NULL--
but the CBO estimated the CARDINALITY =5 exactly ..
how was it possible ?
Thanks
November 03, 2011 - 3:07 pm UTC
A reader, November 08, 2011 - 2:36 pm UTC
Hi Tom,
SQL> execute dbms_stats.gather_table_stats('HR','T');
I have 2 questions
1.After executing the above procedure- the histograms were not selected on a column (ID)- why ?
2.what is the 'deciding factor' that Oracle decides to collect histograms ?
Thanks Tom
November 08, 2011 - 4:53 pm UTC
1) how would I know - I have NO context here at all. what is HR.T. why do you think histograms should be on column ID (id columns are many times unique - not skewed).
2) need context here - and I have zero
A reader, November 08, 2011 - 5:41 pm UTC
Sorry Tom,
select id, count(*) from t
group by id
id count(*)
1 1
50 100
999 9999
its just a temp table I created myself
November 09, 2011 - 7:13 am UTC
give the full example and show your stats - like num rows and such.
Also, did you query the table before gathering statistics?
If the table was a true temporary table - no idea if that is the case or not due to the total lack of any useful information - then it could be that dbms_stats (since it commits) wipes out the data:
ops$tkyte%ORA11GR2> create global temporary table t on commit delete rows
2 as
3 select case when rownum = 1 then 1
4 when rownum <= 51 then 50
5 else 999
6 end id, a.*
7 from all_objects a
8 where 1=0
9 /
Table created.
ops$tkyte%ORA11GR2> insert into t
2 select case when rownum = 1 then 1
3 when rownum <= 51 then 50
4 else 999
5 end id, a.*
6 from all_objects a
7 where rownum <= 1050
8 /
1050 rows created.
ops$tkyte%ORA11GR2> select id, count(*) from t group by id;
ID COUNT(*)
---------- ----------
1 1
999 999
50 50
ops$tkyte%ORA11GR2> exec dbms_stats.gather_table_stats( user, 'T' );
PL/SQL procedure successfully completed.
ops$tkyte%ORA11GR2> select num_rows from user_tables where table_name = 'T';
NUM_ROWS
----------
0
Or, it is a real table posing as a temporary table, it could be for the reason documented here:
http://www.oracle.com/technetwork/issue-archive/2006/06-may/o36asktom-099107.html in the section "Why Does My Plan Change?"
for example:
ops$tkyte%ORA11GR2> create table t
2 as
3 select case when rownum = 1 then 1
4 when rownum <= 51 then 50
5 else 999
6 end id, a.*
7 from all_objects a
8 where rownum <= 1050
9 /
Table created.
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> select id, count(*) from t group by id;
ID COUNT(*)
---------- ----------
1 1
999 999
50 50
ops$tkyte%ORA11GR2> exec dbms_stats.gather_table_stats( user, 'T' );
PL/SQL procedure successfully completed.
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> select histogram from user_tab_columns where table_name = 'T' and column_name = 'ID';
HISTOGRAM
---------------
NONE
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> select * from t where id = 1;
ID OWNER OBJECT_NAME
---------- ------------------------------ ------------------------------
SUBOBJECT_NAME OBJECT_ID DATA_OBJECT_ID OBJECT_TYPE
------------------------------ ---------- -------------- -------------------
CREATED LAST_DDL_ TIMESTAMP STATUS T G S NAMESPACE
--------- --------- ------------------- ------- - - - ----------
EDITION_NAME
------------------------------
1 SYS ICOL$
20 2 TABLE
05-SEP-10 05-SEP-10 2010-09-05:15:39:55 VALID N N N 1
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> exec dbms_stats.gather_table_stats( user, 'T' );
PL/SQL procedure successfully completed.
ops$tkyte%ORA11GR2> select histogram from user_tab_columns where table_name = 'T' and column_name = 'ID';
HISTOGRAM
---------------
FREQUENCY
To the "Reader" above...
A reader, November 08, 2011 - 7:35 pm UTC
Do you have the explain plan? It is possibly because of the following 2 reasons:
1. It is a small table and hence does not need an index
2. Since you requested for the entire data it has rightly done a full table scan.
Cheers.
November 09, 2011 - 7:14 am UTC
1) that wouldn't matter, you don't gather histograms exclusively on indexed columns.
You gather histograms in order to get the correct estimated cardinalities - that includes columns used in the where clause that are not indexed as well.
2) that wasn't the question :) the question was "why no histograms, the data is skewed"
A reader, November 09, 2011 - 8:00 am UTC
Value Based Histogram-When to Incorporate
Jim Cox, November 30, 2011 - 5:35 pm UTC
Hi Tom
i am just starting into looking at Value Based Histograms
Excerpt from above where you said:
*****
Followup February 21, 2004 - 2pm Central time zone:
I typically start with the equivalent of "table", "index" and "all indexed columns" myself. If you
have the cycles to compute the histograms, they can be helpful (with bind variable peeking and
such).......
*****
I was wondering just where to start if I wanted to incorporate a Value Based Histogram on a particular table and column?
I am already doing what you stated above, but I am not sure just when to add a Value Based Histogram
If I have some slow-running SQL, would I run a test before and after I incorporate the histogram to see if it speeds up the SQL, or is there some other method I should pursue to determine whether any slow SQL would benefit from the histogram?
Thanks
Jim
December 06, 2011 - 9:49 am UTC
You'll naturally get a value based histogram (frequency) if the column has 254 or fewer distinct values (just use size 254). You'll get a height balanced histogram if you have more than 254 distinct values.
I don't see a point in not using size 254 for everything - if there are fewer than 254 distinct values, that's all we'll store.
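To picture the difference between the two histogram kinds, here is a toy model in Python (a simplification, NOT Oracle's exact bucketing algorithm): a frequency histogram keeps one bucket per distinct value, giving exact counts, while a height-balanced histogram stores the last value of each of N equal-height slices of the sorted data, so a popular value owns many endpoints:

```python
from collections import Counter

# Skewed data: 1 occurs once, 50 occurs 50 times, 999 occurs 999 times.
data = sorted([1] * 1 + [50] * 50 + [999] * 999)

# Frequency histogram: one bucket per distinct value -> exact row counts.
freq = Counter(data)

def height_balanced(values, buckets):
    """Endpoint = last value of each equal-height slice of the sorted data."""
    n = len(values)
    return [values[((i + 1) * n) // buckets - 1] for i in range(buckets)]

print(freq[999])                  # 999 -- exact count for the popular value
print(height_balanced(data, 10))  # 999 dominates every endpoint
```

With only 3 distinct values, the frequency form is clearly the useful one - which is why size 254 is a sensible default when NDV is small.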
Value Based Histogram-When to Incorporate-Thanks
Jim Cox, December 06, 2011 - 4:08 pm UTC
Thanks Tom
Jim
A Reader
awais, December 28, 2011 - 1:15 pm UTC
DB is 10.2.0.2.0
Hi Tom, I am gathering stats with histograms for all columns with size 254:
exec dbms_stats.gather_table_stats('INV','MTL_MATERIAL_TRANSACTIONS',method_opt=>'for all columns size 254')
As you can see, there are 24 organization_ids, each with its number of occurrences within the table MTL_MATERIAL_TRANSACTIONS:
select organization_id,count(organization_id)
from mtl_material_transactions m
group by organization_id
/
ORGANIZATION_ID COUNT(ORGANIZATION_ID)
--------------- -----------------------
84 969421
102 3817258
124 1812622
125 5219763
143 1714215<----
145 3891003
168 1133484
169 3086792
170 1501886
171 2531388
172 1004050
173 2697835
184 798113
185 3242713
204 187333
205 1016991
224 656003
225 2274042
242 255
243 32
264 71905
265 398933
284 5
285 5
As I understand it, after gathering stats with a histogram, the difference between the previous and current cumulative frequencies (endpoint numbers) describes the number of rows a bucket contains. From above, the count for organization_id 143 is 1714215, but from the histogram below, 143 would contain 1933 - 1688 = 245 rows, which contradicts the actual value of 1714215 for 143. What am I missing in the stats gathering? Please clarify.
select column_name,endpoint_number,endpoint_value
from user_tab_histograms h
where h.table_name='MTL_MATERIAL_TRANSACTIONS' and column_name='ORGANIZATION_ID'
/
COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE
------------------ --------------- -------------- ---------------------
ORGANIZATION_ID 699 102
ORGANIZATION_ID 939 124
ORGANIZATION_ID 1688 125
ORGANIZATION_ID 1933 143<-----
ORGANIZATION_ID 2495 145
ORGANIZATION_ID 2660 168
ORGANIZATION_ID 3126 169
ORGANIZATION_ID 3324 170
ORGANIZATION_ID 3690 171
ORGANIZATION_ID 3822 172
ORGANIZATION_ID 4222 173
ORGANIZATION_ID 4321 184
ORGANIZATION_ID 4795 185
ORGANIZATION_ID 4822 204
ORGANIZATION_ID 4954 205
ORGANIZATION_ID 5048 224
ORGANIZATION_ID 5371 225
ORGANIZATION_ID 5383 264
ORGANIZATION_ID 139 84
ORGANIZATION_ID 5442 265
December 29, 2011 - 11:09 am UTC
245 is more than close enough to the right answer - we get worried when estimated cardinalities are off by orders of magnitude (x10); this small difference is not worrisome.
what is the optimizer estimating in plans for that predicate now - something close to reality, or something very far away?
Awais a Reader
Awais, December 30, 2011 - 3:37 am UTC
you said 245 is more than close enough to the right answer...
1) Why doesn't the bucket contain the actual row count for organization_id 143, i.e. 1714215 rows?
you said we get worried when estimated cardinalities are off by orders of magnitude (x10), this small difference is not worrisome...
2) I could not understand that; please bear with me.
You said "what is the optimizer estimating in plans for that predicate now - something close to reality, or something very far away."
Here I go:
select
*
from user_tab_statistics
where table_name='MTL_MATERIAL_TRANSACTIONS'
/
GLOBAL_STATS YES
OBJECT_TYPE TABLE
STALE_STATS NO
USER_STATS NO
TABLE_NAME MTL_MATERIAL_TRANSACTIONS
SAMPLE_SIZE 38093598
NUM_ROWS 38093598<-------------------------
AVG_ROW_LEN 252
BLOCKS 1899357
LAST_ANALYZED 12/27/2011 9:36:08 PM
AVG_SPACE_FREELIST_BLOCKS 0
NUM_FREELIST_BLOCKS 0
EMPTY_BLOCKS 0
AVG_SPACE 0
CHAIN_CNT 0
PARTITION_POSITION
PARTITION_NAME
STATTYPE_LOCKED
SUBPARTITION_NAME
AVG_CACHED_BLOCKS
AVG_CACHE_HIT_RATIO
SUBPARTITION_POSITION
select *
from user_tab_col_statistics
where table_name='MTL_MATERIAL_TRANSACTIONS' and column_name='ORGANIZATION_ID'
/
TABLE_NAME MTL_MATERIAL_TRANSACTIONS
COLUMN_NAME ORGANIZATION_ID
NUM_DISTINCT 20<-------------------------Distinct values are 20 within stats due to last analyzed on 27th Dec
LOW_VALUE C155
HIGH_VALUE C20342
DENSITY 1.32189925035054E-8
NUM_NULLS 0
NUM_BUCKETS 20
LAST_ANALYZED 12/27/2011 9:36:08 PM
SAMPLE_SIZE 5448
GLOBAL_STATS YES
USER_STATS NO
AVG_COL_LEN 4
HISTOGRAM FREQUENCY
select
a.column_name,
apps.display_raw(a.low_value,b.data_type) as low_val,
apps.display_raw(a.high_value,b.data_type) as high_val,
b.data_type
from
user_tab_col_statistics a, user_tab_cols b
where
a.table_name='MTL_MATERIAL_TRANSACTIONS' and
b.column_name='ORGANIZATION_ID' and
a.table_name=b.table_name and
a.column_name=b.column_name
COLUMN_NAME ORGANIZATION_ID
LOW_VAL 84
HIGH_VAL 265
DATA_TYPE NUMBER
By default, for columns that do not have histograms (statistics that relate to data skew), the statistic called density contains the answer for the
selectivity calculation for an equality condition. In my simple example, stats exist so the selectivity would be for an equality as 1/NDV (# distinct values) for the column(s) used in the WHERE clause.
Cardinality = selectivity * rows in table
= 1/20 * 38093598
= 0.05 * 38093598
= 1904680<--------------------------
organization_id is already indexed.
SQL> select /*+ gather_plan_statistics abc*/ count(*)
2 from mtl_material_transactions
3 where organization_id=143
4 /
COUNT(*)
----------
1722507
SQL> SELECT * FROM table(dbms_xplan.display_cursor(NULL,NULL,'ALLSTATS LAST'))
2 /
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
SQL_ID 813dmq90jmyjy, child number 0
-------------------------------------
select /*+ gather_plan_statistics abc*/ count(*) from mtl_material_transactions where organizatio
Plan hash value: 1164751366
----------------------------------------------------------------------------------------------------
| Id | Operation |Name |Starts|E-Rows |A-Rows | A-Time | Buffers | Reads |
----------------------------------------------------------------------------------------------------
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:44.28| 5895 | 5840 |
|* 2 | INDEX RANGE SCAN| MTL_MATERIAL_TRANSACTIONS_N26| 1 | 1713K| 1722K |00:00:44.83| 589
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ORGANIZATION_ID"=143)
18 rows selected.
As I have calculated, the cardinality is 1904680, but the optimizer estimates 1713000. What's wrong with my cardinality calculation? That's why I'm bothered -
either that, or there is an issue with the histogram stats.
December 30, 2011 - 11:28 am UTC
sorry, i thought 143 was the actual row count.
that last bit of information there shows the optimizer is guessing 1713k rows and we are getting 1722k rows - that is very close. What is the issue? The optimizer is getting the proper estimated cardinality here?
stop looking at and trying to interpret buckets and endpoints and the like, look at what the optimizer is actually guessing. And it seems to be guessing correctly?
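Incidentally, the ~1713K guess can be reproduced from the figures shown above: the endpoint delta for 143 (1933 - 1688 = 245) is a count out of the histogram's sample (SAMPLE_SIZE 5448 in user_tab_col_statistics), scaled up to NUM_ROWS. A sketch of the arithmetic (my reading of these particular figures, not an official formula):

```python
# Scale a sampled frequency-histogram bucket up to the full table size.
num_rows = 38_093_598       # NUM_ROWS from user_tab_statistics above
sample_size = 5_448         # SAMPLE_SIZE for ORGANIZATION_ID's histogram
bucket_delta = 1933 - 1688  # endpoint_number delta for value 143

estimate = num_rows * bucket_delta / sample_size
print(round(estimate))      # 1713093, in line with the plan's 1713K E-Rows
```

So the "245 rows" is not a contradiction of the 1714215 actual count - it is 245 out of a 5448-row sample, which extrapolates to almost exactly the right answer.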
Awais
Awais, December 31, 2011 - 9:49 am UTC
Again, the same two silly questions, Tom, please.
I agree the optimizer's guess is perfect, but what about the typical cardinality calculation? Is the statement below wrong, and the cardinality calculation with it?
---------------------------------------------------------
"By default, for columns that do not have histograms (statistics that relate to data skew), the statistic called density contains the answer for the
selectivity calculation for an equality condition. In my simple example, stats exist so the selectivity would be for an equality as 1/NDV (# distinct values) for the column(s) used in the WHERE clause. "
Cardinality = selectivity * rows in table
= 1/20 * 38093598
= 0.05 * 38093598
= 1904680<------
----------------------------------------------------------
If I stop trying to interpret the buckets, then how does Oracle interpret them? I agree 100% that what the optimizer does should be adhered to, but what is the behind-the-scenes picture for such a skewed column's data - how does Oracle interpret the bucket rows from the histogram?
Awais
Awais, January 02, 2012 - 3:43 am UTC
Tom, but why is my cardinality calculation wrong relative to the optimizer's cardinality?
Cardinality = selectivity * rows in table
= 1/20 * 38093598
= 0.05 * 38093598
= 1904680<------
Progress Meter
Hemanshu Sheth, January 03, 2012 - 8:54 am UTC
Is there any function, usable as a select projection, which returns the maximum row count of a table and which could be used as below to show the progress of a table scan so far?
select ((rownum/rowcount())+(1/rowcount()))*100 from large_table
rowcount() doesn't work in Oracle.
In FoxPro, rownum was recno() and rowcount() was reccount().
Regards
Hemanshu
January 03, 2012 - 11:24 am UTC
In order to get the rowcount, you'd have to - well - count all of the rows, which means you wouldn't have to provide a slider so much because you'd know that in order to get the last row - it will just take twice as long as the first (because you would have already hit the last row in order to get the first row).
Oracle does a bit more data management wise than Foxpro did. We do not store a rowcount anywhere.
There is NUM_ROWS from user_tables - however that is only as current as your last statistics gathering. It would tend to give you a fairly good approximation in many cases.
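A sketch of how that could be used (a hypothetical helper, assuming you fetch num_rows from user_tables separately): since the count is only as fresh as the last statistics gathering, treat it as an estimate and clamp the percentage, because the table may have grown since the last gather:

```python
def progress_pct(rows_seen, estimated_total):
    """Progress estimate against a possibly-stale total (e.g. user_tables.num_rows)."""
    # Clamp to 100: the real row count may exceed the stale estimate,
    # and guard against a zero estimate on a never-analyzed table.
    return min(100.0, 100.0 * rows_seen / max(estimated_total, 1))

print(progress_pct(500_000, 1_000_000))    # 50.0
print(progress_pct(1_200_000, 1_000_000))  # 100.0 (estimate lagged reality)
```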
Progress Meter
Hemanshu Sheth, January 04, 2012 - 10:10 am UTC
I was not trying to compare with FoxPro - Oracle is way ahead in data management. But it was a nice option for a FoxPro developer to be able to show a progress meter while rows were being processed from a large DBF.
I thought Oracle might be storing a current record count (excluding deletions) for each table, irrespective of whether stats are gathered or not. That way, basic performance steps like analyzing/stats gathering would not have to be done explicitly.
Any way thanks for your comments.
character limit
D, January 16, 2012 - 10:56 am UTC
does oracle only use the first few characters to form the histogram?
eg on 10.2.0.4:
SQL> exec dbms_stats.gather_table_stats(user, 'MYTAB', method_opt=>'for columns status size skewonly', degree=>12, cascade=>false, estimate_percent=>100);
PL/SQL procedure successfully completed.
SQL> select status, count(*) from MYTAB group by status;
STATUS COUNT(*)
-------------------------------------------------- ----------
IGNORE 62020
PROCESSED 10662504
status is indexed (leading edge).
SQL> explain plan for select * from MYTAB where status = 'PROCESSING';
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
Plan hash value: 2976737173
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10M| 2511M| 568K (1)| 01:53:46 |
|* 1 | TABLE ACCESS FULL| MYTAB | 10M| 2511M| 568K (1)| 01:53:46 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("STATUS"='PROCESSING')
13 rows selected.
SQL> explain plan for select * from MYTAB where status = 'NEW';
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------
Plan hash value: 2344747963
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 31010 | 7479K| 17533 (1)| 00:03:31 |
| 1 | TABLE ACCESS BY INDEX ROWID| MYTAB | 31010 | 7479K| 17533 (1)| 00:03:31 |
|* 2 | INDEX RANGE SCAN | IDX_MYTAB_03 | 31010 | | 358 (1)| 00:00:05 |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("STATUS"='NEW')
14 rows selected.
SQL> explain plan for select * from MYTAB where status = 'IGNORE';
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------
Plan hash value: 2344747963
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 62020 | 14M| 35063 (1)| 00:07:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| MYTAB | 62020 | 14M| 35063 (1)| 00:07:01 |
|* 2 | INDEX RANGE SCAN | IDX_MYTAB_03 | 62020 | | 713 (1)| 00:00:09 |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("STATUS"='IGNORE')
14 rows selected.
SQL>
in the table there are a few transient statuses ... PROCESSING and NEW.
when stats are gathered in the overnight process, there are usually no PROCESSING/NEW rows.
with this set up, it gets PROCESSING badly wrong (i.e. it thinks PROCESSED and PROCESSING have the same cardinality) and makes up a guess for NEW.
January 17, 2012 - 3:06 pm UTC
yes
http://www.oracle.com/technetwork/issue-archive/2011/11-jan/o11asktom-195084.html
but that limit is 32 characters - it does not affect you.
when stats are gathered in the overnight process - a bad idea, isn't it? Think about it - why bother even gathering stats if you are doing it when the table has data that doesn't represent reality? Sounds like you want to either gather or set stats once (when the table is 'good', representative) and LOCK those stats - don't gather anymore
(you might have to update some column stats to set new high values for dates/sequences if applicable over time)
what you are seeing is an issue with missing values in a histogram. It'll clump the guess in with something close. It is because you are gathering stats when the data is not representative of reality.
Consider:
ops$tkyte%ORA11GR2> create table t ( status varchar2(20), data varchar2(20) );
Table created.
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> insert into t (status, data)
2 (
3 select * from (
4 with data as (select level l from dual connect by level <= 3316),
5 big_data as (select rownum r from data, data)
6 select case when rownum <= 62020 then 'IGNORE' else 'PROCESSED' end, 'xxxxxxx'
7 from big_data
8 )
9 )
10 /
10995856 rows created.
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> exec dbms_stats.gather_table_stats( user, 'T', method_opt => 'for columns status size 254' );
PL/SQL procedure successfully completed.
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> set autotrace traceonly explain
ops$tkyte%ORA11GR2> select * from t where status = 'PROCESSED';
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10M| 187M| 9650 (2)| 00:01:56 |
|* 1 | TABLE ACCESS FULL| T | 10M| 187M| 9650 (2)| 00:01:56 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("STATUS"='PROCESSED')
ops$tkyte%ORA11GR2> select * from t where status = 'PROCESSING';
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10M| 187M| 9650 (2)| 00:01:56 |
|* 1 | TABLE ACCESS FULL| T | 10M| 187M| 9650 (2)| 00:01:56 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("STATUS"='PROCESSING')
ops$tkyte%ORA11GR2> select * from t where status = 'NEW';
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 35012 | 615K| 9641 (2)| 00:01:56 |
|* 1 | TABLE ACCESS FULL| T | 35012 | 615K| 9641 (2)| 00:01:56 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("STATUS"='NEW')
ops$tkyte%ORA11GR2> select * from t where status = 'IGNORE';
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 70025 | 1230K| 9641 (2)| 00:01:56 |
|* 1 | TABLE ACCESS FULL| T | 70025 | 1230K| 9641 (2)| 00:01:56 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("STATUS"='IGNORE')
ops$tkyte%ORA11GR2> set autotrace off
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> drop table t;
Table dropped.
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> create table t ( status varchar2(20), data varchar2(20) );
Table created.
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> insert into t (status, data)
2 (
3 select * from (
4 with data as (select level l from dual connect by level <= 3316),
5 big_data as (select rownum r from data, data)
6 select case when rownum <= 62020 then 'IGNORE'
7 when rownum <= 62400 then 'PROCESSING'
8 when rownum <= 62800 then 'NEW'
9 else 'PROCESSED'
10 end, 'xxxxxxx'
11 from big_data
12 )
13 )
14 /
10995856 rows created.
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> exec dbms_stats.gather_table_stats( user, 'T', method_opt => 'for columns status size 254' );
PL/SQL procedure successfully completed.
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> set autotrace traceonly explain
ops$tkyte%ORA11GR2> select * from t where status = 'PROCESSED';
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10M| 187M| 9580 (2)| 00:01:55 |
|* 1 | TABLE ACCESS FULL| T | 10M| 187M| 9580 (2)| 00:01:55 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("STATUS"='PROCESSED')
ops$tkyte%ORA11GR2> select * from t where status = 'PROCESSING';
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1969 | 35442 | 9571 (2)| 00:01:55 |
|* 1 | TABLE ACCESS FULL| T | 1969 | 35442 | 9571 (2)| 00:01:55 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("STATUS"='PROCESSING')
ops$tkyte%ORA11GR2> select * from t where status = 'NEW';
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 985 | 17730 | 9571 (2)| 00:01:55 |
|* 1 | TABLE ACCESS FULL| T | 985 | 17730 | 9571 (2)| 00:01:55 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("STATUS"='NEW')
ops$tkyte%ORA11GR2> select * from t where status = 'IGNORE';
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 66952 | 1176K| 9571 (2)| 00:01:55 |
|* 1 | TABLE ACCESS FULL| T | 66952 | 1176K| 9571 (2)| 00:01:55 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("STATUS"='IGNORE')
ops$tkyte%ORA11GR2> set autotrace off
Notice in the first pass - it gets them wrong - because there were no observations of the values you went looking for when the stats were gathered.
However, in the second case - when we gathered stats (and you should just use size 254, you know it is skewed, just go for it) on representative data - we get really nice estimates.
Note: I didn't index; I was only interested in the cardinalities.
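The clumping Tom describes can be modeled outside the database. What follows is a rough Python sketch, not Oracle internals: it assumes the pre-12c behavior that character histogram endpoints retain only roughly the first six significant bytes of a string, and that a value absent from a frequency histogram is estimated at half the least frequent recorded value. With the original MYTAB counts it reproduces the plans shown above - 'PROCESSING' collapses into 'PROCESSED' (both begin with 'PROCES'), and 'NEW' gets half of IGNORE's 62,020 rows, i.e. the 31,010 guess in the poster's plan.

```python
# Rough model of frequency-histogram cardinality guesses
# (an assumption-laden sketch, not Oracle's actual code).
def endpoint_key(value, prefix_bytes=6):
    # pre-12c: histogram endpoint values keep only ~6 significant bytes
    return value[:prefix_bytes]

# row counts per status at stats-gathering time (from the MYTAB example)
observed = {"IGNORE": 62020, "PROCESSED": 10662504}
histogram = {endpoint_key(k): v for k, v in observed.items()}

def estimate(value):
    key = endpoint_key(value)
    if key in histogram:
        return histogram[key]            # value (or its 6-byte prefix) was seen
    return min(histogram.values()) // 2  # unseen value: half the rarest seen

print(estimate("PROCESSING"))  # → 10662504: clumped with PROCESSED
print(estimate("NEW"))         # → 31010: the guess in the poster's plan
```

The 6-byte prefix is why the clumping happens with *similar* values, while a genuinely different value like 'NEW' just gets the "half the least popular value" guess.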
Why would you gather for "all columns" vs. "all indexed columns"?
Robert, May 09, 2012 - 9:31 am UTC
Tom,
What would be the rationale for gathering histograms for "all columns" vs. "all indexed columns"?
"all INDEXED columns" makes sense... But if a column has skewed data values but it is not indexed, Oracle has no way to get to that data or to know where on disk any particular value is than to blindly read data blocks does it?
So why gather histogram stats on non-indexed columns?
Thanks!
Robert.
May 10, 2012 - 7:20 am UTC
select * from t where indexed_column in (select y from t2 where unindexed_column = 'something');
Now, if it is known that the subquery returns 100 records - maybe the query will use a nested loops approach and an index on the indexed_column.
If it is known that the subquery returns 1,000,000 records - maybe the query will use a hash join approach.
Estimated cardinalities are crucial in order to get correct plans - and we need statistics on columns used in where clauses - period. It really doesn't matter if the column is indexed or not - estimated cardinalities are vital to getting the correct plan.
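To make the "crucial" part concrete, here is a toy cost model with entirely made-up constants (this is not Oracle's costing formula); it only shows how the estimated cardinality coming out of the subquery flips which join method is cheaper:

```python
# Toy join-method chooser; all constants are illustrative, not Oracle's.
def nested_loops_cost(driving_rows, probe_cost=3):
    # one index probe into the big table per driving row
    return driving_rows * probe_cost

def hash_join_cost(build_rows, full_scan_blocks=50_000):
    # full scan the big table once, plus the cost of building the hash table
    return full_scan_blocks + build_rows // 100

for est_rows in (100, 1_000_000):
    nl, hj = nested_loops_cost(est_rows), hash_join_cost(est_rows)
    print(est_rows, "->", "NESTED LOOPS" if nl < hj else "HASH JOIN")
```

With 100 estimated rows the nested-loops plan wins; with 1,000,000 the hash join wins - which is exactly why a wrong estimate on the unindexed column yields the wrong plan.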
Sajan Varughese, May 10, 2012 - 10:35 am UTC
Where are you creating the histogram? I thought you would use dbms_stats to generate histograms. I am not seeing that you are doing it.
May 10, 2012 - 3:24 pm UTC
what part of this page are you talking about?
way back when I first wrote this, it was accepted to use ANALYZE (no more) to gather stats.
way back then I did:
analyze table t compute statistics
for table
for all indexes
for all indexed columns;
that gathered table stats, all index stats, and histograms on all indexed columns.
that was back in the day; today you would use
dbms_stats.gather_table_stats( user, 'T', method_opt=>'for all indexed columns', cascade=>true );
of course
Sajan Varughese, May 11, 2012 - 7:19 am UTC
thanks
I totally get it now
Robert, May 11, 2012 - 3:11 pm UTC
Thank you Tom,
Of course...I see it now.
Thank you for the explanation!
compute statistics on all columns size skewonly
Robert, May 11, 2012 - 3:20 pm UTC
Tom,
1. When an INDEX is built/rebuilt (10.2 and up) Oracle automatically *COMPUTES* all statistics for the index, correct? ... in other words, if an index is built/rebuilt, there is no reason to collect stats on that index as a separate step afterwards, correct?
2. Would you say under most circumstances, for the sake of the CBO, if possible, we should do "COMPUTE statistics.... on ALL COLUMNS size SKEWONLY".... or even "... on ALL COLUMNS size 254"?
3. Since gathering stats appears not to block/lock the table, assuming we have the CPU to spare, what is wrong with letting COMPUTE statistics (e.g. size 254) run for a large table, during normal operations, even though it might take 2 weeks to complete? ... is there any reason not to do this?
Thanks,
Robert.
May 12, 2012 - 12:37 am UTC
1) correct
2) incorrect. In a reporting system - you should use the default METHOD_OPT - see
http://www.oracle.com/technetwork/issue-archive/2006/06-may/o36asktom-099107.html "Why Does My Plan Change?"
In a transactional system with bind variables - you want to be a little more judicious in most cases about when and how histograms are generated.
3) forget CPU - do you want to let it consume that IO? In general, it comes at a cost.
In 11g, dbms_stats was basically rewritten under the covers. You'll find that if you go with the defaults - the gathering of statistics will tend to be a compute and will tend to be faster than it used to be in 10g and before (much faster in most cases)
In 10g and before, using an estimate percentage was common - in order to get valid statistics in a reasonable amount of time.
Remember - if it took weeks to gather - that means for two weeks you were running on non-representative statistics - and when it finished - you were still using non-representative statistics because one presumes some things might have changed in two weeks :)
all columns vs. all indexed columns
Markus, May 17, 2013 - 2:17 pm UTC
Hi Tom,
A third party vendor instructed (or even bothered) us to gather histograms for indexed columns only (method_opt=>'for all indexed columns size auto') by setting table preferences on all their tables using dbms_stats.set_schema_prefs.
They argue using the default of gathering histograms for all columns (method_opt=>'for all columns size auto') would lead to negative performance issues.
How could this be? How can having more histograms and cardinalities (as of calculating value distribution for all columns) lead to bad performance?
Oracle Version = 11.2 EE
Cheers, Markus
May 21, 2013 - 2:29 pm UTC
you gather histograms on columns that have skewed cardinality - and that are used to select rows - regardless of whether they are indexed or not.
consider for example:
select *
from big_table
where indexed_column in (select something
from skewed_table
where unindexed_skewed_column = 42 )
Ok, the plans we could probably consider would be:
a) full scan skewed_table to retrieve something, then use an index on big_table.indexed_column to retrieve the rows.
b) full scan skewed_table and full scan big_table to perform a semi join.
now, which one is better? What if where unindexed_skewed_column = 42 returns a ton of rows? What if where unindexed_skewed_column = 42 returns one row?
what if we don't *know* what where unindexed_skewed_column = 42 might return?
You generate histograms on columns you use in where clauses to select rows and those columns contain data that is skewed in nature.
whether those columns are indexed is not relevant.
all columns vs. all indexed columns
Markus, June 04, 2013 - 10:28 am UTC
yes, skewed columns should have their histograms. but why should i generate histograms on indexed columns only?
it's unlikely that only indexed columns are skewed. it's unlikely that they (the 3rd party vendor) use only indexed columns in their predicates.
it's more likely that they get bad performance because of missing histograms on unindexed but skewed columns.
isn't it?
June 04, 2013 - 2:19 pm UTC
but why should i generate histograms on indexed columns only?
in general, you don't. It (method_opt=>'for all indexed columns') is really good for demos, but in real life it is not very useful. I wish it did not exist.
sometimes your indexed columns are skewed
sometimes they aren't
sometimes your unindexed column are skewed
sometimes they aren't
sometimes your unindexed columns are in predicates
sometimes they are not.
what you want histograms on at most would be
a) skewed columns
b) referenced in predicates
If you just let method_opt default, you'll get that after the database sees some of your queries.
see "Why Does My Plan Change?"
http://www.oracle.com/technetwork/issue-archive/2006/06-may/o36asktom-099107.html
all columns vs. all indexed columns
Markus, June 06, 2013 - 9:24 am UTC
I found an official explanation from the vendor:
<quote>
The reason for using FOR ALL INDEXED COLUMNS SIZE AUTO as the preferred value,
is that we have extensive experience from many client sites, showing that
statistic/histograms on all columns appear to have caused flawed decisions by
the Oracle optimizer for our app. In these investigated cases, it has been seen
that removing the histograms from all columns and ‘only’ having histograms on
the indexed columns (on the tables involved in the query), changed the
Optimizers decision for the better. So these experiences, together with the
knowledge that by far the majority of queries from our app has predicates on
indexed columns, and that in most cases bind variables are used and that bind
peeking is disabled, have led to this general recommendation.
</quote>
Note: We had to set "_optim_peek_user_binds = false" on their request (Version 11.2.0.3 EE).
Does this sound reasonable for you?
June 06, 2013 - 1:53 pm UTC
nope,
if I were running a query:
select *
from really_big_table rbt, another_really_big_table arbt
where rbt.unindexed_column = 5
and arbt.unindexed_column = 42
and rbt.something = arbt.something
would you not like a histogram on those unindexed columns if they were skewed? Otherwise all you would know is a) the number of rows in the table and b) the number of distinct values of the column.
suppose rbt was 100,000,000 rows and its unindexed column has 1,000 distinct values. So, we'll get about 100,000 rows from that rowsource.
further, suppose arbt has 100,000,000 rows and its unindexed column has 10,000 distinct values. So, we'll get about 10,000 rows from that rowsource.
Now, we'd likely come up with full scan and hash arbt into memory, full scan rbt, hash join them. We'd use arbt as the first table to scan and hash since we assume it would be smaller.
But wouldn't it have been cool to know that rbt.unindexed_column = 5 returns just 100 rows? Then we would full scan rbt, and use nested loops to join to arbt.
the goal: getting the optimizer to get the right estimated cardinality so we get the right plan.
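The arithmetic behind those estimates is just "table rows divided by distinct values"; a quick sketch using Tom's hypothetical numbers:

```python
# Without a histogram the optimizer assumes uniformity:
# estimated rows = table rows / number of distinct values.
rbt_rows, rbt_ndv = 100_000_000, 1_000
arbt_rows, arbt_ndv = 100_000_000, 10_000

print(rbt_rows // rbt_ndv)    # 100000 estimated rows from rbt
print(arbt_rows // arbt_ndv)  # 10000 estimated rows from arbt

# A histogram on the skewed column could reveal that the value 5
# actually appears in only 100 rows - flipping the plan to a full
# scan of rbt driving nested loops into arbt.
histogram_estimate_for_5 = 100
print(histogram_estimate_for_5 < rbt_rows // rbt_ndv)  # → True
```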
or what about the new feature in 11g - extended statistics - where you have a query like:
select * from t where (a+b)/2 > 50;
the DBA might well gather stats on (a+b)/2 simply to get the right estimated cardinality - and there isn't any index on (a+b)/2 (in fact, in the case of many extended statistics there are no indexes on the column groups/expressions). Here is a feature for gathering stats on unindexed information in general.
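The payoff of such expression statistics can be simulated; this sketch assumes the commonly cited default guess of 5% selectivity for a range predicate on an expression (the precise heuristic is version-dependent) and compares it with the real selectivity on random data:

```python
import random

# Simulated table with two uniformly distributed columns, a and b.
random.seed(1)
rows = [(random.randint(0, 100), random.randint(0, 100))
        for _ in range(100_000)]

# Without extended stats: a fixed guess (~5% assumed here)
# for a predicate like "(a+b)/2 > 50".
guessed = int(0.05 * len(rows))

# With extended stats on (a+b)/2: the estimate can track reality,
# which for this uniform data is close to 50% of the rows.
actual = sum(1 for a, b in rows if (a + b) / 2 > 50)

print(guessed, actual)  # the gap between guess and reality is
                        # what extended statistics close
```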
You want histograms on columns that you use in predicates. If you only use indexed columns in predicates - fine, but realize please that you are gathering more histograms than you need (you only need them on skewed columns - think of primary keys: what a waste to have a histogram on them, isn't it?)...
some indexed columns need them
some unindexed columns need them
some indexed columns DO NOT need them
some unindexed columns DO NOT need them
all columns vs. all indexed columns
Markus, June 07, 2013 - 1:41 pm UTC
thanks a lot for your opinion.
by the way. does "_optim_peek_user_binds = false" negatively influence or even turn off adaptive cursor sharing?
cheers, markus
June 07, 2013 - 1:45 pm UTC
thanks a lot for your opinion.
actually, not an opinion, a verifiable fact.
gathering histograms on a primary key is a waste of resources (fact - primary key data is never skewed)
not gathering histograms on a skewed, unindexed column can easily lead to the optimizer incorrectly estimating cardinalities - and if you have a bad plan, it is due to bad cardinality estimates 99.99999999999999% of the time (fact)
by the way. does "_optim_peek_user_binds = false" negatively influence or even
turn off adaptive cursor sharing?
it would render it pretty much meaningless, yes. Without bind peeking enabled, the optimizer would always come up with the same plan regardless - there would be nothing to adapt the plan to.
probably just a common way to go
orel, June 07, 2013 - 9:15 pm UTC
Hello Tom, Markus,
I think the point of this software company is just to give some insight to help people deal with their software on Oracle.
I don't think this means you have to follow every recommendation about statistics. For example, Oracle provides an "auto" way to collect statistics, which doesn't mean it is the proper way to go with your data.
I think you are more or less in the same place as Wolfgang Breitling, who used to deal with PeopleSoft software and wrote, a few years ago, some valuable articles about optimizing queries just by using statistics:
http://www.centrexcc.com/
I really think you just have to check with your software company that you can manage statistics yourself without losing support. And then, hopefully, you will be able to collect what you know to be best for your data.