Question and Answer

Tom Kyte

Thanks for the question, Sidharth.

Asked: September 26, 2008 - 2:11 pm UTC

Last updated: November 29, 2010 - 2:21 am UTC

Version: 9.2.0.7

Viewed 1000+ times

You Asked

Hi Tom,

My questions are related to using DYNAMIC_SAMPLING.

a. Does dynamic sampling work the same way in 10g and 9i? If not, what does it do better in 10g?

b. What is the difference between using DYNAMIC_SAMPLING(F 0) and DYNAMIC_SAMPLING(F 10), and in which scenario would we use the higher number?

c. Does using this hint have any additional benefits in star transformation queries?

As a side note, I used to be a DBA around 6 years back, but then my line of work changed a bit, until I found myself doing some Oracle performance tuning last month for a new project - I had never come across dynamic sampling before. Also, your book Expert One on One is still a prized possession of mine from those days - I have it kept securely under lock and key and it is not accessible to me at the moment - I was led to believe that there would never be a second edition so I treated it like treasure (I even refused to lend it to not-so-close friends, fearing I would never get it back). I hear that it is available once again - good for the new guys out there, bad for antique collectors :)

and Tom said...

a) they do the same thing - however, in 9i the default is level 1, in 10g the default is level 2.

b) we fortunately document this stuff! Really!

http://docs.oracle.com/cd/B19306_01/server.102/b14211/stats.htm#i43032


c) this capability can cause a hard parse to gather 'better' statistics right here, right now by sampling the data. Anything that gets us to more reliable, better estimates for cardinalities helps us get the right plan. So, the optimizer would be able to use this information to get the right plan - be that a star transformation or not.

It will not make a star transformation "better" (or worse), it will help the optimizer decide whether to use a star transformation.
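For reference, a quick sketch of the hint syntax (the table name and alias below are made up): the hint can be given at the statement level or per table, and the per-table form must reference the alias used in the query, if there is one:

select /*+ dynamic_sampling(4) */ * from some_table;       -- statement level
select /*+ dynamic_sampling(t 4) */ * from some_table t;   -- table level - note the alias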




Comments

Thanks, Tom

Sid, September 29, 2008 - 4:36 pm UTC

Most helpful.


Question

Anil, September 30, 2008 - 6:42 am UTC

Tom,

<<Start>>
Level 4: Apply dynamic sampling to all tables that meet Level 3 criteria, plus all
tables that have single-table predicates that reference 2 or more columns. The number of blocks sampled is the default number of dynamic sampling blocks. For
unanalyzed tables, the number of blocks sampled is two times the default number of dynamic sampling blocks.

Levels 5, 6, 7, 8, and 9: Apply dynamic sampling to all tables that meet the previous level criteria using 2, 4, 8, 32, or 128 times the default number of dynamic
sampling blocks respectively.

Level 10: Apply dynamic sampling to all tables that meet the Level 9 criteria using
all blocks in the table.
<<End>>

Can you please explain the concept behind "the number of blocks sampled is two times the default number of dynamic sampling blocks"?

And how do we estimate the number of blocks that will be sampled for a given table?

How do we know what value to set for dynamic sampling for a particular query?

Thanks
Tom Kyte
September 30, 2008 - 9:26 am UTC

... Can you please explain the concept behind "the number of blocks sampled is two
times the default number of dynamic sampling blocks"?
...

if there are no statistics to work with at all - we read more blocks during the sample in order to arrive at reasonable statistics.


dynamic sampling can and will change from release to release. You and I do not control the sample size precisely - it'll be a function of the segment size (the percentage read will vary from segment to segment based on the segment size, you can use dbms_monitor.session_trace_enable(binds=>true) to see the sample query and the values used to sample the data)
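A minimal sketch of that tracing approach (assuming a table T and that you can read the generated trace file from user_dump_dest):

exec dbms_monitor.session_trace_enable( binds => true );
select /*+ dynamic_sampling(t 4) */ count(*) from t;    -- the hard parse fires the sample
exec dbms_monitor.session_trace_disable;

If the statement was hard parsed, the trace file will contain the recursive OPT_DYN_SAMP query together with the bind values used in its SAMPLE BLOCK clause.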



... How do we know what value to set for dynamic sampling for a
particular query? ....

well, you have the data on what it does, you would use your knowledge of what that all means to set it. Likely - 3 or 4 is good enough for a reporting/warehouse system (5 and above are just 4 with larger samples). 2 (the default) is likely good for your transactional systems.

Dynamic Sampling Hint

A reader, March 30, 2009 - 12:21 pm UTC

Can the dynamic sampling hint be used on more than one table in a SQL? If yes, what's the syntax?

If the SQL has a subquery using EXISTS and NOT EXISTS, does it make sense to use dynamic sampling on both the main tables and the tables in the subquery?
Tom Kyte
March 30, 2009 - 5:22 pm UTC

https://www.oracle.com/technetwork/issue-archive/2009/09-jan/o19asktom-086775.html

and yes, you can dynamically sample as many as you want.


... If the SQL has a subquery using EXISTS and NOT EXISTS, does it make sense to
use dynamic sampling on both the main tables and the tables in the subquery? ...

it *depends*. Did we have to guess at some estimated cardinality? Were we missing statistics?

A reader, March 31, 2009 - 5:39 pm UTC

Hi Tom, it's a very good article on dynamic sampling. However, it didn't answer my original question on how to use a hint to dynamically sample more than one table. Is it even possible to do this through a hint?
Tom Kyte
April 01, 2009 - 8:41 am UTC

I answered your question with YES

... and yes, you can dynamically sample as many as you want. ....



just put it in there over and over again.


ops$tkyte%ORA10GR2> /*
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> drop table t1;
ops$tkyte%ORA10GR2> drop table t2;
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create table t1 as
ops$tkyte%ORA10GR2> select object_id,
ops$tkyte%ORA10GR2>        decode( mod(object_id,2), 0, 'Y', 'N' ) f1,
ops$tkyte%ORA10GR2>        decode( mod(object_id,2), 0, 'N', 'Y' ) f2
ops$tkyte%ORA10GR2>  from all_objects
ops$tkyte%ORA10GR2> /
ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T1' );
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create table t2 as
ops$tkyte%ORA10GR2> select object_id,
ops$tkyte%ORA10GR2>        decode( mod(object_id,2), 0, 'Y', 'N' ) f1,
ops$tkyte%ORA10GR2>        decode( mod(object_id,2), 0, 'N', 'Y' ) f2
ops$tkyte%ORA10GR2>  from t1
ops$tkyte%ORA10GR2> /
ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T2' );
ops$tkyte%ORA10GR2> */
ops$tkyte%ORA10GR2> @at
ops$tkyte%ORA10GR2> column PLAN_TABLE_OUTPUT format a72 truncate
ops$tkyte%ORA10GR2> set autotrace traceonly explain
ops$tkyte%ORA10GR2> select * from t1, t2 where t1.object_id = t2.object_id and t1.f1='N' and t1.f2='N' and t2.f1='Y' and t2.f2='Y';

Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835

------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 12463 |   194K|    79  (14)| 00:00:0
|*  1 |  HASH JOIN         |      | 12463 |   194K|    79  (14)| 00:00:0
|*  2 |   TABLE ACCESS FULL| T2   | 12464 | 99712 |    39  (13)| 00:00:0
|*  3 |   TABLE ACCESS FULL| T1   | 12472 | 99776 |    39  (13)| 00:00:0
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID")
   2 - filter("T2"."F2"='Y' AND "T2"."F1"='Y')
   3 - filter("T1"."F1"='N' AND "T1"."F2"='N')

<b>no dynamic sampling, we don't know that YY and NN do not exist...</b>

ops$tkyte%ORA10GR2> select /*+ dynamic_sampling(t1 3) */ * from t1, t2 where t1.object_id = t2.object_id and t1.f1='N' and t1.f2='N' and t2.f1='Y' and t2.f2='Y';

Execution Plan
----------------------------------------------------------
Plan hash value: 1838229974

------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    16 |    78  (13)| 00:00:0
|*  1 |  HASH JOIN         |      |     1 |    16 |    78  (13)| 00:00:0
|*  2 |   TABLE ACCESS FULL| T1   |     1 |     8 |    39  (13)| 00:00:0
|*  3 |   TABLE ACCESS FULL| T2   | 12464 | 99712 |    39  (13)| 00:00:0
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID")
   2 - filter("T1"."F1"='N' AND "T1"."F2"='N')
   3 - filter("T2"."F2"='Y' AND "T2"."F1"='Y')

Note
-----
   - dynamic sampling used for this statement

<b>we can do table T1</b>

ops$tkyte%ORA10GR2> select /*+ dynamic_sampling(t2 3) */ * from t1, t2 where t1.object_id = t2.object_id and t1.f1='N' and t1.f2='N' and t2.f1='Y' and t2.f2='Y';

Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835

------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    16 |    78  (13)| 00:00:0
|*  1 |  HASH JOIN         |      |     1 |    16 |    78  (13)| 00:00:0
|*  2 |   TABLE ACCESS FULL| T2   |     1 |     8 |    39  (13)| 00:00:0
|*  3 |   TABLE ACCESS FULL| T1   | 12472 | 99776 |    39  (13)| 00:00:0
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID")
   2 - filter("T2"."F2"='Y' AND "T2"."F1"='Y')
   3 - filter("T1"."F1"='N' AND "T1"."F2"='N')

Note
-----
   - dynamic sampling used for this statement

<b>we can do table T2</b>

ops$tkyte%ORA10GR2> select /*+ dynamic_sampling(t1 3) dynamic_sampling(t2 3) */ * from t1, t2 where t1.object_id = t2.object_id and t1.f1='N' and t1.f2='N' and t2.f1='Y' and t2.f2='Y';

Execution Plan
----------------------------------------------------------
Plan hash value: 1838229974

------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    16 |    78  (13)| 00:00:0
|*  1 |  HASH JOIN         |      |     1 |    16 |    78  (13)| 00:00:0
|*  2 |   TABLE ACCESS FULL| T1   |     1 |     8 |    39  (13)| 00:00:0
|*  3 |   TABLE ACCESS FULL| T2   |     1 |     8 |    39  (13)| 00:00:0
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID")
   2 - filter("T1"."F1"='N' AND "T1"."F2"='N')
   3 - filter("T2"."F2"='Y' AND "T2"."F1"='Y')

Note
-----
   - dynamic sampling used for this statement

<b>and we can do t1 and t2...</b>

ops$tkyte%ORA10GR2> set autotrace off


I didn't understand

A reader, September 05, 2009 - 10:28 am UTC

Hi,Tom

I didn't understand the sample case you gave above.
If we don't analyze the table, Oracle will choose dynamic sampling to get a correct cardinality, here it is 1 row.

Execution Plan
----------------------------------------------------------
Plan hash value: 1838229974

------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    16 |    78  (13)| 00:00:0
|*  1 |  HASH JOIN         |      |     1 |    16 |    78  (13)| 00:00:0
|*  2 |   TABLE ACCESS FULL| T1   |     1 |     8 |    39  (13)| 00:00:0
|*  3 |   TABLE ACCESS FULL| T2   |     1 |     8 |    39  (13)| 00:00:0
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID")
   2 - filter("T1"."F1"='N' AND "T1"."F2"='N')
   3 - filter("T2"."F2"='Y' AND "T2"."F1"='Y')

Note
-----
   - dynamic sampling used for this statement


but if we analyze the table, Oracle will not choose dynamic sampling, as you said, "no dynamic sampling, we don't know that YY and NN do not exist..."
ops$tkyte%ORA10GR2> select * from t1, t2 where t1.object_id = t2.object_id and t1.f1='N' and 
t1.f2='N' and t2.f1='Y' and t2.f2='Y';

Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835

------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 12463 |   194K|    79  (14)| 00:00:0
|*  1 |  HASH JOIN         |      | 12463 |   194K|    79  (14)| 00:00:0
|*  2 |   TABLE ACCESS FULL| T2   | 12464 | 99712 |    39  (13)| 00:00:0
|*  3 |   TABLE ACCESS FULL| T1   | 12472 | 99776 |    39  (13)| 00:00:0
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID")
   2 - filter("T2"."F2"='Y' AND "T2"."F1"='Y')
   3 - filter("T1"."F1"='N' AND "T1"."F2"='N')


So
1. Is it ALWAYS correct to analyze a table?
2. Does dynamic sampling do the same thing as analyzing the table, or just a very small analyze? If so, why does dynamic sampling know "that YY and NN do not exist..." but the gathered statistics don't?


And please give me a comment on this thread, I have added more information:
http://asktom.oracle.com/pls/asktom/f?p=100:24:0::NO::P24_ID:1924289100346304181

Thank you !

Best regards

Alan Tan

Tom Kyte
September 06, 2009 - 9:29 am UTC

... If we dont analyze the table ...


but I did gather statistics on the table, so I don't understand your question?



1) nothing is always true. Even this statement :)

2) very very very small subset. Gathering statistics does not do the YY NN thing because in general there are dozens of columns and millions of combinations to consider.


... And please give me a comment on this thead,I have added more information:
....

don't do that, that is just plain annoying. I answer when I can, when I have time - I just arrived in India after being in a plane for a day. I'll look at your stuff - when I get time.


I'm very sorry for my words, I apologize.

A reader, September 07, 2009 - 11:08 pm UTC


dynamic_sampling

A reader, September 11, 2009 - 9:32 am UTC

So does it mean we need to always opt for dynamic sampling, since gathering statistics is not aware of YY and NN?

If so, then if we have a huge number of rows in the tables, dynamic sampling doesn't calculate the exact number of rows, leading to an imprecise execution plan.


Tom Kyte
September 14, 2009 - 12:05 pm UTC

... So does it mean we need to always opt for dynamic sampling, since gathering
statistics is not aware of YY and NN?
...

of course not.

read
https://www.oracle.com/technetwork/issue-archive/2009/09-jan/o19asktom-086775.html

Dynamic Sampling

Abhisek, April 07, 2010 - 7:16 am UTC

Hi Tom,

One little question:

Now we know that dynamic sampling is always used when statistics are not present in the database for the table, i.e. the table has never been analyzed. It takes the default values defined for it. We can change it through hints to use/not use dynamic sampling.

But what happens if the table is analyzed and all the statistics are up-to-date.

Also, what are the disadvantages of dynamic sampling, other than that it considers more blocks than actual for the sample?

Please correct me wherever my understanding is wrong and give your precious suggestions.
Tom Kyte
April 12, 2010 - 8:38 pm UTC

Now we know that Dynamic sampling is always used

depends on the release - 8i and before - no sampling, did not exist.

9i, sampling defaults to level 1, we sample if the table has NO stats AND there are no indexes on the table (think global temporary tables)

10g, sampling defaults to level 2, we sample if the table has no stats.

But what happens if the table is analyzed and all the statistics are
up-to-date.


it depends on the level, see:
https://www.oracle.com/technetwork/issue-archive/2009/09-jan/o19asktom-086775.html

level 3 and above will sample sometimes even if the stats are all there.


Also, what are the disadvantages of dynamic sampling, other than that it considers
more blocks than actual for the sample?



it doesn't consider MORE blocks - it considers sampling blocks - it considers doing more work during parsing. See the reference article for more discussion.


dynamic sampling true or myth?

Veera, September 20, 2010 - 9:29 pm UTC

Tom,

We use Oracle 10g Rel 2, and our application uses lots of Global Temporary Tables (GTTs). As you know, we often encounter performance issues due to GTTs. My colleagues don't want me to use dynamic sampling, for the following reasons:

1. If queries use bind variables, dynamic sampling will never work.
2. Default dynamic sampling set at the db level will not work if we have an index on the GTT.

Are they true or myths?

All the queries in our application (OLTP) use bind variables and most GTTs have indexes.





Tom Kyte
September 21, 2010 - 3:39 pm UTC


1) false - what does one (bind variables) have to do with the other (dynamic sampling)?

2) false - if you have dynamic sampling set to level 2 (default in 10g) then any unanalyzed table will be sampled - regardless of the indexing - at hard parse time.

In 9ir2 (and only in 9ir2) the default was level one which would skip the sampling IF the object had an index.
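A sketch to verify this on an indexed, unanalyzed global temporary table (all names made up; 10g with the default level 2):

create global temporary table gtt ( x int, y varchar2(30) ) on commit preserve rows;
create index gtt_idx on gtt(x);
insert into gtt select rownum, 'xxx' from all_objects where rownum <= 10000;
set autotrace traceonly explain
select * from gtt where x = 42;
set autotrace off

The plan should carry the note "dynamic sampling used for this statement" despite the index.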

read this:
http://www.oracle.com/technetwork/issue-archive/2009/09-jan/o19asktom-086775.html

Hard Parsing and dynamic sampling

Mahesh, September 21, 2010 - 9:26 pm UTC

Tom,

1. Default dynamic sampling (level 2 in 10g, set at the db level) will be done for all global temp tables / persistent tables even if the dynamic_sampling hint is not used at all in the query. Am I right, or do we need to use the hint?

2. In a multi-user environment, can I use 'truncate' on global temp tables?

Thank you.


Tom Kyte
September 23, 2010 - 9:47 am UTC

1) as per the article referenced above, the default setting is 2 for all queries, you would have to OVERRIDE that by

a) changing the init.ora
b) using a hint
c) altering your session

but by default, out of the box, every query is subject to level 2.
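As a sketch, those three override points look like this (level 4 is just an example value; the alter system form requires the appropriate privilege):

-- a) instance wide (spfile/init.ora):
alter system set optimizer_dynamic_sampling = 4 scope = both;

-- b) per statement, via a hint:
select /*+ dynamic_sampling(4) */ * from some_table;

-- c) per session:
alter session set optimizer_dynamic_sampling = 4;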

2) yes. Every session has its own global temporary table segment - its own private one. In fact, if you have two users and user1 has a temp tablespace of TEMP1 and user2 has a temp tablespace of TEMP2 - then user1 will use temp1 to allocate their temp segment and user2 will use temp2 to allocate theirs. They are completely disjoint from each other.
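A quick sketch of that privacy (assuming a GTT named gtt created with on commit preserve rows; run the two halves in two different sessions):

-- session 1:
insert into gtt select rownum from all_objects where rownum <= 100;
select count(*) from gtt;   -- 100

-- session 2, concurrently:
select count(*) from gtt;   -- 0: its own, empty, private segment
truncate table gtt;         -- truncates only session 2's segment

-- back in session 1:
select count(*) from gtt;   -- still 100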

Dynamic Sampling and CHECK

Parthiban Nagarajan, October 15, 2010 - 9:24 am UTC

Hi Tom

Please have a look at the SQL and its output below
Dynamic sampling, with the CHECK constraint in place, gives me the correct estimate of cardinality;
but after I gather the stats, the cardinality estimate seems to be weird. Yes, the CHECK constraint is still in place.

What am I doing wrong?
(The version also given there)


SQL> create table t7 as select cast(rownum - 1 as int) as i1, cast(rownum as int) as i2, a.*
  2  from all_objects a;

Table created.

SQL> alter table t7 modify ( i1 check(i1 >= 0), i2 check(i2 >= 1) );

Table altered.

SQL> alter table t7 add constraint t7_i1_lt_i2_ck check (i1 < i2);

Table altered.

SQL> desc t7
 Name                                                             Null?    Type
 ---------------------------------------------------------------- -------- --------------------------------------------
 I1                                                                        NUMBER
 I2                                                                        NUMBER
 OWNER                                                                     VARCHAR2(30)
 OBJECT_NAME                                                               VARCHAR2(30)
 SUBOBJECT_NAME                                                            VARCHAR2(30)
 OBJECT_ID                                                                 NUMBER
 DATA_OBJECT_ID                                                            NUMBER
 OBJECT_TYPE                                                               VARCHAR2(19)
 CREATED                                                                   DATE
 LAST_DDL_TIME                                                             DATE
 TIMESTAMP                                                                 VARCHAR2(19)
 STATUS                                                                    VARCHAR2(7)
 TEMPORARY                                                                 VARCHAR2(1)
 GENERATED                                                                 VARCHAR2(1)
 SECONDARY                                                                 VARCHAR2(1)

SQL> set autot traceonly
SQL> select * from t7 where i1 < i2;

49395 rows selected.


Execution Plan
----------------------------------------------------------
   0
SELECT STATEMENT Optimizer=ALL_ROWS (Cost=212 Card=49551 Bytes=7630854)


   1    0
  TABLE ACCESS (FULL) OF 'T7' (TABLE) (Cost=212 Card=49551 Bytes=7630854)






Statistics
----------------------------------------------------------
        369  recursive calls
          0  db block gets
       4119  consistent gets
          0  physical reads
          0  redo size
    2844771  bytes sent via SQL*Net to client
      36720  bytes received via SQL*Net from client
       3294  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      49395  rows processed

SQL> select * from t7 where i1 > i2;

no rows selected


Execution Plan
----------------------------------------------------------
   0
SELECT STATEMENT Optimizer=ALL_ROWS (Cost=211 Card=1 Bytes=154)


   1    0
  TABLE ACCESS (FULL) OF 'T7' (TABLE) (Cost=211 Card=1 Bytes=154)






Statistics
----------------------------------------------------------
         48  recursive calls
          0  db block gets
        828  consistent gets
          0  physical reads
          0  redo size
       1058  bytes sent via SQL*Net to client
        373  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

SQL> exec dbms_stats.gather_table_stats(user, 't7', cascade => true);

PL/SQL procedure successfully completed.

SQL> select * from t7 where i1 > i2;

no rows selected


Execution Plan
----------------------------------------------------------
   0
SELECT STATEMENT Optimizer=ALL_ROWS (Cost=212 Card=24696 Bytes=2543688)


   1    0
  TABLE ACCESS (FULL) OF 'T7' (TABLE) (Cost=212 Card=24696 Bytes=2543688)






Statistics
----------------------------------------------------------
         42  recursive calls
          0  db block gets
        753  consistent gets
          0  physical reads
          0  redo size
       1058  bytes sent via SQL*Net to client
        373  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

SQL> select * from t7 where i1 < i2;

49395 rows selected.


Execution Plan
----------------------------------------------------------
   0
SELECT STATEMENT Optimizer=ALL_ROWS (Cost=211 Card=24699 Bytes=2543997)


   1    0
  TABLE ACCESS (FULL) OF 'T7' (TABLE) (Cost=211 Card=24699 Bytes=2543997)






Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       3994  consistent gets
          0  physical reads
          0  redo size
    2844771  bytes sent via SQL*Net to client
      36720  bytes received via SQL*Net from client
       3294  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      49395  rows processed

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.1.0.2.0 - Prod
PL/SQL Release 10.1.0.2.0 - Production
CORE    10.1.0.2.0      Production
TNS for 32-bit Windows: Version 10.1.0.2.0 - Production
NLSRTL Version 10.1.0.2.0 - Production


Tom Kyte
October 15, 2010 - 9:58 am UTC

dynamic sampling would have taken your specific where clause into consideration. If you were to trace during the first hard parse, you would see a query like:

SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE
  NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false')
  NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),:"SYS_B_0"),
  NVL(SUM(C2),:"SYS_B_1")
FROM
 (SELECT /*+ IGNORE_WHERE_CLAUSE NO_PARALLEL("T7") FULL("T7")
  NO_PARALLEL_INDEX("T7") */ :"SYS_B_2" AS C1, CASE WHEN "T7"."I1"<"T7"."I2"
  AND "T7"."I2">:"SYS_B_3" THEN :"SYS_B_4" ELSE :"SYS_B_5" END AS C2 FROM
  "T7" SAMPLE BLOCK (:"SYS_B_6" , :"SYS_B_7") SEED (:"SYS_B_8") "T7")
  SAMPLESUB



see how it actually incorporated your where clause into it. Having that in there adds to the fidelity of the statistics available.

when you dbms_stats.gather_table_stats - you get (in 10g) at most stats on the table, on individual columns, and on indexes - but not on expressions like "x < y". For that you would need a SQL profile or dynamic sampling, for example:



ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats(user, 't7', cascade => true);

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> set autot traceonly
ops$tkyte%ORA10GR2> select * from t7 after_gather where i1 < i2;

50086 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3979761704

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 25044 |  2519K|    57   (4)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T7   | 25044 |  2519K|    57   (4)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("I1"<"I2" AND "I2">0)


Statistics
----------------------------------------------------------
         22  recursive calls
          0  db block gets
       4055  consistent gets
          0  physical reads
          0  redo size
    3018098  bytes sent via SQL*Net to client
      37129  bytes received via SQL*Net from client
       3341  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      50086  rows processed

ops$tkyte%ORA10GR2> set autotrace off
ops$tkyte%ORA10GR2> set autot traceonly
ops$tkyte%ORA10GR2> select /*+ dynamic_sampling(after_gather 3) */ * from t7 after_gather  where i1 < i2;

50086 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3979761704

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 50086 |  5037K|    58   (6)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T7   | 50086 |  5037K|    58   (6)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("I1"<"I2" AND "I2">0)

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
         47  recursive calls
          0  db block gets
       4177  consistent gets
          0  physical reads
          0  redo size
    3018098  bytes sent via SQL*Net to client
      37129  bytes received via SQL*Net from client
       3341  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      50086  rows processed

ops$tkyte%ORA10GR2> set autotrace off


when I use dynamic sampling set to level 3 - the optimizer will sample WHEN there are statistics in place but it was forced to "guess" at some cardinality value.

see
http://www.oracle.com/technetwork/issue-archive/2009/09-jan/o19asktom-086775.html

for more info.

Dynamic sampling results seem to differ when no stats are present vs when stats were deleted

stats confused, October 16, 2010 - 7:00 am UTC

Hi Tom,

I have a bit of a dilemma. It seems that the dynamic sampling produces different results (i.e. plans) when no stats were ever present vs when the stats were removed.

Here is the test case I played with (my table has 3 columns: id NUMBER, txt VARCHAR2(16), dt DATE):

Step 1: explain plan with trace 10053 level 2
alter session set events '10053 trace name context forever, level 2';

<10053 - snippet>
***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: TBL1  Alias: TBL1  (NOT ANALYZED)
    #Rows: 46395  #Blks:  568  AvgRowLen:  100.00
Index Stats::
  Index: TBL1_ID_DT_IDX  Col#: 1 3
    LVLS: 2  #LB: 505  #DK: 3  LB/K: 168.00  DB/K: 236.00  CLUF: 709.00
***************************************
</10053 - snippet>
...
<10053 - snippet>
  Access Path: TableScan
    Cost:  128.03  Resp: 128.03  Degree: 0
      Cost_io: 126.00  Cost_cpu: 58323638
      Resp_io: 126.00  Resp_cpu: 58323638
  Access Path: index (AllEqRange)
    Index: TBL1_ID_DT_IDX
    resc_io: 408.00  resc_cpu: 24205008
    ix_sel: 0.33333  ix_sel_with_filters: 0.33333
    Cost: <b>408.84</b>  Resp: 408.84  Degree: 1
  Best:: AccessPath: TableScan
         Cost: 128.03  Degree: 1  Resp: 128.03  Card: 15.82  Bytes: 0
</10053 - snippet>


Step 2: gather stats on the table and index, emulating the nightly database wide auto job:
begin
     dbms_stats.gather_table_stats(ownname=>'scott', 
           tabname=>'tbl1',
           cascade=>true,
           method_opt=>'for all columns size auto');
end;
/


Explain plan and trace again:
<10053 - snippet>
***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: TBL1  Alias: TBL1
    #Rows: 162856  #Blks:  568  AvgRowLen:  17.00
Index Stats::
  Index: TBL1_ID_DT_IDX  Col#: 1 3
    LVLS: 2  #LB: 504  #DK: 3  LB/K: 168.00  DB/K: 236.00  CLUF: 709.00
***************************************
</10053 - snippet>
...
<10053 - snippet>
  Access Path: TableScan
    Cost:  128.91  Resp: 128.91  Degree: 0
      Cost_io: 126.00  Cost_cpu: 83844418
      Resp_io: 126.00  Resp_cpu: 83844418
  Access Path: index (AllEqRange)
    Index: TBL1_ID_DT_IDX
    resc_io: 306.00  resc_cpu: 18153561
    ix_sel: 0.25  ix_sel_with_filters: 0.25
    Cost: <b>306.63</b>  Resp: 306.63  Degree: 1
  Best:: AccessPath: TableScan
         Cost: 128.91  Degree: 1  Resp: 128.91  Card: 40714.00  Bytes: 0
</10053 - snippet>


Step 3: remove statistics on both table and index and trace the explain plan again

begin
     dbms_stats.delete_table_stats('scott', 'tbl1');
     dbms_stats.delete_index_stats('scott', 'tbl1_id_dt_idx');
end;
/


<10053 - snippet>
***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: TBL1  Alias: TBL1  (NOT ANALYZED)
    #Rows: 46395  #Blks:  568  AvgRowLen:  100.00
Index Stats::
  Index: TBL1_ID_DT_IDX  Col#: 1 3    (NOT ANALYZED)
    LVLS: 1  #LB: 25  #DK: 100  LB/K: 1.00  DB/K: 1.00  CLUF: 800.00
</10053 - snippet>

<10053 - snippet>
  Access Path: TableScan
    Cost:  128.03  Resp: 128.03  Degree: 0
      Cost_io: 126.00  Cost_cpu: 58323638
      Resp_io: 126.00  Resp_cpu: 58323638
  Access Path: index (AllEqRange)
    Index: TBL1_ID_DT_IDX
    resc_io: 9.00  resc_cpu: 74693
    ix_sel: 0.01  ix_sel_with_filters: 0.01
    Cost: <b>9.00</b>  Resp: 9.00  Degree: 1
  Best:: AccessPath: IndexRange  Index: TBL1_ID_DT_IDX
         Cost: 9.00  Degree: 1  Resp: 9.00  Card: 15.82  Bytes: 0
</10053 - snippet>



I am puzzled as to why the index access cost is estimated so differently when no stats were ever gathered vs. when stats were gathered with SIZE AUTO vs. after the stats were deleted.

Can you please explain why this is happening? (Please note how the index access cost changes throughout the test case, becoming the lowest after the stats were deleted.)

Database version is 10.2.0.4.
Tom Kyte
November 17, 2010 - 5:42 am UTC




what would have been really cool would have been to have an actual test case to test with - so that I myself could reproduce your findings and discuss them. sigh. I never seem to get that - even though I give them to you guys all of the time. sigh.



here is my GUESS.

You created the table.
You filled the table.
You then created an index.

In Oracle 10g and above, creating the index gathered statistics on it: CREATE INDEX always issues a COMPUTE STATISTICS on the index during the create/rebuild if the index is not empty.

So, in your first case - you ran with no stats on the table, but stats on the index. Dynamic sampling kicked in and sampled the table.

In the second case, stats all around - no sampling required. You used AUTO_SAMPLE_SIZE as the estimate percent by default - so it was a sample, not a compute - hence the numbers were fractionally different.
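

(If you wanted that gather to be a full compute - matching what CREATE INDEX does for the index - rather than a sample, estimate_percent can be set explicitly. A sketch, not part of the original test case:)

```sql
begin
   dbms_stats.gather_table_stats(
        ownname          => 'scott',
        tabname          => 'tbl1',
        estimate_percent => 100,   -- full compute instead of the default auto_sample_size
        cascade          => true,
        method_opt       => 'for all columns size auto');
end;
/
```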

In the third case, no stats on anything, you used dynamic sampling for everything for the first time.


So, you are comparing:

a) COMPUTE statistics on the index, with
b) ESTIMATE statistics on the index, with
c) a dynamic sample of the index.
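
(A quick way to confirm which objects actually carry statistics at each step - a sketch using the standard dictionary views, with the object names taken from the test case above:)

```sql
-- table stats: NUM_ROWS/LAST_ANALYZED are NULL when the table was never analyzed
select table_name, num_rows, last_analyzed
  from user_tables
 where table_name = 'TBL1';

-- index stats: in 10g and above, CREATE INDEX populates these automatically
select index_name, num_rows, last_analyzed
  from user_indexes
 where index_name = 'TBL1_ID_DT_IDX';
```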

RE: dynamic sampling results seem to differ when no stats are present vs when stats were deleted

stats confused, November 16, 2010 - 12:32 pm UTC

The intention was to post the test case, which I wish I did, because I can't seem to be able to reproduce it myself either...

Your assumption was right: in the first case I only had stats on the indexes, in the second case stats on both table and indexes, and in the third case no stats.
You were also right on the fact that I first populated the table and then created the indexes.

I can see very little variation in the I/O cost for the FTS, but rather large variations for the index. Could you please shed some light on why resc_io for the index TBL1_ID_DT_IDX (AllEqRange) is so different in the 3 cases? (408.00 vs 306.00 vs 9.00)
Tom Kyte
November 17, 2010 - 5:47 am UTC

if you had a test case so that we had some "facts" to deal with, we could likely explain why the costs were what they were. Without it - we'd be hypothesizing, guessing and so on.

but just look at the index statistics reported:

LVLS: 2 #LB: 504 #DK: 3 LB/K: 168.00 DB/K: 236.00 CLUF: 709.00
LVLS: 1 #LB: 25 #DK: 100 LB/K: 1.00 DB/K: 1.00 CLUF: 800.00


the last set of stats shows the index to be radically different looking - something like that would definitely affect the io resources.

It is just that the statistics are radically different in each of the three cases. Different numbers - different costs.
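
(Those 10053 fields map directly onto dictionary columns, so you can compare them without a trace - a sketch against USER_INDEXES, object name from the test case; LVLS/#LB/#DK/LB/K/DB/K/CLUF correspond to the columns below:)

```sql
select blevel,                    -- LVLS
       leaf_blocks,               -- #LB
       distinct_keys,             -- #DK
       avg_leaf_blocks_per_key,   -- LB/K
       avg_data_blocks_per_key,   -- DB/K
       clustering_factor          -- CLUF
  from user_indexes
 where index_name = 'TBL1_ID_DT_IDX';
```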

Dynamic_sampling in 10gR2

Rajeshwaran Jeyabal, November 28, 2010 - 9:00 pm UTC

test@IRDST1> SELECT en.enc_key
  2  FROM test.ir_enc en,
  3    test.ir_diag d
  4  WHERE en.new_max_dt     > (to_date('09302009','mmddyyyy') - 1) 
  5  AND en.enc_key    = d.enc_key
  6  AND d.dc_key       = 153384
  7  AND ( ( d.rc_st_key = 5
  8  AND d.rec_status_key    = 3)
  9  OR ( d.rc_st_key    = 6
10  AND d.rec_status_key    = 3)
11  OR ( d.rc_st_key    = 10
12  AND d.rec_status_key   IN (8,10)) );

17 rows selected.

Elapsed: 00:01:58.14

Execution Plan
----------------------------------------------------------
Plan hash value: 1992610540

-----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name                | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                     |  6322 |   203K|   651 (100)| 00:00:08 |      |      |         |
|   1 |  PX COORDINATOR          |                     |       |       |            |          |      |      |         |
|   2 |   PX SEND QC (RANDOM)    | :TQ10001            |  6322 |   203K|   651 (100)| 00:00:08 |  Q1,01 | P->S | QC (RAND)  |
|*  3 |    HASH JOIN             |                     |  6322 |   203K|   651 (100)| 00:00:08 |  Q1,01 | PCWP |            |
|   4 |     BUFFER SORT          |                     |       |       |            |          |  Q1,01 | PCWC |            |
|   5 |      PX RECEIVE          |                     |  6322 |   111K|     5  (60)| 00:00:01 |  Q1,01 | PCWP |            |
|   6 |       PX SEND BROADCAST  | :TQ10000            |  6322 |   111K|     5  (60)| 00:00:01 |      | S->P | BROADCAST  |
|*  7 |        INDEX RANGE SCAN  | IX_ir_diag_01  |  6322 |   111K|     5  (60)| 00:00:01 |      |      |         |
|   8 |     PX BLOCK ITERATOR    |                     |   129M|  1846M|     2   (0)| 00:00:01 |  Q1,01 | PCWC |            |
|*  9 |      INDEX FAST FULL SCAN| IX_ir_enc_021 |   129M|  1846M|     2   (0)| 00:00:01 |  Q1,01 | PCWP |            |
-----------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("EN"."enc_key"="D"."enc_key")
   7 - access("D"."dc_key"=153384)
       filter("D"."rc_st_key"=10 AND ("D"."REC_STATUS_KEY"=8 OR "D"."REC_STATUS_KEY"=10) OR
              "D"."rc_st_key"=5 AND "D"."REC_STATUS_KEY"=3 OR "D"."rc_st_key"=6 AND "D"."REC_STATUS_KEY"=3)
   9 - filter("EN"."NEW_MAX_DT">TO_DATE(' 2009-09-29 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


Statistics
----------------------------------------------------------
         12  recursive calls
          0  db block gets
    1498917  consistent gets
    1496220  physical reads
          0  redo size
        417  bytes sent via SQL*Net to client
        246  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          5  sorts (memory)
          0  sorts (disk)
         17  rows processed


The table segments (test.ir_enc, test.ir_diag) and their respective indexes have stale statistics, so I thought of using dynamic sampling, but the plan does not look good - the logical I/Os got higher.


test@IRDST1> SELECT /*+ dynamic_sampling(d,3) dynamic_sampling(en,3) */ en.enc_key
  2  FROM test.ir_enc en,
  3    test.ir_diag d
  4  WHERE en.new_max_dt     > (to_date('09302009','mmddyyyy') - 1) --old diag cd key thru_dt -- TT 126269
  5  AND en.enc_key    = d.enc_key
  6  AND d.dc_key       = 153384
  7  AND ( ( d.rc_st_key = 5
  8  AND d.rec_status_key    = 3)
  9  OR ( d.rc_st_key    = 6
10  AND d.rec_status_key    = 3)
11  OR ( d.rc_st_key    = 10
12  AND d.rec_status_key   IN (8,10)) );

17 rows selected.

Elapsed: 00:02:05.28

Execution Plan
----------------------------------------------------------
Plan hash value: 4167657302

-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name                | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                     |  6323 |   203K|   653 (100)| 00:00:08 |     |         |            |
|   1 |  PX COORDINATOR            |                     |       |       |            |          |     |         |            |
|   2 |   PX SEND QC (RANDOM)      | :TQ10002            |       |       |            |          |  Q1,02 | P->S | QC (RAND)  |
|   3 |    BUFFER SORT             |                     |  6323 |   203K|            |          |  Q1,02 | PCWP |            |
|   4 |     CONCATENATION          |                     |       |       |            |          |  Q1,02 | PCWP |            |
|   5 |      NESTED LOOPS          |                     |     1 |    33 |     2   (0)| 00:00:01 |  Q1,02 | PCWP |            |
|   6 |       BUFFER SORT          |                     |       |       |            |          |  Q1,02 | PCWC |            |
|   7 |        PX RECEIVE          |                     |       |       |            |          |  Q1,02 | PCWP |            |
|   8 |         PX SEND BROADCAST  | :TQ10000            |       |       |            |          |        | S->P | BROADCAST  |
|   9 |          BUFFER SORT       |                     |  6323 |   203K|            |          |     |         |            |
|* 10 |           INDEX RANGE SCAN | IX_ir_diag_01  |     1 |    18 |     2   (0)| 00:00:01 |     |         |            |
|  11 |       PX BLOCK ITERATOR    |                     |     1 |    15 |     2   (0)| 00:00:01 |  Q1,02 | PCWC |            |
|* 12 |        INDEX FAST FULL SCAN| IX_ir_enc_021 |     1 |    15 |     2   (0)| 00:00:01 |  Q1,02 | PCWP |            |
|* 13 |      HASH JOIN             |                     |  6322 |   203K|   651 (100)| 00:00:08 |  Q1,02 | PCWP |            |
|  14 |       BUFFER SORT          |                     |       |       |            |          |  Q1,02 | PCWC |            |
|  15 |        PX RECEIVE          |                     |  6322 |   111K|     5  (60)| 00:00:01 |  Q1,02 | PCWP |            |
|  16 |         PX SEND BROADCAST  | :TQ10001            |  6322 |   111K|     5  (60)| 00:00:01 |     | S->P | BROADCAST  |
|  17 |          BUFFER SORT       |                     |  6323 |   203K|            |          |     |         |            |
|* 18 |           INDEX RANGE SCAN | IX_ir_diag_01  |  6322 |   111K|     5  (60)| 00:00:01 |     |         |            |
|  19 |       PX BLOCK ITERATOR    |                     |   129M|  1846M|     2   (0)| 00:00:01 |  Q1,02 | PCWC |            |
|* 20 |        INDEX FAST FULL SCAN| IX_ir_enc_021 |   129M|  1846M|     2   (0)| 00:00:01 |  Q1,02 | PCWP |            |
-------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

  10 - access("D"."dc_key"=153384)
       filter(("D"."rc_st_key"=5 AND "D"."REC_STATUS_KEY"=3 OR "D"."rc_st_key"=6 AND "D"."REC_STATUS_KEY"=3)
              AND ("D"."rc_st_key"=5 AND "D"."REC_STATUS_KEY"=3 OR "D"."rc_st_key"=6 AND "D"."REC_STATUS_KEY"=3) AND
              ("D"."rc_st_key"=5 AND "D"."REC_STATUS_KEY"=3 OR "D"."rc_st_key"=6 AND "D"."REC_STATUS_KEY"=3))
  12 - filter("EN"."NEW_MAX_DT">TO_DATE(' 2009-09-29 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "EN"."enc_key"="D"."enc_key")
  13 - access("EN"."enc_key"="D"."enc_key")
  18 - access("D"."dc_key"=153384 AND "D"."rc_st_key"=10)
       filter("D"."rc_st_key"=10 AND ("D"."REC_STATUS_KEY"=8 OR "D"."REC_STATUS_KEY"=10) AND
              (LNNVL("D"."rc_st_key"=6) OR LNNVL("D"."REC_STATUS_KEY"=3)) AND (LNNVL("D"."rc_st_key"=5) OR
              LNNVL("D"."REC_STATUS_KEY"=3)))
  20 - filter("EN"."NEW_MAX_DT">TO_DATE(' 2009-09-29 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
         12  recursive calls
          0  db block gets
    2997834  consistent gets
    2992440  physical reads
          0  redo size
        436  bytes sent via SQL*Net to client
        246  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
         16  sorts (memory)
          0  sorts (disk)
         17  rows processed


Tom:

1) Please help me understand why dynamic sampling is not good in this case?

2) I don't know why the optimizer has done steps 4-12 in the dynamic sampling explain plan - what is the need for the CONCATENATION and NESTED LOOPS?
Tom Kyte
November 29, 2010 - 2:21 am UTC

dynamic sampling will not, cannot, make up for "stale statistics". Dynamic sampling will fill in gaps, provide MISSING statistics.

so, what you have now is the use of STALE statistics with a little bit of extra stuff added by dynamic sampling.

Either get rid of the stale (bad, misleading, incorrect) statistics or get them freshened.

If you get rid of them, dynamic sampling will get them.
If you freshen them, the optimizer will have meaningful data.
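
(Sketches of both options - schema and table names taken from the poster's query, parameter defaults hedged:)

```sql
-- option 1: delete the stale statistics so dynamic sampling takes over at hard parse
begin
   dbms_stats.delete_table_stats(ownname => 'TEST', tabname => 'IR_ENC',
                                 cascade_indexes => true);
   dbms_stats.delete_table_stats(ownname => 'TEST', tabname => 'IR_DIAG',
                                 cascade_indexes => true);
end;
/

-- option 2: refresh the statistics so the optimizer has current data
begin
   dbms_stats.gather_table_stats(ownname => 'TEST', tabname => 'IR_ENC',  cascade => true);
   dbms_stats.gather_table_stats(ownname => 'TEST', tabname => 'IR_DIAG', cascade => true);
end;
/
```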
