When are partition statistics beneficial?
Connor McDonald

Thanks for the question, David.

Asked: December 30, 2004 - 11:13 am UTC

Last updated: February 11, 2019 - 4:49 am UTC

Version: 9.2.0

Viewed 10K+ times

You Asked

Tom,

When are partition-level statistics used by the optimizer in 9iR2 and 10g? The only thing I've found in the docs so far (trying to avoid RTFM) is that they're used when a query fits the form SELECT ... FROM MYTABLE PARTITION (PTN_NAME). Specifically, do the partition stats get used if the query is doing partition elimination, such as SELECT ... FROM MYTABLE WHERE MYDATE BETWEEN TO_DATE(:date1) AND TO_DATE(:date2) ?

My particular interest is focused on using a 'rolling time window' of partitions based on a DATE column; should we gather partition stats each time a new partition is loaded, or do the table-level stats have more influence?

Thanks for your help!

David

and Tom said...

partition level statistics are used when the query plan would have explicit partition START and STOP keys -- and the START=STOP.

eg:

ops$tkyte@ORA9IR2> CREATE TABLE t
2 (
3 dt date,
4 x int
5 )
6 PARTITION BY RANGE (dt)
7 (
8 PARTITION part1 VALUES LESS THAN (to_date('13-mar-2003','dd-mon-yyyy')) tablespace users,
9 PARTITION part2 VALUES LESS THAN (to_date('14-mar-2003','dd-mon-yyyy')) tablespace tools,
10 PARTITION junk VALUES LESS THAN (MAXVALUE)
11 )
12 /

Table created.

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> delete from plan_table;

0 rows deleted.

ops$tkyte@ORA9IR2> explain plan for
2 select * from t where dt = to_date('12-mar-2003','dd-mon-yyyy');

Explained.

ops$tkyte@ORA9IR2> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------

------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Pstart| Pstop |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 22 | 2 | | |
|* 1 | TABLE ACCESS FULL | T | 1 | 22 | 2 | 1 | 1 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("T"."DT"=TO_DATE('2003-03-12 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))

Note: cpu costing is off

the optimizer knows "this query hits partition 1 and only partition 1 -- therefore, let's use the stats on partition 1"

14 rows selected.

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> variable x varchar2(30)
ops$tkyte@ORA9IR2> delete from plan_table;

2 rows deleted.

ops$tkyte@ORA9IR2> explain plan for
2 select * from t where dt = to_date(:x,'dd-mon-yyyy');

Explained.

ops$tkyte@ORA9IR2> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------

-------------------------------------------------------------------------------
| Id | Operation | Name| Rows | Bytes | Cost | Pstart| Pstop |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 22 | 2 | | |
| 1 | PARTITION RANGE SINGLE| | | | | KEY | KEY |
|* 2 | TABLE ACCESS FULL | T | 1 | 22 | 2 | KEY | KEY |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("T"."DT"=TO_DATE(:Z,'dd-mon-yyyy'))

Note: cpu costing is off

15 rows selected.

This query knows it'll partition eliminate, but not down to which partition; therefore the optimizer will use the table level statistics to optimize it, since it doesn't know whether it'll hit partition 1, 2 or 3.




Rating

  (77 ratings)

Comments

Ptn stats

David Scott, December 30, 2004 - 1:34 pm UTC

Thanks for the quick and useful answer! My (long and ugly) query plan shows a PSTART of 1 and a PSTOP of 90. Based on your response, I would expect the partition stats to be used, but I'm extrapolating this from your example. Is this still correct?

Tom Kyte
December 30, 2004 - 2:20 pm UTC

No, because partitions 1 through 90 are involved, so....

table stats are used.

the criteria are:

a) pstart and pstop do not say "KEY"
b) pstart = pstop (a single partition is hit)


So, just to be completely complete:

ops$tkyte@ORA9IR2> explain plan for
  2  select * from t where dt >= to_date('12-mar-2003');
 
Explained.
 
ops$tkyte@ORA9IR2> select * from table(dbms_xplan.display);
 
PLAN_TABLE_OUTPUT
------------------------------------
 
-------------------------------------------------------------------------------
| Id  | Operation            |  Name  | Rows  | Bytes | Cost  | Pstart| Pstop |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |        |    12 |   264 |     2 |       |       |
|   1 |  PARTITION RANGE ALL |        |       |       |       |     1 |     3 |
|*  2 |   TABLE ACCESS FULL  | T      |    12 |   264 |     2 |     1 |     3 |
-------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - filter("T"."DT">=TO_DATE('2003-03-12 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))
 
Note: cpu costing is off
 
15 rows selected.
 
<b>so, pstart = 1, pstop = 3 ===>>> global table stats are used</b>

 

Thanks

David Scott, December 30, 2004 - 2:49 pm UTC

Tom,

Thanks for clarifying this - I always appreciate your ability to make things more understandable.


clarification

reader, December 30, 2004 - 3:05 pm UTC

From above results, how do you interpret "global table stats are used" or partition stats are used? Thanks.


Tom Kyte
December 30, 2004 - 3:27 pm UTC

you don't, it was a statement.


I was showing that "if you see this in the explain plan, global stats will be used. On the other hand, if you see this (pstart=pstop=NUMBER), partition stats will be used"


If more than one partition could be accessed (start != stop)
OR the precise partition to be accessed is not known (start is not a number)
then global stats kick in.

Only when the precise partition to be accessed is known by the optimizer will the local partition stats be used.

Global Stats Vs Partition Stats

A reader, December 30, 2004 - 4:02 pm UTC

Hi Tom,
We are in a DW environment where the queries are submitted by our friendly business users and we don't have much control. Though we know that "most of the time" users will query with the partition key, they can occasionally query without it. Do you think we should gather global stats as well?
Thanks


Tom Kyte
December 30, 2004 - 4:32 pm UTC

hard to say -- if your data is 'nice' you could just set them from the individual partitions (eg: num rows => sum(num rows), num blks => sum(num blks))

it is mostly the column level stuff that gives issues. If partition 1 has 100 distinct values of X, and so do partitions 2, 3, 4 and 5, do you have

a) 100 distinct values of X
b) 500 distinct values of X
c) some number between 100 and 500
d) all of the above


the answer to the optimizer is (d). so, you know your data better than I (hopefully :). do you think you suffer from that in a wide ranging fashion?
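The "set them from the individual partitions" idea could be sketched like this (a hedged example: the table name T is illustrative, and it assumes the partition-level stats are already current):

```sql
-- Sketch: roll partition-level row/block counts up into the table-level
-- stats by hand, instead of a (slow) full table-level gather.
-- Column-level stats (distinct values, high/low) canNOT be derived this
-- way -- which is exactly the (a)-(d) problem described above.
declare
    l_numrows number;
    l_numblks number;
begin
    select sum(num_rows), sum(blocks)
      into l_numrows, l_numblks
      from user_tab_partitions
     where table_name = 'T';

    dbms_stats.set_table_stats( ownname => user,
                                tabname => 'T',
                                numrows => l_numrows,
                                numblks => l_numblks );
end;
/
```

num_rows and blocks are additive across partitions, which is why this works for them and not for column statistics.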

Stats

A reader, December 30, 2004 - 11:57 pm UTC

Hi Tom,
Thanks for the quick response.
As per the data distribution, the variation in distinct values between partitions is not much. However, as we don't gather table-level stats (i.e., all table-level stats - num_rows etc. - are null), how will the optimizer behave if the partition key is not part of the where clause?

Tom Kyte
December 31, 2004 - 10:56 am UTC

are you sure?  when all of the partitions have been analyzed, it should be rolled up:

ops$tkyte@ORA9IR2> CREATE TABLE t
  2  (
  3    dt  date,
  4    x   int,
  5    y   int
  6  )
  7  PARTITION BY RANGE (dt)
  8  (
  9    PARTITION part1 VALUES LESS THAN (to_date('13-mar-2003','dd-mon-yyyy')) tablespace users,
 10    PARTITION part2 VALUES LESS THAN (to_date('14-mar-2003','dd-mon-yyyy')) tablespace tools,
 11    PARTITION junk VALUES LESS THAN (MAXVALUE)
 12  )
 13  /
 
Table created.
 
ops$tkyte@ORA9IR2> insert into t select to_date( '12-mar-2003', 'dd-mon-yyyy' ) + mod(rownum,3), 1, 1 from all_users;
 
22 rows created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T', 'PART1', granularity=>'PARTITION' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> select table_name, num_rows from user_tables where table_name = 'T';
 
TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T
 
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T', 'PART2', granularity=>'PARTITION' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> select table_name, num_rows from user_tables where table_name = 'T';
 
TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T
 
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T', 'JUNK', granularity=>'PARTITION' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> select table_name, num_rows from user_tables where table_name = 'T';
 
TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T                                      22
 
ops$tkyte@ORA9IR2> select partition_name, num_rows from user_tab_partitions where table_name = 'T';
 
PARTITION_NAME                   NUM_ROWS
------------------------------ ----------
PART1                                   7
PART2                                   8
JUNK                                    7
 
ops$tkyte@ORA9IR2>
 

Peter, December 31, 2004 - 7:32 am UTC

Hi Tom,

the setup of our tables is similar in so far as our main tables are partitioned by a 'rolling time window' on a date column, whereby one partition is dropped and a new partition is added every day. Data is then subsequently loaded into the new partition during the day. The data in the different partitions is fairly similar for all of them.

For the different queries run against them, the CBO does in some cases know which partition to hit, but in many cases it doesn't. Therefore we need statistics at the partition level (especially for the new partition once data has been loaded into it), but we need global statistics as well.

Gathering statistics on global table-level takes too long to run on a daily basis (up to 20 hours, depending on the workload).

My question now is: will it be sufficient to gather stats only for the new partition after the data load (this is fast), and rely on global statistics which are gathered only on a monthly basis?

Many thanks for your advice, and I wish you a brilliant New Year.

Tom Kyte
June 23, 2011 - 7:46 am UTC

only you can answer the last question, for it takes knowledge of your data to properly answer that.

the largest problems will center around column statistics -- high and low values. If you have monotonically increasing values and the global column stats are way out of whack -- the optimizer might think it's:

a) going to get tons and tons of rows when in fact it'll get none.
b) going to get no rows when in fact it'll get tons and tons.

those will mostly be the cases you need to worry about.

A reader, January 02, 2005 - 2:26 pm UTC

Thanks Tom.



Roll up - partition level stats

A reader, January 02, 2005 - 4:54 pm UTC

Hi Tom,
We don't gather stats for all partitions; we gather stats for a partition as we move forward from month to month. Initially we created 24 monthly partitions, and to start with we gathered stats for the first partition. For the next month's partition stats, we take the min(partition_position) where num_rows is null, and gather stats through a DB procedure. With this approach, as there are partitions with null stats, the partition-level stats are ***not*** rolled up. Given what you demonstrated with the example, do we have to change our mechanism for gathering partition-level stats in order for them to roll up?
Thanks

Tom Kyte
January 02, 2005 - 8:00 pm UTC

suggestion: gathering stats on empty partitions is really fast -- so do it?

If (start != stop) or (start is not a number), does partition elimination still work?

A reader, January 02, 2005 - 9:40 pm UTC

If global stats are used, will it still do partition elimination?

If global stats are used, will it have to go through all 90 partitions, or will it identify the partition range and go only to the corresponding partitions? Which is true here?

Simply put: if (start != stop) or (start is not a number), does partition elimination still work?

Thanks a lot

Tom Kyte
January 02, 2005 - 10:42 pm UTC

yes, absolutely (that was sort of the crux of the answer -- when a single KNOWN partition is identified -- pstart = pstop and pstart is a NUMBER -- then local, else if we just KNOW a partition will be accessed but not which one..... we use global)


when you see "KEY KEY" in the plan -- partition elimination is happening, but we cannot use a single partition's statistics (we don't know which one)

when you see "3 9" in the plan -- we are using just partitions 3 through 9 (out of thousands perhaps) but we cannot use an individual partition's statistics.....




A reader, January 03, 2005 - 12:45 am UTC

fantastic reply ..

Lets take your above plan

-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Pstart| Pstop |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 12 | 264 | 2 | | |
| 1 | PARTITION RANGE ALL | | | | | 1 | 3 |
|* 2 | TABLE ACCESS FULL | T | 12 | 264 | 2 | 1 | 3 |
-------------------------------------------------------------------------------

there are only 3 partitions and it's doing range all, partitions 1 to 3. In this case it will use the global stats and do partition elimination.

So, partition elimination is occurring but the partition stats are not used.

Since it's using the global stats, if an index is used, will it use the global (prefixed) index on the table or the local index?

I have a similar situation here. I don't see the local index in my plan at all. I feel that, since it has used the global stats, it could not use the local index - am I right?

Thanks for your time

Tom Kyte
January 03, 2005 - 8:19 am UTC

well, given that there are only 3 partitions

and it says "pstart = 1, pstop = 3", we didn't really have the ability to eliminate anything.


either index could be used -- IF you submit a query that could use an index, which we did not here.

global stats are "TABLE stats", don't confuse them -- just because we use 'TABLE STATS' (not partition stats) does not preclude a GLOBAL index, or a LOCAL index, either or is possible.



that we used "global table stats" has no bearing on the selection or exclusion of a certain class of indices.

Alberto Dell'Era, January 03, 2005 - 8:24 am UTC

>when you see "KEY KEY" in the plan -- partition elimination is happening,
>but we cannot use a single partition statistics (we don't know which one)

Not true in 9.2.0.6 with bind variable peeking, if i'm not mistaken:

create table tb (
key int,
x int
)
partition by range (key)
(
partition p_oneval values less than (10),
partition p_distinct values less than (99999999)
);

-- insert 10,000 identical values in p_oneval
insert into tb partition (p_oneval) (key, x)
select 1 , rownum
from all_objects where rownum <= 10000;

-- insert 10,000 different values in p_distinct
insert into tb partition (p_distinct) (key, x)
select 10+rownum-1, rownum
from all_objects where rownum <= 10000;

-- a local index on key
-- it is very selective on p_distinct,
-- not selective at all on p_oneval
create index tb_key on tb (key) LOCAL;

-- gather stats (global + local)
exec dbms_stats.gather_table_stats (user, 'tb', cascade=>true);

-- same statement, different bind values:

alter session set sql_trace=true;

variable key number
exec :key := 1; -- hits partition p_oneval
select /* key=1 */ x from tb where key = :key;

Rows Row Source Operation
------- ---------------------------------------------------
10000 PARTITION RANGE SINGLE PARTITION: KEY KEY (cr=690 r=0 w=0 time=66914 us)
10000 TABLE ACCESS FULL TB PARTITION: KEY KEY (cr=690 r=0 w=0 time=37510 us)

exec :key := 100; -- hits partition p_distinct
select /* key=100 */ x from tb where key = :key;

Rows Row Source Operation
------- ---------------------------------------------------
1 PARTITION RANGE SINGLE PARTITION: KEY KEY (cr=4 r=0 w=0 time=137 us)
1 TABLE ACCESS BY LOCAL INDEX ROWID OBJ#(97233) PARTITION: KEY KEY (cr=4 r=0 w=0 time=117 us)
1 INDEX RANGE SCAN OBJ#(97236) PARTITION: KEY KEY (cr=3 r=0 w=0 time=83 us)(object id 97236)

Different plans for different bind values => bind variable peeking.
A perfect plan for the selected partition => partition stats are used.

Plans fetched from v$sql_plan show the same thing; explain plan (wrongly, imho) always shows the second plan.
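The v$sql_plan check mentioned above could be done like this (a sketch; it assumes the comment in the SQL text is unique enough to locate the cursor):

```sql
-- Sketch: fetch the runtime plan rows, including the partition
-- start/stop columns, for the statement tagged /* key=1 */.
select p.id, p.operation, p.options,
       p.partition_start, p.partition_stop
  from v$sql s, v$sql_plan p
 where s.sql_text     like 'select /* key=1 */%'
   and p.address      = s.address
   and p.hash_value   = s.hash_value
   and p.child_number = s.child_number
 order by p.id;
```

Unlike explain plan, v$sql_plan reflects the plan the peeked bind actually produced.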

What does each one stand for?

A reader, January 04, 2005 - 8:15 am UTC

Tom,

Can you please give me a link or tell me what the below means in explain plan.

PARTITION RANGE ITERATOR PARTITION: KEY KEY
PARTITION RANGE ALL PARTITION: KEY KEY
PARTITION RANGE SINGLE PARTITION: KEY KEY

I am getting any of the above in my tkprof, but I'm not clear what each one stands for. Your inputs, please.

Thanks a lot




Tom Kyte
January 04, 2005 - 8:47 am UTC

range iterator -- we are going to ITERATE over a range of partitions, we know that we'll hit 0, 1 or more but won't know until query execution time how many.

range all -- we are going to hit em all

range single -- we are going to hit one of them.


ops$tkyte@ORA9IR2> CREATE TABLE t
  2  (
  3    dt  date,
  4    x   int,
  5    y   int
  6  )
  7  PARTITION BY RANGE (dt)
  8  (
  9    PARTITION part1 VALUES LESS THAN (to_date('13-mar-2003','dd-mon-yyyy')) tablespace users,
 10    PARTITION part2 VALUES LESS THAN (to_date('14-mar-2003','dd-mon-yyyy')) tablespace tools,
 11    PARTITION junk VALUES LESS THAN (MAXVALUE)
 12  )
 13  /
 
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> variable x varchar2(30)
ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> select * from t where dt between to_date(:x) and to_date(:x)+1;
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=35)
   1    0   FILTER
   2    1     PARTITION RANGE (ITERATOR)
   3    2       TABLE ACCESS (FULL) OF 'T' (Cost=2 Card=1 Bytes=35)
 
<b>the between on DT (partition key) lets us partition eliminate, but we don't know WHICH ONES - we'll hit 0, 1 or more partitions when we execute this query with different :x values</b>

 
ops$tkyte@ORA9IR2> select * from t where dt = to_date(:x);
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=35)
   1    0   PARTITION RANGE (SINGLE)
   2    1     TABLE ACCESS (FULL) OF 'T' (Cost=2 Card=1 Bytes=35)
 
<b>that knows "exactly ONE partition" will be hit</b>

 
ops$tkyte@ORA9IR2> select * from t where x > 5;
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=12 Bytes=420)
   1    0   PARTITION RANGE (ALL)
   2    1     TABLE ACCESS (FULL) OF 'T' (Cost=2 Card=12 Bytes=420)
 
 
 
ops$tkyte@ORA9IR2> set autotrace off
<b>that knows "all partitions will be hit" since x>5 could be in any and all - we need to look at them all</b>
 

Thank you !!!

A reader, January 04, 2005 - 9:46 am UTC


no chance of partition elimination --> PARTITION RANGE (ALL)?

A reader, January 04, 2005 - 9:52 am UTC

Tom,

So, when i see PARTITION RANGE (ALL), it means there is no chance of partition elimination - right?

Thanks




Tom Kyte
January 05, 2005 - 8:04 am UTC

it means it is iterating over each partition, all will be peeked at.

Empty partition Stats

A reader, January 04, 2005 - 3:44 pm UTC

Hi Tom,
Regarding:
suggestion: gathering stats on empty partitions is really fast -- so do it?
________________
That is fine, but our logic is: once the monthly load is finished, get the name of the partition with min(partition_position) where num_rows is null, and then analyze that partition. Is there any way to get the name of the partition into which the data has just been loaded (i.e., the one that needs to be analyzed)? We have a control table which holds a key like 200412, on which the partitions are based; i.e., tables are partitioned as
table a
partition a1 values < 200411
partition a2 values < 200412
partition a3 values < 200501
......
...
and the control table keeps only one record, based upon the month key just processed.
Regards


Tom Kyte
January 05, 2005 - 9:03 am UTC

not following you -- all I'm saying is have empty partitions analyzed -- analyze all partitions that haven't been analyzed.

you can still use whatever logic you like right now to analyze the partition you just loaded into - i'm just saying "make sure all partitions have stats"
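"make sure all partitions have stats" can be mechanized with a small loop (a sketch; the table name T is illustrative, and it uses LAST_ANALYZED IS NULL as the "never analyzed" test):

```sql
-- Sketch: gather partition-level stats for every partition of T that
-- has never been analyzed.  Empty partitions make this very fast.
begin
    for p in ( select partition_name
                 from user_tab_partitions
                where table_name = 'T'
                  and last_analyzed is null )
    loop
        dbms_stats.gather_table_stats( ownname     => user,
                                       tabname     => 'T',
                                       partname    => p.partition_name,
                                       granularity => 'PARTITION' );
    end loop;
end;
/
```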

Alberto Dell'Era, January 06, 2005 - 5:12 pm UTC

(please ignore if you are too busy ;)

May you please confirm or deny my findings above:

http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:30966739234862#31083457790627

If confirmed, it looks like a very important thing to know and remember ...

Tom Kyte
January 06, 2005 - 7:24 pm UTC

oh, the only reason I didn't comment on your correct observations is because it looked so much like the way I would have proved to you that it worked that way!


I honestly can say -- every day, each and every day (sat/sun included) I learn something new about Oracle myself. that was one of them.

Alberto Dell'Era, January 06, 2005 - 7:40 pm UTC

>oh, the only reason I didn't comment on your correct observations is because
>it looked so much like the way I would have proved to you that it worked that
>way!

Looks like you have shaped my mind someway ;)

(actually the investigating method itself is the most important thing i've learnt from you)

Many thanks!

ora 6550 and ora 922 during import February 25, 2003

Randal Singh, January 07, 2005 - 7:43 am UTC

I would rather suggest a different approach to the problem which i was facing during import of a table.
The suggestion which you had provided well worked but I would rather suggest if you have two machines (M1,M2) on which you have installed oracle 9i with different release such as M1---- Oracle 9i Release 9.1 and M2 --- Oracle 9i Release 9.2 if you have taken the export of a table t1 from M2 and want to import in M1 you would face the error Ora-00922. The solution is, In the tnsname.ora of M1 you mention the entry so as to connect to M2 Machine ( The way you do for distributed database). Staying in M1 machine you take the export of the t1 table of M2 machine i.e from Oracle 9i Release 9.2 database with this command (exp scott/tiger@m2 file=abc.dmp grants=n log=abc.log). After the data got exported you import the data from m1 machine this will solve the problem.

Tom Kyte
January 07, 2005 - 9:25 am UTC

huh? which part of this thread are you referring to?

global statistics

steve, January 11, 2005 - 5:01 pm UTC

Hi Tom,

a. Regarding Peter's post above about the daily rolling window: we have the same situation -- 4 months of data, partitioned by day, and every day one partition is marked as stale. Like Peter, we cannot collect statistics on the whole table every day (it takes too long). The question is: when do we need to refresh the global statistics?

I understood your answer, but it doesn't help in deciding when to collect global statistics. Is there any way to determine if the global statistics have gone stale?

b. When you execute:


exec dbms_stats.gather_schema_stats('TEST',
estimate_percent=>5,
options=>'gather stale',
cascade=>true,
granularity=>'PARTITION'
);
By specifying

granularity=>'PARTITION'

does that only gather statistics for partitioned tables and
ignore non partitioned tables? I know the parameter is only pertinent
to partitioned tables, but does specifying it ignore nonpartitioned
tables? It seems to. Because, prior to running it, I listed the
stale objects and got:

TABLE ==> PP_TRANSACTIONS partition(CD_TRANS_13_29)
TABLE ==> T partition()

And after running it I got:

TABLE ==> T partition()

If this is true, that would mean we would need 2 passes, one with
granularity=>'PARTITION' and one without in order to analyze a schema?

Thanks Tom.

Steve


Tom Kyte
January 11, 2005 - 7:04 pm UTC

a) you know better than I, you actually know the data.

globally -- things like num rows -- they can be rolled up pretty accurately.  Things like "number of distinct values in a column" -- well, they cannot.

So, you'd be the best at knowing whether that presents an issue for you or not.

You know, will you get bit by things like this:

ops$tkyte@ORA9IR2> CREATE TABLE t
  2  (
  3    dt  date,
  4    x   int,
  5    y   int
  6  )
  7  PARTITION BY RANGE (dt)
  8  (
  9    PARTITION part1 VALUES LESS THAN (to_date('13-mar-2003','dd-mon-yyyy')) tablespace users,
 10    PARTITION part2 VALUES LESS THAN (to_date('14-mar-2003','dd-mon-yyyy')) tablespace tools,
 11    PARTITION junk VALUES LESS THAN (MAXVALUE)
 12  )
 13  /
 
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> insert into t select to_date( '12-mar-2003', 'dd-mon-yyyy'), rownum, rownum from all_objects;
 
27816 rows created.
 
ops$tkyte@ORA9IR2> insert into t select to_date( '13-mar-2003', 'dd-mon-yyyy'), rownum, rownum from all_objects;
 
27816 rows created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> select * from t where x > 30000;
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=28 Card=2 Bytes=34)
   1    0   PARTITION RANGE (ALL)
   2    1     TABLE ACCESS (FULL) OF 'T' (Cost=28 Card=2 Bytes=34)
 
 
 
ops$tkyte@ORA9IR2> set autotrace off
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> insert into t select to_date( '15-mar-2003', 'dd-mon-yyyy'), 30000+rownum, rownum from all_objects;
 
27816 rows created.
 
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T', partname => 'JUNK', GRANULARITY => 'PARTITION' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> select * from t where x > 30000;
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=28 Card=2 Bytes=34)
   1    0   PARTITION RANGE (ALL)
   2    1     TABLE ACCESS (FULL) OF 'T' (Cost=28 Card=2 Bytes=34)
 
<b>whoops, didn't know about the new ID's we loaded up last week at the global level</b> 
 
ops$tkyte@ORA9IR2> set autotrace off
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> select * from t where x > 30000;
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=42 Card=40149 Bytes=682533)
   1    0   PARTITION RANGE (ALL)
   2    1     TABLE ACCESS (FULL) OF 'T' (Cost=42 Card=40149 Bytes=682533)
 
 
 
ops$tkyte@ORA9IR2> set autotrace off
ops$tkyte@ORA9IR2>

<b>of course, if you say "yes -- there are two or three columns like that", there is always "dbms_stats.SET......"; your loader (which has *seen* the data) can in fact tell us things...</b>
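A way the loader could "tell us things" via the dbms_stats.SET... routines -- for example, raising the global high value of a monotonically increasing column after a load. This is a hypothetical sketch: the column name X and the endpoint values 1 and 58000 are made up.

```sql
-- Sketch: overwrite just the low/high endpoints of column T.X so the
-- optimizer's range estimates don't fall off the end of stale stats.
declare
    l_srec dbms_stats.statrec;
    l_vals dbms_stats.numarray := dbms_stats.numarray( 1, 58000 );
begin
    l_srec.epc := 2;  -- two endpoints: the new low and high values
    dbms_stats.prepare_column_values( l_srec, l_vals );

    dbms_stats.set_column_stats( ownname => user,
                                 tabname => 'T',
                                 colname => 'X',
                                 srec    => l_srec );
end;
/
```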



b) well done, exact test i would have set up :) (and was getting ready to -- but then read you did it...)

yes -- you would need two passes if you said "partition" in one of them. Worst case of running two: the second one would have nothing to do, so it is rather harmless to run both.
 

What about statistics when exchanging partitions?

Seth, January 11, 2005 - 5:49 pm UTC

Could you please complement your previous explanation with a discussion of what happens to statistics when exchanging partitions into a table? I've been struggling with this for a while. I find that in those cases, statistics do not "rollup" into the next level and would like to find a way to avoid re-analyzing what are often large and numerous partitions for our warehouse.

To illustrate, I initially create a range-hash composite partitioned table CLAIM with one initial range and then gather stats with granularity=>'all' and cascade=>true. This table has a few local indexes defined and no global indexes. I then add a partition P_PLAN1 and analyze it with granularity=>'PARTITION' and cascade. For discussion purposes, I'll call the partition column plan_id and the hash column pat_id, and the first defined partition has range plan_id < 1, and the second added partition has range plan_id < 2. (BTW, version is 9.2.0.5 on Solaris 9.)

So far so good: I see analysis dates (and rows=0) at all levels global, partition, subpartition (from views dba_tables, dba_tab_partitions, dba_tab_subpartitions, and corresponding views for indexes).

I now create a table PLAN1 designed to be exchanged in. It is hashed on pat_id (#partitions in PLAN1 = #subpartitions in CLAIM), and the value of plan_id in every row will be '1'. I then populate PLAN1 with sqlldr and analyze granularity=>all cascade=>true.

I then exchange PLAN1 in into CLAIM:
ALTER table CLAIM
EXCHANGE partition P_PLAN1
WITH table PLAN1
INCLUDING indexes WITHOUT validation

The data from PLAN1 is now available in CLAIM. However, when I look at the stats through dba views, I find the following:

- dba_tab_subpartitions and dba_ind_subpartitions do reflect the stats from PLAN1 correctly

- dba_tab_partitions, dba_ind_partitions, dba_tables, and dba_indexes do NOT show any change (they still show the original dates and rows=0 from prior to the exchange).

This has obviously been a problem for the CBO when we perform queries, and our only solution has been to re-analyze after the exchange. I'm hoping you have some insight into this and can suggest a more efficient solution that is not in the documentation.


Incidentally, what happens to column statistics in my example? Do they "rollup" as well? What about in your example?

Many Thanks,

-- Seth


Tom Kyte
January 11, 2005 - 7:26 pm UTC

exchanging partitions will preserve partition level stats fine

global stats however will be affected in the same fashion as above -- column distributions are not easily derived from partition level information.

dynamic_sampling

Rob, January 11, 2005 - 8:42 pm UTC

Tom:

If I have a partitioned table but only global stats, no partition stats, and the query specifies a single value for the partition key in the predicate, will the global stats be used or will dynamic_sampling take place assuming the query has the /*+ dynamic_sampling(3) */ hint?

Thanks,

Rob

Tom Kyte
January 12, 2005 - 8:16 am UTC

it has the statistics -- lacking partition stats does not make the table "unanalyzed".

I tried a couple of variants -- joins, complex predicates. I could see it dynamically sample the table being joined to -- but not the individual partition.


even when there are no stats on the partitioned table, the dynamic sample query would simply include the predicate -- which has the "where partition_column = CONSTANT" -- meaning, it would not sample a "partition" in as much as the sampling query itself would use partition elimination to just sample a single partition.
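For reference, the hint in question is written like this (a sketch; the level and table name are illustrative):

```sql
-- Cursor-level form: sample at level 3 for any table the optimizer
-- decides it needs more information about.
select /*+ dynamic_sampling(3) */ *
  from t
 where dt = to_date('12-mar-2003','dd-mon-yyyy');

-- Table-level form: direct the sampling at T specifically.
select /*+ dynamic_sampling(t 3) */ *
  from t
 where dt = to_date('12-mar-2003','dd-mon-yyyy');
```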

A couple of additional questions

Rob, January 12, 2005 - 9:10 am UTC

Tom:

As a followup to your answer I have a couple of additional questions.
1) How can you tell that dynamic sampling is taking place?
2) Any way to force dynamic_sampling in a query even if the table does have stats.

Rob

Tom Kyte
January 12, 2005 - 9:55 am UTC

1) set sql_trace=true; you'll see the sampling sql in the tkprof report.

2) it only kicks in when it needs them, if it finds all of the stats it needs in the dictionary, it'll not dynamically sample.
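A minimal way to see this for yourself, per the answer above (the exact shape of the sampling query is an assumption; check your own tkprof output):

```sql
alter session set sql_trace = true;

-- run the query of interest, e.g.
select /*+ dynamic_sampling(t 3) */ count(*) from t;

-- then run tkprof on the resulting trace file; if dynamic sampling
-- occurred, the recursive SQL section will contain a query against T
-- that uses the SAMPLE clause
```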




Thanks

steve mckee, January 12, 2005 - 9:16 am UTC

Thank you very much Tom for answering my question above.

Cheers.

Steve

What about statistics when exchanging partitions?

Seth, January 12, 2005 - 3:46 pm UTC

Um... Well, my experience doesn't bear that out. Here is a log of the sequence of steps you can reproduce:

drop table many_parts;
drop table part1;

column last_analyzed format A18
column table_name format A12
column part_name format A12
column subpart_name format A12

CREATE or REPLACE VIEW VIEW_LEVELS /* for diagnostic purposes */
AS
select table_name, 'TABLE' as llevel, null as part_name, null as subpart_name, num_rows, last_analyzed
from user_tables
union ALL
select table_name, 'PARTITION' as llevel, PARTITION_NAME, null as subpart_name, num_rows, last_analyzed
from user_tab_partitions
union ALL
select table_name, 'SUBPARTITION' as llevel, PARTITION_NAME, SUBPARTITION_NAME, num_rows, last_analyzed
from user_tab_subpartitions
/


-- Main table

CREATE TABLE MANY_PARTS
PARTITION BY RANGE (PLAN_ID)
SUBPARTITION BY HASH (PAT_ID)
SUBPARTITIONS 2
(PARTITION P_PART0 VALUES LESS THAN (1)
)
AS
SELECT 0 as PLAN_ID,
ROWNUM as PAT_ID,
DO.*
FROM DBA_OBJECTS DO
WHERE OWNER = 'SYS'
/


select * from view_levels where table_name = 'MANY_PARTS';

TABLE_NAME LLEVEL PART_NAME SUBPART_NAME NUM_ROWS
------------ ------------ ------------ ------------ ----------
MANY_PARTS TABLE
MANY_PARTS PARTITION P_PART0
MANY_PARTS SUBPARTITION P_PART0 SYS_SUBP4217
MANY_PARTS SUBPARTITION P_PART0 SYS_SUBP4218


begin
dbms_stats.gather_table_stats(user, tabname=>'MANY_PARTS',granularity=>'ALL');
end;
/


select * from view_levels where table_name = 'MANY_PARTS';

TABLE_NAME LLEVEL PART_NAME SUBPART_NAME NUM_ROWS
------------ ------------ ------------ ------------ ----------
MANY_PARTS TABLE 4043
MANY_PARTS PARTITION P_PART0 4043
MANY_PARTS SUBPARTITION P_PART0 SYS_SUBP4217 2037
MANY_PARTS SUBPARTITION P_PART0 SYS_SUBP4218 2006


-- Table to be exchanged in

CREATE TABLE PART1
PARTITION BY HASH (PAT_ID)
PARTITIONS 2
AS
SELECT 1 as PLAN_ID,
ROWNUM as PAT_ID,
DO.*
FROM DBA_OBJECTS DO
WHERE OWNER = 'SYS'
/


select * from view_levels where table_name = 'PART1';

TABLE_NAME LLEVEL PART_NAME SUBPART_NAME NUM_ROWS
------------ ------------ ------------ ------------ ----------
PART1 TABLE
PART1 PARTITION SYS_P4219
PART1 PARTITION SYS_P4220

begin
dbms_stats.gather_table_stats(user, tabname=>'PART1',granularity=>'ALL');
end;
/


select * from view_levels where table_name = 'PART1';

TABLE_NAME LLEVEL PART_NAME SUBPART_NAME NUM_ROWS
------------ ------------ ------------ ------------ ----------
PART1 TABLE 4043
PART1 PARTITION SYS_P4219 2037
PART1 PARTITION SYS_P4220 2006


ALTER TABLE MANY_PARTS
ADD PARTITION P_PART1 VALUES LESS THAN (2)
/


select * from view_levels where table_name = 'MANY_PARTS';

TABLE_NAME LLEVEL PART_NAME SUBPART_NAME NUM_ROWS
------------ ------------ ------------ ------------ ----------
MANY_PARTS TABLE 4043
MANY_PARTS PARTITION P_PART0 4043
MANY_PARTS PARTITION P_PART1
MANY_PARTS SUBPARTITION P_PART0 SYS_SUBP4217 2037
MANY_PARTS SUBPARTITION P_PART0 SYS_SUBP4218 2006
MANY_PARTS SUBPARTITION P_PART1 SYS_SUBP4221
MANY_PARTS SUBPARTITION P_PART1 SYS_SUBP4222


begin
dbms_stats.gather_table_stats(user, tabname=>'MANY_PARTS',granularity=>'SUBPARTITION', partname=>'P_PART1');
end;
/


select * from view_levels where table_name = 'MANY_PARTS';

TABLE_NAME LLEVEL PART_NAME SUBPART_NAME NUM_ROWS
------------ ------------ ------------ ------------ ----------
MANY_PARTS TABLE 4043
MANY_PARTS PARTITION P_PART0 4043
MANY_PARTS PARTITION P_PART1 0
MANY_PARTS SUBPARTITION P_PART0 SYS_SUBP4217 2037
MANY_PARTS SUBPARTITION P_PART0 SYS_SUBP4218 2006
MANY_PARTS SUBPARTITION P_PART1 SYS_SUBP4221 0
MANY_PARTS SUBPARTITION P_PART1 SYS_SUBP4222 0

Note: So far, all this makes sense


ALTER TABLE MANY_PARTS
EXCHANGE partition P_PART1
WITH table PART1
INCLUDING indexes WITHOUT validation
/



SELECT COUNT(*) FROM MANY_PARTS;

COUNT(*)
----------
8086


select * from view_levels where table_name = 'MANY_PARTS';

TABLE_NAME LLEVEL PART_NAME SUBPART_NAME NUM_ROWS
------------ ------------ ------------ ------------ ----------
MANY_PARTS TABLE 4043
MANY_PARTS PARTITION P_PART0 4043
MANY_PARTS PARTITION P_PART1 0
MANY_PARTS SUBPARTITION P_PART0 SYS_SUBP4217 2037
MANY_PARTS SUBPARTITION P_PART0 SYS_SUBP4218 2006
MANY_PARTS SUBPARTITION P_PART1 SYS_SUBP4221 2037
MANY_PARTS SUBPARTITION P_PART1 SYS_SUBP4222 2006


Note above: Exchanging the partition does not alter num_rows for partition P_PART1, and the table level does not reflect the correct total either (it should be 8086).



select * from view_levels where table_name = 'PART1';

TABLE_NAME LLEVEL PART_NAME SUBPART_NAME NUM_ROWS
------------ ------------ ------------ ------------ ----------
PART1 TABLE 4043
PART1 PARTITION SYS_P4219 0
PART1 PARTITION SYS_P4220 0


Note above: Table level does not reflect the fact that there are no rows anymore.


begin
dbms_stats.gather_table_stats(user, tabname=>'MANY_PARTS',granularity=>'SUBPARTITION', partname=>'P_PART1');
end;
/


select * from view_levels where table_name = 'MANY_PARTS';

TABLE_NAME LLEVEL PART_NAME SUBPART_NAME NUM_ROWS
------------ ------------ ------------ ------------ ----------
MANY_PARTS TABLE 4043
MANY_PARTS PARTITION P_PART0 4043
MANY_PARTS PARTITION P_PART1 4043
MANY_PARTS SUBPARTITION P_PART0 SYS_SUBP4217 2037
MANY_PARTS SUBPARTITION P_PART0 SYS_SUBP4218 2006
MANY_PARTS SUBPARTITION P_PART1 SYS_SUBP4221 2037
MANY_PARTS SUBPARTITION P_PART1 SYS_SUBP4222 2006


Note above: Just to see what would happen, I re-analyzed P_PART1 at the subpartition level. The partition-level row count is now correct, but the table level is not.


begin
dbms_stats.gather_table_stats(user, tabname=>'MANY_PARTS',granularity=>'PARTITION', partname=>'P_PART1');
end;
/


select * from view_levels where table_name = 'MANY_PARTS';

TABLE_NAME LLEVEL PART_NAME SUBPART_NAME NUM_ROWS
------------ ------------ ------------ ------------ ----------
MANY_PARTS TABLE 4043
MANY_PARTS PARTITION P_PART0 4043
MANY_PARTS PARTITION P_PART1 4043
MANY_PARTS SUBPARTITION P_PART0 SYS_SUBP4217 2037
MANY_PARTS SUBPARTITION P_PART0 SYS_SUBP4218 2006
MANY_PARTS SUBPARTITION P_PART1 SYS_SUBP4221 2037
MANY_PARTS SUBPARTITION P_PART1 SYS_SUBP4222 2006

Note above: So now I re-analyze at the partition level in hopes of seeing an effect at the table level. No cigar.


So my questions are:

- Is this the correct functionality, or am I missing something?

- Is it unreasonable for me to expect stats to get "rolled up" and "rolled down" automatically as I exchange partitions in and out?

- Is this perhaps a bug?

In any event, it appears that if I want a correct num_rows at the table level, I have no choice but to re-analyze the entire table at the global level (at a minimum). With extremely large tables, this seems like an undesirable result. Any suggested workarounds?

Thanks,

-- Seth



Tom Kyte
January 12, 2005 - 5:39 pm UTC

yes, it is not maintaining the global stats (partition level stats are fine, global stats are not)

short of analyzing the entire table -- we could use dbms_stats.set_table_stats, eg:

ops$tkyte@ORA9IR2> select * from v;
 
WHAT                           TABLE_NAME     NUM_ROWS
------------------------------ ------------ ----------
TABLE                          T                    24
TABLE                          T_NEW                40
PART                           PART1                 8
PART                           PART2                 8
PART                           JUNK                  8

<b>part1, part2, junk belong to T</b>

 
ops$tkyte@ORA9IR2> alter table t
  2  exchange partition junk with table t_new;
 
Table altered.
 
ops$tkyte@ORA9IR2> select * from v;
 
WHAT                           TABLE_NAME     NUM_ROWS
------------------------------ ------------ ----------
TABLE                          T                    24
TABLE                          T_NEW                 8
PART                           PART1                 8
PART                           PART2                 8
PART                           JUNK                 40
 
ops$tkyte@ORA9IR2> exec dbms_stats.set_table_stats( user, 'T', numrows => 56 );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> select * from v;
 
WHAT                           TABLE_NAME     NUM_ROWS
------------------------------ ------------ ----------
TABLE                          T                    56
TABLE                          T_NEW                 8
PART                           PART1                 8
PART                           PART2                 8
PART                           JUNK                 40
 
<b>and do that for num blocks and other statistics you want to update.  things that cannot be rolled up -- "distinct values" -- will be problematic</b>
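One way to script that, as a sketch: sum the partition-level figures and set them as the global statistics. This assumes every partition already has current partition-level stats; as noted, figures like distinct values still cannot be derived this way.

```sql
declare
  l_rows   number;
  l_blocks number;
begin
  -- roll up the partition-level figures...
  select sum(num_rows), sum(blocks)
    into l_rows, l_blocks
    from user_tab_partitions
   where table_name = 'T';

  -- ...and set them as the table-level (global) statistics
  dbms_stats.set_table_stats( user, 'T',
                              numrows => l_rows,
                              numblks => l_blocks );
end;
/
```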
 

Effect of gather stale on partition level stats

Rob, February 01, 2005 - 9:35 am UTC

Tom:

If I am using dbms_stats with the gather stale option and table and index monitoring is turned on, what happens in the following scenario:

A new partition is created and loaded with data. The amount of data in the new partition is less than 10% of the total in the table. Will gather stale cause this table to be skipped when I analyze, thus resulting in no partition level stats for the new partition and no update of the global stats?



What is rowid | row?

A reader, February 03, 2005 - 11:31 pm UTC

What does the plan below mean? I have pasted only the partitioned-table portion of the explain plan.

It looks like its using global stats and not partition stats.

What is the meaning of ROWID | ROW there?

----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Pstart| Pstop |
----------------------------------------------------------------------------------------------------
PARTITION RANGE ITERATOR | | | | | KEY | KEY |
TABLE ACCESS BY GLOBAL INDEX ROWID| my_partition_tab | 2034 | 53186 | 2 | ROWID | ROW
INDEX RANGE SCAN | my_partiton_index | 3537 | | 1 | KEY | KEY |


Thanx for all your help.

Tom Kyte
February 04, 2005 - 2:06 am UTC

it is saying "we will do partition elimination at run time"

we don't know right now which partitions we need (KEY KEY); we'll need anywhere from 0 to all of them. The inputs to the query will tell us what we need ultimately.


so, this plan is doing N index range scans (N will be determined when the inputs are supplied and it can figure out the lo and hi partitions)

For each hit in the index, we'll do a table access by index rowid.


it uses global stats since it does not know at optimize time what partitions will be used (and partition stats are used when a single partition is identified to be used and that partition is known)
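For illustration, a bind-variable range query of the shape that produces such a plan (table and column names are hypothetical):

```sql
variable lo varchar2(20)
variable hi varchar2(20)

explain plan for
select *
  from my_partition_tab
 where my_date between to_date(:lo,'dd-mon-yyyy')
                   and to_date(:hi,'dd-mon-yyyy');

select * from table(dbms_xplan.display);
-- Pstart/Pstop show KEY | KEY: the partitions to scan are resolved
-- only at run time, when the bind values are known
```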

Rob, try it and see

Peter, February 04, 2005 - 4:04 am UTC

We are just about to change our stats collection method to "gather stale" schema stats. On our TEST system (you must have one?) we ran a set of tests involving our batch data load and inspected last_analyzed from user_tables and user_tab_partitions (and just for fun looked at how user_tab_modifications changed before and after the data load, and also after the gather stats). Of course we also timed the runs!
Well... we were happy with what we saw!

What is better for performance, querying, and DML?

A reader, February 05, 2005 - 2:56 am UTC

1) Having a range-partitioned table with all partitions (say 20 in total) in just one tablespace

or

2) Having a partitioned table with all partitions in different tablespaces

And the same argument for the partitioned index on the above partitioned table.

Thanks.

Tom Kyte
February 05, 2005 - 5:34 am UTC

trick question.

tablespaces aren't about performance.
tablespaces ARE about administration.


If you use SAME (stripe and mirror everything) or ASM (a new 10g feature, a database file system), then it really just "doesn't matter" if you have 1 or 100 tablespaces - they are all sharing the same disks.


Even if you aren't doing that (SAME or ASM), the same thing applies. A single tablespace with files from disk1, disk2, disk3, disk4, .... diskN will "stripe" naturally extent by extent.


In fact, if you put a tablespace per disk and put each partition on its own disk, that could be lots worse than striping over everything - if you have a popular partition (say the current data), that disk would be really hot. If you striped -- all disks would be doing IO.



tablespaces are not about performance.
tablespaces are about ease of administration.


Say these range-partitioned tables are DW tables. Of the 20 partitions -- 19 are effectively "read only" (all of their data has been loaded). You could (as an admin) alter them read only, back them up for the last time and forget about them. So, putting them into separate tablespaces would make sense for that reason.


So, think about this from an admin perspective.

Thanks for the above partition answer, but a question

A reader, February 05, 2005 - 11:40 am UTC

It is always said to use separate tablespaces for indexes and their tables.
Can I imply from the above that this is just for ease of administration, with no performance gains during I/O or index updates?

Thanks

Tom Kyte
February 06, 2005 - 3:04 am UTC

think about it.

what is a tablespace?

a bunch of files. nothing more, nothing less.

You can achieve precisely the same IO distribution over all disks using

a) one tablespace
b) hundreds of tablespaces

in fact, you'll find it easier perhaps with a) and some nice striping of disks.


It is a myth (with some historical foundations in facts from the late 1980's, before striping, back when a 4 disk system "was really big", when you had schemas with 5 or 10 tables for an entire application) that you separate indexes and data for performance.





Missing link

SFT, February 07, 2005 - 3:54 am UTC

Hi Tom,

The link to this thread from the OTN start page [surprisingly] points to Chris Chalk's J2EE Weblog. That's a bit confusing, and then after all the number of visitors to your site may drop by a few dozen unfortunates that could not figure out how to find it. That's not a problem for you, but for the poor creatures out there ... :)

Please help!

Sandra Johns, February 10, 2005 - 11:41 pm UTC

I know that this is not the correct area for my question; BUT, I really need help with this one!

I'm getting a:

ORA-03116 invalid buffer length passed to a conversion routine

It happened in a Pro*COBOL program during a FETCH of a cursor, while other routines were accessing this table using a SELECT. This table is accessed using PROC ACCESS views. The OS is HP-UX with an Oracle 9i database.

Thank you so much!

Tom Kyte
February 11, 2005 - 7:55 pm UTC

please contact support, i've no experience whatsoever with pro*cobol.

What about statistics when exchanging partitions?

Seth Goodman, March 15, 2005 - 10:03 am UTC

Sorry for posting again. It didn't seem to take yesterday.

Tom,

I wanted to thank you for your help (see Jan 12th) and share my experience with others. It's taken me a while to get to this point. This was my first time dealing with many of the procedures required in the solution.

One question first: What does the 'no_invalidate' parameter do, as found in various dbms_stats procedures like 'set_table_stats'?

My example above was used just to illustrate the problem. My master database in fact has 5 tables, 4 of them range-hash, and the other just range. All of them have somewhere between 2 to 11 local indexes.

Before an exchange, I have to add a partition or split an existing partition in the master table. Adding is easy. Splitting is a headache because of dealing with the TS for the indexes. Although Oracle was good about defining the TS for the tables, it kept the index TS in the same TS as the one defined for the existing partition. Nothing I could do would change that (it's not clear if this is correct behaviour). I was forced to REBUILD the indexes and define the TS I wanted at that time (I'm using the same TS as the table). Fortunately the new partition is empty and so it doesn't take long to execute, but it required a fair amount of PL/SQL to build the statements and execute immediate. Maybe someone can suggest a better built-in method...

The script I use for exchanging the partitions ended up having to do the following:
1) save all the current stats using export_table_stats (w/cascade) (table PART1)
2) edit the exported stats to match partition/subpartition and other criteria for importing into the master table (for table, index, and column)
3) exchange the partition (PART1 <--> MANY_PARTS)
4) import the partition level stats using import_table_stats (w/cascade) (into MANY_PARTS)
5) Calculate global table stats using already exchanged partitions and using set_table_stats. Obviously, this couldn't be done for column stats.

When dealing with range-hash stats, I only needed to deal with the global-to-partition-level exchange. Using my example, the stats in the partitions of PART1 were automatically exchanged into the subpartitions of MANY_PARTS. The exercise described above was only needed to exchange the global stats of PART1 into the partition level of MANY_PARTS. This is unnecessary if PART1 is not partitioned, in which case the global stats automatically exchange into the partition level of MANY_PARTS.

I had to calculate the global row stats individually for each object exchanged (tables and indexes). This could be done by re-analyzing MANY_PARTS after each exchange, but that is time consuming. So I use set_table_stats (as per Tom's suggestion) and set_index_stats after calculating the new number of rows and blocks etc. using the partition stats. Eventually, the tables will need to be re-analyzed every so often, but this has worked OK so far. Query performance was crappy at first until I did my first analyze after about 6 partitions were exchanged: the CBO needed some global column stats to formulate good plans.
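A sketch of that kind of adjustment for an index (the index name is hypothetical; the same pattern as set_table_stats applies):

```sql
declare
  l_rows number;
begin
  -- sum the row counts of the local index partitions...
  select sum(num_rows)
    into l_rows
    from user_ind_partitions
   where index_name = 'MANY_PARTS_IDX';   -- hypothetical local index

  -- ...and set the result as the index's global statistic
  dbms_stats.set_index_stats( user, 'MANY_PARTS_IDX', numrows => l_rows );
end;
/
```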

In conclusion, it would seem to me that Oracle should help streamline some of these steps automatically in the 'exchange partition' function. It may happen some day.

I hope my description has been clear enough and of use to others.

Thanks,

-- Seth



Tom Kyte
March 15, 2005 - 10:13 am UTC

no_invalidate causes the stats to be put in place without invalidating the dependent SQL in the shared pool -- only subsequent hard parses of SQL would see the new stats; the existing parsed SQL would remain in place as it is.
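For example, a sketch using the running example from this thread (no_invalidate is a parameter of set_table_stats and the gather_* procedures):

```sql
begin
  -- the new figure is seen only by SQL hard parsed after this call;
  -- cursors already in the shared pool keep their existing plans
  dbms_stats.set_table_stats( user, 'T',
                              numrows       => 56,
                              no_invalidate => true );
end;
/
```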

Subpartition histograms

Robert, March 17, 2005 - 5:03 pm UTC

Tom,

We have noticed something that raises our suspicion about histograms on subpartitioned tables. We initially gathered table statistics when the tables were first built and once we started loading data we gathered on the exchanged subpartitions. This approach works very well as we can see that the subpartition statistics are rolling up to both the partition level and table level.

We discovered that if we gathered stats initially with a granularity of ALL, then the stats did not roll up.

Here is the way we have been gathering stats on our subpartitioned tables:

dbms_stats.gather_table_stats(ownname => piTableOwner,
tabname => piTableName,
partname => piPartitionName,
estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
granularity => 'SUBPARTITION',
method_opt => 'FOR ALL COLUMNS SIZE AUTO',
cascade => true);


We use the following scripts to check on this periodically. The only time we have noticed discrepancies between the different levels for the last_analyzed times or num_rows has been when we encountered an error during the exchange process.

1 select table_name,
2 Table_lvl,
3 partition_name,
4 Part_lvl,
5 subpartition_name,
6 Subpart_lvl
7 from (select A.table_name,
8 A.last_analyzed Table_lvl,
9 B.partition_name,
10 B.last_analyzed Part_lvl,
11 C.subpartition_name,
12 C.last_analyzed Subpart_lvl,
13 max(B.last_analyzed) OVER (partition by B.table_name) max_part_la,
14 max(C.last_analyzed) OVER (partition by C.table_name) max_subpart_la
15 from dba_tables A,
16 dba_tab_partitions B,
17 dba_tab_subpartitions C
18 where A.owner = B.table_owner
19 and A.table_name = B.table_name
20 and A.table_name = C.table_name
21 and B.table_name = C.table_name
22 and B.partition_name = C.partition_name)
23 where (Table_lvl != max_part_la or Table_lvl != max_subpart_la)
24 and Part_lvl = max_part_la
25 and Subpart_lvl = max_subpart_la
26* order by table_name desc nulls last
sposp> /
Press <RETURN> To Continue...

no rows selected


1 select distinct (A.table_name) table_name,
2 A.num_rows Table_rows,
3 sum(B.num_rows) over (partition by B.table_name) /
4 count(*) over (partition by C.table_name, C.partition_name) Part_rows,
5 sum(C.num_rows) over (partition by C.table_name) Subpart_rows
6 from dba_tables A,
7 dba_tab_partitions B,
8 dba_tab_subpartitions C
9 where A.owner = B.table_owner
10 and A.owner = C.table_owner
11 and A.table_name = B.table_name
12 and A.table_name = C.table_name
13 and B.table_name = C.table_name
14* and B.partition_name = C.partition_name
sposp> /
Press <RETURN> To Continue...

Table
Name TABLE_ROWS PART_ROWS SUBPART_ROWS
------------------------------ ---------- ---------- ------------
D_DAILY_DC_INVENTORY_FACT 968164 968164 968164
D_DAILY_POS_FACT 29656244 29656244 29656244
D_DAILY_RETAIL_INVEN_FACT 81379154 81379154 81379154
D_WEEKLY_POS_FACT 64066815 64066815 64066815


If the optimizer only looks at the subpartition stats when it knows the ONE subpartition it needs to access, then most of the time we are going to use global stats. And since most of the time we are going to traverse multiple ranges, the global stats we will use are at the table level.


1 select distinct column_name,
2 count(*)
3 from dba_subpart_histograms
4 where owner = 'POSDM'
5 and table_name = 'D_DAILY_RETAIL_INVEN_FACT'
6 and subpartition_name = 'P200503_NPPC_PETSMART_USA'
7* group by column_name
sposp> /
Press <RETURN> To Continue...

Column
Name COUNT(*)
------------------------------ ----------
CREATE_D 2
CURRENCY_ID 2
CURRENCY_SOURCE_ID 2
D_ITEM_ID 201
D_RETAILER_PARTITION_ID 1
D_RETAIL_LOC_ID 2
INDICATOR_ID 2
IN_TRANSIT_Q 2
LAST_UPDATE_D 2
ON_HAND_Q 62
ON_ORDER_Q 2
RETAILER_NET_COST_DOLLAR_A 2
RETAILER_NET_COST_EURO_A 2
RETLR_NET_COST_LCL_CRNCY_A 2
SESSION_RUN_ID 2
SYSTEM_CALENDAR_KEY_N 1
TOTAL_INVENTORY_Q 2



1 select distinct column_name,
2 count(*)
3 from dba_part_histograms
4 where owner = 'POSDM'
5 and table_name = 'D_DAILY_RETAIL_INVEN_FACT'
6 and partition_name = 'P200503'
7* group by column_name
sposp> /
Press <RETURN> To Continue...

Column
Name COUNT(*)
------------------------------ ----------
CREATE_D 2
CURRENCY_ID 2
CURRENCY_SOURCE_ID 2
D_ITEM_ID 2
D_RETAILER_PARTITION_ID 2
D_RETAIL_LOC_ID 2
INDICATOR_ID 2
LAST_UPDATE_D 2
ON_HAND_Q 2
ON_ORDER_Q 2
RETAILER_NET_COST_DOLLAR_A 2
RETLR_NET_COST_LCL_CRNCY_A 2
SESSION_RUN_ID 2
SYSTEM_CALENDAR_KEY_N 2
TOTAL_INVENTORY_Q 2


1 select distinct column_name,
2 count(*)
3 from dba_tab_histograms
4 where owner = 'POSDM'
5 and table_name = 'D_DAILY_RETAIL_INVEN_FACT'
6* group by column_name
sposp> /
Press <RETURN> To Continue...

Column
Name COUNT(*)
------------------------------ ----------
CREATE_D 2
CURRENCY_ID 2
CURRENCY_SOURCE_ID 2
D_ITEM_ID 2
D_RETAILER_PARTITION_ID 2
D_RETAIL_LOC_ID 2
INDICATOR_ID 2
LAST_UPDATE_D 2
ON_HAND_Q 2
ON_ORDER_Q 2
RETAILER_NET_COST_DOLLAR_A 2
RETLR_NET_COST_LCL_CRNCY_A 2
SESSION_RUN_ID 2
SYSTEM_CALENDAR_KEY_N 2
TOTAL_INVENTORY_Q 2



If the optimizer looks at the histogram information at the table level, then the number of buckets and the associated endpoint values are not accurate when compared to the same column at the subpartition level. Are you aware of any issues with this? I have searched everywhere and I haven't found anything yet to indicate that this is a problem.


Thanks,
Robert D. Ware

Tom Kyte
March 17, 2005 - 6:03 pm UTC

I'm not sure what "this" is in regards to "issues with this"

if we cannot eliminate down to a single partition -- global stats, else partition stats.

To the Point

Robert, March 21, 2005 - 10:41 am UTC

Tom,

Do you know whether or not the use of partitioning obviates the optimizer's ability to use histograms when it uses global/table-level stats?



Tom Kyte
March 22, 2005 - 10:45 am UTC

No, it'll either

a) use global histograms
b) local histograms

depending on the query.

Am I reading this right?

A reader, April 26, 2005 - 11:33 am UTC

So basically, if a query does partition elimination and it turns out that only a few partitions will be looked at, I should gather table-level stats, not partition-level stats?

Here's why I am asking. I have a table with 1.2 billion rows. This table is partitioned on a POST_DATE column. The partitions range from 01-01-1993 up to 12-31-2010. This table is used to collect new rows on a daily basis. All rows inserted have a POST_DATE of yesterday so we don't insert or update any rows with a POST_DATE < yesterday. So, we always "append" new rows. Most queries against this table have a date range on the POST_DATE using BETWEEN and the optimizer usually looks at a number of partitions.

Recently I decided to gather stats on the most current partition only (the partition that is currently being filled). Is this a good idea? Should I gather stats on the entire table and not just the most current partition? Does it even make sense to gather table stats when the "old" rows are static? How should I gather stats??!!


Tom Kyte
April 26, 2005 - 1:35 pm UTC

if your queries do not boil down to a single partition, known at parse time (eg: they have a pstart=pstop=something_known) -- global stats are used.

If a single partition is known to be hit, those partition stats will be used.


You can gather or you can SET, the optimizer needs the facts, it could well be you know the facts (high bounds of columns, number of rows and other facts)

Partitions and granularity

A reader, April 26, 2005 - 2:25 pm UTC

Thanks for your reply.

I still have two questions regarding gather_table_stats.

1. If I specify a partition name and use the default granularity, does that mean Oracle will gather global stats and partition stats on the partition I specified only?

2. If I don't specify a partition, and use the default granularity, will Oracle gather global stats and partition stats on ALL partitions?

Please let me know. Thank you very much.

Tom Kyte
April 26, 2005 - 2:45 pm UTC

ops$tkyte@ORA9IR2> CREATE TABLE t
  2  (
  3    dt  date,
  4    x   int,
  5    y   varchar2(25)
  6  )
  7  PARTITION BY RANGE (dt)
  8  (
  9    PARTITION part1 VALUES LESS THAN (to_date('13-mar-2003','dd-mon-yyyy')) tablespace users,
 10    PARTITION part2 VALUES LESS THAN (to_date('14-mar-2003','dd-mon-yyyy')) tablespace tools,
 11    PARTITION junk VALUES LESS THAN (MAXVALUE)
 12  )
 13  /
 
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> insert into t select to_date('10-mar-2003','dd-mon-yyyy')+mod(rownum,10), 1, 1 from all_users;
 
24 rows created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> select partition_name, num_rows from user_tab_partitions where table_name = 'T';
 
PARTITION_NAME                   NUM_ROWS
------------------------------ ----------
PART1                                   8
PART2                                   3
JUNK                                   13
 
ops$tkyte@ORA9IR2> select table_name, num_rows from user_tables where table_name = 'T';
 
TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T                                      24
 
ops$tkyte@ORA9IR2> delete from t;
 
24 rows deleted.
 
ops$tkyte@ORA9IR2> commit;
 
Commit complete.
 
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T', 'PART1' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> select partition_name, num_rows from user_tab_partitions where table_name = 'T';
 
PARTITION_NAME                   NUM_ROWS
------------------------------ ----------
PART1                                   0
PART2                                   3
JUNK                                   13
 
ops$tkyte@ORA9IR2> select table_name, num_rows from user_tables where table_name = 'T';
 
TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T                                       0
 



<b>so, only the partition stats for A partition and the table stats for the entire table were modified by that gather.... the old partition stats would stick</b>
 

Confusion

A reader, June 06, 2005 - 11:02 pm UTC

1. Regarding your latest followup "so, only the partition stats for A partition and the table stats for the
entire table were modified by that gather.... the old partition stats would stick", so why does it show

TABLE_NAME NUM_ROWS
------------------------------ ----------
T 0

shouldn't that be 13+3=16?

2. I have a DW where monthly partitions are added to a table from the OLTP system. Partition key is the yyyymm date column. Stats are gathered on the table and it is exchanged into the partitioned table so the partitioned table has partition level stats.

Most queries use only 1 partition at a time (pstart=pstop), but some of them use more than 1.

There seems to be some confusion as to whether partition level stats rollup to the table level or not.

Quoting you from earlier on the page:

"when all of the partitions have been analyzed, it should be
rolled up"

But then when someone proved otherwise, you admitted that

"yes, it is not maintaining the global stats (partition level stats are fine, global stats are not)"

So, which is it? Do partition level stats roll up to the table level or not?

3. Gathering table stats after each partition is added is hugely expensive.

If partition stats indeed do rollup properly to the table level, is there even a need to gather table level stats explicitly? Yes, column level stats complicate things, but what is the solution?

What is a workable solution to this problem? Use dbms_stats.set_*? How exactly?

Thanks

Tom Kyte
June 07, 2005 - 8:07 am UTC

1) because as stated:

"so, only the partition stats for A partition 
and the table stats for the 
entire table were modified by that gather.... the old partition stats would 
stick"

the TABLE stats were modified
the PART1 stats were modified
the remaining stats were not touched.

2) well, I showed that when all of the partitions are analyzed, it was rolled up.

but when the partition was swapped in, it changed behavior ("admitted" sounds like there is some culpability, that something false was said; you can observe all of this behavior yourself)

It depended on how the information was all gathered.

So, look at your system -- you tell us, what do YOU see.  Seems you should have sufficient information to make this determination.

It was two different cases

case 1) analyze individual partitions in a partitioned table, what happens to global stats.

case 2) exchange a partition, what happens to global stats

case 1 and 2 are different.  And you can see what happens by putting the two examples together:

                                                                                                                                                      
ops$tkyte@ORA9IR2> CREATE TABLE t
  2  (
  3    dt  date,
  4    x   int,
  5    y   int
  6  )
  7  PARTITION BY RANGE (dt)
  8  (
  9    PARTITION jan_2005 VALUES LESS THAN (to_date('01-feb-2005','dd-mon-yyyy')),
 10    PARTITION feb_2005 VALUES LESS THAN (to_date('01-mar-2005','dd-mon-yyyy')),
 11    PARTITION mar_2005 VALUES LESS THAN (to_date('01-apr-2005','dd-mon-yyyy')),
 12    PARTITION apr_2005 VALUES LESS THAN (to_date('01-may-2005','dd-mon-yyyy'))
 13  )
 14  /
 
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> insert into t
  2  select to_date('01-jan-2005','dd-mon-yyyy')+mod(rownum,to_date('30-apr-2005','dd-mon-yyyy')-to_date('01-jan-2005','dd-mon-yyyy')+1),
  3         rownum, rownum
  4    from all_objects
  5  /
 
27986 rows created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T', 'jan_2005', granularity=>'PARTITION' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T', 'feb_2005', granularity=>'PARTITION' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T', 'mar_2005', granularity=>'PARTITION' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T', 'apr_2005', granularity=>'PARTITION' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select table_name, num_rows from user_tables where table_name = 'T';
 
TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T                                   27986
 
ops$tkyte@ORA9IR2> select partition_name, num_rows from user_tab_partitions where table_name = 'T';
 
PARTITION_NAME                   NUM_ROWS
------------------------------ ----------
JAN_2005                             7249
FEB_2005                             6524
MAR_2005                             7223
APR_2005                             6990


<b>ok, as the last partition was analyzed, rollup</b>


ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> alter table t add partition may_2005 values less than (to_date('01-jun-2005','dd-mon-yyyy'));
 
Table altered.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create table temp ( dt date, x int, y int );
 
Table created.
 
ops$tkyte@ORA9IR2> insert into temp
  2  select to_date('01-may-2005')+mod(rownum,30) dt, rownum x, rownum y
  3    from all_objects
  4   where rownum <= 5000;
 
5000 rows created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'TEMP' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> alter table t exchange partition may_2005 with table temp;
 
Table altered.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select table_name, num_rows from user_tables where table_name = 'T';
 
TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T                                   27986
 
ops$tkyte@ORA9IR2> select partition_name, num_rows from user_tab_partitions where table_name = 'T';
 
PARTITION_NAME                   NUM_ROWS
------------------------------ ----------
MAY_2005                             5000
JAN_2005                             7249
FEB_2005                             6524
MAR_2005                             7223
APR_2005                             6990

<b>Now, the table T was not analyzed, it was exchanged into.  Note that the partition level statistics came over but the global table stats were not modified, we haven't gathered stats on anything related to table T yet</b>

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T', 'may_2005', granularity=>'PARTITION' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select table_name, num_rows from user_tables where table_name = 'T';
 
TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T                                   32986
 
ops$tkyte@ORA9IR2> select partition_name, num_rows from user_tab_partitions where table_name = 'T';
 
PARTITION_NAME                   NUM_ROWS
------------------------------ ----------
MAY_2005                             5000
JAN_2005                             7249
FEB_2005                             6524
MAR_2005                             7223
APR_2005                             6990
 
<b>so.... there you go.  you need to run dbms_stats on that table to have the global information rolled up</b>
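
That explicit rollup can be requested with the documented granularity parameter of dbms_stats; a minimal sketch against the table T above (not part of the original run):

exec dbms_stats.gather_table_stats( user, 'T', granularity=>'GLOBAL' );
-- refreshes only the table-level (global) statistics, leaving the
-- per-partition statistics that came over with the exchange untouched

exec dbms_stats.gather_table_stats( user, 'T', 'may_2005', granularity=>'DEFAULT' );
-- gathers the MAY_2005 partition AND rolls results up to the table level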


3) what is the solution?  you are the only one that can answer that.  are the rolled-up stats *sufficient for you, given your data*?  If there were a single solution, there would be no choices to make.

If you have information about the true "number of distinct values", or "high value, low value" for columns -- you can set this information using the documented set_* functions -- rather than gather it.

Statistics need only be good enough for the optimizer to understand reality, what is there. 
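
A hedged sketch of that set_* route (all of the numbers below are invented placeholders, not measured values; the calls themselves are documented dbms_stats procedures):

begin
   -- manually publish table-level stats instead of gathering them
   dbms_stats.set_table_stats( ownname => user, tabname => 'T',
                               numrows => 32986, numblks => 500, avgrlen => 20 );

   -- publish column-level information such as the number of distinct
   -- values; high/low values would be prepared first with
   -- dbms_stats.prepare_column_values
   dbms_stats.set_column_stats( ownname => user, tabname => 'T', colname => 'X',
                                distcnt => 32986, nullcnt => 0 );
end;
/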
 

A reader, June 07, 2005 - 12:28 pm UTC

"the TABLE stats were modified"

OK so the table had 24 rows in 3 partitions, your DELETE removed all of them.

But then running

exec dbms_stats.gather_table_stats( user, 'T', 'PART1' );

shows

PART1 0
PART2 3
JUNK 13

but at the global level shows

T 0

So, what I meant was, running gather_table_stats( user, 'T', 'PART1' ) had the desired effect of accurately reflecting the stats for PART1, but at the expense of the undesirable side-effect of setting the global num_rows to 0.

Isn't this bad? Specifying the partname=> parameter to gather_table_stats should only modify partition-level stats for that partition and not mess with global stats, right?

Thanks

Tom Kyte
June 07, 2005 - 1:13 pm UTC

but the delete removed ALL ROWS from the ENTIRE TABLE, so it did the right thing here. There were no rows left.



A reader, June 07, 2005 - 1:58 pm UTC

Right, it's just that things don't add up, i.e. the partition stats don't add up to the table stats (13+3+0 != 0).

Not sure if any CBO decision is based on things adding up in this fashion, but just thought I would point it out.

Thanks

Tom Kyte
June 07, 2005 - 2:01 pm UTC

because the command I used said "just do this one partition" and let the granularity default (that partition and that table). the database did exactly what it was told to do.

Randall, August 24, 2005 - 3:03 pm UTC

In your example above you show:

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> variable x varchar2(30)
ops$tkyte@ORA9IR2> delete from plan_table;

2 rows deleted.

ops$tkyte@ORA9IR2> explain plan for
2 select * from t where dt = to_date(:x,'dd-mon-yyyy');
ops$tkyte@ORA9IR2> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------

-------------------------------------------------------------------------------
| Id | Operation | Name| Rows | Bytes | Cost | Pstart| Pstop |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 22 | 2 | | |
| 1 | PARTITION RANGE SINGLE| | | | | KEY | KEY |
|* 2 | TABLE ACCESS FULL | T | 1 | 22 | 2 | KEY | KEY |
-------------------------------------------------------------------------------

Q: With bind variable peeking, shouldn't it know that you will hit just one partition and which partition to hit?

Thanks..


Tom Kyte
August 25, 2005 - 3:18 am UTC

explain plans do not know about bind variables. autotrace and explain plan do not do "bind variable peeking".

(and in current releases of the software, the to_date will preclude bind variable peeking from taking place)
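
To see the plan a statement actually received at hard parse (rather than explain plan's guess), 10g's dbms_xplan.display_cursor can be queried; a sketch, assuming sufficient privileges on the v$ views:

variable x varchar2(30)
exec :x := '12-mar-2003'
select * from t where dt = to_date(:x,'dd-mon-yyyy');
select * from table(dbms_xplan.display_cursor);
-- with no arguments, display_cursor shows the runtime plan of the last
-- statement executed in this session, reflecting whatever the optimizer
-- actually decided at parse time (including any bind peeking that occurred)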

Partition statistics and local index

Serge Shmygelsky, April 27, 2006 - 5:00 am UTC

Hello Tom,

I have a partitioned table of the following structure:
RASH@RASH> desc msc_data
Name Null? Type
------------------------------- -------- ----------------------------
1 FILE_ID NOT NULL NUMBER
2 CALL_TYPE_ID NUMBER
3 RECORD_TYPE_ID NUMBER
4 CUSTOMER_TYPE_ID NUMBER
5 CALL_DATE DATE
6 DURATION NUMBER
7 MSC_NO VARCHAR2(8)
8 IS_DELETED VARCHAR2(1)
9 LAC VARCHAR2(10)
10 CID VARCHAR2(4)
11 IMEI VARCHAR2(20)
12 IMSI VARCHAR2(20)
13 DN_NUM VARCHAR2(40)
14 MSRN VARCHAR2(40)
15 B_NUMBER VARCHAR2(40)
16 B_MSRN VARCHAR2(40)
17 TRANS_B_NUMBER VARCHAR2(40)
18 C_NUMBER VARCHAR2(40)
19 TRUNK_IN VARCHAR2(8)
20 TRUNK_OUT VARCHAR2(8)
21 VOLUME_UP NUMBER
22 VOLUME_DOWN NUMBER

Table is partitioned by call_date (a partition for each day) and then by call_type_id (5 more subpartitions).

I have created the following index:
CREATE INDEX msc_data_idx
ON msc_data
(
call_date,
call_type_id,
dn_num,
b_number,
trans_b_number
)
LOCAL
NOLOGGING
COMPRESS 2
PARALLEL
/

I'm collecting statistics for the table partition and rebuilding the index partitions after the daily data loading. The problem is that I cannot make the following query use the index:

RASH@RASH> explain plan for select call_date, call_type_id, dn_num from msc_data where call_date between '01.04.2006' and '03.04.2006' and call_type_id = 1 and dn_num = 'adfdf' and b_number = 'aaaa' and trans_b_number = 'aaa';

Plan is:
RASH@RASH> @p

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3415524828

-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 41 | 24565 (30)| 00:04:55 | | |
| 1 | PX COORDINATOR | | | | | | | |
| 2 | PX SEND QC (RANDOM)| :TQ10000 | 1 | 41 | 24565 (30)| 00:04:55 | | |
| 3 | PX BLOCK ITERATOR | | 1 | 41 | 24565 (30)| 00:04:55 | KEY | KEY |
|* 4 | TABLE ACCESS FULL| MSC_DATA | 1 | 41 | 24565 (30)| 00:04:55 | KEY | KEY |
-------------------------------------------------------------------------------------------------

When I use hint:
RASH@RASH> explain plan for select /*+ INDEX (msc_data msc_data_idx) */ call_date, call_type_id, dn_num from msc_data where call_date between '01.04.2006' and '03.04.2006' and call_type_id = 1 and dn_num = 'adfdf' and b_number = 'aaaa' and trans_b_number = 'aaa';
RASH@RASH> @p

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2064827886

-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 41 | 61043 (31)| 00:12:13 | | |
| 1 | PX COORDINATOR | | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10000 | 1 | 41 | 61043 (31)| 00:12:13 | | |
| 3 | PX PARTITION LIST SINGLE| | 1 | 41 | 61043 (31)| 00:12:13 | KEY | KEY |
|* 4 | INDEX SKIP SCAN | MSC_DATA_IDX | 1 | 41 | 61043 (31)| 00:12:13 | KEY | KEY |
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("CALL_DATE">=TO_DATE('2006-04-01 00:00:00', 'yyyy-mm-dd hh24:mi:ss') AND
"CALL_TYPE_ID"=1 AND "DN_NUM"='adfdf' AND "B_NUMBER"='aaaa' AND "TRANS_B_NUMBER"='aaa' AND
"CALL_DATE"<=TO_DATE('2006-04-03 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))
filter("DN_NUM"='adfdf' AND "TRANS_B_NUMBER"='aaa' AND "B_NUMBER"='aaaa' AND
"CALL_TYPE_ID"=1)

So the questions are:
1. Why doesn't Oracle use the index? From my understanding, it should be used, as the query fits all the rules: it contains the predicates used in the index, and the index holds the data itself - there is no need to read the table at all.
2. Why is the cost so different in the two plans? To me it is obviously wrong: if I run the query with the hint it takes about 5 seconds to finish, and without the hint it runs for hours.

Please advise.



Partition statistics and local index - ctd

Serge Shmygelsky, May 12, 2006 - 5:59 am UTC

Hello Tom,

sorry for bothering you, but I cannot find a way out of this index situation. Now it's getting even stranger: when I run the query as the schema owner, the index is used. Any other user having 'select' on this table and running the same query gets the table full scanned. It looks like no one else can see the statistics. How can that be? And what am I doing wrong?

Hope this time you'll hear me crying...

Tom Kyte
May 12, 2006 - 9:29 am UTC

do a 10053 trace as the person who can "use the index" and as the person "who cannot use the index" and compare them.


</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:63445044804318 <code>

I have a feeling "these are not the same tables" perhaps (you have confused yourself with synonyms or views maybe) or there is fine grained access control involved or ....

Indexes are not something you have to "grant" on. They are available to everyone.

Partition statistics and local index - ctd

Serge Shmygelsky, May 17, 2006 - 9:10 am UTC

Hi Tom,

thanks for the information. Unfortunately, I cannot reproduce the situation. I gathered the statistics for the table and the index. The index is local prefixed (you can see that from my previous post). And now the index is not used at all, in any case. I'm going to lose my faith in humanity :).

Below is the output from a query with the optimizer trace event on. Maybe you could suggest something based on that?

select count(*) from msc_data where call_date between '15.05.2006' and '16.06.2006' and call_type_id = 1
*********************
QUERY BLOCK SIGNATURE
*********************
qb name was generated
signature (optimizer): qb_name=SEL$1 nbfros=1 flg=0
fro(0): flg=0 objn=58191 hint_alias="MSC_DATA"@"SEL$1"
*****************************
SYSTEM STATISTICS INFORMATION
*****************************
Using NOWORKLOAD Stats
CPUSPEED: 72 millions instruction/sec
IOTFRSPEED: 4096 bytes per millisecond (default is 4096)
IOSEEKTIM: 10 milliseconds (default is 10)
***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
Table: MSC_DATA Alias: MSC_DATA (Using composite stats)
(making adjustments for partition skews)
ORIGINAL VALUES:: #Rows: 4460853857 #Blks: 23103220 AvgRowLen: 114.00
SUBPARTITIONS::
PRUNED: 33
ANALYZED: 33 UNANALYZED: 0
#Rows: 4460853857 #Blks: 183616 AvgRowLen: 114.00
Index Stats::
Index: FILE_ID_IDX Col#: 1
USING COMPOSITE STATS
LVLS: 2 #LB: 4847987 #DK: 2115 LB/K: 2292.00 DB/K: 11636.00 CLUF: 24610820.00
Index: MSC_DATA_IDX Col#: 5 2 13 15 17
USING COMPOSITE STATS
LVLS: 2 #LB: 12607386 #DK: 4574215695 LB/K: 1.00 DB/K: 1.00 CLUF: 4510454397.00
***************************************
SINGLE TABLE ACCESS PATH
Column (#2): CALL_TYPE_ID(NUMBER)
AvgLen: 4.00 NDV: 16 Nulls: 0 Density: 1.1097e-10 Min: -111 Max: 910
Histogram: Freq #Bkts: 16 UncompBkts: 5454 EndPtVals: 16
Column (#5): CALL_DATE(DATE)
AvgLen: 8.00 NDV: 2880147 Nulls: 0 Density: 3.4720e-07 Min: 2453827 Max: 2453872
Table: MSC_DATA Alias: MSC_DATA
Card: Original: 4460853857 Rounded: 24176128 Computed: 24176127.82 Non Adjusted: 24176127.82
Access Path: TableScan
Cost: 61301.30 Resp: 8514.07 Degree: 0
Cost_io: 49731.00 Cost_cpu: 9972979094
Resp_io: 6907.08 Resp_cpu: 1385135985
Access Path: index (index (FFS))
Index: MSC_DATA_IDX
resc_io: 3414502.00 resc_cpu: 921084943807
ix_sel: 0.0000e+00 ix_sel_with_filters: 1
Access Path: index (FFS)
Cost: 4483112.02 Resp: 622654.45 Degree: 8
Cost_io: 3414502.00 Cost_cpu: 921084943807
Resp_io: 474236.39 Resp_cpu: 127928464418
kkofmx: index filter:"MSC_DATA"."CALL_TYPE_ID"=1 AND "MSC_DATA"."CALL_DATE">=TO_DATE('2006-05-15 00:00:00', 'yyyy-mm-dd hh24:mi:ss') AND "MSC_DATA"."CALL_DATE"<=TO_DATE('2006-06-16 00:00:00', 'yyyy-mm-dd hh24:mi:ss')
Access Path: index (skip-scan)
SS sel: 1.1148e-04 ANDV (#skips): 5591029
SS io: 5591029.00 vs. index scan io: 259330.00
Skip Scan rejected
Access Path: index (IndexOnly)
Index: MSC_DATA_IDX
resc_io: 259332.00 resc_cpu: 21697496356
ix_sel: 0.02057 ix_sel_with_filters: 0.0054196
Cost: 284504.66 Resp: 39514.54 Degree: 8
Best:: AccessPath: TableScan
Cost: 8514.07 Degree: 8 Resp: 8514.07 Card: 24176127.82 Bytes: 0
***************************************



Tom Kyte
May 17, 2006 - 9:32 pm UTC

one would want to see the table, the index create and the dbms_xplan output.

Partition statistics and local index - ctd

Serge Shmygelsky, May 18, 2006 - 11:19 am UTC

Hi Tom,

thanks for your attention. Here is the information:

Table:

CREATE TABLE &owner..msc_data
(
file_id NUMBER NOT NULL,
call_type_id NUMBER,
record_type_id NUMBER,
customer_type_id NUMBER,
call_date DATE,
duration NUMBER,
msc_no VARCHAR2(8),
is_deleted VARCHAR2(1),
lac VARCHAR2(10),
cid VARCHAR2(4),
imei VARCHAR2(20),
imsi VARCHAR2(20),
dn_num VARCHAR2(40),
msrn VARCHAR2(40),
b_number VARCHAR2(40),
b_msrn VARCHAR2(40),
trans_b_number VARCHAR2(40),
c_number VARCHAR2(40),
trunk_in VARCHAR2(8),
trunk_out VARCHAR2(8),
volume_up NUMBER,
volume_down NUMBER
)
PARTITION BY RANGE ( call_date )
SUBPARTITION BY LIST ( call_type_id )
(
PARTITION part_18022006 VALUES LESS THAN ( TO_DATE( '18.02.2006', 'DD.MM.YYYY' ) )
PCTFREE 0
STORAGE
(
INITIAL 16M
NEXT 16M
)
COMPRESS
(
SUBPARTITION part_18022006_1 VALUES (30),
SUBPARTITION part_18022006_2 VALUES (31),
SUBPARTITION part_18022006_3 VALUES (1),
SUBPARTITION part_18022006_4 VALUES (2),
SUBPARTITION part_18022006_5 VALUES (DEFAULT)
)
)
/

Each day a partition is added by:
ALTER TABLE MSC_DATA
ADD PARTITION part_18052006
VALUES LESS THAN ( TO_DATE( '19.05.2006', 'DD.MM.YYYY' ) ) PCTFREE 0 STORAGE ( INITIAL 16M NEXT 16M ) COMPRESS
( SUBPARTITION part_18052006_1 VALUES (1),
SUBPARTITION part_18052006_2 VALUES (2),
SUBPARTITION part_18052006_3 VALUES (30),
SUBPARTITION part_18052006_4 VALUES (31),
SUBPARTITION part_18052006_5 VALUES ( DEFAULT ))

Index:
CREATE INDEX msc_data_idx
ON &owner..msc_data
(
call_date,
call_type_id,
dn_num,
b_number,
trans_b_number
)
LOCAL
NOLOGGING
COMPRESS 2
PARALLEL
/

Query and dbms_xplan output:

RASH@RASH> explain plan for select count(*) from msc_data where call_date between '15.05.2006' and '16.06.2006' and call_type_id = 1;
RASH@RASH> @p

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
Plan hash value: 969686288

---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 12 | 25997 (20)| 00:05:12 | | |
| 1 | SORT AGGREGATE | | 1 | 12 | | | | |
| 2 | PX COORDINATOR | | | | | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10000 | 1 | 12 | | | | |
| 4 | SORT AGGREGATE | | 1 | 12 | | | | |
| 5 | PX BLOCK ITERATOR | | 78M| 899M| 25997 (20)| 00:05:12 | KEY | KEY |
|* 6 | TABLE ACCESS FULL| MSC_DATA | 78M| 899M| 25997 (20)| 00:05:12 | KEY | KEY |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

6 - filter("CALL_DATE"<=TO_DATE('2006-06-16 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))
RASH@RASH> explain plan for select /*+ INDEX (msc_data msc_data_idx) */ count(*) from msc_data where call_date between '15.05.2006' and '16.06.2006' and call_type_id = 1;
RASH@RASH> @p

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
Plan hash value: 247641948

-------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 12 | 119K (9)| 00:23:51 | | |
| 1 | SORT AGGREGATE | | 1 | 12 | | | | |
| 2 | PX COORDINATOR | | | | | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10000 | 1 | 12 | | | | |
| 4 | SORT AGGREGATE | | 1 | 12 | | | | |
| 5 | PX PARTITION LIST SINGLE| | 78M| 899M| 119K (9)| 00:23:51 | KEY | KEY |
|* 6 | INDEX RANGE SCAN | MSC_DATA_IDX | 78M| 899M| 119K (9)| 00:23:51 | KEY | KEY |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

6 - access("CALL_DATE">=TO_DATE('2006-05-15 00:00:00', 'yyyy-mm-dd hh24:mi:ss') AND
"CALL_TYPE_ID"=1 AND "CALL_DATE"<=TO_DATE('2006-06-16 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))
filter("CALL_TYPE_ID"=1)

Statistics were gathered 5 minutes ago and no changes have been made since.

One thought just hit me: maybe this is because fields can contain nulls?

Tom Kyte
May 19, 2006 - 9:54 am UTC

that is using partition elimination and choosing to full scan the partitions using multiblock IO instead of using single block IO on an index range scan.

it wanted to use multiblock IO; you have parallel query enabled.

I see nothing inherently wrong here. seems most of the data would be hit, the table is compressed, it might even be smaller than the index.


so, does the index plan actually *go faster*

Partition statistics and local index - ctd

Serge Shmygelsky, May 26, 2006 - 11:19 am UTC

Hello Tom,

I've done some tests according to your suggestions. Here are the results:

RASH@RASH> explain plan for select /*+ INDEX (msc_data msc_data_idx ) */ b_number, record_type_id, duration from rash.msc_data where call_date between '15.05.2006' and '22.05.2006' and call_type_id = 1 and dn_num = '80503111011';


RASH@RASH> @p

PLAN_TABLE_OUTPUT
-------------------
Plan hash value: 3241126277

---------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 31 | 1271 | 290K (9)| 00:58:07 | | |
| 1 | PX COORDINATOR | | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10000 | 31 | 1271 | 290K (9)| 00:58:07 | | |
| 3 | PX PARTITION LIST SINGLE | | 31 | 1271 | 290K (9)| 00:58:07 | KEY | KEY |
| 4 | TABLE ACCESS BY LOCAL INDEX ROWID| MSC_DATA | 31 | 1271 | 290K (9)| 00:58:07 | KEY | KEY |
|* 5 | INDEX RANGE SCAN | MSC_DATA_IDX | 31 | | 290K (9)| 00:58:07 | KEY | KEY |
---------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("CALL_DATE">=TO_DATE('2006-05-15 00:00:00', 'yyyy-mm-dd hh24:mi:ss') AND "CALL_TYPE_ID"=1 AND
"DN_NUM"='80503111011' AND "CALL_DATE"<=TO_DATE('2006-05-22 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))
filter("DN_NUM"='80503111011' AND "CALL_TYPE_ID"=1)
Elapsed: 00:00:00.19

RASH@RASH> select /*+ INDEX (msc_data msc_data_idx ) */ b_number, record_type_id, duration from rash.msc_data where call_date between '15.05.2006' and '22.05.2006' and call_type_id = 1 and dn_num = '80503111011';

Elapsed: 00:02:21.04

Statistics
----------------------------------------------------------
61 recursive calls
3 db block gets
644696 consistent gets
644624 physical reads
5264 redo size
1876 bytes sent via SQL*Net to client
238 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
15 sorts (memory)
0 sorts (disk)
81 rows processed

RASH@RASH> explain plan for select b_number, record_type_id, duration from rash.msc_data where call_date between '15.05.2006' and '22.05.2006' and call_type_id = 1 and dn_num = '80503111011';

RASH@RASH> @p

PLAN_TABLE_OUTPUT
--------------------
Plan hash value: 3415524828

-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 31 | 1271 | 76016 (27)| 00:15:13 | | |
| 1 | PX COORDINATOR | | | | | | | |
| 2 | PX SEND QC (RANDOM)| :TQ10000 | 31 | 1271 | 76016 (27)| 00:15:13 | | |
| 3 | PX BLOCK ITERATOR | | 31 | 1271 | 76016 (27)| 00:15:13 | KEY | KEY |
|* 4 | TABLE ACCESS FULL| MSC_DATA | 31 | 1271 | 76016 (27)| 00:15:13 | KEY | KEY |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - filter("DN_NUM"='80503111011' AND "CALL_DATE"<=TO_DATE('2006-05-22 00:00:00',
'yyyy-mm-dd hh24:mi:ss'))

RASH@RASH> select b_number, record_type_id, duration from rash.msc_data where call_date between '15.05.2006' and '22.05.2006' and call_type_id = 1 and dn_num = '80503111011';

Elapsed: 00:04:19.58

Statistics
----------------------------------------------------------
45 recursive calls
3 db block gets
1630781 consistent gets
1378072 physical reads
680 redo size
1912 bytes sent via SQL*Net to client
238 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
8 sorts (memory)
0 sorts (disk)
81 rows processed

So the index usage was almost two times faster. It looks like, for some reason, Oracle chooses an incorrect plan (from a human point of view). Why could that happen?

Tom Kyte
May 27, 2006 - 9:24 pm UTC

it got lucky due to the fewer physical IOs in this case. So, do you have system statistics in place on your system - does the optimizer have information about how IOs perform on your system and such?

And do you really want your tables parallel enabled (usually, the answer is "not really")
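
Gathering system statistics (so the optimizer knows the relative cost of single-block and multiblock IO on the actual hardware) is a documented dbms_stats call; a sketch:

-- quick, hardware-only figures
exec dbms_stats.gather_system_stats( 'NOWORKLOAD' );

-- or capture real workload figures over a representative window
exec dbms_stats.gather_system_stats( gathering_mode => 'START' );
-- ... let a representative workload run ...
exec dbms_stats.gather_system_stats( gathering_mode => 'STOP' );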

Partition statistics and local index - ctd

Serge Shmygelsky, May 29, 2006 - 4:22 am UTC

Hello Tom,

I've already told our DBA that he needs to turn system statistics on. I hope your suggestion will help me persuade him.
As to parallelism - the main idea of this table is to be used as a data source for some MViews. The process is as follows: calls for the entire day are loaded once per day, the MViews are refreshed, and after that there are no modifications/deletions. Nobody actually selects from the table itself, only from the MViews (except in rare cases). It is a kind of small data warehouse. So I thought it would be good to have parallelism in order to refresh these MViews as fast as possible. But in the meantime, in case I need data from the table, I built the aforementioned index to speed up queries. Is this approach correct?


Tables with Partitions

dav, April 15, 2007 - 10:31 pm UTC

Hi,
We have a table partitioned by month, with data from Jan '05 through the current month of '07. When we analyze that table, it analyzes all the partitions, but the data changes only in the recent month(s)' partitions. How can we configure the analyze to cover only the partitions with recently changed data?

Thanks in advance.
Tom Kyte
April 16, 2007 - 1:14 pm UTC

don't use analyze
use dbms_stats. you can tell it what partitions to gather statistics for.
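
For a monthly rolling window that might look like the following (table and partition names are illustrative, not from the question):

begin
   dbms_stats.gather_table_stats(
      ownname     => user,
      tabname     => 'SALES',        -- your partitioned table
      partname    => 'APR_2007',     -- just the freshly loaded partition
      granularity => 'PARTITION' );
end;
/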


Entire Schema

Dav, April 16, 2007 - 5:28 pm UTC

As of today, we analyze the entire schema. Do we need to change it to loop through the tables in that schema and run individual analyze/dbms_stats? Please let me know.
Tom Kyte
April 17, 2007 - 9:44 am UTC



you don't *need* to do anything - there is a chance you might want to do something differently.

You could, for example, use table monitoring (on by default in 10g, alter it on table by table in 9i) and use dbms_stats to gather just "stale" statistics. Maybe, it might be appropriate, it might not.

Not knowing your needs, or anything about what you do - no one can say...
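
One possible shape of the "stale only" approach (documented dbms_stats and monitoring syntax; whether it is appropriate is, as noted, your call):

alter table t monitoring;  -- needed per table in 9i; on by default in 10g

begin
   dbms_stats.gather_schema_stats(
      ownname => user,
      options => 'GATHER STALE' );  -- gathers only objects whose monitored
                                    -- change volume has marked them stale
end;
/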

DM, April 18, 2007 - 1:47 am UTC

Hi Tom

I am new to performance tuning. Here is the explain plan for the query I am running.
I am aware that low COST = GOOD.
Please let me know how I can tune this query for optimal performance, as the cost is very high.

OPERATION OPTIONS OBJECT NAME OBJECT INSTANCE OPTIMIZER COST CARDINALITY BYTES
==========================================================================================================================================================
SELECT STATEMENT FIRST_ROWS 0 3192385 3192385 1833117 1688300757
NESTED LOOPS 1 0 1 3192385 1833117 1688300757
NESTED LOOPS 2 1 1 1359268 1833117 1629641013
NESTED LOOPS 3 2 1 466498 892770 277651470
TABLE ACCESS FULL TABLE_BUS_ORG 3 ANALYZED 4 3 1 146 58294 3789110
TABLE ACCESS BY INDEX ROWID TABLE_SITE 4 ANALYZED 5 3 2 8 15 3690
INDEX RANGE SCAN IND_PRIMARY2BUS_ORG NON-UNIQUE ANALYZED 1 6 5 1 2 94
TABLE ACCESS BY INDEX ROWID TABLE_ADDRESS 2 7 2 2 1 2 1156
INDEX UNIQUE SCAN ADDRESS_OBJINDEX UNIQUE 1 8 7 1 1
TABLE ACCESS BY INDEX ROWID TABLE_COUNTRY 1 ANALYZED 9 1 2 1 1 32
INDEX UNIQUE SCAN COUNTRY_OBJINDEX UNIQUE ANALYZED 1 10 9 1 1


select table_site.objid, table_site.site_id, table_site.S_site_id,
table_site.name, table_site.S_name, table_site.type,
table_address.address, table_address.S_address, table_address.city, table_address.S_city,
table_address.state, table_address.S_state, table_country.code,
table_site.site_type, table_site.status,
table_bus_org.objid, table_bus_org.org_id, table_bus_org.S_org_id,
table_bus_org.name, table_bus_org.S_name, table_bus_org.type,
table_country.name, table_country.S_name, table_address.zipcode, table_address.S_zipcode,
table_country.objid, table_address.objid,
table_site.region, table_site.S_region, table_site.district, table_site.S_district,
table_address.address_2, table_site.x_manned_owner,
table_site.x_op_support_site, table_site.S_x_op_support_site, table_site.x_delivery_party_name, table_site.S_x_delivery_party_name,
table_site.phone, table_site.x_gsa_cust_ref_no, table_site.S_x_gsa_cust_ref_no,
table_site.site_type, table_site.x_subtype,
table_site.x_fins_currency, table_site.x_fins_ban_type,
table_address.x_county, table_address.S_x_county
from table_country,table_address,
table_bus_org,table_site
where table_country.objid = table_address.address2country
AND table_bus_org.objid = table_site.primary2bus_org
AND table_address.objid = table_site.cust_primaddr2address

Please let me know If I missed anything.

Thanks
DM
IND
Tom Kyte
April 18, 2007 - 12:04 pm UTC

... I am aware that low COST = GOOD. ....

hmmm, it depends, just always say "it depends"

What about partition stats on a partitioned index?

Robert, April 23, 2007 - 3:44 pm UTC

Tom,

What about partition level statistics on a partitioned index (which is on a partitioned table)?

Given...
Partitioned table (on account_id by range).
Partitioned index (on contact_date by year).
Query with predicate "contact_date between '01/01/2006' and '12/31/2006'"

The query is using the contact_date index when it should be doing a full table scan. (ie. FTS runs 3x faster).

I've tried all types of combinations of statistics. Global, partitioned, on contact_date index and histogram on contact_date field in table.... but CBO will not choose the 'correct' path.

In desperation I set db_file_multiblock_read_count from 16 to 24 and then it did choose the correct path (but caused bad side effects elsewhere in the system).

This 2006 data is approx. 10% of the data in the table, and that is several million rows. It is fairly 'intuitive' to me that a full table scan would be much faster than millions of index lookups...... What info do I need to provide the CBO so it will come to the same conclusion?

Thanks,

Robert.
Tom Kyte
April 23, 2007 - 4:51 pm UTC

what does the explain plan say (card=) compared to a tkprof row source operation (reality)

are they close. what kind of statistics do you have on contact_date, is it skewed, do you do that bad thing of "put in the year 9999 for nulls" stuff?

Information you requested.....

Robert, April 23, 2007 - 6:43 pm UTC

Tom,

Here is the query and the explain plans using both DB_MRC of 16 and 24.
No, we are not doing the null = year 9999 stuff.

The INTERACTIONS table has ~600M rows
There are ~110M rows where contact_date between 1/1/06 and 12/31/06 (~20%).
The contact_date index is partitioned by contact_date year (~12 partitions).

16:37:18 SQL> alter session set db_file_multiblock_read_count=16;

Session altered.

16:37:28 SQL> SELECT NA.ALPHA_NAME
16:37:32   2       , INT.TECHNIQUE ORIGINAL_TECH
16:37:32   3       , INT.SOLICITOR
16:37:32   4       , INT.USER_INSERTING
16:37:32   5       , DECODE(INT.TECHNIQUE, 'AN','CC', 'BQ','CC', 'CP','PH', 'CU','PH', 'CV','PH', 'IB','CC'
16:37:32   6                             , 'OD','CC', 'PF','CC', 'PR','PR', 'PT','VI', 'TD','PH', 'TN','PH'
16:37:32   7                             , 'TR','CC', 'WC','WC', 'IP','IP' ) TECHNIQUE 
16:37:32   8    FROM NAMES NA
16:37:32   9       , BRIEF_NAMES BN
16:37:32  10       , INTERACTIONS INT 
16:37:32  11   WHERE ':B1' = BN.BRIEF_NAME 
16:37:32  12     AND NA.ACCOUNT_ID = BN.ACCOUNT_ID 
16:37:32  13     AND NA.NAME_ID = BN.NAME_ID 
16:37:32  14     AND INT.STS != 'D' 
16:37:32  15     AND INT.CONTACT_DATE BETWEEN to_date('01/01/06', 'DD/MM/RR') and to_date('12/31/06', 'DD/MM/RR') 
16:37:32  16     AND INT.TECHNIQUE IN ('IP','PT','CP','CU','CV','WC','PR', 'BQ','OD','PF','TR','AN','IB','TD', 'TN') 
16:37:32  17     AND INT.CONTACT_CATEGORY||NULL IN ('GP','FN','OP') 
16:37:32  18     AND ( ( 'SO' = 'SO' AND INT.SOLICITOR = ':B1' ) OR ( 'SO' = 'ST' AND INT.USER_INSERTING = ':B1' ) )
16:37:32  19  /

Execution Plan <--------- <<<<< W R O N G ---- USING CONTACT_DATE INDEX >>>>>>
----------------------------------------------------------
 0          SELECT STATEMENT Optimizer=CHOOSE (Cost=2856874 Card=1 Bytes=65)
 1        0   FILTER
 2        1     NESTED LOOPS (Cost=2856874 Card=1 Bytes=65)
 3        2       NESTED LOOPS (Cost=4 Card=1 Bytes=41)
 4        3         TABLE ACCESS (BY INDEX ROWID) OF 'BRIEF_NAMES' (Cost=2 Card=1 Bytes=16)
 5        4           INDEX (UNIQUE SCAN) OF 'UQ_AMBRF_BRIEF_NAME' (UNIQUE) (Cost=1 Card=1)
 6        3         TABLE ACCESS (BY INDEX ROWID) OF 'NAMES' (Cost=3 Card=1 Bytes=25)
 7        6           INDEX (UNIQUE SCAN) OF 'PK_NAMES' (UNIQUE) (Cost=2 Card=1)
 8        2       PARTITION RANGE (ITERATOR)
 9        8         TABLE ACCESS (BY GLOBAL INDEX ROWID) OF 'INTERACTIONS' (Cost=2856870 Card=1 Bytes=24)
10        9           INDEX (RANGE SCAN) OF 'IX_MSINT_CONTACT_DATE' (NON-UNIQUE) (Cost=178434 Card=30085221)



16:37:33 SQL> alter session set db_file_multiblock_read_count=24;

Session altered.

16:37:39 SQL> SELECT NA.ALPHA_NAME
16:37:43   2       , INT.TECHNIQUE ORIGINAL_TECH
16:37:43   3       , INT.SOLICITOR
16:37:43   4       , INT.USER_INSERTING
16:37:43   5       , DECODE(INT.TECHNIQUE, 'AN','CC', 'BQ','CC', 'CP','PH', 'CU','PH', 'CV','PH', 'IB','CC'
16:37:43   6                             , 'OD','CC', 'PF','CC', 'PR','PR', 'PT','VI', 'TD','PH', 'TN','PH'
16:37:43   7                             , 'TR','CC', 'WC','WC', 'IP','IP' ) TECHNIQUE 
16:37:43   8    FROM NAMES NA
16:37:43   9       , BRIEF_NAMES BN
16:37:43  10       , INTERACTIONS INT 
16:37:43  11   WHERE ':B1' = BN.BRIEF_NAME 
16:37:43  12     AND NA.ACCOUNT_ID = BN.ACCOUNT_ID 
16:37:43  13     AND NA.NAME_ID = BN.NAME_ID 
16:37:43  14     AND INT.STS != 'D' 
16:37:43  15     AND INT.CONTACT_DATE BETWEEN to_date('01/01/06', 'DD/MM/RR') and to_date('12/31/06', 'DD/MM/RR') 
16:37:43  16     AND INT.TECHNIQUE IN ('IP','PT','CP','CU','CV','WC','PR', 'BQ','OD','PF','TR','AN','IB','TD', 'TN') 
16:37:43  17     AND INT.CONTACT_CATEGORY||NULL IN ('GP','FN','OP') 
16:37:43  18     AND ( ( 'SO' = 'SO' AND INT.SOLICITOR = ':B1' ) OR ( 'SO' = 'ST' AND INT.USER_INSERTING = ':B1' ) )
16:37:43  19  /

Execution Plan  <----- <<<<<<  C O R R E C T  --- FULL TABLE SCAN ON INTERACTIONS >>>>>
----------------------------------------------------------
0          SELECT STATEMENT Optimizer=CHOOSE (Cost=2772568 Card=1 Bytes=65)
1        0   FILTER
2        1     NESTED LOOPS (Cost=2772568 Card=1 Bytes=65)
3        2       NESTED LOOPS (Cost=4 Card=1 Bytes=41)
4        3         TABLE ACCESS (BY INDEX ROWID) OF 'BRIEF_NAMES' (Cost=2 Card=1 Bytes=16)
5        4           INDEX (UNIQUE SCAN) OF 'UQ_AMBRF_BRIEF_NAME' (UNIQUE) (Cost=1 Card=1)
6        3         TABLE ACCESS (BY INDEX ROWID) OF 'NAMES' (Cost=3 Card=1 Bytes=25)
7        6           INDEX (UNIQUE SCAN) OF 'PK_NAMES' (UNIQUE) (Cost=2 Card=1)
8        2       PARTITION RANGE (ALL)
9        8         TABLE ACCESS (FULL) OF 'INTERACTIONS' (Cost=2772564 Card=1 Bytes=24)



16:37:43 SQL> 

What do you think?

Thanks,

Robert.

Tom Kyte
April 24, 2007 - 10:12 am UTC

what we need is for you to compare the card= values in the above to the ACTUALS reported in a tkprof.

are they close to reality

Something changed(?!)

Robert, April 26, 2007 - 10:39 am UTC

Tom,

I've been trying to get the tkprof you requested of the statements, but something has changed and I am unable to duplicate the results (!?). Will try to duplicate situation and get you the data you requested soon.

Thank you,

Robert.

Here is some of the info.....

Robert, April 27, 2007 - 11:00 am UTC

 
Tom,

I cannot reproduce the problem *exactly* now... but here is enough info (I think) for what you were asking (showing Oracle's *actual* cardinality).

Both DBFMBRC settings are giving exactly the same explain plan now... but here are the results of the tkprof for the query....

SELECT /*+ db_file_mbrc=24 */ NA.ALPHA_NAME
     , INT.TECHNIQUE ORIGINAL_TECH
     , INT.SOLICITOR
     , INT.USER_INSERTING
     , DECODE(INT.TECHNIQUE, 'AN','CC', 'BQ','CC', 'CP','PH', 'CU','PH', 'CV','PH', 'IB','CC'
                           , 'OD','CC', 'PF','CC', 'PR','PR', 'PT','VI', 'TD','PH', 'TN','PH'
                           , 'TR','CC', 'WC','WC', 'IP','IP' ) TECHNIQUE
  FROM NAMES NA
     , BRIEF_NAMES BN
     , INTERACTIONS INT
 WHERE '???????' = BN.BRIEF_NAME
   AND NA.ACCOUNT_ID = BN.ACCOUNT_ID
   AND NA.NAME_ID = BN.NAME_ID
   AND INT.STS != 'D'
   AND INT.CONTACT_DATE BETWEEN to_date('01/01/06', 'MM/DD/RR') and to_date('12/31/06', 'MM/DD/RR')
   AND INT.TECHNIQUE IN ('IP','PT','CP','CU','CV','WC','PR', 'BQ','OD','PF','TR','AN','IB','TD', 'TN')
   AND INT.CONTACT_CATEGORY||NULL IN ('GP','FN','OP')
   AND ( ( 'SO' = 'SO' AND INT.SOLICITOR = '???????' ) OR ( 'SO' = 'ST' AND INT.USER_INSERTING = '???????' ) )

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.02       0.01          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2   1380.98    2800.48    7960338    8866779          0           7
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4   1381.00    2800.50    7960338    8866779          0           7

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 587

Rows     Row Source Operation
-------  ---------------------------------------------------
      7  FILTER
      7   NESTED LOOPS
      1    NESTED LOOPS
      1     TABLE ACCESS BY INDEX ROWID OBJ#(36372)
      1      INDEX UNIQUE SCAN OBJ#(36563) (object id 36563)
      1     TABLE ACCESS BY INDEX ROWID OBJ#(34465)
      1      INDEX UNIQUE SCAN OBJ#(73436) (object id 73436)
      7    PARTITION RANGE ITERATOR PARTITION: KEY KEY
      7     TABLE ACCESS BY GLOBAL INDEX ROWID OBJ#(145334) PARTITION: ROW LOCATION ROW LOCATION
111828135      INDEX RANGE SCAN OBJ#(584500) PARTITION: KEY KEY (object id 584500)

09:54:41 SQL> select object_name
09:54:45   2    from dba_objects where object_id = 584500;

OBJECT
NAME
----------------------------------------
IX_MSINT_CONTACT_DATE

The point is that I think this query should be using a full table scan instead of the ix_msint_contact_date index.

(note: I changed actual name in query to '???????')

Thanks,

Robert.

Tom Kyte
April 27, 2007 - 11:19 am UTC

my goal was to verify that the estimated card= values were even close to the real, actual values (as reported by tkprof)

are they. (we don't even need to see the query to get the answer to that...)

Cardinality <> #rows

Robert, April 27, 2007 - 11:38 am UTC


Tom,

Explain plan card = 30,085,221
Actual rows = 111,828,135

I think I need to update my statistics(?)
Would this suggest I need a histogram? (the query predicate/index in question is using a date range).

Thank you,

Robert.

Tom Kyte
April 27, 2007 - 11:48 am UTC

look at each bit of the plan - are the various STEPS correct.

Not getting it?

Robert, April 27, 2007 - 12:13 pm UTC


Tom,

Here is the plan.
The only 'significant' table here (where 99% of the work is being done) is the INTERACTIONS table.
My understanding of how the steps work together is this (in part)... The result set from the INTERACTIONS table is gathered (i.e. once) and combined with the result sets of the other two tables (brief_names and names). Since the cardinality for those two tables (brief_names and names) is relatively small, the main 'bottleneck' would be gathering the result set from the INTERACTIONS table. Therefore this is the place to focus (i.e. the best way to get the result set from the interactions table... INDEX access vs. Full Table Scan)... Correct?

If so..... what stats (i.e. 'truth') do I need to give CBO in order for it to make correct decision?

Thanks!

----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2856874 Card=1 Bytes=65)
1 0 FILTER
2 1 NESTED LOOPS (Cost=2856874 Card=1 Bytes=65)
3 2 NESTED LOOPS (Cost=4 Card=1 Bytes=41)
4 3 TABLE ACCESS (BY INDEX ROWID) OF 'BRIEF_NAMES' (Cost=2 Card=1 Bytes=16)
5 4 INDEX (UNIQUE SCAN) OF 'UQ_AMBRF_BRIEF_NAME' (UNIQUE) (Cost=1 Card=1)
6 3 TABLE ACCESS (BY INDEX ROWID) OF 'NAMES' (Cost=3 Card=1 Bytes=25)
7 6 INDEX (UNIQUE SCAN) OF 'PK_NAMES' (UNIQUE) (Cost=2 Card=1)
8 2 PARTITION RANGE (ITERATOR)
9 8 TABLE ACCESS (BY GLOBAL INDEX ROWID) OF 'INTERACTIONS' (Cost=2856870 Card=1
Bytes=24)
10 9 INDEX (RANGE SCAN) OF 'IX_MSINT_CONTACT_DATE' (NON-UNIQUE) (Cost=178434
Card=30085221)


Tom Kyte
April 27, 2007 - 4:38 pm UTC

Ok, the optimizer guessed 30,085,221 in the last step

it got 111,828,135

off by a factor of 3 to 4.

(are you missing a join somewhere by the way? INT seems cartesian joined to the other two?)

It could be that histograms on contact date could help it get the right card=value for that index range scan.

but given this predicate:

AND INT.STS != 'D'
AND INT.CONTACT_DATE BETWEEN to_date('01/01/06', 'MM/DD/RR') and to_date('12/31/06', 'MM/DD/RR')
AND INT.TECHNIQUE IN ('IP','PT','CP','CU','CV','WC','PR', 'BQ','OD','PF','TR','AN','IB','TD',
'TN')
AND INT.CONTACT_CATEGORY||NULL IN ('GP','FN','OP')
AND ( ( 'SO' = 'SO' AND INT.SOLICITOR = '???????' ) OR ( 'SO' = 'ST' AND INT.USER_INSERTING =
'???????' ) )

if we could add technique, contact_category, and sts to the end of that index to avoid millions of table accesses - you would probably find the index to be the right path
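Tom's "wider index" suggestion could be sketched like this. Note this is illustrative only: the index name is made up, and the exact column order would need testing against the real workload.

```sql
-- Hypothetical covering index (name and column order are an assumption).
-- With TECHNIQUE, CONTACT_CATEGORY and STS present in the index, rows that
-- fail those filters are rejected during the index range scan itself,
-- avoiding millions of TABLE ACCESS BY GLOBAL INDEX ROWID operations.
CREATE INDEX ix_msint_cd_covering
    ON interactions (contact_date, technique, contact_category, sts)
    ONLINE;
```

Similarly, Tom's histogram idea above would be a gather_table_stats call with something like method_opt => 'FOR COLUMNS CONTACT_DATE SIZE 254'.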

Alberto Dell'Era, April 27, 2007 - 5:29 pm UTC

Have you noticed (good formatting is a blessing, indenting takes you to Heaven):
 AND ( 
     ( 'SO' = 'SO' AND INT.SOLICITOR      = '???????' ) 
  OR ( 'SO' = 'ST' AND INT.USER_INSERTING = '???????' ) 
     ) 

as written, SOLICITOR wants its place in the index as well, probably in the first position or second after CONTACT_DATE.

Probably another identical index, changing SOLICITOR to USER_INSERTING, is useful to optimize the sibling query that I'm sure is going to be found in the system as well:
 AND ( 
     ( 'SO' = 'ST' AND INT.SOLICITOR      = '???????' ) 
  OR ( 'SO' = 'SO' AND INT.USER_INSERTING = '???????' ) 
     ) 

A reader, March 09, 2009 - 4:54 pm UTC

We have a partitioned table and we are doing some certification of the stats job with our online users. The CPU was good until the stats job ran. Our stats job executes the following for the 4 partitioned tables:

SYS.DBMS_STATS.UNLOCK_TABLE_STATS(owner, tablename);
ALTER TABLE <owner>.<tablename> MOVE PARTITION <partition_name> COMPRESS UPDATE GLOBAL INDEXES PARALLEL;
ALTER TABLE <owner>.<tablename> MODIFY PARTITION <partition_name> REBUILD UNUSABLE LOCAL INDEXES;
SYS.DBMS_STATS.gather_table_stats(owner, tablename, partition_name);
SYS.DBMS_STATS.LOCK_TABLE_STATS(owner, tablename);

After this job ran, the online application started behaving badly. Could this be due to a bad execution plan because of the stats, or something else? I even tried flushing the shared pool, but no use. I am out of ideas on what else to try. Can you help?


Tom Kyte
March 10, 2009 - 2:33 am UTC

well, the defaults to dbms_stats change version to version, so not sure what statistics you may or may not have gathered there, but yes, it sounds like it *could* be a plan change.

So, question to you - do you have your history, are you using AWR (if so, yes, you do and you can find what plans used to be and see what changed), if not do you have statspack from when it was "OK"

A reader, March 13, 2009 - 3:30 pm UTC

We have a set of new partitioned tables with a new application. To do statistics on these tables we are just running

dbms_stats.gather_table_stats(<owner>,<table>,<partition>);

Does only the partition get affected by this, or does something happen to the table as well? What other options should we use for the partition stats?

analyze and partitions

A reader, March 14, 2009 - 4:46 pm UTC

greetings thomas,

and thanks like always,

1) If I have dropped or truncated a partition, which is best: to analyze the affected partition, or the entire table?

2) If I have added a partition, do I need to analyze it, or even analyze the table? If yes, why?

thanks in advance
Tom Kyte
March 16, 2009 - 9:31 am UTC

read link above, the optimizer magic one.


the answer is "it depends"

if you truncate a partition - the statistics are not reset for example, you'd sort of need to "fix" that

ops$tkyte%ORA10GR2> CREATE TABLE t
  2  (
  3    dt  date,
  4    x   int,
  5    y   varchar2(30)
  6  )
  7  PARTITION BY RANGE (dt)
  8  (
  9    PARTITION part1 VALUES LESS THAN (to_date('01-jan-2008','dd-mon-yyyy')) ,
 10    PARTITION part2 VALUES LESS THAN (to_date('01-jan-2009','dd-mon-yyyy'))
 11  )
 12  /

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> insert into t
  2  select to_date('1-jan-2007')+mod(rownum,365*2), rownum, rownum
  3    from dual
  4  connect by level <= 100000
  5  /

100000 rows created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select table_name, num_rows from user_tables where table_name = 'T'
  2  union all
  3  select partition_name, num_rows from user_tab_partitions where table_name = 'T';

TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T                                  100000
PART1                               50004
PART2                               49996

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> alter table t truncate partition part1;

Table truncated.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select table_name, num_rows from user_tables where table_name = 'T'
  2  union all
  3  select partition_name, num_rows from user_tab_partitions where table_name = 'T';

TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T                                  100000
PART1                               50004
PART2                               49996

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T', partname => 'PART1' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select table_name, num_rows from user_tables where table_name = 'T'
  2  union all
  3  select partition_name, num_rows from user_tab_partitions where table_name = 'T';

TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T                                   49996
PART1                                   0
PART2                               49996



and if you drop the partition, you might need to zap them again, to get accurate information:

ops$tkyte%ORA10GR2> alter table t drop partition part2;

Table altered.

ops$tkyte%ORA10GR2> select table_name, num_rows from user_tables where table_name = 'T'
  2  union all
  3  select partition_name, num_rows from user_tab_partitions where table_name = 'T';

TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T                                   49996
PART1                                   0





you could fix that in a few ways - read the associated link to get some ideas.
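For example, two of those ways could be sketched as follows (using the same demo table T; these are suggestions, not the only options):

```sql
-- Option 1: re-gather only the table-level (global) statistics after the
-- partition maintenance, leaving the per-partition stats alone.
exec dbms_stats.gather_table_stats( user, 'T', granularity => 'GLOBAL' );

-- Option 2: delete all statistics for the table and gather from scratch.
exec dbms_stats.delete_table_stats( user, 'T' );
exec dbms_stats.gather_table_stats( user, 'T' );
```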

Order of Dictionary Tables updation while gathering stats at subpartition level

A reader, June 04, 2010 - 10:56 am UTC

If we are gathering stats at the subpartition level with default options, can you please tell us which dictionary tables get updated and the order in which they get updated? The reason I am asking is that stats gathering for one of the partitions in the table is taking longer than usual. We want to know where it is taking the time.


Tom Kyte
June 09, 2010 - 7:46 am UTC

you would be assuming you could use that knowledge somehow to peek at that session and see where it is - that would mean that stats gathering session would have to commit (else you cannot see it), but it won't. That approach cannot work.

Not that anyone could reliably tell you the order of anything since it would change from release to release, patch to patch.


You can use ASH/AWR if you have access to them to see what it was waiting for/doing.

You can use sql_trace to see detailed information about what it did and what order it did it.

You can use statspack at an aggregate level if this is the only job running to gain insight as well - take a snap, gather stats, take a snap and report.
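The sql_trace approach mentioned above could look like this (a sketch; the table name and granularity are stand-ins):

```sql
-- Trace the current session, run the gather, stop tracing, then run
-- tkprof on the resulting trace file to see the recursive SQL and waits.
exec dbms_monitor.session_trace_enable( waits => TRUE, binds => FALSE );
exec dbms_stats.gather_table_stats( user, 'T', granularity => 'SUBPARTITION' );
exec dbms_monitor.session_trace_disable;
```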


Impact of number of rows returned

A reader, June 22, 2011 - 3:38 pm UTC

In a comment you made on this thread, you mentioned that inaccurate statistics would make Oracle

a) going to get tons and tons of rows when in fact it'll get none.
b) going to get no rows when in fact it'll get tons and tons.

What kind of performance impact would there be if the statistics led Oracle to pick the same execution path despite the wrong row estimates? Would Oracle allocate the wrong amount of CPU or RAM and thus hurt elapsed time?
Tom Kyte
June 23, 2011 - 7:48 am UTC

You need to put the words "think it is" in front of those.


If we were to accidentally get the right plan even with bad statistics - then it would perform exactly as if we got it right on purpose. There would be no difference.

Jess, October 07, 2011 - 9:43 am UTC

Hi Tom,

Running a select from one partition, I would expect the optimizer to pick 'partition range single' and use partition stats, as it knows exactly which partition it's hitting, regardless of whether the stats were up to date (provided there were some stats).

select count (*) from fact_transactions where
txn_date >= to_date('19-SEP-2011 00:00:00','DD-MON-YYYY HH24:MI:SS') and
txn_date <= to_date('19-SEP-2011 23:59:59','DD-MON-YYYY HH24:MI:SS')


When I run a query plan for a select from one partition that does NOT have up-to-date stats, it shows access with 'partition range single', which is what I would expect for partition pruning, as it would use that partition's stats (leaving aside for the moment the fact that the stats aren't representative of the data--clearly Oracle doesn't know that at this point).

---------------------------------------------------------------------------------
|Id|Operation | Name |Rws|Bts|Cst%CPU|Time |Pstrt|Pstp|
---------------------------------------------------------------------------------
| 0|SELECT STATEMENT | | 1| 9| 0 (0)|00:00:01| | |
| 1| SORT AGGREGATE | | 1| 9| | | | |
| 2| PARTITION RANGE SINGLE| | 1| 9| 0 (0)|00:00:01| 794| 794|
|*3| INDEX RANGE SCAN |FTX_CALKEYTXNID| 1| 9| 0 (0)|00:00:01| 794| 794|
---------------------------------------------------------------------------------

But when I run the same count(*) for a similarly sized partition that's up-to-date on stats,
I get 'iterator' instead. (Not comparing cost figures because of differences in stats).

----------------------------------------------------------------------------------
|Id|Operation |Name |Rows |Bts|Cst%CPU |Time |Pstrt|Pstp|IN-OUT|
----------------------------------------------------------------------------------
| 0|SELECT STATEMENT | | 1 | 8 |2443 (1)|00:00:44| | | |
| 1| SORT AGGREGATE | | 1 | 8 | | | | | |
| 2| PX COORDINATOR | | | | | | | | |
| 3| PX SEND QC (RANDM)|:TQ10000 | 1 | 8 | | | | | P->S |
| 4| SORT AGGREGATE | | 1 | 8 | | | | | PCWP |
| 5| PX BLOCK ITERTR | |7270K|55M|2443 (1)|00:00:44| 828| 828| PCWC |
|*6| TBL ACCESS FULL|FACT_TXNS|7270K|55M|2443 (1)|00:00:44| 828| 828| PCWP |
----------------------------------------------------------------------------------

Does that mean it's not partition pruning properly or am I misinterpreting block iterator?
Are single partition statistics actually used in this case? Why is it trying to iterate when there's only one partition to speak of?

I thought that 'range single' was a better plan than 'block iterator'. Is that not so?
Also, this seems to be running in parallel whereas the first one isn't. Is there a reason for this difference in the plans? I am presuming I want the former (albeit with accurate stats). If so, does that mean I need to force the optimizer to do it?

Thank you.


Tom Kyte
October 07, 2011 - 2:04 pm UTC

it started using parallel query so you got an entirely different plan. It is hitting but one partition - it is doing so in parallel. it is iterating over the parallel chunks. Nothing amiss here.

PX BLOCK = parallel block.

Both plans are appropriate for their respective data sets.

Jess, October 10, 2011 - 6:33 am UTC

Thanks Tom!

partition stats useful if index used?

A reader, October 26, 2011 - 12:22 pm UTC

This is a snippet from a plan that runs fast - snipped a bit so it would fit


PARTITION RANGE ALL | | 1 14 |
| TABLE ACCESS FULL | | APARTITIONEDTABLE | 8809K| 243M|

In this case, the partitioned table is made up of 12 partitions, one for each of the last 12 years. The queries always span at most 2 of the twelve partitions, but not on the boundaries: the partitions are Jan 1 - Dec 31, but the queries often span from July to July of the following year, for example. In this case, is the partitioned table being full scanned and *then* pruned? I am 99% sure the partition range step refers to the partitioned table that is being full scanned in the indented step. Which kind of stats would be used for that? You mentioned many posts ago that a partition range scan would "check" all the partitions.. what kind of work does that entail?

Now here is a similar snippet when the range is expanded just a couple months but still only uses two partitions.

PARTITION RANGE ALL 1 | 14 | INDEX RANGE SCAN ALOCALINDEXONPARTITIONEDTABLE| 78 | |

And this runs horribly. Is there any way to tell whether this "method" of going to a partition range all is less effective? Are global stats or local stats used?
Tom Kyte
October 26, 2011 - 1:17 pm UTC

This is saying there are 14 partitions - you are hitting partition 1 through 14. Are you sure about 12?

Look at the pstart/pstop columns in the plan - they tell you what partitions will be hit. In this case, it is going to full scan 1-14, no pruning (unless there is a 15th partition I don't know about - if so, it was pruned)

global stats would be used for any query that hits more than one partition

Thank you -

A reader, October 26, 2011 - 2:19 pm UTC

So Tom.. any query that hits more than one partition, will use global stats. Any query that hits more than one subpartition with not use subpartition stats in likewise fashion?

Thanks,
- D
Tom Kyte
October 26, 2011 - 2:39 pm UTC

local stats are used if and only if we can identify at hard parse time that the query hits exactly one partition and we know what that partition is.

else global stats are used.
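A quick way to see which case you are in (a sketch, reusing the table from the original example): look at the Pstart/Pstop columns of the plan. Local stats are only candidates when both show the same literal partition number, rather than KEY or a range.

```sql
explain plan for
  select * from t where dt = to_date('12-mar-2003','dd-mon-yyyy');

-- Pstart = Pstop (a single literal partition number) means the one
-- partition was identified at hard parse time, so its local statistics
-- can be used; KEY or differing values mean global stats.
select * from table(dbms_xplan.display);
```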

sorry about spelling error

A reader, October 26, 2011 - 2:20 pm UTC

I meant will not use.. not with not use.. sorry.

"hitting a partition"

A reader, October 26, 2011 - 2:24 pm UTC

Can you define "hitting a partition"? Does that mean the equivalent of a full table scan of a partitioned table?

Thanks.
Tom Kyte
October 26, 2011 - 2:40 pm UTC

hitting a partition means we will process that partition, we will read that partition, we will touch that partition.

It might be a full scan of the partition
It might be an index access by local rowid
It might be an index access by global rowid

It just means we'll be reading - or have the chance to be reading - something from that partition.

Thanks again Tom - this will be in the plan?

A reader, October 26, 2011 - 3:08 pm UTC

Tom -
could mean full table scan,
could mean index range scan,
could mean rowid -

That would show up in our explain plan or tkprof? Or not always?
Tom Kyte
October 26, 2011 - 3:39 pm UTC

yes, it would show up in your plan - that is the entire goal of a plan - to show you the access method being used.


Jaruwan Na Ranong, June 15, 2012 - 4:00 pm UTC

I have a 1TB reporting database which performs partition exchanges multiple times daily (based on region, i.e. AMER, APAC, etc.) from an ETL database: 300+ regular tables into 300+ partitioned tables partitioned by region. I notice that the 11g AUTOTASK job on the reporting database attempts to analyze the partitioned tables at the global level each night and runs out of its 4-hour window. My questions are:
1. Is there any benefit to letting AUTOTASK collect global-level stats each night? The reporting database is designed to be globally available to all users in different timezones. Or any benefit to having global stats up to date?
2. Is there a way to not let AUTOTASK collect global stats on the partitioned tables, or to trick AUTOTASK into thinking the global stats are up to date? The reporting database is planned to add more regions and grow to 2TB by the end of this year. If we have to gather global stats every night, which may sometimes run at the same time a Transportable Tablespace operation is executing, it will be time- and resource-consuming.

Thank you.
Tom Kyte
June 16, 2012 - 4:22 am UTC

in a database like this - you KNOW when the stats need to be gathered and in fact you can gather the stats way before time. why wouldn't your statistics updates be coordinated with when you attach the new data - that is when you need them!

You don't mention a version here at all :( In 11g you would be using the new incremental statistics and never gathering global statistics again.
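Enabling incremental statistics is a table preference; a minimal sketch, assuming a partitioned table named BIG_TAB:

```sql
-- With INCREMENTAL=TRUE, gathering stats on a changed partition also
-- maintains a per-partition synopsis, and global stats are derived from
-- the synopses instead of re-scanning the entire table. This relies on
-- the default estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE.
exec dbms_stats.set_table_prefs( user, 'BIG_TAB', 'INCREMENTAL', 'TRUE' );
exec dbms_stats.gather_table_stats( user, 'BIG_TAB', granularity => 'AUTO' );
```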

Elaborate

A reader, January 29, 2019 - 8:33 am UTC

Tom said in the first post: "This query knows it'll partition eliminate, but not down to 'which partition', therefore the optimizer will use the table level statistics to optimize this since it doesn't know 'partition 1, 2 or 3'."


Is this still true on 12.1?

How could I use binds and still force the optimizer to use partition-level stats?
My case is that every query always selects from a single partition, never combining more than one.
Connor McDonald
February 06, 2019 - 1:09 am UTC

You should be ok, because we can peek at the values

SQL> create table t ( x int, y int )
  2  partition by list ( x )
  3  (
  4  partition p1 values (1),
  5  partition p2 values (2)
  6  );

Table created.

SQL>
SQL> insert into t s
  2  select 1 , rownum from dual
  3  connect by level <= 1000;

1000 rows created.

SQL>
SQL> insert into t s
  2  select 2 , rownum from dual
  3  connect by level <= 50000;

50000 rows created.

SQL>
SQL> commit;

Commit complete.

SQL>
SQL> exec dbms_stats.gather_table_stats('','T');

PL/SQL procedure successfully completed.

SQL>
SQL> select partition_name, num_rows
  2  from user_tab_partitions
  3  where table_name = 'T';

PARTITION_NAME                   NUM_ROWS
------------------------------ ----------
P1                                   1000
P2                                  50000

SQL>
SQL> variable v number
SQL> exec :v := 1

PL/SQL procedure successfully completed.

SQL>
SQL> select max(y)
  2  from t
  3  where x = :v;

    MAX(Y)
----------
      1000

SQL>
SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  0haq2z8y5nah1, child number 0
-------------------------------------
select max(y) from t where x = :v

Plan hash value: 3783435421

-----------------------------------------------
| Id  | Operation             | Name | E-Rows |
-----------------------------------------------
|   0 | SELECT STATEMENT      |      |        |
|   1 |  PARTITION LIST SINGLE|      |      1 |
|   2 |   SORT AGGREGATE      |      |      1 |
|   3 |    TABLE ACCESS FULL  | T    |   1000 |
-----------------------------------------------


But you would need adaptive cursor sharing to kick in if you had identical queries which just changed by the bind variable.


Pls elaborate differently

A reader, February 06, 2019 - 7:17 am UTC

What I expected is that you would gather the stats only at the partition level in order to test. The question is about partitions: does it find accurate E-Rows when the global stats are stale and only the partition-level stats are fine, when using binds?

2. Please elaborate on how to configure adaptive cursor sharing.
Thanks.
Connor McDonald
February 07, 2019 - 2:01 am UTC

You can see from the example that it used partition stats - the estimates matched the partition numbers. If it was global, you would not get the exact numbers estimated.

SQL> exec dbms_stats.set_table_stats('','T',numrows=>2000);

PL/SQL procedure successfully completed.

SQL>
SQL> variable v number
SQL> exec :v := 1

PL/SQL procedure successfully completed.

SQL>
SQL> select max(y)
  2  from t
  3  where x = :v;

    MAX(Y)
----------
      1000

SQL>
SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
SQL_ID  0haq2z8y5nah1, child number 0
-------------------------------------
select max(y) from t where x = :v

Plan hash value: 3783435421

-----------------------------------------------
| Id  | Operation             | Name | E-Rows |
-----------------------------------------------
|   0 | SELECT STATEMENT      |      |        |
|   1 |  PARTITION LIST SINGLE|      |      1 |
|   2 |   SORT AGGREGATE      |      |      1 |
|   3 |    TABLE ACCESS FULL  | T    |   1000 |
-----------------------------------------------


Adaptive cursor sharing is just there by default - nothing needs to be configured.
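One way to observe adaptive cursor sharing at work (a sketch, assuming 11g or later, where V$SQL exposes the IS_BIND_SENSITIVE and IS_BIND_AWARE columns) is to run the statement with different bind values and then check the cursor's flags:

```sql
-- After executing the statement with different bind values,
-- check the cursor's adaptive-cursor-sharing flags in V$SQL.
-- IS_BIND_SENSITIVE = 'Y' means the optimizer noticed the plan may
-- depend on the bind value; IS_BIND_AWARE = 'Y' means separate child
-- cursors are being used for different bind value ranges.
select sql_id, child_number, is_bind_sensitive, is_bind_aware, executions
from   v$sql
where  sql_text = 'select max(y) from t where x = :v';
```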

I did the test

Moris, February 07, 2019 - 2:12 pm UTC

create table t ( x int, y int )
partition by list ( x )
(
partition p1 values (1),
partition p2 values (2)
);
insert into t s
select 1 , rownum from dual
connect by level <= 1000;

insert into t s
select 2 , rownum from dual
connect by level <= 50000;

variable v number
exec :v := 1

select max(y)
from t
where x = :v
;

==> 1000

select * from table(dbms_xplan.display_cursor(format =>'ALLSTATS LAST'));

==> E-Rows 1 .. so normal (no stats gathered yet).

SQL_ID 06hjyp6hvr9c5, child number 0
-------------------------------------
select max(y) from t where x = :v

Plan hash value: 746180402

-------------------------------------------------------------------------------------------
| Id  | Operation              | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |      1 |        |      1 |00:00:00.01 |      37 |
|   1 |  SORT AGGREGATE        |      |      1 |      1 |      1 |00:00:00.01 |      37 |
|   2 |   PARTITION LIST SINGLE|      |      1 |      1 |   1000 |00:00:00.01 |      37 |
|   3 |    TABLE ACCESS FULL   | T    |      1 |      1 |   1000 |00:00:00.01 |      37 |
-------------------------------------------------------------------------------------------

Note
-----
- dynamic statistics used: dynamic sampling (level=2)


Now I force the stats:
exec dbms_stats.set_table_stats('','T','P1',numrows=>1000);

select max(y) from t where x = :v ;

==> 1000

The E-Rows remain the same!!

SQL_ID 06hjyp6hvr9c5, child number 0
-------------------------------------
select max(y) from t where x = :v

Plan hash value: 746180402

-------------------------------------------------------------------------------------------
| Id  | Operation              | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |      1 |        |      1 |00:00:00.01 |      37 |
|   1 |  SORT AGGREGATE        |      |      1 |      1 |      1 |00:00:00.01 |      37 |
|   2 |   PARTITION LIST SINGLE|      |      1 |      1 |   1000 |00:00:00.01 |      37 |
|   3 |    TABLE ACCESS FULL   | T    |      1 |      1 |   1000 |00:00:00.01 |      37 |
-------------------------------------------------------------------------------------------

Note
-----
- dynamic statistics used: dynamic sampling (level=2)

Connor McDonald
February 08, 2019 - 6:32 am UTC

Because we can peek at the values of the bind variable. This is why I *ran* the statement and then used "display_cursor". That gives me the *real* plan the optimizer used.

If I just use plain old explain plan, it "lies" to me, because I see a plan produced without bind peeking (and hence not really what is happening):

SQL> variable v number
SQL> exec :v := 1

PL/SQL procedure successfully completed.

SQL> explain plan for select max(y) from t where x = :v;

Explained.

SQL> select * from dbms_xplan.display();

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------
Plan hash value: 3783435421

----------------------------------------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |     8 |   275   (1)| 00:00:01 |       |       |
|   1 |  PARTITION LIST SINGLE|      |     1 |     8 |            |          |   KEY |   KEY |
|   2 |   SORT AGGREGATE      |      |     1 |     8 |            |          |       |       |
|   3 |    TABLE ACCESS FULL  | T    | 25500 |   199K|   275   (1)| 00:00:01 |   KEY |   KEY |
----------------------------------------------------------------------------------------------


Complements previous post

Moris, February 07, 2019 - 2:20 pm UTC

But if I specify the partition ... I can see the accurate E-Rows:

select max(y) from t partition for(1) ; -- where x = 1 ;

--> 1000

SQL_ID 3km2uwam7b49p, child number 0
-------------------------------------
select max(y) from t partition for(1)

Plan hash value: 2831600127

-------------------------------------------------------------------------------------------
| Id  | Operation              | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |      1 |        |      1 |00:00:00.01 |      37 |
|   1 |  SORT AGGREGATE        |      |      1 |      1 |      1 |00:00:00.01 |      37 |
|   2 |   PARTITION LIST SINGLE|      |      1 |   1000 |   1000 |00:00:00.01 |      37 |
|   3 |    TABLE ACCESS FULL   | T    |      1 |   1000 |   1000 |00:00:00.01 |      37 |
-------------------------------------------------------------------------------------------


So HOW could I make the optimizer get the correct E-Rows for a given partition without specifying the PARTITION clause?
Because in my package all selects use binds,
i.e. ... WHERE key = :1 (where :1 is always the partition value).

Misunderstood

Moris, February 08, 2019 - 12:55 pm UTC

Hi Connor,
No, it was about the execution plan, not the explain plan, as you can also see the E-Rows and A-Rows.
You can do exactly what I did and you'll find the same result. I didn't get the reason; please re-read my previous two posts.
Thanks for the help.
Connor McDonald
February 11, 2019 - 4:49 am UTC

You're missing out a crucial step :-)

SQL> create table t ( x int, y int )
  2  partition by list ( x )
  3  (
  4  partition p1 values (1),
  5  partition p2 values (2)
  6  );

Table created.

SQL> insert into t s
  2  select 1 , rownum from dual
  3  connect by level <= 1000;

1000 rows created.

SQL>
SQL> insert into t s
  2  select 2 , rownum from dual
  3  connect by level <= 50000;

50000 rows created.

SQL> select num_rows from user_tables where table_name = 'T';

  NUM_ROWS
----------


SQL> select partition_name, num_rows from user_tab_partitions where table_name = 'T';

PARTITION_NAME                   NUM_ROWS
------------------------------ ----------
P1
P2


Continue test

Moris, February 11, 2019 - 7:03 am UTC

In your last post, just continue the test,
as I did above, using dbms_stats.set_table_stats. Do not set stats on the table globally, only on the partition, via set_table_stats.
Set the stats for partition P1 and then you can see the execution plan.
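The crucial step Connor's dictionary query highlights is that, in the original test, NUM_ROWS was null everywhere, so the optimizer fell back to dynamic sampling. A minimal sketch of filling that gap (assuming the same table T as above) would be to gather or set statistics before checking the plan:

```sql
-- Option 1: gather real statistics at both the global and partition
-- level (granularity => 'ALL' collects global, partition, and
-- subpartition stats).
exec dbms_stats.gather_table_stats(user, 'T', granularity => 'ALL');

-- Option 2: set partition-level stats manually, as in the earlier
-- posts, using the PARTNAME parameter.
exec dbms_stats.set_table_stats(user, 'T', partname => 'P1', numrows => 1000);

-- Verify the dictionary now shows row counts before re-running the
-- query and checking DBMS_XPLAN.DISPLAY_CURSOR.
select partition_name, num_rows
from   user_tab_partitions
where  table_name = 'T';
```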

More to Explore

VLDB

If you are new to partitioning, check out Connor McDonald's introduction series here.

VLDB

Documentation set on VLDB and Partitioning.