Performance poor after analyzing table


Question and Answer

Tom Kyte

Thanks for the question.

Asked: March 20, 2002 - 4:42 pm UTC

Last updated: January 24, 2011 - 7:15 am UTC

Version: 8.1.7


You Asked

Hi Tom,

As I understand it, Oracle will use the cost-based optimizer after a table has been analyzed. But sometimes performance is very poor after analyzing. So do we need to monitor all SQL statements and change them to use hints if there are performance problems?

E.g.: we have a table ERROR_TBL with 3,528,434 rows
Name                               Null?    Type
---------------------------------- -------- -------------------------
ERROR_NUM                          NOT NULL NUMBER(10)
PROCEDURE_NAME                              VARCHAR2(40)
DATA_OWNER                                  VARCHAR2(20)
OPERATION                                   VARCHAR2(1)
ERROR_DATE                                  DATE
ERROR_VALUE                                 VARCHAR2(250)
ERROR_MESSAGE                               VARCHAR2(2000)

primary key ERROR_TBL_PK(ERROR_NUM)
index ERROR_TBL_IDX(PROCEDURE_NAME)

SQL>
select * from error_tbl where procedure_name = 'tbs_space_prc';
2 rows selected

Explain Plan before analyzing:
--------------------------------------------------------------------
SELECT CHOOSE () , COST=0, CARDINALITY=0
TABLE ACCESS (BY INDEX ROWID) ERROR_TBL , COST=0, CARDINALITY=0
INDEX (RANGE SCAN) ERROR_TBL_IDX , COST=0, CARDINALITY=0

Then I did:
SQL> analyze table ERROR_TBL compute statistics;
SQL> select index_name
, num_rows, sample_size, last_analyzed
, user_stats, blevel, leaf_blocks, distinct_keys
, avg_leaf_blocks_per_key, avg_data_blocks_per_key
from user_indexes
where table_name = 'ERROR_TBL';

INDEX_NAME      NUM_ROWS SAMPLE_SIZE BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LEAF_BLOCKS_PER_KEY AVG_DATA_BLOCKS_PER_KEY
------------- ---------- ----------- ------ ----------- ------------- ----------------------- -----------------------
ERROR_TBL_IDX    3528434     3528434      3       31876             4                    7969                   15047
ERROR_TBL_PK     3528434     3528434      2        7092       3528434                       1                       1


Explain Plan after analyzing:
--------------------------------------------------------------------
SELECT CHOOSE () , COST=9128, CARDINALITY=882109
TABLE ACCESS ANALYZED (FULL) ERROR_TBL, COST=9128, CARDINALITY=882109

this took 69 seconds to select 2 rows

We have lots of multi-million-row tables, and we recently analyzed all of them
using compute statistics. Some showed the same problem after analyzing. But I cannot rewrite all the SQL for the applications, which are written with EJB.

What can I do?

Thanks in advance
Ling


and Tom said...

Here you have massively skewed data with a really small number of distinct values.

You have 3,528,434 rows. There are 4 values of procedure_name. The optimizer is looking at that and saying "oh my, thats not good is it".

Look into the analyze command -- in particular:

analyze table t
compute statistics
for TABLE
for all indexes
for all indexed columns SIZE <N>;

you are probably seeing that the default histogram bucket size is not appropriate for you. Play with different values of N and see what happens with your query.
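
For example (a sketch, reusing the ERROR_TBL/PROCEDURE_NAME names from your question), you can inspect the buckets the analyze produced and compare them against the skew you know is there:

analyze table error_tbl compute statistics
for table
for all indexes
for all indexed columns SIZE 10;

-- one row per bucket endpoint
select column_name, endpoint_number, endpoint_value
  from user_tab_histograms
 where table_name = 'ERROR_TBL'
   and column_name = 'PROCEDURE_NAME';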




Rating

  (153 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Ling, March 21, 2002 - 9:19 am UTC

1. How about a bitmap index?
I also tried a bitmap index:
SQL> create  bitmap index error_tbl_idx on ERROR_TBL(PROCEDURE_NAME);
SQL> analyze index error_tbl_idx compute statistics;

But the explain plan is still the same:
_________________________________________________________
SELECT CHOOSE ()    , COST=9128, CARDINALITY=882109
  TABLE ACCESS ANALYZED (FULL) ERROR_TBL   , COST=9128, CARDINALITY=882109

2. If I change init.ora to use the RULE optimizer, is that better
or worse for whole-database performance? Will the RBO be unsupported in the future?
 

Tom Kyte
March 21, 2002 - 5:04 pm UTC

since the data in this column is so massively SKEWED and you haven't played with the histograms as I directed -- the answer is a resounding NO.

You have a select *.  So, we need to go from index to table.  Going from index to table for a large percentage of the rows is a no-no.  The optimizer is aware that there are 4 unique values but doesn't know that procedure_name = 'tbs_space_prc' is going to return just a FEW rows; it's thinking 3,528,434/4 = roughly 882,109 rows (exactly the cardinality in your plan) -- regardless of the index type!


So, consider this example and use this approach:

ops$tkyte@ORA817DEV.US.ORACLE.COM> create table t as select mod(rownum,3) X, a.* from all_objects a;
Table created.

ops$tkyte@ORA817DEV.US.ORACLE.COM> insert /*+ append */ into t select * from t;
17309 rows created.

...

ops$tkyte@ORA817DEV.US.ORACLE.COM> insert /*+ append */ into t select * from t;
553888 rows created.

ops$tkyte@ORA817DEV.US.ORACLE.COM> commit;
Commit complete.

ops$tkyte@ORA817DEV.US.ORACLE.COM> select count(*) from t;

  COUNT(*)
----------
   1107776

ops$tkyte@ORA817DEV.US.ORACLE.COM> update t set x = 4 where rownum <= 10;
10 rows updated.

<b>so, I have about 1.1 million records, 10 of which have the value 4.  the rest have values of 0, 1, 2.  Now I index that column</b>

ops$tkyte@ORA817DEV.US.ORACLE.COM> create index t_idx on t(x) unrecoverable parallel 8
  2  /

Index created.

<b>and analyze...</b>

ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze table t compute statistics;
Table analyzed.

<b>now lets see what happens:</b>


ops$tkyte@ORA817DEV.US.ORACLE.COM> set autotrace traceonly explain
ops$tkyte@ORA817DEV.US.ORACLE.COM> select * from t where x = 4;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=446 Card=276944 Bytes=24371072)
   1    0   TABLE ACCESS (FULL) OF 'T' (Cost=446 Card=276944 Bytes=24371072)

<b>full tablescan just like you.  Now, lets play with the histograms:</b>

ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze table t compute statistics for columns x size 100;

Table analyzed.

ops$tkyte@ORA817DEV.US.ORACLE.COM> select * from t where x = 4;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=10 Bytes=880)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=4 Card=10 Bytes=880)
   2    1     INDEX (RANGE SCAN) OF 'T_IDX' (NON-UNIQUE) (Cost=3 Card=10)

<b>and the optimizer has the right info -- does the right thing.  Just to validate, lets see what happens when we use the value 1:</b>

ops$tkyte@ORA817DEV.US.ORACLE.COM> select * from t where x = 1;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=446 Card=369276 Bytes=32496288)
   1    0   TABLE ACCESS (FULL) OF 'T' (Cost=446 Card=369276 Bytes=32496288)


<b>back to a full scan - AS IT SHOULD be for that one</b>
 

data skew

Mikito Harakiri, March 21, 2002 - 5:43 pm UTC

Tom, with 4 distinct values wouldn't a histogram with just 4 buckets suffice?

Ling, can you run

select procedure_name, count(1) from error_tbl
group by procedure_name

to see how skewed your column is?

Tom Kyte
March 21, 2002 - 7:29 pm UTC

yes it would


ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze table t compute statistics for columns x size 4;

Table analyzed.

ops$tkyte@ORA817DEV.US.ORACLE.COM> 
ops$tkyte@ORA817DEV.US.ORACLE.COM> set autotrace traceonly explain
ops$tkyte@ORA817DEV.US.ORACLE.COM> select * from t where x = 4;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=10 Bytes=880)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=4 Card=10 Bytes=880)
   2    1     INDEX (RANGE SCAN) OF 'T_IDX' (NON-UNIQUE) (Cost=3 Card=10)


however, it wouldn't if you insert but one more row.

I would <b>hope</b> the reader would say "ah, histograms, interesting concept, let me go read about them and learn about them so I can use them efficiently, effectively and to their maximum" (i did tell them to PLAY with it) and figure out what exactly N should be.

You know -- if I put 4 in there, someone would say "hey, you mean I have to know exactly how many distinct values, yadda yadda yadda..."

(count(*) is more meaningful than count(1) and is exactly the same...) 

question about SIZE=N

Alexander Rakhalsky, March 22, 2002 - 5:37 am UTC

    Hi, Tom!
    1. I have a question about the "SIZE N" clause of the ANALYZE statement. I supposed that N determines the number of buckets BEFORE histogram compression (that is, before removing buckets with the same values in ALL_TAB_HISTOGRAMS.ENDPOINT_VALUE). So, if I issue ANALYZE ... SIZE 10, the maximum number in ALL_TAB_HISTOGRAMS.ENDPOINT_NUMBER (or BUCKET_NUMBER) would be 10. Because of that (and also the fact that Oracle uses height-balanced histograms), I was surprised by your statement that SIZE = number of distinct values would be sufficient. After some experiments (like yours on this page) I now suppose that N in the SIZE clause determines the number of buckets AFTER histogram compression. See listing:

SQL> SELECT num_rows
  2    FROM user_tables
  3    WHERE table_name = 'A';

 NUM_ROWS
---------
  1369680

SQL> SELECT owner, COUNT(*)
  2    FROM a
  3    GROUP BY owner;

OWNER                           COUNT(*)
------------------------------ ---------
DBSNMP                               512
PLT                              1375488
SYSTEM                             32768

SQL> ANALYZE TABLE a
  2    ESTIMATE STATISTICS
  3    FOR ALL INDEXED COLUMNS SIZE 2;

Table analyzed.

SQL>  SELECT *
  2     FROM user_histograms
  3     WHERE table_name='A'
  4     AND column_name='OWNER';

TABLE_NAME COLUMN_NAM ENDPOINT_NUMBER ENDPOINT_VALUE
---------- ---------- --------------- --------------
A          OWNER                  705      4,169E+35
A          OWNER                  723      4,328E+35

SQL> SET AUTOTRACE TRACEONLY EXPLAIN
SQL>  SELECT *
  2     FROM a
  3     WHERE owner='DBSNMP';

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=12 Card=948 Bytes=21804)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'A' (Cost=12 Card=948 Bytes=21804)
   2    1     INDEX (RANGE SCAN) OF 'Q_IND' (NON-UNIQUE) (Cost=5 Card=948)

SQL> SELECT *
  2    FROM a
  3    WHERE owner='PLT';

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=458 Card=1335581 Bytes=30718363)
   1    0   TABLE ACCESS (FULL) OF 'A' (Cost=458 Card=1335581 Bytes=30718363)

    Is that correct? And if so, how does Oracle choose the real number of buckets (723 in the example)? Can I adjust this algorithm?
    2. By reading your answer someone could think: "I can collect statistics on my massively skewed data and forget about any performance problems." However, I will point out the MAIN limitation of histograms -- they are useless if you are using bind variables! See listing:

SQL> var my_var VARCHAR2(10)
SQL> EXEC :my_var:='DBSNMP';

PL/SQL procedure successfully completed.

SQL> select count(*)
  2  from a
  3  where owner =:my_var;

 COUNT(*)
---------
      512

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=458 Card=1 Bytes=11642280)
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (FULL) OF 'A' (Cost=458 Card=684840 Bytes=11642280)

    Because most queries in a typical application USE bind variables (and you, Tom, have always agitated for them), I offer the following rule: "Always hint your queries on massively skewed data, because only you (the developer) know what values you will pass for the bind variables at runtime". What do you think? 

Tom Kyte
March 22, 2002 - 9:46 am UTC

I'll answer this "upside down" -- I've included a support note that should cover the first part of this question (available on metalink as well).  I will address the second part about bind variables directly.

With skewed information -- there are two cases:

o I am an OLTP/general purpose system with lots of users/queries.
o I am a datawarehouse with few users and few really really big queries.

99% of the world falls into category 1.  For them -- "bind variables rule".  In a data warehouse, this is the one case where the hard parse issue is not as large (data warehouse = queries that run for minutes or hours and you have fewer queries overall executing in a day.  Hence the contention on the shared pool is not relevant and the % of the execute time of the query taken by parsing is trivial).  In an OLTP/general use system, a hard parse can frequently take LONGER THAN the query itself!  Also, you are executing TONS more queries in this sort of system so contention on the shared pool is extremely heavy if you hard parse tons and tons.

So, If I am in an general use system and have skewed data -- I typically know about it.  Consider the system that has a table with a million rows in it representing "work".  As new work is added to the system -- it gets a status code of "1".  A completed piece of work gets a status code of "2", "3",  or "4" representing the outcome (think of it like a queue of work to be done).  Hopefully, you have very few records with a status code of "1" and almost all records are "completed" (it is an order entry system for example - 1 = new order, 2 = processed order, 3 = shipped order, 4 = cancelled order).  Your application will run queries such as:

  o I need some work to work on -- select * from t where status = 1 
    and customer = :customer_name_i_work_on;

  o I need to find the ratio of shipped to cancelled orders
  o etc

That is, you have different queries that will refer to explicit status codes.  There will be other bind variables in there for the variant portion of the query -- but for the skewed column (remember it has a finite domain of well known values) you will be using the value in there.  You still have high reuse on queries (everyone in the system uses that first query 10's of thousands of times).  You do not always bind EVERY value -- only those values in a given query that VARY over time (and the status code here NEVER varies).

Query 1 above will use the index (if you have one) on status since few records have a status of 1.  The second query would not -- it would full scan since the statuses you are interested in represent the vast majority of the table.
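
To make this concrete, here is a sketch of the pattern (the table and bind names are illustrative only, not from any real schema):

-- status is the skewed, finite-domain column: use the literal
-- customer varies from call to call: use a bind
select * from t where status = 1 and customer = :customer_name;

-- report touching the popular statuses: a full scan is expected and fine
select status, count(*) from t where status in (3, 4) group by status;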

As for HINTING -- I would never even consider it!!  Think of it -- if you *know* the skew of the data so much that you know when to hint -- you must know the values then (and this set of values is small).  You should not use a bind variable for that particular column, you would not be generating an infinite number of unique queries (you have a small well defined set of values here remember) so the hard parse issue is not an issue here -- shared sql is still effectively used.  This is one reason why I don't LIKE cursor_sharing as a final solution (only as a bandaid).  It OVER BINDS which can be just as bad as NOT BINDING.  You use binds on columns where the input values vary from call to call -- NOT on columns where the value is static from call to call.

Hope that makes sense (i don't like hints, they too are what I would consider a bandaid)

As for the first part of your two parter ;)  Hope this support note (i don't write these) clears it up.



Article-ID:         <Note:72539.1>
Circulation:        PUBLISHED (EXTERNAL)
Folder:             server.Performance.SqlTuning
Topic:              Optimizer Statistics and ANALYZE (Inc Histograms)
Title:              Interpreting Histogram Information
Open-Remarks:       See <RemarksOn:NOTE:72539.1>
Document-Type:      BULLETIN
Impact:             MEDIUM
Skill-Level:        NOVICE
Updated-Date:       20-MAR-2002 10:44:38
References:         
Shared-Refs:        
Attachments:        NONE
Content-Type:       TEXT/PLAIN
Keywords:           CBO; 
Products:           5/RDBMS (V7.3.X to V8.1.X); 
Platforms:          GENERIC;  

PURPOSE
To provide an understanding of how histogram information is stored and 
can be interpreted.

SCOPE & APPLICATION
Familiarity with the Cost Based Optimizer is useful.
 
RELATED DOCUMENTS
@<Note:50750.1>
<Note:1031826.6>

Where there is a high degree of skew in the column distribution, called a
non-uniform distribution of data, histograms should lead to a better 
estimation of selectivity. This should produce plans that are more likely 
to be optimal.  

The histogram approach provides an efficient and compact way to represent
data distributions. 

When building histograms, the information stored is interpreted differently
depending on whether the number of buckets requested is less than the number
of distinct values or is the same.  Specifically, ENDPOINT_NUMBER and
ENDPOINT_VALUE in dba/user/all_histograms would have different meanings.


EXAMPLE
-------

Table TAB1

SQL> desc tab1
 Name                            Null?    Type
 ------------------------------- -------- ----
 A                                        NUMBER(6)
 B                                        NUMBER(6)

Column A contains unique values from 1 to 10000.

Column B contains 10 distinct values. The value '5' occurs 9991 times. Values
'1, 2, 3, 4, 9996, 9997, 9998, 9999, 10000' occur only once.

Test queries:

(1) select * from tab1 where b=5;
(2) select * from tab1 where b=3;

Both the above queries would use a FULL TABLE SCAN as there is no other 
access method available.

Then we create an index on column B.

select lpad(INDEX_NAME,10), lpad(TABLE_NAME,10),
       lpad(COLUMN_NAME,10), COLUMN_POSITION, COLUMN_LENGTH 
from user_ind_columns
where table_name='TAB1'

SQL> /

LPAD(INDEX LPAD(TABLE LPAD(COLUM COLUMN_POSITION COLUMN_LENGTH
---------- ---------- ---------- --------------- -------------
    TAB1_B       TAB1          B               1            22

Now, 

(1) select * from tab1 where b=5;
(2) select * from tab1 where b=3;

Both do an INDEX RANGE SCAN to get the ROWID to do a lookup in the table.

With an INDEX present, it would be preferable to do an INDEX RANGE SCAN for
query (2), but a FULL TABLE SCAN for query (1).


ANALYZING THE TABLE
-------------------

Next, analyze the table using compute statistics:

SQL> analyze table tab1 compute statistics;

From dba_tables:

  NUM_ROWS     BLOCKS EMPTY_BLOCKS  AVG_SPACE  CHAIN_CNT AVG_ROW_LEN
---------- ---------- ------------ ---------- ---------- -----------                              
     10000         64            0         86          0          10

From dba_tab_columns:

NUM_DISTINCT LOW  HIGH   DENSITY  NUM_NULLS NUM_BUCKETS LAST_ANALYZ SAMPLE_SIZE
------------ ---- ---- --------- ---------- ----------- ----------- -----------
       10000 Full Full     .0001          0           1 30-JUN-1999       10000
          10 Full Full        .1          0           1 30-JUN-1999       10000

For Oracle7, from user_histograms:

SQL> select lpad(TABLE_NAME,10), lpad(COLUMN_NAME, 10), 
  2  bucket_number, endpoint_value 
  3  from user_histograms
  4  where table_name='TAB1';

TABLE_NAME COLUMN_NAME BUCKET_NUMBER ENDPOINT_VALUE
---------- ----------- ------------- --------------
      TAB1           A             0              1
      TAB1           A             1          10000
      TAB1           B             0              1
      TAB1           B             1          10000

For Oracle8, from user_tab_histograms:

SQL> select lpad(TABLE_NAME,10), lpad(COLUMN_NAME, 10), 
  2  bucket_number, endpoint_value 
  3  from user_tab_histograms
  4  where table_name='TAB1';

Analyze has created 1 BUCKET for each column. So all values for the column
are in the same bucket.  The BUCKET_NUMBER represents the BUCKET NUMBER and 
ENDPOINT_VALUE represents the last column value in that bucket.

Now query (1) and (2) ; both do a FULL TABLE SCAN. 

So, the fact that you have statistics about the table and columns does not
help the optimizer to distinguish between how many of each value we have.
The reason it does a FULL TABLE SCAN is because there is a 1 BUCKET histogram
and any value selected for should be in that bucket.


CREATING HISTOGRAMS
-------------------

What you need now is to create histograms so the Optimizer knows how many 
values occur for each column.

Query (1): select * from tab1 where b=5;
           should do a FULL TABLE SCAN   and

Query (2): select * from tab1 where b=3;
           should do an INDEX RANGE SCAN

SQL> analyze table tab1 compute statistics for columns b size 10;

select lpad(TABLE_NAME,10), lpad(COLUMN_NAME, 5), 
       endpoint_number, endpoint_value 
from user_histograms;

TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE 
      TAB1           B               1              1
      TAB1           B               2              2
      TAB1           B               3              3
      TAB1           B               4              4
      TAB1           B            9995              5
      TAB1           B            9996           9996
      TAB1           B            9997           9997
      TAB1           B            9998           9998
      TAB1           B            9999           9999
      TAB1           B           10000          10000

So, now there are statistics on the table and on the columns.

You requested 10 buckets (size 10) and there are 10 distinct values.

The ENDPOINT_VALUE shows the column value and the ENDPOINT_NUMBER
shows the cumulative number of rows. 

For example, ENDPOINT_VALUE 2 has an ENDPOINT_NUMBER of 2; the previous
ENDPOINT_NUMBER is 1, hence the number of rows with value 2 is 2 - 1 = 1.

Another example is ENDPOINT_VALUE 5. Its ENDPOINT_NUMBER is 9995. The previous
bucket's ENDPOINT_NUMBER is 4, so 9995 - 4 = 9991 rows contain the value 5.
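
As an aside (this query is a sketch, not part of the original note; it uses the LAG analytic function, available in 8.1.6 and up), you can turn the cumulative ENDPOINT_NUMBERs back into per-value row counts in one query:

select endpoint_value,
       endpoint_number
         - lag(endpoint_number, 1, 0) over (order by endpoint_number) row_cnt
  from user_histograms
 where table_name = 'TAB1'
   and column_name = 'B'
 order by endpoint_number;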

So, now QUERY (1) does in fact do a Full Table Scan.

SQL> select * from tab1 where b=5
SQL> /

Execution Plan
----------------------------------------------------------
0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=10 Card=9991 Bytes=99910)

1    0   TABLE ACCESS (FULL) OF 'TAB1' (Cost=10 Card=9991 Bytes=99910)


And, QUERY (2) does do an Index Range Scan.

SQL> select * from tab1 where b=3
SQL> /

Execution Plan
----------------------------------------------------------
0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=6 Card=500 Bytes=5000)
1    0   TABLE ACCESS (BY ROWID) OF 'TAB1' (Cost=6 Card=500 Bytes=5000)
2    1     INDEX (RANGE SCAN) OF 'TAB1_B' (NON-UNIQUE)

This is fine if you have a low number of distinct values, but there can
be tables with a huge number of distinct values.  You don't want to 
create a bucket for each value. There would be too much overhead.
In this case you would request fewer buckets than distinct values.


CREATING HISTOGRAMS WITH LESS BUCKETS THAN DISTINCT VALUES
----------------------------------------------------------

SQL> analyze table tab1 compute statistics for columns b size 8;

SQL> select lpad(TABLE_NAME,10), lpad(COLUMN_NAME, 5), 
  2> endpoint_number, endpoint_value 
  3> from user_histograms;

LPAD(TABLE LPAD( ENDPOINT_NUMBER ENDPOINT_VALUE
---------- ----- --------------- --------------
TAB1     B               0              1
TAB1     B               7              5
TAB1     B               8          10000

Here, Oracle creates the requested number of buckets and puts the same
number of values into each bucket; the frequently occurring value shows up
as multiple buckets with the same endpoint value.

The ENDPOINT_NUMBER is the actual bucket number and ENDPOINT_VALUE is 
the endpoint value of the bucket determined by the column value.

From above, bucket 0 holds the low value for the column. Buckets 1 to 6 are
not shown; they are compressed out to save space.

But we have bucket 1 with an endpoint of 5,
        bucket 2 with an endpoint of 5,
        bucket 3 with an endpoint of 5,
        bucket 4 with an endpoint of 5,
        bucket 5 with an endpoint of 5,
        bucket 6 with an endpoint of 5,
        bucket 7 with an endpoint of 5 AND
        bucket 8 with an endpoint of 10000

So bucket 8 contains values between 5 and 10000.
All buckets contain the same number of values (which is why they are called
height-balanced histograms), except the last bucket, which may have fewer
values than the other buckets.

If the data is uniform, you would not use histograms. However, if you request
the same number of buckets as distinct values, Oracle creates 1 bucket.  If
you request fewer buckets, Oracle uses an algorithm to balance values into each
bucket, and any values that remain (which have to be fewer than the number
stored in each height-balanced bucket) go into the last bucket.


STORING CHARACTER VALUES IN HISTOGRAMS
--------------------------------------

Character columns have some exceptional behaviour, in as much as histogram
data is stored for only the first 5 bytes of any string.  Predicates that
contain strings longer than 5 characters will not use histogram information,
and the selectivity will be 1 / DISTINCT.

Data in histogram endpoints is normalized to double precision floating point
arithmetic.

EXAMPLE
-------

SQL> select * from morgan;

A
----------
a
b
c
d
e
e
e
e


The table contains 5 distinct values. There is one occurrence each of 'a', 'b',
'c' and 'd', and 4 of 'e'.

Create a histogram with 5 buckets.

SQL> analyze table morgan compute statistics for columns a size 5;

Looking in user_histograms:

LPAD(TABLE LPAD( ENDPOINT_NUMBER ENDPOINT_VALUE
---------- ----- --------------- --------------
    MORGAN     A               1     5.0365E+35
    MORGAN     A               2     5.0885E+35
    MORGAN     A               3     5.1404E+35
    MORGAN     A               4     5.1923E+35
    MORGAN     A               8     5.2442E+35

So, ENDPOINT_VALUE 5.0365E+35 represents a
                   5.0885E+35 represents b
                   5.1404E+35 represents c
                   5.1923E+35 represents d
                   5.2442E+35 represents e

Then if you look at the cumulative values for ENDPOINT_NUMBER,
the corresponding ENDPOINT_VALUEs are correct.

 

Excellent - But why do I get this?

pawan, March 22, 2002 - 8:44 am UTC

Tom,
I was trying your example and this is what happens
------
ADHOC@SF028I> @count
Enter table name:all_objects
old 1: select count(1) from &TABLE
new 1: select count(1) from all_objects

COUNT(1)
---------
17068

ADHOC@SF028I> set timing on
ADHOC@SF028I> create table t as select mod(rownum,3) X,a.* from all_objects a;

Table created.

Elapsed: 00:00:08.88
ADHOC@SF028I> insert /*+ append */ into t select * from t;

17067 rows created.

Elapsed: 00:00:01.31
ADHOC@SF028I> insert /*+ append */ into t select * from t;
insert /*+ append */ into t select * from t
*
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel


Elapsed: 00:00:00.03
ADHOC@SF028I> insert /*+ append */ into t select * from t;
insert /*+ append */ into t select * from t
*
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel


Elapsed: 00:00:00.00
ADHOC@SF028I>
---
Can you explain?

Tom Kyte
March 22, 2002 - 10:13 am UTC

add a commit after each and every insert /*+ append */ (that's what was in the "..." -- I had to do it a couple of times).

The message is somewhat self explanatory - you cannot read or modify an object after using the append hint. Commit and you can read/modify it again.

It works

pawan, March 22, 2002 - 8:46 am UTC

Please ignore my last posting - I committed and it works

Excellent

ling, March 22, 2002 - 3:33 pm UTC

1) I re-analyzed the table using:
SQL> analyze table error_tbl compute statistics 
    for table
    for all indexes
    for all indexed columns size 10;

Yes, it did the same thing you showed me. 

SQL> select * from user_histograms 
    where table_name = 'ERROR_TBL'
    and column_name = 'PROCEDURE_NAME';

TABLE_NAME  COLUMN_NAME      ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE       
----------  ---------------- --------------- -------------- ----------------------------
ERROR_TBL   PROCEDURE_NAME              1514 5.264172163023 eb_coll_ac_ch_addr_trg      
ERROR_TBL   PROCEDURE_NAME              3051 5.264172163023 eb_coll_ac_ch_trg           
ERROR_TBL   PROCEDURE_NAME           3528434 5.474313824568 inst_eb_coll_ac_unique_prc  
ERROR_TBL   PROCEDURE_NAME                 2 5.212249225386 tbs_space_prc               

Execution Plan
------------------------------------------------------------------------------------
SELECT CHOOSE ()    , COST=4, CARDINALITY=2
  TABLE ACCESS ANALYZED (BY INDEX ROWID) ERROR_TBL   , COST=4, CARDINALITY=2
    INDEX ANALYZED (RANGE SCAN) ERROR_TBL_IDX   , COST=3, CARDINALITY=2

2) Question: histograms are not useful for primary key and unique key columns because they are uniformly distributed. So 
I)
analyze table x compute statistics for column y size n;
II) 
analyze table x compute statistics
for table
for all indexes
for all indexed columns size n;

Is I) better than II) for saving thousands of rows in user_histograms? Or does it just not matter?

Thanks for your help  

Tom Kyte
March 22, 2002 - 3:52 pm UTC

well, you won't have thousands of rows -- we use buckets (that's what size is all about).

I is better than II if you know the columns you want to have histograms for (common). II is easier to demo with (so you will see me use it lots).

Ling, March 22, 2002 - 4:07 pm UTC

By thousands of rows in user_histograms, I meant after analyzing all of our hundreds of tables. And the default value for SIZE is 75.

Anyway, using method II is easy and quick for analyzing hundreds of tables.

Thanks so much for your help!

i have a similar problem

raju, August 26, 2002 - 5:31 pm UTC

i have three huge tables, each of about 100,000 rows

all have proper indexes.

i wrote a query joining these tables (so, two join conditions in total) and some static conditions

when i analyzed the tables with
script1:
-------
analyze table t1 compute statistics
analyze table t2 compute statistics
analyze table t3 compute statistics

also analyzed the indexes on these tables separately with
analyze index i1 compute statistics... so on..

after this, the explain plan for the query joining the above three tables showed all the tables being scanned fully.

then as you said, i tried analyzing the tables
script2:
-------
analyze table t1
compute statistics
for TABLE
for all indexes;

analyze table t2
compute statistics
for TABLE
for all indexes;

analyze table t3
compute statistics
for TABLE
for all indexes;

then the same thing happened... all the tables were scanned fully.

then i did
script3:
--------
analyze table t1
compute statistics
for TABLE
for all indexes
for all indexed columns size 100;

analyze table t2
compute statistics
for TABLE
for all indexes
for all indexed columns size 100;

analyze table t3
compute statistics
for TABLE
for all indexes
for all indexed columns size 100;

then it started using the indexes. then i ran the above script3 with column size 1, and the indexes were still used.
then i ran script2 again, and all the indexes were still used.

1. i really don't understand what's happening here.
2. is it because of histograms?
3. how is it possible with histograms?

i don't see any topic on histograms in your book (i am not sure whether there is one; if there is, please tell me which topic it is).

Tom Kyte
August 26, 2002 - 7:31 pm UTC

I don't understand either, because I don't see a single query, a single tkprof, a single explain plan. I don't understand the skew of your data -- nothing.

Nothing in, nothing out.

RE: Nothing in, nothing out....

Mark A. Williams, August 26, 2002 - 9:13 pm UTC

Better call the shop and get your crystal ball back ASAP! By the way, how many red lights are there between the shop and your place? Is that good?

Just joking of course...

- Mark

It's really good and helpful

Hitesh, August 27, 2002 - 1:24 am UTC

Keep it up

i have a similar problem -- sorry for the confusion

raju, August 27, 2002 - 11:20 am UTC

sorry tom,

 there was some confusion.

 please ignore my previous question.

i have described the main tables that i am using in my query (find the query at the bottom)

SQL> DESC TRANS_SUMY
Name                  Type           Nullable Default Comments 
--------------------- -------------- -------- ------- -------- 
TRANS_SUMY_ID_C       NUMBER(13)                               
EM_ID_C           VARCHAR2(11)                             
LEAVE_TYP_ID_C        VARCHAR2(4)                              
LEAVE_TYP_T           VARCHAR2(255)                                                                
RQST_STRT_D           DATE                                     
RQST_STRT_DT_DY_I     VARCHAR2(1)                              
RQST_END_D            DATE           Y                         
RQST_END_DT_DY_I      VARCHAR2(1)    Y                           

SQL> SELECT * FROM USER_IND_COLUMNS WHERE TABLE_NAME = 'TRANS_SUMY';

INDEX_NAME         COLUMN_NAME            COLUMN_POSITION    INDEX_TYPE
------------------ ---------------------  -----------------  ------------
XUK1_TRANS_SUMY    EM_ID_C                1                  unique
XUK1_TRANS_SUMY    RQST_STRT_D            2                  unique 
XUK1_TRANS_SUMY    RQST_STRT_DT_DY_I      3                  unique
XPK_TRANS_SUMY     TRANS_SUMY_ID_C        1                  unique

SQL> desc TRANS_HDR
Name                    Type         Nullable Default Comments 
----------------------- ------------ -------- ------- -------- 
TRANS_ID_C              NUMBER(13)                             
TRANS_STAT_ID_C         NUMBER(4)                              
TRANS_SUMY_ID_C         NUMBER(13)                             
CURR_APPVR_SEQ_ID_C     NUMBER(2)                              


INDEX_NAME                     COLUMN_NAME         index
------------------------------ ------------------  ------
XUK1_TRANS_HDR                 TRANS_SUMY_ID_C     unique
XPK_TRANS_HDR                  TRANS_ID_C          unique

SQL> desc TRANS_DETL
Name                   Type         Nullable Default Comments 
---------------------- ------------ -------- ------- -------- 
TRANS_ID_C             NUMBER(13)                             
TRANS_APPVR_SEQ_ID_C   NUMBER(2)                              
APPVR_EM_ID_C          VARCHAR2(11)                                      
TRANS_STAT_ID_C        NUMBER(4)                              


INDEX_NAME                     COLUMN_NAME
------------------------------ ---------------------
XPK_TRANS_DETL                 TRANS_ID_C
XPK_TRANS_DETL                 TRANS_APPVR_SEQ_ID_C

DESC TRANS_STAT_REF
Name              Type          Nullable Default Comments 
----------------- ------------- -------- ------- -------- 
TRANS_STAT_ID_C   NUMBER(4)                               
TRANS_STAT_SHT_N  VARCHAR2(15)                            
TRANS_STAT_N      VARCHAR2(30)                            
TRANS_STAT_T      VARCHAR2(255)                           

SELECT tr.TRANS_id_c,
       tsum.crtd_dz,
       tsum.leave_typ_t,
       tsum.rqst_strt_d,
       tsum.rqst_end_d,
       trde.appvr_em_id_c,
       trst.TRANS_stat_t,
       curr_appvr_seq_id_c,
       TRANS_appvr_seq_id_c
  FROM TRANS_HDR tr,
       TRANS_DETL trde,
       TRANS_SUMY tsum,
       TRANS_stat_ref trst
 WHERE trst.TRANS_stat_id_c = tr.TRANS_stat_id_c
   AND tr.TRANS_id_c = trde.TRANS_id_c
   AND tr.TRANS_SUMY_id_c = tsum.TRANS_SUMY_id_c
   AND trde.TRANS_appvr_seq_id_c = tr.curr_appvr_seq_id_c
   AND ( EXISTS (
           SELECT 1
             FROM TRANS_DETL trde
            WHERE tr.TRANS_id_c = trde.TRANS_id_c
              AND trde.appvr_em_id_c IN
                  ( SELECT c.em_id_c
                      FROM prxy_em_ref c, prxy_typ_ref d
                     WHERE c.prxy_em_id_c = '692A'
                       AND d.prxy_typ_id_c = c.prxy_typ_id_c
                       AND d.prxy_typ_sht_n = 'GENERAL'
                       AND c.actv_i = 'A' )
               OR
                  ( --trde.appvr_prxy_em_id_c <> '629A' AND
                    trde.appvr_prxy_em_id_c = '629A'
                    AND trde.appvr_prxy_em_id_c IS NOT NULL )
         ) )


Description                      Object Name         Cost  Cardinality  Bytes
------------------------------------------------------------------------------
SELECT STATEMENT, GOAL = CHOOSE                      522   2913         1016637
 FILTER
  HASH JOIN                                          522   2913         1016637
   TABLE ACCESS FULL             TRANS_STAT_REF      1     82           11644
   HASH JOIN                                         520   3553         735471
    HASH JOIN                                        157   3553         159885
     TABLE ACCESS FULL           TRANS_HDR           77    3553         120802
     TABLE ACCESS FULL           TRANS_DETL          75    77492        852412
    TABLE ACCESS FULL            TRANS_SUMY          342   71045        11509290
  FILTER
   TABLE ACCESS FULL             TRANS_DETL          75    39           624
   NESTED LOOPS                                      2     1            29
    TABLE ACCESS BY INDEX ROWID  PRXY_TYP_REF        1     1            11
     INDEX UNIQUE SCAN           XUK1_PRXY_TYP_REF   1
    TABLE ACCESS BY INDEX ROWID  PRXY_EM_REF         1     1            18
     INDEX UNIQUE SCAN           XUK1_PRXY_EM_REF    1


now, i don't understand why none of the indexes on my primary tables (trans_sumy, trans_hdr and trans_detl) are being used

can you point out what i am missing, or suggest any other indexes to be created on trans_detl, trans_sumy or trans_hdr?

this is the row count from all the tables

SQL> select count(*) from trans_sumy;

  COUNT(*)
----------
     71053

SQL> select count(*) from trans_hdr;

  COUNT(*)
----------
     71053

SQL> select count(*) from trans_detl;

  COUNT(*)
----------
     77492

SQL> select count(distinct appvr_em_id_c) from trans_detl;

COUNT(DISTINCTAPPVR_EM_ID_
------------------------------
                          2536

i analyzed the above tables with "analyze table <table_name> compute statistics for table for all indexes" (as suggested by you) 

i have a similar problem -- explain plan again (better view)

raju, August 27, 2002 - 11:27 am UTC

Description                      Object Name        Cost
----------------------------------------------------------
SELECT STATEMENT                                    522
 FILTER
  HASH JOIN                                         522
   TABLE ACCESS FULL             TRANS_STAT_REF     1
   HASH JOIN                                        520
    HASH JOIN                                       157
     TABLE ACCESS FULL           TRANS_HDR          77
     TABLE ACCESS FULL           TRANS_DETL         75
    TABLE ACCESS FULL            TRANS_SUMY         342
  FILTER
   TABLE ACCESS FULL             TRANS_DETL         75
   NESTED LOOPS                                     2
    TABLE ACCESS BY INDEX ROWID  PRXY_TYP_REF       1
     INDEX UNIQUE SCAN           XUK1_PRXY_TYP_REF  1
    TABLE ACCESS BY INDEX ROWID  PRXY_EM_REF        1
     INDEX UNIQUE SCAN           XUK1_PRXY_EM_REF   1

Tom Kyte
August 27, 2002 - 11:33 am UTC

question for you -- (well, question and request)

Question: why do you feel that not using indexes is "bad"? Indexes do not mean any of the following:

o fast response time
o better than a full table scan
o must be used


So, does the query perform ok? Is the response time acceptable?


Now the request: make this into a question; it's not really a review and it's rather "large". Wait till I'm taking questions and post it there.

About Bitmap again

A reader, August 31, 2002 - 10:10 am UTC

Dear Tom,
Histograms are fine. But if we have just a couple of distinct values in a column, then instead of a B-tree index, why not a bitmap index, and then collect statistics (a histogram)? I hear a lot about bitmap indexes not being useful by themselves; it is when they are used in conjunction with other bitmap indexes, typically in a warehouse environment, that we realise their potential. I feel that for columns with a few distinct values and a skew (take for example the status codes in a PO (Purchase Order) file: the ones still being processed would have an un-processed status and would be very few in number compared to the ones that are processed), we can collect histogram statistics to benefit (to pick up unprocessed POs the bitmap index would be used, otherwise not). Are there any guidelines on choosing between bitmap and B-tree indexes?
Thanks in advance for your response, it has always been greatly valued.
Regards,

Tom Kyte
August 31, 2002 - 10:29 am UTC

Bitmap indexes + OLTP systems = recipe for utter and total failure.


So, take for example that PO purchase order file (typically associated with OLTP). Using a bitmap index on it would be very bad.


Now, if you have two states, a b*tree index can be very very useful. Suppose the vast majority are PROCESSED. If you represent PROCESSED as NULL and create index po_idx on po(processed_flag) -- the NULL entries will never make it into the b*tree index, the index will ONLY have pointers to the unprocessed rows (very few of them).

If you don't want to use NULL as the value, you can always use a function based index:

create index po_index on po( decode( processed, 'N', 'N', NULL ) );

that'll index ONLY the entries with N in them (N = not processed). And that index will be quite small and only have one value in it (N) but that is OK, you are using it to quickly find the unprocessed rows.
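
A caveat with that approach (a sketch, using the hypothetical po table from above): the query must use the very same expression for the function-based index to be considered, and in 8i function-based indexes also require QUERY_REWRITE_ENABLED = TRUE and the QUERY REWRITE privilege:

create index po_index on po( decode( processed, 'N', 'N', NULL ) );

-- the predicate matches the indexed expression exactly
select * from po where decode( processed, 'N', 'N', NULL ) = 'N';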

On the original Question...

A reader, October 14, 2002 - 9:52 am UTC

Tom,

This is on the original question posted-

The cardinality in the EXPLAIN PLAN shows 882109 records, whereas a query against user_indexes for the primary key shows num_rows as 3528434. When he did an "analyze table ERROR_TBL compute statistics", my understanding is that it will NOT touch the index statistics, and thus there will be a discrepancy between the index statistics and the actual table data. This could also have contributed to the performance. Am I right?

Would,
analyze table t
compute statistics
for TABLE
for all indexes,
analyze the table as well as all its indexes?


Thanks in advance

Tom Kyte
October 14, 2002 - 10:40 am UTC

The num_rows IN THE INDEX is 3.5 million.

The estimated cardinality given the predicate:

select * from error_tbl where procedure_name = 'tbs_space_prc';

is 882k.  You cannot COMPARE num_rows with the cardinality expected from the output of a particular step as you were trying -- they are apples and pears.




The analyze table does

o the table
o the indexes
o histograms on all columns


consider:

ops$tkyte@ORA817DEV.US.ORACLE.COM> create table t ( x int , y int );

Table created.

ops$tkyte@ORA817DEV.US.ORACLE.COM> create index t_idx on t(x);

Index created.

ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> insert into t select rownum, rownum from all_users;

46 rows created.

ops$tkyte@ORA817DEV.US.ORACLE.COM> commit;

Commit complete.

ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze table t compute statistics;

Table analyzed.

ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> select table_name, num_rows from user_tables;

TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T                                      46

ops$tkyte@ORA817DEV.US.ORACLE.COM> select index_name, num_rows from user_indexes;

INDEX_NAME                       NUM_ROWS
------------------------------ ----------
T_IDX                                  46

ops$tkyte@ORA817DEV.US.ORACLE.COM> select table_name, count(distinct column_name) from user_tab_histograms group by table_name;

TABLE_NAME                     COUNT(DISTINCTCOLUMN_NAME)
------------------------------ --------------------------
T                                                       2


It got everything.


But -- what you are seeing in the example (3.5 mill in num_rows, 882k in card) is EXACTLY what you might expect given that it had histograms (it knows the predicate would return about 882k rows) 

Why difference in cost when column is indexed!!!

VJ, March 10, 2003 - 6:31 pm UTC

Tom,

This is with regard to the first reply you gave in this thread. Please see your reply below:
ops$tkyte@ORA817DEV.US.ORACLE.COM> select * from t where x = 4;

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=10 Bytes=880)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=4 Card=10 Bytes=880)
2 1 INDEX (RANGE SCAN) OF 'T_IDX' (NON-UNIQUE) (Cost=3 Card=10)

and the optimizer has the right info -- does the right thing. Just to
validate, lets see what happens when we use the value 1:

ops$tkyte@ORA817DEV.US.ORACLE.COM> select * from t where x = 1;

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=446 Card=369276
Bytes=32496288)
1 0 TABLE ACCESS (FULL) OF 'T' (Cost=446 Card=369276 Bytes=32496288)


back to a full scan - AS IT SHOULD be for that one

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

I have 2 questions:
1. You created an index on the column X and analyzed it. Why is the optimizer behavior different just by changing the column value?
2. Can you throw some light on the "Size" option you used with the analyze command?


Tom Kyte
March 10, 2003 - 7:00 pm UTC

1) that is the job of the optimizer. Its goal in life is to take a query and optimize it.

In this example, where x = 4 should use an index. where x = 1 should not.


2) </code> http://docs.oracle.com/docs/cd/A87860_01/doc/server.817/a85397/statem8a.htm#2058710 <code>

Why difference in cost when column is indexed-contd!!!

VJ, March 10, 2003 - 7:30 pm UTC

Tom,

I know you wouldn't believe it until you see the actual test results. I ran your demo with no differences, but I see different behavior from yours. Am I missing something? Why is the cost still high?
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++=
MKT-TEST->create table t as select mod(rownum,3) X, a.* from all_objects a;

Table created.

MKT-TEST->insert /*+ append */ into t select * from t;

4192 rows created.

MKT-TEST->insert /*+ append */ into t select * from t;
insert /*+ append */ into t select * from t
*
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel


MKT-TEST->commit;

Commit complete.

MKT-TEST->insert /*+ append */ into t select * from t;

8384 rows created.

MKT-TEST->
MKT-TEST->update t set x = 4 where rownum <= 10;
update t set x = 4 where rownum <= 10
*
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel


MKT-TEST->commit;

Commit complete.

MKT-TEST->update t set x = 4 where rownum <= 10;

10 rows updated.

MKT-TEST->select x,count(0) from t group by x;

X COUNT(0)
---------- ----------
0 5585
1 5588
2 5585
4 10

MKT-TEST->analyze table t compute statistics;

Table analyzed.

MKT-TEST->set autotrace traceonly explain
MKT-TEST->select * from t where x = 4;

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=34 Card=4192 Bytes=3
43744)

1 0 TABLE ACCESS (FULL) OF 'T' (Cost=34 Card=4192 Bytes=343744
)




MKT-TEST->analyze table t compute statistics for columns x size 100;

Table analyzed.

MKT-TEST->select * from t where x = 4;

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=34 Card=10 Bytes=820
)

1 0 TABLE ACCESS (FULL) OF 'T' (Cost=34 Card=10 Bytes=820)



MKT-TEST->select * from t where x = 1;

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=34 Card=5588 Bytes=4
58216)

1 0 TABLE ACCESS (FULL) OF 'T' (Cost=34 Card=5588 Bytes=458216
)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Tom Kyte
March 10, 2003 - 7:36 pm UTC

you have a tiny number of rows. I had lots. add more rows.

you don't mention a version either but given 4k objects in all_objects, it is probably 8.0 or before -- really old stuff.

Excellent

trevor welch, March 10, 2003 - 8:01 pm UTC

Top stuff

How do I compute the value of N for size

Vivek Sharma, March 11, 2003 - 2:04 am UTC

Dear TOM, this was very helpful. I wanted to know how to calculate the value of N for column size <N>. I have a production database, and analyzing the tables and indexes takes around 3-4 hours. If I calculate the value of N wrongly and it affects performance, I can only correct the problem at night, hence continuing the bad performance for a day. So to avoid this, what is the best way to calculate the value of N, so that the first time I analyze the tables and indexes, the correct histogram values are collected?

Thanks in advance.
Bye
Vivek Sharma


Tom Kyte
March 11, 2003 - 7:49 am UTC

depends on your version -- in 9i, you can let us figure it out (size auto); in 8i, you use your knowledge of the data and how it will be used to set it appropriately. You can use the note above to gain an understanding of what they do for you.

This method vs. DBMS_STATS.GATHER_DATABASE_STATS

Peter Tran, April 02, 2003 - 10:27 am UTC

Tom,

Wouldn't using DBMS_STATS.GATHER_DATABASE_STATS give you better results?

Would I get a better histogram analysis using DBMS_STATS.GATHER_DATABASE_STATS package?

In other words, if I had to analyze my tables, would you recommend using the DBMS_STATS.GATHER_DATABASE_STATS or doing it manually with individual "analyze table..." commands?

Thanks,
-Peter

Tom Kyte
April 02, 2003 - 11:19 am UTC

I use analyze in demos.

DBMS_STATS (i prefer gather schema stats) should be used in real life.

Maybe I should switch for the demos, but analyze is easier to type ;)
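
For instance, a sketch of the real-life variant (gather_schema_stats on your own schema; 'size auto' in method_opt needs 9i):

begin
   dbms_stats.gather_schema_stats
   ( ownname    => user,
     method_opt => 'for all indexed columns size auto',
     cascade    => true );
end;
/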

Response especially for Vivek Sharma

Wolfgang Breitling, April 02, 2003 - 7:39 pm UTC

If you are using dbms_stats you do not need to wait until the next night to correct harmful (for the performance) statistics. Use the stattab option to save your current statistics. If performance suffers after refreshing the statistics simply restore the prior statistics. It's just prudent to have a backup when you make a change - and gathering statistics IS making a change.
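
A sketch of that save-and-restore flow (the stat table name STATS_BACKUP is made up for the example; ERROR_TBL is the table from the original question):

-- one time: create a table to hold saved statistics
exec dbms_stats.create_stat_table( user, 'STATS_BACKUP' );

-- before regathering, stash the current statistics
exec dbms_stats.export_table_stats( user, 'ERROR_TBL', stattab => 'STATS_BACKUP' );

-- ... gather new statistics ...

-- if plans go bad, put the old statistics back
exec dbms_stats.import_table_stats( user, 'ERROR_TBL', stattab => 'STATS_BACKUP' );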

getting proper histograms with gather_table_stats

A reader, April 28, 2003 - 3:53 pm UTC

Tom, this is for release 9iR2. Could you please explain your following statement with regard to size? How do I let Oracle automatically figure out the size value and generate proper histograms?

"depends on your version -- 9i, you can let us figure it out"

Here is what I have in one of my tables - 10,000 rows for each value (although in actual production the data will be very unevenly distributed):

  GROUP_ID   COUNT(*)
---------- ----------
      1167      10000
      1168      10000
      1169      10000
      1170      10000
      1171      10000
      1172      10000
      1173      10000
      1174      10000
      1175      10000
      1176      10000
      1177      10000

When I ran analyze by manually specifying the size value, the histogram looks right:

analyze table grp compute statistics for columns group_id size 11;

select column_name,ENDPOINT_NUMBER,ENDPOINT_VALUE
from user_histograms;
COLUMN_NAME          ENDPOINT_NUMBER ENDPOINT_VALUE
-------------------- --------------- --------------
GROUP_ID                      869962           1167
GROUP_ID                      879962           1168
GROUP_ID                      889962           1169
GROUP_ID                      899962           1170
GROUP_ID                      909962           1171
GROUP_ID                      919962           1172
GROUP_ID                      929962           1173
GROUP_ID                      939962           1174
GROUP_ID                      949962           1175
GROUP_ID                      959962           1176
GROUP_ID                      969962           1177

However I would like oracle to automatically figure out the size value. So I did:

SQL> exec dbms_stats.gather_table_stats(user,'GRP',method_opt => 'for all indexes for all indexed columns', cascade => true);

But the histograms now don't look right:
GROUP_ID                          65           1167
GROUP_ID                          66           1169
GROUP_ID                          67           1170
GROUP_ID                          68           1171
GROUP_ID                          69           1173
GROUP_ID                          70           1174
GROUP_ID                          71           1175
GROUP_ID                          72           1177

Am I missing any option in my gather_table_stats statement? 

Tom Kyte
April 28, 2003 - 4:18 pm UTC

I cannot reproduce your situation -- but neither of those look right to me, it should have but two entries given your data (if you want a valid test, you must test with valid data).

Here is my test, and what I get and it is what I expected:

ops$tkyte@ORA920> REM drop table t;
ops$tkyte@ORA920> REM
ops$tkyte@ORA920> REM create table t ( group_id int );
ops$tkyte@ORA920> REM
ops$tkyte@ORA920> REM insert /*+ append */
ops$tkyte@ORA920> REM into t
ops$tkyte@ORA920> REM select group_id
ops$tkyte@ORA920> REM from
ops$tkyte@ORA920> REM (select 1167+rownum-1 group_id from all_objects where rownum <=11),
ops$tkyte@ORA920> REM (select rownum r from all_objects where rownum <= 10000 )
ops$tkyte@ORA920> REM /
ops$tkyte@ORA920> REM commit;
ops$tkyte@ORA920> REM
ops$tkyte@ORA920> REM create index group_idx on t(group_id);
ops$tkyte@ORA920>
ops$tkyte@ORA920> select group_id, count(*) from t group by group_id;

  GROUP_ID   COUNT(*)
---------- ----------
      1167      10000
      1168      10000
      1169      10000
      1170      10000
      1171      10000
      1172      10000
      1173      10000
      1174      10000
      1175      10000
      1176      10000
      1177      10000

11 rows selected.

<b>just like you..</b>

ops$tkyte@ORA920>
ops$tkyte@ORA920> analyze table t compute statistics for columns group_id size 11;

Table analyzed.

ops$tkyte@ORA920> select column_name,ENDPOINT_NUMBER,ENDPOINT_VALUE
  2    from user_histograms
  3   where table_name = 'T';

COLUMN_NAM ENDPOINT_NUMBER ENDPOINT_VALUE
---------- --------------- --------------
GROUP_ID             10000           1167
GROUP_ID             20000           1168
GROUP_ID             30000           1169
GROUP_ID             40000           1170
GROUP_ID             50000           1171
GROUP_ID             60000           1172
GROUP_ID             70000           1173
GROUP_ID             80000           1174
GROUP_ID             90000           1175
GROUP_ID            100000           1176
GROUP_ID            110000           1177

11 rows selected.

<b>now that isn't what you have -- it is what I expect -- unless you have a much much larger table and didn't tell me about that?</b>


ops$tkyte@ORA920>
ops$tkyte@ORA920> analyze table t delete statistics;

Table analyzed.

ops$tkyte@ORA920> begin
  2     dbms_stats.gather_table_stats
  3     (user,'T',
  4      method_opt => 'for all indexes for all indexed columns<b> size auto</b>',
  5          cascade => true);
  6  end;
  7  /

PL/SQL procedure successfully completed.

<b>there I used size auto and get:</b>

ops$tkyte@ORA920>
ops$tkyte@ORA920> select column_name,ENDPOINT_NUMBER,ENDPOINT_VALUE
  2    from user_histograms
  3   where table_name = 'T';

COLUMN_NAM ENDPOINT_NUMBER ENDPOINT_VALUE
---------- --------------- --------------
GROUP_ID                 0           1167
GROUP_ID                 1           1177

<b>which, since the data does not actually vary -- is totally what I want to see, the analyze did what it was supposed to!  you'll really need to vary the data if you want to see what production will really do.</b>


ops$tkyte@ORA920>
ops$tkyte@ORA920>
ops$tkyte@ORA920> analyze table t delete statistics;

Table analyzed.

ops$tkyte@ORA920> begin
  2     dbms_stats.gather_table_stats
  3     (user,'T',
  4      method_opt => 'for all indexes for all indexed columns',
  5          cascade => true);
  6  end;
  7  /

PL/SQL procedure successfully completed.

ops$tkyte@ORA920>
ops$tkyte@ORA920> select column_name,ENDPOINT_NUMBER,ENDPOINT_VALUE
  2    from user_histograms
  3   where table_name = 'T';

COLUMN_NAM ENDPOINT_NUMBER ENDPOINT_VALUE
---------- --------------- --------------
GROUP_ID             10000           1167
GROUP_ID             20000           1168
GROUP_ID             30000           1169
GROUP_ID             40000           1170
GROUP_ID             50000           1171
GROUP_ID             60000           1172
GROUP_ID             70000           1173
GROUP_ID             80000           1174
GROUP_ID             90000           1175
GROUP_ID            100000           1176
GROUP_ID            110000           1177

11 rows selected.

<b>This is what I would expect to see from your test however (assuming group_id was indexed)</b>

 

size=auto did it

A reader, April 28, 2003 - 4:17 pm UTC

It took longer to analyze but size=auto in the method_opt clause produced the correct histogram (method_opt => 'for all indexed columns size auto'). This is really neat - I don't have to worry about specifying different size values for different tables/columns.


Here is the realistic data - which histogram is more accurate?

A reader, April 29, 2003 - 10:12 am UTC

Tom, what you pointed out in my previous posting was indeed correct - I had posted only a part of the histogram, and my data wasn't realistic either. But here are 2 histograms from realistic data.
1) Which histogram is more accurate - the one from analyze or the one from gather_table_stats? Hope it's gather_table_stats - I don't want to specify a size in analyze.
2) Can I speed up dbms_stats.gather_table_stats?

SQL> select pas_id,count(*) from pas_stats group by pas_id;

    pas_id   COUNT(*)
---------- ----------
        81      36572
       101         39
       121     183437
       181         89
       201        115
       202     135689
       203      89332
       241       3157
       301    1494621
       302     981543
       303       9593

SQL> analyze table pas_stats delete statistics;

SQL> analyze table pas_stats compute statistics for columns pas_id size 11;

Table analyzed.

Elapsed: 00:03:49.57

select column_name,ENDPOINT_NUMBER,ENDPOINT_VALUE
from user_histograms
where table_name ='pas_stats';


COLUMN_NAME          ENDPOINT_NUMBER ENDPOINT_VALUE
-------------------- --------------- --------------
pas_id                         36572             81
pas_id                         36611            101
pas_id                        220048            121
pas_id                        220137            181
pas_id                        220252            201
pas_id                        355941            202
pas_id                        445273            203
pas_id                        448430            241
pas_id                       1943170            301
pas_id                       2924778            302
pas_id                       2934371            303

analyze table pas_stats delete statistics;

dbms_stats.gather_table_stats(user,'pas_stats',method_opt => 'for all indexes for all indexed columns size auto',cascade=>true);

  1  select column_name,ENDPOINT_NUMBER,ENDPOINT_VALUE
  2  from user_histograms
  3  where table_name ='pas_stats'
  4* and column_name='pas_id'
SQL> /

COLUMN_NAME          ENDPOINT_NUMBER ENDPOINT_VALUE
-------------------- --------------- --------------
pas_id                             0             81
pas_id                             1            303 

Query Performance

Rahul, May 29, 2003 - 2:51 am UTC

Hi Tom

Good Morning.
I have been wrestling with a query that uses a function for the last 3 days, but haven't managed to improve its performance after trying all my ideas (hints, indexes, etc.).
Below are the function and query:

create or replace function f_getLicenseeLines
( arl_LicenseNo number,
  arl_class     varchar2 default null  -- note: '' is NULL in Oracle anyway
)
return varchar2
as
    ls_lines_string varchar2(4000);
    ll_codeid       number;
begin
    select codeid
      into ll_codeid
      from lwgncodelist
     where codetype = 'STS'
       and code = 'AP';

    if trim(arl_class) is null then
        for rec in
        ( select distinct linedesc
            from lwpragentlines al, lwprlicense lic, lwgnclasslines cl
           where al.licenseid  = lic.licenseid
             and al.lineno     = cl.lineno
             and al.class      = cl.class
             and lic.licenseno = arl_LicenseNo
             and al.linestatus = ll_codeid )
        loop
            ls_lines_string := ls_lines_string || ',' || trim(rec.linedesc);
        end loop;
    else
        for rec in
        ( select distinct linedesc
            from lwpragentlines al, lwprlicense lic, lwgnclasslines cl
           where lic.licenseno = arl_LicenseNo
             and al.licenseid  = lic.licenseid
             and al.lineno     = cl.lineno
             and al.class      = cl.class
             and al.class      = trim(arl_class)
             and al.linestatus = ll_codeid )
        loop
            ls_lines_string := ls_lines_string || ',' || trim(rec.linedesc);
        end loop;
    end if;

    -- drop the leading comma
    ls_lines_string := substr(ls_lines_string, 2);

    -- "= ''" is never true in Oracle (the empty string is NULL),
    -- so the test must be "is null"
    if ls_lines_string is null then
        ls_lines_string := '-';
    end if;

    return ls_lines_string;
end;
/

SELECT C.homestate AS homestate,
       C.agentid AS agentid,
       lastname || ' ' || firstname || ' ' || middlename ||
           decode(bcardcount, 0, '', null, '', '#Y') AS agentname,
       decode(A.ssn, null, '-',
              substr(A.ssn,1,3) || '-' || substr(A.ssn,4,2) || '-' || substr(A.ssn,6)) AS ssn,
       lpad(licenseno, 7, 0) AS licenseno,
       A.individualid AS individualid,
       businesstype,
       to_char(dob, 'mm/dd/yyyy') AS dob,
       f_getLicenseeLines(B.licenseno) AS class
  FROM lwpragent C, lwprlicense B, lwgnindividual A
 WHERE C.agentid = B.agentid
   AND A.individualid = C.individualid
   AND B.class = 'PRO'
   AND (B.licensestatus = 'A' OR B.licensestatus = 'AP')
/

Table contains the rows-
LWPRAGENT - 162793
LWGNINDIVIDUAL - 150072
LWPRLICENSE - 162865
LWGNCLASSLINES - 18
LWPRAGENTLINES - 414107

Every column used in the query and the function has a proper index, but the query still takes 2-3 MINUTES to run.
I don't know what I am missing.
Please suggest what I can do to make this query faster.

Thanks a lot.



Tom Kyte
May 29, 2003 - 8:14 am UTC

so, how long does it take without the function and how many rows does it return ultimately.


Without Function

Rahul, May 30, 2003 - 12:07 am UTC

Hi

Without the function, the query takes 47 sec; with the hint /*+ FIRST_ROWS */ it takes 37 sec.

Please suggest.

Thanks

Tom Kyte
May 30, 2003 - 7:40 am UTC

suggest you tell me how many rows it returns ultimately

No. of Rows

Rahul, May 30, 2003 - 3:16 am UTC

Hi

Sorry, I forgot to write the number of rows.
The query returns around 1,000,000 rows.

Thanks

Tom Kyte
May 30, 2003 - 7:46 am UTC

1,000,000 function calls is going to take a while, especially since each function call from SQL is a context switch from SQL to PL/SQL back to SQL to PL/SQL back to SQL ....

it is doing 12,500 of those a SECOND (if the query takes 47 seconds without the function and about 2 minutes or so with it) -- and each function call itself runs SQL. Pretty impressive, no?


don't know what to tell you. 1,000,000 calls to anything is going to take some amount of time.

Syntax : dbms_stats/Analyze

Neeraj Nagpal, August 06, 2003 - 6:28 pm UTC

I know that you normally suggest using DBMS_STATS as opposed to ANALYZE. What parameters should be passed to DBMS_STATS in order to get the same effect as:

analyze table t
compute statistics
for TABLE
for all indexes
for all indexed columns SIZE <N>;

Thanks Always for your help,
Neeraj Nagpal

Tom Kyte
August 09, 2003 - 11:56 am UTC

method_opt => 'for all indexed columns size N', cascade => true );
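
For illustration, a sketch of the full call (assuming a table named T and N = 10 -- both placeholders):

begin
   dbms_stats.gather_table_stats
   ( ownname    => user,
     tabname    => 'T',
     method_opt => 'for all indexed columns size 10',  -- size N
     cascade    => true );  -- cascade gathers the index statistics too
end;
/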

Rephrase My question..

Neeraj Nagpal, August 07, 2003 - 4:41 pm UTC

I should probably have described the problem I am getting with the command I use. I have a composite partitioned table with a local primary key index on it. When I try to analyze it, this is what I get:

Wrote file afiedt.buf

  1   BEGIN
  2      DBMS_STATS.gather_table_stats (
  3         ownname=> 'FNCDEV',
  4         tabname=> 'SALES_LAYER',
  5         partname=> 'ALABAMA',
  6         method_opt => 'for all indexes for all indexed columns',
  7         cascade => false);
  8* End;
13:33:39  SQL> /
 BEGIN
*
ERROR at line 1:
ORA-14506: LOCAL option required for partitioned indexes
ORA-06512: at "SYS.DBMS_DDL", line 179
ORA-06512: at "SYS.DBMS_STATS", line 4469
ORA-06510: PL/SQL: unhandled user-defined exception
ORA-06512: at line 2

 

Tom Kyte
August 10, 2003 - 11:08 am UTC

so, did you try using the options you asked me for in the first place?????

if you do -- you might be surprised, just look up a review.
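
For what it's worth, a sketch of what that earlier suggestion might look like against this table -- dropping the 'for all indexes' clause and letting cascade handle the indexes (not verified against this exact error):

BEGIN
   DBMS_STATS.gather_table_stats (
      ownname    => 'FNCDEV',
      tabname    => 'SALES_LAYER',
      partname   => 'ALABAMA',
      method_opt => 'for all indexed columns size auto',
      cascade    => true);
END;
/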

Behaviour of Query after analyzing

Vivek Sharma, August 18, 2003 - 8:35 am UTC

Dear Tom,

I have a query as under :

select a.customername
from finnonelea.nbfc_customer_m a, finnonelea.lea_agreement_dtl b
where b.applid = :b1 and b.lesseeid = a.customerid;

NBFC_CUSTOMER_M is a synonym for an object in another schema, FINNONEGCD. Originally none of the schemas were analyzed, and this query was using the following execution plan:

SELECT STATEMENT [CHOOSE] Cost =
NESTED LOOPS
TABLE ACCESS BY INDEX ROWID LEA_AGREEMENT_DTL
INDEX RANGE SCAN LEA_AGREEMENT_DTL_IDX1
TABLE ACCESS BY INDEX ROWID NBFC_CUSTOMER_M
INDEX UNIQUE SCAN SYS_C0076632

But as soon as I analyze the schemas, the same query does a FTS on the NBFC_CUSTOMER_M table, which has 2 million records. Why is that?
The explain plan after analyzing is:

SELECT STATEMENT [CHOOSE] Cost = 11788
HASH JOIN
TABLE ACCESS BY INDEX ROWID LEA_AGREEMENT_DTL
INDEX RANGE SCAN LEA_AGREEMENT_DTL_IDX1
TABLE ACCESS FULL NBFC_CUSTOMER_M [ANALYZED]

Can you please help me tune this SQL? My application vendor says not to analyze the schemas and to use the RBO; I suggest analyzing the schemas and letting the CBO do the work. But this query runs drastically slower, and many more have been affected after analyzing. Please give your expertise in this matter.

Regards


Tom Kyte
August 18, 2003 - 8:51 am UTC

are ALL of the objects analyzed?

what is the cost/card/bytes of each step (use set autotrace traceonly explain for example)

(but, in order to be supported by your application vendor, you must really follow THEIR rules)

Behaviour of Query after analyzing

Vivek Sharma, August 19, 2003 - 1:25 am UTC

Dear Tom,

I am sorry for my previous question, and thanks a lot for your prompt answer. The schema FINNONELEA was not fully analyzed, since we were short of time. It is a big schema with almost 1000 tables, of which only 500 were analyzed. Maybe the LEA_AGREEMENT_DTL table was not analyzed. I have checked on my development environment, which is a replica of this live server: once I analyze both tables, the query uses the proper index, whereas if only NBFC_CUSTOMER_M is analyzed, that table gets a FTS.

But your comment that, in order to keep my application vendor's support, I should not analyze the schema puts me in some confusion: I have read in the Oracle documentation that the RBO may not be available in future versions of Oracle, hence we should opt for the CBO. What is your comment?

Regards


Tom Kyte
August 19, 2003 - 7:47 am UTC

My comment is

o you bought a product from a vendor
o this product happens to work on a database
o you must follow their rules for configuring that database if you wish to be
supported and supportable by them

they are the ones that need to worry about cbo/rbo, being supported on Oracle. You have to worry about being supported by them, getting the performance you were promised and so on. What they do to support you, give you the performance you were promised is 100% up to them.

execution plan differs in development db and production db

A reader, September 24, 2003 - 8:00 am UTC

Hi

I have a query which runs blazing fast in production but is deadly slow in development. Obviously my databases have different parameters; however, both of them hold a very similar amount of data.

In development it uses this plan

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=5300 Card=1 Bytes=200)
1 0 FILTER
2 1 NESTED LOOPS (Cost=5300 Card=1 Bytes=200)
3 2 HASH JOIN (Cost=5299 Card=1 Bytes=181)
4 3 TABLE ACCESS (BY INDEX ROWID) OF 'CONTRATOS' (Cost=113 Card=1562 Bytes=39050)
5 4 INDEX (RANGE SCAN) OF 'CONTRATO_SERVICIOS_FK_I' (NON-UNIQUE) (Cost=8 Card=1562)
6 3 TABLE ACCESS (FULL) OF 'CONTRATO_DETALLE_LINEAS' (Cost=5146 Card=6241 Bytes=973596)
7 2 TABLE ACCESS (BY INDEX ROWID) OF 'CLIENTES' (Cost=1 Card=57294 Bytes=1088586)
8 7 INDEX (UNIQUE SCAN) OF 'CLIENTE_PK' (UNIQUE)
9 1 SORT (AGGREGATE)
10 9 TABLE ACCESS (BY INDEX ROWID) OF 'CONTRATO_DETALLE_LINEAS' (Cost=4 Card=1 Bytes=25)
11 10 INDEX (RANGE SCAN) OF 'CON_DET_LIN_CO_AP_CO_FK_I' (NON-UNIQUE) (Cost=3 Card=1)


in Production

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=725 Card=1 Bytes=203)
1 0 FILTER
2 1 NESTED LOOPS (Cost=725 Card=1 Bytes=203)
3 2 NESTED LOOPS (Cost=724 Card=1 Bytes=183)
4 3 TABLE ACCESS (BY INDEX ROWID) OF 'CONTRATOS' (Cost=27 Card=1393 Bytes=37611)
5 4 INDEX (RANGE SCAN) OF 'CONTRATO_SERVICIOS_FK_I' (NON-UNIQUE) (Cost=4 Card=1393)
6 3 TABLE ACCESS (BY INDEX ROWID) OF 'CONTRATO_DETALLE_LINEAS' (Cost=1 Card=329 Bytes=51324)
7 6 INDEX (RANGE SCAN) OF 'CON_DET_LI_CON_LI_PED_FK_I' (NON-UNIQUE) (Cost=1 Card=329)
8 2 TABLE ACCESS (BY INDEX ROWID) OF 'CLIENTES' (Cost=1 Card=57315 Bytes=1146300)
9 8 INDEX (UNIQUE SCAN) OF 'CLIENTE_PK' (UNIQUE)
10 1 SORT (AGGREGATE)
11 10 TABLE ACCESS (BY INDEX ROWID) OF 'CONTRATO_DETALLE_LINEAS' (Cost=1 Card=1 Bytes=26)
12 11 INDEX (RANGE SCAN) OF 'CON_DET_LI_PK' (UNIQUE) (Cost=3 Card=1)


hash_area_size in production is 24 MB whereas in development it is 128 KB, so I don't understand why Oracle chooses a hash join in development.

The plans become the same if I decrease db_file_multiblock_read_count in development to 8 (it was originally 16, the same as production).

I could understand them having different plans if the amounts of data were very different, but in my case they are similar (development is a copy of production from 3 days ago).

Tom Kyte
September 24, 2003 - 12:01 pm UTC

look at the card numbers (cardinality)

you may very well have the same volume of data however either

o the skew is totally, totally different (big time)
o the stats in one of the systems are wrong

can you tell me which might be true? that is what is affecting the plan here.
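
One quick way to test the second possibility is to compare the stored statistics against reality (a sketch; the table names are taken from the plans above):

select table_name, num_rows, last_analyzed
  from user_tables
 where table_name in ('CONTRATOS', 'CONTRATO_DETALLE_LINEAS', 'CLIENTES');

-- then spot-check one of them:
select count(*) from contrato_detalle_lineas;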

forget previous post

A reader, September 24, 2003 - 10:15 am UTC

Hi

I found the problem: the slow system was using default optimizer parameter values....

usefulness of system statistics

Reader, September 24, 2003 - 6:45 pm UTC

Tom, in 9i, what kind of system statistics does Oracle generate when I use the dbms_stats package? How are they useful? Can I export the system statistics from the prod database to the test database and use them to tune queries? Thanks.

Tom Kyte
September 25, 2003 - 5:10 am UTC

I wrote about this a lot in my new book -- looking at system stats. Here is a short excerpt:

<quote>
Use SYSTEM Statistics

Oracle9i introduced an alternative to OPTIMIZER_INDEX_COST_ADJ and OPTIMIZER_INDEX_CACHING that is worth a serious look. The DBMS_STATS package was enhanced to collect a new type of statistic: system statistics. These statistics measure actual performance characteristics of your unique system. They measure your actual CPU and disk-performance characteristics using your actual workloads. They are not guesses. They are actual, measured observations of what has occurred in your system during the period of measurement. Depending on the type of work you actually perform, different costs will be associated with various operations, affecting the eventual query plans that result.
Every physical configuration will perform a little differently than any other configuration. That is why the perfect numbers for the OPTIMIZER_INDEX_* parameters cannot simply be stated. The best numbers rely on your system's fingerprint -- its unique characteristics. Hence, the ability for us to measure, collect, and then use actual performance characteristics of our systems can reduce or remove the need to guess at these figures.

System statistics gathering works in one of two ways:
o You tell Oracle to start measuring now, and then after a while, you tell it to stop measuring now.
o You tell Oracle to measure system statistics for the next <some period of time>, and Oracle will start and stop by itself.

In order to demonstrate how this works, we need to set up a simulation. For this example, we will use the explicit start/stop method. We need to simulate a typical load on our system as well, one that is representative of the load we would expect day to day. Normally, the DBA would simply gather these system statistics during normal loads on the system itself; no simulation is necessary.
.....
</quote>

yes, you can export from prod to test

no, i don't see how you can use it to "tune" queries unless the hardware is 100% plug-identical on prod and test. it would simply cause the plans on test to be the same as prod, and that may or may not show you how prod will actually perform, as test has a different "fingerprint"
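
A sketch of the explicit start/stop method and the prod-to-test export the excerpt describes (STATS_TAB is a placeholder name):

exec dbms_stats.gather_system_stats( gathering_mode => 'START' );
-- ... let a representative workload run for a while ...
exec dbms_stats.gather_system_stats( gathering_mode => 'STOP' );

-- to copy the system stats to another database: stage, export, move, import
exec dbms_stats.create_stat_table( user, 'STATS_TAB' );
exec dbms_stats.export_system_stats( stattab => 'STATS_TAB' );
-- exp/imp the STATS_TAB table to the other database, then there:
exec dbms_stats.import_system_stats( stattab => 'STATS_TAB' );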

Thanks

Reader, September 25, 2003 - 6:10 pm UTC


Can the CBO see histograms through a join?

Mike Wilson, September 30, 2003 - 10:00 pm UTC

I finally understand histograms. I ran into a situation with severe data skew today, and histograms helped me track down the performance issue. My question, though: can the CBO use histograms with joins? Below is an example of an efficient query that uses histograms to answer a simple question. If I modify the query slightly by joining it to a very small table, it reverts back to FTSs, ignoring a useful, analyzed index (with valid histogram data) that it *should* use to answer the question. Thanks for any light you can shed on this variation of the question above.

/* two tables request and a very small lookup table of values. */

da@prodspi> select count(*) from request;

COUNT(*)
----------
212080

/* very small lookup table that is fk'd to request */

da@prodspi> select pk_x_request_status, str_name from x_request_status;

PK_X_REQUEST_STATUS STR_NAME
-------------------------------- ----------------------------------------
9CF7331A7EE3616EE034080020C902BB NEW
9CF7331A7EE4616EE034080020C902BB RUNNING
9CF7331A7EE5616EE034080020C902BB CANCELLED
9CF7331A7EE6616EE034080020C902BB COMPLETE
9CF7331A7EE7616EE034080020C902BB FAILED
B5408C414C6E27ADE034080020C905F8 READY_TO_RUN
DA5A6AA5552E46E8ADE485F4DB4B9029 NEEDS_TAPE

7 rows selected.


/* now show that the data is skewed */

da@prodspi> edit
Wrote file afiedt.buf

1 select count(a.pk_request), b.str_name, b.pk_x_request_status
2 from request a, x_request_status b
3 where a.pk_x_request_status = b.pk_x_request_status
4* group by b.str_name, b.pk_x_request_status
da@prodspi> /

COUNT(A.PK_REQUEST) STR_NAME PK_X_REQUEST_STATUS
------------------- ---------------------------------------- --------------------------------
1 NEW 9CF7331A7EE3616EE034080020C902BB
11371 FAILED 9CF7331A7EE7616EE034080020C902BB
5 RUNNING 9CF7331A7EE4616EE034080020C902BB
188634 COMPLETE 9CF7331A7EE6616EE034080020C902BB
12070 CANCELLED 9CF7331A7EE5616EE034080020C902BB

/* create histograms so I can find the RUNNING jobs by pk_x_request_status */

da@prodspi> analyze table request compute statistics for all indexes
2 for all indexed columns size 10;

Table analyzed.

da@prodspi> edit
Wrote file afiedt.buf

1 SELECT
2 pk_request
3 FROM "REQUEST"
4* WHERE pk_x_request_status = '9CF7331A7EE4616EE034080020C902BB'
da@prodspi> /

PK_REQUEST
--------------------------------
B53E945C635049E0BB9F13A877AE08E8
DC72E30EC66D44E7ACCAEB97A603E791
927CDFD382B441DB8BF61A4E0B794757
95838D2FE8514778B01840C41120628B
0C4A126DA645402E9D8E73979EB46505

Elapsed: 00:00:01.01

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=5 Bytes=320)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'REQUEST' (Cost=4 Card=5 Bytes=320)
2 1 INDEX (RANGE SCAN) OF 'REQUEST_STATUS_IDX' (NON-UNIQUE) (Cost=3 Card=5)




Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
6 consistent gets
0 physical reads
0 redo size
590 bytes sent via SQL*Net to client
504 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
5 rows processed

/* very slow before histograms, now, very efficient */


/* now join the large skewed request table to the small lookup table
which allows me to search by a meaningful tag */
da@prodspi> edit
Wrote file afiedt.buf

1 SELECT
2 r.pk_request
3 FROM da.REQUEST r, da.x_request_status rs
4 WHERE r.pk_x_request_status = rs.pk_x_request_status
5* AND rs.str_name = 'RUNNING'
da@prodspi> /

PK_REQUEST
--------------------------------
B53E945C635049E0BB9F13A877AE08E8
DC72E30EC66D44E7ACCAEB97A603E791
927CDFD382B441DB8BF61A4E0B794757
95838D2FE8514778B01840C41120628B
0C4A126DA645402E9D8E73979EB46505

Elapsed: 00:00:01.06


/* notice I am back to FTS and this query becomes very expensive. Also note
the card of step 4. The CBO seems to revert back to believing I have
a standard distribution of data on this column */

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=42395 Bytes=4493870)
1 0 NESTED LOOPS (Cost=601 Card=42395 Bytes=4493870)
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'X_REQUEST_STATUS' (Cost=1 Card=1 Bytes=42)
3 2 INDEX (UNIQUE SCAN) OF 'SYS_C0026719' (UNIQUE)
4 1 TABLE ACCESS (FULL) OF 'REQUEST' (Cost=600 Card=42395 Bytes=2713280)




Statistics
----------------------------------------------------------
8 recursive calls
0 db block gets
6237 consistent gets
11 physical reads
0 redo size
590 bytes sent via SQL*Net to client
504 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
5 rows processed

/* just for fun, show that there is a *very* usable index on this column
that the CBO seems to be ignoring now that the request table is involved
in a join. */

da@prodspi> edit
Wrote file afiedt.buf

1 SELECT /*+ INDEX (r request_status_idx) */
2 r.pk_request
3 FROM da."REQUEST" r, da.x_request_status rs
4 WHERE r.pk_x_request_status = rs.pk_x_request_status
5* AND rs.str_name = 'RUNNING'
da@prodspi> /

PK_REQUEST
--------------------------------
B53E945C635049E0BB9F13A877AE08E8
DC72E30EC66D44E7ACCAEB97A603E791
927CDFD382B441DB8BF61A4E0B794757
95838D2FE8514778B01840C41120628B
0C4A126DA645402E9D8E73979EB46505

Elapsed: 00:00:01.08

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1981 Card=42395 Bytes=4493870)
1 0 NESTED LOOPS (Cost=1981 Card=42395 Bytes=4493870)
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'X_REQUEST_STATUS' (Cost=1 Card=1 Bytes=42)
3 2 INDEX (UNIQUE SCAN) OF 'SYS_C0026719' (UNIQUE)
4 1 TABLE ACCESS (BY INDEX ROWID) OF 'REQUEST' (Cost=1980 Card=42395 Bytes=2713280)
5 4 INDEX (RANGE SCAN) OF 'REQUEST_STATUS_IDX' (NON-UNIQUE) (Cost=340 Card=42395)




Statistics
----------------------------------------------------------
9 recursive calls
0 db block gets
14 consistent gets
0 physical reads
0 redo size
590 bytes sent via SQL*Net to client
504 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
5 rows processed


Any ideas as to what might cause this? I have only tested this in this one case.

Tom Kyte
October 01, 2003 - 8:31 am UTC

The optimizer, in looking at:

1 SELECT
2 r.pk_request
3 FROM da.REQUEST r, da.x_request_status rs
4 WHERE r.pk_x_request_status = rs.pk_x_request_status
5* AND rs.str_name = 'RUNNING'
da@prodspi> /

has NO CLUE as to the correlation between the various columns. there are no statistics in 9i and before that would help it -- it is a very difficult problem actually. Think about it -- that RS table could have anything in PK_X_REQUEST_STATUS when STR_NAME = RUNNING. The optimizer in this case falls back to "best guess"


I can say that in a future release of the database things like this will be solvable without the use of hinting -- but in this case, I would probably go for a first_rows style approach rather than an index hint. you want first rows fastest -- whatever that is -- and it leaves more paths open for the optimizer rather than telling it exactly what to do.
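
In other words, something along these lines instead of the index hint (a sketch, reusing the query from the review above):

SELECT /*+ FIRST_ROWS */
       r.pk_request
  FROM da.request r, da.x_request_status rs
 WHERE r.pk_x_request_status = rs.pk_x_request_status
   AND rs.str_name = 'RUNNING';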

More on bitmap indexes

Alla, October 21, 2003 - 10:46 am UTC

I have a table in data warehouse with about 17 million records.

Table has a column called time_key with 63 distinct values

I also have a bitmap index on the time_key column.

Table was analyzed using the following command:

analyze table posted_invoice_lines_facts_t compute statistics for columns time_key size 100;

When I select from the table with time_key = :value, it uses the index, but when I need to select several time_key values, let's say 10, it does not.

Can you explain this please

Thanks

select distinct account_key
from posted_invoice_lines_facts_t
where time_key = 'SEP-03';

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=6 Card=1 Bytes=12)
1 0 SORT (UNIQUE) (Cost=6 Card=1 Bytes=12)
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'POSTED_INVOICE_LINES_FACTS_T' (Cost=3 Card=1 Bytes=12)
3 2 BITMAP CONVERSION (TO ROWIDS)
4 3 BITMAP INDEX (SINGLE VALUE) OF 'POSTED_INVOICE_LINES_TIME'


select distinct account_key
from posted_invoice_lines_facts_t
where time_key in ('JAN-03', 'FEB-03', 'MAR-03',
'APR-03', 'MAY-03', 'JUN-03', 'JUL-03',
'AUG-03', 'SEP-03', 'OCT-03');

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=18961 Card=534 Bytes=6408)
1 0 SORT (UNIQUE) (Cost=18961 Card=534 Bytes=6408)
2 1 TABLE ACCESS (FULL) OF 'POSTED_INVOICE_LINES_FACTS_T' (Cost=14135 Card=1042863 Bytes=12514356)



Tom Kyte
October 21, 2003 - 5:08 pm UTC

look at the rows it will retrieve from the table: Card=1,042,863

That is more than you want to 'table access by index rowid' in all likelihood

tell you what -- HINT the query to use the index and using TKPROF, benchmark both. See which is better

(and remember -- please keep chanting

full scans are not evil
indexes are not good incarnate
full scans are not evil
indexes are not good incarnate
full scans are not evil
indexes are not good incarnate
full scans are not evil
indexes are not good incarnate
full scans are not evil
indexes are not good incarnate

over and over....)
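
A sketch of how that benchmark might be run (the index name is taken from the plans above; the table alias in the hint is hypothetical):

alter session set timed_statistics = true;
alter session set sql_trace = true;

-- run the query both ways: as-is, and hinted to use the index
select /*+ index(t posted_invoice_lines_time) */ distinct account_key
  from posted_invoice_lines_facts_t t
 where time_key in ('JAN-03', 'FEB-03', 'MAR-03',
                    'APR-03', 'MAY-03', 'JUN-03', 'JUL-03',
                    'AUG-03', 'SEP-03', 'OCT-03');

alter session set sql_trace = false;

-- then format the trace file from user_dump_dest, e.g.:
--   tkprof <tracefile>.trc report.txt sys=no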

How often should I update the histograms?

Ivan, February 13, 2004 - 10:47 am UTC

I have a table that contains 8 million rows. Each night I add roughly 10 thousand new entries. I know the indexes take care of themselves and are updated on each new entry. But what about the histograms? Do they get automatically updated with the new entries?

Thanks,
Ivan

Tom Kyte
February 13, 2004 - 11:03 am UTC

No, you need to gather stats as you change the data over time. As 10,000 rows out of 8,000,000 is a proverbial "drop in the bucket" i would suggest you

a) alter table T monitoring (for all tables in your schema)
b) gather STALE stats only (the database will decide when a table is a candidate for regathering stats, when it has "changed" enough)

dbms_stats.Gather_table_stats and subpartitions

KJ, March 03, 2004 - 12:08 pm UTC

Hi Tom
This is an ODS database. We have an account_balance table that has around 20 million rows; on average, 500,000 rows get added to it every day, and new partitions are added every month. We have this table analyzed on Sundays.

2/29/04 - Table was analyzed with "dbms_stats.gather_table_stats('eis_admin','account_balance',NULL,DBMS_STATS.AUTO_SAMPLE_SIZE,FALSE,'FOR ALL COLUMNS SIZE AUTO',NULL,'DEFAULT',TRUE,NULL,NULL,NULL);"

Partition P200403 had no data

03/01/04 & 03/02/04 - March 01 and 02 data was loaded.
The job checks balances for sysdate-2, so on the 1st and 2nd it was still looking at February partitions.

03/03/04 - The job checks March data, which had not been analyzed.
The job that updates this table kept waiting. Someone did an analyze on the account_balance table and the job continued fine.

My question is:
1. Should we analyze at the subpartition level?

2.dbms_stats.gather_table_stats - DEFAULT granularity updates just all_tables and all_tab_partitions. "Analyze" updated all_tab_subpartitions only.
TABLENAME LAST_ANALYZED
all_tables 2/29/04
all_Tab_partitions 2/29/04
all_tab_subpartitions 3/03/04

3. On 02/29/04, when the table was analyzed, it had 0 rows for the March partition. So in today's load, when the data for March 01 was being checked, the job just hung. The DBA saw no activity. He did an analyze, since num_rows for partition P200403 was 0. The analyze updated all_tab_subpartitions, and the job went through fine after that -- but it took 4 hrs vs 30 mins. Any suggestion why this might have happened?

Thanks for any suggestion

Tom Kyte
March 03, 2004 - 3:31 pm UTC

if your queries employ partition elimination and the optimizer can tell based on the query the exact (single) partition that will be accessed, partition level stats are a good idea.

sorry -- but you can only diagnose "a hang" while it is happening. After the fact -- when you've restarted everything -- it is really hard. I cannot say.

Pls explain

A reader, March 03, 2004 - 10:26 pm UTC

b) gather STALE stats only (the database will decide when a table is a candidate
for regathering stats, when it has "changed" enough) .

Do we need to schedule dbms_stats as a daily job?
What are the steps to gather stale stats when table monitoring is on?
Any small example will help.
Thanks very much.

Tom Kyte
March 04, 2004 - 8:12 am UTC

"need" vs "want" here. it is up to you. does your data change rapidly enough? if so, sure. if not, still "sure" (gather stale won't do anything).


you just use the gather stale option in the dbms_stats.gather_schema_stats call, thats all.

search for

dbms_stats "gather stale"

on this site or check out the supplied packages guide for details.
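
A small example of those steps (a sketch; T stands in for each of your tables):

-- once, per table (9i also has dbms_stats.alter_schema_tab_monitoring):
alter table t monitoring;

-- then, on whatever schedule suits your rate of change:
begin
   dbms_stats.gather_schema_stats
   ( ownname => user,
     options => 'GATHER STALE',
     cascade => true );
end;
/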

Oracle Problem

Fakhry, March 04, 2004 - 2:18 am UTC

How do I monitor per-user CPU utilization in Oracle? For example, I have 2 Oracle users and I want to monitor CPU utilization for each. Oracle Enterprise Manager gives overall CPU utilization, but not per-user CPU usage.

Tom Kyte
March 04, 2004 - 11:05 am UTC

enable auditing and you'll get session_CPU recorded amongst other bits of data.
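
A sketch of that approach (it assumes the audit_trail init.ora parameter is set, e.g. audit_trail=db, which requires a restart):

audit session;

-- after sessions log off, CPU is recorded per session, hence per user:
select username, session_cpu, logoff_time
  from dba_audit_session
 order by session_cpu desc;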

RULE hint solves the problem?

Jelena, March 30, 2004 - 11:15 am UTC

Hi Tom,
in my case the rule based optimizer gives me much better performance than the cost based one, and i don't know how i should collect statistics to make it clear to oracle that it should use the index :)
So, i have 2 tables:
BENUTZERGRUPPE - 22915 rows (German: groups)
BENUTZER_BENUTZERGRUPPE - 107865 rows (German: users in groups)
(this one has an index on BENUTZERID with 8000 distinct, equally distributed values)

Now, i need to see all the groups for one user, so i have this query:
SELECT t1.name, t1.loeschdatum, t1.id, t1.eigentuemerid
FROM BENUTZER_BENUTZERGRUPPE t0, BENUTZERGRUPPE t1
WHERE t0.benutzerid = 9495335
AND t1.id = t0.benutzergruppeid
AND t1.LOESCHDATUM IS NULL;

I analyzed both tables (and the index) every way, with every combination of bucket sizes that came to mind. The cost based explain plan always gives me a full table scan and takes long:

Elapsed: 00:00:00.89

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=12 Card=264 Bytes=92
40)
1 0 NESTED LOOPS (Cost=12 Card=264 Bytes=9240)
2 1 TABLE ACCESS (FULL) OF 'BENUTZERGRUPPE' (Cost=12 Card=21
921 Bytes=548025)
3 1 INDEX (UNIQUE SCAN) OF 'BEN_BENGR_PK' (UNIQUE)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
22032 consistent gets


if i put RULE hint, it has (correct and faster) explain plan:
Elapsed: 00:00:00.57

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=HINT: RULE
1 0 NESTED LOOPS
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'BENUTZER_BENUTZERGRUPPE'
3 2 INDEX (RANGE SCAN) OF 'BEN_BENGRUPPE_BENID_IND' (NON-UNIQUE)
4 1 TABLE ACCESS (BY INDEX ROWID) OF 'BENUTZERGRUPPE'
5 4 INDEX (UNIQUE SCAN)OF 'BENUTZERGRUPPE_PK_ID' (UNIQUE)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
15 consistent gets


So, my first question: if values are equally distributed and i have 8000 distinct values across roughly 100,000 rows, then one index hit returns about 13 rows on average -- a small fraction of one percent of the table. I think i read somewhere that oracle uses an index when this ratio is below 10%. So if that is the case, then even without 'compute statistics on all indexed columns' it should use the index, right?

Now, my real problem is that this is a TopLink-generated query, so i can't put in any hints or change the order of columns. So i only have the option of deleting statistics (which doesn't sound too sophisticated to me) :)
Plus that would affect the rest of the queries, so i would rather make it run correctly based on statistics.
So, do you think it's possible?
Thanks,
Jelena


Tom Kyte
March 30, 2004 - 11:43 am UTC

have you read about optimizer_index_cost_adj and optimizer_index_caching?

search for those terms on this site and give them some consideration.

optimizer_index_cost_adj = 10 -> FTS

Jelena, March 30, 2004 - 3:46 pm UTC

Hi Tom,
thanks for your answer.
I did try to set those parameters:
optimizer_index_caching is set to 16 (i think this is the default value).
And optimizer_index_cost_adj does force the index scan, but only if the value is set lower than 10 (and the difference in consistent gets is huge!).
It seems strange to me that it has to be so low to force the index scan. Is it because a table of 22,000 rows is considered small enough that a full table scan is so much cheaper than going through the unique index?

Jelena


Tom Kyte
March 30, 2004 - 6:30 pm UTC

lets see the AUTOTRACE TRACEONLY EXPLAIN output for that. I want to see what the optimizer thinks it is going to get row-wise.

optimizer_index_cost_adj again

Jelena, March 31, 2004 - 7:45 am UTC

Ok, now db settings are:
optimizer_index_cost_adj = 25, 
optimizer_index_caching = 90 

SQL> desc benutzergruppe                             
 Name                          Null?    Type         
 ----------------------------- -------- -------------
 ID                            NOT NULL NUMBER(10)   
 NAME                                   VARCHAR2(128)
 EIGENTUEMERID                          NUMBER(10)   
 LOESCHDATUM                            DATE         
                         
nr of rows from statistics: 22933                   

         
SQL> desc benutzer_benutzergruppe                   
 Name                          Null?    Type         
 ----------------------------- -------- -------------
 BENUTZERID                    NOT NULL NUMBER(10)   
 BENUTZERGRUPPEID              NOT NULL NUMBER(10)  

nr of rows from statistics: 107750  
primary key (benutzerid, benutzergruppeid) 
and nr of unique benutzerid values 7900.

Both tables analyzed with 
DBMS_STATS.GATHER_TABLE_STATS
with option 'FOR ALL INDEXED COLUMNS SIZE AUTO '

Now, if i execute:
SQL> r
  1  SELECT   t1.name, t1.loeschdatum, t1.id, t1.eigentuemerid,t0.benutzerid
  2    FROM BENUTZER_BENUTZERGRUPPE t0, BENUTZERGRUPPE t1
  3    WHERE t0.benutzerid = 9495335
  4    AND t1.id = t0.benutzergruppeid
  5*   AND t1.LOESCHDATUM IS NULL

Elapsed: 00:00:00.83

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=14 Card=247 Bytes=26
          182)
   1    0   NESTED LOOPS (Cost=14 Card=247 Bytes=26182)
   2    1     TABLE ACCESS (FULL) OF 'BENUTZERGRUPPE' (Cost=12 Card=11
          47 Bytes=107818)
   3    1     INDEX (UNIQUE SCAN) OF 'BEN_BENGR_PK' (UNIQUE)




Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      22032  consistent gets
          0  physical reads
          0  redo size
        752  bytes sent via SQL*Net to client
        503  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          4  rows processed


Now, i change:
SQL> alter session set optimizer_index_cost_adj=5;

Session altered.

SQL> SELECT   t1.name, t1.loeschdatum, t1.id, t1.eigentuemerid,t0.benutzerid
FROM BENUTZER_BENUTZERGRUPPE t0, BENUTZERGRUPPE t1
WHERE t0.benutzerid = 9495335  AND t1.id = t0.benutzergruppeid AND t1.LOESCHDATUM IS NULL;
  2    3
Elapsed: 00:00:00.53

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=14 Card=247 Bytes=26
          182)

   1    0   NESTED LOOPS (Cost=14 Card=247 Bytes=26182)
   2    1     INDEX (RANGE SCAN) OF 'BEN_BENGR_PK' (UNIQUE) (Cost=3 Card=247 Bytes=2964)
   3    1     TABLE ACCESS (BY INDEX ROWID) OF 'BENUTZERGRUPPE' (Cost=2 Card=1 Bytes=94)
   4    3       INDEX (UNIQUE SCAN) OF 'BENUTZERGRUPPE_PK_ID' (UNIQUE)


So the real problem is that this second explain plan (when forced with an index hint or RULE, for example) is much faster than the first one.
I think that setting optimizer_index_cost_adj to 5 is too low -- it would probably prevent most FTSs, which is maybe not the best idea?
What do you think?
Thanks again!
Jelena
 

Tom Kyte
March 31, 2004 - 8:57 am UTC

I've said in the past -- there are two ways to set these parameters.

way to the right (the default)
way to the left (opposite of what they are)

Alter table xxx cache?

Jelena, April 05, 2004 - 6:39 am UTC

Hi Tom,
at some point in time, some of the tables were altered with:
alter table ... cache;
Now i altered them to nocache and analyzed the tables again, and the optimizer seems to use the correct plan (more via indexes)...
It seems that 'cache' has a big effect on the explain plan, favoring full table scans -- is that correct?
Thanks,
Jelena

Tom Kyte
April 05, 2004 - 9:48 am UTC

"and analyzed the table again"


alter table cache didn't really affect it; "and analyzed the tables again" would/could have a dramatic effect on a plan. (perhaps you analyzed when the tables were small; now they are big, and the optimizer had no idea)



big_table@ORA9IR2> set autotrace traceonly explain
big_table@ORA9IR2> select count(status) from big_table;

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1379 Card=1 Bytes=5)
1 0 SORT (AGGREGATE)
2 1 TABLE ACCESS (FULL) OF 'BIG_TABLE' (Cost=1379 Card=999620 Bytes=4998100)



big_table@ORA9IR2> alter table big_table cache;

Table altered.

big_table@ORA9IR2> select count(status) from big_table x;

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1379 Card=1 Bytes=5)
1 0 SORT (AGGREGATE)
2 1 TABLE ACCESS (FULL) OF 'BIG_TABLE' (Cost=1379 Card=999620 Bytes=4998100)



cost -- did not change at all.

optimizer_index_cost_adj question

Sravan, May 06, 2004 - 12:58 pm UTC

Tom,
we are a development environment. I have a query which takes about 90 secs. Our QM team says that a similar query took less than a second to run earlier. I don't have the previous execution plans to compare with the present plan, so I don't really know what it used then. The tables are all analyzed with gather_stats. Then I read somewhere that optimizer_index_cost_adj will change the plan if we think Oracle is not choosing the optimum plan. So I changed this value to "10", and the query ran in 0.1 secs.

My question is: should I go ahead and set this at the instance level? But then it will affect FTS usage in other plans across the rest of the database. What's the best way to handle this situation?

Here are the execution plans before and after adjusting optimizer_index_cost_Adj.
QUERY:
------
SELECT scr.component_id, scr.department_id, scr.sitem_id, scr.patient_type_id, scr.p1 * dsr.p1, scr.p2 * dsr.p2, 
 scr.p3 * dsr.p3, scr.p4 * dsr.p4, scr.p5 * dsr.p5, scr.p6 * dsr.p6, scr.p7 * dsr.p7, scr.p8 * dsr.p8, scr.p9 * dsr.p9, scr.p10 * dsr.p10, scr.p11 * dsr.p11, scr.p12 * dsr.p12     
 FROM srav_dept_sitem_vol1_131196528 dsr,   srav_sitem_comp_real_131196528 scr
WHERE scr.component_id =    3688037.1 and     
 scr.department_id = dsr.department_id and     
 scr.sitem_id = dsr.sitem_id and     scr.patient_type_id = 
 dsr.patient_type_id
------

EXECUTION PLAN BEFORE adjusting the parameter:
------
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=431 Card=186 Bytes=24924)

   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'SITEM_COMP_REAL_131196528' 
(Cost=431 Card=1 Bytes=94)

   2    1     NESTED LOOPS (Cost=431 Card=186 Bytes=24924)
   3    2       TABLE ACCESS (FULL) OF 'DEPT_SITEM_VOL1_131196528' (Cost=47 Card=14275 Bytes=571000)

   4    2       BITMAP CONVERSION (TO ROWIDS)
   5    4         BITMAP AND
   6    5           BITMAP CONVERSION (FROM ROWIDS)
   7    6             INDEX (RANGE SCAN) OF 'STM_CMP_RL_2X_131196528'
          (NON-UNIQUE)

   8    5           BITMAP CONVERSION (FROM ROWIDS)
   9    8             INDEX (RANGE SCAN) OF 'STM_CMP_RL_3X_131196528'
          (NON-UNIQUE) (Cost=3 Card=3)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
     352698  consistent gets
       3101  physical reads
          0  redo size
     206588  bytes sent via SQL*Net to client
       3379  bytes received via SQL*Net from client
        250  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       3733  rows processed


SQL> alter session set optimizer_index_cost_Adj =10
-----
AFTER
-----
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=193 Card=57 Bytes=10374)

   1    0   HASH JOIN (Cost=193 Card=57 Bytes=10374)
   2    1     TABLE ACCESS (FULL) OF 'SRAV_DEPT_SITEM_VOL1_131196528'
          (Cost=47 Card=13613 Bytes=1361300)

   3    1     TABLE ACCESS (BY INDEX ROWID) OF 'SRAV_SITEM_COMP_REAL_1
          31196528' (Cost=85 Card=26177 Bytes=2146514)

   4    3       INDEX (RANGE SCAN) OF 'SRAV_STM_CMP_RL_5X_131196528' (
          NON-UNIQUE) (Cost=66 Card=26177)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1061  consistent gets
          0  physical reads
          0  redo size
     207087  bytes sent via SQL*Net to client
       3379  bytes received via SQL*Net from client
        250  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       3733  rows processed
============================================

Indexes on SRAV_DEPT_SITEM_VOL1_131196528 table

SRAV_DPT_STM_VL1_1X_131196528  UNIQUE_ID
SRAV_DPT_STM_VL1_3X_131196528  DEPARTMENT_ID
SRAV_DPT_STM_VL1_6X_131196528  DEPT_SITEM_ID
--------

Indexes on SRAV_SITEM_COMP_REAL_131196528 table

SRAV_STM_CMP_RL_1X_131196528   UNIQUE_ID
SRAV_STM_CMP_RL_2X_131196528   SITEM_ID
SRAV_STM_CMP_RL_3X_131196528   DEPARTMENT_ID
SRAV_STM_CMP_RL_5X_131196528   COMPONENT_ID
SRAV_STM_CMP_RL_7X_131196528   DEPT_SITEM_ID
-----

CREATE TABLE SUPPORT.SITEM_COMP_REAL_131196528 
(
    ACUITY_ID             NUMBER(12,1)     NULL,
    CHARGE_ID             NUMBER(12,1)     NULL,
    CLASS1_ID             NUMBER(12,1) NOT NULL,
    CLASS2_ID             NUMBER(12,1) NOT NULL,
    CLASS3_ID             NUMBER(12,1) NOT NULL,
    COMMON_ID             NUMBER(12,1) NOT NULL,
    COMPONENT_ID          NUMBER(12,1) NOT NULL,
    COMPONENT_TYPE_ID     NUMBER(12,1) NOT NULL,
    CORP_COMPONENT_ID     NUMBER(12,1)     NULL,
    CPT4_ID               NUMBER(12,1)     NULL,
    CREDIT_COMPONENT_ID   NUMBER(12,1)     NULL,
    DEPARTMENT_ID         NUMBER(12,1) NOT NULL,
    DEPT_SITEM_ID         NUMBER(12,1) NOT NULL,
    ENTERPRISE_ID         NUMBER(12,1) NOT NULL,
    ENTITY_ID             NUMBER(12,1) NOT NULL,
    FACILITY_ID           NUMBER(12,1)     NULL,
    HOUR_COMPONENT_ID     NUMBER(12,1)     NULL,
    INDIRECT_COMPONENT_ID NUMBER(12,1)     NULL,
    LOOKUP_CD             NUMBER           NULL,
    NDC_ID                NUMBER(12,1)     NULL,
    NURSE_INTERVENTION_ID NUMBER(12,1)     NULL,
    OR_ID                 NUMBER(12,1)     NULL,
    P1                    FLOAT            NULL,
    P10                   FLOAT            NULL,
    P11                   FLOAT            NULL,
    P12                   FLOAT            NULL,
    P2                    FLOAT            NULL,
    P3                    FLOAT            NULL,
    P4                    FLOAT            NULL,
    P5                    FLOAT            NULL,
    P6                    FLOAT            NULL,
    P7                    FLOAT            NULL,
    P8                    FLOAT            NULL,
    P9                    FLOAT            NULL,
    PATIENT_TYPE_ID       NUMBER(12,1) NOT NULL,
    PERIODICITY_ID        NUMBER(12,1) NOT NULL,
    SITEM_ID              NUMBER(12,1) NOT NULL,
    SITEM_TYPE_ID         NUMBER(12,1) NOT NULL,
    STANDARD_COMPONENT_ID NUMBER(12,1)     NULL,
    UB_ID                 NUMBER(12,1)     NULL,
    UNIQUE_ID             NUMBER(12,1) NOT NULL,
    UNIT_ID               NUMBER(12,1) NOT NULL,
    YEAR                  NUMBER(10)   NOT NULL
)
TABLESPACE WORK_DATA
LOGGING
PCTFREE 10
INITRANS 1
MAXTRANS 255
STORAGE(BUFFER_POOL DEFAULT)
NOPARALLEL
NOCACHE
/
-----

begin
     dbms_stats.gather_table_stats
     (user,'SRAV_DEPT_SITEM_VOL1_131196528',
      method_opt => 'for all indexed columns size auto',
          cascade => true);
  end;
/
----- 

Tom Kyte
May 06, 2004 - 4:56 pm UTC

are the row counts associated with the autotrace "accurate", or way off from reality?

very close

SRavan, May 07, 2004 - 11:16 am UTC

Tom, thanks for looking into this -

SQL> select count(*) from SRAV_SITEM_COMP_REAL_131196528;
104708

SQL> select count(*) from SRAV_DEPT_SITEM_VOL1_131196528;
13613

SQL> select table_name, num_rows from dba_tables;
TABLE_NAME                      num_rows
===========                     ========
SRAV_SITEM_COMP_REAL_131196528    105060
SRAV_DEPT_SITEM_VOL1_131196528    13774

-----------------------------
On a side note -
I have never seen num_rows in dba_tables be higher than the real row count (count(*)). All my previous comparisons showed count(*) > num_rows.

I know num_rows is an estimate, but I am still curious whether you have seen this before.

Tom Kyte
May 07, 2004 - 1:05 pm UTC

i mean with the predicate applied. Card=26177 -- is that one "right"?

it is looking like the stats may be off, or something in using the stats is going wrong -- the card, the expected set of output rows, is way off.
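
One way to check that (a sketch, using the literal from the query above):

-- the plan guessed Card=26177 for this predicate; what does reality say?
select count(*)
  from srav_sitem_comp_real_131196528
 where component_id = 3688037.1;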

dbms_stat advantages over analyze ?

A reader, August 06, 2004 - 8:56 am UTC

Tom,

We are using only the ANALYZE ... ESTIMATE STATISTICS SAMPLE 25 PERCENT statement at the moment. How different is it from using the dbms_stats package? This is Oracle 9iR1.

Can you tell me the advantage of using the dbms_stats pkg?

Thanks for your valuable time


Tom Kyte
August 06, 2004 - 10:13 am UTC

the biggest -- gather stale.

you alter table t monitoring;

and then just gather stale. You'll find (probably) that most of your tables don't get analyzed each time.



the other -- degree => N. in parallel.

Warehouse - Histogram

Shivaswamy, November 06, 2004 - 6:04 am UTC

The discussion in this thread has been very useful. Thanks.

I have a data warehouse to support. The tables in question have around 100 million rows. We have a third party tool for reporting; it has bind variables in the report screen, wherein the user fills in values and generates a report. Is it true that histograms are less effective when used with bind variables? Are they effective at all? What is this peeking ability of the 9i optimizer? Is it used in this context? Your input is appreciated, Tom.
Thanks

Tom Kyte
November 06, 2004 - 10:58 am UTC

search this site for

peeking

to read about bind variable peeking. This is going to sound strange coming from me but -- can this reporting tool disable its use of bind variables? in a data warehouse with large reports, I might not want to use them all of the time.

Thanks

Shivaswamy, November 06, 2004 - 4:45 pm UTC

Tom,
Thanks. I now have a better understanding of peeking. Probably, now I also understand why users have been complaining these days (8174 to 9203, and now going to 9205) that sometimes it goes fast and sometimes it never finishes. I would look and find it doing what it was asked to -- say an IFFS, say a sequential read. Do you have any suggestions for identifying this as a peeking problem? Explain plans may not be accurate. And when a user from the reporting group calls to say "it is slow today", where can I look to see whether it is a peeking issue?

And on the third party tool using binds for reporting -- currently I am troubleshooting a report which used to complete in 4 hours but now at times does not complete even after 8-10 hours. I am testing histograms for this, and I will try to find an option to avoid binds. Maybe a grant/revoke on the table to flush the cursor out is a band-aid, I now realize.

Anyway, if you have a process to identify a peeking issue on a query running in production (in DSS/DW with binds), I look forward to it.

Thanks.

Performance problem in query

Scott, February 02, 2005 - 8:07 am UTC

Hello Tom,
Firstly, I am sorry if I posted this problem in the wrong thread, but for the last 3 days I have been facing this problem with SQL queries. I tuned a few and they are working fine. This is the last query, which I have not been able to tune for the last 2 days.
I am on 8.1.7 and we have a data warehouse, so there are a few tables which are not frequently analyzed. It would be great if you could help me solve this performance problem. This query has been running for the last 2 days without finishing. I wanted to give you the TKPROF result, but the query never finishes; that is why I am providing the explain plan result as well as the actual plan, taken from OEM, that the query is choosing. Please help me.

Query is

SELECT
CG_GAME_METH.RV_MEANING,
COUNT(DISTINCT RM_OWNER.RM_CUSTOMERS.CUST_ID),
CUST_QUARTERLY_GAMES.description
FROM
RM_OWNER.CG_REF_CODES CG_GAME_METH,
RM_OWNER.RM_CUSTOMERS,
RM_OWNER.RM_GAMES CUST_QUARTERLY_GAMES,
RM_OWNER.RM_LCARDS,
RM_OWNER.RM_COUPONS,
RM_OWNER.RM_CUST_QUARTERLY
WHERE
( CG_GAME_METH.RV_DOMAIN = 'PURCHASE METHOD' )
AND ( RM_OWNER.RM_LCARDS.CUST_CUST_ID=RM_OWNER.RM_CUSTOMERS.CUST_ID )
AND ( RM_OWNER.RM_CUST_QUARTERLY.CUST_CUST_ID=RM_OWNER.RM_CUSTOMERS.CUST_ID )
AND ( RM_OWNER.RM_COUPONS.GAME_METH=CG_GAME_METH.RV_LOW_VALUE )
AND ( CUST_QUARTERLY_GAMES.GAME_ID=RM_OWNER.RM_CUST_QUARTERLY.GAME_GAME_ID )
AND ( RM_OWNER.RM_COUPONS.LCARD_CARD_ID=RM_OWNER.RM_LCARDS.CARD_ID )
AND (
( RM_OWNER.RM_CUSTOMERS.CUST_ID LIKE '1_%' )
)
GROUP BY
CG_GAME_METH.RV_MEANING,
CUST_QUARTERLY_GAMES.description
-------------------------------------------------------------

Explain plan result for the query
-----------------------------------------------------------
SELECT STATEMENT [CHOOSE] Cost = 22500667
  SORT GROUP BY
    HASH JOIN
      TABLE ACCESS FULL RM_GAMES [ANALYZED]
      HASH JOIN
        TABLE ACCESS BY INDEX ROWID CG_REF_CODES [ANALYZED]
          INDEX RANGE SCAN CG_REF_DOMAIN_I [ANALYZED]
        MERGE JOIN
          NESTED LOOPS
            MERGE JOIN
              INDEX RANGE SCAN CUST_PK [ANALYZED]
              SORT JOIN
                TABLE ACCESS FULL RM_LCARDS [ANALYZED]
            TABLE ACCESS BY INDEX ROWID RM_COUPONS [ANALYZED]
              INDEX RANGE SCAN CPN_LCARD_FK_I [ANALYZED]
          SORT JOIN
            INDEX RANGE SCAN CUST_QUARTERLY_PK [ANALYZED]


Execution plan when query actually ran
----------------------------------------------------------

STEPNAME                                           STEP       COST       ROWS
------------------------------------------------- ---- ---------- ----------
SELECT STATEMENT                                     15   21983335       2673
SORT (GROUP BY)                                      14   21983335       2673
HASH JOIN                                            13    1333056  426630528
RM_OWNER.RM_GAMES TABLE ACCESS (FULL)                 1          1        473
MERGE JOIN                                           12    1305749  426630528
SORT (JOIN)                                           9    1178793   17531111
HASH JOIN                                             8     700468   17531111
RM_OWNER.CG_REF_CODES TABLE ACCESS (FULL)             2          2          9
HASH JOIN                                             7    7000076   33114320
HASH JOIN                                             5       5660     300664
RM_OWNER.CUST_PK_INDEX INDEX (FAST FULL SCAN)         3        801     285257
RM_OWNER.RM_LCARDS TABLE ACCESS (FULL)                4       4541     466098
RM_OWNER.RM_COUPONS TABLE ACCESS (FULL)               6     596216  191115414
SORT (JOIN)                                          11     126956   25116749
RM_OWNER.CUST_QUARTERLY_PK INDEX (FAST FULL SCAN)    10      16253   25116749

-------------------

I tried a lot to tune this SQL. I tried using OPTIMIZER_INDEX_COST_ADJ=10; after this, other queries started working very fast, but this query still runs like a "DOG". I am out of ideas now. I just joined this company, so I don't know what changed, and I couldn't find anybody who could tell me. Please advise what I could do to tune this SQL. I know that if I analyze the tables and change OPTIMIZER_INDEX_COST_ADJ=10 the problem will be solved, but I can't analyze the tables frequently -- that is my problem. It would be great if you could suggest some changes (maybe in the SQL) or anything else, using your wide and deep knowledge.

Tom Please help me.

Thanks,
Scott

Tom Kyte
February 02, 2005 - 8:20 am UTC

indexes would be deadly on these things. full scans, hash joins -- good, index scans to join -- bad


you say:

<quote>
so there are few tables which are not
frequently analyzed.
</quote>

what does that mean precisely? bad stats in -> bad plans out for sure.
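
A quick way to see which tables have missing or old statistics (a sketch; run as the owning user):

select table_name, num_rows, last_analyzed
  from user_tables
 order by last_analyzed;
-- a NULL last_analyzed means "never analyzed" -- a mix of analyzed and
-- unanalyzed tables is exactly the "bad stats in" situation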


Performance going bad

fahad, February 07, 2005 - 3:52 am UTC

Hi tom,
I am not an expert user of the analyze command, but I would like to ask something regarding it. I have an 8.1.6 database. We have an ERP application running in one of its schemas, so there are frequent insert and update operations going on all the time. The performance of the database had been perfect for years.
We also have Discoverer for reports. A few days back I was asked to produce a Discoverer report from the same schema used by the ERP application. The report was hanging after retrieving 700 rows, i.e. the whole data set was not coming up. I tried the SQL from SQL*Plus too -- same result. The query used 4 tables, two of which were pretty big. So I thought of experimenting with the analyze command, and I analyzed these four tables (only the tables). And then, when I ran the Discoverer report, it came up in a flash! So far so good.
The next morning I got a call from one of our application users that he can't work, as he is getting an 'ORA-03232: unable to allocate extent of 17 blocks from tablespace 3' error while trying to run a small report from his application.
I checked into this; the problem arose because the TEMP tablespace had a small extent size (64K). I used an alter command to extend the initial and next extents to 1048576 bytes, and the error went away.
But performance got slow. Now the end users started complaining that every query in the application is slow.
First I thought there was a problem in TEMP, but these problems started only after the analyze commands I issued.
So I deleted the statistics for those tables.
And now the users are happy; performance returned to normal.
Can you give some insight into this? What happened? What went wrong? Should I not analyze tables in a transactional database? If I am analyzing a few tables, should I set optimizer_mode to some value in the init file? (I didn't change anything in the init file.)
Please explain, as I now fear the analyze command.

Tom Kyte
February 07, 2005 - 4:58 am UTC

ugh,

you analyzed some, not all, a tiny bit of a big database

In production.


that was a huge mistake. you either analyze everything, or nothing. and if this ERP application is from a 3rd party, you have to follow their rules for doing so (whether you can analyze or not is up to them even)

In any case, this is something you would never never never consider doing to a live production system without having a test first (not unless you want to have happen again what happened here!)

changing the optimizer is a fairly large change. Some things will go much faster (you saw that). Many things will go the same speed. Some things may well go "slower" and need to be looked at in order to determine "why" - what stats need to be gathered with more detail perhaps.

but you can only do this in test.


So, don't be "afraid" -- but do be "educated" about them (and 816 is really really old software as well -- there has been 4 almost 5 major releases of Oracle since then!)



do not TEST on PROD.


performance

fahad, February 07, 2005 - 7:56 am UTC

Oh my god. I think I indeed committed a mistake.
Now some reports are hanging. Is there a way I can return to where I was?
Can setting optimizer_mode=rule help me?


Tom Kyte
February 07, 2005 - 8:47 am UTC

if you deleted statistics, you are back to where you were.

if you deleted SOME of the statistics, but not all, then you need to finish what you started.


deleting the stats would invalidate the cursors -- which would cause them to reparse again later which would use the RBO since there are no stats.

so, did you "miss" some. select last_analyzed from user_tables, user_indexes. select count(*) from user_tab_histograms. if you need non-null last_analyzed or rows in histograms, you missed.

performance

Fahad, February 07, 2005 - 9:22 am UTC

Dear Tom,
Yes, I deleted all the statistics; the count is zero in user_tab_histograms.
Yet some reports are hanging and are not running.
You suggested that I should test all these things in a TEST db, but how can we get a TEST db in the same state as production? I mean, this prod database has been running for years and started giving performance problems only recently, so how can I configure a test db with the same settings and test my experiments on it?
Now I am wondering whether I should analyze the whole schema... what could have happened?

Tom Kyte
February 07, 2005 - 9:27 am UTC

then it is nothing to do with statistics.

you take production's backup and you restore it to a test machine and you test things there.




performance

fahad, February 07, 2005 - 9:45 am UTC

Thanks Tom.
I hope I will solve the problem soon.
I wanted to know whether deleting the statistics was enough to bring it back to rule-based.
thank you

Nice example

Alex, February 07, 2005 - 10:54 am UTC

You know, it's funny: I was going to post a review on one of the threads related to one of your books asking why you compare performance between the RBO and CBO. I used to think, "why would anyone use the RBO, what's the point of writing a test case for something that is so obviously obsolete and bad?" Never mind now... I guess Tom's just brainwashed me, so I know not to touch the RBO.

last analyzed time

A reader, February 17, 2005 - 7:25 pm UTC

Hi Tom
How can I make sure whether some of the tables have
been analyzed or not? Do I look at LAST_ANALYZED
in dba_tab_columns?

Also, if I create a table with the monitoring clause on,
how does the analyzing of tables work? Do you need to analyze only once in the beginning, after which the stats will be automatically updated? Or do you still need to run dbms_stats periodically?


Tom Kyte
February 17, 2005 - 7:42 pm UTC

sure, last analyzed will tell you -- or num_rows in dba_tables


with monitoring, you would use dbms_stats with the gather stale option (search for "gather stale" on this site)
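
A minimal sketch of that combination (the table name T is a placeholder; gather stale only re-gathers tables whose monitoring counters crossed the staleness threshold):

alter table t monitoring;

begin
    dbms_stats.gather_schema_stats(
        ownname => user,
        options => 'GATHER STALE',
        cascade => true );
end;
/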

thank you,,

A reader, February 18, 2005 - 10:22 am UTC

we have a situation where the estimated cardinality is much less than the actual values. My guess is that the tables were analyzed once with gather stale and then loads of data came in (all tables have the monitoring clause on) as part of performance load testing. Perhaps in such tests, at the end, we need to do a "gather stale" again - or do we just need to do the gather stale once, analyze the tables once (as shown in another thread I read) and then let monitoring do the job? Also, are there other common reasons (apart from CBO bugs, which I guess may not be that common :)) for the estimated cardinality to be much different from the actual one? If I want to try something out without disturbing the state of the database, how can I save the stats, do a "gather stale" and then later restore the stats again? This would make it clear that the performance problem is due to bad stats...

Tom Kyte
February 18, 2005 - 2:00 pm UTC

if the estimated cardinality is much lower than reality -- we must fix the root cause. in this case it sounds like "stale stats", I would test that theory yes.
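
For the save/restore part of the question, a minimal sketch using the documented dbms_stats export/import routines ('STATTAB' is a placeholder name):

exec dbms_stats.create_stat_table( user, 'STATTAB' );
exec dbms_stats.export_schema_stats( user, 'STATTAB' );

-- gather stale, rerun the tests, compare plans ...

exec dbms_stats.import_schema_stats( user, 'STATTAB' );
exec dbms_stats.drop_stat_table( user, 'STATTAB' );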

Queries are slow in Partitioned table

Seshadri, March 19, 2005 - 2:38 am UTC

Hi Tom,

I am accessing a table which is range partitioned and contains on the order of 20 million records. This table has a primary key index which is a local non-prefixed partitioned index. My query is similar to the one mentioned below

Select col1
from tab1,tab2
where tab2.col2 = '1'
and tab1.col3 = tab2.col3

In the above query, tab1 is the table I am talking about and col3 has the index. Tab2 is the master table based on which records are fetched from tab1.

I also set optimizer_index_cost_adj to 20 and optimizer_index_caching to 90

Please help us in solving the problem. thanks in Advance

Seshadri


Tom Kyte
March 19, 2005 - 7:01 am UTC

Local non-prefixed indexes: these are indexes that do not have the partition key on the leading edge of their column list. The index may or may not contain the partition key columns.
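
To put the terminology in DDL terms, a sketch assuming a hypothetical table T range partitioned on DT:

create index t_ix_prefixed    on t(dt, id) local;   -- local PREFIXED: the partition key leads the column list
create index t_ix_nonprefixed on t(id)     local;   -- local NON-PREFIXED: the partition key does not lead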


Please tell me how you have a "primary key index" (assuming you mean an index in support of a primary key constraint) that is a local non-prefixed index?



and please -- why do you believe that an index should be used? have you looked at the cardinalities, does this query return lots of rows -- if so, indexes are not the best thing in the world.


Look at your question here and ask yourself "if I (you) were not intimately familiar with the problem, would I have even close to sufficient information here to formulate a guess"

More Analyzing

Andy, March 31, 2005 - 11:06 am UTC

We have a production db v9.2.0.1 with a table of around 1.5 m records.

It has a PK of an ID and a DATE.

The initial value of DATE is 31/12/4712 (to effectively keep the record open-ended).
If a new record comes in for that ID, the DATE col is updated to an end date, and so on.

The table has MONITORING on it, and a daily job fires to analyze any tables with stale stats
(using dbms_stats, based on previous posts). The job is valid, and last fired the previous night.

The problem we have is that the table has been causing poor performance after around
1 month and < 5% increase in records. It's only when we manually analyze that it's
back to normal.

Questions:
1) How can this be? I understand that tables become stale at about 10% increase in data,
so effectively we are back to analyzing manually.
2) Is the updating of the PK problematic? Can this adversely affect any stats?


We also have another curious situation with a different DB (v8.1.7).

We have a parent (p) and child (c) tables, related by an ID column.

Table P:
ID VARCHAR2(16) PRIMARY KEY,
C_DATE DATE


Table C:
ID VARCHAR2(16) PRIMARY KEY,
SEQ NUMBER PRIMARY KEY,
X NUMBER

FK on ID refers to p.ID

A simple query:
SELECT p.ID, c.X
FROM P, C
WHERE P.ID = C.ID
AND P.C_DATE BETWEEN (SYSDATE-1) AND (SYSDATE + 1)

starts performing poorly. The explain plan indicates a FULL TABLE SCAN on table C,
and does not use the index on C.ID to join the queries.

From your previous replies to this subject, FTS's are not necessarily a bad thing.
However, they are here! If I force the optimiser to use the index on the C.ID column
using a hint, my query response is nearly 10 times quicker than without it. The Cost
associated with the use of the index is higher than without, so I can see why the
optimiser is using the FTS, but it shouldn't be, as the performance is better with the higher cost.

Here are some stats, with and without the hint:

With the hint:
SELECT /*+INDEX (C ID_IDX)*/
p.ID, c.X
FROM P, C
WHERE P.ID = C.ID
AND P.C_DATE BETWEEN (SYSDATE-1) AND (SYSDATE + 1)

STATISTICS
-----------------------------------------------------------
0 recursive calls
0 db BLOCK gets
2035 consistent gets
0 physical reads
0 redo SIZE
149079 bytes sent via SQL*Net TO client
3975 bytes received via SQL*Net FROM client
29 SQL*Net roundtrips TO/FROM client
2 sorts (memory)
0 sorts (disk)

Operation Object Name Rows Bytes Cost Object Node In/Out PStart PStop

SELECT STATEMENT Optimizer Mode=CHOOSE 486 1987
NESTED LOOPS 486 75 K 1987
TABLE ACCESS BY INDEX ROWID P 486 9 K 43
INDEX RANGE SCAN P_C_DATE_IDX 486 4
TABLE ACCESS BY INDEX ROWID C 239 K 32 M 4
INDEX RANGE SCAN C_ID_IDX 239 K 3

Without the hint:
SELECT p.ID, c.X
FROM P, C
WHERE P.ID = C.ID
AND P.C_DATE BETWEEN (SYSDATE-1) AND (SYSDATE + 1)

STATISTICS
-----------------------------------------------------------
0 recursive calls
10 db BLOCK gets
14265 consistent gets
14083 physical reads
0 redo SIZE
150015 bytes sent via SQL*Net TO client
3875 bytes received via SQL*Net FROM client
29 SQL*Net roundtrips TO/FROM client
4 sorts (memory)
0 sorts (disk)

Operation Object Name Rows Bytes Cost Object Node In/Out PStart PStop

SELECT STATEMENT Optimizer Mode=CHOOSE 486 394
HASH JOIN 486 75 K 394
TABLE ACCESS BY INDEX ROWID P 486 9 K 43
INDEX RANGE SCAN P_C_DATE_IDX 486 4
TABLE ACCESS FULL C 239 K 32 M 287



Questions:
3) What can affect the optimiser so drastically? The query is a simple join, and ordinarily works fine.
It seems to be doing the opposite of what I am expecting, and degrades performance too.
4) I have analyzed the tables with combinations of FOR TABLE, FOR ALL INDEXES and FOR ALL INDEXED COLUMNS.
The above stats are the best case so far. The PK's are not skewed. What else should I be looking at?

Thanks.

Tom Kyte
March 31, 2005 - 11:30 am UTC

the trigger for gather stale is closer to 10% and what I'm assuming is happening is that the hi/low/ndistinct values for this column are materially affecting the plans

Things that start with "A" (auto) work most of the time, in many of the cases. There are and probably will be exceptions to the general case. A column with hi/lo values that change is one of them.

And remember "not necessarily" does not mean "good" or "bad", just that to hunt them down and kill them all is wrong (I see that a lot: find all the full scans and make them go away, then I get the question "why is this running so slow")

you chopped the rows out -- but what I suggest is you

a) take the autotrace traceonly explain
b) a tkprof
c) compare the optimizers GUESS (autotrace) with the ACTUALS (row source information in the tkprof)

see why they are so divergent (typically bad, missing or stale statistics)


This "date" query looks like stale stats again -- you can either gather stats or if you know better "dbms_stats.set_table_stats/column_stats/index_stats" to let the optimizer know what you know.

More Analyzing

Andy, April 04, 2005 - 5:29 am UTC

OK, thanks. I will attempt what you suggest.

However, I do understand that a full table scan is what the optimiser believes to be the best path, but in this case, we had a query that performed fine (without an fts), then didn't (with an fts), then performs optimally again when I force it to use the same path as before. The point here is that I wouldn't have had to touch the query if it had run quickly with an FTS.

I have performed an analysis, as mentioned above, so I don't see how the stats CAN be stale or missing, etc.

Are there other problems that can arise from other factors? It seems to me that the stats and analyzing are a red herring in our case.

We have only used the ANALYZE TABLE command - as someone posted above: are there any differences between that command and the DBMS_STATS analyzing, other than the advantages you stated? Is one more COMPLETE than the other?

Your comments are appreciated.


evil bug in 9i, 10g

Alberto Dell'Era, April 04, 2005 - 7:50 am UTC

Andy,

there's a bug in dbms_stats that may explain your strange behaviour - if you use dbms_stats.gather_table_stats with method_opt=>'for all columns size auto' or method_opt=>'for all INDEXED columns size auto', i.e. you gather the histograms, the histograms themselves are NOT computed the first time you issue the command - but they are computed if you re-issue the command (see below for a test case).

Your data (column DATE) is skewed, so gathering histograms may make sense, and i'm assuming that you're using one of the above commands in the job.

One possible scenario is
1) you increased the number of rows by 5%, and updated (or deleted) another 5% => stale statistics (>10% modifications);
2) the job fires, uses dbms_stats, hits the bug -> histograms are not computed or computed badly -> bad plan
3) you re-gather stats -> all is ok

A lot of guessing here, but perhaps ...

Test case (from my iTar 4354870.999, no bug reference yet):

create table tt (id, x, junk)
as select rownum, 'M', rpad(' ',1000) from all_objects where rownum <= 10000;
update tt set x = 'F' where rownum = 1;
create index tt_id_idx on tt (id);
create index tt_x_idx on tt(x);
alter system flush shared_pool;
exec dbms_stats.gather_table_stats (user,'tt',cascade=>true,method_opt=>'for all INDEXED columns size auto');
alter session set tracefile_identifier = 'statsMissedIndex';
alter session set sql_trace=true;
select count(junk) /* first stats */ from tt where x = 'F';
select count(junk) /* first stats */ from tt where x = 'M';
exec dbms_stats.gather_table_stats (user,'tt',cascade=>true,method_opt=>'for all INDEXED columns size auto');
select count(junk) /* second stats */ from tt where x = 'F';
select count(junk) /* second stats */ from tt where x = 'M';
Tkprof says:
********************************************************************************
select count(junk) /* first stats */
from
tt where x = 'F'

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=1434 r=0 w=0 time=7609 us)
1 TABLE ACCESS FULL TT (cr=1434 r=0 w=0 time=7590 us)
********************************************************************************
select count(junk) /* second stats */
from
tt where x = 'F'

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=3 r=0 w=0 time=79 us)
1 TABLE ACCESS BY INDEX ROWID TT (cr=3 r=0 w=0 time=65 us)
1 INDEX RANGE SCAN TT_X_IDX (cr=2 r=0 w=0 time=41 us) (object id 100017)

Same with "for all columns", perhaps the subroutine that resolves "all" has some problems. Support is working on it at this moment.

Thanks Alberto - have noted your comments

Andy, April 04, 2005 - 10:07 am UTC


Tom Kyte
April 04, 2005 - 11:50 am UTC

Alberto and I had an offline discussion on this, there is a fair chance he's nailed it, if so, please let us know, it would be a relevant fact to know.

Histograms

veera, April 04, 2005 - 3:01 pm UTC

Tom,

I want to clarify the following doubts on Histograms.

1) Histograms are hints to optimizer to use index if the
index column has low cardinality.
2) Histogram buckets are not equivalent to partitions.
3) Histograms will not help performance if the index values are distributed uniformly.
For example, in the Trial table (Accounting) we have a column called GL_end_date (1 per month, i.e. the last day of the month). There are approx 500,000 trial records for each GL end date. We have a Trial table with 30 million records and 60 distinct GL end dates. Will an index on GL end date help a query on the Trial table
with a condition of
GL_end_date = TO_DATE('31-Jan-2005', 'dd-mon-yyyy')
or
GL_end_date BETWEEN TO_DATE('31-jan-2005', 'dd-mon-yyyy') AND TO_DATE('31-mar-2005', 'dd-mon-yyyy')

Thanks for your input.


Tom Kyte
April 04, 2005 - 5:29 pm UTC

1) histograms are FACTS the optimizer takes into consideration when computing cardinalities in the plan....

2) correct

3) you still want column stats though, the high/low and number of distinct values.

It can have a profound effect if the data is uniformly distributed, but with gaps!


ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create table t
  2  as
  3  select mod(rownum,10)*2 id, a.* from all_objects a;
 
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create index t_idx on t(id);
 
Index created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T', method_opt=> 'for columns id size 254', cascade=>true);
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> select * from t where id = 3;
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=100)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=2 Card=1 Bytes=100)
   2    1     INDEX (RANGE SCAN) OF 'T_IDX' (NON-UNIQUE) (Cost=1 Card=1)
 
 
 
ops$tkyte@ORA9IR2> select * from t where id = 4;
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=64 Card=2794 Bytes=279400)
   1    0   TABLE ACCESS (FULL) OF 'T' (Cost=64 Card=2794 Bytes=279400)
 
 
 
ops$tkyte@ORA9IR2> set autotrace off


 

More Analyzing

Andy, April 05, 2005 - 8:29 am UTC

My findings:

Ran Alberto’s example above (and as stated, ours is an 8i db) – replicated the same problem.

Issuing DBMS_STATS twice against the table in question didn't affect our query.

Here are the outputs as suggested:

(First query is without hint; second is with)

***********************************************************************
GUESS (USING AUTOTRACE TRACEONLY EXPLAIN)
***********************************************************************

  1  SELECT
  2  i.i_number, ir.i_number, ir.BASE
  3  FROM T1 i,
  4    T2 ir
  5  WHERE i.i_number = ir.i_number
  6* AND  i.creation_date BETWEEN (SYSDATE-1) AND (SYSDATE+1)
SQL> /

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1060 Card=2152 Bytes=79624)
   1    0   HASH JOIN (Cost=1060 Card=2152 Bytes=79624)
   2    1     TABLE ACCESS (BY INDEX ROWID) OF 'T1' (Cost=142 Card=1485 Bytes=31185)
   3    2       INDEX (RANGE SCAN) OF 'I_CREATION_DATE_IDX' (NON-UNIQUE) (Cost=9 Card=1485)
   4    1     TABLE ACCESS (FULL) OF 'T2' (Cost=863 Card=753803 Bytes=12060848)




SQL> ed
Wrote file afiedt.buf

  1  SELECT
  2  /*+INDEX (IR T2_IDX)*/
  3  i.i_number, ir.i_number, ir.BASE
  4  FROM T1 i,
  5    T2 ir
  6  WHERE i.i_number = ir.i_number
  7* AND  i.creation_date BETWEEN (SYSDATE-1) AND (SYSDATE+1)
SQL> /

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=7567 Card=2152 Bytes=79624)
   1    0   NESTED LOOPS (Cost=7567 Card=2152 Bytes=79624)
   2    1     TABLE ACCESS (BY INDEX ROWID) OF 'T1' (Cost=142 Card=1485 Bytes=31185)
   3    2       INDEX (RANGE SCAN) OF 'I_CREATION_DATE_IDX' (NON-UNIQUE) (Cost=9 Card=1485)
   4    1     TABLE ACCESS (BY INDEX ROWID) OF 'T2' (Cost=5 Card=753803 Bytes=12060848)
   5    4       INDEX (RANGE SCAN) OF 'T2_IDX' (UNIQUE) (Cost=3 Card=753803)




SQL> 

***********************************************************************
ACTUAL (TKPROF)
***********************************************************************
SELECT
i.i_number, ir.i_number, ir.BASE
FROM T1 i,
     T2 ir
WHERE i.i_number = ir.i_number
AND  i.creation_date BETWEEN (SYSDATE-1) AND (SYSDATE+1)

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      0.00       0.00      14114      14159         10          25
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      0.00       0.00      14114      14159         10          25

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 18  

Rows     Row Source Operation
-------  ---------------------------------------------------
     25  HASH JOIN 
    440   TABLE ACCESS BY INDEX ROWID T1 
    441    INDEX RANGE SCAN (object id 2969)
 753224   TABLE ACCESS FULL T2 

********************************************************************************

SELECT/*+INDEX (IR T2_IDX)*/
i.i_number, ir.i_number, ir.BASE
FROM T1 i,
     T2 ir
WHERE i.i_number = ir.i_number
AND  i.creation_date BETWEEN (SYSDATE-1) AND (SYSDATE+1)

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      0.00       0.00         10         57          0          25
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      0.00       0.00         10         57          0          25

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 18  
********************************************************************************


Any other suggestions? Thanks. 

Tom Kyte
April 05, 2005 - 12:04 pm UTC

what does the autotrace traceonly explain for the query look like with and without the hint?

and can we see the row source plan for the index one as well?

Good one

A reader, April 05, 2005 - 2:29 pm UTC

Tom
Could you please explain what Oracle does with histograms?

Thanks!


request for statements

Alberto Dell'Era, April 05, 2005 - 4:49 pm UTC

NB I was addressing your first problem, which was on 9.2.0.1 as i understood, not the second one on 8i. Sorry that i didn't make that clear.

What about posting the exact statements you are using to gather statistics for the table (and the associated indexes, i believe?):
a) in the job that "last fired the previous night"
b) when you "manually analyze"

If (b) makes a good plan and (a) does not, there must be some relevant difference between the two statements - assuming of course that the job fired and analyzed the table you are discussing, and that that caused the bad performance you observed - do you know whether the job fired and analyzed the table?

Analyzing

Andy, April 06, 2005 - 5:57 am UTC

Tom,
The first half of my last post are the AUTOTRACE TRACEONLY outputs (The "GUESS" section).
The second half are the TKPROF outputs (The "ACTUAL" section).


Alberto,
The job fires nightly, but the table did not get reanalyzed when performance degraded (indicating stale stats). It's not that the job produces bad stats for the table, it is the fact that it didn't think there was a problem with the table!

Thanks, Andy

Analyzing (2)

Andy, April 06, 2005 - 6:08 am UTC

Tom

Re. row source plan for the index: I've just checked the file, and it's not there! Why would it not give the plan?

Andy

Tom Kyte
April 06, 2005 - 9:10 am UTC

cursor has to be closed before stat records are written to trace.

so, if doing testing in sqlplus, exit before running tkprof.
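
The whole cycle, sketched (the trace file name below is made up; yours will differ):

SQL> alter session set sql_trace = true;
SQL> -- run the statement(s) of interest
SQL> exit                               -- closes the cursors, flushing row source stats

$ tkprof ora_1234.trc report.txt sys=no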

Alberto Dell'Era, April 06, 2005 - 7:29 am UTC

> it is the fact that it didn't think there was a problem
> with the table!

Well, statistics on monitored tables are declared "stale" when the table accumulates > 10% of modifications (insert, update, deletes) ... that's the only check that is performed.

So probably the statistics were bad (old) from the very beginning; by analyzing manually, you just corrected the problem.
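
You can inspect the counters that staleness check works from - a sketch, assuming 9i table monitoring (the flush call pushes the in-memory figures into the view first):

exec dbms_stats.flush_database_monitoring_info;

select table_name, inserts, updates, deletes, timestamp
  from user_tab_modifications;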

Reply to Alberto

Andy, April 06, 2005 - 8:35 am UTC

Alberto

Here is a chronological breakdown of events:
1) Table analyzed manually about 1 month ago, so that the DBMS_STATS package can monitor the table
2) Table updated/queried at expected speed, i.e. things were fine
3) 1 month later a fault reported that table updates are taking a long time. Something has obviously changed at this point, indicating stale stats for the table
4) Analyze job still running nightly, but not picking up the staleness(!) of the table
5) LAST_ANALYZED is my manual analyzed date as in (1)
6) I am forced to perform another manual analyze to get updates back to normal. I shouldn't have to do this!

This indicates that the overnight job has not picked up (newly) stale stats for the table. MONITORING is on the table.

If the table has changed by 10% (via INS/UPD/DEL) then surely GATHER_STALE will work? But it doesn't.

Regards,
Andy

multicolumn histogram

k., April 06, 2005 - 9:55 am UTC

hi tom,

i have some serious problems with the CBO. the simplest being this one:

suppose i have a select joining 2 tables containing
information about employees. eg: i'm interested in
all employees whose name starts with J and whose age > 40



select count(*) from dat_number where value > 40
returns millions of rows

select count(*) from dat_number where type = 'age' and value>40
returns eg 1000 rows

select count(*) from dat_string where type = 'name' and value like 'j%'
returns eg 25000 rows


still when i run

select s.person_id
from dat_string s, dat_number n
where s.type = 'name'
and s.value like 'j%'
and n.type = 'age'
and n.value > 30
and s.person_id = n.person_id;


the optimizer chooses DAT_STRING to be the driving table


is it because the CBO only takes histograms on type and value
columns separately?


thanks


Tom Kyte
April 06, 2005 - 1:27 pm UTC

why should it return dat_string as the driving table.


but an autotrace traceonly explain with ESTIMATED CARDINALITIES is always very useful.

(i have a problem with your model! ugh, you are going to be fighting performance, comprehension, data integrity, everything -- literally everything databases were invented to make easier and better, you have defeated with this model)

Analyzing was not the problem!

Andy, April 06, 2005 - 10:50 am UTC

Tom

We found the solution. Typically looking at the wrong area!

Here is the first query again (slightly different criteria; same performance):

SQL> 
  1  SELECT
  2  i.i_number, ir.i_number, ir.base
  3  FROM t1 i, t2 ir
  4  WHERE  i.creation_date >= (SYSDATE-2/24)
  5* AND i.i_number = ir.i_number
SQL> /

...

75 rows selected.

Elapsed: 00:00:11.08

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=5134 Card=42920 Bytes=1759720)
   1    0   HASH JOIN (Cost=5134 Card=42920 Bytes=1759720)
   2    1     TABLE ACCESS (BY INDEX ROWID) OF 't1' (Cost=246 Card=29619 Bytes=562761)
   3    2       INDEX (RANGE SCAN) OF 'i_CREATION_DATE_IDX' (NON-UNIQUE) (Cost=17 Card=29619)
   4    1     TABLE ACCESS (FULL) OF 't2' (Cost=863 Card=753959 Bytes=16587098)


Statistics
----------------------------------------------------------
         38  recursive calls
         10  db block gets
      14186  consistent gets
      14136  physical reads
          0  redo size
       4097  bytes sent via SQL*Net to client
        959  bytes received via SQL*Net from client
          7  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
         75  rows processed


Here is the amended SQL:

SQL> 
  1  SELECT
  2  i.i_number, ir.i_number, ir.base
  3  FROM t1 i, t2 ir
  4  WHERE  i.creation_date >= TO_DATE('06/04/2005 14:16:00')
  5* AND i.i_number = ir.i_number
SQL> /

...

75 rows selected.

Elapsed: 00:00:03.07

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=14 Card=3 Bytes=123)
   1    0   NESTED LOOPS (Cost=14 Card=3 Bytes=123)
   2    1     TABLE ACCESS (BY INDEX ROWID) OF 't1' (Cost=4 Card=2 Bytes=38)
   3    2       INDEX (RANGE SCAN) OF 'i_CREATION_DATE_IDX' (NON-UNIQUE) (Cost=3 Card=2)
   4    1     TABLE ACCESS (BY INDEX ROWID) OF 't2' (Cost=5 Card=753959 Bytes=16587098)
   5    4       INDEX (RANGE SCAN) OF 't2_IDX' (UNIQUE) (Cost=3 Card=753959)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        126  consistent gets
          1  physical reads
          0  redo size
       2317  bytes sent via SQL*Net to client
        607  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
         75  rows processed

SQL> 

This shows us that using SYSDATE in the criteria caused the FULL TABLE SCAN! Switching to a date literal makes the query tolerable, especially when more tables are added to the query. Didn't realise that would be a problem!?

Thanks for your assistance. 

to Andy

Alberto Dell'Era, April 06, 2005 - 12:17 pm UTC

>3) 1 month later a fault reported that table updates are taking a long time.
> Something has obviously changed at this point, indicating
> stale stats for the table

Well technically the stats are "stale" only when >= 10% of modifications have taken place, i'm not aware of any other condition that may cause the stats to be labeled as "stale" in 9i, so the job is working as expected.
And, a table with < 10% modifications has (normally, in general) statistics so similar that the plan doesn't change, at least not so radically.

Maybe your analyze has simply flushed the plan from memory, and another (optimal) plan has been recalculated instead - are you perhaps mixing histograms and bind variables, or histograms and running with cursor_sharing=force ?

Do you have histograms in place (good to know the statement you use for analyze).

I assume of course that no DDL activity (new indexes, reorganizations or rebuild of indexes) have taken place, and that the only thing that happened is the <10% modifications you said about.

multicolumn histograms 2

k., April 09, 2005 - 7:18 am UTC

<quote>
(i have a problem with your model! ugh, you are going to be fighting 
performance, comprehesion, data integrity, everything -- literally everything 
databases were invented to make easier and better, you have defeated with this 
model)
</quote>

tom, how can you have a problem with my model not knowing anything about it and its purpose?
i respect you - to me you are the highest oracle authority, but in this case you are making conclusions based on...on nothing..

i believe i can justify it before anyone - well and if anyone shows me i could do certain things better, i'll thank them for they help me improve the application.

so here is my question with some more details:

SQL> select /*+ RULE */ count(distinct t1.ent_id) from 
  2    (select ent_id from dat_string where type_cd like 'stA0A001%' and srch_value_txt like 'ab%' and inac_ind='00') t1,
  3    (select ent_id from dat_number where type_cd like 'nuA001%' and inac_ind in ('00') and value_nbr>2000) t2
  4  where
  5     t1.ent_id=t2.ent_id;


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=HINT: RULE
   1    0   SORT (GROUP BY)
   2    1     TABLE ACCESS (BY INDEX ROWID) OF 'DAT_STRING' (TABLE)
   3    2       NESTED LOOPS
   4    3         TABLE ACCESS (BY INDEX ROWID) OF 'DAT_NUMBER' (TABLE)
   5    4           INDEX (RANGE SCAN) OF 'DAT_NUMBER__TCD_VAL' (INDEX)
   6    3         INDEX (RANGE SCAN) OF 'DAT_STRING__EID_TCD' (INDEX)




Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1442  consistent gets
          0  physical reads
          0  redo size
        408  bytes sent via SQL*Net to client
        512  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          1  rows processed
          
          
          
          
          
          
          
SQL> select count(distinct t1.ent_id) from 
  2    (select ent_id from dat_string where type_cd like 'stA0A001%' and srch_value_txt like 'ab%' and inac_ind='00') t1,
  3    (select ent_id from dat_number where type_cd like 'nuA001%' and inac_ind in ('00') and value_nbr>2000) t2
  4  where
  5     t1.ent_id=t2.ent_id;


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=13 Card=1 Bytes=51)
   1    0   SORT (GROUP BY)
   2    1     TABLE ACCESS (BY INDEX ROWID) OF 'DAT_NUMBER' (TABLE) (Cost=1 Card=1 Bytes=17)
   3    2       NESTED LOOPS (Cost=13 Card=692 Bytes=35292)
   4    3         TABLE ACCESS (BY INDEX ROWID) OF 'DAT_STRING' (TABLE) (Cost=7 Card=608 Bytes=20672)
   5    4           INDEX (RANGE SCAN) OF 'DAT_STRING__TCD_SVAL' (INDEX) (Cost=6 Card=725)
   6    3         INDEX (RANGE SCAN) OF 'DAT_NUMBER__EID_TCD' (INDEX) (Cost=0 Card=1)





Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       6771  consistent gets
       4073  physical reads
          0  redo size
        408  bytes sent via SQL*Net to client
        512  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          1  rows processed    
          
          

despite the fact that
"select ent_id from dat_number where type_cd like 'nuA001' and inac_ind in ('00') and value_nbr>2000"
returns exactly the same data as
"select ent_id from dat_number where type_cd like 'nuA001%' and inac_ind in ('00') and value_nbr>2000"
(i.e. select count(distinct type_cd) from dat_number where type_cd like 'nuA001%' == 1),
without the wildcard character in type_cd the CBO chooses the right execution plan (the same as the one generated by the RBO in the first query).

in 9 out of 10 cases i can use type_cd='something' or sometimes type_cd in ('sth1','sth2','sth3',...), but sometimes i can't avoid using type_cd like 'something%', as (select count(distinct type_cd) from XXX where type_cd like 'something%') is too high.
i feel helpless when i KNOW oracle could execute the query fast if it chose the right execution plan, and still i cannot find a way to give the CBO all the information it needs to find the best plan.

 

Tom Kyte
April 09, 2005 - 8:04 am UTC

I've seen your model millions of times, this is all the information:

<quote>
eg: i'm interested in
all employees whose name starts with J and whose age > 40



select count(*) from dat_number where value > 40
returns millions of rows

select count(*) from dat_number where type = 'age' and value>40
returns eg 1000 rows

select count(*) from dat_string where type = 'name' and value like 'j%'
returns eg 25000 rows


still when i run

select s.person_id
from dat_string s, dat_number n
where s.type = 'name'
and s.value like 'j%'
and n.type = 'age'
and n.value > 30
and s.person_id = n.person_id;
</quote>

anyone needs to see. To answer the simple question:

<quote>
all employees whose name starts with J and whose age > 40
</quote>

the query should be:

select * from emp where name like 'J%' and age > 40;

that you have to join to things like dat_string and dat_number using columns like type (usually called ATTRIBUTE_NAME or ATTR_NAME) and value indicates you are using what I term the "funky data model"

</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:10678084117056 <code>

has an example almost identical! (except I didn't use two tables for numbers and strings, that'll be a problem for you even more so than that example)

You'll live to regret that generic model. Every time I've used one (and I have, twice, for real applications, 15 and 13 years ago - never again) they fail to perform adequately, there is very little chance for data integrity, the end users cannot ad-hoc it, and the people called on to maintain it over time want to kill you....



one more CBO question

k., April 09, 2005 - 7:28 am UTC

AN ASKTOM READER PRESENTS
 

index DAT_NUMBER__TCD_DDT on dat_number(type_cd,data_dt);
index DAT_NUMBER__TCD_VAL on dat_number(type_cd,value_nbr);

table DAT_NUMBER (
   ...
   data_dt     date, -- irrelevant for this query
   inac_ind    char(2), -- a non-indexed field in the query -> oracle has to access the table, not just do an index scan..
   ....
   type_cd     varchar2(90), -- type of information (eg: number of employees, nominal value, volume, etc.)
   ....
   value_nbr   number, -- the actual value
   ....
);


IN

3 simple queries in some of which CBO chooses a wrong index



1) ***************************************   INDEX CHOICE: OK        *************************************
SQL> select count(*) from dat_number where type_cd='nuA001' and inac_ind in ('00') and value_nbr<20000;

  COUNT(*)
----------
      3232


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=331 Card=1 Bytes=12)
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (BY INDEX ROWID) OF 'DAT_NUMBER' (TABLE) (Cost=331 Card=1311 Bytes=15732)
   3    2       INDEX (RANGE SCAN) OF 'DAT_NUMBER__TCD_VAL' (INDEX) (Cost=8 Card=1714)
          
          
2) ***************************************   INDEX CHOICE: WRONG     *************************************

SQL> select count(*) from dat_number where type_cd='nuA001' and inac_ind in ('00') and value_nbr>20000;

  COUNT(*)
----------
         8


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=779 Card=1 Bytes=12)
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (BY INDEX ROWID) OF 'DAT_NUMBER' (TABLE) (Cost=779 Card=6120 Bytes=73440)
   3    2       INDEX (RANGE SCAN) OF 'DAT_NUMBER__TCD_DDT' (INDEX) (Cost=47 Card=9715)




3) ************************************** AN INTERESTING THING ********************************************
SQL> select count(ent_id) from dat_number where type_cd='nuA001' and inac_ind in ('00') and value_nbr>20000
     and value_nbr<=(select max(value_nbr) from dat_number where type_cd='nuA001');

COUNT(ENT_ID)
-------------
            8


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=97 Card=1 Bytes=12)
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (BY INDEX ROWID) OF 'DAT_NUMBER' (TABLE) (Cost=95 Card=306 Bytes=3672)
   3    2       INDEX (RANGE SCAN) OF 'DAT_NUMBER__TCD_VAL' (INDEX) (Cost=4 Card=486)
   4    3         SORT (AGGREGATE)
   5    4           FIRST ROW (Cost=3 Card=9715 Bytes=87435)
   6    5             INDEX (RANGE SCAN (MIN/MAX)) OF 'DAT_NUMBER__TCD_VAL' (INDEX) (Cost=3 Card=9715 Bytes=87435)
   
   
   
   ... when i replace "..value_nbr<(select max(value_nbr)...." with the actual value this subquery
   returns, CBO chooses the wrong index again.
   
what makes CBO choose DAT_NUMBER__TCD_DDT instead of
DAT_NUMBER__TCD_VAL?

thanks 

generic data model

k., April 09, 2005 - 1:35 pm UTC

i can't agree with you.

1) the application has been running for some 2 years now, and has been doing quite well as far as performance is concerned. the database contains information about circa 2,000,000 subjects (companies, banks, etc.) + circa 1,000,000 other entities (stocks, events, etc.). for the most commonly queried combinations of "data types" [like company name, address, ...] we have warehouse tables.


2)
<quote>
the query should be:

select * from emp where name like 'J%' and age > 40;
</quote>

you don't mean it, do you?
this was just an example, of course if i was to store information about an empoyee i would do it this way, BUT:

most of the information we store in our system is not unique (i mean for example each company can have more addresses, more names, many "employee counts" - the information is either not unique at all or changes in time - like in 1998 Oracle had X employees, in 1999 Y employees and so on)

so still we would have no other choice than to create
1 table for names, 1 for employee count, etc.
and querying this information would need joins.

you just CAN'T store all that information together.

keeping the data normalized is a must.

and there's just a little step from keeping it normalized
to a generic model - which you apparently despise.

the only thing we have lost is data integrity on the database level (or more precisely - the ability to control logical relations between various types of information in the database - of course we still use data integrity constraints where we can) - this is the only difference between a GM and a normalized database - or am i wrong?


the flexibility was not put there for the developers to be able to store any kind of information in existing tables - it's there for the users - users who have the rights to do it can create any new types and subtypes of information - they can also define integrity rules. and much much more.

it takes some 4 lines in a configuration file to add a new table to the system (and we have added a few not-so-generic tables to it already [eg a table for financial statement items - it contains some specific information + has about 20,000,000 records, so we did not want to put it in the number table]) - and by adding a new table we instantly gain all the features available for existing tables -
like the query editor, data editor, etc.

no developers are needed to write queries and this stuff.
it is easy for an average user to create the query they want.

i hope i'm not bothering you with all this - i will not go into any more detail as i think this is not the kind of stuff people here are interested in. i just had to say a few words because i believe i am right. and nobody has proven me wrong yet - not even those two years of clients accessing our data stored in such a (in your opinion) bad model.

please could you answer my questions concerning CBO?
regardless of what you think of the data model, it's
still a problem that i am unable to solve by myself and i would be very grateful if you could help me solve it.

thank you

ps: i hope i don't appear ... how to say it..ee..i do appreciate what you do here. i have learned a lot about oracle thanks to you and i hope to learn much more in the future.


Tom Kyte
April 09, 2005 - 3:39 pm UTC

1) fine, I've yet to see it work successfully myself (and you seem to be hitting some issues, you are asking how to get the optimizer to understand these cross table correlations)

2) i did mean it, given your example, absolutely 100% meant it.

I don't understand your data that you would need multiple tables.

<quote>
most of the information we store in our system is not unique (i mean for example
each company can have more addresses, more names, many "employee counts" - the
information is either not unique at all or changes in time - like in 1998 Oracle
had X employees, in 1999 Y employees and so on)
</quote>

we store data like that in our HR system, rows and columns in tables. The funky data model isn't how it is normally done.

<quote>
and there's just a little step from keeping it normalized
to a generic model - which you apparently despise.
</quote>

to me it is more of a broad jump but anyway...


You have lost the ability to easily query the information, look at the query you need to write?


you'll need sql profiles in 10g before the optimizer can even begin to think about cross table correlations like that.

...

k., April 09, 2005 - 5:41 pm UTC

<quote>
2) i did mean it, given your example, absolutely 100% meant it.
</quote>

ok i agree, that's because my example was just something artificial i made up to present the problem. i apologize for that

<quote>
we store data like that in our HR system, rows and columns in tables. The funky
data model isn't how it is normally done.
</quote>

can you please give me an example?
how can i put for example name and address in one table -

now a common case - a company with 2 names and 3 addresses etc. - in our database it is stored like this:

dat_string - 2 records
each of them containing the value itself + some additional info like start date, end date, a value indicating whether this information is still valid, historical or deleted, etc.

dat_address - 3 records
contains about the same except that the value is decomposed into street, city, state, etc. columns


had we used one table we'd end up with 6 records,
the table would have to have almost twice as many columns
and editing the data would become a nightmare.


i think the term "generic model" is not appropriate in our discussion - from my point of view, what we are talking about - and what i am justifying here - is NORMALIZATION.

if i named the table "dat_string" "company_name" - leaving the same columns, would you still have the same problem with my model? i would still need the "type_cd" column as there are various types of names (i won't list them though - my vocabulary becomes a limiting factor here - and i'm too lazy to open a dictionary).

what i would gain is the ability to define a foreign key to a hypothetical table "company". that's all.

selecting the data would be all the same, wouldn't it?

so
"You have lost the ability to easily query the information, look at the query you
need to write?"

no i have not, there never was another way to do it
with this kind of data (please correct me if i'm wrong)


and two more things:
do not underestimate oracle - in 9/10 cases CBO chooses
the perfect plan, 1/10 is what i need your help with.
still there almost always exists a good plan, because
typical queries in our system have at least one very selective condition - if not the user has 2 options:

jdbc has a great feature - Statement.setQueryTimeout - one of the user properties in our application is "how many seconds my query can run" - if the query exceeds that limit, a message is displayed to the user, who can then either:

1) choose more restrictive conditions (in most cases it's just laziness that makes them execute searches like "give me all companies starting with 'a%' and residing in this city" - they often need more specific information and just do it this way because they are too lazy to type a few more letters)

2) perform the same query on the warehouse - all it takes is 2 clicks in their query tree - they just have to cope with the fact that the data can be up to 24 hours old (which in 95% of cases is not something to worry about)



i read the page about generic models you redirected me to in one of your previous posts - and there are some comments that - from my point of view - state some premise and then conclude right the opposite of what i would think is implied by the premise.
(like the one with flat files+indexes from June 27, 2003)

my roommate wants to go to bed and this is her computer and it's in her room so i'll have to go now.

depending on your local time i wish you good [morning/afternoon/evening/night] :-)

thanks for your time
k.

Tom Kyte
April 09, 2005 - 6:59 pm UTC

we use parent child tables.

editing the data is easy (easy enough that pretty much most every application does it this way)


You have not normalized. Normalization is not what you have done. You have a normalized "attribute value" thing. But you have not modelled your world.

Store for me an address.

Street1, Street2, City, Zip, State

Enforce that street2 cannot be filled in unless street1 is, and the zip/state must be valid. And the city must be verified to be in the state. In your model.

A company has addresses, A company has employees. Three tables.

am i missing something?

k., April 10, 2005 - 4:16 am UTC

ok so

1)*********************************************************
to store the information we were talking about i would use:

1) dat_string - for names
2) dat_number - for employees (their counts - we can't afford storing information about all employees - and no one would be interested in that kind of information anyway)
3) dat_address - for addresses


you would use :
1) company_name
2) company_employee_count
3) company_address

<quote>
"You have lost the ability to easily query the information, look at the query you
need to write?"
</quote>

so can you please show me the difference between these two approaches and especially how your queries differ from mine?

2)*********************************************************
<quote>
Store for me an address.

Street1, Street2, City, Zip, State

Enforce that street2 cannot be filled in unless street1 is, and the zip/state
must be valid. And the city must be verified to be in the state. In your
model.
</quote>

storing Street1, Street2, City, Zip, State
in my model would be exactly the same as in your model
(except that we never came across an address with 2 streets in it - so we don't have street1, street2 in dat_address - just street - but that's not a problem - the same holds for your model, whatever it would be)

well the part with enforcing is more interesting -
and as i said in my last post - we DO lose this kind of data integrity checks on database level - so no need to repeat that from your side..

we do have data integrity checks in our application code.
each piece of information can have as many integrity checks attached to it as needed - most of them have none (except for some length/value range stuff, which does not affect performance) - and this is done through one select statement between inserting/updating the data and committing it.

i agree that it has some performance effects - but (just giving example figures here) we do not mind that much if certain update actions take 0.1 seconds instead of 0.01 seconds.

there's always a tradeoff - and this is a price that we definitely are willing to pay for all that we gain.

even this point (data integrity checks on the application level) has some pros:

having implemented it on the application level, we can define whether a piece of information violating our rules will not be stored in the database at all (that's what oracle would do) or will just be marked as "invalid" - this has shown to be very useful.

but anyway i do agree with you that we have lost the ability to use data integrity checks on database level and that - overall - is a CON.

this is just a small tradeoff.

without these dat_XXX tables we'd end up with tens of
tables for dat_number, tens of tables for dat_string,etc.
and ALL of those tables would still have the VERY SAME structure - so what do we gain here apart from the point mentioned above? it is up to you to tell me what we gain (i can think of a thing or two, but none of them that serious), but we lose much more!



Tom Kyte
April 10, 2005 - 10:29 am UTC

1) I did?

select * from table where <predicate>

as opposed to a multi-table join


2) In my model, it would be

create table ... ( street1, street2, city, state, zip, check constraint...)

I only see generic "type, value" tables in your model? am i missing a big part of it? (not being snarky, I really don't see how these two are even comparable?)



You do realize that integrity checks that CROSS ROWS (involve more than a single row) or cross TABLES in a database that supports multi-versioning and read consistency must also use LOCK TABLE in general right? Because it is very very very hard to lock data you cannot see. and we don't block.

eeeee

k., April 10, 2005 - 11:41 am UTC

tom, please..

could you please go through this post and concentrate on what i write?

i said:

"the application must be able to store multiple names and addresses"

the relation between

company and its names is 1:N
the same holds for addresses (and in fact almost all information about a company)

so we have to store names and addresses in separate tables,
don't we? this is called normalization and i thought we agreed on that.

<quote>
A company has addresses, A company has employees. Three tables
</quote>

reading the quoted statement, i thought we were near a complete agreement on this subject. and still you come up with

<quote>
select * from table where <predicate>

as opposed to a multi-table join
</quote>

as you often say - this does not compute.

you either store the information in one table
or have to do a join!

is that not correct?!

so please either show me how to store names+addresses in one table

or

how to avoid a join in the query.


Tom Kyte
April 10, 2005 - 11:51 am UTC

<quote>
"the application must be able to store multiple names and addresses"

the relation between

company and its names is 1:N
the same holds for addresses (and in fact almost all information about a
company)
</quote>

and in relational speak, I would have two tables:

create table company( company_id primary key, attributes ..... );
create table company_addresses( company_id, address_id, ATTRIBUTES, primary key(company_id,address_id), foreign key (company_id) references company );



For the example you gave sure -- it would have been "flat" as they were definitely scalar attributes. To find all companies with offices in VA:

select * from companies where company_id in ( select company_id from company_addresses where state = 'VA' );


Or, if there were a frequently asked question, I might well use TEXT (indexing technique in Oracle) to create a master/detail index or an index on a function so I could say:

select * from companies
where contains( state_or_any_set_of_attributes_I_want, '{VA}' ) > 0;
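
A sketch of that kind of Text indexing, using Oracle Text's MULTI_COLUMN_DATASTORE (a single-table variant of the master/detail idea; every name below is made up and the details vary by version):

begin
    ctx_ddl.create_preference( 'addr_ds', 'MULTI_COLUMN_DATASTORE' );
    ctx_ddl.set_attribute( 'addr_ds', 'COLUMNS', 'city, state, zip' );
end;
/

create index addr_txt_idx on company_addresses( state )
    indextype is ctxsys.context
    parameters( 'datastore addr_ds section group ctxsys.auto_section_group' );

select company_id from company_addresses
 where contains( state, 'VA within state' ) > 0;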






2 tables?

k., April 10, 2005 - 12:04 pm UTC

<quote>
and in relational speak, I would have two tables:

create table company( company_id primary key, attributes ..... );
create table company_addresses( company_id, address_id, ATTRIBUTES, primary
key(company_id,address_id), foreign key (company_id) references company );
</quote>

i can't see a place to store the name in those 2 tables of yours..just like you have company_addresses, you would have to have company_names (you can't store it in "company")

Tom Kyte
April 10, 2005 - 12:14 pm UTC

enter third table?

create table company_names( company_id, name_id, ATTRIBUTES (with meaningful names and all), primary key(company_id, name_id), foreign key(company_id) references company (company_id) );


And the same applies for indexing, if I needed to frequently retrieve all companies with names like 'something' in the state of 'VA', I would be sorely tempted to index using text and create the equivalent of a "meta company" that was indexed.

In fact, when you search this site, that is exactly what happens.

there is a Q&A table with a question and answer.
Then there is a detail table followups, with your text
And then there is yet another detail table, responses, with my text

When you search the site however, we search relational and textual attributes over all three (or N in general) tables.


I'm just very much not into the "attribute_name", "attribute_value" models. They are hard for newcomers to inherit. They have always (in my experience) been the cause of big performance issues. Data integrity (a binary situation, you either got it or not) has also been a major issue.

They are good for the developer, but (in my experience) it ends there.


Yes, much of this is my opinion, based on past experiences. But I prefer tables with names and attributes that mean something.

2 tables? .. continued

k., April 10, 2005 - 12:20 pm UTC

..
so having company_addresses and company_names
the only reasonable way (known to me) to get all companies
whose name (at least one of their names to be more precise)
begins with 'A' and resides in Paris is:

select t1.company_id
  from company_names t1, company_addresses t2
 where t1.name like 'a%'
   and t2.city like 'paris'
   and t1.company_id = t2.company_id


and that is identical to my model.

of course i am aware that some queries would run too long this way, and so we have a warehouse table where we store the
cartesian product of all names and addresses for each company (names and addresses being just an example - in reality the warehouse tables usually have about 5-6 attributes together)


Tom Kyte
April 10, 2005 - 12:45 pm UTC

or using contains as I've mentioned.

The query using just SQL would actually be:

select *
from company
where company_id in ( select company_id from addresses where ... )
and company_id in ( select company_id from names .... )

and I would say it is very different from your query, I can "see things" and the attributes are flat -- street, city, zip, state are ATTRIBUTES in a single row, how would you represent them?

or, if you wanted to use the text index, we could

select *
from company
where contains( information, 'X within Name and Y within State' ) > 0;






taadydaadydaaa

k., April 10, 2005 - 12:35 pm UTC

<quote>
enter third table?

create table company_names( company_id, name_id, ATTRIBUTES (with meaningful
names and all), primary key(company_id, name_id), foreign key(company_id)
references company (company_id) );

</quote>

and that's exactly what we have!!

before, we had tables like
company_name, company_address, company_identification, stock_identification
(the names were different - i chose these names to describe what they contained)

but, for example,
company_name
company_identification
stock_identification

all had the same structure - and they already
had the "type_cd" field - there are various kinds
of company names and various types of all this information

to our shame we never used oracle data integrity anyway so...

we said to ourselves
"having all those tables with the same structure, why don't we just put them all into one table" - and so instead of XX tables in the system we now have some 10 tables or so.

the domain of the type_cd column broadened a bit and there we were.


the good thing about the application (written in java) is
that it can easily adapt to any data model (well, almost any) - the model we have now suits our needs well.. if the application were to be used in an environment where it would be necessary to use data integrity (like in banking applications, etc.), there would be nothing easier than to look at what the customer needs, design a model fitting their exact needs, and "plug" it into the system.



Tom Kyte
April 10, 2005 - 12:50 pm UTC

you and I will have to agree to disagree on what is relevant.

To me, the application is nice, but the data -- that is where it is at.

To me an environment that doesn't need data integrity is a very foreign environment. Maybe I just haven't worked there yet.

....

k., April 10, 2005 - 1:19 pm UTC

<quote>
select *
from company
where company_id in ( select company_id from addresses where ... )
and company_id in ( select company_id from names .... )
</quote>

i always thought the best way to do this kind of query was a join. now for development purposes i use 10g, and i noticed that the optimizer is able to do some really cool things, like generating the same plan whether you use "exists" or "in". but i'm not sure if back in 8.1.6 (which is our production database) it would not have been better to use a join instead of those in's?

i mean - if one of those subqueries returns 100 rows and the other 1,000,000 rows, with a join the optimizer will use nested loops starting with those 100 rows, joining to company by "company_id", and you get the result immediately.. is the optimizer capable of doing such optimizations even if i use "in"?

and in our queries all we are interested in is company_id, so (select * from company) is not necessary here.

<quote>
or, if you wanted to use the text index, we could
</quote>

this sounds interesting - i'll study the docs for text indexing...

<quote>
street, city, zip, state are ATTRIBUTES in a single row,
how would you represent them
</quote>

well, exactly the same way as you would - as i have written several times, we do have a table called "dat_address" - it has columns for street, city, zip, state - just like you propose.

we are generic in that we do not have "companies",
just entities of type "company"; we do not have stocks,
we have entities of type "stock".. we don't have
company names - we have string values of type
"company name", etc.

we could have tables stock_identification,
company_identification, company_name - but they
all would have the same structure.

i know, i know - that integrity problem. but bear in mind
that we cope with it quite well, and we gain great flexibility.

there are no hard coded queries - all queries are composed in one place. the same holds for inserting/updating - one java class takes care of all editing.

users are given a set of "data types" (company name, company id, employee count, etc.) and in an easy-to-use query editor they can create any query they like..

and the output of any query can be edited.

users can also add new types of information
(again in an easy-to-use editor - no developers needed)

as i say.. i take any good advice, but "living" with the system i know its strong and weak points and always try to find a balance between pros and cons. there is no single piece of software that could not be written better, but it's a long and winding road and one has to be very careful not to get lost..

Tom Kyte
April 10, 2005 - 1:22 pm UTC

It would have been "CBO vs RBO" for the in vs exists. The CBO can and will do either/or optimization on that.

...

k., April 10, 2005 - 1:38 pm UTC

<quote>
To me an environment that doesn't need data integrity is a very foreign
environment. Maybe I just haven't worked there yet.
</quote>

i do agree that having data integrity is ALWAYS better than not having it (who wouldn't?)

we just sacrificed it because there was something else that we wanted more..

still the application takes care of integrity - but as you have mentioned - locking is a problem. we do not lock tables in this process. all integrity violations are marked "invalid" at the end of the day (it is quite rare though for that to happen).


thanks for a nice discussion
:-)



A reader, April 18, 2005 - 1:43 pm UTC

Hi,

I have a table a, and any select/update operations that I perform against this table are very slow. I analyzed the table, but that does not seem to have had any effect. Is there anything else that I should do? Please advise.

Thanks.

Tom Kyte
April 18, 2005 - 2:46 pm UTC

so, update a set x=1 where rownum=1 is slow?

insufficient data to comment here, look at your plans, use tkprof to analyze what is happening.

Confused with Rownum - Urgent Pls

A reader, April 20, 2005 - 11:22 am UTC

Are both queries the same?

1. SELECT BchCtl_Bus_YMD,
BchCtl_Mrg_Sts,
BchCtl_Mrg_YMDHMS,
BchCtl_IRN,
TRUNC( BchCtl_Run_YMDHMS )
INTO ValDte,
MrgSts,
MrgDte,
BchCtlIRN,
BchRunYMD
FROM BchCtl_Tbl A
WHERE BchCtl_IRN = ( SELECT MAX( BchCtl_IRN )
FROM BchCtl_Tbl
WHERE BchCtl_End_Sts = 'S'
AND ( ( BchCtl_Cod = 'ENX' || ExtBmkPvder.ENs_Cod )
OR ( BchCtl_Cod = 'ENX'
AND BchCtl_Job = ExtBmkPvder.ENS_Cod ) )
AND BchCtl_Bus_YMD = ( SELECT MAX( BchCtl_Bus_YMD )
FROM BchCtl_Tbl
WHERE BchCtl_End_Sts = 'S'
AND ( ( BchCtl_Cod = 'ENX' || ExtBmkPvder.ENs_Cod )
OR ( BchCtl_Cod = 'ENX'
AND BchCtl_Job = ExtBmkPvder.ENS_Cod ) )
AND BchCtl_Bus_YMD <= ActDte( PriActDt + 1 ) ) );

2. SELECT BchCtl_Bus_YMD,
BchCtl_Mrg_Sts,
BchCtl_Mrg_YMDHMS,
BchCtl_IRN,
TRUNC(BchCtl_Run_YMDHMS)
INTO ValDte,
MrgSts,
MrgDte,
BchCtlIRN,
BchRunYMD
FROM BchCtl_Tbl A
WHERE BchCtl_IRN =
(SELECT bchctl_irn
FROM (SELECT MAX(BchCtl_IRN) bchctl_irn,
BchCtl_Bus_YMD
FROM BchCtl_Tbl bc
WHERE bc.BchCtl_End_Sts = 'S' AND
(BchCtl_Cod = 'ENX' || ExtBmkPvder.ENS_Cod OR
(BchCtl_Cod = 'ENX' AND
BchCtl_Job = ExtBmkPvder.ENS_Cod)) AND
bc.BchCtl_Bus_YMD <= ActDte(PriActDt + 1)
GROUP BY BchCtl_Bus_YMD
ORDER BY BchCtl_Bus_YMD DESC)
WHERE rownum = 1);

Once you put an order by clause in the subquery, the order of the rows (rownum) in the outer query is going to be the same for the specified row. Is this correct?


Tom Kyte
April 20, 2005 - 9:03 pm UTC

smaller example -- not sure what you are trying to say here.

news about the evil bug

Alberto Dell'Era, April 23, 2005 - 4:26 pm UTC

The strange behaviour described above
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:3126073805757#37987595575981
has been accepted as

bug 4284436 "HISTOGRAMS NOT USED AFTER FIRST STATS COLLECTION"
[current status: Code Bug (Response/Resolution)]

To recap (more details and test case on the link above): if you analyze a table using "method_opt=>'for all columns size auto'" or "method_opt=>'for all INDEXED columns size auto'", the histograms are not computed; but they are computed if you analyze the table again (ie run the analyze two times).
It affects both 9.2.0.6 and 10.1.0.3.

Workaround: list the columns explicitly instead of using "all" (or obviously, run the analyze twice).

This probably mainly affects people running test scripts, perhaps while writing books.

hth

Tom Kyte
April 23, 2005 - 5:02 pm UTC

Alberto --

thanks very much for the followup, I truly appreciate it.

Alberto Dell'Era, April 24, 2005 - 4:57 pm UTC

> thanks very much for the followup, I truly appreciate it.

My pleasure - and thanks for the lesson about "churn" on your blog ;)

the "evil bug" is not a bug ...

Alberto Dell'Era, April 27, 2005 - 8:24 am UTC

The bug mentioned above

4284436 "HISTOGRAMS NOT USED AFTER FIRST STATS COLLECTION"

has been closed as "not a bug".

I should have read the documentation more carefully, where it says that AUTO in method_opt=>' .. size AUTO' means:

"AUTO-Oracle determines the columns to collect histograms based on data distribution and the workload of the columns"
http://download-west.oracle.com/docs/cd/B14117_01/appdev.101/b10802/d_stats.htm#1036462
Explanation by Development on my test case given above:

"SIZE AUTO means that histograms will be created based on data distribution and column usage. Before the first query execution column was not used in WHERE clause, and therefore histograms were not created. If you wish to create histograms based on distribution only, pls. use SIZE SKEWONLY."

Sorry for my mistake!
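
For reference, minimal sketches of the method_opt variants discussed above, against a hypothetical table T with a column X:

exec dbms_stats.gather_table_stats( user, 'T', method_opt => 'for all columns size auto' );
-- histograms only where the data is skewed AND the column appeared in prior predicates

exec dbms_stats.gather_table_stats( user, 'T', method_opt => 'for all columns size skewonly' );
-- histograms wherever the data distribution is skewed, regardless of workload

exec dbms_stats.gather_table_stats( user, 'T', method_opt => 'for columns x size 254' );
-- the workaround: name the column (and bucket count) explicitly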

Puzzling ..

Sree, July 28, 2005 - 12:18 pm UTC

Tom,
This is very puzzling: when I run the following code as a PL/SQL script it runs in no time, but if I put the same code in a procedure or a package it takes a lot of time (I never saw it finish executing, even after a long time). It's like it fell into an infinite loop.

I cannot see what's wrong with this code??

Here is the code
====================================
declare
v_schema varchar2(50):='';
error_mssg varchar2(2000):='';
Begin
select user into v_schema from dual;
v_schema:=v_schema||'USER';

for i in (select object_name name from user_objects
where object_type in ('PROCEDURE','PACKAGE','FUNCTION'))
loop
Begin
Execute Immediate 'Grant Execute on '||i.name||' to '||v_schema;
--dbms_output.put_line('after execute '||v_schema);
Exception
When others then
error_mssg:=sqlerrm;
insert_error('Grant_permissions.allobjects->Error while executing grant statement ',error_mssg);
End;
end loop;

End;

===========================================

this procedure is not working

=========================================
create or replace Procedure allObjects
is
v_schema varchar2(50):='';
error_mssg varchar2(2000):='';
Begin
select user into v_schema from dual;
v_schema:=v_schema||'USER';

for i in (select object_name name from user_objects
where object_type in ('PROCEDURE','PACKAGE','FUNCTION'))
loop
Begin
Execute Immediate 'Grant Execute on '||i.name||' to '||v_schema;
--dbms_output.put_line('after execute '||v_schema);
Exception
When others then
error_mssg:=sqlerrm;
insert_error('Grant_permissions.allobjects->Error while executing grant statement ',error_mssg);
End;
end loop;

End;
======================================================

I am not sure if this question belongs in this thread, but I couldn't find a related thread to post it in.
Hope you can see why it's not working as a procedure.
Thanks for all your help.

Tom Kyte
July 28, 2005 - 1:11 pm UTC

hehehe, that's a funny one.....


allobjects I presume is a procedure in this same schema.

So, eventually, allobjects selects "itself" from user_objects....

And tries to grant on itself....

but you cannot grant on code that is running - so it waits until the code is not running anymore, but it is the code that is running, so it is waiting for "itself"


add "and object_name <> 'ALLOBJECTS' " to the query...

Really Funny..

sree, July 28, 2005 - 2:40 pm UTC

Thanks Tom. I did not realise that. It's all Crystal Clear NOW!!.

Thanks again

Performance poor when accessing dba_tables/user_tables/dba_users/all_users etc.

Vikas Khanna, September 28, 2005 - 2:55 am UTC

Hi Tom,

Selects against the above-mentioned dictionary views are taking too long to return results - I have to cancel the operation after 5 minutes or so.

The other tables, and the db as a whole, are running perfectly fine.

Can you suggest what is wrong?

Thanks
Vikas

Tom Kyte
September 28, 2005 - 9:47 am UTC

er? not sure of the context you are stating this in.

Very Good as always

Orlando Reyes, October 12, 2005 - 10:59 am UTC

Tom,

Is there any special way a table gets analyzed when it has a LONG RAW column? I have a table with one of these columns and over 2M records, and when I analyze it, using estimate 15%, it takes over 4 hours - this is using the command line or DBMS_STATS.

Any suggestions on how to deal with this type of table when analyzing it? And/or why does it behave so differently?

Thanks in advance as usual.

Orlando Reyes


Tom Kyte
October 12, 2005 - 2:11 pm UTC

long raw data is stored "inline" in the table - any query that has to sample this table is going to sample tons of long raw data as well.

My guess would be: table is huge perhaps.

You can sql_trace it and see what it is doing and where the time is spent, however - that would be useful.

Performance slow after analyze

Paru, October 12, 2005 - 7:04 pm UTC

Hi Tom,

We are using 8.1.5 oracle and we do analyze on a weekly basis of the database.

There is a query that runs fine from sqlplus but, when called via a stored procedure, takes a while to return. All the stored procedure does is open a recordset, get the records, and send them back as an out parameter.

I have tried pinning the stored procedure, but still no use.

Can you give me some pointers as to what could cause this slowdown on the DB side? You would think the procedure should run fine, as it is already in compiled form.

Thanks,
Paru

Tom Kyte
October 13, 2005 - 10:23 am UTC

... open a recordset, get the records, and send them back as an out parameter ...


sample code please - i don't know what a "recordset" is when talking about plsql.




Performance poor after analyzing table

Paru, October 13, 2005 - 11:32 am UTC

CREATE OR REPLACE PROCEDURE CUST_ORDERS_CURR_NEW
(Select_order_nbr_In IN CUSTORDER.ORDER_NBR%TYPE,
Reports_InOut IN OUT TYPES.RC)
IS BEGIN
OPEN Reports_InOut FOR
SELECT /*+ FIRST_ROWS */
cu.Cust_po_nbr,
cu.Order_nbr,
cu.Ship_to_name,
cu.Ship_to_addr1,
cu.Ship_to_addr2,
cu.Ship_to_addr3,
cu.Ship_to_addr4,
cu.Ship_to_city,
cu.Ship_to_state,
cu.Ship_to_zip,
cu.Order_date,
cu.Exp_ship_dt,
cu.Rqst_date,
cu.Process_level,
cu.R_Status,
cu.Order_Type,
cd.phone_nmbr,
cd.customer,
ws.ship_date,
s.SERIAL_NUMBER
FROM
custorder cu,
custdesc cd,
whshipment ws,
whdtag1 s
WHERE
cu.company = 3002 AND
cu.order_nbr = Select_order_nbr_In AND
cd.cust_group = 'CLT ' AND
cd.customer = cu.customer
AND ws.DOC_NBR (+) LIKE '%'|| Select_order_nbr_In
AND ws.COMPANY = S.COMPANY (+)
AND ws.LOCATION = S.LOCATION (+)
AND ws.SHIPMENT_NBR = S.SHIPMENT_NBR (+)
AND cd.customer = cu.customer
AND cu.company=ws.company (+)
AND cu.shipment_nbr = ws.shipment_nbr (+);
END;
/


Tom Kyte
October 13, 2005 - 1:14 pm UTC

that stored procedure should execute immediately. go into sqlplus and:

set timing on
variable x refcursor
exec CUST_ORDERS_CURR_NEW( ....some number..., :x );


and show us it taking a long time. That procedure does not really do "any work" - opening a regular cursor like that should be fast.

Fetching the data, that - that could be an entirely different story, but you said the procedure was slow.

Performance poor after analyzing table

Paru, October 14, 2005 - 9:04 am UTC

15:04:43 SQL> variable x refcursor;
15:05:52 SQL> begin
15:05:53   2  cust_orders_curr_new(5204838,:x);
15:06:03   3  end;
15:06:04   4  /
PL/SQL procedure successfully completed.
Elapsed: 00:18:35.69

15:24:41 SQL> print :x

The query itself executes fine within the DB. There are times when the procedure runs fine too, but sometimes it is very, very slow - like the timings above.

So, what should I be looking for during those slow times? 

Thanks. 

Tom Kyte
October 14, 2005 - 9:13 am UTC

sql trace it.
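
For example (a sketch - the trace file name below is hypothetical; yours will be in user_dump_dest):

variable x refcursor
alter session set timed_statistics = true;
alter session set sql_trace = true;
exec cust_orders_curr_new( 5204838, :x );

then format the trace file with tkprof and look at where the time goes in the EXECUTE phase:

$ tkprof ora_1234.trc report.txt sys=no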

Performance poor after analyzing table

A reader, October 14, 2005 - 1:23 pm UTC

Did it, the trace shows those 18 minutes in execution.

But, at times when it runs fast the execute is faster too. So, I pinned the procedure hoping it will execute faster.

Tom Kyte
October 14, 2005 - 5:33 pm UTC

the procedure isn't the issue here - the query is, it is generating the result set upon the open (which isn't unheard of - just strange).

the query plan in plsql - is it the same plan you get when you say the query runs "fast"? (use tkprof to see)


pinning the procedure is futile here, not even remotely related to the issue at hand.

dbms_stats

parag jayant patankar, October 31, 2005 - 1:42 am UTC

Hi Tom,

We are running a Pro*COBOL system received from HO on Oracle 9.2 / AIX
(we do not have the source).
Up to now the system was running without statistics in the database. As per Oracle's recommendation we decided to switch the system to the CBO. For this reason, in the test system, we collected statistics by

execute dbms_stats.gather_schema_stats( -
ownname => 'test', -
estimate_percent => dbms_stats.auto_sample_size, -
method_opt => 'for all columns size auto', -
degree => 2, -
cascade => true );

But after this we found that the execution time of some of the Pro*COBOL programs (in "batch") increased marginally - e.g. from 4 mins to 6 mins, etc. - and overall batch processing time increased to 35-45 mins. I would like to know

1. What may be the reasons for the increased execution times? (Regarding the server: we have ensured that no other processes or background jobs were running during these test periods.)

2. In which direction should I dig to find the issues?

regards & thanks
pjp

Tom Kyte
October 31, 2005 - 5:42 am UTC

Well, when you gather stats one of three things will happen:

a) a given query will run faster
b) a given query will run slower
c) a given query will run exactly the same.


Plans have changed - what you'll want to do is find out "which ones" and then we can start in on "why". v$sql_plan will be very useful to compare (extract that view from your production system and start doing some comparisons).

Hotsos ( www.hotsos.com ) has a tool, Laredo, that can help you automate this if you are interested.


To get the details on the CBO itself - Jonathan Lewis has just released a book on the CBO - highly recommended.

(but I would suggest using 'for all columns size 1' as a starter - collecting histograms is very expensive and you generally do not need them in many systems)
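
For example, the same call as above with histograms turned off (ownname as in the original post):

execute dbms_stats.gather_schema_stats( -
ownname => 'test', -
estimate_percent => dbms_stats.auto_sample_size, -
method_opt => 'for all columns size 1', -
degree => 2, -
cascade => true );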



clob with cache

atul, October 31, 2005 - 10:03 am UTC

Hi,

Which will be more beneficial?

1) CLOB with the cache option and Veritas Quick I/O datafiles.

2) CLOB with the cache option, without Veritas Quick I/O datafiles.

Your views.

Thanks,
Atul

Tom Kyte
November 01, 2005 - 3:45 am UTC

one of the three following answers applies:

a) yes
b) no
c) maybe

to each of A and B.


forget the veritas part, not really relevant. It is just IO and you'll always try to make IO as fast as you can with whatever you have.

you need to figure out "do I or do I not want to cache lobs". Think about some of the things you would need to think about for that

a) how big are they ( do I have room, do I want to have them in my buffer cache)
b) are they re-read frequently (if not, see A)
c) are they written frequently
d) are they written by a batch process or an end user, end users hate to wait for IO to complete, batch processes don't really care, they have lots of patience


and so on...
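
If the answer turns out to be "cache them", it is just a storage attribute - a sketch, assuming a table T with a CLOB column X:

alter table t modify lob (x) ( cache );

and ( nocache ) to turn it back off later.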

CBO

Parag J Patankar, November 02, 2005 - 4:52 am UTC

Hi Tom,

Thanks for your answers to my question in this thread. You have told to me

Well, when you gather stats one of three things will happen:

a) a given query will run faster
b) a given query will run slower
c) a given query will run exactly the same

Will you please tell me in which situations "a given query will run slower" when we switch from the RBO to the CBO?

regards & thanks
pjp

Tom Kyte
November 03, 2005 - 4:49 am UTC

when the optimizer chooses a suboptimal plan?

CBO and RBO

Parag J Patankar, November 03, 2005 - 5:40 am UTC

Hi Tom,

Will you please give me 1 or 2 practical examples of when, after we change from the RBO to the CBO, it will pick a sub-optimal plan?

regards & thanks
pjp

Tom Kyte
November 03, 2005 - 6:59 am UTC

no, they are exceptions - bad stats, missing stats, invalid stats are the most common causes, along with queries written for the rbo (adding zero to some column to make an index unusable, and other rbo techniques).



Index not used by the optimizer

Praveen, November 03, 2005 - 9:21 am UTC

Hi Tom,

I am working on oracle10g r1. I have a query joining two tables t1 & t2 - t1 containing about 2000 records and t2 containing about 0.4 million records. The query is as simple as shown below.

select * from t1, t2
where t1.col1=t1.col1
and t1.col2=t2.col2
and t1.col3=t2.co3;

I want the query to full scan t1 and index scan t2 because each record in t1 will only find about 20 records in t2.

Hence I created an index on t2(col1, col2, col3).

But the query is not using the index at all - rather, it does a complete scan of both tables. Worse, if I try to create an index on t1, the query will full scan t2 and index scan t1. The index on t2 does exist, as the following query comfortably uses it.

select * from t2
where col1 = 'abc'
and col2 = 'pqr'
and col3 = 'xyz';

I tried giving the hint /* +INDEX */ as well as analyzing the table. The optimizer shows a very high cost in the explain plan and the query is taking ages to complete.

What could be the reason?

Best regards...


Tom Kyte
November 04, 2005 - 2:31 am UTC

oh, no, you DON'T

you want that query to full scan and hash T1 and then full scan T2 and hash join it to T1.


http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:6749454952894#6760861174154


indexes are NOT ALL GOODNESS
full scans are NOT ALL EVIL

Please see the previous review

Praveen, November 03, 2005 - 9:35 am UTC

I forgot to add:

Distinct count t1(col1, col2, col3) = 172
Distinct count t2(col1, col2, col3) = 11231

Thanks..


typo

Tyler, November 03, 2005 - 11:55 am UTC

Hi Praveen,

is this the actual query you're running? (it's the one you posted)

select * from t1, t2
where t1.col1=t1.col1
and t1.col2=t2.col2
and t1.col3=t2.co3;

because i assume you're wanting to join t1 and t2 on col1

i think you want this:

select * from t1, t2
where t1.col1=t2.col1
and t1.col2=t2.col2
and t1.col3=t2.co3;

Hopefully that's the solution to your problem.


Typo error....

Praveen, November 04, 2005 - 1:05 am UTC

That was a typo error. The actual query is:

select * from t1, t2
where t1.col1=t2.col1
and t1.col2=t2.col2
and t1.col3=t2.co3;

Thanks for pointing it out, Tyler.

Tom, I have computed statistics for all indexed columns and indexes as well. Given below is the explain plan output:


SELECT STATEMENT Optimizer=ALL_ROWS (Cost=4853 Card=11187 Bytes=2472327)
  HASH JOIN (Cost=4853 Card=11187 Bytes=2472327)
    TABLE ACCESS (FULL) OF T1 (TABLE) (Cost=35 Card=9758 Bytes=927010)
    TABLE ACCESS (FULL) OF T2 (TABLE) (Cost=1913 Card=435229 Bytes=54838854)


The query is actually taking a long time to execute. What could be the problem? The version is Oracle 10g r1.

Best regards...


Tom Kyte
November 04, 2005 - 3:37 am UTC

tkprof is your friend, use it, see what it says....

that shouldn't take more than a couple of seconds.

Thanks Tom

Praveen, November 04, 2005 - 6:17 am UTC

100% true.

How CBO is estimating cardinality for queries using indexes on strings

Dmytro, November 07, 2005 - 5:31 am UTC

I have the following problem: on an analyzed table with all needed indexes, a query works well with the /*+RULE*/ hint and awfully without it.
The table contains information about details of 5 kinds, in these columns: det_id number, det_name varchar2(100), det_kind_id number, det_idn varchar2(14), det_status number, det_location number.
So, we must fetch information about 20 (or fewer) details that are not sold (not exists (<select from orders>)), have the right value in the status column, are located in a specific location (foreign key to another table), and have idn >= some value. Something like this:

SELECT det_id AS detail_id
,det_idn AS detail_idn
FROM (SELECT det_id
,det_idn
FROM details
WHERE det_status <> 2
AND det_kind_id = 4
AND det_location = 83082
AND det_idn >= 'UA1111111111'
AND NOT EXISTS (<...>)
ORDER BY det_idn)
WHERE ROWNUM <= 20;

The problem is that for some reason the CBO decides to use the index on det_idn in this case, which is a little strange, because the table has nearly 2,500,000 rows, 1,000,000 of which have NULL in this field (they are not interesting for us); the other 1,500,000 records all have 12 characters in this field, starting mostly with "UA...", followed by a number that uniquely identifies the detail among its kind (there can be a max of 5 details with the same det_idn).
It's strange, but the CBO thinks the cardinality of this condition is 8 (!) - in fact there are nearly 1,200,000 records satisfying it. Interestingly, the CBO shows a greater cardinality if I delete a few characters from the string. After a few deletes, it begins to use the index on column det_location, which brings us the result in 0.2 secs instead of a few minutes, but isn't right for the program logic. Or we can use the /*+RULE*/ hint, which I desperately want to avoid.
So Tom, what is happening here with the CBO?

Tom Kyte
November 07, 2005 - 8:59 am UTC

how have you gathered stats?

and how many records actually do satisfy the condition (e.g.: the number 8 is not "astounding" to me, so there must be some reason it is to you. Could be that the remaining 14999992 records are < 'UA111111111' for all I know.)

Insufficient detail here.

(but you will really like Jonathan Lewis's new book for helping to understand what's going on - see my home page for a link)

Some datails about details... :)

Dmytro, November 07, 2005 - 11:21 am UTC

SELECT COUNT(*) FROM details
2401242

SELECT COUNT(*) FROM details WHERE det_idn IS NOT NULL
1411654

SELECT COUNT(DISTINCT det_idn) FROM details
704731

SELECT COUNT(*) FROM details WHERE det_idn>'UA1111111111'
1236065

I was gathering statistics simply by typing "exec dbms_stats.gather_table_stats(<user>,'details')" - is that wrong?

Maybe the problem is that the index det_idn_i is based on two fields: det_idn & det_kind_id (the latter is never null and has five possible values (1..5), almost equally distributed among the records)? As you can see, we have an equality operator on the field det_kind_id - maybe the optimizer thinks we have as many distinct values here as in det_idn and so decides to use this index?.. In fact, the kind predicate narrows our search only to 300,000 records, which is bad enough for index use.
If this is the case, how can I tell the CBO about this?

Tom Kyte
November 08, 2005 - 9:21 pm UTC

what version? the default for the method_opt parameter changed between versions - what is your default for that (do you have histograms here or not)?
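
(something like this will show whether histograms exist - a sketch against the table from the question:

select column_name, num_distinct, num_buckets, last_analyzed
  from user_tab_columns
 where table_name = 'DETAILS';

num_buckets greater than 1 for a column means a histogram is in place)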



Ah... and yes, of course I have my eye on the wonderful book by Jonathan Lewis...

Dmytro, November 07, 2005 - 11:26 am UTC

It can be a long time before it gets translated, so I am thinking about buying the English version. Of course, your book is lying on my table, no more than an arm's length away. :)

Beware!!!

Richard, January 21, 2006 - 9:08 am UTC

Beware of launching into analyzing tables without first studying the SQL statements that will be issued to the database.

If you have queries that currently expect data to be returned in a particular order, but do not use an "order by" clause, you may find (as I did!) that the CBO will fail to use an index, and thus return data in an unordered sequence, once the underlying tables are analyzed.

Yes, not using an "order by" clause in such situations is stupid, but it does happen! I had to use "analyze table <table_name> delete statistics" to fix the problem (fixing the SQL with an "order by" clause was deemed by the Development folk to be "too problematic" for the quick fix we needed). Ho hum!

Tom Kyte
January 21, 2006 - 10:58 am UTC

that is the worst reason I've ever heard in my entire life.

It doesn't take "not gathering stats" for that to happen. It doesn't take gathering stats for that to happen.

It's not as if indexes even return data "sorted" (us 7-bit people might not notice sometimes) - typically an index is a binary sort, not a character-set sort.

Sigh, this is scary stuff.

Unless and until a query has an order by, well, the data just ain't going to be sorted. You might appear to get lucky sometimes, but that is about it. You have a serious priority-one bug in your code, period. Nothing more, nothing less.

... there's more!

Richard, January 22, 2006 - 1:59 pm UTC

None of the databases had EVER been analyzed before... ever!

Now I have to check every SQL statement (.sql scripts, PL/SQL procedures, functions, packages, external .asp code, etc.) before daring to analyze again! Still, at least the DBs will be "healthier" afterwards!

Tom Kyte
January 22, 2006 - 4:17 pm UTC

That is a valid reason to not gather statistics in production for the very first time.

If you have a running system and you just decide one day "I shall gather stats", that will likely be a very very long day for you. That is a major change, one that would need to be tested for sure.

Many queries will go "faster".
Many will not change much at all.
There will be at least one that for some reason does not go so well.

That last one is the only one anyone notices that day.

Oracle Heavy Dependency on Stats/Optimizer

Koshal, May 18, 2006 - 10:10 am UTC

We have hundreds of jobs (updating/inserting data regularly). In one of our jobs we select data from a staging table, apply the business logic, and load it into a permanent table.

Avg no. of rows in staging table: 50,000
Avg no. of rows inserted daily: 10,000 (we eliminate certain rows)

Over a period of 6 months the maximum time it has taken is 30 seconds; it varies from 10 to 30 seconds.

During the weekend we analyze tables. On one particular day, at the time the table was analyzed, we had 0 rows in the staging table. The next day, data was populated in the staging table - but the load into the permanent table ran for almost 3 hours and still had not completed. We killed the job and reran it; it ran for almost 1 hour and still had not completed. Then we looked at the stats and found they were not correct. After analyzing the table, the job finally completed in 15 seconds. Amazing, isn't it? But it is true. I don't know why.

Thanks
Koshal







Tom Kyte
May 19, 2006 - 9:45 am UTC

you really don't know why?


You lied to the optimizer, you told it "0 rows" when there were thousands.

Then it developed plans based on this misinformation.

And the plans, optimal if there were zero rows, were less than optimal given thousands of rows.

If anything else happened - then I could understand your confusion, one must expect that "garbage in, garbage out" applies to statistics.
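
The straightforward fix for a volatile staging table is to gather stats when the table is in the state the job will actually see - right after the load, before the processing job runs. A sketch (the table name is hypothetical):

exec dbms_stats.gather_table_stats( user, 'STAGING_TBL', cascade => true );

-- or set representative statistics once, so the weekend gather can skip it:
-- exec dbms_stats.set_table_stats( user, 'STAGING_TBL', numrows => 50000 );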

Difference between statistics on indexed columns and on all columns

A reader, December 06, 2006 - 3:01 am UTC

Hi Tom,

What is the difference between analyzing on all columns and analyzing on indexed columns only?

Recently I encountered an interesting case where one query ran very slowly (3 seconds per execution, fetching 2-4 rows). All three tables in the query had up-to-date (last analyzed Nov 28, 2006) statistics for the table, all indexes, and all indexed columns.

From the trace file I found a table with 2.2M rows being full scanned, so I analyzed that table to compute statistics for the table, all indexes, and all columns. The query now runs in about 0.005-0.01 seconds, fetching 2-4 rows!

The three tables had also been analyzed on Mar 21, 2006 for table, all indexes, and all columns, and performance was good until Nov 28.

What would be the explanation for this behavior?

Thanks

Tom Kyte
December 07, 2006 - 8:13 am UTC

simple:

wrong cardinality estimates = wrong plan
correct cardinality estimates = correct plan


you gave the optimizer a little bit more information - and it was able to determine what to "do" better.

does this mean "always gather histograms on all columns" - absolutely NOT.  Does this mean "only gather histograms on indexed columns" - absolutely NOT.  

It means histograms - when useful - are useful on ANY column regardless of whether it is indexed or not:


ops$tkyte%ORA10GR2> create table t1
  2  as
  3  select 99 id, a.*
  4    from all_objects a;

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> update t1 set id = 1 where rownum = 1;

1 row updated.

ops$tkyte%ORA10GR2> alter table t1 add constraint t1_pk primary key(object_id);

Table altered.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create table t2
  2  as
  3  select * from t1;

Table created.

ops$tkyte%ORA10GR2> alter table t2 add constraint t2_pk primary key(object_id);

Table altered.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T1', method_opt=>'for all indexed columns' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T2', method_opt=>'for all indexed columns' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> @at
ops$tkyte%ORA10GR2> column PLAN_TABLE_OUTPUT format a72 truncate
ops$tkyte%ORA10GR2> set autotrace traceonly explain
ops$tkyte%ORA10GR2> select * from t1, t2
  2   where t1.id = 1
  3     and t2.id = 99
  4     and t1.object_id = t2.object_id;

Execution Plan
----------------------------------------------------------
Plan hash value: 1838229974

------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   501 | 96192 |   330   (2)| 00:00:0
|*  1 |  HASH JOIN         |      |   501 | 96192 |   330   (2)| 00:00:0
|*  2 |   TABLE ACCESS FULL| T1   |   501 | 48096 |   165   (2)| 00:00:0
|*  3 |   TABLE ACCESS FULL| T2   |   501 | 48096 |   165   (2)| 00:00:0
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID")
   2 - filter("T1"."ID"=1)
   3 - filter("T2"."ID"=99)

ops$tkyte%ORA10GR2> set autotrace off
ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T1', method_opt=>'for all columns' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T2', method_opt=>'for all columns' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> @at
ops$tkyte%ORA10GR2> column PLAN_TABLE_OUTPUT format a72 truncate
ops$tkyte%ORA10GR2> set autotrace traceonly explain
ops$tkyte%ORA10GR2> select * from t1, t2
  2   where t1.id = 1
  3     and t2.id = 99
  4     and t1.object_id = t2.object_id;

Execution Plan
----------------------------------------------------------
Plan hash value: 4225966059

------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CP
------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     1 |   192 |   166   (
|   1 |  NESTED LOOPS                |       |     1 |   192 |   166   (
|*  2 |   TABLE ACCESS FULL          | T1    |     1 |    96 |   165   (
|*  3 |   TABLE ACCESS BY INDEX ROWID| T2    |     1 |    96 |     1   (
|*  4 |    INDEX UNIQUE SCAN         | T2_PK |     1 |       |     0   (
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("T1"."ID"=1)
   3 - filter("T2"."ID"=99)
   4 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID")

ops$tkyte%ORA10GR2> set autotrace off

 

There is still some gap

A reader, December 11, 2006 - 1:45 am UTC

Hi Tom,

I can understand that the access plan may be different if there are filtering columns in the where clause with no statistics. But in my case, I don't see that.

The detailed DDL and SQL is as follows (I replaced all the object names since it is vendor software, but the definitions are exactly as shown here):

CREATE TABLE "TABLE_1"
( "COL_13" NUMBER(11,0) NOT NULL ENABLE,
"COL_91" NUMBER(11,0) NOT NULL ENABLE,
"COL_90" NUMBER(11,0) NOT NULL ENABLE,
"COL_92" NUMBER(11,0) NOT NULL ENABLE,
"COL_1" NUMBER(6,0) NOT NULL ENABLE,
"COL_62" NUMBER(38,14) NOT NULL ENABLE,
"COL_67" NUMBER(38,14) NOT NULL ENABLE,
"COL_56" NUMBER(38,14) NOT NULL ENABLE,
"COL_14" DATE,
"COL_17" VARCHAR2(1) NOT NULL ENABLE,
CONSTRAINT "TABLE_1_II" PRIMARY KEY ("COL_91", "COL_90", "COL_13")
);

CREATE TABLE "TABLE_2"
( "COL_92" NUMBER(11,0) NOT NULL ENABLE,
"COL_51" NUMBER(38,14) NOT NULL ENABLE,
"COL_50" NUMBER(38,14) NOT NULL ENABLE,
"COL_49" NUMBER(38,14) NOT NULL ENABLE,
"COL_48" NUMBER(38,14) NOT NULL ENABLE,
"COL_47" NUMBER(38,14) NOT NULL ENABLE,
"COL_46" NUMBER(38,14) NOT NULL ENABLE,
"COL_43" NUMBER(38,14) NOT NULL ENABLE,
"COL_42" NUMBER(38,14) NOT NULL ENABLE,
"COL_41" NUMBER(38,14) NOT NULL ENABLE,
"COL_53" NUMBER(38,14) NOT NULL ENABLE,
"COL_54" NUMBER(38,14) NOT NULL ENABLE,
"COL_55" NUMBER(38,14) NOT NULL ENABLE,
"COL_66" NUMBER(38,14) NOT NULL ENABLE,
"COL_65" NUMBER(38,14) NOT NULL ENABLE,
"COL_64" NUMBER(38,14) NOT NULL ENABLE,
"COL_61" NUMBER(38,14) NOT NULL ENABLE,
"COL_59" NUMBER(38,14) NOT NULL ENABLE,
"COL_60" NUMBER(38,14) NOT NULL ENABLE,
"COL_18" VARCHAR2(20) NOT NULL ENABLE,
"COL_89" NUMBER(6,0) NOT NULL ENABLE,
"COL_88" DATE NOT NULL ENABLE,
CONSTRAINT "TABLE_2_II" PRIMARY KEY ("COL_92")
);

CREATE TABLE "TABLE_3"
( "COL_13" NUMBER(11,0) NOT NULL ENABLE,
"COL_16" VARCHAR2(40) NOT NULL ENABLE,
"COL_19" NUMBER(11,0) NOT NULL ENABLE,
"ICOL_19" NUMBER(11,0) NOT NULL ENABLE,
"COL_14" DATE NOT NULL ENABLE,
"COL_10" VARCHAR2(1) NOT NULL ENABLE,
"COL_12" VARCHAR2(2000),
"COL_9" VARCHAR2(1) NOT NULL ENABLE,
"COL_11" VARCHAR2(2000),
"COL_45" NUMBER(11,0) NOT NULL ENABLE,
"COL_44" NUMBER(38,14) NOT NULL ENABLE,
"COL_52" NUMBER(38,14) NOT NULL ENABLE,
"COL_57" NUMBER(38,14) NOT NULL ENABLE,
"COL_63" NUMBER(38,14) NOT NULL ENABLE,
"COL_58" NUMBER(38,14) NOT NULL ENABLE,
"COL_7" NUMBER(38,14) NOT NULL ENABLE,
"COL_6" NUMBER(38,14) NOT NULL ENABLE,
"COL_4" NUMBER(38,14) NOT NULL ENABLE,
"COL_40" VARCHAR2(80),
"COL_39" VARCHAR2(80),
"COL_38" VARCHAR2(80),
"COL_37" VARCHAR2(80),
"COL_36" VARCHAR2(80),
"COL_35" VARCHAR2(80),
"COL_34" VARCHAR2(80),
"COL_33" VARCHAR2(80),
"COL_32" VARCHAR2(80),
"COL_31" VARCHAR2(80),
"COL_30" VARCHAR2(80),
"COL_29" VARCHAR2(80),
"COL_28" VARCHAR2(80),
"COL_27" VARCHAR2(80),
"COL_26" VARCHAR2(80),
"COL_25" VARCHAR2(80),
"COL_24" VARCHAR2(80),
"COL_23" VARCHAR2(80),
"COL_22" VARCHAR2(80),
"COL_21" VARCHAR2(80),
"COL_87" VARCHAR2(80),
"COL_86" VARCHAR2(80),
"COL_85" VARCHAR2(80),
"COL_84" VARCHAR2(80),
"COL_83" VARCHAR2(80),
"COL_82" VARCHAR2(80),
"COL_81" VARCHAR2(80),
"COL_80" VARCHAR2(80),
"COL_79" VARCHAR2(80),
"COL_78" VARCHAR2(80),
"COL_77" VARCHAR2(80),
"COL_76" VARCHAR2(80),
"COL_75" VARCHAR2(80),
"COL_74" VARCHAR2(80),
"COL_73" VARCHAR2(80),
"COL_72" VARCHAR2(80),
"COL_71" VARCHAR2(80),
"COL_70" VARCHAR2(80),
"COL_69" VARCHAR2(80),
"COL_68" VARCHAR2(80),
"COL_93" VARCHAR2(1) NOT NULL ENABLE,
"COL_8" VARCHAR2(4) NOT NULL ENABLE,
"COL_15" VARCHAR2(1) NOT NULL ENABLE,
"COL_88" DATE NOT NULL ENABLE,
"COL_3" DATE NOT NULL ENABLE,
"COL_2" VARCHAR2(25),
"COL_5" VARCHAR2(1),
CONSTRAINT "TABLE_3_II" PRIMARY KEY ("COL_13")
);

CREATE OR REPLACE FORCE VIEW "VIEW_1" ("COL_13", "COL_16", "COL_19", "ICOL_19", "COL_10",
"COL_12", "COL_9", "COL_11", "COL_45", "COL_44", "COL_52", "COL_57",
"COL_63", "COL_58", "COL_7", "COL_6", "COL_4", "COL_93", "COL_8",
"COL_88", "COL_3", "COL_2", "COL_15", "COL_40", "COL_39", "COL_38", "COL_37", "COL_36",
"COL_35", "COL_34", "COL_33", "COL_32", "COL_31", "COL_30", "COL_29", "COL_28", "COL_27",
"COL_26", "COL_25", "COL_24", "COL_23", "COL_22", "COL_21", "COL_87", "COL_86", "COL_85",
"COL_84", "COL_83", "COL_82", "COL_81", "COL_80", "COL_79", "COL_78", "COL_77", "COL_76",
"COL_75", "COL_74", "COL_73", "COL_72", "COL_71", "COL_70", "COL_69", "COL_68", "COL_91",
"COL_90", "COL_92", "COL_1", "COL_62", "COL_67", "COL_56", "COL_51", "COL_50",
"COL_49", "COL_48", "COL_47", "COL_46", "COL_43", "COL_42",
"COL_41", "COL_53", "COL_54", "COL_55", "COL_66", "COL_65",
"COL_64", "COL_61", "COL_59", "COL_60", "COL_18", "COL_89",
"COL_5", "COL_14", "COL_17") AS
select
TABLE_3.COL_13 ,
TABLE_3.COL_16 ,
TABLE_3.COL_19 ,
TABLE_3.ICOL_19 ,
TABLE_3.COL_10 ,
TABLE_3.COL_12 ,
TABLE_3.COL_9 ,
TABLE_3.COL_11 ,
TABLE_3.COL_45 ,
TABLE_3.COL_44 ,
TABLE_3.COL_52 ,
TABLE_3.COL_57 ,
TABLE_3.COL_63 ,
TABLE_3.COL_58 ,
TABLE_3.COL_7 ,
TABLE_3.COL_6 ,
TABLE_3.COL_4 ,
TABLE_3.COL_93 ,
TABLE_3.COL_8 ,
TABLE_3.COL_88 ,
TABLE_3.COL_3 ,
TABLE_3.COL_2 ,
TABLE_3.COL_15 ,
TABLE_3.COL_40 ,
TABLE_3.COL_39 ,
TABLE_3.COL_38 ,
TABLE_3.COL_37 ,
TABLE_3.COL_36 ,
TABLE_3.COL_35 ,
TABLE_3.COL_34 ,
TABLE_3.COL_33 ,
TABLE_3.COL_32 ,
TABLE_3.COL_31 ,
TABLE_3.COL_30 ,
TABLE_3.COL_29 ,
TABLE_3.COL_28 ,
TABLE_3.COL_27 ,
TABLE_3.COL_26 ,
TABLE_3.COL_25 ,
TABLE_3.COL_24 ,
TABLE_3.COL_23 ,
TABLE_3.COL_22 ,
TABLE_3.COL_21 ,
TABLE_3.COL_87 ,
TABLE_3.COL_86 ,
TABLE_3.COL_85 ,
TABLE_3.COL_84 ,
TABLE_3.COL_83 ,
TABLE_3.COL_82 ,
TABLE_3.COL_81 ,
TABLE_3.COL_80 ,
TABLE_3.COL_79 ,
TABLE_3.COL_78 ,
TABLE_3.COL_77 ,
TABLE_3.COL_76 ,
TABLE_3.COL_75 ,
TABLE_3.COL_74 ,
TABLE_3.COL_73 ,
TABLE_3.COL_72 ,
TABLE_3.COL_71 ,
TABLE_3.COL_70 ,
TABLE_3.COL_69 ,
TABLE_3.COL_68 ,
TABLE_1.COL_91 ,
TABLE_1.COL_90 ,
TABLE_1.COL_92 ,
TABLE_1.COL_1 ,
TABLE_1.COL_62 ,
TABLE_1.COL_67 ,
TABLE_1.COL_56 ,
TABLE_2.COL_51 ,
TABLE_2.COL_50 ,
TABLE_2.COL_49 ,
TABLE_2.COL_48 ,
TABLE_2.COL_47 ,
TABLE_2.COL_46 ,
TABLE_2.COL_43 ,
TABLE_2.COL_42 ,
TABLE_2.COL_41 ,
TABLE_2.COL_53 ,
TABLE_2.COL_54 ,
TABLE_2.COL_55 ,
TABLE_2.COL_66 ,
TABLE_2.COL_65 ,
TABLE_2.COL_64 ,
TABLE_2.COL_61 ,
TABLE_2.COL_59 ,
TABLE_2.COL_60 ,
TABLE_2.COL_18 ,
TABLE_2.COL_89 ,
TABLE_3.COL_5 ,
TABLE_1.COL_14 ,
TABLE_1.COL_17 from TABLE_3, TABLE_1, TABLE_2
where TABLE_3.COL_13 = TABLE_1.COL_13
and TABLE_2.COL_92 = TABLE_1.COL_92;

The query is as below:
SELECT VIEW_1.COL_13, VIEW_1.COL_14,
VIEW_1.COL_1, VIEW_1.COL_12,
VIEW_1.COL_10, VIEW_1.COL_11,
VIEW_1.COL_9, VIEW_1.COL_15,
VIEW_1.COL_4, VIEW_1.COL_6,
VIEW_1.COL_7, VIEW_1.COL_8,
VIEW_1.COL_50, VIEW_1.COL_51,
VIEW_1.COL_49, VIEW_1.COL_42,
VIEW_1.COL_43, VIEW_1.COL_41,
VIEW_1.COL_47, VIEW_1.COL_48,
VIEW_1.COL_46, VIEW_1.COL_18,
VIEW_1.COL_89, VIEW_1.COL_44,
VIEW_1.COL_52, VIEW_1.COL_57, VIEW_1.COL_63,
VIEW_1.COL_58, VIEW_1.COL_45,
VIEW_1.COL_62, VIEW_1.COL_67,
VIEW_1.COL_56, VIEW_1.COL_53,
VIEW_1.COL_54, VIEW_1.COL_55,
VIEW_1.COL_66, VIEW_1.COL_65,
VIEW_1.COL_64, VIEW_1.COL_61,
VIEW_1.COL_59, VIEW_1.COL_60,
VIEW_1.COL_5, VIEW_1.COL_17,
VIEW_1.COL_92
FROM
VIEW_1
WHERE COL_91 = :VV_0_v:VV_0_i AND COL_90 = :VV_1_v:VV_1_i
AND COL_14 <= :VV_2_v:VV_2_i AND COL_14 >= :VV_3_v:VV_3_i
AND COL_9 = :VV_4_v:VV_4_i AND COL_10 = :VV_5_v:VV_5_i
ORDER BY COL_14 DESC, COL_13 DESC;

NUM_ROWS at Dec 5 / Mar 31:
TABLE_1: 82,527,625 / 41,682,555
TABLE_2:  2,275,788 /  1,199,670
TABLE_3: 44,908,875 / 34,401,755

It is TABLE_2 that causes this different access plan, but there is nothing from TABLE_2 in the where or order by clause.

I solved my problem only because I happened to notice that LAST_ANALYZED in DBA_TAB_COLUMNS was different for different columns of TABLE_2, and decided to give a re-analyze a try. Is there any way to detect such a problem?

Thanks

Tom Kyte
December 11, 2006 - 8:05 am UTC

i have no idea what "such problem" is.

RE: There is still some gapsome

A reader, December 11, 2006 - 8:36 pm UTC

Hi Tom,

Sorry - my post "There is still some gap" is a follow-up to the post of December 06, 2006, "Difference between statistics on indexed columns and on all columns".

In my Dec 06 post, I encountered a case where Oracle chose a different access plan depending on whether I computed statistics for all columns or for indexed columns only. You demonstrated a test case explaining why Oracle behaves like this when a non-indexed column is used in the where clause, and I am totally convinced.

However, my query (detailed in my Dec 11 post) does not have any non-indexed columns in the where/order by clauses. So for this query, why does Oracle choose a unique index scan after I compute statistics for all columns, and a full table scan if I compute statistics for indexed columns only? How do we determine whether we should compute statistics for all columns or for indexed columns only?

Thanks

Tom Kyte
December 12, 2006 - 6:42 am UTC

your example is HUGE, it contains many non-relevant columns, it is not easily digested - it is hard to understand.

I can say:

FROM
VIEW_1
WHERE COL_91 = :VV_0_v:VV_0_i AND COL_90 = :VV_1_v:VV_1_i
AND COL_14 <= :VV_2_v:VV_2_i AND COL_14 >= :VV_3_v:VV_3_i
AND COL_9 = :VV_4_v:VV_4_i AND COL_10 = :VV_5_v:VV_5_i
ORDER BY COL_14 DESC, COL_13 DESC;

it looks like you might be applying a predicate to a set of unindexed columns there - so it is exactly what I demonstrated above.

RE: There is still some gap

A reader, December 14, 2006 - 3:33 am UTC

Hi Tom,

Sorry about the HUGE query - a simplified version is below:
  CREATE TABLE "TABLE_1"
   (    "COL_13" NUMBER(11,0) NOT NULL ENABLE,
        "COL_91" NUMBER(11,0) NOT NULL ENABLE,
        "COL_90" NUMBER(11,0) NOT NULL ENABLE,
        "COL_92" NUMBER(11,0) NOT NULL ENABLE,
        "COL_14" DATE,
        "OTHER_COL" NUMBER(6,0) NOT NULL ENABLE,
         CONSTRAINT "TABLE_1_II" PRIMARY KEY ("COL_91", "COL_90", "COL_13")
   );

  CREATE TABLE "TABLE_2"
   (    "COL_92" NUMBER(11,0) NOT NULL ENABLE,
        "OTHER_COL" NUMBER(38,14) NOT NULL ENABLE,
         CONSTRAINT "TABLE_2_II" PRIMARY KEY ("COL_92")
   );

  CREATE TABLE "TABLE_3"
   (    "COL_13" NUMBER(11,0) NOT NULL ENABLE,
        "COL_9" VARCHAR2(1) NOT NULL ENABLE,
        "COL_10" VARCHAR2(1) NOT NULL ENABLE,
         CONSTRAINT "TABLE_3_II" PRIMARY KEY ("COL_13")
   );

create or replace view view_1 as
select table_1.col_13, table_1.col_91, table_1.col_90, table_1.col_14, table_1.col_92, table_3.col_10, table_3.col_9
from TABLE_3, TABLE_1, TABLE_2
where TABLE_3.COL_13 = TABLE_1.COL_13
and TABLE_2.COL_92 = TABLE_1.COL_92;

And the query is:
SELECT COL_13, COL_91, COL_90, COL_92, COL_10, COL_9, col_14
FROM
 VIEW_1
WHERE  COL_91  =  :VV_0_v:VV_0_i AND  COL_90  = :VV_1_v:VV_1_i
  AND  COL_14  <=  :VV_2_v:VV_2_i AND COL_14  >=  :VV_3_v:VV_3_i
  AND  COL_9  = :VV_4_v:VV_4_i AND  COL_10  =  :VV_5_v:VV_5_i
ORDER BY  COL_14 DESC, COL_13 DESC;

In the where and order by clauses, the predicate columns are:
From Table_1: COL_91, COL_90, COL_14, COL_13
From Table_3: COL_9, COL_10

Please note there is no predicate from Table_2. However, the query chose a different access plan when I computed statistics for table_2 for "indexed columns" vs. for "all columns".

I tried to reproduce the plan that full scans table_2 by loading 10% of the data (4M rows in table_1 & table_3, 200K in table_2), but I could not. No matter whether I analyzed by indexed columns or by all columns this time, Oracle always chose the index unique scan for table_2.

Below is my testing - I could not reproduce the plan that accesses table_2 via a full table scan:
SQL> explain plan for SELECT COL_13, COL_91, COL_90, COL_92, COL_10, COL_9, col_14
  2  FROM
  3   VIEW_1
  4  WHERE  COL_91  =  :VV_0_v:VV_0_i AND  COL_90  = :VV_1_v:VV_1_i
  5    AND  COL_14  <=  :VV_2_v:VV_2_i AND COL_14  >=  :VV_3_v:VV_3_i
  6    AND  COL_9  = :VV_4_v:VV_4_i AND  COL_10  =  :VV_5_v:VV_5_i
  7  ORDER BY  COL_14 DESC, COL_13 DESC;

Explained.

SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------
| Id  | Operation                      |  Name       | Rows  | Bytes | Cost  |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |             |       |       |       |
|   1 |  SORT ORDER BY                 |             |       |       |       |
|   2 |   NESTED LOOPS                 |             |       |       |       |
|   3 |    NESTED LOOPS                |             |       |       |       |
|*  4 |     TABLE ACCESS BY INDEX ROWID| TABLE_1     |       |       |       |
|*  5 |      INDEX RANGE SCAN          | TABLE_1_II  |       |       |       |
|*  6 |     INDEX UNIQUE SCAN          | TABLE_2_II  |       |       |       |
|*  7 |    TABLE ACCESS BY INDEX ROWID | TABLE_3     |       |       |       |
|*  8 |     INDEX UNIQUE SCAN          | TABLE_3_II  |       |       |       |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("TABLE_1"."COL_14">=:Z AND "TABLE_1"."COL_14"<=:Z)
   5 - access("TABLE_1"."COL_91"=TO_NUMBER(:Z) AND
              "TABLE_1"."COL_90"=TO_NUMBER(:Z))
   6 - access("TABLE_2"."COL_92"="TABLE_1"."COL_92")
   7 - filter("TABLE_3"."COL_10"=:Z AND "TABLE_3"."COL_9"=:Z)
   8 - access("TABLE_3"."COL_13"="TABLE_1"."COL_13")

Note: rule based optimization

26 rows selected.

SQL> exec dbms_stats.gather_table_stats(user,'TABLE_1',method_opt=>'for all indexed columns');

PL/SQL procedure successfully completed.

SQL>  exec dbms_stats.gather_table_stats(user,'TABLE_2',method_opt=>'for all indexed columns');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(user,'TABLE_3',method_opt=>'for all indexed columns');

PL/SQL procedure successfully completed.

SQL> explain plan for SELECT COL_13, COL_91, COL_90, COL_92, COL_10, COL_9, col_14
  2  FROM
  3   VIEW_1
  4  WHERE  COL_91  =  :VV_0_v:VV_0_i AND  COL_90  = :VV_1_v:VV_1_i
  5    AND  COL_14  <=  :VV_2_v:VV_2_i AND COL_14  >=  :VV_3_v:VV_3_i
  6    AND  COL_9  = :VV_4_v:VV_4_i AND  COL_10  =  :VV_5_v:VV_5_i
  7  ORDER BY  COL_14 DESC, COL_13 DESC;

Explained.

SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------
| Id  | Operation                       |  Name       | Rows  | Bytes | Cost  |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |             |     1 |    52 |    13 |
|   1 |  SORT ORDER BY                  |             |     1 |    52 |    13 |
|*  2 |   FILTER                        |             |       |       |       |
|   3 |    NESTED LOOPS                 |             |     1 |    52 |     4 |
|   4 |     NESTED LOOPS                |             |     1 |    42 |     3 |
|*  5 |      TABLE ACCESS BY INDEX ROWID| TABLE_1     |     1 |    36 |     3 |
|*  6 |       INDEX RANGE SCAN          | TABLE_1_II  |   102 |       |     2 |
|*  7 |      INDEX UNIQUE SCAN          | TABLE_2_II  |     1 |     6 |       |
|*  8 |     TABLE ACCESS BY INDEX ROWID | TABLE_3     |     1 |    10 |     1 |
|*  9 |      INDEX UNIQUE SCAN          | TABLE_3_II  | 10000 |       |       |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(TO_DATE(:Z)>=TO_DATE(:Z))
   5 - filter("TABLE_1"."COL_14"<=:Z AND "TABLE_1"."COL_14">=:Z)
   6 - access("TABLE_1"."COL_91"=TO_NUMBER(:Z) AND
              "TABLE_1"."COL_90"=TO_NUMBER(:Z))
   7 - access("TABLE_2"."COL_92"="TABLE_1"."COL_92")
   8 - filter("TABLE_3"."COL_9"=:Z AND "TABLE_3"."COL_10"=:Z)
   9 - access("TABLE_3"."COL_13"="TABLE_1"."COL_13")

Note: cpu costing is off

28 rows selected.

SQL> exec dbms_stats.gather_table_stats(user,'TABLE_1',method_opt=>'for all columns');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(user,'TABLE_3',method_opt=>'for all columns');

PL/SQL procedure successfully completed.

SQL> explain plan for SELECT COL_13, COL_91, COL_90, COL_92, COL_10, COL_9, col_14
  2  FROM
  3   VIEW_1
  4  WHERE  COL_91  =  :VV_0_v:VV_0_i AND  COL_90  = :VV_1_v:VV_1_i
  5    AND  COL_14  <=  :VV_2_v:VV_2_i AND COL_14  >=  :VV_3_v:VV_3_i
  6    AND  COL_9  = :VV_4_v:VV_4_i AND  COL_10  =  :VV_5_v:VV_5_i
  7  ORDER BY  COL_14 DESC, COL_13 DESC;

Explained.

SQL> select * from table(dbms_xplan.display());
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------
| Id  | Operation                       |  Name       | Rows  | Bytes | Cost  |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |             |     1 |    43 |    13 |
|   1 |  SORT ORDER BY                  |             |     1 |    43 |    13 |
|*  2 |   FILTER                        |             |       |       |       |
|   3 |    NESTED LOOPS                 |             |     1 |    43 |     4 |
|   4 |     NESTED LOOPS                |             |     1 |    34 |     3 |
|*  5 |      TABLE ACCESS BY INDEX ROWID| TABLE_1     |     1 |    28 |     3 |
|*  6 |       INDEX RANGE SCAN          | TABLE_1_II  |   102 |       |     2 |
|*  7 |      INDEX UNIQUE SCAN          | TABLE_2_II  |     1 |     6 |       |
|*  8 |     TABLE ACCESS BY INDEX ROWID | TABLE_3     |     1 |     9 |     1 |
|*  9 |      INDEX UNIQUE SCAN          | TABLE_3_II  |   200 |       |       |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(TO_DATE(:Z)>=TO_DATE(:Z))
   5 - filter("TABLE_1"."COL_14"<=:Z AND "TABLE_1"."COL_14">=:Z)
   6 - access("TABLE_1"."COL_91"=TO_NUMBER(:Z) AND
              "TABLE_1"."COL_90"=TO_NUMBER(:Z))
   7 - access("TABLE_2"."COL_92"="TABLE_1"."COL_92")
   8 - filter("TABLE_3"."COL_9"=:Z AND "TABLE_3"."COL_10"=:Z)
   9 - access("TABLE_3"."COL_13"="TABLE_1"."COL_13")

Note: cpu costing is off

28 rows selected.

SQL> exec dbms_stats.gather_table_stats(user,'TABLE_2',method_opt=>'for all columns');

PL/SQL procedure successfully completed.
SQL> explain plan for SELECT COL_13, COL_91, COL_90, COL_92, COL_10, COL_9, col_14
  2  FROM
  3   VIEW_1
  4  WHERE  COL_91  =  :VV_0_v:VV_0_i AND  COL_90  = :VV_1_v:VV_1_i
  5    AND  COL_14  <=  :VV_2_v:VV_2_i AND COL_14  >=  :VV_3_v:VV_3_i
  6    AND  COL_9  = :VV_4_v:VV_4_i AND  COL_10  =  :VV_5_v:VV_5_i
  7  ORDER BY  COL_14 DESC, COL_13 DESC;

Explained.

SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------
| Id  | Operation                       |  Name       | Rows  | Bytes | Cost  |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |             |     1 |    43 |    13 |
|   1 |  SORT ORDER BY                  |             |     1 |    43 |    13 |
|*  2 |   FILTER                        |             |       |       |       |
|   3 |    NESTED LOOPS                 |             |     1 |    43 |     4 |
|   4 |     NESTED LOOPS                |             |     1 |    34 |     3 |
|*  5 |      TABLE ACCESS BY INDEX ROWID| TABLE_1     |     1 |    28 |     3 |
|*  6 |       INDEX RANGE SCAN          | TABLE_1_II  |   102 |       |     2 |
|*  7 |      INDEX UNIQUE SCAN          | TABLE_2_II  |     1 |     6 |       |
|*  8 |     TABLE ACCESS BY INDEX ROWID | TABLE_3     |     1 |     9 |     1 |
|*  9 |      INDEX UNIQUE SCAN          | TABLE_3_II  |   200 |       |       |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(TO_DATE(:Z)>=TO_DATE(:Z))
   5 - filter("TABLE_1"."COL_14"<=:Z AND "TABLE_1"."COL_14">=:Z)
   6 - access("TABLE_1"."COL_91"=TO_NUMBER(:Z) AND
              "TABLE_1"."COL_90"=TO_NUMBER(:Z))
   7 - access("TABLE_2"."COL_92"="TABLE_1"."COL_92")
   8 - filter("TABLE_3"."COL_9"=:Z AND "TABLE_3"."COL_10"=:Z)
   9 - access("TABLE_3"."COL_13"="TABLE_1"."COL_13")

Note: cpu costing is off

28 rows selected.

SQL> select table_name, num_rows, last_analyzed from user_tables where table_name like 'TABLE%';

TABLE_NAME                       NUM_ROWS LAST_ANALYZED
------------------------------ ---------- --------------------
TABLE_1                           4000000 14-Dec-2006 15:03:59
TABLE_2                            200000 14-Dec-2006 15:31:12
TABLE_3                           4000000 14-Dec-2006 15:09:09

SQL>

In short, I have two questions about this case:
   1) Why, on my production server, did Oracle choose to full scan TABLE_2 when TABLE_2 was analyzed for "indexed columns", but choose an index unique scan when it was analyzed for "all columns", even though there was no predicate on TABLE_2?
   2) Is there any general guideline on when we should analyze for "all columns" and when for "indexed columns"?

Thanks
 

Does analyzing table improve performance

A reader, February 19, 2008 - 4:23 am UTC

Tom,

I frequently hear that performance will improve after analyzing tables, especially for batch jobs that are running very slowly. Could you explain?

Thanks ,
Girish
Tom Kyte
February 19, 2008 - 4:49 pm UTC

I could so pick apart this:

... I frequently hear that performance will improve after analyzing tables,
especially for batch jobs that are running very slowly. ...

there are just so many things "wrong" about it.

Ok, facts:

the CBO bases many of its decisions (the costs it assigns) on the available statistics - how many rows, how many blocks, clustering factors of indexes, etc.

the CBO estimates cardinality of various row sources (steps in plan). for example, if you "select * from t where x = 5", the optimizer estimates how many rows that will return.

the CBO uses these statistics to generate these guesses. For example, if the CBO "knows" from the statistics that "number of rows in T is 100, number of distinct values of X in T is 20", it will guess "we will get five rows".

When the CBO gets the estimated cardinalities right - when it understands the data - we generally get the best plans (one thing I say is "wrong cardinality, WRONG plan... correct cardinality, correct plan")

so, if your statistics are way out of date, stale, wrong, incorrect, inaccurate - it is highly likely that the CBO will estimate the cardinality wrong - and hence get the wrong plan.

And since a 'batch' is typically construed as being a program that runs lots of long running SQL - well - if you mess up the plan for a long running sql statement and make it run longer - you can see what might happen.


So, you should frequently hear:

"The CBO relies on accurate statistics to generate plans. Poorly performing plans (plans that you KNOW could perform better) are many times related to stale, missing or invalid/insufficient statistics"

http://docs.oracle.com/docs/cd/B19306_01/server.102/b14211/stats.htm#i37048
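
To see that arithmetic in action - a small, self-contained sketch (the table T and its data here are made up purely for illustration):

-- 100 rows, 20 distinct values of X: the CBO should estimate 100/20 = 5 rows
create table t as
select mod(rownum, 20) x, rpad('*', 100, '*') pad
  from dual connect by level <= 100;

exec dbms_stats.gather_table_stats( user, 'T' );

explain plan for select * from t where x = 5;
select * from table(dbms_xplan.display());
-- the Rows column should show 5. Now change the data radically without
-- re-gathering, and the estimate - and eventually the plan - goes stale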


similar question

A reader, February 20, 2008 - 2:54 am UTC

Hi,
As my database is idle in the evening, is it a good idea to use
estimate_percent => 100 instead of dbms_stats.auto_sample_size in the DBMS_STATS.gather_schema_stats procedure?
Tom Kyte
February 20, 2008 - 8:58 am UTC

"it depends" is the only answer to a question like this...

will it make a difference? likely not much.
will it finish in time? perhaps, I don't know...
will it make things better? maybe, could make them worse, could make them no different.

If what you have in place now is working better than good enough, I'd stick with it.
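
For reference, the two alternatives being weighed would be invoked like this (the schema name is just a placeholder):

-- let the database pick the sample size (the 10g default behaviour)
exec dbms_stats.gather_schema_stats( 'SCOTT', estimate_percent => dbms_stats.auto_sample_size );

-- full compute: reads every row - only worth it if it fits the window
-- AND actually produces better plans than the sample did
exec dbms_stats.gather_schema_stats( 'SCOTT', estimate_percent => 100 );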

Alexander, March 25, 2008 - 9:45 am UTC

Hi Tom,

Are bitmap indexes anymore expensive to analyze than b*tree?

Thanks.
Tom Kyte
March 26, 2008 - 8:13 am UTC

well, if you ask me, not totally relevant (as in, it doesn't really matter if they were - you either need a bitmap index - or you don't, you cannot compare them to b*tree's)


but.... bitmaps are generally smaller than a b*tree on the same column(s) - thus would be "cheaper" than a b*tree to gather statistics on.
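
A quick way to see that for yourself - a sketch that uses two copies of a table so each copy can carry one index type on the same column:

create table t_btree as select * from all_objects;
create table t_bmap  as select * from all_objects;

create        index t_btree_idx on t_btree(object_type);
create bitmap index t_bmap_idx  on t_bmap(object_type);

-- compare the footprints; on a low cardinality column like OBJECT_TYPE
-- the bitmap segment is typically a fraction of the b*tree's size
select segment_name, blocks, bytes
  from user_segments
 where segment_name in ( 'T_BTREE_IDX', 'T_BMAP_IDX' );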

Alexander, March 26, 2008 - 9:30 am UTC

I have very little experience with bitmaps; I ask because we have a vendor product that uses many of them. We upgraded the database to 10.2.0.2 and now we can't analyze the tables with bitmaps because it takes forever.

I traced the session, but I don't understand what Oracle is doing. It's just tons and tons of recursive SQL statements that take fractions of a second, but together they add up to a lot. These are the totals:

********************************************************************************

OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 0 0.00 0.00 0 0 0 0
Execute 0 0.00 0.00 0 0 0 0
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 0 0.00 0.00 0 0 0 0

Misses in library cache during parse: 0

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file sequential read 6 0.01 0.01
log file sync 3 0.00 0.00
SQL*Net break/reset to client 1 0.00 0.00
SQL*Net message to client 2 0.00 0.00
SQL*Net message from client 2 5.56 5.56


OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 145 0.43 0.39 8 199 0 0
Execute 575 0.47 0.85 79 871 995 173
Fetch 927 0.07 0.47 125 2159 0 860
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 1647 0.97 1.71 212 3229 995 1033

Misses in library cache during parse: 78
Misses in library cache during execute: 74

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file sequential read 160 0.09 0.47

17 user SQL statements in session.
554 internal SQL statements in session.
571 SQL statements in session.
********************************************************************************
Trace file: ocp04s_ora_1024186.trc
Trace file compatibility: 10.01.00
Sort options: default

1 session in tracefile.
17 user SQL statements in trace file.
554 internal SQL statements in trace file.
571 SQL statements in trace file.
98 unique SQL statements in trace file.
5338 lines in trace file.
1 elapsed seconds in trace file.

Do you have any ideas?
Tom Kyte
March 26, 2008 - 9:52 am UTC

i don't see anything "long running" here at all.

also, hopefully this vendor application is a read only/reporting system - bitmap indexes do NOT work in a transactional environment at all


higher cost

BR, April 09, 2008 - 4:45 pm UTC

Hi Tom,

Does a higher cost always equate to a bad plan? One of my DBAs told me it might not necessarily be so. Could you please explain?

thanks,
BR.
Tom Kyte
April 10, 2008 - 10:18 am UTC

In theory, in a perfect world, in a world where the CBO got estimated card= values 100% correct 100% of the time

cost would equal time.

But we don't have that, the CBO cannot get the estimated card= values 100% correct 100% of the time, so there will be anomalies whereby a query with a higher cost is "cheaper" to run than a query with a lower cost, yes

re: higher cost

BR, April 10, 2008 - 3:22 pm UTC

Thanks!

KR, July 09, 2008 - 12:48 am UTC

Hi Tom - I've been facing a lot of performance issues. This is a PS ERP environment. When a SQL statement is executed through a program, it picks up a plan that seems to be low cost (single digit, monitored through OEM) but is very expensive to finish and takes hours. But when I get a plan through sqlplus, it gives me a different plan with the proper index, and it finishes in minutes consistently in most cases.
At times it also finishes quickly through the application, when it picks up the correct plan; otherwise it just goes for heavy sequential reads and takes a few hours to finish.
I am on 10.2.0.1 and planning to upgrade to 10.2.0.4.

1. Why the inconsistency in the plan picked up through the application?
2. I check the SQLs in sqlplus when they are stuck, but there they always return rows in minutes, and consistently with the best plan. Why the difference?
3. I even generated a 10053 trace, and the plans were totally different: the sql_id took a long time to finish, while a cut and paste of the same SQL through sqlplus finished in minutes.
4. Is there anything to check at the optimizer level as to why it is not picking the right plan (rather than just picking the plan with the lowest cost, which leads to heavy sequential reads and hence the performance issue)?

Your help is highly appreciated.

KR, July 09, 2008 - 11:28 pm UTC

Thanks a lot Tom! I went through the notes and in fact changed the _opt peek parameter feature to false.
But I still have the issue. My main issue is that, for the same literal values, I get two different plans. When I capture the SQL through OEM I see one plan, and when I copy it and get an explain plan through sqlplus I get a different one.

Isn't the bind peeking issue only related to bind variable usage with extreme cardinalities? I took an SQLTXPLAIN with all the details. That output had about three different plans, and the application picked the bad one - hence the heavy sequential reads with the bad index plan.

Please advise.

Tom Kyte
July 10, 2008 - 7:20 am UTC

what you see in OEM is what really happened (they query the v$ tables v$sql_plan for example)

what you see in sqlplus is an explain plan if you use autotrace or explain plan directly. explain plan hard parses *right now*, using your environment.

So, one obvious thing would be "the application that submits the sql uses different session settings - maybe they are first_rows, you are all_rows"

There is something different between your session in sqlplus and the application session that submitted the sql.
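
You can look at both sides of that directly. dbms_xplan.display_cursor pulls the plan that actually ran out of v$sql_plan (the same thing OEM shows you), and v$sql_optimizer_env shows the optimizer settings that cursor was parsed under - a sketch, with &sql_id standing in for your statement's sql_id:

-- the plan the application really used
select * from table( dbms_xplan.display_cursor( '&sql_id', null ) );

-- the non-default optimizer settings in effect when that cursor was built;
-- compare these with your sqlplus session's settings
select name, value
  from v$sql_optimizer_env
 where sql_id = '&sql_id'
   and isdefault = 'NO';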

Performance problem with view

Anne, February 19, 2009 - 1:19 pm UTC

Hi Tom,

We are experiencing poor performance with a complex view (the view is a view of 3 union-alls of another layer of views). The database is 10.2.0.4.0. I am not familiar with this system, and hence do not know the logic in this view.

I ran a couple of simple selects from this view with/without flushing the shared pool, used a 10046 trace with level 12, then tkprof-ed the trace file.

The selects I ran were :

1.
select count(*)
from vw_comm_permit_spec_prod


2.
select *
from vw_comm_permit_spec_prod
where 1=1
and permit_id = 2528
order by timber_species_id

Both statements pretty much returned the same result.

Tkprof for stmt#2 : tkprof tsmmn_ora_15401.trc tk.prf aggregate=no sys=no

returned


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 2.53 2.49 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 51.59 50.56 0 1673808 0 12
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 54.13 53.06 0 1673808 0 12

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 78

Rows Row Source Operation
------- ---------------------------------------------------
12 SORT ORDER BY (cr=1730220 pr=0 pw=0 time=54440775 us)
12 VIEW (cr=1729821 pr=0 pw=0 time=71038 us)
12 UNION-ALL (cr=1729821 pr=0 pw=0 time=71001 us)
6 VIEW (cr=748 pr=0 pw=0 time=58331 us)
6 UNION-ALL (cr=748 pr=0 pw=0 time=58312 us)
6 NESTED LOOPS (cr=36 pr=0 pw=0 time=535 us)
6 NESTED LOOPS (cr=28 pr=0 pw=0 time=467 us)
6 NESTED LOOPS (cr=20 pr=0 pw=0 time=347 us)
1 VIEW VW_COMM_PERMIT_APPR_SR (cr=17 pr=0 pw=0 time=295 us)
1 UNION-ALL (cr=17 pr=0 pw=0 time=292 us)
1 NESTED LOOPS (cr=13 pr=0 pw=0 time=245 us)
1 NESTED LOOPS (cr=11 pr=0 pw=0 time=210 us)
1 NESTED LOOPS (cr=9 pr=0 pw=0 time=170 us)
1 NESTED LOOPS (cr=6 pr=0 pw=0 time=121 us)
1 TABLE ACCESS BY INDEX ROWID PERMIT (cr=3 pr=0 pw=0 time=71 us)
1 INDEX UNIQUE SCAN PERMIT_PK (cr=2 pr=0 pw=0 time=48 us)(object id 212116)
1 TABLE ACCESS BY INDEX ROWID AUCTION_COMM_SALE_PERMIT (cr=3 pr=0 pw=0 time=45 us)
1 INDEX UNIQUE SCAN AUCTION_COMM_SALE_PERMIT_PK (cr=2 pr=0 pw=0 time=31 us)(object id 211767)
1 TABLE ACCESS BY INDEX ROWID TRACT (cr=3 pr=0 pw=0 time=43 us)
1 INDEX UNIQUE SCAN TRACT_PK (cr=2 pr=0 pw=0 time=25 us)(object id 212317)
1 TABLE ACCESS BY INDEX ROWID PERMIT_TYPE (cr=2 pr=0 pw=0 time=37 us)
1 INDEX UNIQUE SCAN PERMIT_TYPE_PK (cr=1 pr=0 pw=0 time=19 us)(object id 212149)
1 INDEX UNIQUE SCAN APPRAISAL_PK (cr=2 pr=0 pw=0 time=29 us)(object id 211745)
0 NESTED LOOPS (cr=4 pr=0 pw=0 time=30 us)
0 NESTED LOOPS (cr=4 pr=0 pw=0 time=30 us)
0 NESTED LOOPS (cr=4 pr=0 pw=0 time=26 us)
1 TABLE ACCESS BY INDEX ROWID PERMIT (cr=3 pr=0 pw=0 time=14 us)
1 INDEX UNIQUE SCAN PERMIT_PK (cr=2 pr=0 pw=0 time=8 us)(object id 212116)
0 TABLE ACCESS BY INDEX ROWID INFORMAL_COMM_SALE_PERMIT (cr=1 pr=0 pw=0 time=7 us)
0 INDEX UNIQUE SCAN INFORMAL_COMM_SALE_PERMIT_PK (cr=1 pr=0 pw=0 time=6 us)(object id 212005)
0 TABLE ACCESS BY INDEX ROWID PERMIT_TYPE (cr=0 pr=0 pw=0 time=0 us)
0 INDEX UNIQUE SCAN PERMIT_TYPE_PK (cr=0 pr=0 pw=0 time=0 us)(object id 212149)
0 INDEX UNIQUE SCAN APPRAISAL_PK (cr=0 pr=0 pw=0 time=0 us)(object id 211745)
6 TABLE ACCESS BY INDEX ROWID APPR_SPECIES_PRODUCT (cr=3 pr=0 pw=0 time=95 us)
6 INDEX RANGE SCAN APPR_IDX3 (cr=2 pr=0 pw=0 time=64 us)(object id 211760)
6 INDEX UNIQUE SCAN APPR_SPECIES_PRODUCT_PK (cr=8 pr=0 pw=0 time=91 us)(object id 211759)
6 INDEX UNIQUE SCAN APPR_SPECIES_PRODUCT_PK (cr=8 pr=0 pw=0 time=43 us)(object id 211759)
0 HASH GROUP BY (cr=42 pr=0 pw=0 time=1508 us)
0 VIEW VW_COMM_PERMIT_SPEC_PROD_PRE3 (cr=42 pr=0 pw=0 time=1450 us)
0 MINUS (cr=42 pr=0 pw=0 time=1446 us)
0 SORT UNIQUE (cr=21 pr=0 pw=0 time=485 us)
0 NESTED LOOPS (cr=21 pr=0 pw=0 time=469 us)
0 NESTED LOOPS (cr=21 pr=0 pw=0 time=466 us)
0 HASH JOIN (cr=21 pr=0 pw=0 time=462 us)
6 NESTED LOOPS (cr=19 pr=0 pw=0 time=100 us)
1 VIEW VW_COMM_PERMIT_APPR_SR (cr=16 pr=0 pw=0 time=123 us)
1 UNION-ALL (cr=16 pr=0 pw=0 time=119 us)
1 NESTED LOOPS (cr=12 pr=0 pw=0 time=71 us)
1 NESTED LOOPS (cr=10 pr=0 pw=0 time=58 us)
1 NESTED LOOPS (cr=9 pr=0 pw=0 time=48 us)
1 NESTED LOOPS (cr=6 pr=0 pw=0 time=31 us)
1 TABLE ACCESS BY INDEX ROWID PERMIT (cr=3 pr=0 pw=0 time=13 us)
1 INDEX UNIQUE SCAN PERMIT_PK (cr=2 pr=0 pw=0 time=6 us)(object id 212116)
1 TABLE ACCESS BY INDEX ROWID AUCTION_COMM_SALE_PERMIT (cr=3 pr=0 pw=0 time=13 us)
1 INDEX UNIQUE SCAN AUCTION_COMM_SALE_PERMIT_PK (cr=2 pr=0 pw=0 time=6 us)(object id 211767)
1 TABLE ACCESS BY INDEX ROWID TRACT (cr=3 pr=0 pw=0 time=13 us)
1 INDEX UNIQUE SCAN TRACT_PK (cr=2 pr=0 pw=0 time=6 us)(object id 212317)
1 INDEX UNIQUE SCAN PERMIT_TYPE_PK (cr=1 pr=0 pw=0 time=6 us)(object id 212149)
1 INDEX UNIQUE SCAN APPRAISAL_PK (cr=2 pr=0 pw=0 time=9 us)(object id 211745)
0 NESTED LOOPS (cr=4 pr=0 pw=0 time=35 us)
0 NESTED LOOPS (cr=4 pr=0 pw=0 time=33 us)
0 NESTED LOOPS (cr=4 pr=0 pw=0 time=30 us)
1 TABLE ACCESS BY INDEX ROWID PERMIT (cr=3 pr=0 pw=0 time=20 us)
1 INDEX UNIQUE SCAN PERMIT_PK (cr=2 pr=0 pw=0 time=8 us)(object id 212116)
0 TABLE ACCESS BY INDEX ROWID INFORMAL_COMM_SALE_PERMIT (cr=1 pr=0 pw=0 time=6 us)
0 INDEX UNIQUE SCAN INFORMAL_COMM_SALE_PERMIT_PK (cr=1 pr=0 pw=0 time=4 us)(object id 212005)
0 INDEX UNIQUE SCAN PERMIT_TYPE_PK (cr=0 pr=0 pw=0 time=0 us)(object id 212149)
0 INDEX UNIQUE SCAN APPRAISAL_PK (cr=0 pr=0 pw=0 time=0 us)(object id 211745)
6 INDEX RANGE SCAN APPR_SPECIES_PRODUCT_PK (cr=3 pr=0 pw=0 time=31 us)(object id 211759)
0 TABLE ACCESS BY INDEX ROWID COMM_PERMIT_ADDED_SPEC_PROD (cr=2 pr=0 pw=0 time=42 us)
0 INDEX RANGE SCAN CPASP_REAPPR_IDX (cr=2 pr=0 pw=0 time=33 us)(object id 211820)
0 INDEX UNIQUE SCAN APPR_SPECIES_PRODUCT_PK (cr=0 pr=0 pw=0 time=0 us)(object id 211759)
0 INDEX UNIQUE SCAN APPR_SPECIES_PRODUCT_PK (cr=0 pr=0 pw=0 time=0 us)(object id 211759)
0 SORT UNIQUE (cr=21 pr=0 pw=0 time=957 us)
0 NESTED LOOPS (cr=21 pr=0 pw=0 time=945 us)
0 NESTED LOOPS (cr=21 pr=0 pw=0 time=943 us)
0 HASH JOIN (cr=21 pr=0 pw=0 time=940 us)
6 NESTED LOOPS (cr=19 pr=0 pw=0 time=87 us)
1 VIEW VW_COMM_PERMIT_APPR_SR (cr=16 pr=0 pw=0 time=114 us)
1 UNION-ALL (cr=16 pr=0 pw=0 time=110 us)
1 NESTED LOOPS (cr=12 pr=0 pw=0 time=68 us)
1 NESTED LOOPS (cr=10 pr=0 pw=0 time=54 us)
1 NESTED LOOPS (cr=9 pr=0 pw=0 time=46 us)
1 NESTED LOOPS (cr=6 pr=0 pw=0 time=30 us)
1 TABLE ACCESS BY INDEX ROWID PERMIT (cr=3 pr=0 pw=0 time=13 us)
1 INDEX UNIQUE SCAN PERMIT_PK (cr=2 pr=0 pw=0 time=7 us)(object id 212116)
1 TABLE ACCESS BY INDEX ROWID AUCTION_COMM_SALE_PERMIT (cr=3 pr=0 pw=0 time=11 us)
1 INDEX UNIQUE SCAN AUCTION_COMM_SALE_PERMIT_PK (cr=2 pr=0 pw=0 time=7 us)(object id 211767)
1 TABLE ACCESS BY INDEX ROWID TRACT (cr=3 pr=0 pw=0 time=13 us)
1 INDEX UNIQUE SCAN TRACT_PK (cr=2 pr=0 pw=0 time=6 us)(object id 212317)
1 INDEX UNIQUE SCAN PERMIT_TYPE_PK (cr=1 pr=0 pw=0 time=3 us)(object id 212149)
1 INDEX UNIQUE SCAN APPRAISAL_PK (cr=2 pr=0 pw=0 time=8 us)(object id 211745)
0 NESTED LOOPS (cr=4 pr=0 pw=0 time=29 us)
0 NESTED LOOPS (cr=4 pr=0 pw=0 time=26 us)
0 NESTED LOOPS (cr=4 pr=0 pw=0 time=23 us)
1 TABLE ACCESS BY INDEX ROWID PERMIT (cr=3 pr=0 pw=0 time=13 us)
1 INDEX UNIQUE SCAN PERMIT_PK (cr=2 pr=0 pw=0 time=8 us)(object id 212116)
0 TABLE ACCESS BY INDEX ROWID INFORMAL_COMM_SALE_PERMIT (cr=1 pr=0 pw=0 time=7 us)
0 INDEX UNIQUE SCAN INFORMAL_COMM_SALE_PERMIT_PK (cr=1 pr=0 pw=0 time=5 us)(object id 212005)
0 INDEX UNIQUE SCAN PERMIT_TYPE_PK (cr=0 pr=0 pw=0 time=0 us)(object id 212149)
0 INDEX UNIQUE SCAN APPRAISAL_PK (cr=0 pr=0 pw=0 time=0 us)(object id 211745)
6 INDEX RANGE SCAN APPR_SPECIES_PRODUCT_PK (cr=3 pr=0 pw=0 time=15 us)(object id 211759)
0 TABLE ACCESS BY INDEX ROWID COMM_PERMIT_ADDED_SPEC_PROD (cr=2 pr=0 pw=0 time=28 us)
0 INDEX RANGE SCAN CPASP_REAPPR_IDX (cr=2 pr=0 pw=0 time=11 us)(object id 211820)
0 INDEX UNIQUE SCAN APPR_SPECIES_PRODUCT_PK (cr=0 pr=0 pw=0 time=0 us)(object id 211759)
0 INDEX UNIQUE SCAN APPR_SPECIES_PRODUCT_PK (cr=0 pr=0 pw=0 time=0 us)(object id 211759)
0 TABLE ACCESS BY INDEX ROWID COMM_PERMIT_ADDED_SPEC_PROD (cr=2 pr=0 pw=0 time=13 us)
0 INDEX RANGE SCAN CPASP_REAPPR_IDX (cr=2 pr=0 pw=0 time=9 us)(object id 211820)
6 FILTER (cr=1729047 pr=0 pw=0 time=18107764 us)
33 HASH JOIN (cr=1728826 pr=0 pw=0 time=334569 us)
33 TABLE ACCESS FULL REAPPRAISED_ORIGINAL_SPEC_PROD (cr=3 pr=0 pw=0 time=92 us)
16936 VIEW (cr=1728823 pr=0 pw=0 time=53548630 us)
16936 UNION-ALL (cr=1728823 pr=0 pw=0 time=53514756 us)
16365 NESTED LOOPS (cr=37199 pr=0 pw=0 time=556567 us)
16365 NESTED LOOPS (cr=20832 pr=0 pw=0 time=392916 us)
16365 HASH JOIN (cr=4465 pr=0 pw=0 time=212891 us)
4166 VIEW VW_COMM_PERMIT_APPR_SR (cr=4250 pr=0 pw=0 time=83429 us)
4166 UNION-ALL (cr=4250 pr=0 pw=0 time=79261 us)
3876 NESTED LOOPS (cr=3937 pr=0 pw=0 time=59403 us)
3876 HASH JOIN (cr=59 pr=0 pw=0 time=36123 us)
7 TABLE ACCESS FULL PERMIT_TYPE (cr=3 pr=0 pw=0 time=67 us)
3876 HASH JOIN (cr=56 pr=0 pw=0 time=23418 us)
3876 HASH JOIN (cr=33 pr=0 pw=0 time=14750 us)
3876 TABLE ACCESS FULL AUCTION_COMM_SALE_PERMIT (cr=18 pr=0 pw=0 time=50 us)
4226 VIEW index$_join$_283 (cr=15 pr=0 pw=0 time=20466 us)
4226 HASH JOIN (cr=15 pr=0 pw=0 time=16237 us)
4226 BITMAP CONVERSION TO ROWIDS (cr=1 pr=0 pw=0 time=60 us)
7 BITMAP INDEX FULL SCAN PERMIT_TYPEID_BIDX (cr=1 pr=0 pw=0 time=63 us)(object id 233914)
4226 INDEX FAST FULL SCAN PERMIT_PK (cr=14 pr=0 pw=0 time=84 us)(object id 212116)
4043 TABLE ACCESS FULL TRACT (cr=23 pr=0 pw=0 time=4138 us)
3876 INDEX UNIQUE SCAN APPRAISAL_PK (cr=3878 pr=0 pw=0 time=17443 us)(object id 211745)
290 NESTED LOOPS (cr=313 pr=0 pw=0 time=11589 us)
290 HASH JOIN (cr=21 pr=0 pw=0 time=8969 us)
7 TABLE ACCESS FULL PERMIT_TYPE (cr=3 pr=0 pw=0 time=37 us)
290 HASH JOIN (cr=18 pr=0 pw=0 time=8284 us)
290 TABLE ACCESS FULL INFORMAL_COMM_SALE_PERMIT (cr=3 pr=0 pw=0 time=344 us)
4226 VIEW index$_join$_288 (cr=15 pr=0 pw=0 time=15330 us)
4226 HASH JOIN (cr=15 pr=0 pw=0 time=11099 us)
4226 BITMAP CONVERSION TO ROWIDS (cr=1 pr=0 pw=0 time=46 us)
7 BITMAP INDEX FULL SCAN PERMIT_TYPEID_BIDX (cr=1 pr=0 pw=0 time=73 us)(object id 233914)
4226 INDEX FAST FULL SCAN PERMIT_PK (cr=14 pr=0 pw=0 time=56 us)(object id 212116)
290 INDEX UNIQUE SCAN APPRAISAL_PK (cr=292 pr=0 pw=0 time=1300 us)(object id 211745)
40426 TABLE ACCESS FULL APPR_SPECIES_PRODUCT (cr=215 pr=0 pw=0 time=81027 us)
16365 INDEX UNIQUE SCAN APPR_SPECIES_PRODUCT_PK (cr=16367 pr=0 pw=0 time=137310 us)(object id 211759)
16365 INDEX UNIQUE SCAN APPR_SPECIES_PRODUCT_PK (cr=16367 pr=0 pw=0 time=90157 us)(object id 211759)
571 HASH GROUP BY (cr=14598 pr=0 pw=0 time=291354 us)
582 VIEW VW_COMM_PERMIT_SPEC_PROD_PRE3 (cr=14598 pr=0 pw=0 time=290118 us)
582 MINUS (cr=14598 pr=0 pw=0 time=289533 us)
661 SORT UNIQUE (cr=10037 pr=0 pw=0 time=163941 us)
2823 NESTED LOOPS (cr=10037 pr=0 pw=0 time=190879 us)
2823 NESTED LOOPS (cr=7212 pr=0 pw=0 time=173939 us)
2823 HASH JOIN (cr=4387 pr=0 pw=0 time=151349 us)
672 TABLE ACCESS FULL COMM_PERMIT_ADDED_SPEC_PROD (cr=7 pr=0 pw=0 time=53 us)
16365 HASH JOIN (cr=4380 pr=0 pw=0 time=103901 us)
4166 VIEW VW_COMM_PERMIT_APPR_SR (cr=4248 pr=0 pw=0 time=88550 us)
4166 UNION-ALL (cr=4248 pr=0 pw=0 time=84382 us)
3876 NESTED LOOPS (cr=3936 pr=0 pw=0 time=69270 us)
3876 HASH JOIN (cr=58 pr=0 pw=0 time=42128 us)
3876 NESTED LOOPS (cr=35 pr=0 pw=0 time=33341 us)
3876 HASH JOIN (cr=33 pr=0 pw=0 time=13948 us)
3876 TABLE ACCESS FULL AUCTION_COMM_SALE_PERMIT (cr=18 pr=0 pw=0 time=43 us)
4226 VIEW index$_join$_404 (cr=15 pr=0 pw=0 time=20269 us)
4226 HASH JOIN (cr=15 pr=0 pw=0 time=16014 us)
4226 BITMAP CONVERSION TO ROWIDS (cr=1 pr=0 pw=0 time=52 us)
7 BITMAP INDEX FULL SCAN PERMIT_TYPEID_BIDX (cr=1 pr=0 pw=0 time=77 us)(object id 233914)
4226 INDEX FAST FULL SCAN PERMIT_PK (cr=14 pr=0 pw=0 time=4302 us)(object id 212116)
3876 INDEX UNIQUE SCAN PERMIT_TYPE_PK (cr=2 pr=0 pw=0 time=9854 us)(object id 212149)

.....
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 2 0.00 0.00
SQL*Net more data to client 1 0.00 0.00
SQL*Net message from client 2 0.11 0.16
********************************************************************************
OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 4 2.53 2.49 0 0 0 0
Execute 4 0.00 0.01 0 0 0 2
Fetch 2 51.59 50.56 0 1673808 0 12
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 10 54.13 53.07 0 1673808 0 14

Misses in library cache during parse: 1

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 9 0.00 0.00
SQL*Net message from client 9 152.17 273.03
SQL*Net more data to client 1 0.00 0.00


OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 308 0.00 0.00 0 0 0 0
Execute 19139 3.65 3.54 0 0 0 0
Fetch 19139 0.39 0.38 21 56987 0 19138
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 38586 4.06 3.93 21 56987 0 19138

Misses in library cache during parse: 10
Misses in library cache during execute: 9

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file sequential read 21 0.00 0.00

11 user SQL statements in session.
301 internal SQL statements in session.
312 SQL statements in session.
********************************************************************************
Trace file: tsmmn_ora_15401.trc
Trace file compatibility: 10.01.00
Sort options: default

1 session in tracefile.
11 user SQL statements in trace file.
301 internal SQL statements in trace file.
312 SQL statements in trace file.
13 unique SQL statements in trace file.
269131 lines in trace file.
330 elapsed seconds in trace file.

....

Sorry but it goes on and on! I debugged the plan for time taken, and the final line it came down to was:
4226 INDEX FAST FULL SCAN PERMIT_PK (cr=14 pr=0 pw=0 time=84 us)(object id 212116)

which is part of:


4226 HASH JOIN (cr=15 pr=0 pw=0 time=16237 us)
4226 BITMAP CONVERSION TO ROWIDS (cr=1 pr=0 pw=0 time=60 us)
7 BITMAP INDEX FULL SCAN PERMIT_TYPEID_BIDX (cr=1 pr=0 pw=0 time=63 us)(object id 233914)
4226 INDEX FAST FULL SCAN PERMIT_PK (cr=14 pr=0 pw=0 time=84 us)(object id 212116)


This is a bitmap index. Could you please advise if I am following the right approach in debugging this problem, and if you see something in the plan that stands out.

Sorry about the lengthy update, this view is so complex.

Thanks so much and as always appreciate your help!


Tom Kyte
February 19, 2009 - 2:25 pm UTC

well, the time= values do not show that this is the "big one"

total    4  54.13    53.06      0  1673808      0      12

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 78

Rows  Row Source Operation
------- ---------------------------------------------------
  12 SORT ORDER BY (cr=1730220 pr=0 pw=0 time=54440775 us) 



so, of the 54 seconds, this does not look like the large bit.

But, it is really hard for me to look at a really large plan....

Poor performance with View

Anne, February 19, 2009 - 3:00 pm UTC

Thanks Tom! I know it is really hard, and I apologize.

Basically, what I did is walk down the plan tree and figure out the nodes under the same "indent". Starting from the top of the plan, the first two lines did not have any children. The third one did, which I marked as "A" and its children as A1, A2, ... :

A
12 UNION-ALL (cr=1729821 pr=0 pw=0 time=71001 us)

A1
6 VIEW (cr=748 pr=0 pw=0 time=58331 us)
A2
6 FILTER (cr=1729047 pr=0 pw=0 time=18107764 us)
A3
0 HASH JOIN (cr=3 pr=0 pw=0 time=633 us)


Of these, A2 has the max time, so I then drilled down into A2. I continued this way. Is this the right approach?



Also :

select count(*) from Vw_comm_permit_spec_prod
returns a count of 16975 rows in 5 mins

and

select * from Vw_comm_permit_spec_prod where permit_id = 2528
returns 12 rows in 1 min.

Do you think I should do a tkprof for the count(*) too?

Thanks.

Tom Kyte
February 19, 2009 - 5:02 pm UTC

Anne -

email me the tkprof, I cannot see that number 18107764 - something looks 'funny', but I cannot see it all

but - please take into consideration that right now I'm in the airport - I'll be getting on a plane to take me from Washington DC, to Perth Australia - the next time I'll see this might be - well, a long time from now......

But yes, your approach sounds sound - find the 'big time' but something isn't adding up in the above.


and if you want my opinion on views it would be:

a view is like a subroutine, very very special purpose, you write a view to solve a problem - NOT ALL PROBLEMS - but a specific one. Views are not to be built on other views (one layer) and very problem focused. Generic views that join 500 tables together and can answer any question - run away, fast - run far far far away.


Poor performance of View

Anne, February 19, 2009 - 5:25 pm UTC

Thanks Tom. Yes, that is precisely what I don't get either - why they do not add up. But thanks for offering to take a look at it - I will email it to you.

Thanks,
Anne

Poor performance of View

Anne, February 19, 2009 - 5:46 pm UTC

Hi Tom,

I just emailed it to you. You could not see the 18107764 because I could not paste the entire plan - it was erroring out upon Submit.

Thanks,
Anne

Which query has better response time

Anne, March 03, 2009 - 3:52 pm UTC

Hi Tom,

This question is related to my previous posting on tuning a complex view (The view is a view of 3 union-alls of
another layer of views). Database is 10.2.0.4.0 .

I've created 2 versions of the tuned view and am trying to decide which one is better. I will paste the trace with wait results, and if you could help me decide, I would really appreciate it.

View1: vw_cpsp_abk_1_2_3
View2: vw_cpsp_abk_1_2_2_1

If I run a select * from the same View multiple times, I get different Elapsed times.

View1:
------
select * from vw_cpsp_abk_1_2_3

16975 rows selected.
Elapsed: 00:00:55.10

2nd time
Elapsed: 00:00:54.80

3rd time:
Elapsed: 00:01:31.05

View2
-----
select * from vw_cpsp_abk_1_2_2_1

16975 rows selected.
Elapsed: 00:00:46.73

2nd time:
Elapsed: 00:01:37.44

3rd time:
Elapsed: 00:01:36.99

To understand the differences in elapsed time for the same SQL, I did a trace with waits. The tkprof results are:

View1: Trace#1
-----------------
select *
from vw_cpsp_abk_1_2_3


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1133 45.31 44.82 0 2251019 0 16975
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 1135 45.32 44.83 0 2251019 0 16975

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Rows Row Source Operation
------- ---------------------------------------------------
16975 HASH JOIN (cr=2335007 pr=0 pw=0 time=14149601 us)
107 TABLE ACCESS FULL TIMBER_PRODUCT (cr=6 pr=0 pw=0 time=320 us)
16975 HASH JOIN (cr=2335001 pr=0 pw=0 time=14079382 us)
71 TABLE ACCESS FULL TIMBER_SPECIES (cr=3 pr=0 pw=0 time=128 us)
16975 HASH JOIN (cr=2334998 pr=0 pw=0 time=14009590 us)
11 TABLE ACCESS FULL UNIT_OF_MEASURE (cr=3 pr=0 pw=0 time=75 us)
16975 VIEW (cr=2334995 pr=0 pw=0 time=13922997 us)
16975 UNION-ALL (cr=2334995 pr=0 pw=0 time=13889026 us)
..................................

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 1133 0.00 0.00
SQL*Net more data to client 1 0.00 0.00
SQL*Net message from client 1133 0.29 125.23
********************************************************************************

............

OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 4 0.00 0.00 0 0 0 0
Execute 4 0.00 0.00 0 0 0 2
Fetch 1133 45.31 44.82 0 2251019 0 16975
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 1141 45.32 44.83 0 2251019 0 16977

Misses in library cache during parse: 0

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 1141 0.00 0.00
SQL*Net message from client 1141 30.68 164.57
SQL*Net more data to client 1 0.00 0.00


OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 86 0.01 0.01 0 0 0 0
Execute 230899 45.51 44.12 0 0 0 0
Fetch 230951 9.45 9.09 4 1077760 0 230740
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 461936 54.98 53.24 4 1077760 0 230740

Misses in library cache during parse: 34
Misses in library cache during execute: 32

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file sequential read 4 0.00 0.00

22 user SQL statements in session.
140 internal SQL statements in session.
162 SQL statements in session.
********************************************************************************
Trace file: tsmmndnr_ora_14363.trc
Trace file compatibility: 10.01.00
Sort options: prsela exeela fchela
1 session in tracefile.
22 user SQL statements in trace file.
140 internal SQL statements in trace file.
162 SQL statements in trace file.
40 unique SQL statements in trace file.
3365687 lines in trace file.
263 elapsed seconds in trace file.

View1: Trace#2
---------------

select *
from vw_cpsp_abk_1_2_3


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1133 41.93 41.45 0 2251017 0 16975
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 1135 41.93 41.45 0 2251017 0 16975

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 78

Rows Row Source Operation
------- ---------------------------------------------------
16975 HASH JOIN (cr=2335005 pr=0 pw=0 time=12602674 us)
107 TABLE ACCESS FULL TIMBER_PRODUCT (cr=4 pr=0 pw=0 time=224 us)
16975 HASH JOIN (cr=2335001 pr=0 pw=0 time=12532539 us)
71 TABLE ACCESS FULL TIMBER_SPECIES (cr=3 pr=0 pw=0 time=196 us)
16975 HASH JOIN (cr=2334998 pr=0 pw=0 time=12462763 us)
11 TABLE ACCESS FULL UNIT_OF_MEASURE (cr=3 pr=0 pw=0 time=68 us)
16975 VIEW (cr=2334995 pr=0 pw=0 time=12376183 us)
.........................



Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 1133 0.00 0.00
SQL*Net more data to client 1 0.00 0.00
SQL*Net message from client 1133 0.39 70.39
********************************************************************************
................

OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 4 0.00 0.00 0 0 0 0
Execute 4 0.00 0.00 0 0 0 2
Fetch 1133 41.93 41.45 0 2251017 0 16975
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 1141 41.93 41.45 0 2251017 0 16977

Misses in library cache during parse: 0

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 1141 0.00 0.00
SQL*Net message from client 1141 19.26 99.14
SQL*Net more data to client 1 0.00 0.00


OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 18 0.00 0.00 0 0 0 0
Execute 230759 40.08 38.82 0 0 0 0
Fetch 230759 8.41 8.11 0 1077410 0 230486
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 461536 48.50 46.94 0 1077410 0 230486

Misses in library cache during parse: 0

22 user SQL statements in session.
0 internal SQL statements in session.
22 SQL statements in session.
********************************************************************************
Trace file: tsmmndnr_ora_21825.trc
Trace file compatibility: 10.01.00
Sort options: prsela exeela fchela
1 session in tracefile.
22 user SQL statements in trace file.
0 internal SQL statements in trace file.
22 SQL statements in trace file.
21 unique SQL statements in trace file.
3363174 lines in trace file.
188 elapsed seconds in trace file.


View2: Trace#1
-----------------
select *
from
vw_cpsp_abk_1_2_2_1


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 1.66 1.64 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1133 38.99 38.29 0 2139695 0 16975
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 1135 40.66 39.94 0 2139695 0 16975

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 78

Rows Row Source Operation
------- ---------------------------------------------------
16975 HASH JOIN (cr=2223683 pr=0 pw=0 time=11379330 us)
107 TABLE ACCESS FULL TIMBER_PRODUCT (cr=6 pr=0 pw=0 time=200 us)
16975 HASH JOIN (cr=2223677 pr=0 pw=0 time=11308999 us)
71 TABLE ACCESS FULL TIMBER_SPECIES (cr=3 pr=0 pw=0 time=114 us)
16975 HASH JOIN (cr=2223674 pr=0 pw=0 time=11239057 us)
11 TABLE ACCESS FULL UNIT_OF_MEASURE (cr=3 pr=0 pw=0 time=78 us)
16975 VIEW (cr=2223671 pr=0 pw=0 time=11169772 us)
16975 UNION-ALL (cr=2223671 pr=0 pw=0 time=11135806 us)
.................

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 1133 0.00 0.00
SQL*Net more data to client 1 0.00 0.00
SQL*Net message from client 1133 0.20 46.63
********************************************************************************
.......
OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 4 1.66 1.64 0 0 0 0
Execute 4 0.00 0.00 0 0 0 2
Fetch 1133 38.99 38.29 0 2139695 0 16975
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 1141 40.66 39.94 0 2139695 0 16977

Misses in library cache during parse: 1

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 1141 0.00 0.00
SQL*Net message from client 1141 22.77 81.20
SQL*Net more data to client 1 0.00 0.00


OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 241 0.00 0.00 0 0 0 0
Execute 230982 38.02 37.10 0 0 0 0
Fetch 230982 8.05 7.71 0 1077860 0 230709
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 462205 46.08 44.82 0 1077860 0 230709

Misses in library cache during parse: 0

22 user SQL statements in session.
223 internal SQL statements in session.
245 SQL statements in session.
********************************************************************************
Trace file: tsmmndnr_ora_20552.trc
Trace file compatibility: 10.01.00
Sort options: prsela exeela fchela
1 session in tracefile.
22 user SQL statements in trace file.
223 internal SQL statements in trace file.
245 SQL statements in trace file.
23 unique SQL statements in trace file.
3366490 lines in trace file.
166 elapsed seconds in trace file.

View2: Trace#2
---------------
select *
from
vw_cpsp_abk_1_2_2_1


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 1.65 1.61 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1133 39.68 39.18 0 2139693 0 16975
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 1135 41.34 40.81 0 2139693 0 16975

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 78

Rows Row Source Operation
------- ---------------------------------------------------
16975 HASH JOIN (cr=2223681 pr=0 pw=0 time=11886001 us)
107 TABLE ACCESS FULL TIMBER_PRODUCT (cr=4 pr=0 pw=0 time=200 us)
16975 HASH JOIN (cr=2223677 pr=0 pw=0 time=11816302 us)
71 TABLE ACCESS FULL TIMBER_SPECIES (cr=3 pr=0 pw=0 time=114 us)
16975 HASH JOIN (cr=2223674 pr=0 pw=0 time=11746805 us)
11 TABLE ACCESS FULL UNIT_OF_MEASURE (cr=3 pr=0 pw=0 time=57 us)
16975 VIEW (cr=2223671 pr=0 pw=0 time=11660517 us)
16975 UNION-ALL (cr=2223671 pr=0 pw=0 time=11626548 us)
.....................

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 1133 0.00 0.00
SQL*Net more data to client 1 0.00 0.00
SQL*Net message from client 1133 0.32 69.39
********************************************************************************

..........
OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 4 1.65 1.61 0 0 0 0
Execute 4 0.00 0.00 0 0 0 2
Fetch 1133 39.68 39.18 0 2139693 0 16975
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 1141 41.34 40.81 0 2139693 0 16977

Misses in library cache during parse: 1

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 1141 0.00 0.00
SQL*Net message from client 1141 24.40 103.90
SQL*Net more data to client 1 0.00 0.00


OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 241 0.00 0.00 0 0 0 0
Execute 230982 40.09 39.06 0 0 0 0
Fetch 230982 8.61 8.09 0 1077860 0 230709
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 462205 48.71 47.16 0 1077860 0 230709

Misses in library cache during parse: 0

22 user SQL statements in session.
223 internal SQL statements in session.
245 SQL statements in session.
********************************************************************************
Trace file: tsmmndnr_ora_22442.trc
Trace file compatibility: 10.01.00
Sort options: prsela exeela fchela
1 session in tracefile.
22 user SQL statements in trace file.
223 internal SQL statements in trace file.
245 SQL statements in trace file.
23 unique SQL statements in trace file.
3366494 lines in trace file.
192 elapsed seconds in trace file.


Sorry about the lengthy update.

Thanks so much for your help!




Tom Kyte
March 03, 2009 - 9:19 pm UTC

looks like most of the time was spent waiting for your client process (whatever it was) to actually ask us to do stuff.


Each of the queries took about 40 seconds. The client took the rest. They are pretty similar performance wise, the 2nd has a slight edge.

They both do a ton of work to get 17,000 rows. Seems like too much work for so few rows.

Which query has better response time

Anne, March 04, 2009 - 10:04 am UTC

Thanks so much Tom! Yes, that is what I thought too - the 2nd one is slightly better. There is some more tuning that can be done to the inner layers though....

Couple of questions :
1. When you say "The client took the rest", I would like to understand how you got this.

1a. Are you taking it from the wait events - basically, per trace: Total Waited for SQL*Net message to client + Total Waited for SQL*Net more data to client + Total Waited for SQL*Net message from client?

1b. Also, which wait events are you looking at?

View1: Trace#1
-----------------

select *
from vw_cpsp_abk_1_2_3

Are you looking at :


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1133 45.31 44.82 0 2251019 0 16975
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 1135 45.32 44.83 0 2251019 0 16975

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 1133 0.00 0.00
SQL*Net more data to client 1 0.00 0.00
SQL*Net message from client 1133 0.29 125.23



OR

are you looking at :

OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 4 0.00 0.00 0 0 0 0
Execute 4 0.00 0.00 0 0 0 2
Fetch 1133 45.31 44.82 0 2251019 0 16975
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 1141 45.32 44.83 0 2251019 0 16977

Misses in library cache during parse: 0

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 1141 0.00 0.00
SQL*Net message from client 1141 30.68 164.57
SQL*Net more data to client 1 0.00 0.00

********************************************************************************

1c. "Each of the queries took about 40 seconds" - how did you get that? Is it from the "Elapsed time" for the query itself, or from the "elapsed time" of the overall totals for non-recursive statements?


2. Why do I get different elapsed times when I run the same query multiple times? Perhaps there was some other processing going on, or network issues, causing the 3rd run to be much slower?

select * from vw_cpsp_abk_1_2_3

16975 rows selected.
Elapsed: 00:00:55.10

2nd time
Elapsed: 00:00:54.80

3rd time:
Elapsed: 00:01:31.05

Thanks again for sharing your wealth of knowledge!
Tom Kyte
March 04, 2009 - 1:36 pm UTC

... 1. When you say "The client took the rest" , I would like to understand how you
got this .
...



easy, you said the query was taking over a minute, the time spent in the database is clearly listed in the tkprof. The tkprof shows elapsed time spent in the server.

call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total     1135     45.32      44.83          0    2251019          0      16975

we spent about 45 seconds doing that query.


2) exactly. Or your client started a virus scan or you opened Word or something.

Which query has better response time

Anne, March 04, 2009 - 3:26 pm UTC

Thanks Tom! So basically the tkprof query elapsed time + the SQL*Net client times should be close to the "Elapsed time" I got when I ran the query from client sqlplus. Is this correct?

Tom Kyte
March 04, 2009 - 4:19 pm UTC

depends, if the client was ONLY doing this one thing - sure, but most real applications have more than one thing going on (so the sql net message from client appears on whatever statement it feels like).

if you look at it from the totals perspective - and if the client a) connects, b) immediately does work in the database, c) disconnects as soon as it is finished - you can say this.


but if the client connects, sits there for a while, does something, hangs out, does something else, hangs out, disconnects - you cannot - there will be lots of extra "message from client" time.

Thanks

Anne, March 05, 2009 - 9:23 am UTC

Thanks Tom!

Pratibha Malhotra, November 20, 2009 - 1:35 am UTC

Tom,

I am running a procedure on Oracle 10g. My procedure runs fine before statistics are collected by the default GATHER_STATS_JOB [default estimate_percent of 100%].
After statistics are collected, the performance of my insert queries degrades heavily.

When I used an estimate_percent of 30%, the procedure worked fine.

Below is the explain plan of one of the insert queries in the procedure, with estimate_percent = 30%:
---------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 216 | 43848 | 1019 (1)| 00:00:13 | | |
| 1 | HASH GROUP BY | | 216 | 43848 | 1019 (1)| 00:00:13 | | |
|* 2 | HASH JOIN | | 216 | 43848 | 1018 (1)| 00:00:13 | | |
|* 3 | TABLE ACCESS BY INDEX ROWID | IT_HRMS_DEPARTMENT | 1 | 73 | 5 (0)| 00:00:01 | | |
| 4 | NESTED LOOPS | | 55 | 9240 | 324 (0)| 00:00:04 | | |
| 5 | NESTED LOOPS | | 55 | 5225 | 49 (0)| 00:00:01 | | |
| 6 | TABLE ACCESS FULL | IT_HRMS_GRADE | 1 | 9 | 2 (0)| 00:00:01 | | |
|* 7 | TABLE ACCESS BY GLOBAL INDEX ROWID| IT_TIMES | 55 | 4730 | 47 (0)| 00:00:01 | ROWID | ROWID |
|* 8 | INDEX RANGE SCAN | IDX_TIMES_EMPID | 55 | | 2 (0)| 00:00:01 | | |
|* 9 | INDEX RANGE SCAN | IDX_HRMS_COA | 6 | | 1 (0)| 00:00:01 | | |
| 10 | PARTITION HASH ALL | | 7881 | 269K| 693 (1)| 00:00:09 | 1 | 2 |
|* 11 | TABLE ACCESS FULL | IT_PBI_INFO | 7881 | 269K| 693 (1)| 00:00:09 | 1 | 2 |
---------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("PROJECTNO"="COACODE")
3 - filter("IT_HRMS_DEPARTMENT"."EMPLOYEE_NO"=TO_NUMBER("EMPLOYEEID"))
7 - filter("ISPROCESSED"='N')
8 - access("IT_HRMS_GRADE"."EMPLOYEE_NO"="EMPLOYEEID")
9 - access("COA"="DEPTCOA")
11 - filter("COACODE" IS NOT NULL)

Note
-----
- SQL profile "SYS_SQLPROF_0149431bb54a4001" used for this statement

IT_TIMES is range partitioned with 2.6 lakh (260,000) rows.
IT_PBI_INFO is hash partitioned (2 partitions) with 3 lakh (300,000) rows.
These tables grow daily, and my procedure runs once a day.

Is it okay to have estimate_percent = 30%? How should I arrive at the optimal percent value for my database?

Oracle's Enterprise Manager, however, says that the "optimizer requires up-to-date statistics":
Statistics Table "MCC_STAGING"."IT_PBI_INFO" was not analyzed. Consider collecting optimizer statistics for this table. The optimizer requires up-to-date statistics for the table in order to select a good execution plan.
Statistics Index "MCC_STAGING"."IT_PBI_INFO_PK" was not analyzed. Consider collecting optimizer statistics for this index. The optimizer requires up-to-date statistics for the index in order to select a good execution plan.


Please help and provide pointers on how to proceed.


Tom Kyte
November 23, 2009 - 3:55 pm UTC

... Statistics Table "MCC_STAGING"."IT_PBI_INFO" was not analyzed.....

... Statistics Index "MCC_STAGING"."IT_PBI_INFO_PK" was not analyzed. ...

I thought you said you gathered statistics?


estimates are frequently used, especially for large objects - less than 30% is typically used.

Pratibha Malhotra, November 20, 2009 - 3:33 am UTC

In addition to my queries in the above post, I also wanted to check:
1) Is it required to collect statistics separately for your schema/tables even if GATHER_STATS_JOB is doing it every night?

2) Should I collect statistics for indexes separately, or does GATHER_STATS_JOB /
EXECUTE DBMS_STATS.GATHER_SCHEMA_STATS('MCC_STAGING',DBMS_STATS.AUTO_SAMPLE_SIZE);
take care of indexes automatically?
Tom Kyte
November 23, 2009 - 3:56 pm UTC

1) if you are running the default job, you don't need to run yours - or make sure yours runs BEFORE the default job (so that duplicate work is not done, if you just analyzed something, the default job will not redo it)

2) yes, it would do indexes on tables it decided to gather statistics on

Pratibha, November 23, 2009 - 11:11 pm UTC

... Statistics Table "MCC_STAGING"."IT_PBI_INFO" was not analyzed.....
... Statistics Index "MCC_STAGING"."IT_PBI_INFO_PK" was not analyzed. ...
I thought you said you gathered statistics?


That's what surprised me. I was relying on the Oracle 10g default GATHER_STATS_JOB, which runs at 10 every night, to gather statistics.
However, EM indicates that the table was not analyzed. Am I missing something here? Shouldn't GATHER_STATS_JOB take care of this?
Q2: And why is GATHER_STATS_JOB taking estimate_percent = 100%?

Tom Kyte
November 24, 2009 - 10:33 am UTC

ensure job is running
ensure statistics are not 'locked' on these tables
review the data dictionary to verify that statistics were not gathered on that table (last analyzed in the dictionary).
review the data dictionary to verify that statistics on some things were gathered to prove the job is running (eg: start looking around at obvious things to look at to verify everything)
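
In SQL, those checks look something like this (a sketch using the MCC_STAGING objects named in the question):

-- when, if ever, were the table's statistics gathered?
select table_name, last_analyzed, num_rows, sample_size
  from dba_tables
 where owner = 'MCC_STAGING' and table_name = 'IT_PBI_INFO';

-- are the statistics locked? (the job skips locked tables)
select table_name, stattype_locked
  from dba_tab_statistics
 where owner = 'MCC_STAGING' and table_name = 'IT_PBI_INFO';

-- is the default 10g job present and enabled?
select job_name, enabled, state
  from dba_scheduler_jobs
 where job_name = 'GATHER_STATS_JOB';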


estimate percent is a function of many things - a small table, it'll use 100% (complete). A table that it starts gathering statistics for but notices certain 'patterns', it could well stop - and restart with a larger percent.

You are using the "automatic" job, every parameter by default is "automatic", meaning we use whatever value we feel is correct. If you don't like that, you would create your own database job to do it.

Pratibha, November 24, 2009 - 10:48 am UTC

Thanks for your response Tom.

I have confirmed that my job is running fine and collects statistics every night at 10:00 PM.

One of my main tables, used in many queries in the procedure, is partitioned on date.
It is, however, currently skewed due to the test data available. Having 4 empty partitions should not be a problem.

Can this be the reason behind the issue? When I collect statistics at 33%, or delete statistics, my procedure works just fine.

Since the data in my tables will increase in a big way daily, is it safe to go with a 33% estimate? How do I ensure it doesn't cause issues in production later?

Unfortunately, my customer doesn't have a DBA, and I have been told to ensure everything.

Please help
Tom Kyte
November 24, 2009 - 12:00 pm UTC

do you or do you not have valid statistics on the two referenced objects.

... Having 4 empty
partitions should not be a problem.
..

sure it could, if you use bind variables and have local partition statistics. search for "bind variable peeking" for why it could be.


if you delete statistics, we sample the data - which gives us a good guess at the data.

If a 10% sample gives you better than good enough statistics, there is no reason to do 33% or a high number like that. Just sample a bit of the data.
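
And since it came up above, a minimal sketch of the bind peeking effect itself (the table and data are made up; 10g has no adaptive cursor sharing, so the peeked plan sticks):

-- one row with id = 1, tens of thousands with id = 2
create table skewed as
select case when rownum = 1 then 1 else 2 end id,
       rpad('*', 100, '*') pad
  from all_objects;

create index skewed_idx on skewed(id);
exec dbms_stats.gather_table_stats( user, 'SKEWED', method_opt => 'for all columns size 254' );

variable b number
exec :b := 1
select count(pad) from skewed where id = :b;  -- peeks at 1, builds an index range scan plan
exec :b := 2
select count(pad) from skewed where id = :b;  -- silently reuses that plan - a bad fit now

select * from table( dbms_xplan.display_cursor(null, null) );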

bitmap j index

satya, October 04, 2010 - 9:04 am UTC

Tom,

The bitmap join index is not working in the scenario below. I have a fact table that has more than one time key (customer_order_dt_key, customer_order_shipd_dt_key, cust_paymnt_prcsd_dt_key).

I want to join all of these dt_key columns to the time_dim.dt_key column, because our reporting team can query on any of these date keys. What is your suggestion on this? How do we create the "bitmap join index"? Can you please help me?

Thanks


Tom Kyte
October 05, 2010 - 1:50 am UTC

give example schema to work with. your example is not sufficient to get started with. make it small, but make it runnable.

Bjindex

satya, October 08, 2010 - 5:11 pm UTC

Attached are the examples. Sorry for the delay.
I have several other date_key columns in the fact table and would like to include all of them in the join index. Would that be possible? Otherwise, can you give another example if you have one?

SQL> create bitmap index fcust_dimdt_ind2 on fact_cust_detl(calendar_dt) 
  2  from fact_dw.fact_cust_detl fc, fact_dw.dim_time dt 
  3  where fc.cust_ord_dt_KEY = dt.time_key 
  4  and dt.time_key = fc.cust_procd_dt_key 
  5  local; 
from fact_dw.fact_cust_detl fc, fact_dw.dim_time dt 
                                 * 
ERROR at line 2: 
ORA-25954: missing primary key or unique constraint on dimension 


SQL> 

Tom Kyte
October 11, 2010 - 12:01 pm UTC

did you create the necessary constraints? we need to know that the table you create the index on is key preserved (eg: is joined to things by those things' primary/unique keys)


the error message really seemed very helpful here - didn't it? It is pointing to the exact table that needs a primary key and saying "we need a primary key"


Your example is still entirely insufficient (what error do you think I'd receive if I were to run it?)

But the error message was amazingly good in this case so it is pretty trivial to see what is wrong..
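A sketch of the kind of fix ORA-25954 is asking for, using the names from the failed statement (the constraint name is made up):

-- the dimension's join column needs a primary or unique key
-- before a bitmap join index can be built against it
alter table fact_dw.dim_time
  add constraint dim_time_pk primary key (time_key);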

sat, October 11, 2010 - 2:01 pm UTC

Would it be possible for you to show some example on the bitmap join index with more the one fact.date_key join with the date_dime.date_key. do you have any examples please.


Tom Kyte
October 11, 2010 - 2:22 pm UTC

... with
more the one fact.date_key join with the date_dime.date_key. ...

unable to parse that, does this example help?


ops$tkyte%ORA11GR2> create table emp
  2  as
  3  select * from scott.emp;

Table created.

ops$tkyte%ORA11GR2> create table dept
  2  as
  3  select * from scott.dept;

Table created.

ops$tkyte%ORA11GR2> alter table dept add constraint dept_pk primary key(deptno);

Table altered.

ops$tkyte%ORA11GR2> create bitmap index emp_bm_idx
  2  on emp( d.dname )
  3  from emp e, dept d
  4  where e.deptno = d.deptno;

Index created.
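As a follow-up illustration (not from the original exchange), a query of this shape is what the index above can service - the bitmap join index stores EMP rowids keyed by DEPT.DNAME, so filtering on the dimension attribute need not visit DEPT at all:

-- count employees in a department, filtered by the dimension column
select count(*)
  from emp e, dept d
 where e.deptno = d.deptno
   and d.dname = 'SALES';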

A reader, October 12, 2010 - 7:40 am UTC

Below is part of the table. The actual table is very long and it has a lot of other columns. Our reporting team can query the data by either cust_ord_date_key or cust_ord_procsd_dt_key. Now I had some difficulty extracting the data. Also, the data is currently partitioned by
cust_ord_date_key, i.e. to_char(cust_ord_date,'YYYYMM').

My question is: can we create bitmap join indexes on the "fact.date_key" columns referencing the same time_dim.calendar_dt?

Below is the example. 

SQL*Plus: Release 10.1.0.2.0 - Production on Tue Oct 12 08:25:10 2010

Copyright (c) 1982, 2004, Oracle.  All rights reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - Production
With the Partitioning, Real Application Clusters, OLAP and Data Mining options

SQL> create table cust_fact (customer_key       number,
  2  prod_key   number,
  3  cust_ord_date_key       number,
  4  cust_ord_procsd_dt_key  number,
  5  cust_locn_key           number,
  6  product_ship_dt_key     number,
  7  etl_timestmp            date) 
  8  /

Table created.

SQL> create index cust_ord_idx1  on cust_fact(cust_ord_date_key) 
  2  /

Index created.

SQL> create index cust_ord_prcd_idx1 on cust_fact(cust_ord_procsd_dt_key) 
  2  /

Index created.

SQL> create index cust_locn_key_idx1 on cust_fact(cust_locn_key) 
  2  /

Index created.

SQL> create bitmap index factcust_dtime_idx1 on cust_fact(dt.calendar_dt) from cust_fact cf, time_dim  
dt where dt.date_key = cf.cust_ord_date_key ;

Index created.



When I try to create the below index, it fails: because calendar_dt is already indexed on the cust_fact table, we cannot create another index. Is there a way to create a second index?


SQL> create bitmap index factcust_dtime_idx2 on cust_fact(dt.calendar_dt) from cust_fact cf, time_dim

  2  dt where dt.date_key = cf.cust_ord_procsd_dt_key ;
create bitmap index factcust_dtime_idx2 on cust_fact(dt.calendar_dt) from cust_fact cf, time_dim
                                                        *
ERROR at line 1:
ORA-01408: such column list already indexed


SQL> 

Any help or suggestion is highly appreciated. 

Thank you

Tom Kyte
October 12, 2010 - 8:06 am UTC

tell you what - post an example (short, small, sweet) that is 100% complete. So I don't have to guess at things.

If I run the create index, I get a totally different answer.

give me a complete full SMALL example.

collecting stats deteriorates performance

Sanji, December 20, 2010 - 5:22 pm UTC

Hello Tom,

I'm trying to analyze an issue where stats collection on a table deteriorates performance.
I have tried gathering stats on all columns, and on all indexed columns, with different sizes. The explain plan remains the same.
However, when I delete the stats, I get a much better response time.
I understand that it's due to the skewness of the data and that the optimizer thinks it's better to use one index over the other; the issue is, I'm not seeing how to get the optimizer to use the correct index OR not use it at all.


OPEN:SANJI:TRECS@ORADEV1>exec dbms_stats.delete_table_stats('ur','impaccounthistory')

PL/SQL procedure successfully completed.

OPEN:SANJI:TRECS@ORADEV1>exec dbms_stats.gather_table_stats('ur','impaccounthistory',method_opt=>'for all indexed columns size 100',cascade=>true)

PL/SQL procedure successfully completed.

OPEN:SANJI:TRECS@ORADEV1>
SELECT a.TAccountID,
a.ImpAccountID,
a.HistoryDate,
a.ImpAccountType,
a.AdjustedBalance,
a.AdjustedBalanceType,
'E'
FROM ur.ImpAccountHistory a
WHERE a.TAccountID = 96
and Adjustedbalance is NOT NULL
AND a.ImpAccountType = '2'
AND a.HistoryDate =
(select Max ( HistoryDate )
From ur.ImpAccountHistory
Where ImpAccountID = a.ImpAccountID
and HistoryDate <= to_date('30Nov2010', 'ddmonyyyy')
)
/

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1333 Card=14 Bytes=742)
   1    0   FILTER
   2    1     SORT (GROUP BY) (Cost=1333 Card=14 Bytes=742)
   3    2       HASH JOIN (Cost=1085 Card=51190 Bytes=2713070)
   4    3         TABLE ACCESS (FULL) OF 'IMPACCOUNTHISTORY' (Cost=758 Card=245 Bytes=10045)
   5    3         INDEX (FAST FULL SCAN) OF 'PK_IMPACCOUNTHISTORY' (UNIQUE) (Cost=320 Card=1071442 Bytes=12857304)


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
11080 consistent gets
11030 physical reads
0 redo size
1014 bytes sent via SQL*Net to client
651 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
2 rows processed

Without stats, the explain plan is

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   FILTER
   2    1     TABLE ACCESS (BY INDEX ROWID) OF 'IMPACCOUNTHISTORY'
   3    2       INDEX (RANGE SCAN) OF 'IMPHISTORY_TACCTID_IDX' (NON-UNIQUE)
   4    1     SORT (AGGREGATE)
   5    4       TABLE ACCESS (BY INDEX ROWID) OF 'IMPACCOUNTHISTORY'
   6    5         INDEX (RANGE SCAN) OF 'IMPACCOUNTHISTORY_FK1' (NON-UNIQUE)


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
3211 consistent gets
1221 physical reads
0 redo size
1014 bytes sent via SQL*Net to client
651 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2 rows processed

Way less logical I/O... the question is, how do I get this sorted out?

Thanks in advance for any insight.

Sanji
Tom Kyte
December 21, 2010 - 6:57 am UTC

no version information... :( looks like 9i maybe?

how about a TKPROF of both to go with this? we always need to see estimated card= values next to ACTUALS to see what is going wrong and where.
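For reference, a minimal sketch of producing the requested TKPROF (trace and output file names are placeholders; on 9i the trace file lands in user_dump_dest):

alter session set timed_statistics = true;
alter session set sql_trace = true;
-- run the query of interest here, then:
alter session set sql_trace = false;

-- then, from the OS prompt:
$ tkprof ora_12345.trc query.prf sys=no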

collecting stats deteriorates performance

Sanji, December 21, 2010 - 8:18 am UTC

Sorry about that... The environment is 9.2.0.4 on HP-UX 11i.
This is the tkprof output

With stats

SELECT a.TAccountID,
a.ImpAccountID,
a.HistoryDate,
a.ImpAccountType,
a.AdjustedBalance,
a.AdjustedBalanceType,
:"SYS_B_0"
FROM ImpAccountHistory a
WHERE a.TAccountID = :"SYS_B_1"
and Adjustedbalance is NOT NULL
AND a.ImpAccountType = :"SYS_B_2"
AND a.HistoryDate =
(select Max ( HistoryDate )
From ImpAccountHistory
Where ImpAccountID = a.ImpAccountID
and HistoryDate <= to_date(:"SYS_B_3", :"SYS_B_4")
)

call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ----------  ---------
Parse        1      0.00       0.00          0          0          0          0
Execute      1      0.00       0.00          0          0          0          0
Fetch        2      1.71       1.67       7749      11102          0          2
------- ------  -------- ---------- ---------- ---------- ----------  ---------
total        4      1.71       1.67       7749      11102          0          2

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 39

Rows     Row Source Operation
-------  ---------------------------------------------------
      2  FILTER
    180   SORT GROUP BY
 101996    HASH JOIN
    180     TABLE ACCESS FULL OBJ#(130114603)
1082254     INDEX FAST FULL SCAN OBJ#(130114605) (object id 130114605)


Without stats

********************************************************************************

SELECT a.TAccountID,
a.ImpAccountID,
a.HistoryDate,
a.ImpAccountType,
a.AdjustedBalance,
a.AdjustedBalanceType,
:"SYS_B_0"
FROM ImpAccountHistory a
WHERE a.TAccountID = :"SYS_B_1"
and Adjustedbalance is NOT NULL
AND a.ImpAccountType = :"SYS_B_2"
AND a.HistoryDate =
(select Max ( HistoryDate )
From ImpAccountHistory
Where ImpAccountID = a.ImpAccountID
and HistoryDate <= to_date(:"SYS_B_3", :"SYS_B_4")
)

call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ----------  ---------
Parse        1      0.00       0.00          0          0          0          0
Execute      1      0.00       0.00          0          0          0          0
Fetch        2      0.05       0.05       1554       3241          0          2
------- ------  -------- ---------- ---------- ---------- ----------  ---------
total        4      0.05       0.05       1554       3241          0          2

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 39

Rows     Row Source Operation
-------  ---------------------------------------------------
      2  FILTER
    180   TABLE ACCESS BY INDEX ROWID OBJ#(130114603)
   2936    INDEX RANGE SCAN OBJ#(130114608) (object id 130114608)
      4   SORT AGGREGATE
   1870    TABLE ACCESS BY INDEX ROWID OBJ#(130114603)
   1872     INDEX RANGE SCAN OBJ#(130114607) (object id 130114607)

********************************************************************************


variable SYS_B_0 'E'
variable SYS_B_1 96
variable SYS_B_2 '2'
variable SYS_B_3 '30Nov2010'
variable SYS_B_4 'ddmonyyyy'

Thanks
Sanji
Tom Kyte
December 21, 2010 - 11:50 am UTC

ugh, why oh why oh WHY are you using cursor_sharing=force/similar??????

which mode are you using and WHY.


what happens if you - in this one case - use a first_rows hint on that?
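That is, a sketch of where the hint would go - everything else in the original query unchanged:

SELECT /*+ FIRST_ROWS */
       a.TAccountID, a.ImpAccountID, a.HistoryDate,
       a.ImpAccountType, a.AdjustedBalance, a.AdjustedBalanceType, 'E'
  FROM ur.ImpAccountHistory a
 WHERE a.TAccountID = 96
   and Adjustedbalance is NOT NULL
   AND a.ImpAccountType = '2'
   AND a.HistoryDate = (select Max( HistoryDate )
                          From ur.ImpAccountHistory
                         Where ImpAccountID = a.ImpAccountID
                           and HistoryDate <= to_date('30Nov2010','ddmonyyyy'));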

collect stats deteriorate performance

Sanji, December 21, 2010 - 1:10 pm UTC

Well, I tried with cursor_sharing=EXACT... and restarted the database (in dev).

The stats on this table were collected as

exec dbms_stats.gather_table_stats('sanji','impaccounthistory',method_opt=>'for all columns size 100',cascade=>true)

OPEN:SANJI:TRECS@ORADEV1>show parameter cursor_sharing

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cursor_sharing                       string      EXACT

OPEN:SANJI:TRECS@ORADEV1>alter session set optimizer_mode = first_rows;

Session altered.

OPEN:SANJI:TRECS@ORADEV1>explain plan for
SELECT a.TAccountID,
a.ImpAccountID,
a.HistoryDate,
a.ImpAccountType,
a.ClosingBalance,
a.ClosingBalanceType,
'P'
FROM ImpAccountHistory a
WHERE a.TAccountID = 96
and Adjustedbalance is NULL
AND a.ImpAccountType = '1'
AND a.HistoryDate =
(select Max ( HistoryDate )
From ImpAccountHistory
Where ImpAccountID = a.ImpAccountID
and HistoryDate <= to_date('30Nov2010', 'ddmonyyyy')
) ;

Explained.

OPEN:SANJI:TRECS@ORADEV1>select * from table (dbms_xplan.display());

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------
| Id  | Operation                       | Name                  | Rows  | Bytes | Cost |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                       |    34 |  1564 |  862 |
|*  1 |  FILTER                         |                       |       |       |      |
|   2 |   SORT GROUP BY                 |                       |    34 |  1564 |  862 |
|*  3 |    HASH JOIN                    |                       | 36186 | 1625K |  706 |
|*  4 |     TABLE ACCESS BY INDEX ROWID | IMPACCOUNTHISTORY     |   172 |  5848 |  381 |
|*  5 |      INDEX RANGE SCAN           | IMPACTHIST_TACTID_IDX |  2936 |       |    8 |
|*  6 |     INDEX FAST FULL SCAN        | PK_IMPACTHIST         | 1076K |   12M |  321 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("A"."HISTORYDATE"=MAX("IMPACCOUNTHISTORY"."HISTORYDATE"))
3 - access("IMPACCOUNTHISTORY"."IMPACCOUNTID"="A"."IMPACCOUNTID")
4 - filter("A"."ADJUSTEDBALANCE" IS NULL AND "A"."IMPACCOUNTTYPE"='1')
5 - access("A"."TACCOUNTID"=96)
6 - filter("IMPACCOUNTHISTORY"."HISTORYDATE"<=TO_DATE('2010-11-30 00:00:00',
'yyyy-mm-dd hh24:mi:ss'))

Note: cpu costing is off

24 rows selected.


Without stats this time (with optimizer_mode as first_rows), the explain plan is the same as the one with stats.

OPEN:SANJI:TRECS@ORADEV1>exec dbms_stats.delete_table_stats('sanji','impaccounthistory')

PL/SQL procedure successfully completed.

OPEN:SANJI:TRECS@ORADEV1>delete from plan_table;

7 rows deleted.

OPEN:SANJI:TRECS@ORADEV1>explain plan for
  2  SELECT a.TAccountID,
  3         a.ImpAccountID,
  4         a.HistoryDate,
  5         a.ImpAccountType,
  6         a.ClosingBalance,
  7         a.ClosingBalanceType,
  8         'P'
  9  FROM ImpAccountHistory a
 10  WHERE a.TAccountID = 96
 11  and Adjustedbalance is NULL
 12  AND a.ImpAccountType = '1'
 13  AND a.HistoryDate =
 14  (select Max ( HistoryDate )
 15   From ImpAccountHistory
 16   Where ImpAccountID = a.ImpAccountID
 17   and HistoryDate <= to_date('30Nov2010', 'ddmonyyyy')
 18  )
 19
OPEN:SANJI:TRECS@ORADEV1>/

Explained.

OPEN:SANJI:TRECS@ORADEV1>select * from table (dbms_xplan.display());

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------
| Id  | Operation                       | Name                  | Rows | Bytes | Cost |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                       |    3 |   288 |    7 |
|*  1 |  FILTER                         |                       |      |       |      |
|   2 |   SORT GROUP BY                 |                       |    3 |   288 |    7 |
|   3 |    NESTED LOOPS                 |                       |    3 |   288 |    5 |
|*  4 |     TABLE ACCESS BY INDEX ROWID | IMPACCOUNTHISTORY     |    3 |   222 |    2 |
|*  5 |      INDEX RANGE SCAN           | IMPACTHIST_TACTID_IDX | 2575 |       |    1 |
|*  6 |     INDEX RANGE SCAN            | PK_IMPACTHIST         |    1 |    22 |    1 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("A"."HISTORYDATE"=MAX("IMPACCOUNTHISTORY"."HISTORYDATE"))
4 - filter("A"."ADJUSTEDBALANCE" IS NULL AND "A"."IMPACCOUNTTYPE"='1')
5 - access("A"."TACCOUNTID"=96)
6 - access("IMPACCOUNTHISTORY"."IMPACCOUNTID"="A"."IMPACCOUNTID" AND
"IMPACCOUNTHISTORY"."HISTORYDATE"<=TO_DATE('2010-11-30 00:00:00', 'yyyy-mm-dd
hh24:mi:ss'))

Note: cpu costing is off

24 rows selected.

However, if I rerun this SQL with optimizer_mode as choose, the explain plan starts using the efficient index.

OPEN:SANJI:TRECS@ORADEV1>alter session set optimizer_mode=choose;

OPEN:SANJI:TRECS@ORADEV1>explain plan for
  2  SELECT a.TAccountID,
  3         a.ImpAccountID,
  4         a.HistoryDate,
  5         a.ImpAccountType,
  6         a.ClosingBalance,
  7         a.ClosingBalanceType,
  8         'P'
  9  FROM ImpAccountHistory a
 10  WHERE a.TAccountID = 96
 11  and Adjustedbalance is NULL
 12  AND a.ImpAccountType = '1'
 13  AND a.HistoryDate =
 14  (select Max ( HistoryDate )
 15   From ImpAccountHistory
 16   Where ImpAccountID = a.ImpAccountID
 17   and HistoryDate <= to_date('30Nov2010', 'ddmonyyyy')
 18  )
 19
OPEN:SANJI:TRECS@ORADEV1>/

Explained.

Elapsed: 00:00:00.00
OPEN:SANJI:TRECS@ORADEV1>select * from table (dbms_xplan.display());

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------
| Id  | Operation                       | Name                  | Rows | Bytes | Cost |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                       |      |       |      |
|*  1 |  FILTER                         |                       |      |       |      |
|*  2 |   TABLE ACCESS BY INDEX ROWID   | IMPACCOUNTHISTORY     |      |       |      |
|*  3 |    INDEX RANGE SCAN             | IMPACTHIST_TACTID_IDX |      |       |      |
|   4 |   SORT AGGREGATE                |                       |      |       |      |
|*  5 |    TABLE ACCESS BY INDEX ROWID  | IMPACCOUNTHISTORY     |      |       |      |
|*  6 |     INDEX RANGE SCAN            | IMPACTHIST_FK         |      |       |      |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("SYS_ALIAS_1"."HISTORYDATE"= (SELECT
MAX("IMPACCOUNTHISTORY"."HISTORYDATE") FROM "IMPACCOUNTHISTORY" "IMPACCOUNTHISTORY"
WHERE "IMPACCOUNTHISTORY"."IMPACCOUNTID"=:B1 AND
"IMPACCOUNTHISTORY"."HISTORYDATE"<=TO_DATE('2010-11-30 00:00:00', 'yyyy-mm-dd
hh24:mi:ss')))
2 - filter("SYS_ALIAS_1"."IMPACCOUNTTYPE"='1' AND "SYS_ALIAS_1"."ADJUSTEDBALANCE" IS
NULL)
3 - access("SYS_ALIAS_1"."TACCOUNTID"=96)
5 - filter("IMPACCOUNTHISTORY"."HISTORYDATE"<=TO_DATE('2010-11-30 00:00:00',
'yyyy-mm-dd hh24:mi:ss'))
6 - access("IMPACCOUNTHISTORY"."IMPACCOUNTID"=:B1)

Note: rule based optimization

29 rows selected.

Elapsed: 00:00:00.02

A reader, December 21, 2010 - 10:26 pm UTC

could you try the below query and see..

select *
from
(
  SELECT a.TAccountID, a.ImpAccountID, a.HistoryDate, a.ImpAccountType,
         a.ClosingBalance, a.ClosingBalanceType, 'P',
         MAX(a.HistoryDate) over (partition by a.ImpAccountID) as max_dt
  FROM ImpAccountHistory a
  WHERE a.TAccountID = 96
    and Adjustedbalance is NULL
    AND a.ImpAccountType = '1'
    AND HistoryDate <= to_date('30Nov2010', 'ddmonyyyy')
) where max_dt = HistoryDate;

Thanks
Tom Kyte
December 22, 2010 - 2:01 pm UTC

that isn't the same.

they are getting:

(select Max ( HistoryDate )
 From ImpAccountHistory
 Where ImpAccountID = a.ImpAccountID
 and HistoryDate <= to_date('30Nov2010', 'ddmonyyyy')


the max history date for an impaccountid


YOU are getting the max history date for a given taccountid, adjustedbalance, impaccounttype and impaccountid.


You cannot evaluate that where clause and then get the max history date; you would have to get the max history date (for all rows) AND THEN apply the predicate.

collecting stats deteriorates performance..

Rajeshwaran, Jeyabal, December 22, 2010 - 6:21 am UTC

Tom:

How did you identify cursor_sharing=force/similar from the TKPROF results? Is it from the bind variables shown there?


variable SYS_B_0 'E'
variable SYS_B_1 96
variable SYS_B_2 '2'
variable SYS_B_3 '30Nov2010'
variable SYS_B_4 'ddmonyyyy'

Tom Kyte
December 22, 2010 - 2:29 pm UTC

the bind variable names are a giveaway. the fact that your query used literals - but the tkprof shows binds - is another.

A reader, December 22, 2010 - 5:23 pm UTC

Hi Tom,

You are correct, my bad, but we can rewrite as below, which would be the same:

Select x.*
from
(
SELECT a.TAccountID,
a.ImpAccountID,
a.HistoryDate,
a.ImpAccountType,
a.ClosingBalance,
a.ClosingBalanceType,
'P'
FROM ImpAccountHistory a
WHERE a.TAccountID = 96
and Adjustedbalance is NULL
AND a.ImpAccountType = '1'
)X
Join
( select Max(HistoryDate) as max_dt, ImpAccountID
From ImpAccountHistory
Where HistoryDate <= to_date('30Nov2010', 'ddmonyyyy')
group by ImpAccountID
)y
ON x.ImpAccountID = y.ImpAccountID AND x.HistoryDate = y.max_dt


Also, will the below query be more efficient than the above, since we are scanning the table only once in the below query?

Select *
From
(
SELECT a.TAccountID,
a.ImpAccountID,
a.HistoryDate,
a.ImpAccountType,
a.ClosingBalance,
a.ClosingBalanceType,
'P',
case when a.TAccountID = 96 AND Adjustedbalance is NULL AND a.ImpAccountType = '1' Then 1 END as cond_1,
Max(
case when HistoryDate <= to_date('30Nov2010', 'ddmonyyyy') then
HistoryDate
END
)over(partition by ImpAccountID) as max_hist_dt


FROM ImpAccountHistory a
)X
where cond_1=1 AND max_hist_dt= HistoryDate

Thanks

Tom Kyte
December 22, 2010 - 5:42 pm UTC

... Also, will the below query be more efficient than the above, since we are scanning the table
only once in the below query?
....

it depends - and - the optimizer has the ability and the smarts to do that sort of rewrite itself - IF it makes sense.

It doesn't have to scan the table in the subquery at all - if it makes sense.

there are lots of ways to rewrite the query.

Shaun, December 23, 2010 - 11:06 am UTC

Hi Tom,
There have been a lot of changes in the database I am working on (10g).
How can you make Oracle analyze it and come up with new execution plans?
I am looking for the script/command.
Tom Kyte
December 23, 2010 - 11:24 am UTC

really? don't you have a DBA to help out?

you might need to read up on dbms_stats.

http://docs.oracle.com/docs/cd/E11882_01/server.112/e16638/stats.htm
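As a starting point, a minimal sketch of the kind of dbms_stats call to read up on (the scope and options here are assumptions to adapt, not a prescription):

begin
  dbms_stats.gather_schema_stats(
    ownname => user,
    options => 'GATHER STALE',   -- only re-gather objects whose stats are stale
    cascade => true );
end;
/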

Histogram in 10GR2

Rajeshwaran, Jeyabal, January 11, 2011 - 2:36 pm UTC

Tom:

Referring to the product documentation below

http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14258/d_stats.htm#i1036461

in 10g I see the default value for method_opt as below, but in 9iR2 it's 'FOR ALL COLUMNS SIZE 1'.

rajesh@10GR2> select dbms_stats.get_param('METHOD_OPT') from dual;


DBMS_STATS.GET_PARAM('METHOD_OPT')
--------------------------------------------------------------------
FOR ALL COLUMNS SIZE AUTO

Elapsed: 00:00:00.23
rajesh@10GR2>


Question

1) If I am working in an Oracle 10g database, I don't need to work out SIZE 'N' - by leaving it at AUTO (either FOR ALL COLUMNS SIZE AUTO or FOR ALL INDEXED COLUMNS SIZE AUTO), Oracle determines the best. Is that good, or do I still need to benchmark?
Tom Kyte
January 12, 2011 - 10:14 am UTC

In many/most cases - you don't want histograms - eg: in an OLTP system using bind variables - it would be the exception to WANT histograms. In a report system/warehouse with fewer bind variables - then you would want histograms.

You need to think about whether you even want histograms first. And when you do - and if you KNOW which columns you want histograms on - just use size 254 (if there are fewer than 254 distinct values - we'll only have that many buckets).

If you want histograms but you don't know which columns you want them on - then the default is as good a starting place as any.
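For the "you KNOW which columns" case, a sketch of requesting the 254-bucket histogram on just that column (table and column names are placeholders):

begin
  dbms_stats.gather_table_stats(
    ownname    => user,
    tabname    => 'T',
    method_opt => 'FOR COLUMNS SKEWED_COL SIZE 254' );  -- histogram on one known column
end;
/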

Histogram from Oracle Magazine Jan 2011

Rajeshwaran, Jeyabal, January 21, 2011 - 6:53 am UTC

rajesh@10GR2> 
rajesh@10GR2> begin
  2   dbms_stats.gather_table_stats(
  3    ownname=>user,
  4    tabname=>'T',
  5    method_opt=>'FOR ALL COLUMNS SIZE 12',
  6    estimate_percent=>100);
  7  end;
  8  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.78
rajesh@10GR2> 
rajesh@10GR2> set autotrace on explain;
rajesh@10GR2> select 'len30',count(*) from t where len_30 = rpad('x',30,'x')||'A'
  2  union all
  3  select 'len30',count(*) from t where len_30 = rpad('x',30,'x')||'B'
  4  union all
  5  select 'len31',count(*) from t where len_31 = rpad('x',31,'x')||'A'
  6  union all
  7  select 'len31',count(*) from t where len_31 = rpad('x',31,'x')||'B'
  8  union all
  9  select 'len32',count(*) from t where len_32 = rpad('x',32,'x')||'A'
 10  union all
 11  select 'len32',count(*) from t where len_32 = rpad('x',32,'x')||'B'
 12  union all
 13  select 'len33',count(*) from t where len_33 = rpad('x',33,'x')||'A'
 14  union all
 15  select 'len33',count(*) from t where len_33 = rpad('x',33,'x')||'B'
 16  /

LEN3   COUNT(*)
----- ----------
len30      23061
len30       3062
len31      23061
len31       3062
len32      23061
len32       3062
len33      23061
len33       3062

8 rows selected.

Elapsed: 00:00:00.14

Execution Plan
----------------------------------------------------------
Plan hash value: 2994643399

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     8 |   268 |  1981  (88)| 00:00:24 |
|   1 |  UNION-ALL          |      |       |       |            |          |
|   2 |   SORT AGGREGATE    |      |     1 |    32 |            |          |
|*  3 |    TABLE ACCESS FULL| T    | 23061 |   720K|   247   (1)| 00:00:03 |
|   4 |   SORT AGGREGATE    |      |     1 |    32 |            |          |
|*  5 |    TABLE ACCESS FULL| T    |  3062 | 97984 |   247   (1)| 00:00:03 |
|   6 |   SORT AGGREGATE    |      |     1 |    33 |            |          |
|*  7 |    TABLE ACCESS FULL| T    | 23061 |   743K|   248   (1)| 00:00:03 |
|   8 |   SORT AGGREGATE    |      |     1 |    33 |            |          |
|*  9 |    TABLE ACCESS FULL| T    |  3062 |    98K|   248   (1)| 00:00:03 |
|  10 |   SORT AGGREGATE    |      |     1 |    34 |            |          |
|* 11 |    TABLE ACCESS FULL| T    | 56745 |  1884K|   248   (1)| 00:00:03 |
|  12 |   SORT AGGREGATE    |      |     1 |    34 |            |          |
|* 13 |    TABLE ACCESS FULL| T    | 56745 |  1884K|   248   (1)| 00:00:03 |
|  14 |   SORT AGGREGATE    |      |     1 |    35 |            |          |
|* 15 |    TABLE ACCESS FULL| T    | 56745 |  1939K|   248   (1)| 00:00:03 |
|  16 |   SORT AGGREGATE    |      |     1 |    35 |            |          |
|* 17 |    TABLE ACCESS FULL| T    | 56745 |  1939K|   248   (1)| 00:00:03 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("LEN_30"='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxA')
   5 - filter("LEN_30"='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxB')
   7 - filter("LEN_31"='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxA')
   9 - filter("LEN_31"='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxB')
  11 - filter("LEN_32"='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxA')
  13 - filter("LEN_32"='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxB')
  15 - filter("LEN_33"='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxA')
  17 - filter("LEN_33"='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxB')

rajesh@10GR2>


Now, for steps 3 to 9 in the explain plan, the estimated cardinality matches the actual counts.

But when using method_opt=>'FOR ALL COLUMNS SIZE AUTO', I see the following results

(the explain plan estimated cardinality *DOES NOT* match the actual counts)

rajesh@10GR2> create table t
  2  nologging
  3  as
  4  select
  5   rpad('x',30,'x')||
  6    chr ( ascii('A') + case when rownum < 20000 then 0 else mod(rownum,12) end ) as len_30,
  7   rpad('x',31,'x')||
  8    chr ( ascii('A') + case when rownum < 20000 then 0 else mod(rownum,12) end ) as len_31,
  9   rpad('x',32,'x')||
 10    chr ( ascii('A') + case when rownum < 20000 then 0 else mod(rownum,12) end ) as len_32,
 11   rpad('x',33,'x')||
 12    chr ( ascii('A') + case when rownum < 20000 then 0 else mod(rownum,12) end ) as len_33
 13  from all_objects;

Table created.

Elapsed: 00:00:06.48
rajesh@10GR2> 
rajesh@10GR2> select dbms_stats.get_param('METHOD_OPT') from dual;

DBMS_STATS.GET_PARAM('METHOD_OPT')
-----------------------------------------------------------------------------------------------------------------------------------
FOR ALL COLUMNS SIZE AUTO

Elapsed: 00:00:00.04
rajesh@10GR2> 
rajesh@10GR2> begin
  2   dbms_stats.gather_table_stats(
  3    ownname=>user,
  4    tabname=>'T',
  5    method_opt=>'FOR ALL COLUMNS SIZE AUTO',
  6    estimate_percent=>100);
  7  end;
  8  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.86
rajesh@10GR2> 
rajesh@10GR2> 
rajesh@10GR2> select table_name, column_name, endpoint_number, endpoint_value
  2  from user_tab_histograms
  3  where table_name ='T'
  4  order by column_name
  5  /

TABLE_NAME COLUMN_NAM ENDPOINT_NUMBER                        ENDPOINT_VALUE
---------- ---------- --------------- -------------------------------------
T          LEN_30                   0  625519056839960000000000000000000000
T          LEN_30                   1  625519056839960000000000000000000000
T          LEN_31                   0  625519056839960000000000000000000000
T          LEN_31                   1  625519056839960000000000000000000000
T          LEN_32                   1  625519056839960000000000000000000000
T          LEN_32                   0  625519056839960000000000000000000000
T          LEN_33                   1  625519056839960000000000000000000000
T          LEN_33                   0  625519056839960000000000000000000000

8 rows selected.

Elapsed: 00:00:00.06
rajesh@10GR2>
rajesh@10GR2> set autotrace on explain;
rajesh@10GR2> 
rajesh@10GR2> select 'len30',count(*) from t where len_30 = rpad('x',30,'x')||'A'
  2  union all
  3  select 'len30',count(*) from t where len_30 = rpad('x',30,'x')||'B'
  4  union all
  5  select 'len31',count(*) from t where len_31 = rpad('x',31,'x')||'A'
  6  union all
  7  select 'len31',count(*) from t where len_31 = rpad('x',31,'x')||'B'
  8  union all
  9  select 'len32',count(*) from t where len_32 = rpad('x',32,'x')||'A'
 10  union all
 11  select 'len32',count(*) from t where len_32 = rpad('x',32,'x')||'B'
 12  union all
 13  select 'len33',count(*) from t where len_33 = rpad('x',33,'x')||'A'
 14  union all
 15  select 'len33',count(*) from t where len_33 = rpad('x',33,'x')||'B'
 16  /

LEN3   COUNT(*)
----- ----------
len30      23061
len30       3062
len31      23061
len31       3062
len32      23061
len32       3062
len33      23061
len33       3062

8 rows selected.

Elapsed: 00:00:00.15

Execution Plan
----------------------------------------------------------
Plan hash value: 2994643399

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     8 |   268 |  1981  (88)| 00:00:24 |
|   1 |  UNION-ALL          |      |       |       |            |          |
|   2 |   SORT AGGREGATE    |      |     1 |    32 |            |          |
|*  3 |    TABLE ACCESS FULL| T    |  4729 |   147K|   247   (1)| 00:00:03 |
|   4 |   SORT AGGREGATE    |      |     1 |    32 |            |          |
|*  5 |    TABLE ACCESS FULL| T    |  4729 |   147K|   247   (1)| 00:00:03 |
|   6 |   SORT AGGREGATE    |      |     1 |    33 |            |          |
|*  7 |    TABLE ACCESS FULL| T    |  4729 |   152K|   248   (1)| 00:00:03 |
|   8 |   SORT AGGREGATE    |      |     1 |    33 |            |          |
|*  9 |    TABLE ACCESS FULL| T    |  4729 |   152K|   248   (1)| 00:00:03 |
|  10 |   SORT AGGREGATE    |      |     1 |    34 |            |          |
|* 11 |    TABLE ACCESS FULL| T    |  4729 |   157K|   248   (1)| 00:00:03 |
|  12 |   SORT AGGREGATE    |      |     1 |    34 |            |          |
|* 13 |    TABLE ACCESS FULL| T    |  4729 |   157K|   248   (1)| 00:00:03 |
|  14 |   SORT AGGREGATE    |      |     1 |    35 |            |          |
|* 15 |    TABLE ACCESS FULL| T    |  4729 |   161K|   248   (1)| 00:00:03 |
|  16 |   SORT AGGREGATE    |      |     1 |    35 |            |          |
|* 17 |    TABLE ACCESS FULL| T    |  4729 |   161K|   248   (1)| 00:00:03 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("LEN_30"='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxA')
   5 - filter("LEN_30"='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxB')
   7 - filter("LEN_31"='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxA')
   9 - filter("LEN_31"='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxB')
  11 - filter("LEN_32"='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxA')
  13 - filter("LEN_32"='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxB')
  15 - filter("LEN_33"='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxA')
  17 - filter("LEN_33"='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxB')

rajesh@10GR2>


Questions:

1) I should NOT use SIZE AUTO in the method_opt parameter if the actual value of SIZE is known (like in this case, 12). Is that correct?
Tom Kyte
January 24, 2011 - 7:15 am UTC

it depends. I was showing a crafted example where I wanted to show exact, precise numbers for a demonstration. I needed something predictable and repeatable. Hence I maxed things out: I computed, not estimated, and I made it 100% repeatable. In real life, you don't need that.


Estimates are estimates. If size auto gets you plans that are better than good enough - then it is better than good enough.
