Skip to Main Content

Breadcrumb

May 4th

Question and Answer

Chris Saxon

Thanks for the question, savitha.

Asked: December 06, 2019 - 2:40 pm UTC

Last updated: December 09, 2019 - 10:45 am UTC

Version: Oracle 12 c

Viewed 1000+ times

You Asked

Can you please provide your opinion on the below point. This is what I have noticed.

When we create a table using a CTAS, and then check the user_Tables, the last_analyzed and num_rows column is already populated with accurate data. If it is so, is there a need to gather the table stats explicitly every time a table with CTAS is created. If not, we can further see improvement in your process runs by removing these gather stats if they are not relevant. Because, we see that table with huge data , gather stats takes around 15 mins of time.


and Chris said...

While the database does gather stats during CTAS and other bulk load operations, it doesn't capture every stat.

For example, histograms are missing:

create table t (
  c1, c2, c3
) as
  select level, 
         floor ( 100 / level ), 
         case
           when level = 1 then 1
           else 99999
         end 
  from   dual
  connect by level <= 10000;
  
info+ t

TABLE: T 
  LAST ANALYZED:2019-12-06 
  ROWS         :10000 
  SAMPLE SIZE  :10000 
  INMEMORY     :DISABLED 
  COMMENTS     : 

Columns 
NAME         DATA TYPE   NULL  DEFAULT  LOW_VALUE HIGH_VALUE NUM_DISTINCT HISTOGRAM  
 C1          NUMBER      Yes                    1  100000000        10000 NONE       
 C2          NUMBER      Yes                    0        100           20 NONE       
 C3          NUMBER      Yes                    1      99999            2 NONE  

select count (*)
from   t
where  c1 = 1
and    c2 = 1
and    c3 = 1;

exec dbms_stats.gather_table_stats ( user, 't' ) ;

info+ t

TABLE: T 
  LAST ANALYZED:2019-12-06 
  ROWS         :10000 
  SAMPLE SIZE  :10000 
  INMEMORY     :DISABLED 
  COMMENTS     : 

Columns 
NAME         DATA TYPE   NULL  DEFAULT  LOW_VALUE HIGH_VALUE NUM_DISTINCT HISTOGRAM  
 C1          NUMBER      Yes                    1  100000000        10000 NONE       
 C2          NUMBER      Yes                    0        100           20 FREQUENCY       
 C3          NUMBER      Yes                    1      99999            2 FREQUENCY


So the question for you is:

Does this matter?

If you get good enough plans with the basic stats, then there's no need to gather them again.

On the other hand, if you have lots of queries against columns with highly skewed data, histograms may be essential.

So really you need to test:

Do you save more time by skipping the manual gather than you lose by having (potentially) worse plans?

Rating

  (1 rating)

Is this answer out of date? If it is, please let us know via a Comment

Comments

on GATHER AUTO , IOT and Partitioned tables.

Rajeshwaran, Jeyabal, December 07, 2019 - 1:17 pm UTC

By any chance if the underlying table is of IOT - if so then gather stats during CTAS wont happen.

demo@PDB1> create table t (x primary key)
  2  organization index
  3  as
  4  select user_id
  5  from all_users;

Table created.

demo@PDB1> select num_rows,last_analyzed
  2  from user_tab_statistics
  3  where table_name ='T';

  NUM_ROWS LAST_ANALYZ
---------- -----------


demo@PDB1> exec dbms_stats.gather_table_stats(user,'T');

PL/SQL procedure successfully completed.

demo@PDB1> select num_rows,last_analyzed
  2  from user_tab_statistics
  3  where table_name ='T';

  NUM_ROWS LAST_ANALYZ
---------- -----------
        48 07-DEC-2019

demo@PDB1>


By any chance if the underlying table is of partitioned - then partitioned level stats wont be available duirng CTAS.
However we can use the GATHER AUTO option, to just regather only those missing stats.

demo@PDB1> create table t
  2  partition by range( created )
  3  interval( numtoyminterval(1,'year') )
  4  ( partition p2016 values less than
  5     ( to_date('01-jan-2017','dd-mon-yyyy') ) )
  6  as
  7  select *
  8  from all_objects;

Table created.

demo@PDB1> @big_date

Session altered.

demo@PDB1> select partition_name,num_rows,last_analyzed
  2  from user_tab_statistics
  3  where table_name ='T';

PARTITION_NAME         NUM_ROWS LAST_ANALYZED
-------------------- ---------- -----------------------
                          68488 07-DEC-2019 06:29:32 pm
SYS_P6088
SYS_P6089
P2016

demo@PDB1> exec dbms_stats.gather_table_stats(user,'T',options=>'gather AUTO');

PL/SQL procedure successfully completed.

demo@PDB1> select partition_name,num_rows,last_analyzed
  2  from user_tab_statistics
  3  where table_name ='T';

PARTITION_NAME         NUM_ROWS LAST_ANALYZED
-------------------- ---------- -----------------------
                          68488 07-DEC-2019 06:29:32 pm
SYS_P6088                 67938 07-DEC-2019 06:30:42 pm
SYS_P6089                   550 07-DEC-2019 06:30:43 pm
P2016                         0 07-DEC-2019 06:30:40 pm

demo@PDB1>



One more, In the intial response, if the goal is the just build the Histograms post the queries that used to populate sys.col_usage$ - then we could still leverage GATHER AUTO option for it.
that would - just focus on gathering the missing portion of stats rather than re-gathering the entire stats.

demo@pdb1> create table t (
  2    c1, c2, c3
  3  ) as
  4    select level,
  5           floor ( 100 / level ),
  6           case
  7             when level = 1 then 1
  8             else 99999
  9           end
 10    from   dual
 11    connect by level <= 10000;


Table T created.

demo@pdb1> info+ T
TABLE: T
         LAST ANALYZED:2019-12-07 18:37:43.0
         ROWS         :10000
         SAMPLE SIZE  :10000
         INMEMORY     :DISABLED
         COMMENTS     :

Columns
NAME         DATA TYPE   NULL  DEFAULT    LOW_VALUE   HIGH_VALUE   NUM_DISTINCT   HISTOGRAM
 C1          NUMBER      Yes                  1           10000        10000          NONE
 C2          NUMBER      Yes                  0           100          20             NONE
 C3          NUMBER      Yes                  1           99999        2              NONE



demo@pdb1> select count (*)
  2  from   t
  3  where  c1 = 1
  4  and    c2 = 1
  5  and    c3 = 1;

  COUNT(*)
----------
         0


demo@pdb1> exec dbms_stats.gather_table_stats(user,'T', options=>'GATHER AUTO') ;

PL/SQL procedure successfully completed.

demo@pdb1> info+ T
TABLE: T
         LAST ANALYZED:2019-12-07 18:37:43.0
         ROWS         :10000
         SAMPLE SIZE  :10000
         INMEMORY     :DISABLED
         COMMENTS     :

Columns
NAME         DATA TYPE   NULL  DEFAULT    LOW_VALUE   HIGH_VALUE   NUM_DISTINCT   HISTOGRAM
 C1          NUMBER      Yes                  1           10000        10000          NONE
 C2          NUMBER      Yes                  0           100          20             FREQUENCY
 C3          NUMBER      Yes                  1           99999        2              FREQUENCY



demo@pdb1>

Chris Saxon
December 09, 2019 - 10:45 am UTC

Good examples.

More to Explore

Performance

Get all the information about database performance in the Database Performance guide.