impact of doing gather statistics frequently

Question and Answer

Connor McDonald

Thanks for the question.

Asked: September 24, 2015 - 9:50 am UTC

Last updated: September 28, 2015 - 3:52 am UTC

Version: 11.1.2

Viewed 10K+ times

You Asked

Hi all,
I am working on a data warehousing project where we are dealing with huge volumes of data. There are a lot of procedures where we create intermediate tables using CTAS and gather statistics on those tables for better performance. Each day we run gather stats on different tables, around 700k times in total. After a few days, we experienced extreme slowness in the database.

After investigation, the DBA told us that the slowness was due to excessive use of gather stats.

1. Can you please explain why the excessive use of gather stats leads to extreme slowness?
2. Can you please tell us how frequently we should gather statistics in the database?
3. After upgrading to 12c, do we still need to gather statistics for CTAS statements explicitly, or is it done by the optimizer automatically?

Thanks
shyam

and Chris said...

1. How does your DBA know that gather stats is causing the slowness? What evidence is there? I can't explain why (if it's actually the case) without further details about your system.

2. When there's sufficient change in the data that it affects query plans - i.e. the stats are stale. Ultimately it depends on how often and how much the data in the columns appearing in your where clauses changes. The default stale percentage (10%) is a good starting point. You can adjust this up or down on a table-by-table basis as necessary.
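For example, to have a table treated as stale after a 5% change instead of the default 10%, you could set its STALE_PERCENT preference (a minimal sketch; the schema and table names here are made up):

begin
  -- hypothetical schema/table; STALE_PERCENT is a standard DBMS_STATS preference
  dbms_stats.set_table_prefs(
    ownname => 'DWH',
    tabname => 'SALES',
    pname   => 'STALE_PERCENT',
    pvalue  => '5');
end;
/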

3. Yes, 12c gathers stats in CTAS statements. You can see this in the explain plan and by creating a table:

CHRIS>explain plan for
  2  create table t as
  3    select rownum r from dual
  4    connect by level <= 100;

Explained.

CHRIS>
CHRIS>select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------
Plan hash value: 1600317434

---------------------------------------------------------------------------------
| Id  | Operation                        | Name | Rows  | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | CREATE TABLE STATEMENT           |      |     1 |     3   (0)| 00:00:01 |
|   1 |  LOAD AS SELECT                  | T    |       |            |          |
|   2 |   OPTIMIZER STATISTICS GATHERING |      |     1 |     2   (0)| 00:00:01 |
|   3 |    COUNT                         |      |       |            |          |
|*  4 |     CONNECT BY WITHOUT FILTERING |      |       |            |          |
|   5 |      FAST DUAL                   |      |     1 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter(LEVEL<=100)

17 rows selected.

CHRIS>
CHRIS>create table t as
  2    select rownum r from dual
  3    connect by level <= 100;

Table created.

CHRIS>
CHRIS>select num_rows, to_char(last_analyzed, 'dd/mm hh24:mi') lasta, to_char(sysdate, 'dd/mm hh24:mi') sysd
  2  from   user_tables
  3  where  table_name = 'T';

  NUM_ROWS LASTA       SYSD
---------- ----------- -----------
       100 24/09 13:20 24/09 13:20


Comments

Dynamic

David Aldridge, September 24, 2015 - 9:22 pm UTC

I think that if you have tables with rapidly varying numbers of rows and ranges of values in them, and queries running against them that take a significant time -- say a few seconds or more -- then there's a strong case for not gathering statistics at all.

Remove statistics from the tables, lock the statistics, and let dynamic sampling sort it out.
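In DBMS_STATS terms, that's roughly the following (a sketch; the table name is hypothetical):

begin
  -- remove existing stats, then lock them so the auto stats job
  -- skips the table; the optimizer falls back to dynamic sampling
  dbms_stats.delete_table_stats(ownname => user, tabname => 'STAGING_TABLE');
  dbms_stats.lock_table_stats(ownname => user, tabname => 'STAGING_TABLE');
end;
/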

If you're using CTAS, then just do not gather statistics.

Dynamic statistics are not perfect, but they are never as misleading as stale statistics potentially can be, and you are guaranteed to only spend time gathering statistics that are actually useful to the optimiser.
Chris Saxon
September 25, 2015 - 1:23 am UTC

Agreed.

Especially in data warehouse load scenarios. I see many clients with logic like this (pseudocode):

procedure loader is
  rowcount pls_integer;
begin
  insert into staging_table
  select * from ...;  -- source query elided

  rowcount := sql%rowcount;

  logger('Rows created was ' || rowcount);

  dbms_stats.gather_table_stats('', 'STAGING_TABLE');
end loader;


Why bother with all that effort to calc stats? I mean, you KNOW how many rows are in the table, because you just logged how many there were!

Many a 'gather_table_stats' command in warehouse loads could be easily replaced with a 'set_table_stats'.
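In the loader above, that would be a one-line change (a sketch; only the row count is set, using the value we just logged):

  dbms_stats.set_table_stats('', 'STAGING_TABLE', numrows => rowcount);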

How about Column level stats ?

Rajeshwaran Jeyabal, September 27, 2015 - 3:06 am UTC

Many a 'gather_table_stats' command in warehouse loads could be easily replaced with a 'set_table_stats'.

This should work well for table-level stats, but how about column-level stats? How do I know num_nulls and the NDV for each column?
Connor McDonald
September 28, 2015 - 3:52 am UTC

There are a *lot* of cases where the column-level stats could easily be deduced.

For example, let's say I've got 100 million rows in my transaction table. I do a gather table stats, and the NDV for customers is (say) 2500. You can be pretty confident that if I load another 1 million rows, I'm probably *still* going to have 2500 distinct customers. Or maybe 2510. Or maybe I just go and look at my customers table - if that has grown from (say) 10,000 rows to 11,000 rows, I might opt for an NDV of 2700.

But for the vast majority of columns, the stats are readily inferred from simple knowledge of the application, eg

- transaction_date: high-water mark = today
- unique/primary key: NDV goes up by the number of rows loaded
- status fields: NDV remains relatively static

The goal is not to have *perfect* stats; the goal is to have stats *good enough* to get good plans.
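Setting those deduced column figures by hand might look like this (a sketch; the table, column, and values are made up to match the example above):

begin
  -- tell the optimizer the customer column has ~2500 distinct values
  -- and no nulls, without scanning 100 million rows;
  -- density = 1/NDV is the usual assumption for evenly distributed data
  dbms_stats.set_column_stats(
    ownname => user,
    tabname => 'TRANS',
    colname => 'CUSTOMER_ID',
    distcnt => 2500,
    nullcnt => 0,
    density => 1/2500);
end;
/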
