Skip to Main Content
  • Questions
  • Will creating indexes and histograms improve database performance?

Breadcrumb

May 4th

Question and Answer

Chris Saxon

Thanks for the question, Krishnaprasad.

Asked: February 27, 2018 - 1:58 pm UTC

Last updated: February 28, 2018 - 3:54 pm UTC

Version: 11.2.0.3

Viewed 1000+ times

You Asked

Hi Team,

I have question regarding of situation i am facing.

i have table let have example table "IP" :

table IP has one column ID i.e IP(ID);

below is value distribution :-

SQL> select count(*),id from IP group by id;

  COUNT(*) ID
---------- --------------------
        25 a
        50 b
        75 c


i would like to know if i create index on IP(ID) i will definitely get benefit , and i am thinking of creating histogram as well it will benefit me for values (a,b,c) .

but if i pass value in where clause for "id" , other than listed in ID column will collecting Histogram help me or it might be chances that it would go out of bound and might go in FTS or index full scan ??? OR will it help me any way in performance gain factor ???

example:-

select count(*) from IP where id='h';====> will gathering histogram  help ???

and Chris said...

Will adding an index definitely improve performance?

No:

create table t as 
  select mod(level, 3) x, lpad('x', 500, 'x') y
  from   dual
  connect by level <= 1000;

set serveroutput off
select /*+ gather_plan_statistics */x, count(*) from t
group  by x;

X   COUNT(*)   
  1        334 
  2        333 
  0        333 

select * 
from   table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST'));

PLAN_TABLE_OUTPUT                                                                                
SQL_ID  0v787jt578k0y, child number 0                                                            
-------------------------------------                                                            
select /*+ gather_plan_statistics */x, count(*) from t group  by x                               
                                                                                                 
Plan hash value: 674246460                                                                       
                                                                                                 
----------------------------------------------------------------------------------------------   
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |   
----------------------------------------------------------------------------------------------   
|   0 | SELECT STATEMENT   |      |      1 |        |      3 |00:00:00.01 |      84 |      1 |   
|   1 |  HASH GROUP BY     |      |      1 |   1043 |      3 |00:00:00.01 |      84 |      1 |   
|   2 |   TABLE ACCESS FULL| T    |      1 |   1043 |   1000 |00:00:00.01 |      84 |      1 |   
----------------------------------------------------------------------------------------------   
                                                                                                 
Note                                                                                             
-----                                                                                            
   - dynamic statistics used: dynamic sampling (level=2)  

create index i on t (x);

select /*+ gather_plan_statistics */x, count(*) from t
group  by x;

X   COUNT(*)   
  1        334 
  2        333 
  0        333 

select * 
from   table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST'));

PLAN_TABLE_OUTPUT                                                                       
SQL_ID  0v787jt578k0y, child number 0                                                   
-------------------------------------                                                   
select /*+ gather_plan_statistics */x, count(*) from t group  by x                      
                                                                                        
Plan hash value: 674246460                                                              
                                                                                        
-------------------------------------------------------------------------------------   
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |   
-------------------------------------------------------------------------------------   
|   0 | SELECT STATEMENT   |      |      1 |        |      3 |00:00:00.01 |      84 |   
|   1 |  HASH GROUP BY     |      |      1 |   1043 |      3 |00:00:00.01 |      84 |   
|   2 |   TABLE ACCESS FULL| T    |      1 |   1043 |   1000 |00:00:00.01 |      84 |   
-------------------------------------------------------------------------------------   
                                                                                        
Note                                                                                    
-----                                                                                   
   - dynamic statistics used: dynamic sampling (level=2) 


Hmmm. I added an index but we still have a full table scan!

What's going on?

Oracle Database excludes rows where all the columns are null from an index. You're reading all the rows (no where clause). So a full table scan is the only option.

But make the column not null and:

alter table t modify x not null;

select /*+ gather_plan_statistics */x, count(*) from t
group  by x;

X   COUNT(*)   
  1        334 
  2        333 
  0        333 

select * 
from   table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST'));

PLAN_TABLE_OUTPUT                                                                                   
SQL_ID  0v787jt578k0y, child number 0                                                               
-------------------------------------                                                               
select /*+ gather_plan_statistics */x, count(*) from t group  by x                                  
                                                                                                    
Plan hash value: 1339806295                                                                         
                                                                                                    
-------------------------------------------------------------------------------------------------   
| Id  | Operation             | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |   
-------------------------------------------------------------------------------------------------   
|   0 | SELECT STATEMENT      |      |      1 |        |      3 |00:00:00.01 |       5 |      3 |   
|   1 |  HASH GROUP BY        |      |      1 |   1043 |      3 |00:00:00.01 |       5 |      3 |   
|   2 |   INDEX FAST FULL SCAN| I    |      1 |   1043 |   1000 |00:00:00.01 |       5 |      3 |   
-------------------------------------------------------------------------------------------------   
                                                                                                    
Note                                                                                                
-----                                                                                               
   - dynamic statistics used: dynamic sampling (level=2) 


WOOOOO! A fast full index scan! :)

This does a lot less work than the full table scan (notice the much smaller values for buffers 5 vs 84).

So an index is likely to help provided:

- The column(s) are declared not null
- The index is smaller than the table

So what about histograms?

These don't help directly. They're a tool to improve row estimates. i.e. how much data the optimizer thinks each step of the plan will process.

The closer these estimates are to reality, the more likely you are to have a good plan. A histogram helps if there's large skew in your data. For example, a database of people's addresses will have far more rows for China than Chile.

- The more rows you access (China), the more likely it is a full table scan is the best method.
- The fewer rows you fetch (Chile), the more likely it is an index range scan is optimal.

But there's a lot more to histograms than this. I recommend reading up on how they work in the concepts guide:

https://docs.oracle.com/en/database/oracle/oracle-database/18/tgsql/histograms.html#GUID-BE10EBFC-FEFC-4530-90DF-1443D9AD9B64

And this post from Maria Colgan on common histogram myths:

https://sqlmaria.com/2017/05/09/optimizer-histograms/

Rating

  (1 rating)

Is this answer out of date? If it is, please let us know via a Comment

Comments

A reader, February 27, 2018 - 5:07 pm UTC

Hi,

Thanks for sharing details,

Currently their is production issue , regarding similar query which I have mentioned .

Just want to know that in my table IP
Data is skew and value passed in filter is not present in Column of table

So does it sounding good idea for creating histogram in my case.


Chris Saxon
February 28, 2018 - 3:54 pm UTC

¯\_(ツ)_/¯

Sorry, we don't have enough details to say if a histogram is useful or not.

Look at your plans. Are they "wrong"? e.g. the optimizer chooses a full table scan instead of an index. Or vice-versa. And can this poor choice be explained by incorrect row estimates? As in, are the estimated number of rows significantly different to the actual number of rows?

Also:

Are you sure there isn't a histogram? Unless someone disabled creation of these, stats gathering is likely to create one if you do have data skew.


More to Explore

Administration

Need more information on Administration? Check out the Administrators guide for the Oracle Database