Hi Team,
I have question regarding of situation i am facing.
i have table let have example table "IP" :
table IP has one column ID i.e IP(ID);
below is value distribution :-
SQL> select count(*),id from IP group by id;
COUNT(*) ID
---------- --------------------
25 a
50 b
75 c
i would like to know if i create index on IP(ID) i will definitely get benefit , and i am thinking of creating histogram as well it will benefit me for values (a,b,c) .
but if i pass value in where clause for "id" , other than listed in ID column will collecting Histogram help me or it might be chances that it would go out of bound and might go in FTS or index full scan ??? OR will it help me any way in performance gain factor ???
example:-
select count(*) from IP where id='h';====> will gathering histogram help ???
Will adding an index
definitely improve performance?
No:
create table t as
select mod(level, 3) x, lpad('x', 500, 'x') y
from dual
connect by level <= 1000;
set serveroutput off
select /*+ gather_plan_statistics */x, count(*) from t
group by x;
X COUNT(*)
1 334
2 333
0 333
select *
from table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST'));
PLAN_TABLE_OUTPUT
SQL_ID 0v787jt578k0y, child number 0
-------------------------------------
select /*+ gather_plan_statistics */x, count(*) from t group by x
Plan hash value: 674246460
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 3 |00:00:00.01 | 84 | 1 |
| 1 | HASH GROUP BY | | 1 | 1043 | 3 |00:00:00.01 | 84 | 1 |
| 2 | TABLE ACCESS FULL| T | 1 | 1043 | 1000 |00:00:00.01 | 84 | 1 |
----------------------------------------------------------------------------------------------
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
create index i on t (x);
select /*+ gather_plan_statistics */x, count(*) from t
group by x;
X COUNT(*)
1 334
2 333
0 333
select *
from table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST'));
PLAN_TABLE_OUTPUT
SQL_ID 0v787jt578k0y, child number 0
-------------------------------------
select /*+ gather_plan_statistics */x, count(*) from t group by x
Plan hash value: 674246460
-------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 3 |00:00:00.01 | 84 |
| 1 | HASH GROUP BY | | 1 | 1043 | 3 |00:00:00.01 | 84 |
| 2 | TABLE ACCESS FULL| T | 1 | 1043 | 1000 |00:00:00.01 | 84 |
-------------------------------------------------------------------------------------
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Hmmm. I added an index but we still have a full table scan!
What's going on?
Oracle Database excludes rows where all the columns are null from an index. You're reading all the rows (no where clause). So a full table scan is the only option.
But make the column not null and:
alter table t modify x not null;
select /*+ gather_plan_statistics */x, count(*) from t
group by x;
X COUNT(*)
1 334
2 333
0 333
select *
from table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST'));
PLAN_TABLE_OUTPUT
SQL_ID 0v787jt578k0y, child number 0
-------------------------------------
select /*+ gather_plan_statistics */x, count(*) from t group by x
Plan hash value: 1339806295
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 3 |00:00:00.01 | 5 | 3 |
| 1 | HASH GROUP BY | | 1 | 1043 | 3 |00:00:00.01 | 5 | 3 |
| 2 | INDEX FAST FULL SCAN| I | 1 | 1043 | 1000 |00:00:00.01 | 5 | 3 |
-------------------------------------------------------------------------------------------------
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
WOOOOO! A fast full index scan! :)
This does a lot less work than the full table scan (notice the much smaller values for buffers 5 vs 84).
So an index is likely to help provided:
- The column(s) are declared not null
- The index is smaller than the table
So what about histograms?
These don't help directly. They're a tool to improve row estimates. i.e. how much data the optimizer thinks each step of the plan will process.
The closer these estimates are to reality, the more likely you are to have a good plan. A histogram helps if there's large skew in your data. For example, a database of people's addresses will have far more rows for China than Chile.
- The more rows you access (China), the more likely it is a full table scan is the best method.
- The fewer rows you fetch (Chile), the more likely it is an index range scan is optimal.
But there's a lot more to histograms than this. I recommend reading up on how they work in the concepts guide:
https://docs.oracle.com/en/database/oracle/oracle-database/18/tgsql/histograms.html#GUID-BE10EBFC-FEFC-4530-90DF-1443D9AD9B64 And this post from Maria Colgan on common histogram myths:
https://sqlmaria.com/2017/05/09/optimizer-histograms/