Skip to Main Content

Breadcrumb

May 4th

Question and Answer

Connor McDonald

Thanks for the question, RAUF.

Asked: June 27, 2018 - 7:23 am UTC

Last updated: June 29, 2018 - 4:22 am UTC

Version: 11g 11.2.0.4.0

Viewed 1000+ times

You Asked

Hi Tom,
I will explain my case in example.

I create simple table just with two columns:

create table OC_YKB.TAB_TEST1 ( GUID number, stat varchar2(1) );


Then insert 1 million of values where GUID is different values and STAT = 'Y'
and then 0.5 million of values with different GUID values and STAT = 'N'

create sequence oc_ykb.sq_test1;

BEGIN
   LOOP
      INSERT INTO OC_YKB.TAB_TEST1
           VALUES (oc_ykb.sq_test1.NEXTVAL, 'Y');

      IF MOD (oc_ykb.sq_test1.CURRVAL, 10000) = 0
      THEN
         COMMIT;
      END IF;

      EXIT WHEN oc_ykb.sq_test1.CURRVAL = 1000000;
   END LOOP;
END;

BEGIN
   LOOP
      INSERT INTO OC_YKB.TAB_TEST1
           VALUES (oc_ykb.sq_test1.NEXTVAL, 'N');

      IF MOD (oc_ykb.sq_test1.CURRVAL, 10000) = 0
      THEN
         COMMIT;
      END IF;

      EXIT WHEN oc_ykb.sq_test1.CURRVAL = 1500000;
   END LOOP;
END;


Then I create index on STAT column:

CREATE INDEX OC_YKB.IX_STAT ON OC_YKB.TAB_TEST1 (STAT);


a) Now is the problem: when I select rows with for example STAT = 'Z' (now rows will be selected) index gives very huge cardinality and cost , so Oracle prefer FULL TABLE SCAN. Why? The same happens with values between 'F' and 'Z'. If I search for 'A', 'B', 'C' the COST is 3 and index works fine. For search of 'D' COST goes much higher, for 'E' even more higher and after 'F' it is consistent big value ( that is quite higher than FULL TABLE SCAN COST )

select * from OC_YKB.TAB_TEST1
where stat = 'Z';


b) Now is another strange thing: If I insert just one row with STAT not in range between 'N' - 'Y' and gather table stats, Index starts to work perfectly for all STAT values from 'A' to 'Z'.

INSERT INTO OC_YKB.TAB_TEST1
           VALUES (oc_ykb.sq_test1.NEXTVAL, 'H');
commit;
exec DBMS_STATS.GATHER_TABLE_STATS(ownname => 'OC_YKB' , tabname => 'TAB_TEST1');

select * from OC_YKB.TAB_TEST1
where stat = 'Z';


question is the same, why ? :) and thanks in advance

and Connor said...

I'll give a quick verbal answer now, and come back later with a test case when I get a spare minute :-)

This will be a combination of things.

1) Boundary conditions

If I have values (say) 500 => 1000 in a column X, then by default, we will store the low and high value for that column when we gather stats (low=500, high=1000). Queries *outside* this range will be costed in a sliding scale based on their "distance" from the boundary. So X=499 will be costed pretty much the same as X=500, ie, same number of rows expected to be returned. X=250 will seen as far less likely because it is further outside the boundary. X=100 will seen as even less likely than that (ie, its expected cardinality will be very low). The lower the cardinality will lean things more toward an index over a full scan.

2) Auto histogram

Continuing from my example above, run a few queries of the type "X = value", or "X > value" etc, and we *record* that fact. We know that X is an "interesting" column for you. So the *next* time you gather stats, we will record *more* information about X - we will gather a histogram for that column. So now, we will (potentially) have information about *each* value of X in our table. So now, when I run query for X=value, then the probability of my estimate being more accurate is much higher.


Rating

  (2 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

further clarification

RAUF HUSEYNZADA, June 27, 2018 - 11:28 am UTC

Thanks for the answer. But concerning second point "2) Auto histogram" if I delete last one row for STAT = 'H' and gather statistics for table the problem comes again. Cardinality and Cost goes very high again.

DELETE FROM OC_YKB.TAB_TEST1 WHERE STAT = 'H';
commit;
exec DBMS_STATS.GATHER_TABLE_STATS(ownname => 'OC_YKB' , tabname => 'TAB_TEST1');

select * from OC_YKB.TAB_TEST1
where stat = 'Z';


Concerning "1) Boundary conditions" I got that oracle works this way but don't understand why. Can you suggest any article or something with explanation?.

I thought that when we put search WHERE STAT = 'Z' index just checks in B-Tree for 'Z' branch, but sees only 'N' and 'Y' branchs, so in 1-2 steps index should understand that there are no rows with STAT = 'Z'
Connor McDonald
June 28, 2018 - 6:31 am UTC

OK, here's the full example now I've had time to come back to it

SQL> create table TAB_TEST1 ( GUID number, stat varchar2(1) );

Table created.

SQL>
SQL> create sequence sq_test1 cache 1000;

Sequence created.

SQL>
SQL> INSERT /*+ APPEND */ INTO TAB_TEST1 select sq_test1.NEXTVAL, 'Y' from dual connect by level <= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> INSERT /*+ APPEND */ INTO TAB_TEST1 select sq_test1.NEXTVAL, 'N'  from dual connect by level <= 1500000;

1500000 rows created.

SQL> CREATE INDEX IX_STAT ON TAB_TEST1 (STAT);

Index created.

SQL>
SQL> exec dbms_stats.gather_table_stats('','TAB_TEST1');

PL/SQL procedure successfully completed.

SQL>
SQL> select histogram from user_tab_col_statistics
  2  where  table_name = 'TAB_TEST1'
  3  and column_name = 'STAT';

HISTOGRAM
---------------
NONE

1 row selected.

SQL>
SQL> select count(*)
  2  from user_tab_histograms
  3  where  table_name = 'TAB_TEST1'
  4  and column_name = 'STAT';

  COUNT(*)
----------
         2

1 row selected.

--
-- You can see that at this point, we have NO histogram 
--

SQL> select * from TAB_TEST1 where stat = 'Z';

no rows selected

SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
SQL_ID  b0vapgn674m4q, child number 0
-------------------------------------
select * from TAB_TEST1 where stat = 'Z'

Plan hash value: 1414169785

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |       |       |  1220 (100)|          |
|*  1 |  TABLE ACCESS FULL| TAB_TEST1 |  1136K|  8877K|  1220   (2)| 00:00:15 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("STAT"='Z')


18 rows selected.

SQL> select * from TAB_TEST1 where stat = 'V';

no rows selected

SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
SQL_ID  5v9aqvvxv16yf, child number 0
-------------------------------------
select * from TAB_TEST1 where stat = 'V'

Plan hash value: 1414169785

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |       |       |  1220 (100)|          |
|*  1 |  TABLE ACCESS FULL| TAB_TEST1 |  1250K|  9765K|  1220   (2)| 00:00:15 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("STAT"='V')


18 rows selected.

SQL> select * from TAB_TEST1 where stat = 'M';

no rows selected

SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
SQL_ID  1sf991p8y0s87, child number 0
-------------------------------------
select * from TAB_TEST1 where stat = 'M'

Plan hash value: 1414169785

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |       |       |  1220 (100)|          |
|*  1 |  TABLE ACCESS FULL| TAB_TEST1 |  1136K|  8877K|  1220   (2)| 00:00:15 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("STAT"='M')


18 rows selected.

SQL> select * from TAB_TEST1 where stat = 'G';

no rows selected

SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
SQL_ID  4apjjdcbrsydq, child number 0
-------------------------------------
select * from TAB_TEST1 where stat = 'G'

Plan hash value: 1414169785

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |       |       |  1220 (100)|          |
|*  1 |  TABLE ACCESS FULL| TAB_TEST1 |   454K|  3551K|  1220   (2)| 00:00:15 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("STAT"='G')


18 rows selected.

SQL> select * from TAB_TEST1 where stat = 'B';

no rows selected

SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
SQL_ID  5272w7agrdzxj, child number 0
-------------------------------------
select * from TAB_TEST1 where stat = 'B'

Plan hash value: 459980392

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |     4 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| TAB_TEST1 |     1 |     8 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IX_STAT   |     1 |       |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("STAT"='B')


19 rows selected.

--
-- And from above you can see the "boundary value" stuff in effect, ie, 
-- the further away we get, the lower the estimate
--
-- Now we insert a new value, not that this really matters, it is the queries above
-- that have told the database we have an interest in the STAT column
--


SQL> INSERT INTO TAB_TEST1 VALUES (sq_test1.NEXTVAL, 'H');

1 row created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats('','TAB_TEST1');

PL/SQL procedure successfully completed.

SQL>
SQL> select histogram from user_tab_col_statistics
  2  where  table_name = 'TAB_TEST1'
  3  and column_name = 'STAT';

HISTOGRAM
---------------
FREQUENCY

1 row selected.

SQL>
SQL> select count(*)
  2  from user_tab_histograms
  3  where  table_name = 'TAB_TEST1'
  4  and column_name = 'STAT';

  COUNT(*)
----------
         3

1 row selected.

--
-- See, now we have a histogram, so we get *different* calculations on values
--

SQL>
SQL> select * from TAB_TEST1 where stat = 'Z';

no rows selected

SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
SQL_ID  b0vapgn674m4q, child number 0
-------------------------------------
select * from TAB_TEST1 where stat = 'Z'

Plan hash value: 1414169785

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |       |       |  1220 (100)|          |
|*  1 |  TABLE ACCESS FULL| TAB_TEST1 |  1136K|  8877K|  1220   (2)| 00:00:15 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("STAT"='Z')


18 rows selected.

SQL> DELETE FROM TAB_TEST1 WHERE STAT = 'H';

1 row deleted.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats('','TAB_TEST1');

PL/SQL procedure successfully completed.

SQL>
SQL> select histogram from user_tab_col_statistics
  2  where  table_name = 'TAB_TEST1'
  3  and column_name = 'STAT';

HISTOGRAM
---------------
FREQUENCY

1 row selected.

SQL>
SQL> select count(*)
  2  from user_tab_histograms
  3  where  table_name = 'TAB_TEST1'
  4  and column_name = 'STAT';

  COUNT(*)
----------
         2

1 row selected.

SQL>
SQL> select * from TAB_TEST1 where stat = 'Z';

no rows selected

SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
SQL_ID  b0vapgn674m4q, child number 0
-------------------------------------
select * from TAB_TEST1 where stat = 'Z'

Plan hash value: 1414169785

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |       |       |  1220 (100)|          |
|*  1 |  TABLE ACCESS FULL| TAB_TEST1 |  1136K|  8877K|  1220   (2)| 00:00:15 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("STAT"='Z')


18 rows selected.

SQL>


and even though we deleted H, we will keep that FREQUENCY histogram each time we gather stats. Hope this helps

RAUF HUSEYNZADA, June 28, 2018 - 10:30 am UTC

Thanks a lot, lower / upper bounds and histogram both makes huge effects
Connor McDonald
June 29, 2018 - 4:22 am UTC

glad we could help

More to Explore

Performance

Get all the information about database performance in the Database Performance guide.