You Asked

Team:

Please see the below test case, when we hardcode the values in the Predicates, Optimizer estimated for cardinality was correct.

but when we go for a join with values from another table, the estimated for cardinality was way off from Actuals

Questions:
1) why the cardinality deviates when we go for a join? can't the optimizer pickup the t2.start_time and t2.end_time column statistics and optimize the sql the produce the right cardinality for execution? ( the reason for the ask is we got simillar kind of sql in application, where the estimated cardinality is getting misleading and the optimizer is picking up the incorrect index for execution )

2) what additional piece of information can i provide here for the optimizer to come up with the correct cardinality for below join?

drop table t1 purge;
drop table t2 purge;

create table t1 as select * from all_objects;
create index t1_idx on t1( created );
create table t2 ( start_time date, end_time date );
insert into t2 values( to_date('13-nov-2019','dd-mon-yyyy'), to_date('13-nov-2019 22:00','dd-mon-yyyy hh24:mi'));
commit;

exec dbms_stats.gather_table_stats(user,'T2');
exec dbms_stats.gather_table_stats(user,'T1',method_opt=>'for columns created size 2048');

demo@PDB1> set autotrace traceonly explain statistics
demo@PDB1> select *
  2  from t1
  3  where created between to_date('13-nov-2019','dd-mon-yyyy')
  4  and to_date('13-nov-2019 22:00','dd-mon-yyyy hh24:mi');


Execution Plan
----------------------------------------------------------
Plan hash value: 1775246573

----------------------------------------------------------------------------------------------
| Id  | Operation                           | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |        |     1 |   135 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1     |     1 |   135 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T1_IDX |     1 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CREATED">=TO_DATE(' 2019-11-13 00:00:00', 'syyyy-mm-dd hh24:mi:ss')
              AND "CREATED"<=TO_DATE(' 2019-11-13 22:00:00', 'syyyy-mm-dd hh24:mi:ss'))


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
       2472  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

demo@PDB1> select *
  2  from t1, t2
  3  where t1.created between t2.start_time and t2.end_time;


Execution Plan
----------------------------------------------------------
Plan hash value: 2010545385

---------------------------------------------------------------------------------------
| Id  | Operation                    | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |        | 10060 |  1483K|     6   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                |        | 10060 |  1483K|     6   (0)| 00:00:01 |
|   2 |   NESTED LOOPS               |        | 10060 |  1483K|     6   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL         | T2     |     1 |    16 |     3   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN          | T1_IDX |    92 |       |     1   (0)| 00:00:01 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T1     | 10060 |  1326K|     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("T1"."CREATED">="T2"."START_TIME" AND
              "T1"."CREATED"<="T2"."END_TIME")


Statistics
----------------------------------------------------------
         59  recursive calls
          0  db block gets
         71  consistent gets
          0  physical reads
          0  redo size
       2630  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          5  sorts (memory)
          0  sorts (disk)
          1  rows processed

demo@PDB1>

and Connor said...

A join (like the one above) is more akin the bind variable costing, ie, we do not know the values of the incoming "parameters" in advance. You can see the costs are comparable to binds

SQL> drop table t1 purge;

Table dropped.

SQL> drop table t2 purge;

Table dropped.

SQL>
SQL> create table t1 as select * from dba_objects;

Table created.

SQL> create index t1_idx on t1( created );

Index created.

SQL> select max(created) from t1;

MAX(CREAT
---------
12-JUN-20

1 row selected.

SQL> select min(created) from t1;

MIN(CREAT
---------
30-MAY-19

1 row selected.

SQL>
SQL> create table t2 ( start_time date, end_time date );

Table created.

SQL> insert into t2 values( to_date('26-may-2020','dd-mon-yyyy'), to_date('26-may-2020 22:00','dd-mon-yyyy hh24:mi'));

1 row created.

SQL> commit;

Commit complete.

SQL>
SQL> exec dbms_stats.gather_table_stats(user,'T2');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(user,'T1',method_opt=>'for columns created size 2048');

PL/SQL procedure successfully completed.

SQL>
SQL> set autotrace traceonly explain
SQL> select *
  2      from t1
  3      where created between to_date('26-may-2020','dd-mon-yyyy')
  4      and to_date('26-may-2020 22:00','dd-mon-yyyy hh24:mi');

Execution Plan
----------------------------------------------------------
Plan hash value: 1775246573

----------------------------------------------------------------------------------------------
| Id  | Operation                           | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |        |     5 |   660 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1     |     5 |   660 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T1_IDX |     5 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CREATED">=TO_DATE(' 2020-05-26 00:00:00', 'syyyy-mm-dd hh24:mi:ss')
              AND "CREATED"<=TO_DATE(' 2020-05-26 22:00:00', 'syyyy-mm-dd hh24:mi:ss'))

SQL>
SQL> select *
  2      from t1, t2
  3      where t1.created between t2.start_time and t2.end_time;

Execution Plan
----------------------------------------------------------
Plan hash value: 2010545385

---------------------------------------------------------------------------------------
| Id  | Operation                    | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |        |  3665 |   529K|    10   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                |        |  3665 |   529K|    10   (0)| 00:00:01 |
|   2 |   NESTED LOOPS               |        |  3665 |   529K|    10   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL         | T2     |     1 |    16 |     3   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN          | T1_IDX |   248 |       |     1   (0)| 00:00:01 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T1     |  3665 |   472K|     7   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("T1"."CREATED">="T2"."START_TIME" AND
              "T1"."CREATED"<="T2"."END_TIME")

SQL>
SQL> set autotrace off
SQL>
SQL> explain plan for select *
  2      from t1
  3      where created between :1 and :2;

Explained.

SQL>
SQL>
SQL> select dbms_xplan.display_plan() from dual;

DBMS_XPLAN.DISPLAY_PLAN()
----------------------------------------------------------------------------------------------------
 Plan Hash Value  : 4096079515

------------------------------------------------------------------------------------------
| Id  | Operation                              | Name   | Rows | Bytes | Cost | Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |        |  206 | 27192 |   10 | 00:00:01 |
| * 1 |   FILTER                               |        |      |       |      |          |
|   2 |    TABLE ACCESS BY INDEX ROWID BATCHED | T1     |  206 | 27192 |   10 | 00:00:01 |
| * 3 |     INDEX RANGE SCAN                   | T1_IDX |  370 |       |    2 | 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 1 - filter(TO_DATE(:2)>=TO_DATE(:1))
* 3 - access("CREATED">=:1 AND "CREATED"<=:2)


1 row selected.

If you want the optimizer to know more ... then you typically have to ask it to dig deeper into the data, eg


SQL>
SQL> set feedback only
SQL> select /*+ dynamic_sampling(11) gather_plan_statistics */ *
  2      from t1, t2
  3      where t1.created between t2.start_time and t2.end_time;

5 rows selected.

SQL> set feedback on
SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
SQL_ID  06f9basp2rfdt, child number 0
-------------------------------------
select /*+ dynamic_sampling(11) gather_plan_statistics */ *     from
t1, t2     where t1.created between t2.start_time and t2.end_time

Plan hash value: 2010545385

-------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |        |      1 |        |      5 |00:00:00.01 |      11 |
|   1 |  NESTED LOOPS                |        |      1 |      5 |      5 |00:00:00.01 |      11 |
|   2 |   NESTED LOOPS               |        |      1 |    248 |      5 |00:00:00.01 |       9 |
|   3 |    TABLE ACCESS FULL         | T2     |      1 |      1 |      1 |00:00:00.01 |       6 |
|*  4 |    INDEX RANGE SCAN          | T1_IDX |      1 |    248 |      5 |00:00:00.01 |       3 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T1     |      5 |      5 |      5 |00:00:00.01 |       2 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("T1"."CREATED">="T2"."START_TIME" AND "T1"."CREATED"<="T2"."END_TIME")

Note
-----
   - dynamic statistics used: dynamic sampling (level=0)


27 rows selected.

SQL>
SQL>
SQL>

Rating

(2 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Dynamic Sampling

Rajeshwaran, Jeyabal, June 12, 2020 - 6:43 am UTC

Team- Thanks for Inputs - adding dynamic sampling(11) to my actual query, doesn't pick the right index, any other work around for us plesae?

June 13, 2020 - 9:43 am UTC

The optimizer can only do so much.

You could look at adding an OPT_ESTIMATE hint for the join cardinality, but that would suggest either

a) you expect it to be consistent value , or
b) you have an algortim to get an estimate yourself and the fold it in with dynamic SQL

Multiple executions should yield some statistics feedback, but of course, some times its too late

algortim to get an estimate yourself

Rajeshwaran, Jeyabal, June 14, 2020 - 8:15 am UTC

Thanks.

But when you say
...
b) you have an algortim to get an estimate yourself and the fold it in with dynamic SQL
...

are we referring to Cardinality Hint? if that is not, can you show us an example of the above reference?

June 15, 2020 - 3:26 am UTC

No I was referring to the opt_estimate hint for join cardinalities. But the logic is the same - you basically come up with a set of variable SQL's to best suit your needs.

eg

select count(*) into x from t2 where ....

if x < 10 then
  open rc for select /*+ OPT_ESTIMATE(JOIN (t1 t2) ROWS=200) */*
elsif x between 10 and 1000 then
  open rc for select /*+ OPT_ESTIMATE(JOIN (t1 t2) ROWS=2000) */*
else
  open rc for select /*+ OPT_ESTIMATE(JOIN (t1 t2) ROWS=20000) */*

Its definitely not ideal, but a means of adding that little more information to the optimizer for those tricky moments.

More to Explore

Performance

Get all the information about database performance in the Database Performance guide.

Breadcrumb

Question and Answer

You Asked

and Connor said...

Rating

Comments

Dynamic Sampling

algortim to get an estimate yourself

More to Explore

Performance

on Cardinality deviation

Breadcrumb

Question and Answer

You Asked

and Connor said...

Rating

Comments

Dynamic Sampling

algortim to get an estimate yourself

More to Explore

Performance