Skip to Main Content
  • Questions
  • I have a query where in my humble opinion the optimizer fails measurably, and cannot get to work out why!

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, Antonis.

Asked: August 14, 2020 - 10:46 am UTC

Last updated: August 18, 2020 - 10:45 am UTC

Version: 11.2.0.3

Viewed 1000+ times

You Asked

CART_V_PICK_ANAL is a view on top of views.. on top of a table with 33K rows.
I Know the plan is wrong, it goes down to how oracle accesses this 33K table.
The way it should access it is clear to me.
I do not know what kind of limit I am reaching.. is it "view depth" because a step back.. it would work fine.
as a small backgraound info:
orders => 700K
Orddetails =>6M
ordanalysis => 46K
Taks =>33k.

Look at the following results:

SQL> select orderid from orders where ordercode = '#00000000820   ';

   ORDERID
----------
   4538931

Elapsed: <b>00:00:00.048</b>

SQL> sELECT  * FROM CART_V_PICK_ANAL WHERE orderid = 4538931 ;

WAREHOUSEID ORDERSGROUP             ORDERID LOCATIONID PREPACKAGEID     ITEMID STOCKSTATUSID WMSSTATUSID  LU1PERLU2 SUM_CARTQTY SUMPICKQTY CURPICKQTY RUNPICKINGQTY PICKINGQTY SUM_BOXQTY   DETLINES   VS_LOTID VS_DATEIN      LOTID   TASKTYPE PICKINGTYPE  ORDANALID     TASKID DATEIN  
----------- -------------------- ---------- ---------- ------------ ---------- ------------- ----------- ---------- ----------- ---------- ---------- ------------- ---------- ---------- ---------- ---------- --------- ---------- ---------- ----------- ---------- ---------- --------
          9 4538931                 4538931      89328                  157357            60          11         48          10         10         10            10         10          0          1                                         25           3   28344677    3044260 16/03/18
          9 4538931                 4538931      90813                  150503            60          11         75           0         75         75            75          0          1          1                                         25           3   28344675    3042539 21/09/17
 ...
23 rows selected. 

Elapsed: <b>00:00:00.038</b>

SQL> sELECT  * FROM CART_V_PICK_ANAL WHERE orderid in (select orderid from orders where ordercode = '#00000000820   ');

WAREHOUSEID ORDERSGROUP             ORDERID LOCATIONID PREPACKAGEID     ITEMID STOCKSTATUSID WMSSTATUSID  LU1PERLU2 SUM_CARTQTY SUMPICKQTY CURPICKQTY RUNPICKINGQTY PICKINGQTY SUM_BOXQTY   DETLINES   VS_LOTID VS_DATEIN      LOTID   TASKTYPE PICKINGTYPE  ORDANALID     TASKID DATEIN  
----------- -------------------- ---------- ---------- ------------ ---------- ------------- ----------- ---------- ----------- ---------- ---------- ------------- ---------- ---------- ---------- ---------- --------- ---------- ---------- ----------- ---------- ---------- --------
          9 4538931                 4538931      89328                  157357            60          11         48          10         10         10            10         10          0          1                                         25           3   28344677    3044260 16/03/18
          9 4538931                 4538931      90813                  150503            60          11         75           0         75         75            75          0          1          1                                         25           3   28344675    3042539 21/09/17
..
23 rows selected. 

Elapsed: <b>00:01:48.513</b>



How should I proceed?
I am pretty sure there should be an easy way out of this.

and Chris said...

The short answer is the optimizer doesn't know which value of orderid the subquery selects. So it has less accurate row estimates.

This simple demo shows why this can be a problem. There's one table with 100 rows. 1 of this has the value 1, the rest 99. If you search for the value 1, the optimizer knows there's only one row for this value, so the E(stimated)-rows for this table is one:

create table t1 as 
  select case level
           when 1 then 1
           else 99
         end c1
  from   dual
  connect by level <= 100;
  
exec dbms_stats.gather_table_stats ( user, 't1', method_opt => 'for columns c1 size 2' ) ;
  
set serveroutput off
alter session set statistics_level = all;

select * from t1
where  c1 = 1;

select * 
from   table(dbms_xplan.display_cursor(null, null, 'ROWSTATS LAST'));
/*
PLAN_TABLE_OUTPUT                                               
SQL_ID  54cws9vns3mbp, child number 0                            
-------------------------------------                            
select * from t1 where  c1 = 1                                   
                                                                 
Plan hash value: 3617692013                                      
                                                                 
-------------------------------------------------------------    
| Id  | Operation         | Name | Starts | E-Rows | A-Rows |    
-------------------------------------------------------------    
|   0 | SELECT STATEMENT  |      |      1 |        |      1 |    
|*  1 |  TABLE ACCESS FULL| T1   |      1 |      1 |      1 |    
-------------------------------------------------------------    
                                                                 
Predicate Information (identified by operation id):              
---------------------------------------------------              
                                                                 
   1 - filter("C1"=1) 
*/   


If instead, you

- Create a two row table with the values 1 and 99
- Find the rows from the first table with the value 1 by using a query using t2

the optimizer doesn't know which row from t2 it'll fetch. So it hedges its bets and estimates 50 rows from t1:

create table t2 ( c1 int );
insert into t2 values ( 1 );
insert into t2 values ( 99 );

exec dbms_stats.gather_table_stats ( user, 't2' ) ;

select * from t1
where  c1 = ( select c1 from t2 where c1 = 1 );

select * 
from   table(dbms_xplan.display_cursor(null, null, 'ROWSTATS LAST'));

/*
PLAN_TABLE_OUTPUT                                                 
SQL_ID  6hcsxnm979h57, child number 0                              
-------------------------------------                              
select * from t1 where  c1 = ( select c1 from t2 where c1 = 1 )    
                                                                   
Plan hash value: 1484901111                                        
                                                                   
--------------------------------------------------------------     
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |     
--------------------------------------------------------------     
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |     
|*  1 |  TABLE ACCESS FULL | T1   |      1 |     50 |      1 |     
|*  2 |   TABLE ACCESS FULL| T2   |      1 |      1 |      1 |     
--------------------------------------------------------------     
                                                                   
Predicate Information (identified by operation id):                
---------------------------------------------------                
                                                                   
   1 - filter("C1"=)                                               
   2 - filter("C1"=1)
*/


So my guess is less accurate row estimates when using a subquery means the optimizer changes access paths.

It's possible there's something else going on. In any case, to find out precisely why there are different plans for your queries, please get the execution plans like I've done and share them here:

set serveroutput off
alter session set statistics_level = all;

select * from <your query>

select * 
from   table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));

Rating

  (1 rating)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Some more information.

A reader, August 18, 2020 - 9:14 am UTC

The ordercode is a unique key while orderid is the primary key.
So, selectivity should not be an issue.. I guess.

And the strange think is that the “stupid” rule optimizer has the same problem, which for me indicates that it is probably some thing missing before the optimizer.... it seems to ignore the possibility of executing this way!
I will try to do what you ask, but at them moment it is not possible.

Stay safe!
Chris Saxon
August 18, 2020 - 10:45 am UTC

The ordercode is a unique key while orderid is the primary key.

That's in orders, not the view, right?

The point is the optimizer doesn't know how many rows there in the view for a specific order.

it seems to ignore the possibility of executing this way!

The problem is the optimizer doesn't know what value of orderid the subquery fetches. The only way to know is to execute the subquery.

In the general case, this is impractical in the optimization phase - the subquery itself could take a long time to execute.

More to Explore

Performance

Get all the information about database performance in the Database Performance guide.