Hi Tom,
I have a large database table that I am trying to pull data from in summary format, which involves summarizing the data with a variable start date. For the sake of example, I have framed the question as a much simpler version of the code than I actually have in production. For the purposes of this example, assume I have a table of orders that I want to pull from, which contains order information by chain, store, and date (date is a timestamp column).
The orders table contains orders from the time the stores came into being and started stocking the shelves, but I only want data from the time the individual store actually opened its doors to the public. For that reason, I have built a case statement which has hard-coded the store opening date (yes, I know this is not good design, but assume the database does not have that information, and I have been asked to create an ad-hoc report that will be used once and thrown away ;) ).
My issue is this. If I hard-code a single start date, the query returns in a matter of seconds. However, this does not work for me because each individual store has a unique opening date, so I have hard-coded the opening date using date literals for each store into a CASE statment. Although I would expect an increase in processing time to execute the CASE statement, it results in an EXPONENTIAL increase in processing time.
I have tried changing the TO-DATE in the case literals to TIMESTAMP, thinking that might avoid an implicit conversion, but it does not seem to make a difference.
So, I am looking for a better way to do this. My explain plan shows indexes are being used, there is an index by chainid, storeid, and ordertmst.
It is obviously a monkey wrench thrown into the works by the CASE statement, but I cannot figure out why the disparity is so very large.
select vtc.chainid, vtc.storeid, vtc.orderid, vtc.ordertmst, count(*) as itemcount
from orders vtc
where vtc.backlog = 'N'
and vtc.ordertmst >= --to_date ( '20200101', 'yyyymmdd' ) -- <<< if I use this instead of the case, results come back almost immediately
----- this case statment causes the query to return in HOURS instead of seconds.
case vtc.chainid
when 501 then to_date ( '20200101', 'yyyymmdd' )
when 504 then
case
when vtc.storeid in ( 2 ) then to_date ( '2020021400', 'yyyymmddhh24' )
when vtc.storeid in ( 12 ) then to_date ( '2020020310', 'yyyymmddhh24' )
when vtc.storeid in ( 9, 10 ) then to_date ( '2020022506', 'yyyymmddhh24' )
when vtc.storeid in ( 5, 6 ) then to_date ( '202003251355', 'yyyymmddhh24mi' )
when vtc.storeid in ( 7, 8 ) then to_date ( '202003121920', 'yyyymmddhh24mi' )
when vtc.storeid in ( 3, 4 ) then to_date ( '202004101355', 'yyyymmddhh24mi' )
else to_date ( '20200101', 'yyyymmdd' )
end
when 507 then
case
when vtc.storeid in ( 1, 2 ) then to_date ( '202004160950', 'yyyymmddhh24mi' )
when vtc.storeid in ( 3, 4 ) then to_date ( '202005051400', 'yyyymmddhh24mi' )
else to_date ( '47120101', 'yyyymmdd' )
end
when 502 then
case
when vtc.storeid in ( 11 ) then to_date ( '202004031800', 'yyyymmddhh24mi' )
when vtc.storeid in ( 14 ) then to_date ( '202004231200', 'yyyymmddhh24mi' )
else to_date ( '20200101', 'yyyymmdd' )
end
when 701 then to_date ( '2020011403', 'yyyymmddhh24' )
when 702 then to_date ( '2020010600', 'yyyymmddhh24' )
when 608 then to_date ( '202001290235', 'yyyymmddhh24mi' )
when 607 then to_date ( '202002030349', 'yyyymmddhh24mi' )
when 601 then to_date ( '202002110021', 'yyyymmddhh24mi' )
when 602 then to_date ( '202002162335', 'yyyymmddhh24mi' )
when 606 then to_date ( '202002240142', 'yyyymmddhh24mi' )
when 618 then to_date ( '202002262130', 'yyyymmddhh24mi' )
when 614 then to_date ( '202003012227', 'yyyymmddhh24mi' )
when 605 then to_date ( '202003230300', 'yyyymmddhh24mi' )
when 610 then to_date ( '202003200900', 'yyyymmddhh24mi' )
when 609 then to_date ( '202003292340', 'yyyymmddhh24mi' )
when 613 then to_date ( '202004052000', 'yyyymmddhh24mi' )
when 617 then to_date ( '202004072300', 'yyyymmddhh24mi' )
when 506 then to_date ( '202005100908', 'yyyymmddhh24mi' )
else to_date ( '20200101', 'yyyymmdd' )
end
group by vtc.chainid, vtc.storeid, vtc.orderid, vtc.ordertmst
The optimizer is unable to use an index on the timestamp column when you compare it to the case expression. So I suspect you're seeing the query go from a nice, fast index access to less efficient access method. If you share the two plans we can comment further.
Here's an example similar to your situation. The first query compares the date column to a date literal and uses the index. But put the same dates in a complex case expression and it switches to a full table scan:
create table t (
c1 int, c2 date, c3 int, c4 varchar2(100)
);
insert into t
with rws as (
select level x from dual
connect by level <= 1000
)
select x, date'2020-01-01' + ( x / 2 ),
mod ( x, 4 ), lpad ('x', 100, 'x' )
from rws;
commit;
create index i on t ( c2 );
set serveroutput off
select * from t
where c2 >= date'2021-05-01';
select *
from dbms_xplan.display_cursor( format => 'BASIC LAST +PREDICATE');
----------------------------------------------------
| Id | Operation | Name |
----------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T |
|* 2 | INDEX RANGE SCAN | I |
----------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("C2">=TO_DATE(' 2021-05-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss'))
select * from t
where c2 >= case c3
when 0 then date'2021-05-01'
when 1 then date'2021-05-01'
when 2 then date'2021-05-01'
when 3 then date'2021-05-01'
end;
select *
from dbms_xplan.display_cursor( format => 'BASIC LAST +PREDICATE');
----------------------------------
| Id | Operation | Name |
----------------------------------
| 0 | SELECT STATEMENT | |
|* 1 | TABLE ACCESS FULL| T |
----------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("C2">=CASE "C3" WHEN 0 THEN TO_DATE(' 2021-05-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss') WHEN 1 THEN TO_DATE(' 2021-05-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss') WHEN 2 THEN TO_DATE(' 2021-05-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss') WHEN 3 THEN TO_DATE(' 2021-05-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss') END )
In your example it looks to me like all the dates in the expression are on or after the original date you used. So an easy workaround (for now) is:
Do both!
Keep the simple literal and the complex case expression.
You will almost certainly need to come up with a better solution long-term though. Over time your data will grow, and the (presumably) index will no longer be effective, meaning the optimizer switches to a different access method, which may lead you back to your current situation.