Partition Pruning behaves differently when date format not specified in TO_DATE function
Tom Kyte

Thanks for the question, Jamie.

Asked: November 24, 2009 - 5:17 pm UTC

Last updated: December 07, 2009 - 3:36 pm UTC

Version: 10.2.0.4.0

Viewed 10K+ times

You Asked

Tom,

Firstly, let me thank you for all your advice over the years - I'm a long-time reader and this is my first question to you.

I have a partitioned table with a local index on column ar_id.

Query 1 - date format specified = full table scan:
SELECT *
  FROM ar_msmt_period amp
 WHERE ar_id = 2002904
   AND amp.start_dte >= TO_DATE ('01SEP2009', 'DDMONYYYY');
Operation                                 Object Name     Rows  Bytes  Cost    PStart  PStop
SELECT STATEMENT Optimizer Mode=ALL_ROWS                  2 M          252498
  PARTITION RANGE ITERATOR                                2 M   112 M  252498  24      26
    TABLE ACCESS FULL                     AR_MSMT_PERIOD  2 M   112 M  252498  24      26

Query 2 - date format not specified (note this is the ONLY difference) = index range scan. Also note PStart = KEY, so I'm guessing partition pruning is not the same:
SELECT *
  FROM ar_msmt_period amp
 WHERE ar_id = 2002904
   AND amp.start_dte >= TO_DATE ('01SEP2009');
Operation                                 Object Name            Rows  Bytes  Cost     PStart  PStop
SELECT STATEMENT Optimizer Mode=ALL_ROWS                         2 M          1649310
  PARTITION RANGE ITERATOR                                       2 M   112 M  1649310  KEY     26
    TABLE ACCESS BY LOCAL INDEX ROWID     AR_MSMT_PERIOD         2 M   112 M  1649310  KEY     26
      INDEX RANGE SCAN                    IDX_FK_AR_MSMT_PERIOD  22 M         4339     KEY     26


The second plan runs much faster.
We try to encourage developers to specify the date format in their queries. Yet in this case it results in poorer query performance.

Why does the missing date format make a difference?

My guess would be that it relates to the overloaded TO_DATE function?

This is a similar test example I was able to create (not related to partitioning) where a function-based index is ignored because the date format is not specified:

CREATE TABLE zz_jctemp AS 
SELECT dtes.dte_id
  FROM (SELECT     TO_CHAR (  TO_DATE ('20000101'
                                     , 'YYYYMMDD'
                                      )
                            + LEVEL
                          , 'YYYYMMDD'
                           ) AS dte_id
              FROM DUAL
        CONNECT BY LEVEL <= 10) dtes
     , (SELECT     LEVEL
              FROM DUAL
        CONNECT BY LEVEL <= 100000) row_gen;

CREATE INDEX ZZ_IDX1 ON ZZ_JCTEMP (TO_DATE(DTE_ID,'YYYYMMDD'));

BEGIN
   DBMS_STATS.gather_table_stats (ownname => USER
                                , tabname => 'ZZ_JCTEMP'
                                , estimate_percent => 100
                                , DEGREE => 4
                                , method_opt => 'FOR ALL INDEXED COLUMNS'
                                , CASCADE => TRUE
                                 );
END;
/


Query 1 - date format specified = index lookup:
SELECT *
  FROM zz_jctemp
 WHERE TO_DATE (dte_id, 'YYYYMMDD') = TO_DATE ('20000101', 'YYYYMMDD');
Operation                                 Object Name  Rows  Bytes  Cost
SELECT STATEMENT Optimizer Mode=ALL_ROWS               44 K         114
  TABLE ACCESS BY INDEX ROWID             ZZ_JCTEMP    44 K  260 K  114
    INDEX RANGE SCAN                      ZZ_IDX1      44 K         66

Query 2 - date format not specified = full table scan:
 SELECT *
  FROM zz_jctemp
 WHERE TO_DATE (dte_id) = TO_DATE ('20000101', 'YYYYMMDD');
Operation                                 Object Name  Rows  Bytes  Cost
SELECT STATEMENT Optimizer Mode=ALL_ROWS               10 K         455
  TABLE ACCESS FULL                       ZZ_JCTEMP    10 K  58 K   455


My guess is that the partition example has the same underlying reason as the function-based index example. That is, the optimiser treats the overloaded function as a different function altogether (even though it returns the same result) and cannot use the function in its evaluation of the plan.

What are your thoughts?

and Tom said...

One of the cases is deterministic, one is not.

to_date( '01sep2009', 'ddmonyyyy' ) is known AT HARD PARSE TIME to be the 1st of September, 2009.

to_date( '01sep2009' ) is known at HARD PARSE TIME to be .... well, I don't know, it could be many things:

ops$tkyte%ORA10GR2> select to_char( to_date( '01sep2009' ), 'dd-mon-yyyy hh24:mi:ss' ) from dual;

TO_CHAR(TO_DATE('01S
--------------------
01-sep-2009 00:00:00

ops$tkyte%ORA10GR2> alter session set nls_date_format = 'hh24monddss';

Session altered.

ops$tkyte%ORA10GR2> select to_char( to_date( '01sep2009' ), 'dd-mon-yyyy hh24:mi:ss' ) from dual;

TO_CHAR(TO_DATE('01S
--------------------
20-sep-2009 01:00:09





forget the second plan entirely for a moment, look at the first one and ask yourself "are the estimated row counts even close to reality?"

if not, ask "why not, am I missing some statistics, could dynamic sampling help, https://www.oracle.com/technetwork/issue-archive/2009/09-jan/o19asktom-086775.html are my statistics stale, etc..."

and my example above should be more than enough to demonstrate to the developers that they are going to look bad and be called stupid if they do not use formats - their code will break, not might - it will someday. It is just a matter of time.



the example above should also explain the "mystery" of why the index could not be used in the second case (the function based index).


to_date(dt) is not equivalent to to_date(dt,'format')


The reasons are entirely different for the index not being used in both cases.


In case 1: when you are explicit, the estimated cardinalities are computed using the actual value - at hard parse time we KNOW what the date is. When you use an implicit mask, we do NOT know (see how the partition start/stop went from being KNOWN in case one to NOT being known in case 2?? we didn't have the value). The estimated card= value was computed using a guess.

What you need to do in case 1 is figure out why the estimated card= value in the explicit (correct) plan is so far off from reality.


In case 2: the index cannot be used to evaluate to_date(dte_id), because to_date(dte_id) is not the same as TO_DATE(DTE_ID,'YYYYMMDD')


ops$tkyte%ORA10GR2> CREATE TABLE t AS
  2  SELECT dtes.dte_id
  3    FROM (SELECT     TO_CHAR (  TO_DATE ('20000101'
  4                                       , 'YYYYMMDD'
  5                                        )
  6                              + LEVEL
  7                            , 'YYYYMMDD'
  8                             ) AS dte_id
  9                FROM DUAL
 10          CONNECT BY LEVEL <= 1) dtes
 11  /

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select to_char(   to_date(dte_id),   'dd-mon-yyyy hh24:mi:ss' ) from t;
select to_char(   to_date(dte_id),   'dd-mon-yyyy hh24:mi:ss' ) from t
                          *
ERROR at line 1:
ORA-01861: literal does not match format string


ops$tkyte%ORA10GR2> alter session set nls_date_format = 'yyyymmdd';

Session altered.

ops$tkyte%ORA10GR2> select to_char(   to_date(dte_id),   'dd-mon-yyyy hh24:mi:ss' ) from t;

TO_CHAR(TO_DATE(DTE_
--------------------
02-jan-2000 00:00:00

ops$tkyte%ORA10GR2> alter session set nls_date_format = 'ddyyhhmm';

Session altered.

ops$tkyte%ORA10GR2> select to_char(   to_date(dte_id),   'dd-mon-yyyy hh24:mi:ss' ) from t;

TO_CHAR(TO_DATE(DTE_
--------------------
20-feb-2000 01:00:00




see how many ways that string could be interpreted? and we don't know HOW to interpret it until runtime.



This just demonstrates the evil that is the to_date() function without a format!!!!!!!!!!!
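One way to sidestep the ambiguity entirely (my addition here, not something from the answer above) is the ANSI date literal, which has a fixed YYYY-MM-DD format and is never affected by NLS settings, so the value is known at hard parse time just like an explicit TO_DATE mask:

```sql
-- Sketch against the questioner's table: a DATE literal always uses
-- the fixed format YYYY-MM-DD, so there is no NLS_DATE_FORMAT
-- dependency and nothing to get wrong at runtime.
SELECT *
  FROM ar_msmt_period amp
 WHERE ar_id = 2002904
   AND amp.start_dte >= DATE '2009-09-01';
```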


Comments

A reader, November 27, 2009 - 1:08 pm UTC

Excellent Example !!!

A case of missing bind?

Martin Vajsar, November 28, 2009 - 6:04 am UTC

The lesson learned from the missing format specification in the to_date function is indeed very valuable. However, it seems to me that the main problem lies in not using bind variables, the consequences of which Tom has reiterated countless times. If the values were bound as dates, the format issue could not even arise.

(Maybe the date values come in text format in the application, but even in that case it seems strange to me to use to_date. I'd convert the string to a date immediately after obtaining it, first of all to detect malformed representations as close to the source as possible - a lesson learned the hard way. Of course, this is a matter of personal preference.)
Tom Kyte
November 28, 2009 - 1:55 pm UTC

No - not really.

*many* if not *most* client applications deal with dates in strings often.

Anything that uses a string to hold a date, must supply a format that can unambiguously convert that string into a date.

binds or literals - they would be the same.

Only if you are using a date type in a client application that represents the binary date format used in the database so there is in fact no conversion - would you not have to worry about the format (because there would not be any to_date calls)
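As a sketch of the distinction Tom draws (my example, not his): a string bind still needs an explicit format mask, while a true DATE bind involves no conversion and therefore no format at all:

```sql
-- Hypothetical sketch against the questioner's table.
-- A string bind: the format mask is still mandatory for an
-- unambiguous conversion, bind or no bind.
VARIABLE d_str VARCHAR2(20)
EXEC :d_str := '01SEP2009';

SELECT * FROM ar_msmt_period
 WHERE start_dte >= TO_DATE(:d_str, 'DDMONYYYY');

-- In PL/SQL (or any client with a native date type) a DATE bind
-- needs no TO_DATE call at all - there is no string to interpret.
DECLARE
   l_dte DATE := DATE '2009-09-01';
BEGIN
   FOR r IN (SELECT * FROM ar_msmt_period WHERE start_dte >= l_dte)
   LOOP
      NULL;  -- process row
   END LOOP;
END;
/
```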

Follow-up from opening post

Jamie Cowburn, December 03, 2009 - 10:17 pm UTC

Hi Tom,

Thanks for your response, which makes sense and was much appreciated.

I'm not sure if you care for a follow-up or not, but I took your advice on the DYNAMIC_SAMPLING hint.
SELECT /*+ DYNAMIC_SAMPLING(amp 3) */
       *
  FROM ar_msmt_period amp
 WHERE ar_id = 2002904 
   AND amp.start_dte >= TO_DATE ('01SEP2009', 'DDMONYYYY');

Operation                                 Object Name            Rows  Bytes  Cost   PStart  PStop
SELECT STATEMENT Optimizer Mode=ALL_ROWS                         18 K         28957
  PARTITION RANGE ITERATOR                                       18 K  767 K  28957  22      25
    TABLE ACCESS BY LOCAL INDEX ROWID     AR_MSMT_PERIOD         18 K  767 K  28957  22      25
      INDEX RANGE SCAN                    IDX_FK_AR_MSMT_PERIOD  2 K          56     22      25

(Actual number of rows returned is 141)

This is a monster table in a large financial organisation's data warehouse environment.
Table stats on this table are a bit iffy to say the least, and getting the system changed to collect better stats is not possible at the moment, for a myriad of reasons.

So for the moment we are going with DYNAMIC_SAMPLING hint in our query. We will look at improving the stats on the table in the longer term.

Thanks again.

Tom Kyte
December 04, 2009 - 4:38 pm UTC

thanks for the followup, dynamic sampling in a report system can be a very good thing.

Not a silver bullet or anything, no werewolves will be killed, but a good tool to know about.
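For completeness (my note, not part of the follow-up above): the same effect can also be enabled without editing each query, via the session-level parameter rather than a per-statement hint:

```sql
-- Sketch: level 3 dynamic sampling for every query in the session,
-- instead of a DYNAMIC_SAMPLING hint in each individual statement.
ALTER SESSION SET optimizer_dynamic_sampling = 3;
```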

A reader, December 07, 2009 - 11:31 am UTC

Tom,
I read your article on dynamic sampling as you had pointed out in the answer. My question is, does SQL profile store information about cardinality also in addition to the query plan?

Using a SQL profile is a bit like gathering statistics for a query and storing that information for the optimizer to use at hard parse time—it saves on the cost of dynamic sampling by “sampling” once and persisting the values.

Thanks...

Tom Kyte
December 07, 2009 - 3:36 pm UTC

... does SQL profile store information about cardinality also in addition to the query plan? ...

yes, it uses a series of "selectivity" hints to tell the optimizer what the various parts of the query will actually return. It saves all of this information in the dictionary for subsequent hard parses.
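For reference (my sketch, not part of the answer above), a SQL profile is typically generated and accepted through the SQL Tuning Advisor; something along these lines, where the sql_id is a hypothetical placeholder:

```sql
-- Sketch: have the tuning advisor analyse one statement and, if it
-- recommends a profile, persist that profile in the dictionary.
DECLARE
   l_task VARCHAR2(64);
BEGIN
   l_task := DBMS_SQLTUNE.create_tuning_task(
                sql_id => 'abcd1234efgh5');  -- hypothetical sql_id
   DBMS_SQLTUNE.execute_tuning_task(task_name => l_task);

   -- inspect the findings (returns a CLOB report)
   DBMS_OUTPUT.put_line(DBMS_SQLTUNE.report_tuning_task(l_task));

   -- if the report recommends a SQL profile, accept it:
   -- DBMS_SQLTUNE.accept_sql_profile(task_name => l_task,
   --                                 name      => 'my_profile');
END;
/
```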

Partition pruning is not used

aliyar, December 04, 2013 - 3:41 pm UTC

Dear Tom ,

Thanks for all your help to the DBA world. Could you please clarify an issue for me?

Database: 11.2.0.3 running on an 8-node RAC Exadata cluster.

Only one DB is running. All of a sudden, database performance went bad. After analysis we identified lots of cluster waits, particularly very high ones for the MERGE query below.


MERGE INTO range_hash PA
USING (SELECT P2(I).range_hashID   range_hashID,
              P2(I).testID         testID,
              P2(I).range_hashKEY  range_hashKEY,
              P2(I).ATTRIBUTEVALUE ATTRIBUTEVALUE,
              P2(I).ACTIVE         ACTIVE,
              P2(I).CREATIONDATE   CREATIONDATE
         FROM DUAL /* TABLE(P2) TMP */) SRC
    -- FROM PRODATTRIBUTE_GTT TMP) SRC
ON (    PA.testID        = SRC.testID
    AND PA.range_hashKEY = SRC.range_hashKEY
    AND PA.CREATIONDATE  = SRC.CREATIONDATE)
WHEN MATCHED THEN
   UPDATE
      SET -- PA.testID          = SRC.testID,
          -- PA.range_hashKEY   = SRC.range_hashKEY,
          -- PA.range_hashVALUE = SRC.ATTRIBUTEVALUE,
          PA.ACTIVE      = SRC.ACTIVE,
          -- PA.CREATIONDATE = SRC.CREATIONDATE,
          PA.LASTMODDATE = (CASE
                               WHEN PA.LASTMODDATE > SYSDATE -- SYSTIMESTAMP
                               THEN PA.LASTMODDATE + 1/24/60/60
                               ELSE SYSDATE -- SYSTIMESTAMP
                            END)
WHEN NOT MATCHED THEN
   INSERT (PA.range_hashID,
           PA.testID,
           PA.range_hashKEY,
           PA.range_hashVALUE,
           PA.ACTIVE,
           PA.CREATIONDATE,
           PA.LASTMODDATE,
           PA.MODCOUNT)
   VALUES (range_hash_SEQ.NEXTVAL,
           SRC.testID,
           SRC.range_hashKEY,
           SRC.ATTRIBUTEVALUE,
           SRC.ACTIVE,
           SRC.CREATIONDATE,
           SYSDATE,
           0);


The table range_hash is subpartitioned by range-hash. Both the partition and subpartition columns are of the NUMBER data type.

from AWR report :

Cluster Wait Time (s)  Executions  %Total  Elapsed Time (s)  %Clu   %CPU  %IO   SQL Id         SQL Module  SQL Text
64,810.51              810         61.38   89,956.58         72.05  9.53  2.39  5psw902g9b88m  PA_OLOCK    MERGE INTO range_hash PA...


Execution plan as follows :


Plan hash value: 1572288415

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name              | Rows | Bytes | Cost | Pstart | Pstop |
-----------------------------------------------------------------------------------------------------------
|   0 | MERGE STATEMENT                          |                   |      |       |    5 |        |       |
|   1 |  MERGE                                   | range_hash        |      |       |      |        |       |
|   2 |   VIEW                                   |                   |      |       |      |        |       |
|   3 |    SEQUENCE                              | range_hash_SEQ    |      |       |      |        |       |
|   4 |     MERGE JOIN OUTER                     |                   |    1 |    99 |    5 |        |       |
|   5 |      FAST DUAL                           |                   |    1 |       |    1 |        |       |
|   6 |      PARTITION RANGE SINGLE              |                   |    1 |    99 |    4 |    KEY |   KEY |
|   7 |       PARTITION HASH ALL                 |                   |    1 |    99 |    4 |      1 |  LAST |
|   8 |        TABLE ACCESS BY LOCAL INDEX ROWID | range_hash        |    1 |    99 |    4 |    KEY |   KEY |
|   9 |         INDEX RANGE SCAN                 | PA_PRODUCT_KEY_P  |    1 |       |    3 |    KEY |   KEY |
-----------------------------------------------------------------------------------------------------------



We have the latest statistics. We have the same MERGE query running on another DB, where it does use partition pruning and runs much faster (2 sec). We compared all aspects of both systems and everything is almost the same (partition type, indexes), but here it takes 110 sec per execution.

Not sure why partition pruning is not taking place in this DB.

Is there any other way to force partition pruning? And what are the possible causes for the optimizer not choosing partition pruning?

Could you please help us understand what went wrong all of a sudden?

Thanks in advance


