Skip to Main Content
  • Questions
  • Storing date field as number datatype ...

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Sasidhar.

Asked: November 27, 2006 - 1:27 am UTC

Last updated: June 05, 2008 - 10:19 am UTC

Version: 9.2.0.4

Viewed 1000+ times

You Asked

Hi Tom,

1.Is there any advantage of storing date field as Number datatype in the table ? we have few tables like this

Ex:

Desc t1
-----------------------
...
..
Startdate Number
Enddate Number
..
..

2.Will this have an impact on peformance , when we have where clause like below

select c1,c2..cn
from t1
where
..
startdate >= :st_date and
enddate <= :end_date

how would it work if there is an index on start and enddate columns.

If you had already answered this type question in your site,would it be possible to provide the link..(I searched in this site but not able to find links on this subject..)

Thanks for your time.

Thanks,
Sasi




and we said...

1) there are ONLY DISADVANTAGES.

there are no, none, nada, nunca, zero advantages.

disadvantage one: someone will store Feb 29th, 2001 in your number. the will not be allowed to in a date.

disadvantage two: you have blinded the optimizer, made it quite impossible to do its job correctly.

ops$tkyte%ORA10GR2> create table t
2 as
3 select dt+rownum dt, to_number( to_char( dt+rownum, 'yyyymmdd' ) ) num
4 from (select to_date( '01-jan-1995', 'dd-mon-yyyy') dt from all_objects )
5 /

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T', method_opt=>'for all columns size 254' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> @at
ops$tkyte%ORA10GR2> column PLAN_TABLE_OUTPUT format a72 truncate
ops$tkyte%ORA10GR2> set autotrace traceonly explain
ops$tkyte%ORA10GR2> select * from t where dt between to_date( '31-dec-2000' ) and to_date('01-jan-2001' );

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 14 | 35 (6)| 00:00:01
|* 1 | TABLE ACCESS FULL| T | 1 | 14 | 35 (6)| 00:00:01
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("DT"<=TO_DATE('2001-01-01 00:00:00', 'yyyy-mm-dd
hh24:mi:ss') AND "DT">=TO_DATE('2000-12-31 00:00:00', 'yyy
hh24:mi:ss'))

ops$tkyte%ORA10GR2> select * from t where num between 20001231 and 20010101;

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 185 | 2590 | 35 (6)| 00:00:01
|* 1 | TABLE ACCESS FULL| T | 185 | 2590 | 35 (6)| 00:00:01
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("NUM"<=20010101 AND "NUM">=20001231)

ops$tkyte%ORA10GR2> set autotrace off

lots of numbers between those numbers, few DATES between those DATES!!!!

use the proper and correct datatype, don't be "smart" and try to obfuscate things...


2) see above for numbers....

if you have an index on startdate, and another index on enddate, it'll use one or the other (but not both)


Rating

  (11 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Does it apply to DWH

Umesh Kasturi, November 28, 2006 - 2:13 am UTC

Tom,
Does the same theory hold good on DWH. Because I have heard that in DataWare House, dates are stored in a "look up table of Numbers" and those Numbers are used in all other tables
Please comment

Thanks


Tom Kyte
November 28, 2006 - 7:19 am UTC

this "theory" (which is in fact - well, a fact) is MOST TRUE in a data warehouse whereby you actually need the optimizer to get the best estimated cardinalities possible in order to generate a plan that runs in your lifetime.


wrong cardinality => wrong plan
wrong plans (are caused by) wrong cardinality estimates


everything you do to confuse the optimizer - by using strings or numbers to store dates, by putting "really large dates/numbers in place of using NULLS" and so on - will reduce the ability of the optimizer to correctly estimate the cardinality.

Excellent !!!

parag j patankar, November 28, 2006 - 9:05 am UTC

Hi Tom,

Perfect answer.

I would like to add that if you use number or string to store date, you will miss time component attached to date field. If you want to define time you have to add column(s) in a table and many cases, i find out this part was skipped from initial design.

best regards
PJP

Well explained.

Sasi, November 28, 2006 - 12:24 pm UTC

Hi Tom,

Excellent example (and a tricky date -Feb 29th, 2001) . I shall take this example to explain it to my developers.

We are not only storing dates as NUMBER datatype but also as RAW.(which is converted by 3rd party tools and then stored in a table).

Once again thanks for your time.

Thanks,
Sasi

Tom Kyte
November 28, 2006 - 7:48 pm UTC

</code> http://asktom.oracle.com/Misc/see-you-just-cannot-make-this-stuff-up.html <code>


a raw to store a date, that is indeed a new one on me.

Dates as RAW

Bo Brunsgaard, November 29, 2006 - 4:09 am UTC

Dates stored as raw may seem out there, but I may have an explanation.

I have seen this done where data was received off an IBM mainframe system, containing MVS timestamps. These were (if my failing memory serves me correct) 8 byte binary values, and were stored directly as received (since their actual semantic content as date/times were not used in the receiving application).

Not to suggest that raw(8) is an optimal choice for storing dates in general....

Bo



Tom Kyte
November 30, 2006 - 8:56 am UTC

yeah, but that is like saying:

We store our numbers in strings because our input file has strings.




Sometimes semantics don't match

Michael Friedman, December 31, 2006 - 10:42 am UTC

How about "I store my numbers in strings because I need to do decimal arithmetic which native number types don't do."

That's a design decision that Oracle's original developers made - instead of using the obviously faster floating point types that are implemented by most systems in hardware to IEEE specs.

This example with MVS time stamps is one of five cases where I would seriously consider not storing date datatypes in DATE fields:

1. I need to map exactly to an external system whose date semantics either do not exactly match Oracle's or which may not match Oracle's (ie. for daylight savings time, leap seconds, etc.)

2. I need to handle date / time to a greater precision than Oracle supports (ie. to the microsecond)

3. I need exact to the second date / time tracking including leap seconds

4. I need exact intervals between times... so the interval between midnight the day before daylight savings time starts and midnight the day after daylight savings time starts should not be exactly 2 days

5. I need to support dates before or after the date range supported by Oracle (4000BC to 4000AD if I remember correctly)

Um... actually Base 100

Michael Friedman, December 31, 2006 - 12:02 pm UTC

Let me correct that... Oracle stores numbers in Base 100.

Still, point remains... sometimes it makes sense not to use native types - Oracle had good reasons not to use native number types and sometimes you have a good reason not to use an Oracle type.

That said, it's damned rare.

Dilemma

Kenneth Chan, December 09, 2007 - 9:45 pm UTC

Hi Tom,

I have came across an implementation (surprisingly - an Oracle commercial product) that stores dates as the "Number" data type. Instead of storing something like '20070101' as numeric like what you did in the example, it counts the number of seconds passed so far based on 1st Jan, 1970. For example if today was 1st Feb, 2001, the value would be

select (to_date('20010201', 'yyyymmdd') - to_date('19700101', 'yyyymmdd'))*86400 from dual;
which turns out to be 980985600.

I know it aligns with the time presentation begin used widely on UNIX boxes (the POSIX time); it also eliminates the 'Feb 29th, 2001' problem. How would you justify that? I mean is there a special purpose to storing dates in this way instead of a DATE data type.

Thanks in advance.
Tom Kyte
December 10, 2007 - 11:09 am UTC

sure, because they want to make sure that their stuff stops working...

http://en.wikipedia.org/wiki/Year_2038_problem

Date as VARCHAR2

A reader, June 04, 2008 - 2:35 pm UTC

Tom,
Our developers are asking for a justification on why they cannot store DATE as VARCHAR2 data type. I quickly opened Jonathan Lewis's Cost Based Oracle Fundamentals since I remebered reading something about storing DATE as VARCHAR2. This is covered on Chapter 6, page 118 Daft Data Types.

I ran the example in that section on our 10.2.0.3 database and I was surprised to find that cardinality was the same whether I use VARCHAR2 or NUMBER or DATE to store DATE data though in the book Jonathan's example shows different cardinality for DATE, VARCHAR2 and NUMBER. I use 10046 trace/tkprof.

I am really at a loss as to how to convince them on the merits of using correct data type.

Thanks...

Tom Kyte
June 04, 2008 - 4:49 pm UTC

how about these for reasons:

BECAUSE IT WOULD BE STUPID TO USE THE WRONG DATATYPE TO STORE SOMETHING.
BECAUSE IT WOULD MAKE THE DEVELOPERS LOOK STUPID.
BECAUSE IT IS WRONG.
BECAUSE IT FLIES IN THE FACE OF COMMON SENSE.
BECAUSE IT WOULD ENSURE YOU WOULD HAVE THE 29th of Feb, 2009 in your table (among other really bad dates)

BECAUSE YOU WOULD SPEND LOTS OF CPU CYCLES CONVERTING THIS STRING INTO A DATE again in order to do things like..... math - how many days between, give me everything within one month of this.... etc

I'd love to hear from them why they think this would be "smart", "intelligent", a "good idea", "superior to using the right datatype"

ops$tkyte%ORA10GR2> create table t ( str_date, date_date, number_date, data )
  2  as
  3  select to_char( dt+rownum,'yyyymmdd' ),
  4         dt+rownum,
  5         to_number( to_char( dt+rownum,'yyyymmdd' ) ),
  6             rpad('*',45,'*')
  7    from (select to_date('01-jan-1995','dd-mon-yyyy') dt
  8            from all_objects)
  9  /

Table created.

ops$tkyte%ORA10GR2> create index t_str_date_idx on t(str_date);

Index created.

ops$tkyte%ORA10GR2> create index t_date_date_idx on t(date_date);

Index created.

ops$tkyte%ORA10GR2> create index t_number_date_idx on t(number_date);

Index created.

ops$tkyte%ORA10GR2> begin
  2          dbms_stats.gather_table_stats
  3          ( user, 'T',
  4            method_opt=> 'for all indexed columns',
  5            cascade=> true );
  6  end;
  7  /

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> set autotrace on explain
ops$tkyte%ORA10GR2> select *
  2    from t
  3   where str_date between '20001231' and '20010101';

STR_DATE DATE_DATE NUMBER_DATE DATA
-------- --------- ----------- ---------------------------------------------
20001231 31-DEC-00    20001231 *********************************************
20010101 01-JAN-01    20010101 *********************************************


Execution Plan
----------------------------------------------------------
Plan hash value: 623125948

----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |   298 | 14006 |     6   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T              |   298 | 14006 |     6   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T_STR_DATE_IDX |   298 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("STR_DATE">='20001231' AND "STR_DATE"<='20010101')

ops$tkyte%ORA10GR2> select *
  2    from t
  3   where number_date between 20001231 and 20010101;

STR_DATE DATE_DATE NUMBER_DATE DATA
-------- --------- ----------- ---------------------------------------------
20001231 31-DEC-00    20001231 *********************************************
20010101 01-JAN-01    20010101 *********************************************


Execution Plan
----------------------------------------------------------
Plan hash value: 2837589515

-------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                   |   297 | 13959 |     6   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T                 |   297 | 13959 |     6   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T_NUMBER_DATE_IDX |   297 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("NUMBER_DATE">=20001231 AND "NUMBER_DATE"<=20010101)

ops$tkyte%ORA10GR2> select * from t where date_date
  2      between to_date('20001231','yyyymmdd') and to_date('20010101','yyyymmdd');

STR_DATE DATE_DATE NUMBER_DATE DATA
-------- --------- ----------- ---------------------------------------------
20001231 31-DEC-00    20001231 *********************************************
20010101 01-JAN-01    20010101 *********************************************


Execution Plan
----------------------------------------------------------
Plan hash value: 546586007

-----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                 |     1 |    47 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T               |     1 |    47 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T_DATE_DATE_IDX |     1 |       |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("DATE_DATE">=TO_DATE('2000-12-31 00:00:00', 'yyyy-mm-dd hh24:mi:ss') AND
              "DATE_DATE"<=TO_DATE('2001-01-01 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))

ops$tkyte%ORA10GR2> set autotrace off
ops$tkyte%ORA10GR2>





wouldn't raw be better

Jim, June 04, 2008 - 6:51 pm UTC

Why don't those developers just store all values as raw? Wouldn't that allow them the most flexibility? For large raw values they could use gzip compression and save space even.
Tom Kyte
June 05, 2008 - 9:41 am UTC

better yet, they need one table:


create table objects ( object_id number primary key, data xmltype );



never have to bother with that pesky database thingy again.


Date as VARCHAR2

A reader, June 05, 2008 - 9:26 am UTC

Tom,
Thanks for your reply and as usual a great example.

When I asked the developers, their argument is that they are getting data from a legacy system. In the legacy system, it is stored as 200805 for May 2008, 200806 for June 2008 and so on (PIC 9(006)). Their question is that how they can store 200805 as 200805 in a DATE field? If they convert 200805 to 20080501, then it means they are changing data.

Thanks...
Tom Kyte
June 05, 2008 - 10:19 am UTC

1-may-2008 is generally regarded as "may, 2008"

use a date. period.

Storing months in a DATE field

Martin Vajsar, June 06, 2008 - 3:44 pm UTC

If you want to make sure that a date field will accept only dates representing months, you can constraint them:
check(thedate=trunc(thedate, 'MM'))

More to Explore

DBMS_STATS

More on PL/SQL routine DBMS_STATS here