Question and Answer

Chris Saxon

Thanks for the question, Bob.

Asked: June 07, 2023 - 11:34 am UTC

Last updated: June 13, 2023 - 9:11 am UTC

Version: various, currently 19c

Viewed 1000+ times

You Asked

We have a table with an END_DATE column defined as

Column_Name   Data_Type   Nullable   Data_Default
END_DATE      DATE        Yes        (null)


During the period of 24-oct-2007 07:36:04 to 23-sep-2019 11:41:41, a number (but not all) of the rows have a dump century of 10,000 for END_DATE:

select dump(end_date), end_date, to_char(end_date, 'dd-mon-yyyy hh24:mi:ss') from rate where rate_id = 8076496;


DUMP(END_DATE)                     END_DATE    TO_CHAR(END_DATE,'DD-MON-YYYY HH24:MI:SS')
Typ=12 Len=7: 200,122,12,31,1,1,1  31-DEC-22   00-000-0000 00:00:00


while others have the expected century, 2000

DUMP(END_DATE)                     END_DATE    TO_CHAR(END_DATE,'DD-MON-YYYY HH24:MI:SS')
Typ=12 Len=7: 120,120,12,31,1,1,1  31-DEC-20   31-DEC-2020 00:00:00


The problem went away after 23-sep-2019 11:41:41, but a customer who queries for rows including the period of 24-oct-2007 07:36:04 to 23-sep-2019 11:41:41 encounters errors in processing the "00-000-0000 00:00:00" values.
While we could fix the problematic rows with SQL, the customer wants us to tell them why the problem could occur at all.
So the question is, how could this erroneous century get into an Oracle database in the first place?

and Chris said...

MOS note 331831.1 covers this issue:

How can these dates end up in the database?

The issue here comes from the way the client-side code and the database communicate with each other. Depending on the level at which the connection is made, Oracle does not always do a full check of the value that is inserted (this is by design, not a fault).

If you use standard SQL commands in an application like SQL*Plus, then something like this can never happen, because the layer that this works on has all the checks and bounds that are needed to stop this data from entering. If illegal byte sequences were inserted in a database in those circumstances, it would be a bug in Oracle software.

However, if your client application uses OCI or Pro*C (not the JDBC thin driver), then you can use the DATE external data type in OCI and Pro*C to insert illegal values into the database. The DATE external data type is not checked for valid values by Oracle; it is left to the client side to make sure that what is inserted is a correct DATE.

Detection of these incorrect values in the database & solutions

To correct a problem like this, the action plan in your situation should be (in either order):

* Track down these invalid values in the database and update the rows with the date that was meant to be stored.
* Track down where these rows were inserted from and under which circumstances, and try to find out and fix the problem in the application that let these values in.


The MOS note contains further details if you need them.
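
For the tracking-down step, one way to flag suspect rows is to inspect the DUMP output directly. This is only a sketch - it reuses the RATE table and END_DATE column from your example and assumes the bad values all have a century byte of 200 or more (years 10,000 A.D. and beyond), while valid A.D. dates have century bytes of 100 to 199:

select rate_id, end_date, dump(end_date) as end_date_dump
from   rate
where  end_date is not null
and    to_number(regexp_substr(dump(end_date), '\d+', 1, 3)) >= 200;

The third number in the DUMP string is the century byte (the first two are the Typ and Len values), so this picks out rows like the 200,122,... example above; adjust the predicate if your data shows other forms of corruption.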

There are a few old bugs related to this problem, but they were fixed back in the 11g days. So unless you've recently upgraded from one of those older releases, they're unlikely to apply. If you believe these values originate from the database itself, speak with support.

Comments

Some bugs re: inserting invalid values with plain SQL are not fixed as of Oracle 19

mathguy, June 11, 2023 - 2:50 pm UTC

This question took me back to a thread on the Oracle Technology Network (or whatever it is called now) from more than seven years ago.

https://forums.oracle.com/ords/apexds/post/how-can-i-trap-an-impossibly-large-date-6142

My posts in the latter part of that thread should be taken with a pinch of salt as I had just started, that same month, to learn Oracle and SQL; I am not a computer professional, and I had not worked with databases before then. However, it seems that most of what I reported there is unchanged in my current version of Oracle, which is 19.3.

The problem reported there was exactly that of invalid dates that make their way into the database. Some of those problems (bugs) survive in Oracle 19; others may have been fixed along the way, but some have not been. I suspect this is still true for Oracle 23, or whatever the "current" version is as I write this.

To your main question - how do such invalid values get into the db in the first place. The most likely scenario is a direct load (simple copying of bytes with no validation whatsoever). Suppose you import data from some other system, not from Oracle db. You tell the Oracle db: "accept the incoming values as valid; I guarantee that they are" - but in fact they aren't valid. Specifically, Oracle uses one particular scheme for encoding dates - seven bytes where the first is the century, the second is the year within century, both in excess-over-100 encoding (so that negative years can also be encoded as positive integers in base 100). If the data source you import from uses a different scheme, data that is valid in the source, copied byte-for-byte to Oracle db without any checking, may be invalid in Oracle db.
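
As a quick illustration of that encoding (the bytes match the valid 31-DEC-2020 example shown in the question): the century and the year within the century are each stored as the value plus 100, the month and day are stored as-is, and the hour, minute and second are stored as the value plus 1.

select dump(date '2020-12-31') as dt_dump from dual;

DT_DUMP
----------------------------------
Typ=12 Len=7: 120,120,12,31,1,1,1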

This is why such raw copying of data should - in most cases - be used ONLY for Oracle db - to - Oracle db loads, not for importing data from other sources (in my opinion anyway).

You said you could fix the problematic rows with SQL. I wonder what you have in mind - other than, perhaps, trying to find all the invalid rows and setting the date values to NULL. It is not clear how you would be able to reconstitute the correct, intended date from the invalid values stored in the db. Most importantly, as I discovered in that old thread on OTN, in some cases an invalid date may actually be stored in the Oracle db in a seven-byte sequence that represents a valid date - but one that has nothing to do with the intended value! Obviously such problems can't be fixed if you only look at invalid data to begin with.

Here is a quick demo of causing invalid data to be stored in the Oracle db, using purely Oracle SQL commands (which, again, is not the most likely reason for YOUR issue). It may be argued that "INSERT ... VALUES ..." uses a form of raw data manipulation, similar to batch loads; but the invalid dates should be caught earlier than that. In the examples I use date arithmetic (adding an insane number of days or of months to a valid "starting" date); that date arithmetic calculation should throw the error, regardless of what is intended to be done with the result of the calculation. Whether it is used in an INSERT statement or for any other purpose is irrelevant.

select banner_full from v$version;

BANNER_FULL                                                                     
--------------------------------------------------------------------------------
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0


drop table t purge;

create table t (id number, dt date);

insert into t (id, dt) values (1, date '2020-12-31' + 17500000);
insert into t (id, dt) values (2, add_months(date '2020-12-31', 574000));
insert into t (id, dt) values (3, date '2020-12-31' + 45000000);
insert into t (id, dt) values (4, add_months(date '2020-12-31', 65550*12));

commit;

alter session set nls_date_format='yyyy-mm-dd';

select id, dt, to_char(dt, 'yyyy-mm-dd') as dt_char, dump(dt) from t;

 ID  DT          DT_CHAR     DUMP(DT)                           
---  ----------  ----------  -----------------------------------
  1  9998-05-17  9998-05-17  Typ=12 Len=7: 200,98,5,17,1,1,1    
  2  9918-04-30  9918-04-30  Typ=12 Len=7: 200,18,4,30,1,1,1    
  3  5846-10-25  0000-00-00  Typ=12 Len=7: 42,54,10,25,1,1,1    
  4  2034-12-31  2034-12-31  Typ=12 Len=7: 120,134,12,31,1,1,1 


The example shows several interesting things. I was able to cause invalid data to be inserted in my table, using purely Oracle SQL code. The first two examples show how I can get dates in the 100-th century, either by adding days to a valid date or using ADD_MONTHS on a valid starting date. Note that in both cases the "year" byte is less than 100, which in Oracle encoding should mean the year is negative (B.C. or whatever the politically correct notation is for that these days). But the "century" byte is greater than 100; so this kind of DUMP can be immediately recognized as "invalid" for a date encoding.

The third example (ID = 3) results in both the "century" and the "year" bytes being less than 100. This should encode a B.C. year, but selecting the date with NLS_DATE_FORMAT set to yyyy-mm-dd shows a date in year 5846 A.D. Strangely, if we explicitly apply TO_CHAR to the date, we get the string 0000-00-00. Which means that something else is happening; converting the date to a string implicitly (in the DT column) vs. explicitly (in the DT_CHAR column) results in different output, even though both use the same format model.

The last example is the most interesting. We get a CORRECT dump for the date shown, 31 December 2034! However, that is NOT the correct result of the date expression in the INSERT statement; that expression should return 31 December in year 65550 + 2020 = 67570. (The year shown, 2034, is exactly 65536 less than 67570, which suggests the year value wrapped around a 16-bit boundary somewhere along the way.) If someone working in astronomy or particle physics, where dates in year 67570 may be valid, uses Oracle db for storing data and making computations, this would be a very definite bug. (Caused by their own incompetence - if they need years beyond 9999 they shouldn't be using Oracle db in the first place.)

One can play with this further. For example, I tried (not shown here) to create the same invalid results using date plus interval arithmetic: adding days or months using either interval literals or NUMTODSINTERVAL and NUMTOYMINTERVAL. Interestingly, that works (either way) for adding insanely many days, but it throws errors for adding insanely many months. Some checking seems to be happening, but not in a consistent manner.
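
To show the kind of statement I mean (a sketch only; I would not rely on the exact behavior being identical in every version):

-- slips through: a huge day interval yields a date with an invalid
-- internal encoding, just like the plain-number arithmetic above
select dump(date '2020-12-31' + numtodsinterval(17500000, 'day')) from dual;

-- rejected: the equivalent addition of months raises an error instead
select date '2020-12-31' + numtoyminterval(574000, 'month') from dual;
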
Connor McDonald
June 12, 2023 - 4:44 am UTC

I'll log bugs for these even though they're fairly niche

Real-life example regarding a fairly niche issue

mathguy, June 12, 2023 - 3:42 pm UTC

In the old OTN thread, the original poster stated that the issue of bad data entering the db arose in a production system. His specific case involved ADD_MONTHS, not with a hard-coded huge number as argument, but rather with the result of a more complex calculation that returned a number. If that calculation had a bug that caused the return value to be unreasonably high for the way it was used, but Oracle accepted the result instead of rejecting it on principle, it is easy to see how that caused problems down the line.

I don't know what that poster's specific use case was, but it is easy to concoct very realistic scenarios where the same issue may arise. I am providing an example to support your bug-filing process - in case people in the organization are tempted to dismiss it as irrelevant.

Suppose an outside data source provides an effective date of 1 June 2020 00:00:00 UTC, but it is given as POSIX time, 1590969600. This is the number of seconds since 1 January 1970 00:00:00 UTC. A user wants to convert this to a proper Oracle date. So he would do something like this:

create table t (id number, dt date);

insert into t (id, dt) values (101, date '1970-01-01' + 1590969600/86400);
  
select id, dt, to_char(dt, 'yyyy-mm-dd hh24:mi:ss') as dt_char, dump(dt) as dt_dump
from   t
;

  ID DT        DT_CHAR             DT_DUMP                             
---- --------- ------------------- ------------------------------------
 101 01-JUN-20 2020-06-01 00:00:00 Typ=12 Len=7: 120,120,6,1,1,1,1  


So far so good - everything seems fine.

But suppose the data provider changes their convention without warning, and starts sending POSIX time in milliseconds instead of seconds; not unheard of. Without changing the SQL statement, the Oracle user may do something like this. (Think of the input POSIX time as a value coming from the data file, not a hard-coded integer; the user may not notice that the integer is now three digits longer.)

insert into t (id, dt) values (133, date '1970-01-01' + 1590969600000/86400);

select id, dt, to_char(dt, 'yyyy-mm-dd hh24:mi:ss') as dt_char, dump(dt) as dt_dump
from   t
;

  ID DT        DT_CHAR             DT_DUMP                             
---- --------- ------------------- ------------------------------------
 101 01-JUN-20 2020-06-01 00:00:00 Typ=12 Len=7: 120,120,6,1,1,1,1     
 133 27-OCT-49 0000-00-00 00:00:00 Typ=12 Len=7: 225,49,10,27,1,1,1


Nothing changed in the code; the input is now 1000 times greater, and the date arithmetic calculation is now very similar to what I described in the previous Comment. This doesn't seem that far-fetched to me.

Incidentally, this might be how bad data was created in Bob's case (original poster in this thread, on AskTom). A direct load of bad raw data is still more likely, but I can see how inputting POSIX time just as I demonstrated above could easily result in the same issue.
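
For completeness, one possible guard (only a sketch, not something from the thread; :posix_time is a made-up bind variable and the id value is arbitrary) is to bound the raw input before doing the date arithmetic, so that an unexpected milliseconds value is rejected rather than silently stored:

insert into t (id, dt)
select 134, date '1970-01-01' + :posix_time/86400
from   dual
where  :posix_time between 0 and 253402300799;  -- last second of year 9999 UTC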
Chris Saxon
June 13, 2023 - 9:11 am UTC

Thanks for the example.
