Skip to Main Content
  • Questions
  • Type Mismatch - using the wrong datatype to store my data

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Yang.

Asked: November 28, 2007 - 1:52 am UTC

Last updated: November 30, 2007 - 1:38 pm UTC

Version: 10g

Viewed 10K+ times! This question is

You Asked

Hi Tom,

I have some curiosities about how does Oracle parse SQL statements. Thank you for your help

create table TEST
(
  ID    NUMBER,
  VALUE VARCHAR2(100)
);
Insert into test values(1, '1');
Insert into test values(2, '2');
Insert into test values(3, '3');
Insert into test values(4, 'a');

create table TEST2
(
  ID    NUMBER,
  VALUE VARCHAR2(100)
);
Insert into test2 values(1, 'A');
Insert into test2 values(2, 'A');
Insert into test2 values(3, 'A');
Insert into test2 values(4, 'B');
commit;


Both of the following queries will be failed because the value of column "VALUE" in table test2 can't be transfered to a number when
test2.value='B'. Faire enough.

Select * From test2
Where Id In (Select Value From test);
ORA-01722: invalid number

Select t2.* 
From test2 t2, test t
Where t2.id = t.value;
ORA-01722: invalid number


If I put more selection conditions on the first one, it will work:
Select * From test2
Where Id In (Select Value From test);
And Value <>'B';


However, this "solution" doesn't work for the second one:
Select t2.* 
From test2 t2, test t
Where t2.id = t.value
And t2.value<>'B';

ORA-01722: invalid number

Could you please give me some explaination about this? why the first query can be passed if I add more condition but the second one can't?
Thank you

and Tom said...

You are trying to compare a number to a string.
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:11504677087008


In order to do that, we MUST convert the string to a number. That is the only deterministic approach - since the number 1 could be represented in an unlimited number of ways in a string - 1.0, 1, +1, 1e0, 01.00, etc....

So, so far so good - that is easy to understand.

Now for the tricky part.

The query plan drives what happens - what happens and in what order things happen.
Select t2.* 
From test2 t2, test t
Where t2.id = t.value
And t2.value<>'B';



who is to say that "where t2.id = t.value" happens BEFORE or AFTER "t2.value <>'B'"



In your particular case - there is NO SAFE WAY to perform your queries. There is NOTHING in the table TEST to indicate when a value might be "a number".


Therefore, you have to convert TEST2.ID into a string - but beware that there are an unlimited number of formats for doing so.


In short - use numbers to store numbers, strings for strings, dates for dates.


You have a table with a string (test.value) and a table with a number (test2.id). They are - to me - simply NOT COMPARABLE.


Rating

  (6 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

It can happen with binds too

Chuck Jolley, November 28, 2007 - 1:24 pm UTC

We found that this can happen with binds also.
A programmer here wrote a front end window that sent a bind in that always contained numeric data into a query against a varchar2 column that contained both numeric and character data.
But he created a bug by declaring the bound variable as numeric data type. (it was, after all, always numeric for this query)

This worked fine in 7.3
The Oracle SQL engine did the implicit conversion to char so that
where vc2_column = :numeric
became
where vc2_column = to_char(:numeric)

8.17 uncovered the bug
It turned
where vc2_column = :numeric
into
where to_number(vc2_column) = :numeric

Invalid number.
oops, the column contained non-numeric data too.

Why did Oracle change? I don't know. They probably were fixing something else or making the optimizer work better.
Our sloppy code was not their fault though.

Lesson:
If you need Oracle to compare two different types of data you need to take responsibility to explicitly convert the types yourself.

I guess that really applies to all programming, but it's worth repeating occasionally.

TO_CHAR

Yang, November 28, 2007 - 5:48 pm UTC

Yes I understand that number can not be compared with a Char and the using of "to_char" in my queries would be very helpful.

And also, I understand the order in which Oracle executes query is not necessary the same as the order in which we put criterion in query.

However, I still would like to know why the following query can be executed successfully, if "value <>'B'" is not executed BEFORE "id in (select value from test)" and test2.id is not comparable with test.value.

<code>
Select * From test2
Where Id In (Select Value From test);
And Value <>'B';


Thank you.</code>
Tom Kyte
November 29, 2007 - 8:28 am UTC

it would not be executed successfully if

id in (select value frmo test)

is evaluated before

where value <> 'B'

who said it would be?????


Yang, November 29, 2007 - 9:04 am UTC

Tom, maybe I didn't make it clear in my last post. Sorry. Let me try again.

According to my understanding:
1,
Select * From test2
Where Id In (Select Value From test);
And Value <>'B';

The reason why query 1 can be executed successfully is: "value <>'B'" is evaluated before "ID in (...)".

Now, what makes me confused is:
2,
Select t2.*
From test2 t2, test t
Where t2.id = t.value
And t2.value<>'B';

Why in this query "t2.value<>B" can not be evaluated before "t2.id = t.value" so that query 2 can be executed successfully? Thank you.

Tom Kyte
November 29, 2007 - 6:35 pm UTC

1) by luck, apparently - it was. Yes.

2) it could have been, no one said "IT CANNOT BE"

All we can say is "IT WAS NOT"

and you have no reason to expect it to have - you already agreed that there are no orders of operation, that is all you need to understand to understand how fragile BOTH of those queries are.

by luck?

Yang, November 29, 2007 - 8:00 pm UTC

Alright. The table design is not appropriate and queries are fragile, I agree with that.

Here I come up with another question which is not so technical:
When you said "by luck" or "it should have", do you mean Oracle evaluates queries in random way? If the evaluation(or execution) of query is not determined by fixed, scientific rules but luck, am I supposed to expect that query 1 could be failed or query 2 could be passed if I re-execute them enough times(hundreds or maybe thousands)?

If you are not happy with this "irrelevant" question, please just ignore it. Thanks a lot.
Tom Kyte
November 30, 2007 - 12:53 pm UTC

not randomly, no. of course not.

but, the optimizer is a "model", it takes LOTS of inputs, lot and lots of them, and then decides what to do.

So yes, a query that worked yesterday "by complete and utter accident" might not tomorrow because the plan changed.

Consider (using 9i for this trivial example)

ops$tkyte%ORA9IR2> create table t ( x int, str varchar2(10) );

Table created.

ops$tkyte%ORA9IR2> insert into t values ( 1, '1' );

1 row created.

ops$tkyte%ORA9IR2> insert into t values ( 2, 'x' );

1 row created.

ops$tkyte%ORA9IR2>
ops$tkyte%ORA9IR2> select * from t where str = 1 and x = 1;

         X STR
---------- ----------
         1 1

ops$tkyte%ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA9IR2> select * from t where str = 1 and x = 1;
ERROR:
ORA-01722: invalid number


simple change in optimizer -> simple change in way query is processed -> query fails


you should never use an implicit conversion.

if you code explicit conversions, it will be obvious what MIGHT happen:


select * from t where to_number(str) = 1 and x = 1;
                      ^^^^^^^^^^^^^^ uh-oh, that could be a problem!


it is not that you would re-execute the query over and over - it is that something would change the plan and then BAM.


Re: by luck?

Michael, November 30, 2007 - 7:30 am UTC

Yang,

please forgive my answering your latest reply on this thread.

I have the impression that you mis-interpreted Thomas Kyte's last
message to you. He did not mean to say that Oracle arrived at a
query's execution plan by some process that included random decisions.

What he is telling you is that the optimizing process depends on a
great variety of information the optimizer has (or doesn't have) about
the tables involved in a query. Among those information is statistical
information (# of rows in a table, # of key values in an index, ....)
as well as structural info (presence of keys, constraints, ....).

Given the great number of different information the query optimizer
can utilize to decide which query exexution plan to use, there is no
way to explain this decision process. In fact, the entire issue is complex
enough to write a whole book about (Jonathan Lewis: Cost Based Optimizer
Fundamentals. A great book, but definitely not easy reading!)

This means that:

Given two different queries Q1 and Q2 that both contain an AND between two
conditions A and B, the optimizer may (for what reason soever) choose to
execute A before B in the execution plan for Q1, but B before A in the
execution plan for Q2.

Given the same query Q1, the optimizer may come up with different execution plans
at different point in time when the information the optimizer uses has changed
sufficiently. (like: At T1 your table holds 2000 rows, at T2 it containt 2,000,000 rows)

I guess the bottom line of what I am saying is that SQL queries describe WHAT you want
to retrieve, but not HOW the data is to be retrieved. Should you discover that a
certain query will only work if data is retrieved in a certain way, then ... it's rather
likely that you have discovered a flaw that will sooner or later come back to haunt you.

Let me add a last sentence: I am reading ASKTOM for almost 10 years now, I have had
the pleasure of attending a lecture Mr. Kyte was giving. Let me assure you that he
is one of the most gentle, well-mannered people I've ever met. He's a person who
makes fun of others easily. (well, not unless the mythological "u" comes back to
pull his strings). He's doing all this in addition to his regular job at Oracle.


All the best
Michael
Tom Kyte
November 30, 2007 - 1:38 pm UTC

... I guess the bottom line of what I am saying is that SQL queries describe WHAT
you want to retrieve, but not HOW the data is to be retrieved. ...


well said...

COOL

Yang, December 01, 2007 - 3:37 am UTC

Tom, your last example is really convincing. Now I totally understand. Thank you very much.

And Michael, your explanation also helps me a lot. Thank you.