Skip to Main Content
  • Questions
  • Oracle JDBC driver getDatabaseMetadata()["COLUMN_SIZE"] returns column size in bytes (not in characters) for "BYTE" length semantics

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, Sam.

Asked: September 29, 2023 - 10:26 pm UTC

Last updated: June 26, 2024 - 1:49 pm UTC

Version: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

Viewed 10K+ times! This question is

You Asked

Description of getDatabaseMetadata()["COLUMN_SIZE"] in Javadoc:
"The COLUMN_SIZE column specifies the column size for the given column.
For character data, this is the ***length in characters***."

Scenario:
- Check that database encoding is multiple bytes per character.
- Create column with the definition VARCHAR2(4 BYTE).
- Try to insert 4 characters each of which is encoded as multiple bytes.
- DBMS correctly generates ORA-12899: value too large for column (actual: 8, maximum: 4)
- Issue: Oracle JDBC driver getDatabaseMetadata()["COLUMN_SIZE"] returns 4, should return 1.

Steps to reproduce:
select * from NLS_DATABASE_PARAMETERS where PARAMETER = 'NLS_CHARACTERSET';
-- NLS_LENGTH_SEMANTICS = BYTE
-- NLS_CHARACTERSET = AL32UTF8

create table T {
 C VARCHAR2(4 BYTE)
);

insert into T(C) values ('1234');
-- OK

insert into T(C) values ('яяяя');
-- SQL Error: ORA-12899: value too large for column "T"."C" (actual: 8, maximum: 4)

Write Java program where you get getDatabaseMetadata()["COLUMN_SIZE"].
-- It returns 4, should return 1.

and Connor said...

Some information from , our NLS expert Sergiusz Wolicki:

It is enough if the characters are ASCII (0x00-0x7f). Note that any column length is formally a length constraint, not a capacity. 4 means “up to 4 characters”, not “at least 4 characters”. A VARCHAR2(32767 CHAR) column cannot store 32767 characters if they are not all ASCII.

With byte length semantics, byte length constraint always equals char length constraint.

Rating

  (5 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Sam, January 31, 2024 - 3:20 am UTC

Sorry for late comment (missed the answer). I believe the answer doesn't answer the question. The question is about JDBC Java Driver from Oracle. It returns wrong metadata for maximum size of the column. According to JDBC specification, it should return maximum size of the column in characters but returns in bytes. The function is getDatabaseMetadata().getColumns("<column-name>")["COLUMN_SIZE"], column maximum size is defined using byte semantics.
Sergiusz Wolicki
June 24, 2024 - 6:40 pm UTC

The answer is correct. For columns declared with byte length semantics, the maximum number of bytes that can be inserted is the same as the maximum number of characters that can be inserted. If you can insert 4 bytes, you can also insert 4 characters (provided they are all from the ASCII repertoire). As said, column size in Oracle is a column's length constraint, not a column's capacity. The constraint means "up to four characters". It does not mean it is guaranteed that *any* 4 characters can be stored.

We cannot return 1 because you can insert 4 characters. Just not any 4 characters.

You can check the data dictionary column ALL_TAB_COLUMNS.CHAR_LENGTH to confirm that the column character length is 4 in your example.

If you want the capacity in characters, declare the column with character length semantics VARCHAR2(n CHAR) and do not exceed 1000 characters (max_string_size=standard) or 8191 characters (max_string_size=extended).

Note that even then, there will be confusion between the number of Java UTF-16 char's and Oracle AL32UTF8 characters when processing supplementary Unicode characters. If you read 10 supplementary characters from a VARCHAR2(10 CHAR) column, you will get a String of length()=20 (=10 surrogate pairs).


A reader, June 25, 2024 - 5:56 am UTC

The causes of confusion are official definitions both saying 'column size in characters'.

https://github.com/openjdk/jdk/blob/master/src/java.sql/share/classes/java/sql/DatabaseMetaData.java
The COLUMN_SIZE column represents the specified column size for the given column. For character data, this is the length in characters.

https://docs.oracle.com/en/database/oracle/oracle-database/18/refrn/ALL_TAB_COLUMNS.html
CHAR_LENGTH - Displays the length of the column in characters.

I don't know from which official sources ChatGPT got these statements saying that the answer in this topic is correct:
https://chatgpt.com/share/958978b6-96ba-41fd-9065-ef15515547e2
Connor McDonald
June 26, 2024 - 2:33 am UTC

As Sergiusz said

"column size in Oracle is a column's length constraint, not a column's capacity"

We *could* indeed store 4 characters in a column defined as CHAR_LENGTH of 4, but not necessarily for every character.

But I'll file a request with the docs team to see if we can some more information in the docs, because its obviously an area where a lot of confusion can take place.

Do not use BYTE semantics

Peter G, June 25, 2024 - 10:57 am UTC

Hi Sam,
I think you should just stop using BYTE length semantics.
Highly unfortunately, this is still the default when creating a new database.
BR
Peter

Sam, June 26, 2024 - 4:52 am UTC

Thanks to everybody for continuing this topic and propositions to improve docs in this area. And I'm sorry if I write something obvious or stupid.

> I think you should just stop using BYTE length semantics.
> Highly unfortunately, this is still the default when creating a new database.

That's why now we use explicit CHAR in column definition. We hope that, for CHAR semantics, size is capacity (maybe except these scary surrogate pairs), not only "no more than" constraint. Otherwise how you will explain to customer that their string of 10 characters long could not be stored in column with definition VARCHAR2(10 CHAR)?

"column size in Oracle is a column's length constraint, not a column's capacity" - how then to write SQL condition that is true if and only if value could be assigned to column? Probably, length in bytes is capacity, not constraint, and we could write LENGTHB(VALUE) <= ALL_TAB_COLUMNS.DATA_LENGTH? Oracle doc says that DATA_LENGTH is "length of the column (in bytes)".
Chris Saxon
June 26, 2024 - 1:49 pm UTC

You'll get an exception if the value doesn't fit, e.g.:

create table t ( c1 varchar2 (1 byte ) );

insert into t values ( 'a' );
--1 row inserted.
insert into t values ( 'â' );
--ORA-12899: value too large for column "CHRIS"."T"."C1" (actual: 2, maximum: 1)
insert into t values ( 'aa' );
--ORA-12899: value too large for column "CHRIS"."T"."C1" (actual: 2, maximum: 1)


If you want to precheck the data you could use a condition like you propose; there's no need to though.

Sam, August 04, 2024 - 7:31 pm UTC

> If you want to precheck the data you could use a condition like you propose; there's no need to though.
I need this when inserting a lot of records - I execute one INSERT (without prechecks); if it fails, execute one SQL to insert only 'good' records and another SQL to log 'bad' records. Doing this record by record appeared too slow.

More to Explore

Administration

Need more information on Administration? Check out the Administrators guide for the Oracle Database