Skip to Main Content

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, Shimmy.

Asked: January 05, 2016 - 7:45 pm UTC

Last updated: February 26, 2016 - 5:08 am UTC

Version: 12.1.0.2

Viewed 10K+ times! This question is

You Asked

If I have to load a big XML files, containing millions of xml documents in single file that's more than 1GB, into an Oracle DB table, is the best way to do this is by utilizing JAVA SAX as described in
http://www.liberidu.com/blog/2008/07/11/howto-load-really-big-xml-files/
https://community.oracle.com/thread/463009?tstart=0

Or do we have any PL/SQL approach that would give us the same performance?

Thank you in advance.

Sample file looks like this
<u><b>test.xml</b></u>
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Interface><MessageHeader><MessageFormatNumber>1.0</MessageFormatNumber><MessageFormatVersionNumber>11</MessageFormatVersionNumber></MessageHeader></Interface>
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Interface><MessageHeader><MessageFormatNumber>2.0</MessageFormatNumber><MessageFormatVersionNumber>12</MessageFormatVersionNumber></MessageHeader></Interface>
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Interface><MessageHeader><MessageFormatNumber>3.0</MessageFormatNumber><MessageFormatVersionNumber>122</MessageFormatVersionNumber></MessageHeader></Interface>
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Interface><MessageHeader><MessageFormatNumber>4.0</MessageFormatNumber><MessageFormatVersionNumber>121</MessageFormatVersionNumber></MessageHeader></Interface>
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Interface><MessageHeader><MessageFormatNumber>5.0</MessageFormatNumber><MessageFormatVersionNumber>122</MessageFormatVersionNumber></MessageHeader></Interface>
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Interface><MessageHeader><MessageFormatNumber>6.0</MessageFormatNumber><MessageFormatVersionNumber>11</MessageFormatVersionNumber></MessageHeader></Interface>
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Interface><MessageHeader><MessageFormatNumber>7.0</MessageFormatNumber><MessageFormatVersionNumber>11</MessageFormatVersionNumber></MessageHeader></Interface>
.......


and Connor said...

How about as an external table ?


SQL> drop table t purge;

Table dropped.

SQL>
SQL> CREATE TABLE t (doc clob)
  2  ORGANIZATION EXTERNAL
  3  (
  4  TYPE ORACLE_LOADER
  5  DEFAULT DIRECTORY temp
  6  ACCESS PARAMETERS
  7  (
  8  FIELDS
  9  (doc CHAR TERMINATED BY '\n')
 10  )
 11  LOCATION ('xml3.dat')
 12  )
 13  REJECT LIMIT UNLIMITED;

Table created.

SQL>
SQL> select * from t;

DOC
--------------------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Interface><MessageHeader
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Interface><MessageHeader
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Interface><MessageHeader
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Interface><MessageHeader
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Interface><MessageHeader
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Interface><MessageHeader
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Interface><MessageHeader

7 rows selected.

SQL>
SQL> drop table t1 purge;

Table dropped.

SQL> create table t1 ( x xmltype );

Table created.

SQL>
SQL> insert into t1 select xmltype(doc) from t;

7 rows created.

SQL>


Rating

  (2 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Loading Big XML file

Shimmy, February 23, 2016 - 5:06 pm UTC

First of all, thank you for your help.

In the external file definition above, how big(SIZE) can the CHAR be "(doc CHAR TERMINATED BY '\n')"
When I defined it as 100,000 it did take it. Do you know whats's the max size limit for the CHAR declaration in this context? I tried to google, but did not get any relevant info. The full creation script is below...
CREATE TABLE stk_ext (doc clob)
   ORGANIZATION EXTERNAL
   (
    TYPE ORACLE_LOADER
    DEFAULT DIRECTORY EXT_DAT
    ACCESS PARAMETERS
    (
    FIELDS
    (doc CHAR(100000) TERMINATED BY '\n')
   )
   LOCATION ('test.xml')
   )
   REJECT LIMIT UNLIMITED;

Chris Saxon
February 24, 2016 - 1:03 am UTC

SQL> drop table t purge;

Table dropped.

SQL> CREATE TABLE t (doc clob)
  2     ORGANIZATION EXTERNAL
  3     (
  4      TYPE ORACLE_LOADER
  5      DEFAULT DIRECTORY TEMP
  6      ACCESS PARAMETERS
  7      (
  8      FIELDS
  9      (doc CHAR(2147483647) TERMINATED BY '\n')
 10     )
 11     LOCATION ('dept.ctl')
 12     )
 13     REJECT LIMIT UNLIMITED;

Table created.

SQL>
SQL> select * from t;

DOC
-----------------------------------------------------------------------------
xxx

SQL> drop table t purge;

Table dropped.

SQL> CREATE TABLE t (doc clob)
  2     ORGANIZATION EXTERNAL
  3     (
  4      TYPE ORACLE_LOADER
  5      DEFAULT DIRECTORY TEMP
  6      ACCESS PARAMETERS
  7      (
  8      FIELDS
  9      (doc CHAR(2147483648) TERMINATED BY '\n')
 10     )
 11     LOCATION ('dept.ctl')
 12     )
 13     REJECT LIMIT UNLIMITED;

Table created.

SQL>
SQL> select * from t;
ERROR:
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-00554: error encountered while parsing access parameters
KUP-01006: error signalled during parse of access parameters
KUP-04029: numeric field overflowed



no rows selected


So power(2,31) looks like it.

Seperating Valid and Invalid XML records....

Shimmy, February 25, 2016 - 6:52 pm UTC

Hello,

I have a table with CLOB column. The table can have 500K+ records with each CLOB having more than 50K characters.

I need to select rows that are valid XML and another SQL to find rows that are not valid XML.
XMLIsValid function does not work for me as it looks like it needs a schema. It always returns 0, even for valid XML

CREATE TABLE CLOB_TEMP
(ID  NUMBER(10), DOC   CLOB);


INSERT INTO CLOB_TEMP VALUES
(1, '<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Interface><Header><FormatNumber>1.0</FormatNumber></Header></Interface>');
INSERT INTO STK_TEMP VALUES
(2, '<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Interface><Header><FormatNumber>2.0</FormatNumber></Header></Interface>');
INSERT INTO STK_TEMP VALUES
(3, '<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Interface><Header><FormatNumber>3.0</FormatNumber></Header></Interface>');
INSERT INTO STK_TEMP VALUES
(4, '<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Interface><Header><FormatNumber>4.0</FormatNumber></Header><Interface>');

COMMIT;

I need to get rows 1,2 and 3 as valid and 4 as invalid.

What's the best way to get the info, other than create a custom user function like the one below

CREATE OR REPLACE FUNCTION ISXMLVALID_FUN(I_DOC CLOB)
RETURN NUMBER
AS
  xmldata XMLTYPE;
BEGIN
  xmldata := XMLTYPE(I_DOC);
  return 1;
EXCEPTION
  WHEN OTHERS THEN
    return 0;
END;
/


Chris Saxon
February 26, 2016 - 5:08 am UTC

I think your function is the probably the way to go. I dont know of any SQL-direct functions (that dont yield an error if the xml is not valid).

More to Explore

CLOB/BLOB/etc

Complete documentation on Securefiles and Large Objects here