Skip to Main Content
  • Questions
  • XML Parse error due to escape characters

Breadcrumb

May 4th

Question and Answer

Connor McDonald

Thanks for the question, Praveen.

Asked: August 18, 2016 - 4:17 pm UTC

Last updated: September 08, 2016 - 1:17 pm UTC

Version: Oracle database 12C

Viewed 10K+ times! This question is

You Asked

Hi Tom,

We have a process wherein we try to consume the XML and store it in a table as CLOB . These XMLs which we are getting are having the escape characters like '&' or '<' in the text part of the XML.

When we try to parse the XML using XMLtable it throws the below error . Is there any method/mechanism/workaround using which we can overcome the below error. The only thing is that we cannot change the data from source.

This is what we have done :


create table test_xml (trd_pyload clob );

insert into test_xml values ('<ROWSET>
 <ROW>
  <SHD_ID>1</SHD_ID>
  <AMP_TEXT>XYZABC & ASSOC</AMP_TEXT>
 </ROW>
</ROWSET>');
COMMIT;


SELECT *
  FROM test_xml
       INNER JOIN
       XMLTABLE (
          '/ROWSET/ROW'
          PASSING XMLTYPE (TRD_PYLOAD)
          COLUMNS SHD_ID NUMBER PATH 'SHD_ID',
                  AMP_TEXT VARCHAR2 (250) PATH 'AMP_TEXT')
          ON 1 = 1;


ORA-31011: XML parsing failed
ORA-19213: error occurred in XML processing at lines 4
LPX-00242: invalid use of ampersand ('&') character (use &amp;)
ORA-06512: at "SYS.XMLTYPE", line 272
ORA-06512: at line 1


and Chris said...

There's a couple of options.

You can convert ampersands, less than and greater than to &amp; &lt; and &gt; respectively.

Or you can wrap text that may contain escape characters in <![CDATA[ ]]> tags.

It's easiest to change this in the source data if you're able to. But if you're not it's still possible on the way out.

With ampersands you could do a straight replace - provided you know it isn't used to escape characters!

set define off
create table test_xml (trd_pyload clob );

insert into test_xml values ('<ROWSET>
 <ROW>
  <SHD_ID>1</SHD_ID>
  <AMP_TEXT>XYZABC & ASSOC</AMP_TEXT>
 </ROW>
</ROWSET>');

SELECT amp_text
  FROM test_xml,
       XMLTABLE (
          '/ROWSET/ROW'
          PASSING XMLTYPE (replace(TRD_PYLOAD, '&', '&amp;'))
          COLUMNS SHD_ID NUMBER PATH 'SHD_ID',
                  AMP_TEXT VARCHAR2 (250) PATH 'AMP_TEXT');

AMP_TEXT        
XYZABC & ASSOC 


Greater than and less than are trickier. You'd need to figure out which are genuine tag markers and which aren't. It's easier to wrap the offending tag contents in CDATA tags.

You can do this using regexp_replace.

Use the following regex to find the tags you know could have escape characters:

<AMP_TEXT>(.+)<\/AMP_TEXT>

And the following replacement string:

<AMP_TEXT><![CDATA[\1]]></AMP_TEXT>

\1 is a back reference to the text you matched between the tags.

Put together this is:

DELETE test_xml;

insert into test_xml values ('<ROWSET>
 <ROW>
  <SHD_ID>1</SHD_ID>
  <AMP_TEXT>XYZABC & <ASSOC></AMP_TEXT>
 </ROW>
</ROWSET>');
COMMIT;

SELECT amp_text
  FROM test_xml,
       XMLTABLE (
          '/ROWSET/ROW'
          PASSING XMLTYPE (
            regexp_replace(TRD_PYLOAD, 
              '<AMP_TEXT>(.+)<\/AMP_TEXT>',
              '<AMP_TEXT><![CDATA[\1]]></AMP_TEXT>'
            )  
          )
          COLUMNS SHD_ID NUMBER PATH 'SHD_ID',
                  AMP_TEXT VARCHAR2 (250) PATH 'AMP_TEXT');

AMP_TEXT          
XYZABC & <ASSOC>  

Rating

  (2 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

XML Encoding/Decoding

Vishnusimman Manivannan, August 29, 2016 - 2:09 pm UTC

Hi Tom,

Thanks for the update.. I have question though... I'm having a similar problem and we do create projects/orders from External vendors by getting data in XML. But here the xml structure is so big and adding the above tag to the entire xml structure is pretty bit time consuming... Is there any other way to apply it generic by passing the whole xml.
Chris Saxon
August 30, 2016 - 1:32 am UTC

XML Encoding/Decoding

Vishnusimman Manivannan, September 08, 2016 - 4:33 am UTC

Hi Tom,

Thanks for the update. Sorry for the delay in response. Please let me know if i need to ask this question in a new topic. Currently i'm using DBMS_XMLGEN.CONVERT utitlity to encode and decode XML's with special characters. The problem is in the xml input currently we know which tags can get Special Characters and we doing encode of it when we generate child XML from main XML files. But the problem is i ended up added dbms_xmlgen.convert utility in almost 50% of the fields. I thought if there could be a generic way of applying it to the whole xml. Below example is illustrates my problem. Is there any other way or utility which i can use to apply for the whole xml instead of specific tags?



create table xmlTestData(xmldata XmlType,xmlClob CLOB);

insert into xmlTestData Values (null,'<TEST><FIRST_NAME>Vishnu</FIRST_NAME><LAST_NAME>Manivannan<LAST_NAME><COMPANY>L '||chr(38)||' T</COMPANY></TEST>');

select * from xmlTestData;

update xmlTestData
set xmlData = xmlType(xmlClob);
/*Error report -
SQL Error: ORA-31011: XML parsing failed
ORA-19202: Error occurred in XML processing
LPX-00242: invalid use of ampersand ('&') character (use &amp;)
Error at line 1
ORA-06512: at "SYS.XMLTYPE", line 272
ORA-06512: at line 1
31011. 00000 -  "XML parsing failed"
*Cause:    XML parser returned an error while trying to parse the document.
*Action:   Check if the document to be parsed is valid.*/
update xmlTestData
set xmlData = xmlType(DBMS_XMLGEN.CONVERT(xmlClob,0));
/*
Error report -
SQL Error: ORA-31011: XML parsing failed
ORA-19202: Error occurred in XML processing
LPX-00210: expected '<' instead of 't'
Error at line 1
ORA-06512: at "SYS.XMLTYPE", line 272
ORA-06512: at line 1
31011. 00000 -  "XML parsing failed"
*Cause:    XML parser returned an error while trying to parse the document.
*Action:   Check if the document to be parsed is valid.
*/
select DBMS_XMLGEN.CONVERT(xmlClob,0) from xmlTestData;
--Output comes as below 
&lt;TEST&gt;&lt;FIRST_NAME&gt;Vishnu&lt;/FIRST_NAME&gt;&lt;LAST_NAME&gt;Manivannan&lt;LAST_NAME&gt;&lt;COMPANY&gt;L &amp; T&lt;/COMPANY&gt;&lt;/TEST&gt;


Connor McDonald
September 08, 2016 - 1:17 pm UTC

Well, here's the issue... you dont really have a valid XML to start with.

For the XML to valid, either the ampersands (and other chars) are *already* escaped, OR the entire string needs to be escaped (which is where DBMS_XMLGEN.CONVERT comes into play).

So I think you'll need to roll your own solution here, for example

SQL> select replace(xmlClob,'&','&amp;') from xmlTestData;

REPLACE(XMLCLOB,'&','&AMP;')
--------------------------------------------------------------------------------
<TEST><FIRST_NAME>Vishnu</FIRST_NAME><LAST_NAME>Manivannan<LAST_NAME><COMPANY>L
&amp; T</COMPANY></TEST>


but you'll need to be very careful... because there might be other VALID xml content, eg

SQL> select XMLCLOB from xmlTestData;

XMLCLOB
-------------------------------------------------------------------------------------------------------
<TEST><FIRST_NAME>Vishnu</FIRST_NAME><LAST_NAME>Manivannan<LAST_NAME><COMPANY>L & T</COMPANY></TEST>
<TEST><FIRST_NAME>Connor</FIRST_NAME><LAST_NAME>McDonald<LAST_NAME><COMPANY>L &lt; T</COMPANY></TEST>


I need to make sure that I *dont* escape the correct &lt; so things get more complicated, eg notice the check for a space.

SQL> select replace(xmlClob,'& ','&amp; ') from xmlTestData;

REPLACE(XMLCLOB,'&','&AMP;')
-----------------------------------------------------------------------------------------------------------
<TEST><FIRST_NAME>Vishnu</FIRST_NAME><LAST_NAME>Manivannan<LAST_NAME><COMPANY>L &amp; T</COMPANY></TEST>
<TEST><FIRST_NAME>Connor</FIRST_NAME><LAST_NAME>McDonald<LAST_NAME><COMPANY>L &lt; T</COMPANY></TEST>


More to Explore

CLOB/BLOB/etc

Complete documentation on Securefiles and Large Objects here