Skip to Main Content
  • Questions
  • How to allow special characters to store in DB

Breadcrumb

May 4th

Question and Answer

Tom Kyte

Thanks for the question, Bala.

Asked: June 06, 2004 - 6:47 pm UTC

Last updated: April 12, 2011 - 12:20 pm UTC

Version: 8.1.7

Viewed 100K+ times! This question is

You Asked

Hi Tom,


Many thanks for your help.

I need to store special characters like single quote ('), double quote (''). / etc.

In the input, there will be only one quote. But to store single quote, we need to give two quotes.

Is there any way, Oracle will allow to store special characters to do the job simpler.

If i define the column as CLOB, then i can put text which even contains special characters with out much effort.


But with VARCHAR2, how to i achieve storing the special characters in DB column.



Thanks



and Tom said...

you need to use bind variables in your code. Then there are no special characters.

Your code should have:


insert into t values ( :x );

or

insert into t values ( ? );

and you bind the value in. In sqlplus you can (for one off statements) use ''

insert into t values ( 'How''s about this, two quotes '' are really one' );

each '' is really '



Rating

  (35 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

How do achieve this in Java (via JDBC)

Bala, June 07, 2004 - 12:07 pm UTC

Hi tom,

Thanks a lot.

I need to achieve this handling special characters from Java. The sql insert statement is embedded in JDBC code.

I just want to confirm that, if i use prepared statements (one must use in Java), the actual implementation of Bind variables in Java, then it will automatically handle storing the special characters in DB VARCHAR2 columns.


Is my understanding correct.


Thanks a lot.

Tom Kyte
June 07, 2004 - 1:30 pm UTC

the two most important words you need to know as a java developer doing jdbc coding:

*bind variables*


Use prepared statments.

FORGET that the statement class even exists.

binding removes the need to escape, yes.

How to allow storing and retrieving special characters

Shaun White, July 08, 2004 - 2:50 pm UTC

How do I keep/load special characters using sqlldr. When loading data with special characters to varchar2 and clob columns, the data in both columns invalidate the special chacacters or atleast they are not readable just empty squares returned.

Tom Kyte
July 08, 2004 - 3:53 pm UTC

set your nls_lang properly to match the character set in your database.

special characters

mike, September 22, 2004 - 6:32 pm UTC

Tom:

When a user enters a special character in <texarea> field in HTML form as:

"Rule 1" Wait 1 hour “Finalize book”

And I read back from oracle using sql*plus “select” it has the upside down question mark:

"Rule 1" Wait 1 hour ¿ Finalize book¿
When I read back to the web form it looks fine.

Is there any special coding for mod_plsql that I have to do for C.R.U.D for the special characters.

The instance is using WE8ISO8859P1 for character set.

Thanks,



Tom Kyte
September 22, 2004 - 7:40 pm UTC

seems they might not be using characters that are in the we8iso8859p1 character set.

funny, in order to put that back into a textarea, you'd have to use htp.escape_sc or something equivalent -- that only escapes real "quotes" with &quot; -- what does the actual html show for these special characters.

A reader, September 23, 2004 - 5:35 am UTC


special characters

mike, September 23, 2004 - 7:31 pm UTC

Tom:

The HTML looks like this (without using escapes_sc) looks like IE does that automatically:

On Edit Form:

<TD><TEXTAREA name="i_item_description" cols=39 rows=3 WRAP="VIRTUAL">&quot;Rule 1&quot; Wait 1 hour ?Finalize book?</TEXTAREA></TD>

On print form.

<TD>
"Rule 1" Wait 1 hour ?Finalize book?
</TD>

1. How can i verify that we8iso8859p1 character
set does not support that character?

2. How can I fix this issue? If it is character set do I have to change it to UTF8? Would that be easy to do?

Thank you,


Tom Kyte
September 24, 2004 - 10:29 am UTC

IE cannot do that automagically.

You must be doing that.

<form action=foo>
<textarea name="i" cols=39 rows=3 wrap="virtual">
"Rule 1" Wait one hour
</textarea>
</form>


put that into a file and open it. You'll see that view source in IE will be something like exactly the same. You must be escapeing that somewhere. (ie 6.0.26)

If IE did that automatically, could do that automatically -- there would be no need to escape special characters. YOU NEED to be very careful -- what if something puts in the rule

"And then you shall enter </textarea>"

And you do not escape <> as &lt; and &gt; ???? oh sad day -- be careful with that on all forms...


in sqlplus dump that character -- see what "ascii code" it is.

You can refer to

Article-ID: <Note:14786.1>
Alias: OLS:106056.069
Circulation: PUBLISHED (EXTERNAL)
Folder: server.Globalization
Topic: Reference - Character Set Definitions
Title: Definition of the WE8ISO8859P1 Character Set


on metalink -- but specially there is a gap:


127 7F 127 XE33 DEL (delete, rubout) Control character
<nothing>
160 A0 155 SP30 No-break / required space Punctuation






Similar isssue and how I solved

Meyer Tollen, September 24, 2004 - 6:51 am UTC

I have a MOD/PLSQL HTMLDB application...

Had the same "intermittent" problem...found out:

When users pasted into the text area from MS-WORD or MS-OUTLOOK (Microsoft Office Programs), I would get the upside down Question mark you mention instead of quotes when I printed their data back to the screen...MS uses a special char and their quotes...

However, if they typed quotes into the textarea from their keyboard there was no issue....it only happened when pasting from microsoft office apps.

Test it again...Is that what you are having?

If yes, this will fix it for if you do this on the insert...

replace(P_TEXTAREA_NAME,'¿','''');

FYI it is ASCII #191

Hope that solves it for you,
Meyer

character set

mike, September 24, 2004 - 10:31 am UTC

Meyer:

I do not have a problem with regular quote. The problem comes with "reverse quote" (not sure what is called but it ia the last set in this sentence.

"Rule 1" Wait 1 hour “Finalize book”

Fixing it using SQL*plus does not solve future problems. Keep checking what special characters the user entered and then fixing it is not very exciting.



Tom Kyte
September 24, 2004 - 11:43 am UTC

well, since you have to fix your code anyway (if you are not escaping special characters, you have a horrible bug in the first place)

So, you could

a) use sqlplus to 'fix' existing data
b) either use a trigger to :new.string :=
translate(:new.string,chr(NNN)||chr(MMM),'''''...' );

where NNN, MMM are the 'bad guys' that word is injecting.

Might not be exciting, but neither is probably 1/2 of what we have to do day to day to keep things working right.

Giridhar, January 10, 2005 - 6:47 am UTC

Hi tom,
i have similar problem. User entered special character RENE(E acute uppercase) in Mainframes. we get the files ftped from Mainframes to oracle. when the data is inserted in our datbaase, its getting translated as REN?.
Our NLS settings are as follows:
NLS_DATE_LANGUAGE AMERICAN
NLS_CHARACTERSET WE8ISO8859P1
NLS_NCHAR_CHARACTERSET AL16UTF16

May i know what went wrong?
Thanks in advance.
Giridhar

Tom Kyte
January 10, 2005 - 8:59 am UTC

No idea, knowing the database character set isn't enough.

who/what loads this
how is the data transfered.
what is the client nls settings when you load
is that E acute uppercase character supported in we8iso
is the E acute uppercase in fact still there after you transfer from mainframe before you load it...


Web Characters

Martin Guillen, May 15, 2005 - 5:50 pm UTC

Tom: I have an we8iso8859p1 database with an accented character on it:
SQL> select * from nls_database_parameters where parameter = 'NLS_CHARACTERSET';

PARAMETER        VALUE
---------------- ------------
NLS_CHARACTERSET WE8ISO8859P1

The client NLS_LANG is AMERICAN_AEMERICA.WE8ISO8859P1 too:
SQL> select * from tilde;

A
----------
ñ

But when I query the data with the mod_plsql module via this test procedure:
SQL> ed
Wrote file afiedt.buf

  1  create or replace
  2  procedure test
  3  as
  4     l_test tilde.a%type;
  5  begin
  6     htp.htmlopen;
  7             htp.headopen;
  8             htp.headclose;
  9             htp.bodyopen;
 10                     select * into l_test from tilde;
 11                     htp.p(l_test);
 12                     htp.br;
 13                     htp.print(owa_util.get_cgi_env('REQUEST_CHARSET'));
 14             htp.bodyclose;
 15     htp.htmlclose;
 16* end;
SQL> /

Procedure created.

The browser shows:
¢
WE8ISO8859P1

So it yous then character '¢'...
The browsers used to test it (ie and firefox) use western (iso-8859) encoding.
The dad is configured with:
PlsqlNLSLanguage AMERICAN_AMERICA.WE8ISO8859P1
So all seems to be using the same character set.
Why the browser don't you what is on the database?
Thank you,
Martin. 

Tom Kyte
May 15, 2005 - 8:01 pm UTC

but what characterset is mod_plsql having in its NLS_LANG? the app server has what environment set up?

Web Characters

Martin Guillen, May 16, 2005 - 1:53 am UTC

Tom:
Application servers (Oracle HTTP server 10g 9.0.4 for Linux and Windows) have the same settings. For example on Linux:
oracle@star:~> export NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1
oracle@star:~> env | grep NLS
NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1
XNLSPATH=/usr/X11R6/lib/X11/nls
oracle@star:~> /opt/oracle/product/oas904/opmn/bin/opmnctl startall
opmnctl: starting opmn and all managed processes...
oracle@star:~>
Same results:
¢
WE8ISO8859P1
Any way opmnctl script sets NLS_LANG to AMERICAN_AMERICA.WE8ISO8859P1.

ñ is chr(241) in the browser (i've tested it) with:
create or replace
procedure test2
as
begin
htp.htmlopen;
htp.headopen;
htp.headclose;
htp.print(chr(241));
htp.bodyclose;
htp.htmlclose;
end;
Also the mod_plsql sets nls parameters, this is an sql_trace(snipped) done with a logon trigger:
=====================
PARSING IN CURSOR #1 len=67 dep=0 uid=149 oct=42 lid=149 tim=353136398337 hv=107578029 ad='67da9f68'
alter session set nls_language= "AMERICAN" nls_territory= "AMERICA"
END OF STMT
PARSE #1:c=0,e=743,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=4,tim=353136398320
EXEC #1:c=0,e=155,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=4,tim=353136400258
=====================


Tom Kyte
May 16, 2005 - 7:58 am UTC

that is just the language at the bottom, nothing about the character set.


sorry -- if the NLS_LANG is consistently set everywhere across the board, we just transfer bits and bytes from place to place and don't touch them.


Are you saying that when you htp.print(chr(241)) you get the right thing?

print out the DUMP() of the string too if so:

9 htp.bodyopen;
10 select * into l_test from tilde;
11 htp.p(l_test);
htp.br;
select DUMP(column) into l_string from tilde;
htp.p(l_string)
12 htp.br;
13 htp.print(owa_util.get_cgi_env('REQUEST_CHARSET'));
14 htp.bodyclose;

Web Characters

Martin Guillen, May 16, 2005 - 11:32 am UTC

This is the procedure:
SQL> ed
Wrote file afiedt.buf

  1  create or replace
  2  procedure test3
  3  as
  4     l_test tilde.a%type;
  5     l_test2 varchar2(30);
  6  begin
  7     htp.htmlopen;
  8             htp.headopen;
  9             htp.headclose;
 10             htp.bodyopen;
 11                     select * into l_test from tilde;
 12                     htp.p(l_test);
 13                     htp.br;
 14                     select dump(a) into l_test2 from tilde;
 15                     htp.p(l_test2);
 16                     htp.br;
 17                     htp.print(chr(241));
 18             htp.bodyclose;
 19     htp.htmlclose;
 20* end;
SQL> /

Procedure created.

SQL>

And the result:
¤
Typ=1 Len=1: 164
ñ

SQL> select tilde.a, chr(164), chr(241) from tilde;
A          C C
---------- - -
ñ          ñ ±

Do you show especial characters in this web site?
 

Tom Kyte
May 16, 2005 - 1:14 pm UTC

the dump is showing that character set translation took place between the database and the plsql routine. I believe the nls_lang for the mod_plsql module is not what you believe it to be.


</code> http://docs.oracle.com/docs/cd/B14099_06/web.1012/b14007/confmods.htm#sthref884 <code>

check the PlsqlNLSLanguage setting

Web Characters

Martin Guillen, May 16, 2005 - 1:48 pm UTC

dads.conf has:
PlsqlNLSLanguage AMERICAN_AMERICA.WE8ISO8859P1
In fact if I change that parameter it gets reflected in:
htp.print(owa_util.get_cgi_env('REQUEST_CHARSET'))
when the procedure is executed.
Thank you Tom. I will searh a little more in the web to see if someone had the same problem.
Thank you again,
Martin.

Special characters

Linda Cotto, October 07, 2006 - 7:44 pm UTC

I have a datafile that was created as
/cust/oravl22//ORACLE/PCTIOL/dbf/refwaitdat.dbf
Now the file_name is showing the two slashes after the custoravl22.
The dbf was stored in the /cust/oravl22/ORACLE_PCTIOL/dbf/refwaitdat.dbf

When trying to resize it gives me a:
ORA-01511: error in renaming log/data files
ORA-01516: nonexistent log file, datafile, or tempfile
"/cust/oravl22//ORACLE/PCTIOL/dbf/refwaitdat.dbf"

I see this tablespace and datafile through the console also.


Tom Kyte
October 07, 2006 - 7:46 pm UTC

can you offline the tablespace and rename the datafile - does that help?

special character

sam, November 08, 2008 - 12:08 am UTC

Tom:

Is there something special I have to do to store/retrieve special characters from a web form using mod_plsql.
The DB is we8 characterset.

I assume user can key in whatever diacritic or symbol using his keyboard

http://www.utexas.edu/learn/html/spchar.html

and oracle will save it. is this correct?

Now on the way out from DB to web page, I have to use escpae_SC and it the diacritic will be displayed


Tom Kyte
November 11, 2008 - 2:51 pm UTC

they can key in whatever characters are supported by your we8iso... character set.

For example, if I paste in MS Word 'smart' (also known as dumb) quotes, you'll see this (since my database is not a WINDOWS characterset, but the we8iso... one)

This is `the so called smart quotes¿ in action


when I pasted that - it looked "normal", after I submit - it did not - why? because my database doesn't support those characters - I didn't create it to support those characters.

special chars

A reader, November 12, 2008 - 6:49 pm UTC

Tom:

Are the diacritics here supported by we8iso?

http://www.utexas.edu/learn/html/spchar.html

When i retrieve those from DB to html for do i have to escape it using escpae_sc so they display OK.
Tom Kyte
November 14, 2008 - 4:15 pm UTC

like demonstrated above, smart quotes - no

special

A reader, November 13, 2008 - 2:21 pm UTC

Tom:

Actually iwant the XML to display the ASCII quivalent for each special charcter. how woudl you do that without escaping it.
Tom Kyte
November 14, 2008 - 4:50 pm UTC

I would

a) learn everything about charactersets, especially the ones I'm using
b) learn everything about the character set I'm using in XML

so that I can successfully consider all of the issues I'll encounter when developing my code.

escaping entities is going to be UNAVOIDABLE in XML - it supports what it supports.

Think about it, what would you do with < or > in XML? You have to escape many things, many many things

retrieve special symbols

A reader, November 15, 2008 - 2:35 pm UTC

Tom:

I think XML has a special set of characters and diacritic symbols that cannot be used in normal XML strings.
Some of these characters a

& - &amp;
< - &lt;
> - &gt;
" - &quot;
' - '

For example, the following XML string is invalid:

<Organization>Oracle & Microsoft</Organization>

Whereas the following is valid XML:

<Organization>Oracle &amp; Microsoft</Organization>

Users enter many book title descriptions that include diacritics (using alt key and numeric combination &#xxxx). Oracle 9i saves those titles in DB.

I want to generate an XML file from DB that is used by another desktop publishing software which parses this XML and fails if the diacritic is printed instead of the "&#xxxx". I think i need to print the numeric character reference for those diacritics. correct?

Is there a special function in Oracle for that or how would you do it.
Tom Kyte
November 18, 2008 - 6:48 pm UTC

.... I think XML has a special set of characters and diacritic symbols that cannot
be used in normal XML strings.....

ding, ding, ding - you win. Yes, that is absolutely true and absolutely why I wrote:


I would

a) learn everything about charactersets, especially the ones I'm using
b) learn everything about the character set I'm using in XML

so that I can successfully consider all of the issues I'll encounter when developing my code.

escaping entities is going to be UNAVOIDABLE in XML - it supports what it supports.

Think about it, what would you do with < or > in XML? You have to escape many things, many many things



You no longer have to "think" that, you can rest assured, it is true.


Have you read the XML database developers guide? We do that - yes...

otn.oracle.com - has all of the documentation for your version.

xml

A reader, November 19, 2008 - 12:18 am UTC

Tom:

so there is not simpy a function that converts the special character in my oracle 9i weiso8859 database to to the XML entiry reference (&3nnnn) or unicode?

I need to read 26 chapters of this to figure it out? Is it that difficult.

http://download.oracle.com/docs/cd/B10501_01/appdev.920/a96620/toc.htm
Tom Kyte
November 21, 2008 - 5:44 pm UTC

aren't you sort of wanting to know about XML and how to process XML and what is available to you in XML and the functionality in the database to process XML?

You should be dying to read that manual - not complaining.


You are most interested in chapter 10 (a quick skim would have told you that - "Generating XML Data from the Database"

ops$tkyte%ORA11GR1> select xmlelement( "myTag", 'howdy <doody>' ) from dual;

XMLELEMENT("MYTAG",'HOWDY<DOODY>')
-------------------------------------------------------------------------------
<myTag>howdy &lt;doody&gt;</myTag>



Now, imagine the possibilities - read that chapter.

Yes, sometimes you have to read in order to learn.

xml

A reader, November 21, 2008 - 8:41 pm UTC

Tom:

Yes that is the chapter i was reading and I did notice a mention that XMLELEMENT does escape and convert to unicode.

Does this mean that every diacritic or special charater stored in in the We8ISO oracle database will be converted to the unicode equivalent character ( i believe this is the same number user enters with ALTnnn) on the keyboard unless i am wrong.
Tom Kyte
November 24, 2008 - 4:32 pm UTC

it outputs well formed XML - if that means having to escape something, sure, it'll do it.

You really need to read

Frank van Bortel, November 24, 2008 - 5:58 am UTC

First, escape_sc would have done the trick and was mentioned before.
Second - Unicode and MicroSoft Windows is not the same. "Special" characters, being entered using ALT+numeric keybord is just the entry of the current codepage - usually cp850.
Third: why does everybody insist on using WE8ISO in a web environment with "special" (read MS coded) characters?
Just stick to WE8MSWIN1252.
Last: 99% of these problems are not. They are just the incapability of the program used, to display the correct characters.
CLI users of SQL*Plus should excersise "chcp 1252" before starting SQL*Plus, and select Lucinda Console for lettertype - lots of "problems" gone...

How to allow special characters to store in DB

sachin, November 24, 2008 - 8:03 am UTC

I could not understand, what you are trying to say?
Tom Kyte
November 24, 2008 - 7:24 pm UTC

if you are commenting to Frank - I believe he is trying to say

"you know, if you all took time to understand charactersets - something you you will have to deal with, in fact currently have to deal with - you'd be a lot better off than saying 'special characters' - since in reality there is no such concept as 'special characters'."

He is trying to say "understand the problem, then the solutions are rather easy, think about the characterset you choose carefully"

A reader, November 24, 2008 - 7:43 pm UTC

Tom:

I am little confused.

can you explain how the charcter set code thing works from the time you insert from a client screen using windows keyboard to the time it goes to Oracle DB until it is retreived into from database into XML file? The XML file will be generated over the web so that user can do view/source and save the file and import it into other deskptop publishing software to generate a catalog.

basically, When user types ALT1234 for a diacritic/special on the client how is it saved in the WE8ISO8859 Database? Then how do you generate an XML file for this code in WE8 or unicode.

Do i use XMELEMENT or CONVERT function?
Tom Kyte
November 24, 2008 - 8:05 pm UTC

you really want to read this:

http://docs.oracle.com/docs/cd/B19306_01/server.102/b14225/toc.htm

basically

if a client has the same character set as the database
then
   NO CONVERSION TAKES PLACE in either direction
else (if they differ)
then
   upon insert the data from the client is converted into the characterset 
   of the database, ISO tells us the rules to map data from characterset A 
   into characterset B

   upone retrieval, the data from the database is converted into the characterset
   of the client, ISO .......
end if;


So, that describes the client -> database -> client


Now, enter XML. XML uses an encoding scheme (you can read all about that, 10's of thousands of sources on XML - suggestion: get a book on XML if you are going to be doing lots of XML, learn about that thing you are going to use)

So, when creating the XML stuff - we encode what we must, following the documented rules of XML



Reader: start reading

Frank van Bortel, November 27, 2008 - 11:25 am UTC

Tom, I was kind of saying that "a reader" should do some reading, as you also suggested.

To followup on your character conversions (and make things more difficult) consider code points. Let's take a special character as the Euro symbol.

Many MicroSoft Windows implementations allow that symbol to be generated by pressing the ALT key, and the keys 0, 1, 2 and 8 of the numeric keypad. If you use the correct codepage, that is - and here's the first source of special character problems: how would a double-byte Chinese version of before mentioned OS represent that symbol? Not as a euro-symbol. Not a database in sight.

Just a client representation failure. Set the correct code page, and you might see the correct symbol.

Now, let's expand the model with an Oracle client and database implementation, but rollback on the OS - assume similar OS's are being used on the input as well as the output side.

MicroSoft uses 0x80 (128 decimal) as code point for the Euro symbol (at least in West European cope pages). So, the oracle client gets offered 0x80.
Now, assume the Oracle client uses WE8MSWIN1252, and
your database uses WE8ISO8859P15 (which it should - P1 does not support the euro symbol), then 0x80 will be translated to 0xA4 - the code point for the Euro Symbol in WE8ISO8859p15.

Upon retrieval, 0xA4 gets translated back to whatever the client uses - if that is NOT a characterset, that maps 0xA4 back to 0x80, I hope you understand, you will have a problem.

If you have a program, that does not understand how to represent 0x80, you have yet another problem (Command Line interfaces - CLI, or "DOS-box", often suffer from this).

I hope this clarifies some of this -difficult- matter.

BTW. this also explains why some scenarios will happily store and display special characters: namely when there's no conversion at all - the scenario Tom uses...

Oh - and my CLI shows a question mark when using ALT+0128, unless I specify the CLI to use Lucinda Console; the standard character set does not know how the Euro symbol.

xml

A reader, January 13, 2009 - 12:21 am UTC

Tom

to create a valid xml file from data in oracle database with ISO88591 do you write the code to convert every single special character or symbol to the equivalent html reference or iso or unicode reference per this list.

http://www.w3schools.com/tags/ref_entities.asp

would you store all of these in a table and just replace each occurence of the character with equivalent reference entity.

The file will be eventualy will be imported by desktop publishing software to generate a catalogue.

special

A reader, January 13, 2009 - 12:40 pm UTC

Tom:

Would this option also work instead of escaping all those special symbols. Put them in a CDATA tag.

<?xml version="1.0" ?>
<my_xml>
<text_data desc="sample html">
<![CDATA[
<html>
<head>
<title>Hello!</title>
</head>
<body>
<h1>Hello world</h1>
</body>
</html>
]]>
</text_data>
<text_data desc="sample text">
<![CDATA[
<p>
This is some sample text.<br />
Hello world
</p>
]]>
</text_data>
</my_xml>
Tom Kyte
January 16, 2009 - 4:02 pm UTC

woe be to you that puts data like ]]> inside the cdata.

think about it.

XML

A reader, January 16, 2009 - 4:15 pm UTC

Tom:

I am not going to use the PCDATA tag.

If you want to give a user an XML file generated by a client program so he import the data in desktop publishing program and print it, would you code it so it will replace the 5 reseverd XML charcters ( <, > , &, ", ') with entity code equivalent or you would do the whole ISO character from 160 to 259.




xml

A reader, January 16, 2009 - 11:49 pm UTC

Tom:

I have create a sample xml file with some special symbol entities (second description). It seems IE xml parser can't read them. Do you know if this is correct or not. I also tried the file with special sybmol and still would not display in IE.

<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with &quot;XML&quot;. 7 &gt; 5 &amp; 6 &lt; 8</description>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-12-16</publish_date>
<description>These are special diacritics Degree sign
&176; Capital O, slash
&216; </description>
</book>
</catalog>

Tom Kyte
January 17, 2009 - 8:43 am UTC

SMK - this is sort of a forum on how to use Oracle, not IE....

but an entity in XML is a well defined, documented thing
http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references

&176; is NOT an html entity, read around for details on how to properly format that (missing a # sign)

special characters

A reader, January 17, 2009 - 12:04 pm UTC

Tom:

sorry did not meant to ask a windoze question.

this is a pure Oracle one.

If you simply want to create Well_formed XML file from oracle database (ISO8859-P1) that stores spanish diacritics:

Á É Í Ó Ú Ñ Ü á é í ó ú ñ ñ ü

would you write them as they are to the file or would you REPLACE each character with the ENTITY Number corresponding to it i.e

Á = "Á"
É = "É"

replace (column,CHR(193),'Entity Number'); ---OR
replace (column,'Á' ,'Entity Number');

or does this also depend on the client program reading this data.
Tom Kyte
January 17, 2009 - 12:49 pm UTC

it is not really an Oracle one, it is - much like the above - an XML question.

It would depend on the encoding you decided to use for your XML wouldn't it (your xml specifies a characterset).
http://www.w3.org/International/O-charset


(SMK - you have bought and read a book or two on XML right? I mean, as much as you have been asking about it - I hope you have 'studied' it, it seems to be a big part of your current job...)

oracle XML

A reader, January 18, 2009 - 9:53 pm UTC

Tom:

yes i did. I always listen to your advice.

It seems that it depends more on the XML editor/parser and how it translates things rather than how the document is encoded. XML parsers should be able to read both: the special character OR the entity code for it. It should translate to the corresponding character and display it.


There most popular charactersets are: acsii/win1252, iso-8859 and unicode.

Everyone says MXL parsers need a UNICODE (UTF-8 or UTF-16) file but it is not true as they can read the 255 characters supported by ISO-8859.


1. If i am creating this file on the server where oracle is installed, I assume it will create in the same characterset of the database "ISO8859", correct? Is there a way for Oracle to create the file in UNICODE format.

2. If the client (i.e PB) is creating the file, then I assume Oracle has nothing to do with it other than delivering the data in ISO8859 format to the client. The client will convert that character set to Win-1252 and then save the file in whatever format it supports. correct?


line feed

A reader, November 25, 2009 - 6:26 pm UTC

Tom:

Happy Thangsgiving!

Quick question on how oracle stores line feed (ASCII 10)

&#10; Line feed

A visual basic client sends some data in XML format file.
There are a few line feeds that needs to be stored in oracle.

The client encodes the data in the XML file. It then calls an oracle API using HTTP so it can stored nicely in the db. The XML file has entity reference for line feed "&#10;" .

I use XML EXTRACEVALUE oracle function to extract the data from the file and store it into oracle table. When i quey the data in SQL*plus I see a new line where the line feed is. When I look using toad I see a || symbol.

My question, how can I verify that the line feed is stored correctly? Should it be stored as &#10; or CHR(10) or anything else?

I also display this data into a web page in XML format. I do not see any of those &#10 characters. The data is printed in one line. We would need to display those line feeds on the web page though.

Can you advise please.

Tom Kyte
November 28, 2009 - 11:17 am UTC

we just store whatever you give us.




If you store an xml document with entity codes in it - that is what you have stored, just select out the entire xml document - do not do any processing, just get the document - and you'll see what exactly is in there.

we do not convert the entity codes when you save an xml document. Upon display - different environments will do what they want to display it

ops$tkyte%ORA10GR2> create table t ( x xmltype );

Table created.

ops$tkyte%ORA10GR2> set define off
ops$tkyte%ORA10GR2> insert into t values ( xmltype( '<data>Hello&#10;World</data>' ) );

1 row created.

ops$tkyte%ORA10GR2> select * from t;

X
-------------------------------------------------------------------------------
<data>Hello
World</data>


ops$tkyte%ORA10GR2> select t.x.getclobval() from t t;

T.X.GETCLOBVAL()
-------------------------------------------------------------------------------
<data>Hello&#10;World</data>


specia character

A reader, November 28, 2009 - 7:52 pm UTC

Tom:
 Thanks,
 
 1.  Can you demo it using VARCHAR2 (not xmltype) column and XMELEMENT function to extract that data.. 
 I have a VARCHAR2(50) type column and my cursor uses XMLELEMENT().
 It seems the line feed is there but the web page does not show &#10;.
 
 2. using your test, should not the line feed show up on the web page. It did not. Shall i replace that with <br>?
 
 SQL> select title from books where bkno=1234
 
 TITLE
 ----------------------------------------------------------------------------------------------------
 Justice for All:
 Earl Warren and the
 Nation He Made
 
 SQL> select t.title.getclobval() from books t where bkno=1234
 
 ERROR at line 1:
 ORA-22806: not an object or REF
 
 SQL> select xmlelement("title",t.title) from books t where bkno=1234
  
 XMLELEMENT("TITLE",T.TITLE)
 ----------------------------------------------------------------------------------------------------
 <title>Justice for All:
 Earl Warren and the
 Nation He Made</title>
 
 SQL> select dump(title) from books where bkno=1234
  
 DUMP(TITLE)
 ----------------------------------------------------------------------------------------------------
 Typ=1 Len=51: 74,117,115,116,105,99,101,32,102,111,114,32,65,108,108,58,10,69,97,114,108,32,87,97,11
 
  
 
 

 

Tom Kyte
November 29, 2009 - 8:31 am UTC

1) well, you sure can.

This I can promise you - if you insert &#10; in a varchar2 - we will store....

&#10;

and nothing else, a varchar2 is just a string, what you put in, you get out.

xmlelement will do formatting for you, conversions, transformations - it will not show you exactly what is in the table because it has to parse and process the data in order to give you the data. If you want the original text, just SELECT THE ORIGINAL TEXT.


2) if you are just using html without a <pre> tag - newlines are "suggestions", not mandates.

If you want a break - use <pre> </pre> OR use <br> where you have newlines.

newline

A reader, November 29, 2009 - 12:51 pm UTC

Tom:

Here is how my program logic works.
I still do not understand why i do not see the line feed entity in the generated xml file from DB using view source
or sql*plus.  please see examples below.

1) I use EXTRACTELEMENT() to extract the TITLE from a well formed XML file.
2) I save that data into a TITLE VARCHAR2(50) field.
3) Then I use cursor and XMLELEMENT() to query data from database and generate an XML file.

begin
 cursor c_query (bkno) is
 SELECT XMLELEMENT("Title",title),other columns from books;
 
 owa_util.mime_header('text/xml',TRUE);
 htp.p('<?xml version="1.0" encoding="ISO-8859-1"?>');
 htp.p('<ROWSET>');

 for x in c_query(p_bkno)
 loop
  htp.p(replace(x.result.getclobval(),'shortxmltagname','longxmltagname'));
 end loop;
 htp.p('</ROWSET>');

I will demo my point here:

/** here I can see the entity reference. it does not seem database thinks this is a line feed ***/

SQL> create table t (x varchar2(100) );

Table created.

SQL> insert into t values ('Justice for All:&#10; Earl Warren and the&#10; Nation He Made');
1 row created.

SQL> select * from t;
X
--------------------------------------------------------------------------------
Justice for All:&#10; Earl Warren and the&#10; Nation He Made

1 row selected.


/*** THis how the code works. I do see line feeds in sql*plus but no entiry reference in the xml  ***?

create table t1  (x varchar2(100));

Table created


SQL>INSERT INTO T1 
SELECT extractvalue(column_value,'/title/text()') title
from table(xmlsequence(extract(xmltype('<data><title>Justice for All:&#10; Earl Warren and the&#10; Nation He Made</title></data>'),'/data/*')));


SQL> SELECT * from t1;

TITLE
--------------------------------------------------------------------------------
Justice for All:
Earl Warren and the
Nation He Made


1 row selected.


SQL> select xmlelement("title",x) from t1;

XMLELEMENT("TITLE",X)
--------------------------------------------------------------------------------
<title>Justice for All:
Earl Warren and the
Nation He Made</title>


1 row selected.

so how can i get the line feed entity printed when i generate the xml file and replace it with <br> or <pre> tags.




Tom Kyte
November 30, 2009 - 1:43 am UTC

the entity reference is just that, it is the same as a newline, Just like you can use &#32; for a space - it is still *just a space*

&#10; is a newline

chr(10) is a newline

ascii code 10 is a newline

a newline is a newline is a newline

and you are seeing the newline in sqlplus as a newline

and you are seeing the newline in html if you view source, but in html a newline is a suggestion, not a mandate - hence, it just shows it as whitespace.

If you want to see a line break, you'll need to convert newlines into <br> tags (a simple REPLACE() will do that) OR use the pre tag.

Pratap, June 10, 2010 - 10:14 am UTC

Hi Tom,
I need to store the º (Alt+0186)symbol in oracle database. Currently I am using a column of varchar2 type since this column has some other data as well.But when I insert this in database, instead of degree sign I get question (?) syombol like ?C. Please let me know your suggestions.
Tom Kyte
June 10, 2010 - 2:37 pm UTC

that is alt+0186 in SOME CHARACTER SET, not all....


Apparently the character set your database supports, or that your client is configured to support (NLS_LANG setting) does not agree with you that alt+0186 is the 'degree' symbol.

You might want to peek at this document:

http://docs.oracle.com/docs/cd/E11882_01/server.112/e10729/toc.htm

to get an idea of the complexity behind what you are asking here... The choice of character set is crucial and if you have one that does not support your data - you may be somewhat "out of luck" as far as storing it goes. UTF might be something you would want to consider (see the nvarchar2 type)

Pratap, June 11, 2010 - 9:32 am UTC

Hi Tom,

Many thanks for your suggestion.

We have AL32UTF8 character set configured. Also we tried having column of NVARCHAR2 data type. But still we could not get rid of the issue.

Please let us know if we are missing anything there.


Tom Kyte
June 22, 2010 - 7:40 am UTC

in your clients character set, alt+that number isn't a degree - your CLIENT has a NLS_LANG setting.

Need to select from a table having special character in data

kc, April 05, 2011 - 8:03 am UTC

Hi Tom,

I have a table whose value in one of the rows is "M@M" and a procedure which uses this value in the where clause eg: "SELECT column1, column2 from table_name WHERE column3 = 'M&M';" . However when i try to compile this procedure, it asks for an input value because of the @.
All i can find is to use SET ESCAPE '\'; and then use a "\" before the "@" , but that cannot be incorporated inside the procedure.
Kindly help me find a permanent solution to this.

PS : Im on Oracle 10.2.0.4

Tom Kyte
April 12, 2011 - 12:20 pm UTC

SQL> set define off


'&' is only a special character to sqlplus, you just need to tell sqlplus to stop thinking it is special. set define off does that.

More to Explore

CLOB/BLOB/etc

Complete documentation on Securefiles and Large Objects here