Skip to Main Content

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Lisa.

Asked: August 21, 2000 - 5:30 pm UTC

Last updated: June 01, 2011 - 3:00 pm UTC

Version: 8i

Viewed 10K+ times! This question is

You Asked

How would you go about stripping special characters from a partnumber. I do not know of an Oracle function to do this. Are there any packages that do this? I want to strip everything except A-Z, a-z, 0-9.
Thanks for your help
Lisa

and Tom said...

There is no native function but we can build one. It could be:

ops$tkyte@8i> create or replace package utils
2 as
3 function strip_bad( p_string in varchar2 )
4 return varchar2;
5 end;
6 /

Package created.

ops$tkyte@8i>
ops$tkyte@8i>
ops$tkyte@8i> create or replace package body utils
2 as
3
4 g_bad_chars varchar2(256);
5 g_a_bad_char varchar2(256);
6
7 function strip_bad( p_string in varchar2 ) return varchar2
8 is
9 begin
10 return replace(
11 translate( p_string,
12 g_bad_chars,
13 g_a_bad_char ),
14 substr( g_a_bad_char, 1, 1 ),
15 '' );
16 end;
17
18
19
20 begin
21 for i in 0..255 loop
22 if ( i not between ascii('a') and ascii('z') AND
23 i not between ascii('A') and ascii('Z') AND
24 i not between ascii('0') and ascii('9') )
25 then
26 g_bad_chars := g_bad_chars || chr(i);
27 end if;
28 end loop;
29 g_a_bad_char := rpad(
30 substr(g_bad_chars,1,1),
31 length(g_bad_chars),
32 substr(g_bad_chars,1,1));
33 end;
34 /

Package body created.

ops$tkyte@8i> select
2 utils.strip_bad( 'How is this?' ) ,
3 dump( utils.strip_bad( 'How is this?' ) )
4 from dual;

UTILS.STRIP_BAD('HOWISTHIS?')
----------------------------------------------------------------------------------------------------
DUMP(UTILS.STRIP_BAD('HOWISTHIS?'))
----------------------------------------------------------------------------------------------------
Howisthis
Typ=1 Len=9: 72,111,119,105,115,116,104,105,115






Rating

  (48 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Very good but....

Eric W., May 17, 2002 - 6:21 pm UTC

Was wondering why all the code isn't inside the function. Kinda confused me.

Thanks.

Tom Kyte
May 18, 2002 - 9:49 am UTC

Well, I needed a custom string of characters that was built procedurally.

I want to procedurally build this string of characters ONCE per session. Hence I used the automatic startup code in the package body that will be run once the first time the package is used in the session.

can you explian little more this code ?

A reader, June 25, 2002 - 6:25 pm UTC

if you are calling just a function, is that loop executed ?
if yes how can we trace it ?
what would be the logical flow ?

if we have signigicantly large string varchar2(4000)
and around 2M rows ?
( I have a comment field which has this kind of funny chars
i want to either remove it or convert it into space
the commets varchar2(4000) !!
)

can we do the same thing ?

what about char like '~!@#$%^&*()'

do we have reverse function for ascii() ?
so that I can just specify the rage and sould opereate on the string ?

Thanks,

Tom Kyte
June 25, 2002 - 7:14 pm UTC

That loop is executed when the package is used for the first time in a session. So yes, when you call that package for the first time -- that loop is executed. None of the subsequent calls in your session will "loop"

"trace it" -- what do you mean?

the logical flow would be -- that code is silently and magically invoked by the plsql runtime BEFORE you call that package for the first time. Just like a constructor in Java or C++ is. It is "magic"

Yes, you can do that to a varchar2(4000) and 2 million rows. Be patient.

what about those characters? If you want to keep them -- modify the logic to accomidate that.

chr = reverse of asscii



Thanks

A reader, June 25, 2002 - 8:32 pm UTC

Thanks tom and your reply is very fast.

trace meance just a debug but I can do it with dbms_output
so sorry for that question.

thanks again
v--



useful for checking passwords

Howard Hammermann, July 17, 2002 - 12:31 pm UTC

I used part of this code on a form that validates a user's password. It checks to be sure their password does not include any special characters. Excellent.

how to print the chars which are not ther in table.

Chandra S.Reddy, May 22, 2003 - 11:58 am UTC

Hi Tom,
I have a table like below.

desc t1
Name
-------
S

select S, count(s) from t1
group by S ;


S COUNT(*)
- ----------
A 17
B 78
C 84
F 23
G 30
I 30
O 40
P 24
R 60
S 36
T 108
U 24
W 18

In the above list some characters are missing, since they are not there in table t1 itself.(say letters like D,M,N,O,X,Y,Z). these characters are not at all there in table.
But my requirement is missing chars in the list of A,B,C, D, E...XYZ should display with the count zero.

like
D 0
M 0
N 0
O 0
X 0
Y 0
Z 0

Will it be possible through the SQL query.

Have a great time.

Tom Kyte
May 23, 2003 - 8:23 am UTC

select a.s, nvl(b.cnt,0)
from (select chr(ascii('A')+rownum-1) s from all_objects where rownum <= 26 ) a,
(select S, count(s) cnt from t1 group by S ) b
where a.s = b.s(+)
/

What else you got....

DH, May 23, 2003 - 10:53 am UTC

in that "utils" package of yours? :-)

Tom Kyte
May 23, 2003 - 11:47 am UTC

nothing actually, it was a "demo" ;)

I'll bet...

DH, May 23, 2003 - 1:51 pm UTC

that you've probably got dozens of such 'demos' that would make a great utility package.

Great.

A reader, May 25, 2003 - 1:28 am UTC

Great answer.

Issue with number datatype accessed from java

raj, December 08, 2003 - 10:38 pm UTC

We have a number(12,2) columns for *amounts* all over the database. Somehow bad data is stored in these columns and core-dumps or throws null pointer error to java programs.
Eg:

select number_col, dump(number_col, 10)
from table_xxx
where number_col < 1 and number_col > 0.9999999999
/

0 Typ=2 Len=2: 193,1
0 Typ=2 Len=2: 193,1
0 Typ=2 Len=2: 193,1
0 Typ=2 Len=2: 193,1
0 Typ=2 Len=2: 193,1
0 Typ=2 Len=2: 193,1
0 Typ=2 Len=2: 193,1
0 Typ=2 Len=2: 193,1
0 Typ=2 Len=2: 193,1
0 Typ=2 Len=2: 193,1
0 Typ=2 Len=2: 193,1
0 Typ=2 Len=2: 193,1
0 Typ=2 Len=2: 193,1
0 Typ=2 Len=2: 193,1
0 Typ=2 Len=2: 193,1
0 Typ=2 Len=2: 193,1
0 Typ=2 Len=2: 193,1
0 Typ=2 Len=2: 193,1
0 Typ=2 Len=2: 193,1
0 Typ=2 Len=2: 193,1
0 Typ=2 Len=2: 193,1
0 Typ=2 Len=2: 193,1
0 Typ=2 Len=2: 193,1
0 Typ=2 Len=2: 193,1

select dump(0, 10) from dual
/
Typ=2 Len=1: 128
select dump(0.0) from dual;
Typ=2 Len=1: 128


I have already read how Oracle stores numbers in metalink.
Our developer also has did quite some research from the java side and tried to insert such a value, but failed.
{ zero (0.0) is inserted with bad data, the dumps above show the difference }

Tom,
Could you please clarify how this would happen ? We have been using 9.0.1 and currently running 9.2.0.4 and we have
always kept our jdbc versions upto date.

Here is an excerpt of what the java developer had reported:
===========================================================
These are the details.
A zero is stored in Oracle as -128
A one is stored in Oracle as -63 2 (2 bytes)
This particular digit (in which the null pointer is thrown) is stored as -63 1

Now, Oracle translates dobles through a particularly goofy logic.
This is inside classes12.jar oracle.sql.LnxLibThin class

public byte[] lnxren(double d) throws SQLException {
byte[] is = new byte[20];
int i = 0;
boolean bool = !(d < 0.0);
d = Math.abs(d);
if (d < 1.0) {
for (int i_475_ = 0; i_475_ < 8; i_475_++) {
if (powerTable[i_475_][2] >= d) {
i -= (int) powerTable[i_475_][0];
d *= powerTable[i_475_][1];
}
}
if (d < 1.0) {
i--;
d *= 100.0;
}
} else {
for (int i_476_ = 0; i_476_ < 8; i_476_++) {
if (powerTable[i_476_][1] <= d) {
i += (int) powerTable[i_476_][0];
d *= powerTable[i_476_][2];
}
}
}
if (i > 62)
throw new SQLException(CoreException.getMessage((byte) 3));
if (i < -65)
throw new SQLException(CoreException.getMessage((byte) 2));
boolean bool_477_ = !(d >= 10.0);
byte i_478_ = 8;
int i_479_ = 0;
byte i_480_ = (byte) (int) d;
for (/**/; i_479_ < i_478_; i_479_++) {
is[i_479_] = i_480_;
d = (d - (double) i_480_) * 100.0;
i_480_ = (byte) (int) d;
}
i_479_ = 7;
if (bool_477_) {
if (i_480_ >= 50)
is[i_479_]++;
} else
is[i_479_] = (byte) ((is[i_479_] + 5) / 10 * 10);
for (/**/; is[i_479_] == 100; is[i_479_]++) {
if (i_479_ == 0) {
i++;
is[i_479_] = (byte) 1;
break;
}
is[i_479_] = (byte) 0;
i_479_--;
}
for (i_479_ = 7; i_479_ != 0 && is[i_479_] == 0; i_479_--)
i_478_--;
byte[] is_481_ = new byte[i_478_ + 1];
is_481_[0] = (byte) i;
System.arraycopy(is, 0, is_481_, 1, i_478_);
return NUMBER._toLnxFmt(is_481_, bool);
}

If,
if (i > 62)
throw new SQLException(CoreException.getMessage((byte) 3));
if (i < -65)
throw new SQLException(CoreException.getMessage((byte) 2));

is commented out then this will give -63 1 if a zero goes into this method as an argument. However,
such a thing won't happen because one those code is not commented out and two there is error check before that in classes12.jar oracle.sql.NUMBER
which is the caller to this method:

public static byte[] toBytes(double d) throws SQLException {
if (d == 0.0 || d == -0.0)
return _makeZero();
if (d == Double.POSITIVE_INFINITY)
return _makePosInf();
if (d == Double.NEGATIVE_INFINITY)
return _makeNegInf();
return _getLnxLib().lnxren(d);
}

The makezero returns -128. And hence -63 1 should not go in, and I have not been able to get it in either,
and I did try a lot of permutations.
Now, we can again have a fail safe mechansim to check whether it is -63, 1 and treat it as zero. But
again it does not address why it happened.
==========================================================

Sorry if the information is overwhelming.

We are currently updating these *bad* values to zero on tables that have millions of rows as a work-around.

I have seen this type of data break ASP pages in my previous project.

Can you please provide some insight on how this could
occur and any other solution than updating ?

Tom Kyte
December 09, 2003 - 6:23 am UTC

are you using anything other then jdbc to insert this data?

9999999 times out of 10000000 i see this, there is a C program out there using the native (binary) format of a number (performance they say) and they write bad data into the database (when you use binary formats for dates and numbers, we just accept the binary data, we don't verify it at all - sort of a "direct mode")

Checking for special characters in CLOB

A reader, December 15, 2003 - 8:18 am UTC

Tom,
Is there a way to find all special characters that may exist in a clob column. Users have entered special characters into the clob columns by doing a copy/past from MSWord.
Ex. go into MSWord symbol (or character map) and copy/past the fraction 1/3.

Thank you


Tom Kyte
December 15, 2003 - 10:03 am UTC

finding "all" -- would be procedural.

finding "if there exists" would be a simple dbms_lob.instr.

take a look at
</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:1533006062995 <code>

Please consider this issue...

Muhammad Mustafa, January 17, 2004 - 1:12 am UTC

I want to get this result. Is it possible?

STRING

"No pre-sales question should be answered via Technical Support, all pre-sales suggestions should be considered invalid. Technical Support is not authorized to make product suggestions. "

RESULT

"No pre-sales question should be answered chr(10)
via Technical Support, all pre-sales chr(10)
suggestions should be considered invalid.chr(10)
Technical Support is not authorised to chr(10)
make product suggestions."

I want to loop through a String and replace a space with chr(10) at a specific length of a string.

Thanks TOM

Tom Kyte
January 17, 2004 - 1:20 am UTC

search this site for

"word_wrap"

and you'll find my word wrap routine that does this.

Could the strip_bad function be simplified?

Robert Massey, January 17, 2004 - 9:06 pm UTC

Tom,

Could you simplify the strip_bad function by only using the TRANSLATE function like this?


create or replace package utils2 as
function strip_bad( p_string in varchar2 )
return varchar2;
end;
/

create or replace package body utils2 as

g_bad_chars varchar2(256);

function strip_bad( p_string in varchar2 ) return varchar2 is
begin
return translate( p_string, g_bad_chars, 'a');
end;



begin
g_bad_chars := 'a';
for i in 0..255 loop
if ( i not between ascii('a') and ascii('z') AND
i not between ascii('A') and ascii('Z') AND
i not between ascii('0') and ascii('9') )
then
g_bad_chars := g_bad_chars || chr(i);
end if;
end loop;
end;
/

It seems to return the same results.


Tom Kyte
January 18, 2004 - 1:03 pm UTC

looks ok to me. sure

Could the strip_bad function be simplified?

Duke Ganote, February 16, 2004 - 2:47 pm UTC

Tom-- I think strip_bad could be further simplified.

GANOTEDP\dganote> ed
Wrote file afiedt.buf

1 CREATE OR REPLACE FUNCTION strip_bad (p_string IN VARCHAR2)
2 RETURN VARCHAR2
3 IS
4 good_string VARCHAR2(62)
5 := 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' ||
6 'abcdefghijklmnopqrstuvwxyz' ||
7 '1234567890';
8 BEGIN
9 RETURN
10 TRANSLATE (p_string, 'T' ||
11 TRANSLATE ( p_string, '.' || good_string
12 , '.'
13 ) -- find the bad characters
14 , 'T'); -- remove the bad characters
15* END;
16 /

Function created.

GANOTEDP\dganote> select strip_bad('0110-02-000') from dual;

STRIP_BAD('0110-02-000')
--------------------------
011002000

GANOTEDP\dganote> select strip_bad('0110.-()*"'||chr(10)||'~!@#$%^&*()') from dual;

STRIP_BAD('0110.-()*"'||CHR(10)||'~!@#$%^&*()')
--------------------------------------------------
0110



Simplification & question regarding "Special Characters", version 8.1.6

Duke Ganote, February 17, 2004 - 12:05 pm UTC

Tom-- Now I see you proposed a similar nested TRANSLATE function at:
</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:671777767982, <code>

Thank you

Arthur Bailey, April 16, 2004 - 9:09 pm UTC

I used the code midified as below:
create or replace package utils
as
function strip_bad( p_string in varchar2 )
return varchar2;
end;
/

create or replace package body utils as

g_bad_chars varchar2(256);
g_a_bad_char varchar2(256);

function strip_bad( p_string in varchar2 ) return varchar2
is
begin
return replace(
translate( p_string,
g_bad_chars,
g_a_bad_char),
chr(127),'');
end;



begin

for i in 0..255 loop
if i not between ascii(' ') and ascii('~')
then
g_bad_chars := g_bad_chars || chr(i);
g_a_bad_char:= g_a_bad_char||chr(127);
end if;
end loop;
end;
/




"Stripping special characters", version 8i

M.S.Murugupandiyan, July 18, 2004 - 4:40 am UTC

Exactly this is what i need...Excellent...thanks

Unicode vs ASCII caution

Duke Ganote, August 11, 2004 - 2:59 pm UTC

Please note that the strip_bad function is ASCII-oriented, as this example shows (the ö character is removed):

SQL> select phrs_txt, strip_bad(phrs_txt) from glsry where phrs_no = 200 and iso_639_3_cd = 'ger';

PHRS_TXT               
-------------------
STRIP_BAD(PHRS_TXT)
-------------------
Strömungsgeschwindigkeit
Strmungsgeschwindigkeit
 

Tom Kyte
August 12, 2004 - 7:45 am UTC

it is absolutely ascii oriented, it does use

22 if ( i not between ascii('a') and ascii('z') AND
23 i not between ascii('A') and ascii('Z') AND
24 i not between ascii('0') and ascii('9') )

after all.

A reader, August 19, 2004 - 11:59 am UTC


Retaining Special Characters while inserting

Thiru, September 07, 2006 - 10:26 am UTC

Data in a particular column has a special character equivalent to keyboard press of ALT+0227 (ã).

On the db on windows, the column gets the character required. But on AIX the character gets inserted as a (?) instead of ã.

Is this something to do with how the db is created? If so, please advise how to go about changing the parameters?

Tom Kyte
September 07, 2006 - 12:01 pm UTC

look at your NLS_LANG settings on the client, this is normal charcterset translation happening.

by default, on unix, the character set is us7ascii, on windows it is an 8bit characterset.

Thanks..

Thiru, September 07, 2006 - 2:48 pm UTC

Changed the NLS_CHARACTERSET to WE8ISO8859P1 and it worked.
So this should have been set at the time of db creation, right?

Tom Kyte
September 08, 2006 - 4:01 pm UTC

nope, precisely the same comments apply.

if the client character set differs from server characterset then character set conversion takes place.

"unprintable" :))

Duke Ganote, September 27, 2006 - 10:49 am UTC

We can easily eliminate just the ASCII unprintable control characters, while leaving Unicode and other special characters:

select TRANSLATE ('÷'
, 'T' || CHR(1)||CHR(2)||CHR(3)||CHR(4)||CHR(5)
|| CHR(6)||CHR(7)||CHR(8)||CHR(9)||CHR(10)
|| CHR(11)||CHR(12)||CHR(13)||CHR(14)||CHR(15)
|| CHR(16)||CHR(17)||CHR(18)||CHR(19)||CHR(20)
|| CHR(21)||CHR(22)||CHR(23)||CHR(24)||CHR(25)
|| CHR(26)||CHR(27)||CHR(28)||CHR(29)||CHR(30)
|| CHR(31)||CHR(127) -- all ASCII control characters
, 'T') clean_desc
from dual
/

C
-
÷

Good

A reader, November 21, 2006 - 11:40 pm UTC

Very Good. Thanks

Bad Characters

Maverick, January 09, 2007 - 3:20 pm UTC

Tom, I was testing your UTILS Package [from top] and just took bad characters part of it ..but when i try to display what's been gathered as bad characters, nothing is coming. it has a blank string..where am i doing wrong :-( ?

declare
g_bad_chars varchar2(256);
g_a_bad_char varchar2(256);

begin
for i in 0..255 loop
if ( i not between ascii('a') and ascii('z') AND
i not between ascii('A') and ascii('Z') AND
i not between ascii('0') and ascii('9') )
then
g_bad_chars := g_bad_chars || chr(i);
end if;
end loop;
/* g_a_bad_char := rpad(
substr(g_bad_chars,1,1),
length(g_bad_chars),
substr(g_bad_chars,1,1));
*/
dbms_output.put_line(g_bad_chars);
End;

I do not see anything in output [and yes, set serveroutput is on :-)]

Thank you..
Tom Kyte
January 11, 2007 - 9:20 pm UTC

the first bad character is chr(0)

which in the C language, with strings means "end of string, stop printing..."

ops$tkyte%ORA9IR2> exec dbms_output.put_line( 'hello world' );
hello world

PL/SQL procedure successfully completed.

ops$tkyte%ORA9IR2> exec dbms_output.put_line( chr(0) || 'hello world' );


PL/SQL procedure successfully completed.


Pls disregard my previous post..But

Maverick, January 10, 2007 - 8:16 am UTC

I could see the output on SQLplus [I was trying it in TOAD and could not see the output].
But this opened up another can of worms for me:-)

When i run this query, I am getting output but for bad characters, it's showing me a list of all those sepcial characters and also,does a repeat of all the characters. Here is an example of what i am talking about [btw, i am using Oracle 9i]


SQL> declare
2 g_bad_chars varchar2(256):='';
3 begin
4 for i in 0..255 loop
5 if ( i Not between ascii('a') and ascii('z') AND
6 i Not between ascii('A') and ascii('Z')
7 )
8 then
9 g_bad_chars := g_bad_chars || chr(i);
10 end if;
11 end loop;
12 dbms_output.put_line(g_bad_chars);
13 End;
14 /

!"#$%&'()*+,-./0123456789:;<=>?@[\]^_`{|}~
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnop
qrstuvwxyz{|}~

PL/SQL procedure successfully completed.

Also, Why are you using 0..255 [256 ASCII Character check, when there are only 127 {I think :-)}]

Thanks for any suggestions you have on this.

What about something more linguistically aware?

Michael Friedman, January 11, 2007 - 11:15 pm UTC

We have a requirement to accept text in multiple languges.

For example, a material may have a name in Thai, Japanese, Simplified Chinese, Traditional Chinese, Korean, and German, and English.

We need to prevent people from accidentally using inappropriate characters in any of these fields. For example, you should not be able to put a Simplified Chinese character in the Traditional Chinese field. This is complicated, of course, by the fact that some characters are used by both Simplified Chinese and Traditional Chinese. Japanese and Korean also both use some Chinese characters. And, of course, all of them use the Western 0-9.

Are there any linguistic functions in Oracle that will let us validate the content of each field?

The most linguistically aware string functions in Oracle seem to be the REGEXP functions, but they don't seem to have the ability to handle this.

Any other options?

To: Maverick

Michel Cadot, January 12, 2007 - 12:33 am UTC


From SQL Reference:
ASCII returns the decimal representation in the database character set of the first character of char.
CHR returns the character having the binary equivalent to n as a VARCHAR2 value in either the database character set or, if you specify USING NCHAR_CS, the national character set.

What character set are you using?

Regards
Michel

To Michel Cadot..

Maverick, January 12, 2007 - 10:02 am UTC

Michel, thanks for responding ..I am using "WE8MSWIN1252" character set. [I have no clue what that is and how it effects my pl/sql code given above but..]

Thanks,

To: Maverick

Michel Cadot, January 12, 2007 - 12:47 pm UTC


This answer your last question. You use 0 to 255 as this is the range of code points defined in WE8... (actually in WE8SWIN1252 not all these values are used, some are undefined).
ASCII function was extended to support more than ASCII code points. It should be renamed as something like CODE_POINT or CDPT to follow CHR function name.

There is no problem with your PL/SQL procedure. The problem comes from TOAD that badly handles the string you put in dbms_output buffer.

If you execute it in SQL*Plus, you get (with WE8MSWIN1252 and in 10.2 but maybe other and/or incorrect result in other version):

SQL> declare
2 g_bad_chars varchar2(256):='';
3 begin
4 for i in 0..255 loop
5 if ( i Not between ascii('a') and ascii('z') AND
6 i Not between ascii('A') and ascii('Z')
7 )
8 then
9 g_bad_chars := g_bad_chars || chr(i);
10 end if;
11 end loop;
12 dbms_output.put_line(g_bad_chars);
13 End;
14 /


 !"#$%&'()*+,-./0123456789:;<=>?@[\]^_`{|}~¿¿¿¿¿¿¿¿¿¿¿¿¿`¿¿¿¿¿¿¿¿¿¿¿¿Y ¡¢£¤¥¦§¨©ª«
¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ

PL/SQL procedure successfully completed.

Regards
Michel

Thanks Michele..

Maverick, January 13, 2007 - 6:08 pm UTC

The output I was getting was from sqlplus and used this in both versions of oracle 9i and 10g [release 2].

I do not see the output you shown above. I am not sure why they are different?

if you have any more information, i'm interested..
Thanks

To: Maverick - Other checks

Michel Cadot, January 14, 2007 - 12:58 am UTC


What is your Oracle db version? What is your db server OS and version?
Same questions for the client.
If you use Windows, do you use DOS or Windows SQL*Plus?
What is your NLS_LANG setting?
What is your client character set (at OS level)?

Spool the result in a file and then dump it to check what you get.

Regards
Michel

Debug

Michael Friedman, January 14, 2007 - 7:36 am UTC

Maverick,

Suggest you do some reasonable debugging.

For example, add code to your DBMS_OUTPUT to print ASCII('a'), ASCII('z'), ASCII('A'), ASCII('Z').

Then change your code so that you concate '| ' || i || ':' || chr(i).

Show us that.

This is what my view is?

Don, January 18, 2007 - 2:37 pm UTC

Hi Tom,

I am not sure if I am right exactly, but I made a change in the package which you wrote -

changed
from
"g_a_bad_char := rpad(
substr(g_bad_chars,1,1),
length(g_bad_chars),
substr(g_bad_chars,1,1));"
to
"g_a_bad_char := substr(g_bad_chars,1,1);"

and rest of the package stays same.

It gives me the same result.

Although it gives me the same result, I am expecting there could be some reason you have used this above.

Could you please explain me this?

Generalized Solution

Chuck Jolley, January 22, 2007 - 12:15 pm UTC

This might make a base for a more generalized function.


create or replace function strip_bad(ar_str varchar2,
ar_good varchar2 default '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ',
ar_placeholder varchar2 default chr(1))
return varchar2
as
v_result varchar2(4000);
v_char varchar2(1);
v_len number;
begin
v_result := ar_str;
v_len := nvl(length(v_result), 0);
for i in 1..v_len loop
v_char := substr(v_result, i, 1);
if v_char <> ar_placeholder and instr(ar_good, v_char) = 0 then
v_result := replace(v_result, v_char, ar_placeholder);
end if;
end loop;
v_result := replace(v_result, ar_placeholder);
return v_result;
end;

Striping all characters except some of them

Hitesh Bajaj, January 23, 2007 - 4:43 am UTC

Hi Tom,

Please show us how to eliminate all the other unnecessary characters from the strings using the REGEXP_REPLACE function (available in 10g) except the one listed below.

"Regular_Text should contain only following characters a-z,A-Z,0-9,@,&,$,%,SingleQuote and Underscore."

If possible please do help us to explain how that works.

Regards

Re :Striping all characters except some of them

Frank Zhou, January 23, 2007 - 10:20 am UTC


Hitesh,

You can use negate [^a-z,_,A-Z,0-9,@,&,$,%,'']
any character that are not allowed will become null.

Frank

SQL> COLUMN str_old FORMAT A30
SQL> COLUMN st_new FORMAT A30

SQL> SELECT str str_old,
2 REGEXP_REPLACE( str, '[^a-z,_,A-Z,0-9,@,&,$,%,'']' , '') AS st_new
3 FROM ( SELECT 'az*^!~:()+=<>@$%_AZ097''' AS str FROM dual);

STR_OLD ST_NEW
------------------------------ ------------------------------
az*^!~:()+=<>@$%_AZ097' az@$%_AZ097'

Striping other characters

Hitesh, January 24, 2007 - 12:09 pm UTC

Thanks Zuran for the answer.

What's up w/ Tom?

Mike Friedman, January 28, 2007 - 7:11 am UTC

It's been more than 2 weeks since his last post... anyone here any news?

Hey Jedi where are you!

kr, January 29, 2007 - 10:03 am UTC

I hope you are just chilling off some where on a vacation, and all is well with you and family.
Tom Kyte
January 31, 2007 - 1:26 pm UTC

There is single short SQL solution for this problem

Santha Kumar Nowpada, April 09, 2007 - 1:39 pm UTC

It is something like this

Select translate(:My_Text, ':'||translate(:My_Text, ':ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789', ':'), ':') from dual

Many Fields

V, May 21, 2007 - 1:31 pm UTC

Your strip_bad function is great. Is there anyway to use it against all fields in a query?
i.e.
select a,b,c,d,.... from TEST; --The query could have over 100 fields.

Where any of the fields could have invisable chars without doing:
select util.strip_bad(a), util.strip_bad(b).... from TEST;

Something along the lines of:

select util.strip_bad(*) from (select select a,b,c,d,.... from TEST);


Tom Kyte
May 21, 2007 - 1:56 pm UTC

you need to apply a function to any and every column you wish a function to be applied to

RE: " single short SQL solution "

Duke Ganote, October 24, 2008 - 12:29 pm UTC

In 10g, use the POSIX "character class":
http://www.psoug.org/reference/regexp.html

for example, this lists all 127 ASCII characters and replaces all "unprintable" characters with an underscore:

select TO_CHAR(level,'999')||' '||chr(level) ||' -> '||
REGEXP_REPLACE(
chr(level)
, '[^[:print:]]'
, '_') from dual
connect by level <= 127
/

I want to validate for Unprinable chars and break the logic up on it

Raj, May 24, 2011 - 7:10 am UTC

Please give me some solution

my valid range is 32 - 126 i'm doing a basic ascii for loop compartion for 2Million files selecting the data from table and character by character it's taking almost 1/2 hr

i want to optimize this one!

and also is't safe to do validation for unprintable chars on DB layer ?
Tom Kyte
May 24, 2011 - 7:52 am UTC

what is your database character set.

what do you mean by:

and also is't safe to do validation for unprintable chars on DB layer ?

...validation for unprintable chars on DB layer

Duke Ganote, May 24, 2011 - 8:56 am UTC

@Raj-- "Validation"? That's vague: what do you want to happen when you find something outside the acceptable ASCII range of 32-126?
Tom Kyte
May 24, 2011 - 10:10 am UTC

I would be assuming "blank"

I want to validate for Unprinable chars and break the logic up on it

Raja Papaiah, May 27, 2011 - 2:12 pm UTC

Thank you for very much for assisting,

Well here is my clear explanation ===>

I'm doing an logic of plsql to check unprintable character for a column say "DATA" on a TABLE by using a bulk collect cursor.
up on any unprintale char if it founds, then it will stop the execution and come out with status=true boolean value.

below is the procedure which I'm calling on every record column data value to validate for unprintable chars, but I want to optimized the below code(it taking app. 15mins for 1 million records to execute the plsql code)

We have a table of 200 million records with once of it's single data column size of 1000 to 4000 characters app. <<worst case>>

Please give me a solution for this.

PROCEDURE unprintablechars(
      p_content IN   VARCHAR2(4000),
      p_status OUT  BOOLEAN)
  IS
      -- variable declaration
      v_str_length   SIMPLE_INTEGER:=0 ;
  BEGIN
       -- Set the default value to false;
       p_status:=FALSE;  
       IF (p_content IS NOT NULL)
       THEN
              -- Get the length of the colContent String
              v_str_length:=LENGTH(p_content);
              -- Check for unprintable characters
           LOOP
       -- If ASCII range out of 32 to 126, then set flag := true
              IF(ASCII(SUBSTR(p_content,v_str_length,1)) < 32
                 OR ASCII(SUBSTR(p_content,v_str_length,1)) > 126)
                      AND ((ASCII(SUBSTR(p_content,v_str_length,1)) <> 13))
                      --excluding carriage return
            THEN
                 p_status:=TRUE;
                 EXIT;
             END IF;
               -- Decrement the strLength to get the substring character of the colContents
               v_str_length:=v_str_length-1;
            -- Exit the loop when the strlength reaches zero
            EXIT WHEN v_str_length=0;
            -- End the loop logic
       END LOOP;
    END IF;
END unprintablechars;



Thank you once again to TOM @ ORACLE :)
Tom Kyte
May 27, 2011 - 2:38 pm UTC

Just use sql, there is no need for procedural code here.

ops$tkyte%ORA11GR2> create or replace view goodchars
  2  as
  3  select ' '||max(replace(sys_connect_by_path( ch, ' ' ),' ', '' ) ) chars
  4    from (
  5  select chr(case when level = 95 then 13 else level+32 end) ch, rownum r
  6    from dual
  7  connect by level <= 126-32+1
  8         )
  9    start with r = 1
 10  connect by prior r+1 = r
 11  /

View created.

Elapsed: 00:00:00.02
ops$tkyte%ORA11GR2> select * from goodchars;

CHARS
-------------------------------------------------------------------------------
 !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmn
opqrstuvwxyz{|}~


Elapsed: 00:00:00.01

<b>that was a bit of technique (also known as trickery) to generate your string of acceptable characters...</b>


ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> drop table t;

Table dropped.

Elapsed: 00:00:00.03
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> create table t ( x varchar2(20) );

Table created.

Elapsed: 00:00:00.01
ops$tkyte%ORA11GR2> insert into t values ( 'hello world' );

1 row created.

Elapsed: 00:00:00.02
ops$tkyte%ORA11GR2> insert into t values ( 'hello ' || chr(127) || ' world' );

1 row created.

Elapsed: 00:00:00.00

<b>one good string, one bad string...</b>


ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> select x, nvl2( replace( translate( x, (select chars from goodchars), 'x' ), 'x', '' ), 'Bad', 'Fine' )
  2    from t;

X                    NVL2
-------------------- ----
hello world          Fine
hello  world        Bad

Elapsed: 00:00:00.01

<b>and we can see that... Here is how it works.

We take your string X, first argument to translate

We get the string of good characters and that is the second argument.

The last argument is just 'x'.  This will have the effect of turning ' ' into 'x', and then every other character in the good characters string into nothing (they disappear).

Then we replace 'x' with nothing - the x's disappear.

Then, using nvl2, we just see if we are null or not.  If null - good, if not something was leftover and it must be bad</b>


ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> select count( case when flag = 'Fine' then 1 end ) good_cnt,
  2         count( case when flag = 'Bad' then 1 end ) bad_cnt
  3    from ( select object_name, nvl2( replace( translate( object_name, (select chars from goodchars), 'x' ), 'x', '' ), 'Bad', 'Fine' ) flag
  4             from big_table.big_table )
  5  /

  GOOD_CNT    BAD_CNT
---------- ----------
   1000000          0

Elapsed: 00:00:02.22

<b>that be a 1,000,000 row table - on my laptop.</b>


ops$tkyte%ORA11GR2> update big_table.big_table
  2  set object_name = substr( object_name, 1, 29 ) || chr(128)
  3  where rownum <= 100;

100 rows updated.

Elapsed: 00:00:00.00
ops$tkyte%ORA11GR2> select count( case when flag = 'Fine' then 1 end ) good_cnt,
  2         count( case when flag = 'Bad' then 1 end ) bad_cnt
  3    from ( select object_name, nvl2( replace( translate( object_name, (select chars from goodchars), 'x' ), 'x', '' ), 'Bad', 'Fine' ) flag
  4             from big_table.big_table )
  5  /

  GOOD_CNT    BAD_CNT
---------- ----------
    999900        100

Elapsed: 00:00:02.04


That will run significantly faster than your plsql.

strip characters

A reader, May 27, 2011 - 2:49 pm UTC


Re: I want to validate for Unprinable chars and break the logic up on it

Raja Papaiah, May 27, 2011 - 5:30 pm UTC

Thanks a lot..!

It optimized upto some extent but i fill bit more time it's taking for 09 mins for 2.4 Million files on data of each row cosist of a minimum 100 characters to 4000 app.
--> tested using my laptop, SQL Tools

One more concern is't possible to to stop counting the bad character if any one bad character found it should stop counting or logic so that it may save some time..

Here i'm trying to validate the given data and has to responsed weather it's having proper data then it will allowed to do some other operation on it otherwise it should stop it basing on a boolean flag in a procedure!.

executed result:

PL/SQL block, executed in 09:25.693 (565.693 sec.)
Total no. of records : 2458815
Unprintable chars present in the file
Total execution time 09:26.005 (566.005 sec.)


Tom Kyte
May 31, 2011 - 9:33 am UTC

and that is 4,444 per second. Or about 0.00022 seconds per comparison. I'm feeling pretty good with that.


One more concern is't possible to to stop counting the bad character if any one bad character found it should stop counting or logic so that it may save some time..



Do you mean you want to stop scanning the data as soon as a row is found with bad data?

If so,

select * from (
select object_name,
nvl2( replace( translate( object_name,
(select chars from goodchars), 'x' ), 'x', '' ), 'Bad', 'Fine' ) flag
from big_table.big_table
) where flag = 'Bad' and ROWNUM=1;


if that query returns a row, then there is bad data. If it does not, then there is only "good" data.

Re: I want to validate for Unprinable chars and break the logic up on it

Raja Papaiah, June 01, 2011 - 2:54 pm UTC

Thanks a lot for giving me the good solution! AWESOME!

select * from (
select object_name,
    nvl2( replace( translate( object_name,
              (select chars from goodchars), 'x' ), 'x', '' ), 'Bad', 'Fine' ) flag
      from big_table.big_table
) where flag = 'Bad' and ROWNUM=1; 


By the by Can you clarify me one doubt,

We are actually storing the entire file content into db rows, will it be faster way can we handle the validation check or else using java on files is faster to handle the validation check ?
Tom Kyte
June 01, 2011 - 3:00 pm UTC

I have a gut feeling that java would be many times slower than SQL.

Benchmark it if you are curious.

You might have good luck with regular expressions and sed though.

But I think the database would best them both.

Purely a gut feel, up to someone else to benchmark :)

thanks it's work for me

rajashree, October 03, 2012 - 12:58 am UTC

Thanks. My special character problem solved

More to Explore

PL/SQL demos

Check out more PL/SQL tutorials on our LiveSQL tool.

PL/SQL docs

PL/SQL reference manual from the Oracle documentation library