I have 7M rows from legacy system.
insert /*+ append */ into TGT_DATA
(
LST_NAM,
FRST_NAM,
MDL_NAM
)
SELECT /*+ parallel(10)*/
func_text_clean(LST_NAM),
func_text_clean( FRST_NAM),
func_text_clean( MDL_NAM)
FROM P_ONST;
If function is removed from select query runs within 10 mins but with function it takes more than an hour.
my function have just a return statement:
<i> replace(
replace(
replace(
replace(
REGEXP_REPLACE(
REGEXP_REPLACE(
REGEXP_REPLACE(
regexp_replace(upper( ltrim(rtrim(text_data))),
'[/.]',''),
'["#Â??\?;]', ' ') ,
'(?i)^(?:(?![×Þß÷þø])[-0-9a-zÀ-ÿ])+$', ''),
'(^[[:cntrl:]^\t]+)|([[:cntrl:]^\t]+$)',' ')
, chr(13), ' ')
, chr(10), ' ')
, chr(9),' ')
,'<BR />',' ')</i>
In addition these replacement should occur in order.
I tried below options but no help:
1.With function
2.Pragma UDF
What are the other options to tune this query?
Eeesh, that's some complex replacement you've got going on there!
Have you tried placing the replacement directly in the SQL statement?
Assuming you have (and this makes little difference...)
You could switch the single character replaces to one translate:
with val as (
select 'a' || chr(10) || 'b' || chr(13) || 'c' v
from dual
)
select v, translate ( v, chr(10) || chr(13), ' ' )
from val;
V TRANSLATE(V,CHR(10)||CHR(13),'')
a
b
c a b c
Which may help a little.
But I suspect a lot of the performance hit is coming from those regular expressions.
Can you simplify these at all?
Or - assuming that only "some" of the rows require cleaning - can you have some checks to only do this for strings containing "bad" characters?