Skip to Main Content
  • Questions
  • Extract domain names from a column having multiple email addresses

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, Guru.

Asked: October 21, 2016 - 2:53 pm UTC

Last updated: November 28, 2016 - 11:02 am UTC

Version: 11.2.0.3.0

Viewed 10K+ times! This question is

You Asked

Hi Tom,

i am trying to extract the domain names from a comma delimited email address and show them as comma delimited.
i was successful to some extent where i am able to grab the domain name using REGEXP_SUBSTR and REGEXP_REPLACE and show them in rows.

i am not able to get them displayed in a single column as comma delimited values.

something like below:

create table message ( Message_ID int, Message_Date date, Email_List varchar2(100) );

with data in it:

insert into message values ( 1, to_date( '01-Oct-2016'), 'xyz@gmail.com,abc@jpn.gov.jp,123@yahoo.co.in' );
insert into message values ( 2, to_date( '15-Oct-2016'), 'abc@gmail.com,xyz@yahoo.co.in,123@test.co.jp' );

SELECT SUBSTR(REGEXP_SUBSTR('xyz@gmail.com,abc@jpn.gov.jp,123@yahoo.co.in','[^,]+',1,LEVEL), INSTR(REGEXP_SUBSTR('xyz@gmail.com,abc@jpn.gov.jp,123@yahoo.co.in','[^,]+',1,LEVEL),'@')+1, LENGTH(REGEXP_SUBSTR('xyz@gmail.com,abc@jpn.gov.jp,123@yahoo.co.in','[^,]+',1,LEVEL))) "REGEXPR_SUBSTR" 
FROM DUAL CONNECT BY LEVEL <= LENGTH(REGEXP_REPLACE ('xyz@gmail.com,abc@jpn.gov.jp,123@yahoo.co.in', '[^,]+'))+1;

Result:
REGEXPR_SUBSTR
----------------
gmail.com
jpn.gov.jp
yahoo.co.in

Can you please assist on how i can extract the domain names and show them as a comma delimited values under the same column (REGEXPR_SUBSTR). something like below:
REGEXPR_SUBSTR
----------------
gmail.com,jpn.gov.jp,yahoo.co.in


thanks in advance
Guru

and Chris said...

Why the need for regular expressions?

It's probably easier to split the strings into rows similar to how you've done, then use listagg to glue them back together again:

create table message ( Message_ID int, Message_Date date, Email_List varchar2(100) );

insert into message values ( 1, to_date( '01-Oct-2016'), 'xyz@gmail.com,abc@jpn.gov.jp,123@yahoo.co.in' );

insert into message values ( 2, to_date( '15-Oct-2016'), 'abc@gmail.com,xyz@yahoo.co.in,123@test.co.jp' );

select listagg(dom, ',') within group (order by dom) from (
select message_id, substr(emails, instr(emails, '@')+1) dom
from   message, xmltable(
    'if (contains($X,",")) then ora:tokenize($X,"\,") else $X'
  passing Email_List as X
  columns emails varchar2(4000) path '.'
)
)
group  by message_id;

LISTAGG(DOM,',')WITHINGROUP(ORDERBYDOM)  
gmail.com,jpn.gov.jp,yahoo.co.in         
gmail.com,test.co.jp,yahoo.co.in


If you want to understand the XMLTable string split, check out Stew Ashton's article:

https://stewashton.wordpress.com/2016/08/01/splitting-strings-surprise/

Rating

  (12 ratings)

We're not taking comments currently, so please try again later if you want to add a comment.

Comments

XMLTable Vs Substr + Instr Vs Regexp_Replace

Rajeshwaran Jeyabal, October 24, 2016 - 10:35 am UTC

I am on 12c (12.1.0.2) and comparing XMLTable approach with Substr/Instr on a huge data set.

drop table t purge;
create table t(x int, y varchar2(4000));
insert into t values(1,'|a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z');
commit;
declare
 l_cnt int := 1;
 l_max_limit int := 100000;
begin 
 loop
  dbms_application_info.set_client_info(
   'l_cnt ='||l_cnt);
  insert into t(x,y)
  select rownum+l_cnt, y
  from t 
  where rownum <= (l_max_limit - l_cnt);
  l_cnt := l_cnt + sql%rowcount;
  commit;
  exit when l_cnt >= l_max_limit;
 end loop;
end;
/ 
exec dbms_stats.gather_table_stats(user,'T',no_invalidate=>false);

demo@ORA12C> $type d:\script.sql
set termout off
exec dbms_application_info.set_client_info('Query1');
select *
from t , xmltable(
    'if (contains($X2,"|")) then ora:tokenize($X2,"\|") else $X2'
  passing y as x2
  columns x2 varchar2(4000) path '.'
)
where x2 is not null;
exec dbms_application_info.set_client_info('Query2');

select x,  substr( y||'|' , instr( y||'|' ,'|',1,column_value)+1,
                        instr(y||'|' ,'|',1,column_value+1) -
                        instr(y||'|' ,'|',1,column_value) - 1) new_val
from t , table( cast(multiset(select level
                 from dual
                 connect by  level <= length(y) -
                        length(replace(y,'|')) ) as sys.odcinumberlist)) t2;
exec dbms_application_info.set_client_info('end');
set termout on

demo@ORA12C> @tkfilename.sql
D:\APP\VNAMEIT\diag\rdbms\ora12c\ora12c\trace\ora12c_ora_8456.trc


PL/SQL procedure successfully completed.

demo@ORA12C> @tktrace.sql

Session altered.


PL/SQL procedure successfully completed.

demo@ORA12C> @d:\script.sql
demo@ORA12C> exit



Tkprof shows me this

********************************************************************************

select *
from t , xmltable(
    'if (contains($X2,"|")) then ora:tokenize($X2,"\|") else $X2'
  passing y as x2
  columns x2 varchar2(4000) path '.'
)
where x2 is not null

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0        369          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch    17335     27.92      27.64          0     318042          0     2600000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    17337     27.93      27.66          0     318411          0     2600000

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 135  
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
   2600000    2600000    2600000  NESTED LOOPS  (cr=318042 pr=0 pw=0 time=31708769 us cost=2780364 size=2450400000 card=40840000)
    100000     100000     100000   TABLE ACCESS FULL T (cr=18045 pr=0 pw=0 time=209906 us cost=240 size=5800000 card=100000)
   2600000    2600000    2600000   COLLECTION ITERATOR PICKLER FETCH XQSEQUENCEFROMXMLTYPE (cr=299997 pr=0 pw=0 time=19206564 us cost=28 size=816 card=408)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                   17335        0.00          0.01
  SQL*Net message from client                 17335        0.00         26.62
********************************************************************************

select x,  substr( y||'|' , instr( y||'|' ,'|',1,column_value)+1,
   instr(y||'|' ,'|',1,column_value+1) -
   instr(y||'|' ,'|',1,column_value) - 1) new_val
from t , table( cast(multiset(select level
   from dual
   connect by  level <= length(y) -
   length(replace(y,'|')) ) as sys.odcinumberlist)) t2

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch    17335      9.65       9.56          0      18045          0     2600000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    17337      9.65       9.56          0      18045          0     2600000

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 135  
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
   2600000    2600000    2600000  NESTED LOOPS  (cr=18045 pr=0 pw=0 time=7455779 us cost=2722119 size=49008000000 card=816800000)
    100000     100000     100000   TABLE ACCESS FULL T (cr=18045 pr=0 pw=0 time=148734 us cost=240 size=5800000 card=100000)
   2600000    2600000    2600000   COLLECTION ITERATOR SUBQUERY FETCH (cr=0 pr=0 pw=0 time=4876786 us cost=27 size=16336 card=8168)
   2600000    2600000    2600000    CONNECT BY WITHOUT FILTERING (cr=0 pr=0 pw=0 time=3926854 us)
    100000     100000     100000     FAST DUAL  (cr=0 pr=0 pw=0 time=53026 us cost=2 size=0 card=1)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                   17335        0.00          0.01
  SQL*Net message from client                 17335        0.00         18.98
********************************************************************************


Regexp_Replace Vs Substr/Instr approach is available here

https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:73830657104020#2341120800346243823

comparing all these, Substr/Instr approach look good. could you help us to understand any specific reason for using XMLTable in the above case?
Connor McDonald
October 25, 2016 - 2:25 am UTC

I'm not disputing your results, but lot of network fetch time there. What happens when you repeat the tests without the client fetch, ie, run it all in plsql,

begin
for i in ( query1 )
  null;
end loop;
end;

XMLTable Vs Substr + Instr Vs Regexp_Replace

Rajeshwaran Jeyabal, October 25, 2016 - 2:47 am UTC

Here were my results after executing from plsql.

demo@ORA12C> begin
  2     dbms_application_info.set_client_info('Query1');
  3     for x in ( select *
  4             from t , xmltable(
  5                     'if (contains($X2,"|")) then ora:tokenize($X2,"\|") else $X2'
  6               passing y as x2
  7               columns x2 varchar2(4000) path '.'
  8             )
  9             where x2 is not null )
 10     loop
 11             null ;
 12     end loop;
 13     dbms_application_info.set_client_info('Query2');
 14     for x in ( select x,  substr( y||'|' , instr( y||'|' ,'|',1,column_value)+1,
 15                          instr(y||'|' ,'|',1,column_value+1) -
 16                          instr(y||'|' ,'|',1,column_value) - 1) new_val
 17                             from t , table( cast(multiset(select level
 18                   from dual
 19                   connect by  level <= length(y) -
 20                          length(replace(y,'|')) ) as sys.odcinumberlist)) t2)
 21     loop
 22             null ;
 23     end loop;
 24     dbms_application_info.set_client_info('end');
 25  end;
 26  /

PL/SQL procedure successfully completed.

demo@ORA12C>


Tkprof shows me this.

********************************************************************************

SQL ID: gm7bfdczsxppw Plan Hash: 1228665238

SELECT * 
FROM
 T , XMLTABLE( 'if (contains($X2,"|")) then ora:tokenize($X2,"\|") else $X2' 
  PASSING Y AS X2 COLUMNS X2 VARCHAR2(4000) PATH '.' ) WHERE X2 IS NOT NULL 


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.07          0        324          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch    26001     22.02      22.04        874      26622          0     2600000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    26003     22.04      22.12        874      26946          0     2600000

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 135     (recursive depth: 1)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
   2600000    2600000    2600000  NESTED LOOPS  (cr=26839 pr=881 pw=0 time=20990114 us cost=2780364 size=2450400000 card=40840000)
    100000     100000     100000   TABLE ACCESS FULL T (cr=26622 pr=874 pw=0 time=179966 us cost=240 size=5800000 card=100000)
   2600000    2600000    2600000   COLLECTION ITERATOR PICKLER FETCH XQSEQUENCEFROMXMLTYPE (cr=217 pr=7 pw=0 time=13703825 us cost=28 size=816 card=408)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  direct path read                                1        0.01          0.01
********************************************************************************

SQL ID: af4azxkk9aazp Plan Hash: 4103451135

SELECT X, SUBSTR( Y||'|' , INSTR( Y||'|' ,'|',1,COLUMN_VALUE)+1, INSTR(Y||'|' 
  ,'|',1,COLUMN_VALUE+1) - INSTR(Y||'|' ,'|',1,COLUMN_VALUE) - 1) NEW_VAL 
FROM
 T , TABLE( CAST(MULTISET(SELECT LEVEL FROM DUAL CONNECT BY LEVEL <= LENGTH(Y)
   - LENGTH(REPLACE(Y,'|')) ) AS SYS.ODCINUMBERLIST)) T2


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch    26001     10.14      10.63        874      26622          0     2600000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    26003     10.14      10.63        874      26622          0     2600000

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 135     (recursive depth: 1)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
   2600000    2600000    2600000  NESTED LOOPS  (cr=26622 pr=874 pw=0 time=7933014 us cost=2722119 size=49008000000 card=816800000)
    100000     100000     100000   TABLE ACCESS FULL T (cr=26622 pr=874 pw=0 time=141734 us cost=240 size=5800000 card=100000)
   2600000    2600000    2600000   COLLECTION ITERATOR SUBQUERY FETCH (cr=0 pr=0 pw=0 time=5231168 us cost=27 size=16336 card=8168)
   2600000    2600000    2600000    CONNECT BY WITHOUT FILTERING (cr=0 pr=0 pw=0 time=4104094 us)
    100000     100000     100000     FAST DUAL  (cr=0 pr=0 pw=0 time=54817 us cost=2 size=0 card=1)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  direct path read                                1        0.00          0.00



********************************************************************************

Avoiding LISTAGG() ?

Stew Ashton, October 25, 2016 - 9:05 am UTC

First of all, thanks for the plug of my blog, much appreciated!

Do we really need to split the addresses up and then splice them back together?
with address_lists as (
  select 'xyz@gmail.com,abc@jpn.gov.jp,123@yahoo.co.in' address_list
  from dual
)
SELECT ltrim(
  REGEXP_REPLACE(','||address_list, ',[^@]*@', ','),
  ','
) domain_list
from address_lists;

DOMAIN_LIST
--------------------------------
gmail.com,jpn.gov.jp,yahoo.co.in

Chris Saxon
October 25, 2016 - 4:42 pm UTC

Nice work with the regex :)

@Rajeshwaran Jeyabal on xmltable vs substr+instr

Stew Ashton, October 25, 2016 - 9:44 am UTC

Hi Rajeshwaran,

Thanks for your work on comparing XMLTABLE() with the classic approach. I tried to reproduce your findings but could not.

I use something like Tom Kyte's RUNSTATS package so I do not have the overhead of SQL tracing. I tried running with 10,000 rows and 100,000 rows. At worst, XMLTABLE is 20% slower. At best, it ran 1% faster.

I am not going to post my entire test case, because I would have to include my test harness. I fetch the rows using a FOR loop in PL/SQL, so the fetch size is 100. I don't know what ARRAYSIZE you used in your test.

Just a detail, but your solution does not work if the input string is 4000 bytes long. When you try to add the '|' at the end, you will get the exception "ORA-01489: result of string concatenation is too long".

Best regards, Stew
Chris Saxon
October 25, 2016 - 4:58 pm UTC

Could you share a link showing your full tests Stew?

To Stew on Array size

Rajeshwaran Jeyabal, October 25, 2016 - 12:39 pm UTC

I don't know what ARRAYSIZE you used in your test.

demo@ORA12C> show arraysize
arraysize 150
demo@ORA12C>


Tkprof confirms the same.

select *
from t , xmltable(
    'if (contains($X2,"|")) then ora:tokenize($X2,"\|") else $X2'
  passing y as x2
  columns x2 varchar2(4000) path '.'
)
where x2 is not null

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0        369          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch    17335     27.92      27.64          0     318042          0     2600000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    17337     27.93      27.66          0     318411          0     2600000

demo@ORA12C> select 2600000/17335 from dual;

2600000/17335
-------------
   149.985578

1 row selected.

demo@ORA12C>


When i bound that SQL inside an plsql, the arraysize is set to 100 (by default)

********************************************************************************

SQL ID: gm7bfdczsxppw Plan Hash: 1228665238

SELECT * 
FROM
 T , XMLTABLE( 'if (contains($X2,"|")) then ora:tokenize($X2,"\|") else $X2' 
  PASSING Y AS X2 COLUMNS X2 VARCHAR2(4000) PATH '.' ) WHERE X2 IS NOT NULL 


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.07          0        324          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch    26001     22.02      22.04        874      26622          0     2600000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    26003     22.04      22.12        874      26946          0     2600000

demo@ORA12C> select 2600000/26001 from dual;

2600000/26001
-------------
    99.996154

1 row selected.

demo@ORA12C>


BTW, when i ran Runstats - the outcome was this.

demo@ORA12C> exec runstats_pkg.rs_start;

PL/SQL procedure successfully completed.

demo@ORA12C> begin
  2     for x in ( select *
  3                     from t , xmltable(
  4                                     'if (contains($X2,"|")) then ora:tokenize($X2,"\|") else $X2'
  5                       passing y as x2
  6                       columns x2 varchar2(4000) path '.'
  7                     )
  8                     where x2 is not null )
  9     loop
 10                     null ;
 11     end loop;
 12  end;
 13  /

PL/SQL procedure successfully completed.

demo@ORA12C> exec runstats_pkg.rs_middle;

PL/SQL procedure successfully completed.

demo@ORA12C> begin
  2     for x in ( select x,  substr( y||'|' , instr( y||'|' ,'|',1,column_value)+1,
  3                                              instr(y||'|' ,'|',1,column_value+1) -
  4                                              instr(y||'|' ,'|',1,column_value) - 1) new_val
  5                                                     from t , table( cast(multiset(select level
  6                               from dual
  7                               connect by  level <= length(y) -
  8                                              length(replace(y,'|')) ) as sys.odcinumberlist)) t2)
  9     loop
 10                     null ;
 11     end loop;
 12  end;
 13  /

PL/SQL procedure successfully completed.

demo@ORA12C> exec runstats_pkg.rs_stop(1000);
Run1 ran in 2169 hsecs  and CPU time in 2137 hsecs
Run2 ran in 849 hsecs  and CPU time in 840 hsecs
run 1 ran in 255.48% of the time

Name                                  Run1        Run2        Diff
STAT...Effective IO time             7,217       5,994      -1,223
STAT...CPU used by this sessio       2,138         841      -1,297
STAT...CPU used when call star       2,140         840      -1,300
STAT...recursive cpu usage           2,107         804      -1,303
STAT...DB time                       2,173         854      -1,319
STAT...Elapsed Time                  2,175         852      -1,323
STAT...session uga memory           66,472      64,576      -1,896
STAT...physical read total byt   7,159,808   7,168,000       8,192
STAT...physical read bytes       7,159,808   7,168,000       8,192
STAT...cell physical IO interc   7,159,808   7,168,000       8,192
STAT...sorts (rows)                      0     100,000     100,000
STAT...workarea executions - o           2     100,002     100,000
STAT...sorts (memory)                    1     100,001     100,000
STAT...session pga memory           65,536     196,608     131,072
STAT...session pga memory max      327,680           0    -327,680
STAT...session uga memory max    1,003,624           0  -1,003,624
STAT...logical read bytes from   3,031,040     917,504  -2,113,536
STAT...file io wait time        20,826,000   8,222,460 -12,603,540

Run1 latches total versus runs -- difference and pct
        Run1        Run2        Diff       Pct
       6,969       2,161      -4,808    322.49%

PL/SQL procedure successfully completed.

demo@ORA12C>
demo@ORA12C>

Chris Saxon
October 25, 2016 - 5:00 pm UTC

FWIW I found the XMLTABLE approach slower too. Though marginal difference in number of latches:

SQL> begin
  2    for c in (
  3  select *
  4  from t , xmltable(
  5      'if (contains($X2,"|")) then ora:tokenize($X2,"\|") else $X2'
  6    passing y as x2
  7    columns x2 varchar2(4000) path '.'
  8  )
  9  where x2 is not null
 10    ) loop
 11     null;
 12     end loop;
 13  end;
 14  /

PL/SQL procedure successfully completed.

SQL>
SQL> exec runstats_pkg.rs_middle;

PL/SQL procedure successfully completed.

SQL>
SQL> begin
  2    for c in (
  3  select x,  substr( y||'|' , instr( y||'|' ,'|',1,column_value)+1,
  4                          instr(y||'|' ,'|',1,column_value+1) -
  5                          instr(y||'|' ,'|',1,column_value) - 1) new_val
  6  from t , table( cast(multiset(select level
  7                   from dual
  8                   connect by  level <= length(y) -
  9                     length(replace(y,'|')) ) as sys.odcinumberlist)) t2
 10    ) loop
 11     null;
 12     end loop;
 13  end;
 14  /

PL/SQL procedure successfully completed.

SQL>
SQL> exec runstats_pkg.rs_stop(1000);
Run1 ran in 2426 hsecs
Run2 ran in 1312 hsecs
run 1 ran in 184.91% of the time

Name                                  Run1        Run2        Diff
STAT...CPU used by this sessio       2,212       1,193      -1,019
STAT...recursive cpu usage           2,124       1,104      -1,020
STAT...CPU used when call star       2,216       1,192      -1,024
STAT...Elapsed Time                  2,427       1,313      -1,114
STAT...DB time                       2,429       1,307      -1,122
STAT...no work - consistent re      28,135      27,007      -1,128
STAT...consistent gets pin          28,169      27,034      -1,135
STAT...consistent gets pin (fa      28,169      27,034      -1,135
STAT...recursive calls              27,342      26,050      -1,292
STAT...consistent gets from ca      29,181      27,070      -2,111
STAT...session logical reads        29,238      27,127      -2,111
STAT...consistent gets              29,181      27,070      -2,111
LATCH.cache buffers chains          57,604      54,974      -2,630
STAT...table scan disk non-IMC   2,315,460   2,234,017     -81,443
STAT...table scan rows gotten    2,320,162   2,238,719     -81,443
STAT...sorts (rows)                      2     100,000      99,998
STAT...workarea executions - o           3     100,002      99,999
STAT...sorts (memory)                    1     100,001     100,000
STAT...session uga memory          -65,488      65,488     130,976
STAT...session pga memory         -327,680     524,288     851,968
STAT...logical read bytes from 239,517,696 222,224,384 -17,293,312

Run1 latches total versus runs -- difference and pct
Run1        Run2        Diff       Pct
62,094      59,582      -2,512    104.22%

Reproducible test

Stew Ashton, October 25, 2016 - 7:48 pm UTC

Well, it depends on which test you run first, doesn't it?
SQL> alter system flush shared_pool;

System FLUSH altered.

SQL> alter system flush buffer_cache;

System FLUSH altered.

SQL> begin
  runstats_pkg.rs_start;
  for c in (
    select x,  substr( '|'||y||'|' , instr( '|'||y||'|' ,'|',1,column_value)+1,
    instr('|'||y||'|' ,'|',1,column_value+1) -
    instr('|'||y||'|' ,'|',1,column_value) - 1) new_val
    from t , table( cast(multiset(select level
    from dual
    connect by  level <= length(y) + 1 -
    length(replace(y,'|')) ) as sys.odcinumberlist)) t2
  ) loop
  null;
  end loop;
  runstats_pkg.rs_middle;
  for c in (
    select x, x2
    from t , xmltable(
      'if (contains($X2,"|")) then ora:tokenize($X2,"\|") else $X2'
      passing y as x2
      columns x2 varchar2(4000) path '.'
    )
  ) loop
  null;
  end loop;
  runstats_pkg.rs_stop(1000);
end;
/

PL/SQL procedure successfully completed.

Run1 ran in 811 cpu hsecs
Run2 ran in 859 cpu hsecs
run 1 ran in 94.41% of the time
 
Name                                      Run1            Run2            Diff
LATCH.object queue header oper           1,986              62          -1,924
LATCH.cache buffers lru chain            2,022              62          -1,960
STAT...file io wait time                 3,921             292          -3,629
STAT...table scan disk non-IMC       3,120,786       3,128,696           7,910
STAT...table scan rows gotten        3,127,709       3,135,619           7,910
LATCH.row cache objects                 13,177           2,661         -10,516
LATCH.shared pool                       12,482             906         -11,576
LATCH.cache buffers chains              74,233          55,504         -18,729
STAT...sorts (rows)                    100,107             399         -99,708
STAT...sorts (memory)                  100,013              42         -99,971
STAT...workarea executions - o         100,005              21         -99,984
STAT...session pga memory               65,536         -65,536        -131,072
STAT...physical read bytes           7,585,792         466,944      -7,118,848
STAT...physical read total byt       7,700,480         581,632      -7,118,848
STAT...cell physical IO interc       7,700,480         581,632      -7,118,848
STAT...logical read bytes from     221,560,832     229,359,616       7,798,784
 
Run1 latches total versus runs -- difference and pct
          Run1               Run2              Diff        Pct
           110,952            62,094           -48,858    178.68%
Note in my blog I didn't compare XMLTABLE to the classic substr+instr approach, but to Connor's approach using analytics.

Yes the classic approach is probably faster almost all the time. How much faster depends. It doesn't work past 3999 bytes of input.

P.S. My test table does not have '|' in the first byte of each string.

Reproducible test

Rajeshwaran Jeyabal, October 26, 2016 - 2:12 am UTC

I dont understand, why did Stew flushed buffer cache and shared pool ? I would rather warm up the buffer cache and shared pool.

In addition to this, I have modified the runstats_pkg slightly to report CPU time in addition to Elapsed time.

demo@ORA12C> begin
  2     for i in 1..3
  3     loop
  4             for x in ( select *
  5                       from t , xmltable(
  6                              'if (contains($X2,"|")) then ora:tokenize($X2,"\|") else $X2'
  7                             passing y as x2
  8                             columns x2 varchar2(4000) path '.'
  9                       )
 10                       where x2 is not null )
 11             loop
 12                      null ;
 13             end loop;
 14
 15             for x in ( select x,  substr( y||'|' , instr( y||'|' ,'|',1,column_value)+1,
 16                     instr(y||'|' ,'|',1,column_value+1) -
 17                     instr(y||'|' ,'|',1,column_value) - 1) new_val
 18                                    from t , table( cast(multiset(select level
 19                          from dual
 20                          connect by  level <= length(y) -
 21                   length(replace(y,'|')) ) as sys.odcinumberlist)) t2)
 22             loop
 23                      null ;
 24             end loop;
 25     end loop;
 26  end;
 27  /

PL/SQL procedure successfully completed.

demo@ORA12C> set echo on
demo@ORA12C> @d:\script.sql
demo@ORA12C> exec runstats_pkg.rs_start;

PL/SQL procedure successfully completed.

demo@ORA12C>
demo@ORA12C> begin
  2    for x in ( select *
  3                from t , xmltable(
  4                              'if (contains($X2,"|")) then ora:tokenize($X2,"\|") else $X2'
  5                      passing y as x2
  6                      columns x2 varchar2(4000) path '.'
  7                )
  8                where x2 is not null )
  9    loop
 10                  null ;
 11    end loop;
 12  end;
 13  /

PL/SQL procedure successfully completed.

demo@ORA12C>
demo@ORA12C> exec runstats_pkg.rs_middle;

PL/SQL procedure successfully completed.

demo@ORA12C>
demo@ORA12C> begin
  2    for x in ( select x,     substr( y||'|' , instr( y||'|' ,'|',1,column_value)+1,
  3                 instr(y||'|' ,'|',1,column_value+1) -
  4                 instr(y||'|' ,'|',1,column_value) - 1) new_val
  5                  from t , table( cast(multiset(select level
  6                         from dual
  7                         connect by  level <= length(y) -
  8                         length(replace(y,'|')) ) as sys.odcinumberlist)) t2)
  9    loop
 10           null ;
 11    end loop;
 12  end;
 13  /

PL/SQL procedure successfully completed.

demo@ORA12C>
demo@ORA12C> exec runstats_pkg.rs_stop(1000);
Run1 ran in 2350 hsecs  and CPU time in 2295 hsecs
Run2 ran in 921 hsecs  and CPU time in 910 hsecs
run 1 ran in 255.16% of the time

Name                                  Run1        Run2        Diff
STAT...recursive cpu usage           2,250         879      -1,371
STAT...CPU used when call star       2,298         913      -1,385
STAT...CPU used by this sessio       2,299         911      -1,388
STAT...Elapsed Time                  2,354         933      -1,421
STAT...DB time                       2,373         923      -1,450
STAT...file io wait time            14,073      15,791       1,718
STAT...physical read total byt     122,880     131,072       8,192
STAT...cell physical IO interc     122,880     131,072       8,192
STAT...physical read bytes           8,192      16,384       8,192
STAT...workarea executions - o           4     100,003      99,999
STAT...sorts (memory)                    4     100,003      99,999
STAT...sorts (rows)                     52     100,051      99,999
STAT...session pga memory          -65,536      65,536     131,072
STAT...logical read bytes from 222,535,680 219,332,608  -3,203,072

Run1 latches total versus runs -- difference and pct
        Run1        Run2        Diff       Pct
      61,269      57,771      -3,498    106.05%

PL/SQL procedure successfully completed.

demo@ORA12C>
demo@ORA12C>

Connor McDonald
October 26, 2016 - 2:26 am UTC

Valid point, but if you are going to warm the buffer cache, then I would run both queries several times, and then perform the comparison tests

Reproducible test

Rajeshwaran Jeyabal, October 26, 2016 - 2:35 am UTC

Other than the latches, do look at the CPU time reported by the Runstats comparison - The CPU time by XMLTABLE approach is twice than the classical approach.

The same is alluded with Tkprof too.

Still getting different results

Stew Ashton, October 26, 2016 - 8:16 am UTC

Several points:
1) The order in which we do the tests might make a difference.
2) I think it is more reasonable to have input without a '|' at the beginning.
3) I flushed everything before the first test just to show that results might vary depending on circumstances that are not reported.

Now, I did the same test several times in a row, so shared pool and buffer cache were primed. I do not get a big variation in CPU, no matter how many times I run the test. Here is a subset of my output, exact same code as posted above with XMLTABLE as run2:
Run1 ran in 830 cpu hsecs
Run2 ran in 890 cpu hsecs
run 1 ran in 93.26% of the time
 
Name                                      Run1            Run2            Diff
...
STAT...Elapsed Time                        882             933              51
STAT...recursive cpu usage                 806             860              54
STAT...CPU used by this sessio             831             891              60
... 
Run1 latches total versus runs -- difference and pct
          Run1               Run2              Diff        Pct
            64,502            60,009            -4,493    107.49%
I'm going on vacation, so I'll let others have the last word.
Chris Saxon
October 26, 2016 - 9:00 am UTC

For me this just highlights the importance of testing in your own environment with your own data. Just because someone produced a benchmark showing X is faster, doesn't mean it will be for you.

Tkprof

Rajeshwaran Jeyabal, October 26, 2016 - 1:50 pm UTC

Team,

Could you take the above test case and run that on your database and post its Tkprof? just want to see how it goes for you.

( I am more interested to see the CPU usage between both these approaches )


Connor McDonald
November 09, 2016 - 1:44 am UTC

12.1.0.2, Windows

SELECT * 
FROM
 T , XMLTABLE( 'if (contains($X2,"|")) then ora:tokenize($X2,"\|") else $X2' 
  PASSING Y AS X2 COLUMNS X2 VARCHAR2(4000) PATH '.' ) WHERE X2 IS NOT NULL 


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.03       0.01          0        369          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch    26001     14.43      14.62          0      26621          0     2600000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    26003     14.46      14.63          0      26990          0     2600000


SELECT X, SUBSTR( Y||'|' , INSTR( Y||'|' ,'|',1,COLUMN_VALUE)+1, INSTR(Y||'|' 
  ,'|',1,COLUMN_VALUE+1) - INSTR(Y||'|' ,'|',1,COLUMN_VALUE) - 1) NEW_VAL 
FROM
 T , TABLE( CAST(MULTISET(SELECT LEVEL FROM DUAL CONNECT BY LEVEL <= LENGTH(Y)
   - LENGTH(REPLACE(Y,'|')) ) AS SYS.ODCINUMBERLIST)) T2 


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          1          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch    26001      6.63       6.69          0      26621          0     2600000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    26003      6.63       6.69          0      26622          0     2600000




2-d benchmark - my 2cents

Brendan, November 27, 2016 - 2:30 pm UTC

I saw this today and thought it would be a nice example for my 2-d benchmarking framework, that I mentioned on the Complex sql thread yesterday.

I took Rajesh's test dataset and just tweaked it to repeat the base string for width parameter, so that W4 has it four times, for example. Depth is the number of records added, and I took WxD grid as (1, 2, 4) x (50000, 100000, 200000).

num_records_out
===============
Run Type                      Depth             W1             W2             W4
MUL_QRY                      D50000        1300000        2600000        5200000
MUL_QRY                     D100000        2600000        5200000       10400000
MUL_QRY                     D200000        5200000       10400000       20800000
XML_QRY                      D50000        1300000        2600000        5200000
XML_QRY                     D100000        2600000        5200000       10400000
XML_QRY                     D200000        5200000       10400000       20800000

cpu_time
========
Run Type                      Depth             W1             W2             W4
MUL_QRY                      D50000           2.62           4.83          16.82
MUL_QRY                     D100000           5.22          12.66          33.41
MUL_QRY                     D200000           10.5          25.56          65.62
XML_QRY                      D50000           5.71          10.57          20.18
XML_QRY                     D100000          11.31          21.48          40.05
XML_QRY                     D200000          22.21          41.58          79.62

cpu_time_RATIO
==============
Run Type                      Depth             W1             W2             W4
MUL_QRY                      D50000              1              1              1
MUL_QRY                     D100000              1              1              1
MUL_QRY                     D200000              1              1              1
XML_QRY                      D50000           2.18           2.19            1.2
XML_QRY                     D100000           2.17            1.7            1.2
XML_QRY                     D200000           2.12           1.63           1.21


XML is always slower, but very interesting to note it gets closer as W increases, which seems to be due to its linear variation, while the Multiset SQL seems to be superlinear. It looks like in fact that XML might be faster at some higher level of W.

I will put my complete output log, and the code for creating the parametrised datasets onto my GitHub project later (BrenPatF).

Database Time Version
-------------------- -------------------- ------------------------------
Start: ORCL 27-NOV-2016 14:00:10 Oracle Database 12c 12.1.0.2.0

Connor McDonald
November 28, 2016 - 2:50 am UTC

Thanks for stopping by Brendan, and the investigative work

Thanks.

Rajeshwaran, Jeyabal, November 28, 2016 - 10:50 am UTC

Thanks. The CPU time by XMLTABLE approach is twice than the classical approach.

The same is alluded with Tkprof and Runstats results.
Chris Saxon
November 28, 2016 - 11:02 am UTC

It depends!

While the results show XMLTABLE slower the gap narrows for the larger W values. In Brendan's W4 examples it is only 1.2 times slower. So there may be cases where it is faster...

More to Explore

Performance

Get all the information about database performance in the Database Performance guide.