Need help to improve performance while parsing LOB string and insert into a table
Connor McDonald

Thanks for the question, Mohammed Imran.

Asked: October 07, 2015 - 12:03 pm UTC

Last updated: October 09, 2015 - 8:33 am UTC

Version: Oracle Database 11g Release 11.2.0.3.0

Viewed 1000+ times

You Asked

Hi,

Firstly I thank you for giving solutions for various problems.
I have been following your website for almost 5 years now and found it to be the best place to clarify & understand any query related to Oracle.

My question is how to fix a performance issue caused by SUBSTR, or rather: what is the best and fastest method to parse a string containing lakhs (hundreds of thousands) of comma-separated values?

Currently I have written a PL/SQL function that parses the values from a LOB column (fetched with BULK COLLECT) using the SUBSTR function, and then inserts the parsed values into a different table. But I am facing a performance issue, since the LOB column being parsed holds lakhs of comma-separated values and there are lakhs of such records. Below is the code snippet:

Function Parse_F Is
  Cursor c_1 IS select c1, c2 from tab1;  -- c1 datatype -> CLOB

  TYPE rec_type IS TABLE OF c_1%ROWTYPE
    INDEX BY SIMPLE_INTEGER;
  l_var_1 rec_type;

  TYPE tabobj IS TABLE OF tab2%ROWTYPE;
  l_var_2 tabobj := tabobj();
Begin

  Open c_1;
  LOOP
    FETCH c_1 BULK COLLECT INTO l_var_1 LIMIT 100;

    FOR indx IN 1 .. l_var_1.COUNT LOOP

      LOOP
        -- Here logic is written to parse l_var_1(indx).c1 using SUBSTR only;
        -- the parsed values are copied to collection variable l_var_2.
        -- The line below is an example:
        l_var_2(l_var_2.LAST).col1 :=
          SUBSTR(l_var_1(indx).c1,
                 l_startpos + 1,
                 INSTR(l_var_1(indx).c1, ',', 1, l_commapos) - l_startpos - 1);
        .
        .
        .
      END LOOP;

      -- now the data is inserted into tab2 from collection l_var_2
      FORALL indx IN INDICES OF l_var_2
        INSERT INTO tab2
        VALUES l_var_2(indx);
      -- exceptions handled
      Commit;
      l_var_2.DELETE;

    END LOOP;

    EXIT WHEN l_var_1.COUNT < 100;

  END LOOP;

  CLOSE c_1;

End;

The DBMS Profiler output shows the lines containing the SUBSTR calls as the most executed and most time-consuming statements in the function, and I do not know how to fix this performance issue.

Stats from initial testing:
---------------------------
Tab1 – 1,600 rows
Tab2 – 2.5 million rows inserted after parsing
Time taken: 20 mins

It took 20 minutes to parse around 1,600 rows, which is far higher than expected; I suspect it could hang if the same code were run against production data.
Please help me resolve the issue, or suggest a better way to parse and insert the data quickly.

Thanks,
Imran.

and Chris said...

Is the process inserting 1,600 rows or 2.5 million rows?

In either case I recommend changing this to a single insert. Based on the code posted, you could rewrite the whole procedure as just:

insert into t2
  select <substr operations>
  from   t1;


If you're doing this because some rows may throw exceptions, then take a look at DML error logging:

https://oracle-base.com/articles/10g/dml-error-logging-10gr2
https://www.youtube.com/watch?v=8WV15BzQIto

This allows your insert to complete with invalid rows placed into an error table.
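
For example, a minimal sketch of the pattern (err$_t2 is the default error-table name that DBMS_ERRLOG generates for t2; the <substr operations> placeholder is as above):

-- one-off setup: create the error table for t2
exec dbms_errlog.create_error_log('T2')

-- the insert completes even if some rows fail;
-- failed rows land in err$_t2 together with the Oracle error
insert into t2
  select <substr operations>
  from   t1
  log errors into err$_t2 ('parse run 1')
  reject limit unlimited;

-- inspect the failures afterwards
select ora_err_number$, ora_err_mesg$ from err$_t2;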

If you find the single statement approach still takes too long, please submit a new question including the execution plan for the statement.

Comments

In my case the source and destination tables have a one-to-many relationship

Mohammed Imran Pasha, October 08, 2015 - 4:44 am UTC

Is the process inserting 1,600 rows or 2.5 million rows?

The process reads and parses 1,600 rows from tab1; the 2.5 million rows derived from them are inserted into tab2.

A plain insert statement is suitable where each record from tab1 needs to be parsed and inserted into tab2 as a single record. In my scenario, each record of tab1 (its LOB column) is responsible for thousands of records inserted into tab2. Below is an example.

create table emp_names(first_name varchar2(100), last_name varchar2(100));

create table emp_staging_tab(lob_col CLOB);

insert into emp_staging_tab values('Chris,Saxon,Mohammed,Imran,Jim,Bunker,Saurav,Ganguly');
commit;


Now we need to write logic that parses lob_col from the emp_staging_tab table and inserts into emp_names like so:

insert into emp_names values('Chris','Saxon');
insert into emp_names values('Mohammed','Imran');
insert into emp_names values('Jim','Bunker');
insert into emp_names values('Saurav','Ganguly');


One more complexity here: the number of values in lob_col (and hence its length) is variable, which means a single emp_staging_tab row may hold one employee name or thousands of them. I think this cannot be done in a single insert statement.
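
(In fact a single statement can handle the one-to-many case by joining to a row generator. A minimal sketch against the tables above, assuming no staging row holds more than 100,000 names and every row holds complete first/last pairs; for very large CLOBs the repeated REGEXP_SUBSTR calls get expensive, which is where the chunked PL/SQL approach below wins:)

insert into emp_names (first_name, last_name)
select regexp_substr(s.lob_col, '[^,]+', 1, 2*n.pos - 1),
       regexp_substr(s.lob_col, '[^,]+', 1, 2*n.pos)
from   emp_staging_tab s,
       (select level pos from dual connect by level <= 100000) n  -- row generator
where  n.pos <= regexp_count(s.lob_col, '[^,]+') / 2;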

Please suggest, or let me know if the query is not clear.

Thanks,
Imran.
Connor McDonald
October 08, 2015 - 8:51 am UTC

Try this ... Here's my sample population script

SQL> create table emp_names(first_name varchar2(100),last_name varchar2(100));

Table created.

SQL> create table emp_staging_tab(lob_col CLOB);

Table created.

SQL> insert into emp_staging_tab values('Chris,Saxon,Mohammed,Imran,Jim,Bunker,Saurav,Ganguly');

1 row created.

SQL> commit;

Commit complete.

SQL> set timing on
SQL> declare
  c clob;
  c1 varchar2(32767);
  tot int := 0;
begin
  c1 := 'Chris,Saxon,Mohammed,Imran,Jim,Bunker,Saurav,Ganguly,' ||
        'Chris2,Saxon2,Mohammed2,Imran2,Jim2,Bunker2,Saurav2,Ganguly2,' ||
        'Chris3,Saxon3,Mohammed3,Imran3,Jim3,Bunker3,Saurav3,Ganguly3,';

  c1 := c1 || c1 || c1 || c1 || c1 || c1 || c1;

  for i in 1 .. 800 loop
    dbms_lob.createtemporary(c, true);

    for j in 1 .. 40 loop
      dbms_lob.writeappend(c, length(c1), c1);
      tot := tot + 84;  -- 84 first/last-name pairs per copy of c1 (168 names)
    end loop;

    insert into emp_staging_tab values (c);
    commit;

    dbms_lob.freetemporary(c);
  end loop;
end;
/

PL/SQL procedure successfully completed.


This creates 800 CLOBs, each holding 3,360 first/last-name pairs (the 84 pairs in c1, appended 40 times), i.e. about 2.7 million pairs in total.

And here's a parsing routine:

SQL> set serverout on
SQL> declare
  l_big_chunk varchar2(32767);
  l_leftover varchar2(200);
  l_offset pls_integer;
  l_len pls_integer;

  type t_rows is table of emp_names%rowtype index by pls_integer;
  l_rows t_rows;

  l_comma pls_integer;
  l_first_name boolean;
begin
  for i in ( select rownum r, lob_col from emp_staging_tab ) loop
    l_offset := 1;
    l_first_name := true;

    loop
      l_big_chunk := l_leftover || dbms_lob.substr(i.lob_col, 32767, l_offset);
      l_len := length(l_big_chunk);
      l_offset := l_offset + l_len;

      dbms_application_info.set_client_info(i.r||'-'||l_offset);

      loop
        l_comma := instr(l_big_chunk, ',');
        exit when l_comma = 0 or l_big_chunk is null;
        if l_comma != 0 then
          if l_first_name then
            l_rows(l_rows.count+1).first_name := substr(l_big_chunk, 1, l_comma-1);
          else
            l_rows(l_rows.count).last_name := substr(l_big_chunk, 1, l_comma-1);
          end if;
          l_first_name := not l_first_name;
          l_big_chunk := substr(l_big_chunk, l_comma+1);
        else
          l_leftover := l_big_chunk;
        end if;
      end loop;

      exit when l_len < 32767;
    end loop;

  end loop;

  forall i in 1 .. l_rows.count
    insert into emp_names values l_rows(i);

  dbms_output.put_line(sql%rowcount);
  commit;
end;
/
2688004

PL/SQL procedure successfully completed.

Elapsed: 00:00:12.89

SQL> select * from emp_names
  2  where rownum < 100;

FIRST_NAME                     LAST_NAME
------------------------------ ----------------
Mohammed2                      Imran2
Jim2                           Bunker2
Saurav2                        Ganguly2
Chris3                         Saxon3
Mohammed3                      Imran3
...
...
...


That's about 2.6 million rows in 13 seconds on my laptop :-)
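
As an aside, the dbms_application_info.set_client_info call inside the loop is there so you can watch the routine's progress from a second session while the parse runs, along these lines:

select client_info
from   v$session
where  client_info is not null;

It reports "<staging row number>-<current offset>" as set in the loop above.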

Hope this helps.

Addenda: Thanks to Christian Neumueller for pointing out that the approach above may need to be modified if you are using a multibyte character set. So be careful if this applies to you.
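
A further caveat: in the routine above, the else branch that saves l_leftover can never execute, because the inner loop exits as soon as no comma is found. A name that straddles a 32,767-character chunk boundary is therefore emitted as two fragments, which would explain the 2,688,004 rows reported versus the 2,688,000 pairs the population script wrote. A sketch of one way to carry the partial token across chunk reads (l_chunk is an extra varchar2(32767) variable; everything else matches the routine above):

loop
  l_chunk := dbms_lob.substr(i.lob_col, 32767, l_offset);
  exit when l_chunk is null;
  l_offset    := l_offset + length(l_chunk);  -- advance only by the LOB characters actually read
  l_big_chunk := l_leftover || l_chunk;
  l_leftover  := null;

  loop
    l_comma := instr(l_big_chunk, ',');
    if l_comma = 0 then
      l_leftover := l_big_chunk;  -- partial token: prepend it to the next chunk
      exit;
    end if;
    -- consume substr(l_big_chunk, 1, l_comma-1) into l_rows as before
    l_big_chunk := substr(l_big_chunk, l_comma+1);
  end loop;
end loop;

(The test data always ends with a comma, so nothing is left in l_leftover after the final chunk.)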

The sample provided worked fine

Mohammed Imran Pasha, October 09, 2015 - 5:18 am UTC

Thanks a ton, Chris. The sample you provided was of great help; applying the same logic to my program gave the desired results. Below are the stats with the new logic.

Tab1 – 1,600 rows
Tab2 – 2.5 million rows inserted after parsing
Time taken: 7 secs
Connor McDonald
October 09, 2015 - 8:33 am UTC

The database is pretty snappy when it gets a chance to be :-)

I'm glad it worked out well.

Cheers,
Connor
