Skip to Main Content
  • Questions
  • parsing a CLOB field which contains CSV data

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, Antonio.

Asked: November 09, 2015 - 4:43 pm UTC

Last updated: September 15, 2017 - 12:32 am UTC

Version: 11.2

Viewed 10K+ times! This question is

You Asked

Hi everybody,

I have a CLOB filed, in a table, which contains N floating point numbers (e.g. 130,406325;131,394785;113,24137; ...). I'd like to be able to store such N values in a temporary Oracle data structure (e.g. collection), in order to use the whole set of values as if they were available in a view. Is that possible?


Thanks in advance,
Regards,
Antonio

and Connor said...

Addenda: This question has been updated to use Kim's correct answer in the reviews



Yes.

I'd say the easiest way would be to parse it out into a global temporary table, and then use it like any other table.


SQL> drop table T purge;

Table dropped.

SQL>
SQL> create table T ( c clob );

Table created.

--
-- some sample data in a clob
--

SQL>
SQL> declare
  2    v varchar2(32000) := '123.45,456.56,678.78,';
  3    ctemp clob;
  4  begin
  5    for i in 1 .. 10 loop
  6      v := v ||v;
  7    end loop;
  8
  9    dbms_lob.createtemporary(ctemp,true);
 10    ctemp := v;
 11
 12    for i in 1 .. 20 loop
 13      dbms_lob.writeappend(ctemp,length(v),v);
 14    end loop;
 15
 16    insert into T values (ctemp);
 17    commit;
 18  end;
 19  /

PL/SQL procedure successfully completed.

SQL>
SQL> select dbms_lob.getlength(c) from t;

DBMS_LOB.GETLENGTH(C)
---------------------
               451584

SQL>

--
-- a GTT to hold the parsed numbers
--

SQL>
SQL> drop table GTT;

Table dropped.

SQL>
SQL> create global temporary table GTT ( n number )
  2  on commit preserve rows;

Table created.

SQL>
SQL>
SQL> set serverout on
SQL> declare
  2    l_big_chunk varchar2(32767);
  3    l_leftover varchar2(200);
  4    l_offset pls_integer;
  5    l_len pls_integer;
  6
  7    type t_rows is table of number index by pls_integer;
  8    l_rows t_rows;
  9
 10    l_comma pls_integer;
 11  begin
 12  for i in ( select rownum r, c from t ) loop
 13    l_offset := 1;
 14
 15    loop
 16      l_big_chunk := dbms_lob.substr(i.c,32700,l_offset);
 17      l_len := length(l_big_chunk);
 18      l_offset := l_offset + l_len;
 19      l_big_chunk := l_leftover || l_big_chunk;
 20
 21      dbms_application_info.set_client_info(i.r||'-'||l_offset);
 22
 23      loop
 24        l_comma := instr(l_big_chunk,',');
 25        exit when l_comma = 0 or l_big_chunk is null;
 26        l_rows(l_rows.count+1) := to_number(substr(l_big_chunk,1,l_comma-1));
 27        l_big_chunk := substr(l_big_chunk,l_comma+1);
 28      end loop;
 29      l_leftover := l_big_chunk;
 30
 31      exit when l_len < 32700;
 32    end loop;
 33
 34  end loop;
 35    forall i in 1 .. l_rows.count
 36       insert into gtt values ( l_rows(i) ) ;
 37
 38    dbms_output.put_line('rows = '||sql%rowcount);
 39    commit;
 40
 41  end;
 42  /
rows = 64512

PL/SQL procedure successfully completed.

SQL>
SQL>
SQL> select * from gtt where rownum <= 20;

         N
----------
    123.45
    456.56
    678.78
    123.45
    456.56
    678.78
    123.45
    456.56
    678.78
    123.45
    456.56
    678.78
    123.45
    456.56
    678.78
    123.45
    456.56
    678.78
    123.45
    456.56

20 rows selected.

SQL>


Rating

  (9 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Trust but Verify

Niall, November 12, 2015 - 8:34 am UTC

Great answer on the how, but you almost certainly want to verify the input data. The biggest potential problem I can see is that the field separator might validly also appear in the data which will likely lead to incorrect data being loaded.
Connor McDonald
November 13, 2015 - 1:43 am UTC

Agreed, and dont forget that things can get a lot more complicated once you head into multibyte charactersets.

Using XML

Karthick, November 12, 2015 - 10:19 am UTC

I don't have a 11.2 DB So did it in 12.1.0.2.0

SQL> insert into gtt
  2  select t1.val
  3    from (
  4          select xmltype('<table><row>' || replace(c, ',', '</row><row>') || '</row></table>') xmldata
  5            from t
  6         ) t
  7       , xmltable
  8         (
  9            '/table/row' passing t.xmldata
 10            columns
 11              val number path '.'
 12         ) t1
 13   where t1.val is not null;

64512 rows created.

Elapsed: 00:00:00.57

XML Solution On 10g is slow

Karthick, November 12, 2015 - 10:33 am UTC

I tried on 10.2.0.5.0 and the performance was not that good compared to PL/SQL solution.

SQL> insert into gtt
  2  select t1.val
  3    from (
  4          select xmltype('<table><row>' || replace(c, ',', '</row><row>') || '</row></table>') xmldata
  5            from t
  6         ) t
  7       , xmltable
  8         (
  9            '/table/row' passing t.xmldata
 10            columns
 11              val number path '.'
 12         ) t1
 13   where t1.val is not null;

64512 rows created.

Elapsed: 00:02:24.40
SQL>  truncate table gtt;

Table truncated.

Elapsed: 00:00:00.15
SQL> set serveroutput on
SQL>
SQL> declare
  2    l_big_chunk varchar2(32767);
  3    l_leftover varchar2(200);
  4    l_offset pls_integer;
  5    l_len pls_integer;
  6
  7    type t_rows is table of number index by pls_integer;
  8    l_rows t_rows;
  9
 10    l_comma pls_integer;
 11  begin
 12  for i in ( select rownum r, c from t ) loop
 13    l_offset := 1;
 14
 15    loop
 16      l_big_chunk := l_leftover || dbms_lob.substr(i.c,32767,l_offset);
 17      l_len := length(l_big_chunk);
 18      l_offset := l_offset + l_len;
 19
 20      dbms_application_info.set_client_info(i.r||'-'||l_offset);
 21
 22      loop
 23        l_comma := instr(l_big_chunk,',');
 24        exit when l_comma = 0 or l_big_chunk is null;
 25        if l_comma != 0 then
 26           l_rows(l_rows.count+1) := to_number(substr(l_big_chunk,1,l_comma-1));
 27           l_big_chunk := substr(l_big_chunk,l_comma+1);
 28        else
 29           l_leftover := l_big_chunk;
 30        end if;
 31      end loop;
 32      exit when l_len < 32767;
 33    end loop;
 34
 35  end loop;
 36    forall i in 1 .. l_rows.count
 37       insert into gtt values ( l_rows(i) ) ;
 38
 39    dbms_output.put_line('rows = '||sql%rowcount);
 40    commit;
 41
 42  end;
 43  /
rows = 64512

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.81

Connor McDonald
November 13, 2015 - 1:44 am UTC

Reminds me of the funny quote:

"When you have a problem, you can use XML. Now you have two problem"

:-)

Similar Question at OTN

Marcus, November 13, 2015 - 6:54 am UTC

There is a similar question at OTN-PL/SQL with interesting solutions: https://community.oracle.com/thread/3649295


'Fraid a slight bug in answer

Kim Berg Hansen, November 13, 2015 - 1:02 pm UTC

Hi, Connor

I'm afraid you've been lucky with your test data:

  v varchar2(32000) := '123.45,456.56,678.78,';


Each "element" ('123.45,' and '456.56,' and '678.78,') is exactly 7 characters (including the comma delimiter.) You read chunks of size 32767 which happens to be divisible by 7. So you never get numbers that are "split" between chunks, so your "leftover" handling was never tested ;-)

I run your test exactly as is, except for one line changed:

  v varchar2(32000) := '123.456,456.567,678.789,';


Here each "element" is 8 characters. 32767 is not divisible by 8.

After running the test with the slightly changed test data, I run a query on the GTT:

select * from gtt
where n not in (123.456, 456.567, 678.789)
/

         N
----------
         7
        89
       456
      .567
     8.789
    23.456
         9
        56
       567
      .789
     3.456
    56.567

12 rows selected.


That's wrong data from places where a number was split in seperate chunks and the leftover handling failed.

Let's look at the code:

SQL> declare
  2    l_big_chunk varchar2(32767);
  3    l_leftover varchar2(200);
  4    l_offset pls_integer;
  5    l_len pls_integer;
  6
  7    type t_rows is table of number index by pls_integer;
  8    l_rows t_rows;
  9
 10    l_comma pls_integer;
 11  begin
 12  for i in ( select rownum r, c from t ) loop
 13    l_offset := 1;
 14
 15    loop
 16      l_big_chunk := l_leftover || dbms_lob.substr(i.c,32767,l_offset);
 17      l_len := length(l_big_chunk);
 18      l_offset := l_offset + l_len;
 19
 20      dbms_application_info.set_client_info(i.r||'-'||l_offset);
 21
 22      loop
 23        l_comma := instr(l_big_chunk,',');
 24        exit when l_comma = 0 or l_big_chunk is null;
 25        if l_comma != 0 then
 26           l_rows(l_rows.count+1) := to_number(substr(l_big_chunk,1,l_comma-1));
 27           l_big_chunk := substr(l_big_chunk,l_comma+1);
 28        else
 29           l_leftover := l_big_chunk;
 30        end if;
 31      end loop;
 32      exit when l_len < 32767;
 33    end loop;
 34
 35  end loop;
 36    forall i in 1 .. l_rows.count
 37       insert into gtt values ( l_rows(i) ) ;
 38
 39    dbms_output.put_line('rows = '||sql%rowcount);
 40    commit;
 41
 42  end;
 43  /


Line 24 exits the loop if no comma is found (l_comma = 0), so you never get to line 29. Therefore l_leftover is never filled with the leftover characters of l_big_chunk. Leftover characters just dissappear.

If that is fixed so l_leftover actually gets leftover characters, then line 16 will try to put more characters in l_big_chunk than there is room for and raise exception.

So here's a take on a quick fix:

declare
  l_big_chunk varchar2(32767);
  l_leftover varchar2(200);
  l_offset pls_integer;
  l_len pls_integer;

  type t_rows is table of number index by pls_integer;
  l_rows t_rows;

  l_comma pls_integer;
begin
for i in ( select rownum r, c from t ) loop
  l_offset := 1;

  loop
    l_big_chunk := dbms_lob.substr(i.c,32700,l_offset);
    l_len := length(l_big_chunk);
    l_offset := l_offset + l_len;
    l_big_chunk := l_leftover || l_big_chunk;
    
    dbms_application_info.set_client_info(i.r||'-'||l_offset);

    loop
      l_comma := instr(l_big_chunk,',');
      exit when l_comma = 0 or l_big_chunk is null;
      l_rows(l_rows.count+1) := to_number(substr(l_big_chunk,1,l_comma-1));
      l_big_chunk := substr(l_big_chunk,l_comma+1);
    end loop;
    l_leftover := l_big_chunk;
    
    exit when l_len < 32700;
  end loop;

end loop;
  forall i in 1 .. l_rows.count
     insert into gtt values ( l_rows(i) ) ;

  dbms_output.put_line('rows = '||sql%rowcount);
  commit;

end;
/


Could probably be shined up quite a bit, but as quick fix I think it works ;-)
Connor McDonald
November 14, 2015 - 12:11 am UTC

Excellent thanks Kim.

And a good example of ensuring testing is not benefiting from a boundary condition

To - Karthick from India

Rajeshwaran, Jeyabal, November 15, 2015 - 1:56 pm UTC

XMLTABLE solution doesn't work if data length exceeds more than 4000 bytes.

https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:9522821800346391115#9522913800346098057
Connor McDonald
November 16, 2015 - 12:50 am UTC

THanks for the additional info.

12c JSON Solution

Robert Marz (@RobbieDatabee), January 13, 2016 - 3:21 pm UTC

select n
  from t
     , json_table('['||t.c||']', '$[*]' columns n varchar2 path '$') j
/


Is yery fast, has no size limits and does not need any extra tables

Chris Saxon
January 13, 2016 - 4:32 pm UTC

Nice, thanks for stopping by.

LOB2Table package

Michael, February 03, 2016 - 2:56 pm UTC

There's a free solution for this problem:

http://sourceforge.net/projects/lob2table/

The package is quite fast - typically it parses about 15000 rows (a 30 columns) per second.

I.e.:
select d.deptno, d.dname,
       t.row_no, t.column1, t.column2, t.column3, t.column4
from   dept d
       cross join table(
         lob2table.separatedcolumns(
           d.myclob, /* the data LOB */
           chr(10),  /* row separator */
           ',',      /* column separator */
           '"'       /* delimiter (optional) */
         )
       ) t
where  d.deptno in(10, 40)

Chris Saxon
February 04, 2016 - 2:36 am UTC

Thanks for the additional info.

it works

Anton Krougly, January 12, 2017 - 3:34 pm UTC

CREATE OR REPLACE TYPE t_strings_tbl as table of varchar2(32767);

procedure clob2strings(
p_clob in clob,
p_stbl in out nocopy t_strings_tbl
) is
c_buffer_capacity constant integer default 32767;
c_delim_char constant varchar(1) default chr(10);

v_buffer varchar2(32767);
v_buffer_size integer;
v_clob_pos integer default 1;
v_chunk_size integer;
v_leftover varchar2(32767);
v_line varchar2(32767);
v_offset integer;
v_delim_pos pls_integer;
begin
loop
v_buffer_size := c_buffer_capacity - nvl(length(v_leftover), 0);
v_chunk_size := v_buffer_size;
DBMS_LOB.READ (p_clob, v_chunk_size, v_clob_pos, v_buffer);
v_clob_pos := v_clob_pos + v_chunk_size;
v_buffer := v_leftover || v_buffer;

v_offset := 1;

loop
exit when v_buffer is null;
v_delim_pos := instr(v_buffer, c_delim_char, v_offset);
if (v_delim_pos = 0) then
if (v_chunk_size < v_buffer_size) then
v_delim_pos := length(v_buffer) + 1;
end if;
end if;
if (v_delim_pos > 0) then
v_line := rtrim(substr(v_buffer, v_offset, v_delim_pos - v_offset), ' '||chr(13));
v_offset := v_delim_pos + 1;
if (v_line is not null) then
p_stbl.extend();
p_stbl(p_stbl.count()) := v_line;
end if;
end if;
exit when v_delim_pos = 0 or v_delim_pos > length(v_buffer);
end loop;
v_leftover := substr(v_buffer, v_offset);

exit when v_chunk_size < v_buffer_size;
end loop;
end;

Connor McDonald
January 13, 2017 - 1:44 am UTC

nice work

More to Explore

PL/SQL demos

Check out more PL/SQL tutorials on our LiveSQL tool.

PL/SQL docs

PL/SQL reference manual from the Oracle documentation library