Skip to Main Content

Breadcrumb

Dev Live Dev Intro

This month we are celebrating Developers at AskTOM. We welcome Developers of all levels of experience to join us at our FREE Developer Live events coming in August. Just click on the left to register today! If you are brand new to Database Technology, then we also have got you covered. Just click on the right for your comprehensive FREE training program to kick start your Oracle Database Development journey!

Question and Answer

Tom Kyte

Thanks for the question, Perumal.

Asked: May 27, 2004 - 8:25 am UTC

Answered by: Tom Kyte - Last updated: July 20, 2020 - 3:25 am UTC

Category: Database - Version: 8.1.7

Viewed 1000+ times

You Asked

Hi Tom,
I have a table in which one column is CLOB datatype.
Data in this column are 100 lines. ( single record has 100 lines)
Each line has been separated by chr(13) ( enter key's ascii value).

My requirement is
If i search for a specific word, this word matches with
First record's - 3rd line
fifth record's - 50th line
100 th record's - 1 st line .
Now, i need to show only the lines. not the entire record.
means that my output will be 3 rows.But I should not show all three rows. instead 3 rows with the 3 lines only. That is lines which have the keyword.

How to achive this?
Thank you

and we said...

I think it'll be easiest to call a tiny big of plsql from sql in this case. We need to parse out the n'th line here.

ops$tkyte@ORA9IR2> create table t ( x int primary key, y clob );

Table created.

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> insert into t values ( 1, 'line 1
2 line 2
3 line 3
4 this is an interesting line 4
5 line 5
6 line 6' );

1 row created.

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> insert into t values ( 2, 'line 1
2 line 2
3 line 3
4 this is an interesting line 4' );

1 row created.

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> insert into t values ( 3, 'this is an interesting line 1
2 line 2
3 line 3
4 this is an interesting line 4' );

1 row created.

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> declare
2 l_text long;
3 begin
4 for i in 1 .. 3000
5 loop
6 l_text := l_text || 'line ' || i || chr(10);
7 end loop;
8 l_text := l_text || 'interesting line 3001';
9 insert into t values ( 4, l_text );
10 l_text := l_text || chr(10) || 'line 3002';
11 insert into t values ( 5, l_text );
12 l_text := 'interesting line 0' || chr(10) || l_text;
13 insert into t values ( 6, l_text );
14 end;
15 /

PL/SQL procedure successfully completed.

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create or replace function get_line( p_lob in clob, p_what in varchar2 ) return varchar2
2 as
3 l_text long;
4 l_instr number;
5 begin
6 l_instr := dbms_lob.instr( p_lob, p_what );
7 l_instr := dbms_lob.instr( p_lob, chr(10), l_instr );
8 if ( l_instr = 0 ) then l_instr := dbms_lob.getlength(p_lob); end if;
9 if ( l_instr < 4000 )
10 then
11 l_text := rtrim(chr(10) || dbms_lob.substr( p_lob, l_instr, 1 ), chr(10));
12 else
13 l_text := rtrim(chr(10) || dbms_lob.substr( p_lob, 4000, l_instr-3999 ), chr(10));
14 end if;
15 return substr( l_text, instr(l_text,chr(10), -1, 1 )+1 );
16 end;
17 /

Function created.

ops$tkyte@ORA9IR2> select x, get_line( y, 'interesting' ) y, dbms_lob.getlength(y) len
2 from t
3 where dbms_lob.instr( y, 'interesting' ) > 0
4 /

X Y LEN
---------- ---------------------------------------- ----------
1 this is an interesting line 4 64
2 this is an interesting line 4 50
3 this is an interesting line 1 73
4 interesting line 3001 28914
5 interesting line 3001 28924
6 interesting line 0 28943

6 rows selected.


If you want this to be speedy and scale way up -- you'll store the "lines" as detail records in a child table using varchar2(4000)'s.


you could also use

o intermedia text to index, so the search would be fast using "contains"
o ctx_doc.highlight to find the terms and get their offsets
o a similar "parse" process to extract them

and you rated our response

  (48 ratings)

Is this answer out of date? If it is, please let us know via a Review

Reviews

Great Work

May 27, 2004 - 1:09 pm UTC

Reviewer: Jeff from Atlanta

Tom you're the man

Thank you Tom

May 27, 2004 - 11:47 pm UTC

Reviewer: perumal from Chennai,India

Hi Tom,
Thank you. I am realy happy and thank you for your response and suggestion.

I got error and have a doubt...

May 28, 2004 - 5:52 am UTC

Reviewer: perumal from Chennai,INDIA.

Hi Tom,
   
I have done as you did.
I got the following error.

SQL> declare
  2              l_text long;
  3          begin
  4              for i in 1 .. 3000
  5              loop
  6                  l_text := l_text || 'line ' || i || chr(10);
  7              end loop;
  8              l_text := l_text || 'interesting line 3001';
  9              insert into t values ( 4, l_text );
 10            l_text := l_text || chr(10) || 'line 3002';
 11            insert into t values ( 5, l_text );
 12            l_text := 'interesting line 0' || chr(10) || l_text;
 13            insert into t values ( 6, l_text );
 14        end;
 15  /
            insert into t values ( 4, l_text );
                                      *
ERROR at line 9:
ORA-06550: line 9, column 39:
PLS-00382: expression is of wrong type
ORA-06550: line 9, column 13:
PL/SQL: SQL Statement ignored
ORA-06550: line 11, column 37:
PLS-00382: expression is of wrong type
ORA-06550: line 11, column 11:
PL/SQL: SQL Statement ignored
ORA-06550: line 13, column 37:
PLS-00382: expression is of wrong type
ORA-06550: line 13, column 11:
PL/SQL: SQL Statement ignored

 what's wrong with me. kindly show it.
 If my data is like this... 

SQL> insert into t values(1,'line 1
  2   line 2
  3   this is interesting line 3
  4   this is also interesting line 4
  5   line 5');

1 row created.

SQL> 
SQL> insert into t values(2, 'this is interesting line 1
  2   this is also interesting line 2
  3   this too interesting line 3
  4   line 5
  5   line 6
  6   this is can also interesting line 7');

1 row created.

SQL>  
SQL> insert into t values(3, 'this is interesting line 1
  2   line 2
  3   this too interesting line 3
  4   this is also interesting line 5
  5   line 6
  6   this is can also interesting line 7');

1 row created.

SQL> commit;

Commit complete.

SQL> select * from t;

         X Y
---------- --------------------------------------------------------------------------------
         1 line 1
            line 2
            this is interesting line 3
            this is also interesting line 4
            lin

         2 this is interesting line 1
            this is also interesting line 2
            this too interestin

         3 this is interesting line 1

         X Y
---------- --------------------------------------------------------------------------------
            line 2
            this too interesting line 3
            this is also in




SQL> select x,get_line(y,'interesting') from t;

         X
----------
GET_LINE(Y,'INTERESTING')
---------------------------------------------------------------------------------------------------------
         1
 this is interesting line 3

         2
this is interesting line 1

         3
this is interesting line 1


SQL> 

   The output is not as we expected.
    First record's line 4,Second record's line 2... Missing...

Expecting your's
thank you 

Tom Kyte

Followup  

May 28, 2004 - 11:12 am UTC

In 8i, clobs were different, you could not use them as easily as in 9i. You would have to insert an empty_clob() -- return it back out and use dbms_lob.writeAppend to place that much text into it.


You did not say you wanted *all* occurences. I just got the first, you have the code - you see the technique. You can return upto 4000 characters as a varchar2, just modify the code to get out all of the values of interest.

Oracle Text and Highlight

June 01, 2004 - 2:24 am UTC

Reviewer: Edgar Chupit from Riga, Latvia

Hello Tom,

Is there an efficient way of using ctx_doc.highlight function to find the terms and offsets?

I've tested both ctx_doc.highlight and ctx_doc.markup and they both retrieve document, pass it via the filter and then finds the offsets. Document retrieving and filtering is very expensive procedure, but according to some sources on the web (for example TechNet) Oracle Text stores "token" together with document id and token positions in documents.

If this is true than why does Oracle Text needs to filter the document again and not simply retrieve offsets from the index?

Is there a possibility to effectively retrieve lines that contains highlighted terms?

Thank you.

Tom Kyte

Followup  

June 01, 2004 - 8:15 am UTC

what sort of performance have you measured. You say it is "expensive", what metrics are you using to show that?

Oracle Text and Highligh

June 01, 2004 - 9:22 am UTC

Reviewer: Edgar Chupit from Riga, Latvia

Hello Tom,

Mainly I've measured response time. To measure this I've used Your package runstats. I've created table and populated it with 8 rows (8 word documents) total size of documents was ~100Kbytes. Than I compared speed of highlight vs markup and received results that their
speed is equal.

If there would be a possibility to retrieve offsets of hitwords than I could store a plain text copy of the document in the clob process them same way like makrup does only without going via the filtering mechanism (that have to be expensive for word documents, because we have to call external INSO filter to parse whole document).

As You can see from the tests the highlight process for the 8 rows takes about 2 secs on my configuration, with 50 rows this is about 10-13 seconds.

Here is the test I've used:

SQL> drop index test_idx;

Index dropped.

SQL> drop table test;

Table dropped.

SQL> 
SQL> create table test( id integer, data blob );

Table created.

SQL> --*** insert into test table 2 word documents, i've prepared them separatly
SQL> -- and inserted into document table
SQL> insert into test
  2  select id, data
  3  from document
  4  where rownum < 3;

2 rows created.

SQL> --***
SQL> insert into test
  2  select id, data
  3  from test
  4  /

2 rows created.

SQL> insert into test
  2  select id, data
  3  from test
  4  /

4 rows created.

SQL> insert into test
  2  select id, data
  3  from test
  4  /

8 rows created.

SQL> 
SQL> select max(dbms_lob.getlength(data)) max,
  2         min(dbms_lob.getlength(data)) min,
  3         avg(dbms_lob.getlength(data)) avg,
  4         sum(dbms_lob.getlength(data)) sum,
  5         count(*) count
  6    from test
  7  /

       MAX        MIN        AVG        SUM      COUNT
---------- ---------- ---------- ---------- ----------
      7797       3769       5783      92528         16

SQL> 
SQL> create index test_idx on test(data) indextype is ctxsys.context;

Index created.

SQL> 
SQL> select token_count from dr$test_idx$i where token_text = 'NR';

TOKEN_COUNT
-----------
         16

SQL> 
SQL> set serveroutput on size 10000;
SQL> set timing on;
SQL> 
SQL> declare
  2    idxName varchar2(25) := 'TEST_IDX';
  3    query varchar(25) := 'NR';
  4  begin
  5    otis.runstats_pkg.rs_start;
  6    for r in ( select rowid rid from test where contains(data,query)>0 ) loop
  7        ctx_doc.highlight( idxname, r.rid, query, 'HIGHTAB', plaintext=>TRUE );
  8    end loop;
  9    otis.runstats_pkg.rs_middle;
 10    for r in ( select rowid rid from test where contains(data,query)>0 ) loop
 11        ctx_doc.markup( idxname, r.rid, query, 'MARKTAB', plaintext=>TRUE );
 12    end loop;
 13    otis.runstats_pkg.rs_stop(P_DIFFERENCE_THRESHOLD=>150);
 14    rollback;
 15  end;
 16  /
Run1 ran in 255 hsecs
Run2 ran in 239 hsecs
run 1 ran in 106,69% of the time
        
Name                                Run1      Run2      Diff
LATCH.child cursor hash table        157         0      -157
STAT...table scan rows gotten        253        22      -231
STAT...calls to get snapshot s       599       324      -275
STAT...recursive calls               637       311      -326
STAT...rows fetched via callba       453       114      -339
STAT...table fetch by rowid          476       130      -346
LATCH.session allocation             394        48      -346
STAT...index fetch by key            498       129      -369
LATCH.library cache pin            1,468     1,015      -453
LATCH.library cache lock             779       225      -554
LATCH.row cache objects            1,591       909      -682
STAT...buffer is not pinned co     1,018       313      -705
STAT...session logical reads       2,259     1,211    -1,048
STAT...consistent gets - exami     1,357       283    -1,074
LATCH.shared pool                  1,893       807    -1,086
STAT...consistent gets             1,512       365    -1,147
STAT...consistent gets from ca     1,504       357    -1,147
LATCH.library cache                2,731     1,268    -1,463
STAT...undo change vector size    48,304    50,896     2,592
LATCH.cache buffers chains         5,406     8,053     2,647
STAT...IMU undo allocation siz     7,668    11,400     3,732
STAT...redo size                 101,944   132,824    30,880
        
Run1 latches total versus runs -- difference and pct
Run1      Run2      Diff     Pct
16,073    14,200    -1,873 113.19%

PL/SQL procedure successfully completed.

Elapsed: 00:00:05.60 

Tom Kyte

Followup  

June 01, 2004 - 11:44 am UTC

it is not as simple as that.  intermedia text queries can get quite involved.  Also, indexes get "out of sync" (that is why my home page says "approximately", I update documents, the index entries remain -- they must go from the index to the document to verify).  Also, just because something is found in the index doesn't mean it'll "hit" in the document (phrases for example -- the word 'index' might appear in the document 500 times but the phrase 'the index they created' might only be in there once, you could have searched for the phrase -- not a word)....

But, if you want to make it "fast", you could store the filtered document and index that.  In that manner, the markup will be blinding fast -- no filtering needed at run time.  Consider (this is just a desktop computer, nothing fancy)


ops$ora9ir2@ORA9IR2> create table demo
  2  ( id        int primary key,
  3    theBlob   blob,
  4    theClob   clob
  5  )
  6  /
 
Table created.
 
ops$ora9ir2@ORA9IR2>
ops$ora9ir2@ORA9IR2> create or replace directory my_files as '/home/tkyte/Desktop'
  2  /
 
Directory created.
 
ops$ora9ir2@ORA9IR2>
ops$ora9ir2@ORA9IR2> declare
  2      l_blob  blob;
  3      l_clob  clob;
  4      l_bfile bfile;
  5  begin
  6      insert into demo values ( 1, empty_blob(), empty_clob() )
  7      returning theBlob, theClob into l_blob, l_clob;
  8
  9      l_bfile := bfilename( 'MY_FILES', 'asktom.rtf' );
 10      dbms_lob.fileopen( l_bfile );
 11      dbms_lob.loadfromfile( l_blob, l_bfile, dbms_lob.getlength( l_bfile ) );
 12      dbms_lob.fileclose( l_bfile );
 13
 14      l_bfile := bfilename( 'MY_FILES', 'asktom.txt' );
 15      dbms_lob.fileopen( l_bfile );
 16      dbms_lob.loadfromfile( l_clob, l_bfile, dbms_lob.getlength( l_bfile ) );
 17      dbms_lob.fileclose( l_bfile );
 18  end;
 19  /
 
PL/SQL procedure successfully completed.
 
ops$ora9ir2@ORA9IR2> select dbms_lob.getlength(theBlob) blob_len,
  2         dbms_lob.getlength(theClob) Clob_len
  3    from demo
  4  /
 
  BLOB_LEN   CLOB_LEN
---------- ----------
     42644      19199
 
ops$ora9ir2@ORA9IR2>
ops$ora9ir2@ORA9IR2> create index blob_idx on demo(theBlob) indextype is ctxsys.context;
 
Index created.
 
ops$ora9ir2@ORA9IR2> create index clob_idx on demo(theClob) indextype is ctxsys.context;
 
Index created.
 
ops$ora9ir2@ORA9IR2> select * from ctx_user_index_errors;
 
no rows selected
 
ops$ora9ir2@ORA9IR2>
ops$ora9ir2@ORA9IR2> create table marktab ( query_id number, document clob );
 
Table created.
 
ops$ora9ir2@ORA9IR2>
ops$ora9ir2@ORA9IR2> create or replace procedure p1
  2  as
  3          l_clob clob;
  4  begin
  5          ctx_doc.markup( 'blob_idx', '1', 'similar', l_clob );
  6  end;
  7  /
 
Procedure created.
 
ops$ora9ir2@ORA9IR2> create or replace procedure p2
  2  as
  3  begin
  4          ctx_doc.markup( 'blob_idx', '1', 'similar', 'MARKTAB', plaintext=>true );
  5  end;
  6  /
 
Procedure created.
 
ops$ora9ir2@ORA9IR2> create or replace procedure p3
  2  as
  3          l_clob clob;
  4  begin
  5          ctx_doc.markup( 'clob_idx', '1', 'similar', l_clob );
  6  end;
  7  /
 
Procedure created.
 
ops$ora9ir2@ORA9IR2> create or replace procedure p4
  2  as
  3  begin
  4          ctx_doc.markup( 'clob_idx', '1', 'similar', 'MARKTAB', plaintext=>true );
  5  end;
  6  /
 
Procedure created.
 
ops$ora9ir2@ORA9IR2>
ops$ora9ir2@ORA9IR2> exec p1; p2; p3; p4;
 
PL/SQL procedure successfully completed.

<b>just to get everything "level" (shared sql et.al.)</b>

 
ops$ora9ir2@ORA9IR2>
ops$ora9ir2@ORA9IR2> @connect /
ops$ora9ir2@ORA9IR2> set termout off
ops$ora9ir2@ORA9IR2> set termout on
ops$ora9ir2@ORA9IR2>
ops$ora9ir2@ORA9IR2> set timing on
ops$ora9ir2@ORA9IR2> exec for i in 1 .. 10 loop p1; end loop;
 
PL/SQL procedure successfully completed.
 
Elapsed: 00:00:02.22
ops$ora9ir2@ORA9IR2> exec for i in 1 .. 10 loop p2; end loop;
 
PL/SQL procedure successfully completed.
 
Elapsed: 00:00:02.14
ops$ora9ir2@ORA9IR2> exec for i in 1 .. 100 loop p3; end loop;
 
PL/SQL procedure successfully completed.
 
Elapsed: 00:00:00.68
ops$ora9ir2@ORA9IR2> exec for i in 1 .. 100 loop p4; end loop;
 
PL/SQL procedure successfully completed.
 
Elapsed: 00:00:01.06
ops$ora9ir2@ORA9IR2>


<b>In memory markup = faster, filtered markup = really fast (note I compared 10 iterations to 100 iterations here -- else the timing was just too small to make sense of)</b>

 

Thank you

June 01, 2004 - 1:49 pm UTC

Reviewer: Edgar Chupit from Riga, Latvia

Hello once again,

Thank you for in-depth explanation, this is the solution I was thinking of. Too bad that Oracle Text doesn't have internal filtered object caching, it would be very nice if we'd have one that stores clob version on the fly. But for now, yes, I will try this method of manual clob caching.

I think I will create user_datastore index on the blob field and direct datastore on clob field. When my stored procedure will be called for the blob field it will use ctx_doc.filter to populate clob field, and than after blob field index sync I will call clob index sync.

Error in my proposal

June 01, 2004 - 2:11 pm UTC

Reviewer: Edgar Chupit from Riga, Latvia

Hello,

I've just realized that the method I've proposed is weak, because call to the ctx_doc.filter will invoke my procedure once again and I'll end up with infinitive recoursion.

Now I have a choise between 3 indexes on the same table or manual invocation of $ORACLE_HOME/ctx/bin/ctxhx

Tom Kyte

Followup  

June 01, 2004 - 3:53 pm UTC

or create a procedure that does the insert -- don't let the clients do it.

in that fashion you can:

a) insert the blob
b) filter the blob
c) store the filtered text

and you only need an index on the filtered text -- you don't need an index on anything else -- use ifilter to turn blob into clob, insert clob, have index on clob and search that.

Thank you

June 02, 2004 - 4:14 am UTC

Reviewer: Edgar Chupit from Riga, Latvia

Hello,

using this method I’ve achieved much faster markup response time, and query time was improved from 50-60 seconds per search to 1-5 seconds per search.


how to return multiple markup results to web applications

July 08, 2004 - 1:12 pm UTC

Reviewer: Raj from Chicago

Hi Tom,

The information you put here is very helpful.
I have a similar problem, can you please comment on how I should proceed:

I am doing a search on publications table Say I have got 10 matches. I need to show this 10 matches to the user using our web application. I found we can use markup procedure to highlight the search words and store it in a markup:

</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:350017644883, <code>

<quote>
We pass it a unique key that we can use to identify our markedup document (I
find userenv('sessionid') great for this purpose).
</quote>

Since I have 10 results using userenv('sessionid') for all the 10 results would not probably work for my case. In this case how do I store the results in markup table? How do I maintain this table (delte the old records?)

The problem is to get search results, mark it up and show to the user. I am using Oracle 9i. Please let me know if you need more information.

Thanks much!!

Tom Kyte

Followup  

July 08, 2004 - 1:30 pm UTC

use a sequence. select s.nextval from dual;



Re: how to return multiple markup results to web applications

July 08, 2004 - 2:23 pm UTC

Reviewer: Raj from chicago

Thank you!

What is the best way to maintain the markup table? Do I schedule a job and delete it everynight or something like that?

Is there a way to return an array of clobs to web application. I initially considered the option of creating a table type of CLOB. Fill my results in an object of this type and return a cursor. But I think this doesn't work. It is not a real table and there are no columns in it.

Thanks again!

Tom Kyte

Followup  

July 08, 2004 - 3:40 pm UTC

it is up to you as to how to maintain the markup table. me, i cleaned it up after marking up.


what is a "web application", that could be most anythat.

assuming java/jdbc -- sure, you can return collections. search for

java array

on this site, i've demoed that many times.

Re: how to return multiple markup results to web applications

July 08, 2004 - 7:12 pm UTC

Reviewer: Raj from Chicago

It's an ASP.NET web application. I wonder if there is a way to return such collections. I know I can return a recordset. So, I tried to create a type of table CLOB and return a cursor to it. But it does not work (table or view does not exist error);

something like:
Type t_tblLOB is TABLE of CLOB;
store clob markup in t_tblLOB
return a cursor to t_tblLOB.

Any idea how I can achieve this for ASP.NET web application.

Thanks for your help!

Tom Kyte

Followup  

July 08, 2004 - 9:03 pm UTC

ops$tkyte@ORA9IR2> create type myTableType as table of clob;
  2  /
 
Type created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create or replace function foo return myTableType
  2  pipelined
  3  as
  4          l_clob clob;
  5  begin
  6          for i in 1 .. 10
  7          loop
  8                  l_clob := 'hello ' || i;
  9                  pipe row ( l_clob );
 10          end loop;
 11          return;
 12  end;
 13  /
 
Function created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select * from table(foo());
 
COLUMN_VALUE
-------------------------------------------------------------------------------
hello 1
hello 2
hello 3
hello 4
hello 5
hello 6
hello 7
hello 8
hello 9
hello 10
 
10 rows selected.
 
 

July 09, 2004 - 5:29 am UTC

Reviewer: A reader


First timer dealing with CLOB

July 10, 2004 - 7:31 am UTC

Reviewer: George from MSP, MN USA

Thank you Perumal and Tom.
This was very helpful in giving me an idea of the CLOB. I will try to work with it and hope I can figure it out from there...

lob opening question

July 11, 2004 - 4:29 pm UTC

Reviewer: A reader

From 10g doc
"Opening a LOB in read write mode defers any index maintenance on the LOB column until you close the LOB. useful if there is an extensible index on the LOB column and you do not want the database to perform index maintenance every time you write to the LOB.
This technique can increase the performance of your application if you are doing several write operations on the LOB while it is open."

What is an extensible index associated with a lob?
If it has to be created separately - then it seems to
imply that one does not need to open the lob in read-write
mode usually for performance gains - is that correct?


Thanx!

Tom Kyte

Followup  

July 12, 2004 - 11:02 am UTC

Oracle text is an extensible index for example. Any "non standard" index would be.



thanx Tom!

July 12, 2004 - 11:20 am UTC

Reviewer: A reader

"Oracle text is an extensible index for example. Any "non standard" index would be."
I take it that this does not mean a LOB index
(that points to chunks of LOB data), correct?

Tom Kyte

Followup  

July 12, 2004 - 11:50 am UTC

the lob index is always maintained, else we wouldn't be able to find the bits and pieces of it!

thanx!

July 12, 2004 - 12:28 pm UTC

Reviewer: A reader

just confirming to see if I am missing something!:)

performance problem while using ctx_doc.markup

April 12, 2005 - 8:38 am UTC

Reviewer: Deba from India

I have seen your initial process to fasten up process while using ctx_doc.markup. But still is not clear to me and you will understand what I am trying to say if you kindly go through my problem.

My environment is Oracle 9i rel 2 (9.2.0.5) and OS is Solaris version 9. 

I have created a table 
CREATE TABLE DOCUMENTS 
( 
  NAME          VARCHAR2 (256), 
  MIME_TYPE     VARCHAR2 (128), 
  DOC_SIZE      NUMBER, 
  DAD_CHARSET   VARCHAR2 (128), 
  LAST_UPDATED  DATE, 
  CONTENT_TYPE  VARCHAR2 (128), 
  CONTENT       CLOB, 
  BLOB_CONTENT  BLOB, 
  OWNER         VARCHAR2 (100), 
  DOMAIN        VARCHAR2 (20), 
  DOC_TYPE      VARCHAR2 (30), 
  RESOLUTION    VARCHAR2 (2000), 
  SUBJECT       VARCHAR2 (300), 
  UPLOAD_TYPE   VARCHAR2 (15)
)
LOB (BLOB_CONTENT) STORE AS BLOB_CONTENT
(TABLESPACE CLOB_DATA DISABLE STORAGE IN ROW  
     CHUNK 5 PCTVERSION 1 
      STORAGE (INITIAL 1m next 1m maxextents unlimited pctincrease 0));

Now I have inserted a 4 records where one is ms-word document, one is pdf document, one is html document and one is text document(.txt) using WEBDB and Oracle HTTP server.

After inserting the records, I created index as follows

begin
ctx_ddl.create_preference('doc_lex', 'BASIC_LEXER');
ctx_ddl.set_attribute('doc_lex', 'printjoins', '_-');
end;

create index doc_index on documents(blob_content) indextype is ctxsys.context
parameters ('datastore ctxsys.default_datastore filter ctxsys.inso_filter format column upload_type LEXER doc_lex');

Now I have written a small routine which is as follows
 
create or replace procedure das as 
lclob clob;
begin
for rec in (SELECT  rowid,name FROM DOCUMENTS WHERE  contains(blob_content,'oracle',1) > 0 ) loop
ctx_doc.markup( 'DOC_INDEX', rec.rowid, 'oracle', lclob);
end loop;
end;
/

Now I am runing the above procedure

<I am doing fresh login here>

17:53:40 SQL> execute das;

PL/SQL procedure successfully completed.

Elapsed: 00:00:26.31
17:54:22 SQL> 

In the above procedure there is nothing except ctx_doc.markup and one sql statement.If I run that sql statement separately then the required is 

17:54:22 SQL> SELECT  rowid,name FROM DOCUMENTS WHERE  contains(blob_content,'oracle',1) > 0;

ROWID
------------------
NAME
--------------------------------------------------------------------------------
AATIa7AA0AAAWhiAAC
F13934/mod_plsql_Overview.htm

AATIa7AA0AAAWhiAAB
F22084/Batch_Process_Automation(Support_Doc).pdf


Elapsed: 00:00:01.12

So it is clear that main problem is coming from ctx_doc.markup. 

I need your valuable comments to solve this performance issue.

Regards
Deba 

Tom Kyte

Followup  

April 12, 2005 - 8:56 pm UTC

why would you markup N documents at a time?

but basically, I do not have your documents or even a concept of their size. I might try doing it document by document and seeing "ah hah, it is this TYPE of document" and work from there.

performance problem while using ctx_doc.markup

April 14, 2005 - 9:00 am UTC

Reviewer: Deba from India

Hi Tom,

Thanks for reply. It is not clear what you are trying to say . I am not doing markup N documents at a time. Rather I am taking a row and marking up that document. That's why I am using FOR loop. The size of the documents are 500 kb in average. It may be a case that I am able to understand what you are trying to say. If possible please give me one example. The problem is that I ahve to resolve this problem immediately.The sizes of documents are as follows 

SQL>select subject,dbms_lob.getlength(blob_content) from documents;

SUBJECT
------------------------------------------------------------DBMS_LOB.GETLENGTH(BLOB_CONTENT)
--------------------------------
html doc                  26171
word doc                  31232
pdf                       24585

I thinks sizes are really small. Now if I use ctx_doc.markup then response will be coming within 2 seconds. But when I am uploading any big file say around 400 kb file, then response is becoming too slow. Is there any restriction about the size of the files to be uploaded ? Is it possible to have any solution of using the same method for handling documents whose sizes are more than 500 kb ? I want your valuable comments to tune the ctx_doc.markup.

Thanks
Deba 

Tom Kyte

Followup  

April 14, 2005 - 9:15 am UTC

create or replace procedure das as
lclob clob;
begin
for rec in (SELECT rowid,name FROM DOCUMENTS WHERE
contains(blob_content,'oracle',1) > 0 ) loop
ctx_doc.markup( 'DOC_INDEX', rec.rowid, 'oracle', lclob);
end loop;
end;
/

yes you were? you looped over all documents with oracle and marked them up. normally I would expect an application to do "a document" to be shown to a user, not all documents at the same time.

what speed are you seeing at 500kb, becoming too slow is too vague.

performance problem while using ctx_doc.markup

April 18, 2005 - 1:11 am UTC

Reviewer: deba from India

Tom,

Your statement "normally I would expect an application to do 'a document' to be shown to a user, not all documents at the same time " is not clear till now. I am telling you what I am trying to do. In my portal, there is text box where use can put a search criteria. After that when user chooses that search will be done on all type of documents, then obviously I have to sacn all the documents. So you know my objective now. I am looking for a solution to make it faster ( as you are always helping out ).

"it is too slow " means that if I start searching a document ( size is 484 kb ) it is so slow that I had to kill my session.

Thanks
Deba

Tom Kyte

Followup  

April 18, 2005 - 7:09 am UTC

there is however

a) the search (no markup)
b) the click (markup)


when you search on my site, is it slow? I am using the same technology here.

performance problem while using ctx_doc.markup

April 20, 2005 - 2:45 am UTC

Reviewer: Deba from India

Tom,

What you are saying is not clear . What is the meaning of
"the search (no markup)" and "the click (markup)". Please clarify it. Please tell me hot to do this . If you find any problem in my code you can rectify that . Please give some examples ( which you generally provide in every cases and which are also very usefull to undertsand )

Thanks
Deba

Tom Kyte

Followup  

April 20, 2005 - 6:56 am UTC

when you search, you use contains in the query:

select* from docs where contains( the_doc, :x ) > 0;

then the user picks A DOCUMENT from that result (not all docs, just ONE) and says "show me", then you call the markup procedure.

you are marking up all documents in a result set in your example, that is not how it would be used.

How to - CLOB Conversion

July 28, 2005 - 9:54 am UTC

Reviewer: VIKAS from INDIA

Dear Mr. Kyte,

While Migrating our database using OMWB (Only Objects, functions, Triggers, Procedures. -> Not data.) from Ms Sql server to Oracle 9i, certain Table columns in our Database have got assigned CLOB as their Datatype by default.

Now it is required from us to change their type to Varchar2(4000), these columns will also be containing Null values.

Can you pls advice me, How this can be achieved effectively in Oracle 9i?

I'd tried changing the datatype using Enterprise Manager, but i am getting

"Ora - 22859 Invalid modification of columns"

error, even though if am logged in as the object owner or sysdba.

What is the reason behind this? What actually is Oracle trying to do internally? Are my tables locked, or do i not have sufficient priviliges? (I tried doing this as an sysdba, dba roles and grant)

Can you also suggest me a querry which can give me the Name, Column_name of the tables which contain CLOB's in my database? also...

Is their any way through which all the columns with CLOB datatype in all the tables of my database get modified to Varchar2(4000) in a one go (using command/script)?

Tom Kyte

Followup  

July 28, 2005 - 10:34 am UTC

you cannot simply change datatypes, especially when the column contains data.

Your choices:

a) create new column, populate it, drop old, rename new.

b) rebuild the table into new table, drop old table, rename new table.


I'll demonstrate (a)

ops$tkyte@ORA9IR2> drop table t;
 
Table dropped.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create table t
  2  as
  3  select owner, view_name, to_lob(text) text from all_views;
 
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> desc t;
 Name                                     Null?    Type
 ---------------------------------------- -------- ----------------------------
 OWNER                                             VARCHAR2(30)
 VIEW_NAME                                         VARCHAR2(30)
 TEXT                                              CLOB
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> alter table t add text_new varchar2(4000);
 
Table altered.
 
ops$tkyte@ORA9IR2> update t set text_new = dbms_lob.substr(text,4000,1), text=null;
 
2248 rows updated.
 
ops$tkyte@ORA9IR2> alter table t drop column text;
 
Table altered.
 
ops$tkyte@ORA9IR2> alter table t rename column text_new to text;
 
Table altered.
 
ops$tkyte@ORA9IR2> desc t;
 Name                                     Null?    Type
 ---------------------------------------- -------- ----------------------------
 OWNER                                             VARCHAR2(30)
 VIEW_NAME                                         VARCHAR2(30)
 TEXT                                              VARCHAR2(4000)
 
 

How to? - CLOB

July 29, 2005 - 1:03 am UTC

Reviewer: VIKAS from INDIA

Dear Mr. Kyte

Thanx a lot for your knowledge sharing. It was really very useful and of great help.

But the question that now comes out to be is, that how can i insert a Column to its original sequence in a table, where it atually existed? Do i have to drop columns, insert the required column and then update the table by inserting other columns that followed it? or is there any other short/single method to do this?

and the other one is:-

How can i get to know the detailes of all the Tables and their corresponding columns (column name and datatype) which contain CLOB data type in them, in my database? is their any view, query, script that i can use to get the desired results? or do i have to do manual checks?

Thanx with kindest regards...

Vikas.


Tom Kyte

Followup  

July 29, 2005 - 8:32 am UTC

Use a view, this is what views are for.


rename table to something else
create view of table with columns in any old "order" you like.

user_tab_columns is the view you are looking for to find clobs.

CLOB - Cooool

July 30, 2005 - 3:43 am UTC

Reviewer: Vikas SANGAR from INDIA

Dear Mr. Kyte

Thanx,
This was what, i was looking for - Dba_Table_cols.

Regards.
Vikas

CLOB - Error Correction.

July 30, 2005 - 5:22 am UTC

Reviewer: VIKAS from INDIA

Sorry for over-spelling
(dba_table_cols instead of dba_tab_cols).
Actually what i intended was:-

user_tab_columns;
dba_tab_cols;

Both serve the purpose.
Cheers.

Conversion - CLOB

August 02, 2005 - 1:33 am UTC

Reviewer: VIKAS SANGAR from INDIA

Daar Mr. Kyte,

Pls suggest and correct, where ever, I am wrong with my approach.

I have two users, usr1, usr2 (a fresh User).

Usr1 have various objects into its schema (Tables, Views, Triggers, Stored procedures, functions etc) along with data/records in them.

The objects in usr1 schema (Tables, dependent Views...) have certain columns with CLOB datatype.

I want to convert CLOB columns (with records) into varchar2(4000)type.

And now my approach...

1)Generate the ddl scripts for the objects that are existent in usr1's schema, using DBMS_metadata.Get_DDl utility.

2) Edit the ddl script accordingly (replace columns with CLOB to varchar2(4000), change user etc..).

3)Recreate these Objects in usr2's schema, insert records into these objects using simple insert statement...

4)Drop all the objects that are pre-existing in usr1's schema and export/import the object from usr2 tousr1.

Now , can you pls point out what are the common flaws and drawbacks that may occur while following this approach?

->What problems i may face using such an approach?

->What precautions i must take, if i need to follow the above approach, so that data integrity and consistency does gets affected?

->Is there any other better, short and effective way to achieve above results for all the objects using a single or fewer set of instructions. (Besides using the above approach or single handedly adding new varchar2 columns, populating new columns using old CLOB columns, dropping old clob columns etc... from each object)?

Pls suggest...

Thanx in advance, with kindest regards.
Vikas.

Tom Kyte

Followup  

August 02, 2005 - 7:31 am UTC

what is wrong with the suggestion I already gave that you have ruled out.

CLOB - Feedback

August 03, 2005 - 6:12 am UTC

Reviewer: VIKAS from INDIA

Dear Mr Kyte,

Pls dont get me wrong. I have not ruled out your valuable suggestion - No, not at all.

Infact the above approach is basically derived from the suggestion given by you. The only difference is, your's is based on Column level changes and the one i want to try and implement is based on Table level changes.

The main idea behind using this approach is to bring down some extra sweat, by, one by one handling each table that has CLOB as datatype of their columns.(There are numerous such tables in my Database)

So before actually implementing that approach, I just wanted to know whether this is the right one or should i drop this idea.

If its alright, then what precautuions and care should i keep in mind? That was all about it.

Take care.
Kindest regards...
Vikas.

Tom Kyte

Followup  

August 03, 2005 - 11:40 am UTC

seems like a alot of work to me, to recreate an entire schema in another user account after editing all of the ddl.



Number of lines in a LOB?

August 28, 2005 - 3:29 pm UTC

Reviewer: VA from New Jersey, USA

How can I get the number of lines in a text file stored in a BLOB column? Assuming the line has the standard DOS line delimters, chr(10)?

Thanks

Tom Kyte

Followup  

August 28, 2005 - 4:09 pm UTC

you would have to iterate through the LOB and count them.

dbms_lob.instr will find the n'th occurence of the newline.

Number of lines in a LOB?

August 28, 2005 - 6:28 pm UTC

Reviewer: A reader

Can you please show a quick example or point me to one of how to do that? Thanks

Tom Kyte

Followup  

August 29, 2005 - 1:00 am UTC

thought of an easier way...


ops$tkyte@ORA9IR2> desc t;
 Name                                     Null?    Type
 ---------------------------------------- -------- ----------------------------
 Y                                                 CLOB

ops$tkyte@ORA9IR2> select substr(y,1,20) text,
  2         dbms_lob.getLength(y) len,
  3             length(y)-length(replace(y,chr(10),'')) nl
  4    from t;

TEXT
-------------------------------------------------------------------------------
       LEN         NL
---------- ----------
How Now
Brown Cow,
T
   2084000     125040



that won't work in 8i, but you didn't say.... 

Doesnt work for BLOB

August 29, 2005 - 9:33 am UTC

Reviewer: VA from New Jersey, USA

I am on 9iR2

select substr(y,1,20) text,
dbms_lob.getLength(y) len,
length(y)-length(replace(y,chr(10),'')) nl

doesnt work when "Y" is a BLOB column. I get a ORA-932: Expected number got BLOB

Any workarounds?

Thanks

Tom Kyte

Followup  

August 29, 2005 - 1:42 pm UTC

why are you looking for newlines in a blob? text MUST be stored in a clob???????

to do this on a blob will mean you'll be writing your own function and to say that it would be slow would be saying something conservative

Why BLOB?

August 29, 2005 - 4:34 pm UTC

Reviewer: A reader

I am doing this on files uploaded using HTML DB which uses the mod_plsql feature of storing all uploaded files in a BLOB column.

Tom Kyte

Followup  

August 30, 2005 - 1:27 am UTC

how often do you need to do this.

how many rows at a time do you need to do this to.

Why BLOB?

August 30, 2005 - 9:04 am UTC

Reviewer: A reader

I have a parse_file routine that takes the CSV file uploaded by mod_plsql and parses the CSV records into a HTML DB collection. Parsing each record, optionally enclosed by double quotes and stuff is a expensive process for more than a couple of hundred records at a time.

So, before I launch the parse process, I would like to get a quick count of number of lines and if it is more than N lines, I would raise an error and ask user to upload a smaller file.

How often? This parse_file routine is a generic routine used by many HTML DB pages. How many rows? Thats what I want to find out! But I dont expect users to upload gigantic CSV files, probably few hundred rows at most.

Thanks

Tom Kyte

Followup  

August 30, 2005 - 12:13 pm UTC

oh, why not just use dbms_lob.getlenth? instead of assuming N lines is bad, assume M bytes is bad? that'll be fast.


say "few hundred rows" of "a couple hundred bytes"

plug in a number for a "few" and a "couple" and multiply.


it'll be as good as newlines - if I upload 1,000 lines of two columns - and you upload 500 lines of 50 columns -- I should be OK, you might not be OK right -- so it is a function of newlines and columns -- not just newlines -- hence it is based on the size of the LOB -- nothing else :)

Also, if you were able to write a function that can PARSE the blob -- you can write a function to count newlines (and should realize that if the blob is big, counting the newlines will take..... a long time :)


August 30, 2005 - 1:39 pm UTC

Reviewer: VA from New Jersey, USA

Thats a good idea, but if the file is really large even uploading it would take a long time.

Uploading it just to do a dbms_lob.getlength() and deleting it seems like a waste of resources.

Unfortunately, there seems to be way to access the file from client-side Javascript. Ideally, I would just do the getlength part in client-side JS and not even bother submitting the page if it is greater than N bytes.

Thanks

Tom Kyte

Followup  

August 30, 2005 - 1:47 pm UTC

and by the time your plsql routine is called -- guess what the entire file is already uploaded (so I'm not seeing your point????)

uploading it, count lines and then reject -- seems like a waste of resources

uploading it, count the bytes and then reject, that is LESS resources than you just used to count lines!!!!!

now what?



August 30, 2005 - 2:00 pm UTC

Reviewer: VA from New Jersey, USA

Tom, relax. You misunderstand, I am agreeing with you, I liked your suggestion.

Thats why I said "Ideally, I would just do the getlength part in client-side JS and not even bother submitting the page if it is greater than N bytes"

But given the current technoogies, there is simply no way to get the size of a file from Javascript (as far as I can tell)

Thanks

Tom Kyte

Followup  

August 30, 2005 - 2:04 pm UTC

I've no idea, being the 'server side guy'

so, let us ask everyone else:

anything on the client to prevent a big upload????

CLOB to VARCHAR2

August 30, 2005 - 10:39 pm UTC

Reviewer: mo

Tom:

I converted a table from another client/server system into Access into oracle. One field "Notes" was of type "CLOB". When i inserted the data from this temp oracle table into my permanent oracle table I got an error because my field type is "VARCHAR2(1000). When I ran:

Alter table stockcodes modify notes varchar2(1000);

oracle could not convert it.

Is there a way where I can grab this data in this "CLOB" field and insert it into "VARCHAR2" field?

THank you,

Tom Kyte

Followup  

August 31, 2005 - 1:14 pm UTC

ops$tkyte-ORA10G> create table t ( x int, notes clob );

Table created.

ops$tkyte-ORA10G> insert into t values ( 1, 'hello world' );

1 row created.

ops$tkyte-ORA10G>
ops$tkyte-ORA10G> alter table t add temp varchar2(1000);

Table altered.

ops$tkyte-ORA10G> update t set temp = dbms_lob.substr( notes, 1000, 1 );

1 row updated.

ops$tkyte-ORA10G> alter table t drop column notes;

Table altered.

ops$tkyte-ORA10G> alter table t rename column temp to notes;

Table altered.

ops$tkyte-ORA10G> desc t;
 Name                                     Null?    Type
 ---------------------------------------- -------- ----------------------------
 X                                                 NUMBER(38)
 NOTES                                             VARCHAR2(1000)

ops$tkyte-ORA10G> select * from t;

         X
----------
NOTES
-------------------------------------------------------------------------------
         1
hello world

 

CLOB

August 31, 2005 - 1:39 pm UTC

Reviewer: mo

Tom:

Thanks, I ended up exporting the table to 9i database, adding a varchar2(1000) column and updating that column with the CLOB column. Does not that yield the same result?
Your way is easier though.

CLOB insert & update

October 06, 2005 - 5:59 am UTC

Reviewer: atul from India

Hi,

We have tables with CLOB columns.
We have procedure which inserts into LOB and
update table which doesn't include CLOB column.


Our DB_BLOCK_SIZE is 8k.
Avg CLOB lenght is < 4K
Max clob is around 17K,But we have few such records.

Could you please help

1)Could we put CLOB out-of-line to avoid scanning CLOB column while update?

2)Could CACHE option will help?As we can see wait events for Direct Path read after performing 20 Min load test

++++++++
direct path read (lob) 15,325 0 5 0 0.8
SQL*Net break/reset to clien 13 0 3 232 0.0
db file scattered read 6,656 0 3 0 0.3
log file sequential read 56 0 2 44 0.0
buffer busy waits 102 0 2 17 0.0
direct path read 466 0 1 1 0.0
enqueue 1 0 1 532 0.0
direct path write (lob) 356 0 0 1 0.0
++++++++++++++++++

But its not in Top 5 Wait events.

3)If we move CLOB to its own tablespac,Do we still need to explicaitly alter table for "disable in line"?


Thanks,
Atul

2)




Tom Kyte

Followup  

October 06, 2005 - 7:56 am UTC

1) yes, but ask yourself first "do I access the table via an index during the update" - if so, .... you are not "scanning" the clob.

2) yes, it could - since every WRITE to the (out of line - over 4k) lob must make the client WAIT for the write to actually happen and every read of the (out of line - over 4k) block must make the client wait for the IO to happen.


3) tablespace has nothing to do with in/out of line - they are different and independent of each other.

Dropping a table containing a LOB

January 09, 2007 - 2:19 pm UTC

Reviewer: Bob from London, UK

Hi Tom,

I have a question for you regarding LOB's (CLOB).

My initial problem was a tablespace "x_data" being 99% used. When I investigated further, I found a table "x" (containing a CLOB datatype) having a size of 2GB with one record.

select sum(bytes/1024/1024)
from dba_extents
where segment_name = 'X'

I got the table create table statement from (dbms_metadata.get_ddl) and noticed that the initial extent size was set to 2GB. I recreated the table with an initial extent size of 1MB.

I ran the above SQL statement and it was fine (size 1MB).

I then checked the size of of "x_data" - it was 99% full. I went into Enterprise Manager -> tuning pack -> tablespace map and noticed a SYS_LOBxxxxx (in "x_data" tablespace) segment of 1.28GB after having dropped the table.

What do I need to do in order to drop this LOB segment. It exists as an entry in DBA_LOBS.

Thanks
Tom Kyte

Followup  

January 11, 2007 - 9:15 pm UTC

lobs have their own segments and storage characteristics. Use get_ddl again and see what the lob is sized to, perhaps it too was set to 2gb.

Dropping a table containing a LOB

January 10, 2007 - 12:41 pm UTC

Reviewer: Bob from London

Problem solved - the SYSxxx LOB segment belonged to another table from a different schema!


CLOB - Design

January 15, 2007 - 4:16 am UTC

Reviewer: A reader

Hi Tom

I have a design Question

We get data from 5 external sources as CSV files. Each in different format and dealing with different business. Each has a big content that needs to be stored in a CLOB column apart from other data.

We manipulate the data and store it in 5 Different tables having CLOB in each table.

End users use our Serach UI in which they type few words and we need to show all the data that contains those words as part of the CLOB data (even if it is across all the 5 tables).

As a solution for this I could think of ORACLE TEXT index.

But still I think I need to do 5 SELECTs to fetch from 5 different tables, as I can't do UNION with CLOBs.

Another solution I could think of is store all the CLOB data in one sigle table with unique ID and store that ID in the other 5 tables as FKs. In this case I would be searching onle one table (i.e. CLOB_data_table) and outer join the results with other 5 tables to get other information.

Which one do you think as the better one. If both are not good, could you please suggest one.

Thanks in Advance
Kamini

Followup to Origianl Question of this thread

January 20, 2007 - 11:11 pm UTC

Reviewer: Madhava Reddy from NJ,USA

Hi,
the following query might be helpful to solve the first question of this thread (I saw this thread now for the first time and thought I can give a try).

select substr(y,instr(y,chr(13),-instr(y,'interesting'),1)+2,(instr(y,chr(13),instr(y,'interesting'),1)-instr(y,chr(13),-instr(y,'interesting'),1))) line
from t
where y like '%interesting%';


One thing I don't understand is that why length of carriage return is returned as 3 by Oracle (I might be missing some basics here.Hence, I added 2 to nullify this in the above query).


A design Question

February 17, 2007 - 8:57 am UTC

Reviewer: Kamini from India

Hi Tom

I have a design Question

We get data from 5 external sources as CSV files. Each in different format and dealing with different business. Each has a big content that needs to be stored in a CLOB column apart from other data.

We manipulate the data and store it in 5 Different tables having CLOB in each table.

End users use our Serach UI in which they type few words and we need to show all the data that contains those words as part of the CLOB data (even if it is across all the 5 tables).

As a solution for this I could think of ORACLE TEXT index.

But still I think I need to do 5 SELECTs to fetch from 5 different tables, as I can't do UNION with CLOBs.

Another solution I could think of is store all the CLOB data in one sigle table with unique ID and store that ID in the other 5 tables as FKs. In this case I would be searching onle one table (i.e. CLOB_data_table) and outer join the results with other 5 tables to get other information.

Which one do you think as the better one. If both are not good, could you please suggest one.

Thanks in Advance
Kamini

February 19, 2007 - 8:50 am UTC

Reviewer: A reader

Hi Thomas,

we are facing problems with updation of clob data.
the clob data will be around 20k.

i have created a table based on tablespace with 16k block size.
i have also set db_16k_block_size.

so are there any additional parameter which can be played. with.
Tom Kyte

Followup  

February 19, 2007 - 9:36 am UTC

I am facing a problem myself.

I have no clue what your problem might be, that is a problem. We sort of need to fix that problem before we can discuss a solution to your as yet unknown problem.

Oh my god

February 19, 2007 - 12:28 pm UTC

Reviewer: A reader

Sorry Sorry Sorry.

ok the problem is the clob updations are pretty slow.
that't the main problem.
Tom Kyte

Followup  

February 19, 2007 - 2:20 pm UTC

so, how are you doing it. and define "slow" and explain your expectations (eg: what is 'fast enough')

and what is the "oh my god" for? If you want help, well, be a tad more clear in stating what the PROBLEM is.

clob

March 06, 2007 - 5:15 pm UTC

Reviewer: sam

Tom:

I am sending an http message using curl (uploading file) into an oracle procedure that would save the files.

c:\curl\curl -F "i_file=@1234.xml"
http://xxx.xx.xxx/pls/ttt/save_data


SQL> desc test;
Name Null? Type
----------------------------------------- -------- ----------------------------
EMPID NUMBER
FIRST_NAME VARCHAR2(20)
LAST_NAME VARCHAR2(20)
ABC_UPDATES CLOB


My save _data does this:

PROCEDURE SAVE_DATA
(i_empid VARCHAR2 DEFAULT NULL,
i_first_name VARCHAR2 DEFAULT NULL,
i_last_name VARCHAR2 DEFAULT NULL,
i_file VARCHAR2 DEFAULT NULL )
IS

BEGIN

insert into TEST
values (i_empid, i_first_name,i_last_name,i_file);
commit;

END;

I keep geting ORA--00942: table or view does not exist.

Is there something wrong with the way I am saving file into CLOB.

Thanks
Tom Kyte

Followup  

March 06, 2007 - 5:27 pm UTC

have you ever done a file upload with mod_plsql

that is NOT the way it works. You have not done the setup - we upload into a table (document) and pass you a lob locator.

think about it, a varchar2 is limited to 32k in plsql, that'll never work.


please - check out the docs, do a file upload using a web browser, test it out, get it working - and then start this hack.

clob

March 09, 2007 - 10:32 pm UTC

Reviewer: sam

Tom:

I do have a documents table. They also use a varchar2 in the write info procedure here. What am i doing different?

http://download-east.oracle.com/docs/cd/B14099_19/web.1012/b14010/concept.htm#i1005866


Tom Kyte

Followup  

March 12, 2007 - 5:04 pm UTC

I have no idea Sam, no idea.

You don't give us any ideas.

Reading data from CLOB based on a join.

July 11, 2020 - 5:26 am UTC

Reviewer: Rajeshwaran, Jeyabal

Team:

The below testcase was modelled like our business requirement.

Where we need to pull data from CLOB column(actually having JSON data in it)based on a join.

we are on Exadata cloud@customer X7 Quater rack with Oracle 18c (18.10) database.

when i say

select t1.object_name, t2.id,t2.the_clob
from t1, t2 
where t1.id = t2.id;


get complete in few 5 sec, but when i say

select t1.object_name, t2.id,t2.the_clob
from t1, t2 
where t1.id = t2.id;


this was running for hours - given below the sql-monitor report for the same, Kindly help us the understand what is causing the slowness here and what can be done to make it got faster.

c##rajesh@PDB1> create table t1
  2  nologging
  3  as
  4  select a.*, rownum as id
  5  from all_objects a,
  6      all_users
  7  where rownum <= 4000000;

Table created.

c##rajesh@PDB1> create table t2
  2  nologging
  3  as
  4  select rownum as id, x1.the_clob
  5  from all_objects a ,
  6      ( select json_arrayagg( json_object( 'owner' value owner,
  7          'object_name' value object_name,
  8          'object_type' value object_type,
  9          'object_id' value object_id,
 10          'created' value created) returning clob) the_clob
 11  from (
 12  select owner,object_name,object_type,object_id,created
 13  from all_objects
 14  where rownum <=450
 15       ) ) x1
 16  where rownum <=3600;

Table created.

c##rajesh@PDB1> select count(*),
  2      max( dbms_lob.getlength(the_clob) )
  3  from t2;

  COUNT(*) MAX(DBMS_LOB.GETLENGTH(THE_CLOB))
---------- ---------------------------------
      3600                             51799

c##rajesh@PDB1> select count(*),
  2      max( dbms_lob.getlength(the_clob) ) max_size,
  3      avg( dbms_lob.getlength(the_clob) ) avg_size,
  4      min( dbms_lob.getlength(the_clob) ) min_size
  5  from t2;

  COUNT(*)   MAX_SIZE   AVG_SIZE   MIN_SIZE
---------- ---------- ---------- ----------
      3600      51799      51799      51799

c##rajesh@PDB1> set timing on time on
10:29:30 c##rajesh@PDB1> create table t3
10:29:32   2  tablespace ehdl
10:29:32   3  as
10:29:32   4  select t1.object_name, t2.id
10:29:32   5  from t1, t2
10:29:32   6  where t1.id = t2.id ;

Table created.

Elapsed: 00:00:04.16
10:29:40 c##rajesh@PDB1> drop table t3 purge;

Table dropped.


when i pull the data from "the_clob" column - was running slow.

10:30:33 c##rajesh@PDB1> create table t3
10:31:07   2  tablespace ehdl
10:31:07   3  nologging
10:31:07   4  as
10:31:07   5  select t1.object_name, t2.id,t2.the_clob
10:31:07   6  from t1, t2
10:31:07   7  where t1.id = t2.id ;


Here is the sql-monitor report for the same:

c##rajesh@PDB1> @sqlmonitortext cqtp4hdvtp760
old   2:        :x := dbms_sqltune.report_sql_monitor(sql_id=>'&1',type=>'TEXT');
new   2:        :x := dbms_sqltune.report_sql_monitor(sql_id=>'cqtp4hdvtp760',type=>'TEXT');
SQL Monitoring Report

SQL Text
------------------------------
create table t3 tablespace ehdl nologging as select t1.object_name, t2.id,t2.the_clob from t1, t2 where t1.id = t2.id

Global Information
------------------------------
 Status              :  EXECUTING
 Instance ID         :  2
 Session             :  C##RAJESH (19:54701)
 SQL ID              :  cqtp4hdvtp760
 SQL Execution ID    :  33554432
 Execution Started   :  07/11/2020 01:01:11
 First Refresh Time  :  07/11/2020 01:01:15
 Last Refresh Time   :  07/11/2020 01:21:23
 Duration            :  1213s
 Module/Action       :  SQL*Plus/-
 Service             :  pdb1
 Program             :  sqlplus.exe

Global Stats
============================================================================================================================
| Elapsed |   Cpu   |    IO    | Concurrency | Cluster  | PL/SQL  | Buffer | Read | Read  | Write | Write |    Offload     |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) | Time(s) |  Gets  | Reqs | Bytes | Reqs  | Bytes | Returned Bytes |
============================================================================================================================
|    1241 |     570 |      670 |        0.12 |     0.17 |    0.00 |     6M | 492K |  48GB |  480K |  47GB |           95GB |
============================================================================================================================

SQL Plan Monitoring Details (Plan Hash Value=3168156750)
=============================================================================================================================================================================================================
| Id   |             Operation             | Name |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read | Read  | Write | Write |  Mem  | Activity |         Activity Detail          | Progress |
|      |                                   |      | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs | Bytes | Reqs  | Bytes |       |   (%)    |           (# samples)            |          |
=============================================================================================================================================================================================================
| -> 0 | CREATE TABLE STATEMENT            |      |         |       |      1209 |     +4 |     1 |     0 |         |       |       |       |     . |          |                                  |          |
| -> 1 |   LOAD AS SELECT                  | T3   |         |       |      1213 |     +1 |     1 |     0 | 492K |  47GB |  480K |  47GB |   2MB |       99.83 | log buffer space (35)            |          |
|      |                                   |      |         |       |           |        |       |       |         |       |       |       |       |          | Cpu (496)                        |          |
|      |                                   |      |         |       |           |        |       |       |         |       |       |       |       |          | ASM IO for non-blocking poll (3) |          |
|      |                                   |      |         |       |           |        |       |       |         |       |       |       |       |          | direct path read (354)           |          |
|      |                                   |      |         |       |           |        |       |       |         |       |       |       |       |          | direct path write (317)          |          |
| -> 2 |    OPTIMIZER STATISTICS GATHERING |      |      4M | 20789 |      1209 |     +4 |     1 |     479K |      |       |       |       | 256KB |          |                                  |          |
|    3 |     HASH JOIN                     |      |      4M | 20789 |      1209 |     +4 |     1 |     479K |      |       |       |       |   5MB |     0.08 | Cpu (1)                          |          |
|    4 |      TABLE ACCESS STORAGE FULL    | T2   |    3600 |    10 |         1 |     +4 |     1 |     3600 |      |       |       |       |     . |          |                                  |          |
| -> 5 |      TABLE ACCESS STORAGE FULL    | T1   |      4M | 20767 |      1209 |     +4 |     1 |     478K |   86 |  43MB |       |       |     . |          |                                  |      11% |
=============================================================================================================================================================================================================


One thing, i observe here is the we have been waited on "direct path read and writes" - kindly advice, what can be done here to make it go faster.

let me know if this has to be raised as a new question.
Connor McDonald

Followup  

July 13, 2020 - 3:16 am UTC

First can you clarify this statement:


when i say

select t1.object_name, t2.id,t2.the_clob
from t1, t2
where t1.id = t2.id;

get complete in few 5 sec, but when i say

select t1.object_name, t2.id,t2.the_clob
from t1, t2
where t1.id = t2.id;


Reading data from CLOB based on a join.

July 11, 2020 - 11:36 am UTC

Reviewer: Rajeshwaran, Jeyabal

sorry in the above demo , the ddl for the table "T1" was like this:

create table t1 
nologging 
as
select a.*, ceil(dbms_random.value(1,3600)) as id
from all_objects a, 
    all_users 
where rownum <= 4000000;

Reading data from CLOB based on a join.

July 13, 2020 - 3:36 am UTC

Reviewer: Rajeshwaran, Jeyabal

thanks for your time.

Sorry - when i say (without clob column in the select column list)

select t1.object_name, t2.id
from t1, t2 
where t1.id = t2.id;


got competed in few second
but when i say (with clob column included)

select t1.object_name, t2.id,t2.the_clob
from t1, t2 
where t1.id = t2.id;


took almost 2+hours

I understand that - each row from T2 matches with almost 1000+ rows in T1 - so the number of times "the_clob" column selected in the final output is huge.
so what can be done, to make this "ddl" go faster ? kindly advice.

create table t3 
tablespace ehdl 
nologging 
as 
select t1.object_name, t2.id,
  t2.the_clob 
from t1, t2 
where t1.id = t2.id;

Connor McDonald

Followup  

July 20, 2020 - 3:25 am UTC

A lob needs to read via its lob index, and the concept of "index fast full scan" is not present, so to read a lob, it is similar to an indexed read - which means the concept of the "multi-block read" does not come into play.

We can see this with a trace. I repeated your example, and the tkprof looks like

create table t3b as
select t1.object_name, t2.id, t2.the_clob
from t1, t2
where t1.id = t2.id 

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          9          0           0
Execute      1      1.89      12.36     130351      85108      51851        3600
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      1.89      12.36     130351      85117      51851        3600

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 107  
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  LOAD AS SELECT  T3B (cr=85118 pr=130351 pw=46829 time=12361477 us starts=1)
      3600       3600       3600   OPTIMIZER STATISTICS GATHERING  (cr=83558 pr=83526 pw=0 time=21667 us starts=1 cost=23026 size=309600 card=3600)
      3600       3600       3600    HASH JOIN  (cr=83558 pr=83526 pw=0 time=15175 us starts=1 cost=23026 size=309600 card=3600)
      3600       3600       3600     TABLE ACCESS FULL T2 (cr=27 pr=0 pw=0 time=2002 us starts=1 cost=10 size=162000 card=3600)
   4000000    4000000    4000000     TABLE ACCESS FULL T1 (cr=83531 pr=83526 pw=0 time=990663 us starts=1 cost=22986 size=164000000 card=4000000)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  index (re)build lock or pin object              1        0.00          0.00
  PGA memory operation                            4        0.00          0.00
  direct path read                             4312        0.06          9.03
  enq: HW - contention                            1        0.00          0.00
  local write wait                               12        0.00          0.00
  reliable message                                1        0.00          0.00
  enq: CR - block range reuse ckpt                1        0.00          0.00
  db file sequential read                        25        0.00          0.00
  direct path write                             103        0.01          0.22
  enq: TX - contention                            5        0.55          0.93
  log file switch completion                      4        0.01          0.04
  log file sync                                   1        0.00          0.00
  SQL*Net message to client                       1        0.00          0.00
  SQL*Net message from client                     1        0.00          0.00
********************************************************************************


If we dig into the trace file to see those 4312 direct path reads, we can isolate those that pertain just to the lob segment, in my case:

SQL> select data_object_id
  2  from   dba_objects
  3  where  object_name = 'SYS_LOB0000092269C00002$$';

DATA_OBJECT_ID
--------------
         92270


and then see how many waits for there

# grep direct db19_ora_18840.trc | grep 92270 | wc -l
   3654


and you see approx get a direct read per row of the lobs we need to get.

Some things to look at

- Setting CACHE on the lob the source table can improve things a little, but you are obviously then smashing a lot of lob data into your buffer cache

- parallel DML

- DIY parallelism (eg dbms_parallel_execute)

More to Explore

DBMS_LOB

More on PL/SQL routine DBMS_LOB here