Great Work
Jeff, May 27, 2004 - 1:09 pm UTC
Tom you're the man
Thank you Tom
perumal, May 27, 2004 - 11:47 pm UTC
Hi Tom,
Thank you. I am really happy with your response and suggestion.
I got error and have a doubt...
perumal, May 28, 2004 - 5:52 am UTC
Hi Tom,
I have done as you did.
I got the following error.
SQL> declare
2 l_text long;
3 begin
4 for i in 1 .. 3000
5 loop
6 l_text := l_text || 'line ' || i || chr(10);
7 end loop;
8 l_text := l_text || 'interesting line 3001';
9 insert into t values ( 4, l_text );
10 l_text := l_text || chr(10) || 'line 3002';
11 insert into t values ( 5, l_text );
12 l_text := 'interesting line 0' || chr(10) || l_text;
13 insert into t values ( 6, l_text );
14 end;
15 /
insert into t values ( 4, l_text );
*
ERROR at line 9:
ORA-06550: line 9, column 39:
PLS-00382: expression is of wrong type
ORA-06550: line 9, column 13:
PL/SQL: SQL Statement ignored
ORA-06550: line 11, column 37:
PLS-00382: expression is of wrong type
ORA-06550: line 11, column 11:
PL/SQL: SQL Statement ignored
ORA-06550: line 13, column 37:
PLS-00382: expression is of wrong type
ORA-06550: line 13, column 11:
PL/SQL: SQL Statement ignored
What is wrong here? Kindly show me.
If my data is like this...
SQL> insert into t values(1,'line 1
2 line 2
3 this is interesting line 3
4 this is also interesting line 4
5 line 5');
1 row created.
SQL>
SQL> insert into t values(2, 'this is interesting line 1
2 this is also interesting line 2
3 this too interesting line 3
4 line 5
5 line 6
6 this is can also interesting line 7');
1 row created.
SQL>
SQL> insert into t values(3, 'this is interesting line 1
2 line 2
3 this too interesting line 3
4 this is also interesting line 5
5 line 6
6 this is can also interesting line 7');
1 row created.
SQL> commit;
Commit complete.
SQL> select * from t;
X Y
---------- --------------------------------------------------------------------------------
1 line 1
line 2
this is interesting line 3
this is also interesting line 4
lin
2 this is interesting line 1
this is also interesting line 2
this too interestin
3 this is interesting line 1
X Y
---------- --------------------------------------------------------------------------------
line 2
this too interesting line 3
this is also in
SQL> select x,get_line(y,'interesting') from t;
X
----------
GET_LINE(Y,'INTERESTING')
---------------------------------------------------------------------------------------------------------
1
this is interesting line 3
2
this is interesting line 1
3
this is interesting line 1
SQL>
The output is not as we expected.
The first record's line 4, the second record's line 2, etc. are missing.
Expecting your reply.
thank you
May 28, 2004 - 11:12 am UTC
In 8i, clobs were different, you could not use them as easily as in 9i. You would have to insert an empty_clob() -- return it back out and use dbms_lob.writeAppend to place that much text into it.
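A minimal sketch of that 8i approach -- assuming a table t( x int, y clob ); the loop bounds and text are illustrative only:

```sql
declare
    l_clob clob;
begin
    -- insert an empty clob and get the locator back out
    insert into t ( x, y ) values ( 1, empty_clob() )
    returning y into l_clob;

    -- append the text piece by piece with dbms_lob.writeappend
    for i in 1 .. 3000
    loop
        dbms_lob.writeappend( l_clob, length( 'line '||i||chr(10) ),
                              'line '||i||chr(10) );
    end loop;
end;
/
```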
You did not say you wanted *all* occurrences. I just got the first; you have the code -- you see the technique. You can return up to 4000 characters as a varchar2; just modify the code to get out all of the values of interest.
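One possible sketch of such a modification -- a hypothetical get_lines function (the name and the 4000-character cap are assumptions) that walks the clob line by line and concatenates every line containing the search term:

```sql
create or replace function get_lines( p_text in clob, p_srch in varchar2 )
return varchar2
as
    l_result varchar2(4000);
    l_line   varchar2(4000);
    l_start  number := 1;
    l_end    number;
begin
    loop
        -- find the end of the current line
        l_end := dbms_lob.instr( p_text, chr(10), l_start );
        if nvl( l_end, 0 ) = 0 then
            l_line := dbms_lob.substr( p_text, 4000, l_start );
        else
            l_line := dbms_lob.substr( p_text, l_end - l_start, l_start );
        end if;
        -- keep every line of interest, not just the first
        if instr( l_line, p_srch ) > 0 then
            l_result := l_result || l_line || chr(10);
        end if;
        exit when nvl( l_end, 0 ) = 0;
        l_start := l_end + 1;
    end loop;
    return l_result;
end;
/
```

It will raise an error once the concatenated result exceeds 4000 characters -- that is the varchar2 limit referred to above.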
Oracle Text and Highlight
Edgar Chupit, June 01, 2004 - 2:24 am UTC
Hello Tom,
Is there an efficient way of using the ctx_doc.highlight function to find the terms and offsets?
I've tested both ctx_doc.highlight and ctx_doc.markup, and they both retrieve the document, pass it through the filter, and then find the offsets. Retrieving and filtering the document is a very expensive procedure, but according to some sources on the web (for example Technet), Oracle Text stores each "token" together with the document id and the token's positions in the documents.
If this is true, then why does Oracle Text need to filter the document again rather than simply retrieve the offsets from the index?
Is there a way to efficiently retrieve the lines that contain the highlighted terms?
Thank you.
June 01, 2004 - 8:15 am UTC
what sort of performance have you measured? You say it is "expensive" -- what metrics are you using to show that?
Oracle Text and Highlight
Edgar Chupit, June 01, 2004 - 9:22 am UTC
Hello Tom,
Mainly I've measured response time. To measure this I used your runstats package. I created a table and populated it with 8 rows (8 Word documents); the total size of the documents was ~100 KB. Then I compared the speed of highlight vs. markup and found that their speeds are equal.
If there were a way to retrieve the offsets of the hit words, I could store a plain-text copy of the document in a clob and process it the same way markup does, only without going through the filtering mechanism (which has to be expensive for Word documents, because we have to call the external INSO filter to parse the whole document).
As you can see from the tests, the highlight process for the 8 rows takes about 2 seconds on my configuration; with 50 rows it is about 10-13 seconds.
Here is the test I've used:
SQL> drop index test_idx;
Index dropped.
SQL> drop table test;
Table dropped.
SQL>
SQL> create table test( id integer, data blob );
Table created.
SQL> --*** insert into test table 2 word documents, i've prepared them separately
SQL> -- and inserted into document table
SQL> insert into test
2 select id, data
3 from document
4 where rownum < 3;
2 rows created.
SQL> --***
SQL> insert into test
2 select id, data
3 from test
4 /
2 rows created.
SQL> insert into test
2 select id, data
3 from test
4 /
4 rows created.
SQL> insert into test
2 select id, data
3 from test
4 /
8 rows created.
SQL>
SQL> select max(dbms_lob.getlength(data)) max,
2 min(dbms_lob.getlength(data)) min,
3 avg(dbms_lob.getlength(data)) avg,
4 sum(dbms_lob.getlength(data)) sum,
5 count(*) count
6 from test
7 /
MAX MIN AVG SUM COUNT
---------- ---------- ---------- ---------- ----------
7797 3769 5783 92528 16
SQL>
SQL> create index test_idx on test(data) indextype is ctxsys.context;
Index created.
SQL>
SQL> select token_count from dr$test_idx$i where token_text = 'NR';
TOKEN_COUNT
-----------
16
SQL>
SQL> set serveroutput on size 10000;
SQL> set timing on;
SQL>
SQL> declare
2 idxName varchar2(25) := 'TEST_IDX';
3 query varchar(25) := 'NR';
4 begin
5 otis.runstats_pkg.rs_start;
6 for r in ( select rowid rid from test where contains(data,query)>0 ) loop
7 ctx_doc.highlight( idxname, r.rid, query, 'HIGHTAB', plaintext=>TRUE );
8 end loop;
9 otis.runstats_pkg.rs_middle;
10 for r in ( select rowid rid from test where contains(data,query)>0 ) loop
11 ctx_doc.markup( idxname, r.rid, query, 'MARKTAB', plaintext=>TRUE );
12 end loop;
13 otis.runstats_pkg.rs_stop(P_DIFFERENCE_THRESHOLD=>150);
14 rollback;
15 end;
16 /
Run1 ran in 255 hsecs
Run2 ran in 239 hsecs
run 1 ran in 106,69% of the time
Name Run1 Run2 Diff
LATCH.child cursor hash table 157 0 -157
STAT...table scan rows gotten 253 22 -231
STAT...calls to get snapshot s 599 324 -275
STAT...recursive calls 637 311 -326
STAT...rows fetched via callba 453 114 -339
STAT...table fetch by rowid 476 130 -346
LATCH.session allocation 394 48 -346
STAT...index fetch by key 498 129 -369
LATCH.library cache pin 1,468 1,015 -453
LATCH.library cache lock 779 225 -554
LATCH.row cache objects 1,591 909 -682
STAT...buffer is not pinned co 1,018 313 -705
STAT...session logical reads 2,259 1,211 -1,048
STAT...consistent gets - exami 1,357 283 -1,074
LATCH.shared pool 1,893 807 -1,086
STAT...consistent gets 1,512 365 -1,147
STAT...consistent gets from ca 1,504 357 -1,147
LATCH.library cache 2,731 1,268 -1,463
STAT...undo change vector size 48,304 50,896 2,592
LATCH.cache buffers chains 5,406 8,053 2,647
STAT...IMU undo allocation siz 7,668 11,400 3,732
STAT...redo size 101,944 132,824 30,880
Run1 latches total versus runs -- difference and pct
Run1 Run2 Diff Pct
16,073 14,200 -1,873 113.19%
PL/SQL procedure successfully completed.
Elapsed: 00:00:05.60
June 01, 2004 - 11:44 am UTC
it is not as simple as that. interMedia Text queries can get quite involved. Also, indexes get "out of sync" (that is why my home page says "approximately"; I update documents, the index entries remain -- we must go from the index to the document to verify). Also, just because something is found in the index doesn't mean it'll "hit" in the document (phrases, for example -- the word 'index' might appear in the document 500 times, but the phrase 'the index they created' might only be in there once, and you could have searched for the phrase -- not a word)....
But, if you want to make it "fast", you could store the filtered document and index that. In that manner, the markup will be blindingly fast -- no filtering needed at run time. Consider (this is just a desktop computer, nothing fancy):
ops$ora9ir2@ORA9IR2> create table demo
2 ( id int primary key,
3 theBlob blob,
4 theClob clob
5 )
6 /
Table created.
ops$ora9ir2@ORA9IR2>
ops$ora9ir2@ORA9IR2> create or replace directory my_files as '/home/tkyte/Desktop'
2 /
Directory created.
ops$ora9ir2@ORA9IR2>
ops$ora9ir2@ORA9IR2> declare
2 l_blob blob;
3 l_clob clob;
4 l_bfile bfile;
5 begin
6 insert into demo values ( 1, empty_blob(), empty_clob() )
7 returning theBlob, theClob into l_blob, l_clob;
8
9 l_bfile := bfilename( 'MY_FILES', 'asktom.rtf' );
10 dbms_lob.fileopen( l_bfile );
11 dbms_lob.loadfromfile( l_blob, l_bfile, dbms_lob.getlength( l_bfile ) );
12 dbms_lob.fileclose( l_bfile );
13
14 l_bfile := bfilename( 'MY_FILES', 'asktom.txt' );
15 dbms_lob.fileopen( l_bfile );
16 dbms_lob.loadfromfile( l_clob, l_bfile, dbms_lob.getlength( l_bfile ) );
17 dbms_lob.fileclose( l_bfile );
18 end;
19 /
PL/SQL procedure successfully completed.
ops$ora9ir2@ORA9IR2> select dbms_lob.getlength(theBlob) blob_len,
2 dbms_lob.getlength(theClob) Clob_len
3 from demo
4 /
BLOB_LEN CLOB_LEN
---------- ----------
42644 19199
ops$ora9ir2@ORA9IR2>
ops$ora9ir2@ORA9IR2> create index blob_idx on demo(theBlob) indextype is ctxsys.context;
Index created.
ops$ora9ir2@ORA9IR2> create index clob_idx on demo(theClob) indextype is ctxsys.context;
Index created.
ops$ora9ir2@ORA9IR2> select * from ctx_user_index_errors;
no rows selected
ops$ora9ir2@ORA9IR2>
ops$ora9ir2@ORA9IR2> create table marktab ( query_id number, document clob );
Table created.
ops$ora9ir2@ORA9IR2>
ops$ora9ir2@ORA9IR2> create or replace procedure p1
2 as
3 l_clob clob;
4 begin
5 ctx_doc.markup( 'blob_idx', '1', 'similar', l_clob );
6 end;
7 /
Procedure created.
ops$ora9ir2@ORA9IR2> create or replace procedure p2
2 as
3 begin
4 ctx_doc.markup( 'blob_idx', '1', 'similar', 'MARKTAB', plaintext=>true );
5 end;
6 /
Procedure created.
ops$ora9ir2@ORA9IR2> create or replace procedure p3
2 as
3 l_clob clob;
4 begin
5 ctx_doc.markup( 'clob_idx', '1', 'similar', l_clob );
6 end;
7 /
Procedure created.
ops$ora9ir2@ORA9IR2> create or replace procedure p4
2 as
3 begin
4 ctx_doc.markup( 'clob_idx', '1', 'similar', 'MARKTAB', plaintext=>true );
5 end;
6 /
Procedure created.
ops$ora9ir2@ORA9IR2>
ops$ora9ir2@ORA9IR2> exec p1; p2; p3; p4;
PL/SQL procedure successfully completed.
<b>just to get everything "level" (shared sql et.al.)</b>
ops$ora9ir2@ORA9IR2>
ops$ora9ir2@ORA9IR2> @connect /
ops$ora9ir2@ORA9IR2> set termout off
ops$ora9ir2@ORA9IR2> set termout on
ops$ora9ir2@ORA9IR2>
ops$ora9ir2@ORA9IR2> set timing on
ops$ora9ir2@ORA9IR2> exec for i in 1 .. 10 loop p1; end loop;
PL/SQL procedure successfully completed.
Elapsed: 00:00:02.22
ops$ora9ir2@ORA9IR2> exec for i in 1 .. 10 loop p2; end loop;
PL/SQL procedure successfully completed.
Elapsed: 00:00:02.14
ops$ora9ir2@ORA9IR2> exec for i in 1 .. 100 loop p3; end loop;
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.68
ops$ora9ir2@ORA9IR2> exec for i in 1 .. 100 loop p4; end loop;
PL/SQL procedure successfully completed.
Elapsed: 00:00:01.06
ops$ora9ir2@ORA9IR2>
<b>In memory markup = faster, filtered markup = really fast (note I compared 10 iterations to 100 iterations here -- else the timing was just too small to make sense of)</b>
Thank you
Edgar Chupit, June 01, 2004 - 1:49 pm UTC
Hello once again,
Thank you for the in-depth explanation; this is the solution I was thinking of. Too bad that Oracle Text doesn't have internal caching of the filtered object -- it would be very nice to have one that stores the clob version on the fly. But for now, yes, I will try this method of manual clob caching.
I think I will create a user_datastore index on the blob field and a direct datastore on the clob field. When my stored procedure is called for the blob field, it will use ctx_doc.filter to populate the clob field, and then after the blob field's index sync I will call the clob index sync.
Error in my proposal
Edgar Chupit, June 01, 2004 - 2:11 pm UTC
Hello,
I've just realized that the method I proposed is weak, because the call to ctx_doc.filter will invoke my procedure once again and I'll end up with infinite recursion.
Now I have a choice between 3 indexes on the same table or manual invocation of $ORACLE_HOME/ctx/bin/ctxhx
June 01, 2004 - 3:53 pm UTC
or create a procedure that does the insert -- don't let the clients do it.
in that fashion you can:
a) insert the blob
b) filter the blob
c) store the filtered text
and you only need an index on the filtered text -- you don't need an index on anything else -- use ifilter to turn blob into clob, insert clob, have index on clob and search that.
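A rough sketch of that single-procedure approach, using ctx_doc.ifilter to turn the blob into a clob at insert time (the table name docs and its columns are assumptions):

```sql
create or replace procedure insert_doc( p_id in number, p_blob in blob )
as
    l_clob clob;
begin
    insert into docs ( id, theblob, theclob )
    values ( p_id, p_blob, empty_clob() )
    returning theclob into l_clob;

    -- filter the binary document into plain text ONCE, at insert time;
    -- only theclob needs a text index afterwards
    ctx_doc.ifilter( p_blob, l_clob );
end;
/
```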
Thank you
Edgar Chupit, June 02, 2004 - 4:14 am UTC
Hello,
using this method I've achieved a much faster markup response time, and query time improved from 50-60 seconds per search to 1-5 seconds per search.
how to return multiple markup results to web applications
Raj, July 08, 2004 - 1:12 pm UTC
Hi Tom,
The information you put here is very helpful.
I have a similar problem, can you please comment on how I should proceed:
I am doing a search on a publications table. Say I have got 10 matches; I need to show these 10 matches to the user in our web application. I found we can use the markup procedure to highlight the search words and store the result in a markup table:
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:350017644883
<quote>
We pass it a unique key that we can use to identify our markedup document (I
find userenv('sessionid') great for this purpose).
</quote>
Since I have 10 results, using userenv('sessionid') for all 10 of them would probably not work in my case. How, then, do I store the results in the markup table? And how do I maintain this table (delete the old records)?
The problem is to get the search results, mark them up and show them to the user. I am using Oracle 9i. Please let me know if you need more information.
Thanks much!!
July 08, 2004 - 1:30 pm UTC
use a sequence. select s.nextval from dual;
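For example (a sketch -- the sequence name, index name, result table and query text are placeholders):

```sql
create sequence markup_seq;

create or replace procedure markup_doc( p_rowid in varchar2,
                                        p_query in varchar2 )
as
    l_id number;
begin
    -- each call gets its own key into the result table
    select markup_seq.nextval into l_id from dual;

    ctx_doc.markup( 'DOC_INDEX', p_rowid, p_query, 'MARKTAB',
                    query_id => l_id, plaintext => true );

    -- rows for this document can now be fetched (and later deleted)
    -- from MARKTAB where query_id = l_id
end;
/
```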
Re: how to return multiple markup results to web applications
Raj, July 08, 2004 - 2:23 pm UTC
Thank you!
What is the best way to maintain the markup table? Do I schedule a job and delete it every night, or something like that?
Is there a way to return an array of clobs to a web application? I initially considered creating a table type of CLOB, filling my results into an object of this type, and returning a cursor. But I think this doesn't work -- it is not a real table and there are no columns in it.
Thanks again!
July 08, 2004 - 3:40 pm UTC
it is up to you as to how to maintain the markup table. me, i cleaned it up after marking up.
what is a "web application"? that could be most anything.
assuming java/jdbc -- sure, you can return collections. search for
java array
on this site, i've demoed that many times.
Re: how to return multiple markup results to web applications
Raj, July 08, 2004 - 7:12 pm UTC
It's an ASP.NET web application. I wonder if there is a way to return such collections. I know I can return a recordset, so I tried to create a table type of CLOB and return a cursor to it. But it does not work (I get a "table or view does not exist" error);
something like:
Type t_tblLOB is TABLE of CLOB;
store clob markup in t_tblLOB
return a cursor to t_tblLOB.
Any idea how I can achieve this for ASP.NET web application.
Thanks for your help!
July 08, 2004 - 9:03 pm UTC
ops$tkyte@ORA9IR2> create type myTableType as table of clob;
2 /
Type created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create or replace function foo return myTableType
2 pipelined
3 as
4 l_clob clob;
5 begin
6 for i in 1 .. 10
7 loop
8 l_clob := 'hello ' || i;
9 pipe row ( l_clob );
10 end loop;
11 return;
12 end;
13 /
Function created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select * from table(foo());
COLUMN_VALUE
-------------------------------------------------------------------------------
hello 1
hello 2
hello 3
hello 4
hello 5
hello 6
hello 7
hello 8
hello 9
hello 10
10 rows selected.
A reader, July 09, 2004 - 5:29 am UTC
First timer dealing with CLOB
George, July 10, 2004 - 7:31 am UTC
Thank you Perumal and Tom.
This was very helpful in giving me an idea of the CLOB. I will try to work with it and hope I can figure it out from there...
lob opening question
A reader, July 11, 2004 - 4:29 pm UTC
From 10g doc
"Opening a LOB in read write mode defers any index maintenance on the LOB column until you close the LOB. useful if there is an extensible index on the LOB column and you do not want the database to perform index maintenance every time you write to the LOB.
This technique can increase the performance of your application if you are doing several write operations on the LOB while it is open."
What is an extensible index associated with a lob?
If it has to be created separately - then it seems to
imply that one does not need to open the lob in read-write
mode usually for performance gains - is that correct?
Thanx!
July 12, 2004 - 11:02 am UTC
Oracle text is an extensible index for example. Any "non standard" index would be.
thanx Tom!
A reader, July 12, 2004 - 11:20 am UTC
"Oracle text is an extensible index for example. Any "non standard" index would be."
I take it that this does not mean a LOB index
(that points to chunks of LOB data), correct?
July 12, 2004 - 11:50 am UTC
the lob index is always maintained, else we wouldn't be able to find the bits and pieces of it!
thanx!
A reader, July 12, 2004 - 12:28 pm UTC
just confirming to see if I am missing something!:)
performance problem while using ctx_doc.markup
Deba, April 12, 2005 - 8:38 am UTC
I have seen your earlier approach to speeding up the process of using ctx_doc.markup. But it is still not clear to me; you will understand what I am trying to say if you kindly go through my problem.
My environment is Oracle 9i rel 2 (9.2.0.5) and OS is Solaris version 9.
I have created a table
CREATE TABLE DOCUMENTS
(
NAME VARCHAR2 (256),
MIME_TYPE VARCHAR2 (128),
DOC_SIZE NUMBER,
DAD_CHARSET VARCHAR2 (128),
LAST_UPDATED DATE,
CONTENT_TYPE VARCHAR2 (128),
CONTENT CLOB,
BLOB_CONTENT BLOB,
OWNER VARCHAR2 (100),
DOMAIN VARCHAR2 (20),
DOC_TYPE VARCHAR2 (30),
RESOLUTION VARCHAR2 (2000),
SUBJECT VARCHAR2 (300),
UPLOAD_TYPE VARCHAR2 (15)
)
LOB (BLOB_CONTENT) STORE AS BLOB_CONTENT
(TABLESPACE CLOB_DATA DISABLE STORAGE IN ROW
CHUNK 5 PCTVERSION 1
STORAGE (INITIAL 1m next 1m maxextents unlimited pctincrease 0));
Now I have inserted 4 records, where one is an MS Word document, one is a PDF document, one is an HTML document and one is a text document (.txt), using WebDB and Oracle HTTP Server.
After inserting the records, I created index as follows
begin
ctx_ddl.create_preference('doc_lex', 'BASIC_LEXER');
ctx_ddl.set_attribute('doc_lex', 'printjoins', '_-');
end;
create index doc_index on documents(blob_content) indextype is ctxsys.context
parameters ('datastore ctxsys.default_datastore filter ctxsys.inso_filter format column upload_type LEXER doc_lex');
Now I have written a small routine which is as follows
create or replace procedure das as
lclob clob;
begin
for rec in (SELECT rowid,name FROM DOCUMENTS WHERE contains(blob_content,'oracle',1) > 0 ) loop
ctx_doc.markup( 'DOC_INDEX', rec.rowid, 'oracle', lclob);
end loop;
end;
/
Now I am running the above procedure
<I am doing fresh login here>
17:53:40 SQL> execute das;
PL/SQL procedure successfully completed.
Elapsed: 00:00:26.31
17:54:22 SQL>
In the above procedure there is nothing except ctx_doc.markup and one sql statement. If I run that sql statement separately, the time required is
17:54:22 SQL> SELECT rowid,name FROM DOCUMENTS WHERE contains(blob_content,'oracle',1) > 0;
ROWID
------------------
NAME
--------------------------------------------------------------------------------
AATIa7AA0AAAWhiAAC
F13934/mod_plsql_Overview.htm
AATIa7AA0AAAWhiAAB
F22084/Batch_Process_Automation(Support_Doc).pdf
Elapsed: 00:00:01.12
So it is clear that the main problem is coming from ctx_doc.markup.
I need your valuable comments to solve this performance issue.
Regards
Deba
April 12, 2005 - 8:56 pm UTC
why would you markup N documents at a time?
but basically, I do not have your documents or even a concept of their size. I might try doing it document by document and seeing "ah hah, it is this TYPE of document" and work from there.
performance problem while using ctx_doc.markup
Deba, April 14, 2005 - 9:00 am UTC
Hi Tom,
Thanks for the reply. It is not clear what you are trying to say. I am not marking up N documents at a time; rather, I am taking one row at a time and marking up that document. That's why I am using a FOR loop. The size of the documents is 500 KB on average. It may be that I am not able to understand what you are trying to say; if possible, please give me an example. The problem is that I have to resolve this immediately. The sizes of the documents are as follows:
SQL>select subject,dbms_lob.getlength(blob_content) from documents;
SUBJECT
------------------------------------------------------------DBMS_LOB.GETLENGTH(BLOB_CONTENT)
--------------------------------
html doc 26171
word doc 31232
pdf 24585
I think the sizes are really small. If I use ctx_doc.markup, the response comes within 2 seconds. But when I upload a big file, say around 400 KB, the response becomes too slow. Is there any restriction on the size of the files to be uploaded? Is there any way to use the same method for handling documents whose sizes are more than 500 KB? I want your valuable comments on tuning ctx_doc.markup.
Thanks
Deba
April 14, 2005 - 9:15 am UTC
create or replace procedure das as
lclob clob;
begin
for rec in (SELECT rowid,name FROM DOCUMENTS WHERE
contains(blob_content,'oracle',1) > 0 ) loop
ctx_doc.markup( 'DOC_INDEX', rec.rowid, 'oracle', lclob);
end loop;
end;
/
yes you were -- you looped over all documents containing 'oracle' and marked them up. normally I would expect an application to mark up "a document" to be shown to a user, not all documents at the same time.
what speed are you seeing at 500 KB? "becoming too slow" is too vague.
performance problem while using ctx_doc.markup
deba, April 18, 2005 - 1:11 am UTC
Tom,
Your statement "normally I would expect an application to do 'a document' to be shown to a user, not all documents at the same time" is still not clear to me. Let me tell you what I am trying to do. In my portal there is a text box where the user can enter search criteria. When the user chooses to search across all types of documents, then obviously I have to scan all the documents. So you know my objective now; I am looking for a solution to make it faster (as you are always helping out).
"It is too slow" means that if I start searching a document (size 484 KB) it is so slow that I had to kill my session.
Thanks
Deba
April 18, 2005 - 7:09 am UTC
there is however
a) the search (no markup)
b) the click (markup)
when you search on my site, is it slow? I am using the same technology here.
performance problem while using ctx_doc.markup
Deba, April 20, 2005 - 2:45 am UTC
Tom,
What you are saying is not clear. What is the meaning of "the search (no markup)" and "the click (markup)"? Please clarify, and tell me how to do this. If you find any problem in my code you can rectify it. Please give some examples (which you generally provide in every case, and which are very useful for understanding).
Thanks
Deba
April 20, 2005 - 6:56 am UTC
when you search, you use contains in the query:
select * from docs where contains( the_doc, :x ) > 0;
then the user picks A DOCUMENT from that result (not all docs, just ONE) and says "show me", then you call the markup procedure.
you are marking up all documents in a result set in your example, that is not how it would be used.
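The flow, sketched with placeholder bind variables:

```sql
-- step 1: the search -- contains() only, no markup at all
select rowid, name
  from documents
 where contains( blob_content, :search_text, 1 ) > 0;

-- step 2: the click -- mark up only the ONE document the user picked
declare
    l_clob clob;
begin
    ctx_doc.markup( 'DOC_INDEX', :picked_rowid, :search_text, l_clob );
end;
/
```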
How to - CLOB Conversion
VIKAS, July 28, 2005 - 9:54 am UTC
Dear Mr. Kyte,
While migrating our database using OMWB (only objects, functions, triggers, procedures -- not data) from MS SQL Server to Oracle 9i, certain table columns in our database were assigned CLOB as their datatype by default.
Now we are required to change their type to varchar2(4000); these columns will also contain null values.
Can you please advise me how this can be achieved effectively in Oracle 9i?
I tried changing the datatype using Enterprise Manager, but I am getting an
"ORA-22859 invalid modification of columns"
error, even though I am logged in as the object owner or sysdba.
What is the reason behind this? What actually is Oracle trying to do internally? Are my tables locked, or do I not have sufficient privileges? (I tried doing this with sysdba and dba roles and grants.)
Can you also suggest a query that gives me the table name and column_name of the tables that contain CLOBs in my database?
Also, is there any way to modify all columns with the CLOB datatype in all tables of my database to varchar2(4000) in one go (using a command/script)?
July 28, 2005 - 10:34 am UTC
you cannot simply change datatypes, especially when the column contains data.
Your choices:
a) create new column, populate it, drop old, rename new.
b) rebuild the table into new table, drop old table, rename new table.
I'll demonstrate (a)
ops$tkyte@ORA9IR2> drop table t;
Table dropped.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create table t
2 as
3 select owner, view_name, to_lob(text) text from all_views;
Table created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> desc t;
Name Null? Type
---------------------------------------- -------- ----------------------------
OWNER VARCHAR2(30)
VIEW_NAME VARCHAR2(30)
TEXT CLOB
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> alter table t add text_new varchar2(4000);
Table altered.
ops$tkyte@ORA9IR2> update t set text_new = dbms_lob.substr(text,4000,1), text=null;
2248 rows updated.
ops$tkyte@ORA9IR2> alter table t drop column text;
Table altered.
ops$tkyte@ORA9IR2> alter table t rename column text_new to text;
Table altered.
ops$tkyte@ORA9IR2> desc t;
Name Null? Type
---------------------------------------- -------- ----------------------------
OWNER VARCHAR2(30)
VIEW_NAME VARCHAR2(30)
TEXT VARCHAR2(4000)
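Option (b), rebuilding the table, could be sketched like this (note this simple form loses grants, indexes and constraints, which would have to be recreated):

```sql
-- build the replacement table with the clob cut down to varchar2(4000)
create table t_new
as
select owner, view_name,
       dbms_lob.substr( text, 4000, 1 ) text
  from t;

drop table t;
rename t_new to t;
```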
How to? - CLOB
VIKAS, July 29, 2005 - 1:03 am UTC
Dear Mr. Kyte
Thanx a lot for your knowledge sharing. It was really very useful and of great help.
But the question now is: how can I restore a column to its original position in a table, where it actually existed? Do I have to drop columns, insert the required column, and then re-add the columns that followed it? Or is there a shorter, single-step method?
And the other question is:
How can I get the details of all the tables and their corresponding columns (column name and datatype) that contain the CLOB datatype in my database? Is there any view, query, or script that I can use to get the desired results, or do I have to do manual checks?
Thanx with kindest regards...
Vikas.
July 29, 2005 - 8:32 am UTC
Use a view, this is what views are for.
rename table to something else
create view of table with columns in any old "order" you like.
user_tab_columns is the view you are looking for to find clobs.
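For example:

```sql
-- list every CLOB column in the current schema
select table_name, column_name
  from user_tab_columns
 where data_type = 'CLOB'
 order by table_name, column_name;
```

(use dba_tab_columns, which adds an owner column, to see other schemas).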
CLOB - Cooool
Vikas SANGAR, July 30, 2005 - 3:43 am UTC
Dear Mr. Kyte
Thanx,
This was what I was looking for - Dba_Table_cols.
Regards.
Vikas
CLOB - Error Correction.
VIKAS, July 30, 2005 - 5:22 am UTC
Sorry for the misspelling
(dba_table_cols instead of dba_tab_cols).
Actually what i intended was:-
user_tab_columns;
dba_tab_cols;
Both serve the purpose.
Cheers.
Conversion - CLOB
VIKAS SANGAR, August 02, 2005 - 1:33 am UTC
Dear Mr. Kyte,
Please suggest and correct wherever I am wrong in my approach.
I have two users, usr1 and usr2 (a fresh user).
Usr1 has various objects in its schema (tables, views, triggers, stored procedures, functions, etc.) along with data/records in them.
The objects in usr1's schema (tables, dependent views...) have certain columns with the CLOB datatype.
I want to convert the CLOB columns (with records) into the varchar2(4000) type.
And now my approach...
1) Generate the DDL scripts for the objects in usr1's schema, using the dbms_metadata.get_ddl utility.
2) Edit the DDL scripts accordingly (change CLOB columns to varchar2(4000), change the user, etc.).
3) Recreate these objects in usr2's schema and insert records into them using simple insert statements.
4) Drop all the objects pre-existing in usr1's schema and export/import the objects from usr2 to usr1.
Now, can you please point out the common flaws and drawbacks of this approach?
-> What problems might I face using such an approach?
-> What precautions must I take, if I follow the above approach, so that data integrity and consistency are not affected?
-> Is there any better, shorter and more effective way to achieve the above results for all the objects with a single or smaller set of instructions (besides the above approach, or adding new varchar2 columns, populating them from the old CLOB columns, and dropping the old clob columns, table by table)?
Pls suggest...
Thanx in advance, with kindest regards.
Vikas.
August 02, 2005 - 7:31 am UTC
what is wrong with the suggestion I already gave, which you seem to have ruled out?
CLOB - Feedback
VIKAS, August 03, 2005 - 6:12 am UTC
Dear Mr Kyte,
Please don't get me wrong, I have not ruled out your valuable suggestion - no, not at all.
In fact the above approach is basically derived from your suggestion. The only difference is that yours is based on column-level changes and the one I want to try to implement is based on table-level changes.
The main idea behind this approach is to save the extra sweat of handling, one by one, each table that has CLOB columns (there are numerous such tables in my database).
So before actually implementing that approach, I just wanted to know whether this is the right one or whether I should drop the idea.
If it's alright, then what precautions should I keep in mind? That was all.
Take care.
Kindest regards...
Vikas.
August 03, 2005 - 11:40 am UTC
seems like a lot of work to me, to recreate an entire schema in another user account after editing all of the ddl.
Number of lines in a LOB?
VA, August 28, 2005 - 3:29 pm UTC
How can I get the number of lines in a text file stored in a BLOB column, assuming the lines end with the standard delimiter, chr(10)?
Thanks
August 28, 2005 - 4:09 pm UTC
you would have to iterate through the LOB and count them.
dbms_lob.instr will find the n'th occurrence of the newline.
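A sketch of that iteration for a CLOB column (the function name is assumed):

```sql
create or replace function count_lines( p_clob in clob ) return number
as
    l_cnt    number  := 0;
    l_offset integer := 1;
begin
    loop
        -- find the next newline, starting just past the last one found
        l_offset := dbms_lob.instr( p_clob, chr(10), l_offset );
        exit when nvl( l_offset, 0 ) = 0;
        l_cnt    := l_cnt + 1;
        l_offset := l_offset + 1;
    end loop;
    return l_cnt;
end;
/
```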
Number of lines in a LOB?
A reader, August 28, 2005 - 6:28 pm UTC
Can you please show a quick example or point me to one of how to do that? Thanks
August 29, 2005 - 1:00 am UTC
thought of an easier way...
ops$tkyte@ORA9IR2> desc t;
Name Null? Type
---------------------------------------- -------- ----------------------------
Y CLOB
ops$tkyte@ORA9IR2> select substr(y,1,20) text,
2 dbms_lob.getLength(y) len,
3 length(y)-length(replace(y,chr(10),'')) nl
4 from t;
TEXT
-------------------------------------------------------------------------------
LEN NL
---------- ----------
How Now
Brown Cow,
T
2084000 125040
that won't work in 8i, but you didn't say....
Doesn't work for BLOB
VA, August 29, 2005 - 9:33 am UTC
I am on 9iR2
select substr(y,1,20) text,
dbms_lob.getLength(y) len,
length(y)-length(replace(y,chr(10),'')) nl
doesn't work when "Y" is a BLOB column. I get an "ORA-932: expected NUMBER got BLOB" error.
Any workarounds?
Thanks
August 29, 2005 - 1:42 pm UTC
why are you looking for newlines in a blob? text MUST be stored in a clob???????
to do this on a blob will mean writing your own function, and to say that it would be slow would be an understatement
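If you do end up writing such a function, a minimal sketch might look like the following (the function name is mine; it assumes lines end in chr(10), and, as noted above, it will be slow on large blobs):

```sql
create or replace function count_lines( p_blob in blob ) return number
as
    l_offset number := 1;
    l_cnt    number := 0;
begin
    loop
        -- dbms_lob.instr on a BLOB takes a RAW pattern; '0A' is hex for chr(10)
        l_offset := dbms_lob.instr( p_blob, hextoraw('0A'), l_offset, 1 );
        exit when nvl( l_offset, 0 ) = 0;
        l_cnt    := l_cnt + 1;
        l_offset := l_offset + 1;
    end loop;
    return l_cnt;
end;
/
```

Then `select count_lines(y) from t;` returns the newline count per row - at the cost of walking the entire lob.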
Why BLOB?
A reader, August 29, 2005 - 4:34 pm UTC
I am doing this on files uploaded using HTML DB which uses the mod_plsql feature of storing all uploaded files in a BLOB column.
August 30, 2005 - 1:27 am UTC
how often do you need to do this.
how many rows at a time do you need to do this to.
Why BLOB?
A reader, August 30, 2005 - 9:04 am UTC
I have a parse_file routine that takes the CSV file uploaded by mod_plsql and parses the CSV records into an HTML DB collection. Parsing each record, optionally enclosed by double quotes and so on, is an expensive process for more than a couple of hundred records at a time.
So, before I launch the parse process, I would like to get a quick count of number of lines and if it is more than N lines, I would raise an error and ask user to upload a smaller file.
How often? This parse_file routine is a generic routine used by many HTML DB pages. How many rows? That's what I want to find out! But I don't expect users to upload gigantic CSV files - probably a few hundred rows at most.
Thanks
August 30, 2005 - 12:13 pm UTC
oh, why not just use dbms_lob.getlength? instead of assuming N lines is bad, assume M bytes is bad? that'll be fast.
say "few hundred rows" of "a couple hundred bytes"
plug in a number for a "few" and a "couple" and multiply.
it'll be as good as newlines - if I upload 1,000 lines of two columns - and you upload 500 lines of 50 columns -- I should be OK, you might not be OK right -- so it is a function of newlines and columns -- not just newlines -- hence it is based on the size of the LOB -- nothing else :)
Also, if you were able to write a function that can PARSE the blob -- you can write a function to count newlines (and should realize that if the blob is big, counting the newlines will take..... a long time :)
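Following that advice, the pre-flight check collapses to a single dbms_lob.getlength call. A sketch only - the table and column names and the 200,000-byte limit are made up; plug in your own "few" times "couple":

```sql
declare
    l_max_bytes constant number := 200000;  -- assumed: ~1,000 rows x ~200 bytes each
    l_len       number;
begin
    select dbms_lob.getlength( blob_content )
      into l_len
      from uploaded_files               -- hypothetical upload table
     where name = :p_file_name;

    if l_len > l_max_bytes then
        raise_application_error( -20001,
            'File is ' || l_len || ' bytes; please upload a file under ' ||
            l_max_bytes || ' bytes' );
    end if;
end;
/
```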
VA, August 30, 2005 - 1:39 pm UTC
Thats a good idea, but if the file is really large even uploading it would take a long time.
Uploading it just to do a dbms_lob.getlength() and deleting it seems like a waste of resources.
Unfortunately, there seems to be no way to access the file from client-side Javascript. Ideally, I would just do the getlength part in client-side JS and not even bother submitting the page if it is greater than N bytes.
Thanks
August 30, 2005 - 1:47 pm UTC
and by the time your plsql routine is called -- guess what the entire file is already uploaded (so I'm not seeing your point????)
uploading it, count lines and then reject -- seems like a waste of resources
uploading it, count the bytes and then reject, that is LESS resources than you just used to count lines!!!!!
now what?
VA, August 30, 2005 - 2:00 pm UTC
Tom, relax. You misunderstand, I am agreeing with you, I liked your suggestion.
Thats why I said "Ideally, I would just do the getlength part in client-side JS and not even bother submitting the page if it is greater than N bytes"
But given the current technologies, there is simply no way to get the size of a file from Javascript (as far as I can tell)
Thanks
August 30, 2005 - 2:04 pm UTC
I've no idea, being the 'server side guy'
so, let us ask everyone else:
anything on the client to prevent a big upload????
CLOB to VARCHAR2
mo, August 30, 2005 - 10:39 pm UTC
Tom:
I converted a table from another client/server system through Access into Oracle. One field, "Notes", was of type CLOB. When I inserted the data from this temporary Oracle table into my permanent Oracle table I got an error because my field type is VARCHAR2(1000). When I ran:
Alter table stockcodes modify notes varchar2(1000);
oracle could not convert it.
Is there a way I can grab the data in this CLOB field and insert it into a VARCHAR2 field?
Thank you,
August 31, 2005 - 1:14 pm UTC
ops$tkyte-ORA10G> create table t ( x int, notes clob );
Table created.
ops$tkyte-ORA10G> insert into t values ( 1, 'hello world' );
1 row created.
ops$tkyte-ORA10G>
ops$tkyte-ORA10G> alter table t add temp varchar2(1000);
Table altered.
ops$tkyte-ORA10G> update t set temp = dbms_lob.substr( notes, 1000, 1 );
1 row updated.
ops$tkyte-ORA10G> alter table t drop column notes;
Table altered.
ops$tkyte-ORA10G> alter table t rename column temp to notes;
Table altered.
ops$tkyte-ORA10G> desc t;
Name Null? Type
---------------------------------------- -------- ----------------------------
X NUMBER(38)
NOTES VARCHAR2(1000)
ops$tkyte-ORA10G> select * from t;
X
----------
NOTES
-------------------------------------------------------------------------------
1
hello world
CLOB
mo, August 31, 2005 - 1:39 pm UTC
Tom:
Thanks. I ended up exporting the table to a 9i database, adding a varchar2(1000) column and updating that column from the CLOB column. Doesn't that yield the same result?
Your way is easier though.
CLOB insert & update
atul, October 06, 2005 - 5:59 am UTC
Hi,
We have tables with CLOB columns.
We have procedure which inserts into LOB and
update table which doesn't include CLOB column.
Our DB_BLOCK_SIZE is 8k.
Avg CLOB length is < 4K.
Max CLOB is around 17K, but we have only a few such records.
Could you please help:
1) Could we put the CLOB out-of-line to avoid scanning the CLOB column during the update?
2) Could the CACHE option help? We can see wait events for direct path read after performing a 20-minute load test
++++++++
direct path read (lob) 15,325 0 5 0 0.8
SQL*Net break/reset to clien 13 0 3 232 0.0
db file scattered read 6,656 0 3 0 0.3
log file sequential read 56 0 2 44 0.0
buffer busy waits 102 0 2 17 0.0
direct path read 466 0 1 1 0.0
enqueue 1 0 1 532 0.0
direct path write (lob) 356 0 0 1 0.0
++++++++++++++++++
But it's not in the Top 5 wait events.
3) If we move the CLOB to its own tablespace, do we still need to explicitly alter the table to "disable storage in row"?
Thanks,
Atul
October 06, 2005 - 7:56 am UTC
1) yes, but ask yourself first "do I access the table via an index during the update" - if so, .... you are not "scanning" the clob.
2) yes, it could - since every WRITE to the (out of line - over 4k) lob must make the client WAIT for the write to actually happen and every read of the (out of line - over 4k) block must make the client wait for the IO to happen.
3) tablespace has nothing to do with in/out of line - they are different and independent of each other.
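For reference, both settings from points 1 and 2 are lob storage clauses. A sketch against a hypothetical table t with CLOB column c:

```sql
-- 1) force the lob out of line; STORAGE IN ROW cannot be changed with
--    MODIFY, so the lob (and its data) must be moved/rebuilt
alter table t move lob( c ) store as ( disable storage in row );

-- 2) route lob reads and writes through the buffer cache
alter table t modify lob( c ) ( cache );
```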
Dropping a table containing a LOB
Bob, January 09, 2007 - 2:19 pm UTC
Hi Tom,
I have a question for you regarding LOB's (CLOB).
My initial problem was a tablespace "x_data" being 99% used. When I investigated further, I found a table "x" (containing a CLOB datatype) having a size of 2GB with one record.
select sum(bytes/1024/1024)
from dba_extents
where segment_name = 'X'
I got the table create table statement from (dbms_metadata.get_ddl) and noticed that the initial extent size was set to 2GB. I recreated the table with an initial extent size of 1MB.
I ran the above SQL statement and it was fine (size 1MB).
I then checked the size of "x_data" - it was 99% full. I went into Enterprise Manager -> tuning pack -> tablespace map and noticed a SYS_LOBxxxxx segment (in the "x_data" tablespace) of 1.28GB after having dropped the table.
What do I need to do in order to drop this LOB segment? It exists as an entry in DBA_LOBS.
Thanks
January 11, 2007 - 9:15 pm UTC
lobs have their own segments and storage characteristics. Use get_ddl again and see what the lob is sized at; perhaps it, too, was set to 2gb.
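A sketch of that check (the schema name and the SYS_LOB segment name are placeholders):

```sql
-- the full CREATE TABLE, including the lob storage clause
select dbms_metadata.get_ddl( 'TABLE', 'X', 'BOB' ) from dual;

-- or look at the lob segment's size directly
select segment_name, bytes/1024/1024 mb
  from dba_segments
 where segment_name = 'SYS_LOBxxxxx';
```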
Dropping a table containing a LOB
Bob, January 10, 2007 - 12:41 pm UTC
Problem solved - the SYSxxx LOB segment belonged to another table from a different schema!
CLOB - Design
A reader, January 15, 2007 - 4:16 am UTC
Hi Tom
I have a design Question
We get data from 5 external sources as CSV files, each in a different format and dealing with a different business. Each has big content that needs to be stored in a CLOB column apart from the other data.
We manipulate the data and store it in 5 different tables, each having a CLOB column.
End users use our Search UI, in which they type a few words, and we need to show all the data that contains those words as part of the CLOB data (even if it is spread across all 5 tables).
As a solution for this I could think of an ORACLE TEXT index.
But I still think I would need to do 5 SELECTs to fetch from the 5 different tables, as I can't do a UNION with CLOBs.
Another solution I could think of is to store all the CLOB data in one single table with a unique ID, and store that ID in the other 5 tables as FKs. In this case I would be searching only one table (i.e. clob_data_table) and outer joining the results with the other 5 tables to get the other information.
Which one do you think is better? If neither is good, could you please suggest one.
Thanks in Advance
Kamini
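A sketch of the second design Kamini describes (all object names are made up): one searchable CLOB table with a single Oracle Text CONTEXT index, referenced by the five business tables.

```sql
create table clob_data
( clob_id  number primary key,
  the_text clob );

-- each of the 5 business tables then carries the FK, e.g.:
--   alter table source1 add ( clob_id number references clob_data );

-- one Text index covers everything searchable
create index clob_data_txt on clob_data( the_text )
    indextype is ctxsys.context;

-- one query finds matches across all 5 sources
select clob_id
  from clob_data
 where contains( the_text, :search_words ) > 0;
```

Note that a CONTEXT index is not maintained transactionally, so it would need periodic syncing (e.g. via ctx_ddl.sync_index) after loads.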
Followup to Original Question of this thread
Madhava Reddy, January 20, 2007 - 11:11 pm UTC
Hi,
the following query might be helpful for the first question of this thread (I saw this thread just now for the first time and thought I would give it a try).
select substr(y,instr(y,chr(13),-instr(y,'interesting'),1)+2,(instr(y,chr(13),instr(y,'interesting'),1)-instr(y,chr(13),-instr(y,'interesting'),1))) line
from t
where y like '%interesting%';
One thing I don't understand is why the length of a carriage return is returned as 3 by Oracle (I might be missing some basics here; hence, I added 2 to nullify this in the above query).
A design Question
Kamini, February 17, 2007 - 8:57 am UTC
Hi Tom
I am reposting my design question from January 15 above, as it has not been answered yet. Could you please take a look?
February 18, 2007 - 9:41 am UTC
A reader, February 19, 2007 - 8:50 am UTC
Hi Thomas,
we are facing problems with updating CLOB data.
The CLOB data will be around 20K.
I have created the table in a tablespace with a 16K block size.
I have also set db_16k_cache_size.
Are there any additional parameters that can be played with?
February 19, 2007 - 9:36 am UTC
I am facing a problem myself.
I have no clue what your problem might be, that is a problem. We sort of need to fix that problem before we can discuss a solution to your as yet unknown problem.
Oh my god
A reader, February 19, 2007 - 12:28 pm UTC
Sorry Sorry Sorry.
ok, the problem is that the CLOB updates are pretty slow.
That's the main problem.
February 19, 2007 - 2:20 pm UTC
so, how are you doing it. and define "slow" and explain your expectations (eg: what is 'fast enough')
and what is the "oh my god" for? If you want help, well, be a tad more clear in stating what the PROBLEM is.
clob
sam, March 06, 2007 - 5:15 pm UTC
Tom:
I am sending an http message using curl (uploading file) into an oracle procedure that would save the files.
c:\curl\curl -F "i_file=@1234.xml" http://xxx.xx.xxx/pls/ttt/save_data

SQL> desc test;
Name Null? Type
----------------------------------------- -------- ----------------------------
EMPID NUMBER
FIRST_NAME VARCHAR2(20)
LAST_NAME VARCHAR2(20)
ABC_UPDATES CLOB
My save_data does this:
PROCEDURE SAVE_DATA
(i_empid VARCHAR2 DEFAULT NULL,
i_first_name VARCHAR2 DEFAULT NULL,
i_last_name VARCHAR2 DEFAULT NULL,
i_file VARCHAR2 DEFAULT NULL )
IS
BEGIN
insert into TEST
values (i_empid, i_first_name,i_last_name,i_file);
commit;
END;
I keep getting ORA-00942: table or view does not exist.
Is there something wrong with the way I am saving the file into the CLOB?
Thanks
March 06, 2007 - 5:27 pm UTC
have you ever done a file upload with mod_plsql?
that is NOT the way it works. You have not done the setup - we upload into a table (the document table) and pass you a lob locator.
think about it: a varchar2 is limited to 32k in plsql - that'll never work.
please - check out the docs, do a file upload using a web browser, test it out, get it working - and then start this hack.
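For reference, the setup Tom is describing: mod_plsql uploads the file into the document table configured for the DAD and passes the procedure only the file's name. A sketch - the table and procedure names are made up, but the column layout is the one mod_plsql expects for a document table:

```sql
create table my_docs
( name          varchar2(256) unique not null,
  mime_type     varchar2(128),
  doc_size      number,
  dad_charset   varchar2(128),
  last_updated  date,
  content_type  varchar2(128),
  blob_content  blob );

-- the upload procedure receives the file NAME, not the 32k-limited content
create or replace procedure save_data( i_file in varchar2 default null )
as
    l_blob blob;
begin
    select blob_content
      into l_blob
      from my_docs
     where name = i_file;
    -- work with the lob locator from here
end;
/
```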
clob
sam, March 09, 2007 - 10:32 pm UTC
March 12, 2007 - 5:04 pm UTC
I have no idea Sam, no idea.
You don't give us any ideas.
Reading data from CLOB based on a join.
Rajeshwaran, Jeyabal, July 11, 2020 - 5:26 am UTC
Team:
The below testcase is modelled on our business requirement,
where we need to pull data from a CLOB column (actually holding JSON data) based on a join.
We are on Exadata Cloud@Customer X7 Quarter rack with an Oracle 18c (18.10) database.
when i say
select t1.object_name, t2.id,t2.the_clob
from t1, t2
where t1.id = t2.id;
completes in about 5 seconds, but when i say
select t1.object_name, t2.id,t2.the_clob
from t1, t2
where t1.id = t2.id;
this was running for hours. Given below is the sql-monitor report for it. Kindly help us understand what is causing the slowness and what can be done to make it go faster.
c##rajesh@PDB1> create table t1
2 nologging
3 as
4 select a.*, rownum as id
5 from all_objects a,
6 all_users
7 where rownum <= 4000000;
Table created.
c##rajesh@PDB1> create table t2
2 nologging
3 as
4 select rownum as id, x1.the_clob
5 from all_objects a ,
6 ( select json_arrayagg( json_object( 'owner' value owner,
7 'object_name' value object_name,
8 'object_type' value object_type,
9 'object_id' value object_id,
10 'created' value created) returning clob) the_clob
11 from (
12 select owner,object_name,object_type,object_id,created
13 from all_objects
14 where rownum <=450
15 ) ) x1
16 where rownum <=3600;
Table created.
c##rajesh@PDB1> select count(*),
2 max( dbms_lob.getlength(the_clob) )
3 from t2;
COUNT(*) MAX(DBMS_LOB.GETLENGTH(THE_CLOB))
---------- ---------------------------------
3600 51799
c##rajesh@PDB1> select count(*),
2 max( dbms_lob.getlength(the_clob) ) max_size,
3 avg( dbms_lob.getlength(the_clob) ) avg_size,
4 min( dbms_lob.getlength(the_clob) ) min_size
5 from t2;
COUNT(*) MAX_SIZE AVG_SIZE MIN_SIZE
---------- ---------- ---------- ----------
3600 51799 51799 51799
c##rajesh@PDB1> set timing on time on
10:29:30 c##rajesh@PDB1> create table t3
10:29:32 2 tablespace ehdl
10:29:32 3 as
10:29:32 4 select t1.object_name, t2.id
10:29:32 5 from t1, t2
10:29:32 6 where t1.id = t2.id ;
Table created.
Elapsed: 00:00:04.16
10:29:40 c##rajesh@PDB1> drop table t3 purge;
Table dropped.
when i pull the data from the "the_clob" column - it was running slow.
10:30:33 c##rajesh@PDB1> create table t3
10:31:07 2 tablespace ehdl
10:31:07 3 nologging
10:31:07 4 as
10:31:07 5 select t1.object_name, t2.id,t2.the_clob
10:31:07 6 from t1, t2
10:31:07 7 where t1.id = t2.id ;
Here is the sql-monitor report for the same:
c##rajesh@PDB1> @sqlmonitortext cqtp4hdvtp760
old 2: :x := dbms_sqltune.report_sql_monitor(sql_id=>'&1',type=>'TEXT');
new 2: :x := dbms_sqltune.report_sql_monitor(sql_id=>'cqtp4hdvtp760',type=>'TEXT');
SQL Monitoring Report
SQL Text
------------------------------
create table t3 tablespace ehdl nologging as select t1.object_name, t2.id,t2.the_clob from t1, t2 where t1.id = t2.id
Global Information
------------------------------
Status : EXECUTING
Instance ID : 2
Session : C##RAJESH (19:54701)
SQL ID : cqtp4hdvtp760
SQL Execution ID : 33554432
Execution Started : 07/11/2020 01:01:11
First Refresh Time : 07/11/2020 01:01:15
Last Refresh Time : 07/11/2020 01:21:23
Duration : 1213s
Module/Action : SQL*Plus/-
Service : pdb1
Program : sqlplus.exe
Global Stats
============================================================================================================================
| Elapsed | Cpu | IO | Concurrency | Cluster | PL/SQL | Buffer | Read | Read | Write | Write | Offload |
| Time(s) | Time(s) | Waits(s) | Waits(s) | Waits(s) | Time(s) | Gets | Reqs | Bytes | Reqs | Bytes | Returned Bytes |
============================================================================================================================
| 1241 | 570 | 670 | 0.12 | 0.17 | 0.00 | 6M | 492K | 48GB | 480K | 47GB | 95GB |
============================================================================================================================
SQL Plan Monitoring Details (Plan Hash Value=3168156750)
=============================================================================================================================================================================================================
| Id | Operation | Name | Rows | Cost | Time | Start | Execs | Rows | Read | Read | Write | Write | Mem | Activity | Activity Detail | Progress |
| | | | (Estim) | | Active(s) | Active | | (Actual) | Reqs | Bytes | Reqs | Bytes | | (%) | (# samples) | |
=============================================================================================================================================================================================================
| -> 0 | CREATE TABLE STATEMENT | | | | 1209 | +4 | 1 | 0 | | | | | . | | | |
| -> 1 | LOAD AS SELECT | T3 | | | 1213 | +1 | 1 | 0 | 492K | 47GB | 480K | 47GB | 2MB | 99.83 | log buffer space (35) | |
| | | | | | | | | | | | | | | | Cpu (496) | |
| | | | | | | | | | | | | | | | ASM IO for non-blocking poll (3) | |
| | | | | | | | | | | | | | | | direct path read (354) | |
| | | | | | | | | | | | | | | | direct path write (317) | |
| -> 2 | OPTIMIZER STATISTICS GATHERING | | 4M | 20789 | 1209 | +4 | 1 | 479K | | | | | 256KB | | | |
| 3 | HASH JOIN | | 4M | 20789 | 1209 | +4 | 1 | 479K | | | | | 5MB | 0.08 | Cpu (1) | |
| 4 | TABLE ACCESS STORAGE FULL | T2 | 3600 | 10 | 1 | +4 | 1 | 3600 | | | | | . | | | |
| -> 5 | TABLE ACCESS STORAGE FULL | T1 | 4M | 20767 | 1209 | +4 | 1 | 478K | 86 | 43MB | | | . | | | 11% |
=============================================================================================================================================================================================================
One thing I observe here is that we have been waiting on "direct path read" and "direct path write" - kindly advise what can be done here to make it go faster.
Let me know if this should be raised as a new question.
July 13, 2020 - 3:16 am UTC
First can you clarify this statement:
when i say
select t1.object_name, t2.id,t2.the_clob
from t1, t2
where t1.id = t2.id;
get complete in few 5 sec, but when i say
select t1.object_name, t2.id,t2.the_clob
from t1, t2
where t1.id = t2.id;
Reading data from CLOB based on a join.
Rajeshwaran, Jeyabal, July 11, 2020 - 11:36 am UTC
sorry, in the above demo the ddl for table "T1" was like this:
create table t1
nologging
as
select a.*, ceil(dbms_random.value(1,3600)) as id
from all_objects a,
all_users
where rownum <= 4000000;
Reading data from CLOB based on a join.
Rajeshwaran, Jeyabal, July 13, 2020 - 3:36 am UTC
thanks for your time.
Sorry - when I say (without the clob column in the select list)
select t1.object_name, t2.id
from t1, t2
where t1.id = t2.id;
it completed in a few seconds,
but when i say (with clob column included)
select t1.object_name, t2.id,t2.the_clob
from t1, t2
where t1.id = t2.id;
it took almost 2+ hours.
I understand that each row from T2 matches almost 1000+ rows in T1, so the number of times the "the_clob" column is selected into the final output is huge.
So what can be done to make this "ddl" go faster? Kindly advise.
create table t3
tablespace ehdl
nologging
as
select t1.object_name, t2.id,
t2.the_clob
from t1, t2
where t1.id = t2.id;
July 20, 2020 - 3:25 am UTC
A lob needs to be read via its lob index, and the concept of an "index fast full scan" does not exist for lobs, so a lob read is similar to an indexed read - which means "multi-block reads" do not come into play.
We can see this with a trace. I repeated your example, and the tkprof looks like
create table t3b as
select t1.object_name, t2.id, t2.the_clob
from t1, t2
where t1.id = t2.id
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 9 0 0
Execute 1 1.89 12.36 130351 85108 51851 3600
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 1.89 12.36 130351 85117 51851 3600
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 107
Number of plan statistics captured: 1
Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
0 0 0 LOAD AS SELECT T3B (cr=85118 pr=130351 pw=46829 time=12361477 us starts=1)
3600 3600 3600 OPTIMIZER STATISTICS GATHERING (cr=83558 pr=83526 pw=0 time=21667 us starts=1 cost=23026 size=309600 card=3600)
3600 3600 3600 HASH JOIN (cr=83558 pr=83526 pw=0 time=15175 us starts=1 cost=23026 size=309600 card=3600)
3600 3600 3600 TABLE ACCESS FULL T2 (cr=27 pr=0 pw=0 time=2002 us starts=1 cost=10 size=162000 card=3600)
4000000 4000000 4000000 TABLE ACCESS FULL T1 (cr=83531 pr=83526 pw=0 time=990663 us starts=1 cost=22986 size=164000000 card=4000000)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
index (re)build lock or pin object 1 0.00 0.00
PGA memory operation 4 0.00 0.00
direct path read 4312 0.06 9.03
enq: HW - contention 1 0.00 0.00
local write wait 12 0.00 0.00
reliable message 1 0.00 0.00
enq: CR - block range reuse ckpt 1 0.00 0.00
db file sequential read 25 0.00 0.00
direct path write 103 0.01 0.22
enq: TX - contention 5 0.55 0.93
log file switch completion 4 0.01 0.04
log file sync 1 0.00 0.00
SQL*Net message to client 1 0.00 0.00
SQL*Net message from client 1 0.00 0.00
********************************************************************************
If we dig into the trace file to see those 4312 direct path reads, we can isolate those that pertain just to the lob segment, in my case:
SQL> select data_object_id
2 from dba_objects
3 where object_name = 'SYS_LOB0000092269C00002$$';
DATA_OBJECT_ID
--------------
92270
and then see how many waits there were for it:
# grep direct db19_ora_18840.trc | grep 92270 | wc -l
3654
and you see we get approximately one direct path read per row for the lobs we need to fetch.
Some things to look at
- Setting CACHE on the lob in the source table can improve things a little, but you are then obviously smashing a lot of lob data into your buffer cache
- parallel DML
- DIY parallelism (eg dbms_parallel_execute)
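Sketches of the first two suggestions, using the table names from the example (the parallel degree of 8 is arbitrary):

```sql
-- cache the source lob so repeated reads go through the buffer cache
alter table t2 modify lob( the_clob ) ( cache );

-- and/or run the CTAS in parallel
create table t3
    parallel 8
    nologging
as
select /*+ parallel(8) */ t1.object_name, t2.id, t2.the_clob
  from t1, t2
 where t1.id = t2.id;
```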