Thanks for the question, Sudip.

Asked: March 17, 2002 - 11:28 pm UTC

Last updated: December 18, 2011 - 5:23 am UTC

Version: 8.1.7

Viewed 10K+ times! This question is

You Asked

Hi All,
One of my tables has a BLOB column, in which I will store Excel Files. Now this particular Column will be updated very frequenty. I have taken the precaution of putting the BLOB column on a separate tablespace. All my tablespaces are LMT. I have increased FREELISTS on this table to 10.
What should be the optimal value of PCTUSED, PCTFREE parameter?

Regards
Sudip

and Tom said...

pctused and pctfree do not have any meaning with regards to BLOBS. A blob allocates space in a series of CHUNKS (you controlled the chunk size when you created it). If you for example, set the chunksize to 32k and you have an 8k block -- then a blob will always allocate blocks for itself in multiples of 4 blocks. If your blob is 32k + 100 bytes for example, it will consume 8 blocks (4 for the first 32k chunk, 4 for the remaining 100 bytes). The 3 empty blocks at the end of the blob are dedicated to that blob -- they are not on a freelist anywhere.

Remember also that there is block overhead so my example of 32k + 100 bytes is too simplistic. 32k of blob data will not fit on 4-8k blocks, that simple 32k blob will consume 2 x 4-8k blocks due to block overhead.

When a blob "shrinks" -- it will give up free space in blocks of CHUNK size bytes. eg: in that 32k + 100 byte example, if you later trim the blob back to 16k -- it'll give up the one 32k chunk back for reuse.

Rating

(38 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Sudip Sarkar, March 18, 2002 - 9:36 am UTC

Thanks Tom, It is indeed a very useful information

LOB CHUNK size and index clause

vinci, August 20, 2002 - 11:19 am UTC

Hi Tom,
I had two questions.

1.If i have to follow a Uniform, locally managed tablaspace strategy for an LOB column in a table(LOB in a seperate tablespace than the table data), then the uniform extent size shud be equal to the CHUNK size?
My datablock size is 4k (Solaris, Oracle 8.1.7.4).
We normally use standard extent sizes for tablespaces based on what size the tables are going to be. For example if a table size is between 96k and 128m then all tables like that go into one tablespace which will have a NEXT or UNFIROM extent size of 128k ....and so on.
But in case of a LOB inside a table...should NEXT or UNIFORM be equal to CHUNK size for that LOB tablespace?(which is different from teh general convention that we follow). What is your advice on this?
Even if i continue with the same 128kextent size allocation for teh LOB tableapsce also ... does CHUNK size matter?(say CHUNK size is 4k). What wud be ur recommendation.

2.Oracle documentiation says not to mention index clause for LOB tablespace as of oracle 8i. Even if i use it, oracle will ignore it.
But can i just specify the index name atlaest(for maintenance/standards following)? Will it have any effect on how oracle handles the LOB and its index?

Thanks

August 20, 2002 - 2:06 pm UTC

1) no, that would be not a good idea.  You want a couple of chunks in an extent -- the rule would be "make the extent size a MULTIPLE of the chunk size".

An extent/ chunk would be a little over zealous, you want N chunks / extent.  You get to pick N -- it depends on the ultimate sum of the sizes of the lobs you'll be storing -- again, shooting for 1000 or less extents 


2) you cannot control it (technically).  The name should have no effect.  The following is not documented but appears to work (but could stop working unpredicably!)

ops$tkyte@ORA920.US.ORACLE.COM> CREATE TABLE T (X NUMBER, Y CLOB)
  2  LOB (Y) STORE AS
  3  ( INDEX foo )
  4  /

Table created.

ops$tkyte@ORA920.US.ORACLE.COM> CREATE TABLE T2 (X NUMBER, Y CLOB)
  2  /

Table created.

ops$tkyte@ORA920.US.ORACLE.COM> 
ops$tkyte@ORA920.US.ORACLE.COM> select index_name, index_type, table_name from user_indexes;

INDEX_NAME                     INDEX_TYPE                  TABLE_NAME
------------------------------ --------------------------- ------------------------------
FOO                            LOB                         T
SYS_IL0000030677C00002$$       LOB                         T2

ops$tkyte@ORA920.US.ORACLE.COM> 

Use at your own risk for that reason.  It might be best to just leave it named by the system.  It is not like you can alter it or do anything to it.

vinci, August 20, 2002 - 7:50 pm UTC

Thanks Tom...

Any BLOB limitations

Arun Gupta, February 11, 2004 - 12:46 pm UTC

Tom,
We want to use BLOB type to store image data in the database. From a DBA perspective do you see any issues with backup, export/import or any thing else that comes to your mind. We are on 9iR2.
Thanks

February 11, 2004 - 2:44 pm UTC

Nope.

backup -- it is just data, in the database. not much different from a "date"

export/import -- not with blobs (although you'll find imp does it a row at at time - could be a bit "slow")

you'll have to make decisions about

o log / nolog
o cache / nocache
o chunksize

A reader, July 19, 2004 - 12:06 am UTC

Tom,
We need to store part of phone conversations in our Oracle database. Is the Blob datatype the correct way to do this? Do you have any suggestions that you can shared with me about storing/retrieving this king of data on to the Oracle database? I will appreciate any suggestions you can give me? We are on 9iR2.

Thanks

July 19, 2004 - 7:32 am UTC

</code> http://docs.oracle.com/docs/cd/B10501_01/appdev.920/a96591/toc.htm


yes, a blob would be the correct type. depending on the format of the audio,

http://docs.oracle.com/docs/cd/B10501_01/appdev.920/a88786/toc.htm <code>

might be appropriate as well.

Weird Error with BLOB datatype

A reader, August 09, 2004 - 2:11 pm UTC

Hi Tom,

I get really a weird error on one of my databases which is Oracle9i Enterprise Edition Release 9.0.1.3.0 on HP-UX:

SQL> CREATE TABLE wcm.TEST4_CDA (
     pkey integer not null, 
     data blob, 
     CONSTRAINT TEST4_CDA_PK PRIMARY KEY (pkey))
     lob (data) store as (enable storage in row);


ORA-03001: unimplemented feature

If I change the "segment space management" property of the tablespace from 'Automatic' to 'Manual', it works!

Is it something wrong with the installation? Is it init.ora parameter?

The other databases, which are on different, box but with same Oracle version and OS, work fine.

As always, thanks for your time and valuable input.

Arash

August 09, 2004 - 2:35 pm UTC

in 9iR1, lobs could not be in ASSM tablespaces. in current releases (9ir2 and 10gr1) they can be.

lmts autoallocate / assm

A reader, October 12, 2004 - 12:23 pm UTC

As of 9iR2, is it recommended to use ASSM on tablespaces that are storing BLOB data? I always do now, but wanted to verify. Also, if the files being stored in the blob column within the database will be all different sizes, is it preferable to store them on an LMT with AUTOALLOCATE or is it always preferable to use uniform extents?

October 12, 2004 - 1:39 pm UTC

it is allowed, it is neither "always recommended or always not recommended". Anything that starts with 'A' follows an 80/20 or maybe 95/5 rule -- it works in most cases better than good enough, but there will be times, certain use cases where it doesn't work as well as the old fashioned/manual way.

I would (do) use autoallocate for LMT's most of the time now myself, yes. uniform extents allow you to more easily monitor "hows this growing over time" (predicable) and "do i have enough room for the next extent" (same size, autoallocate does change as the object gets larger)

Using BLOB instead of LONG!

A reader, November 29, 2004 - 10:45 am UTC

Hi Tom,

I have a table, which has a column as long. We store the body of news as content in this column.

Now I want to use BLOB instead of long in new design of APP.

There are two questions:

1- How can I retrieve and view blob data in SQLPLUS?
2- For migrating data, 'select insert into....) doesn't work! Can not insert long datatype into BLOB datatype. How can I do this? I need to use 'select ... insert...) because the two tables have different structure.

Thank you in advance,
- Arash

November 29, 2004 - 3:10 pm UTC

blob instead of long?

nope.

CLOB instead of long -- yes.

1) use a clob
2) use a clob and to_lob(long_column) will work.

A reader, November 29, 2004 - 11:14 am UTC

Sorry! I meant 'insert into ...select ...)

and one more question is that : is it better that I use clob versus blob for content of a story which is text?

November 29, 2004 - 3:25 pm UTC

clob is the only type to use.

imp blob

jasdeep, January 03, 2005 - 1:29 am UTC

hi tom

i faced ora -600 error on a client's database and after discusssing with oracle recreated database by exp/imp but after recreating the database the application is not picking the blob column from the table which it was picking up earlier. the same application is picking up same column from the dummy DB on a different m/c.
i have run initjvm.sql and subsequent files but no result.

thanks

January 03, 2005 - 8:28 am UTC

that doesn't even begin to make sense to me.

I would suggest continue working with support to get the entire problem resolved.

upload files

A reader, January 20, 2006 - 12:14 pm UTC

I want to limit the size of the upload files to a BLOB

January 20, 2006 - 12:53 pm UTC

you cannot as far as I know since the browser will not share that information with us (eg: I do not know of javascript that you could write that can query the file system and ask "what size is the file" before you start uploading)

this is something you would necessarily have to control on the client side.

LOB contention

Kevin Lidh, February 08, 2006 - 5:57 pm UTC

We have a billing system which is partitioned into 9 partitions. The bills are XML and stored in blobs. In each partition on any given day we're writing tens of thousands of these XML. What we're encountering, probably not unexpectedly, is contention on the blob partitions. Same object ID, different block ID in the same data file (only one file per blob partition). What I was wondering was, is there a way to break up the I/O between data files? Right now the LOB tablespaces are one data file. We are using LMT with uniform 256k extents. Someone here suggested manually creating extents and specifying datafiles. Kind of pre-building the tablespace.

February 10, 2006 - 9:02 am UTC

define contention for me here and tell us how you measured it.

define how the lobs are stored (cached, not cached).

define your file system too - are you really used dedicated direct attach storage, or is there some raid level involved here.

256k extents seems a tad small for your application doesn't it?

Elaboration on LOB contention

Kevin Lidh, February 10, 2006 - 10:18 am UTC

What I saw were several processes waiting with "db file sequential read" and "direct path write (lob)" waits for the same object (e.g. SYS_LOB0000027612C00006$$SP_INFO_BLOB_P060 block IDs 114301, 114305, 114309, 114313 in a single data file). The LOBs aren't cached and we use SAN storage. The file system is a 512 GB virtual disk with lots of physical disks under it. I was supposing that it wasn't as much an I/O issue as a queueing issue in Oracle waiting to write to the file but I wasn't sure. I was thinking that maybe if I rebuilt the table and put several smaller finite-sized data files under each partition's tablespace, it wouldn't queue as much. I do think 256k is small for these extents as some of the partitions have 5000+ extents. The chunk size for the table isn't evenly divisible into the extent size but I'm fixing that (although I think Oracle rounds it anyway).

February 10, 2006 - 1:02 pm UTC

so every read and write will be a physical IO, users will have to wait for all writes to fully complete as they write the data.

If you have lots of physical disks, you already have striped the data - that single file is over MANY physical disks.

It is not queueing on eachother (they are not queued for eachother in the database), they are queued at the SAN waiting for their writes to complete.

Have you considered cached lobs, which would allow you to write the lob into the buffer cache and have dbwr write them out later?

do you have access to my last book "Expert Oracle: Database Architecture" - I covered lobs in fairly great detail there, going through the cache, cache reads, nocache, their structures and so on?

LOB contention

Kevin Lidh, February 13, 2006 - 11:22 am UTC

Thank you, thank you, thank you. There are two BLOB tables and I turned caching on both. I read about ensuring you allow for the additional space so I increased db_cache_size and I looked at the number of db writers. This with a few other "what were they thinking" issues I found have gotten us to a point where we have massive enqueuing...but this is good. Now it's processing bills so fast, it's queueing up on the requests for sequence numbers for the bills, statements, etc. (records in a table, not sequences). When we get past the caching of those numbers in the application, we'll have a huge win on our hands.
<excuses>
I do have your book but I was only on Chapter 4 and the LOBs are in Chapter 12.
</excuses>

February 13, 2006 - 12:50 pm UTC

... requests for sequence numbers for the bills, ...

ahhhhh run away, run away !!! "gap free sequences" :(

Su Baba, February 25, 2006 - 1:19 am UTC

Would you recommend using Oracle to store BLOBs? What are the advantages and disadvantages of using a relational database to store/retrieve/parse BLOBs?

February 25, 2006 - 11:15 am UTC

absolutely.

where else would you store a blob - blobs only exist in databases.

files live outside of databases.

So, I suppose you really mean "files versus blobs"

To me, it is really "data", period.

data in a database is

o secured
o backed up
o recoverable
o audited
o accessible

files in a file system are not.

It comes down to that. Yes, I would strongly recommend blobs over files for storing data you actually care about.

Su Baba, February 25, 2006 - 12:16 pm UTC

If I want to do a search within a BLOB, say it's a pdf file, I assume I would use Oracle Context Search to do it? Would this be faster than using operating system-level tools to perform the same search?

February 26, 2006 - 9:29 am UTC

You would use Oracle Text, yes.

What "operating system tool" are you referring to. I am aware of various utilies like "grep" - but they won't be of much use against a PDF.

I believe you'll find the flexibility and feature set available in the database to be slightly "larger" than grep.

Look at the advanced search tab on this site for example. Easy in SQL, not so easy with grep.

Re-initializing CLOBs to save space

Mark, February 26, 2006 - 11:06 am UTC

We store graphs as Base64Binary strings in CLOBs, but only need them for a few months, after which we would like to re-initialize the record CLOB field to save space (the data is still available elsewhere). Will Oracle reclaim the space if we do this? This is critical as more than half our disk space is taken up with these graphs.

February 26, 2006 - 11:46 am UTC

Oracle will preserve "pctversion" (or use RETENTION) of the old ones for read consistency (LOBs don't do UNDO like regular data, they are versioned in the lob tablespace).

So, if you just "insert" and later "reset" (eg: updates don't really happen), you can set pctversion very low in order to maximize your ability to "reclaim" space as soon as possible.

But yes, space should be reused in the lob segment and lob index segments.

PCTVERSION Clause

The CREATE TABLE statement returned from DBMS_METADATA previously included the following:

LOB ("TXT") STORE AS ( PCTVERSION 10 )

This is used to control the read consistency of the LOB. In previous chapters, weve discussed read consistency, multi-versioning, and the role that undo plays in that. Well, when it comes to LOBs, the way read consistency is implemented changes. The LOBSEGMENT does not use undo to record its changes; rather, it versions the information directly in the LOBSEGMENT itself. The LOBINDEX generates undo just as any other segment would, but the LOBSEGMENT does not. Instead, when you modify a LOB, Oracle allocates a new CHUNK and leaves the old CHUNK in place. If you roll back your transaction, the changes to the LOB index are rolled back and the index will point to the old CHUNK again. So the undo maintenance is performed right in the LOB segment itself. As you modify the data, the old data is left in place and new data created.

This comes into play for reading the LOB data as well. LOBs are read consistent, just as all other segments are. If you retrieve a LOB locator at 9:00 am, the LOB data you retrieve from it will be as of 9:00 am. Just like if you open a cursor (a resultset) at 9:00 am, the rows it produces will be as of that point in time. Even if someone else comes along and modifies the LOB data and commits (or not), your LOB locator will be as of 9:00 am, just like your resultset would be. Here, Oracle uses the LOBSEGMENT along with the read-consistent view of the LOBNIDEX to undo the changes to the LOB, to present you with the LOB data as it existed when you retrieved the LOB location. It does not use the undo information for the LOBSEGMENT, since none was generated for the LOBSEGMENT itself.

We can see that LOBs are read consistent easily. Consider this small table with an out-of-line LOB (it is stored in the LOBSEGMENT):

ops$tkyte@ORA10G> create table t
2 ( id int primary key,
3 txt clob
4 )
5 lob( txt) store as ( disable storage in row )
6 /
Table created.

ops$tkyte@ORA10G> insert into t values ( 1, 'hello world' );
1 row created.

ops$tkyte@ORA10G> commit;
Commit complete.

If we fetch out the LOB locator and open a cursor on this table as follows:

ops$tkyte@ORA10G> declare
2 l_clob clob;
3
4 cursor c is select id from t;
5 l_id number;
6 begin
7 select txt into l_clob from t;
8 open c;

and then we modify that row and commit:

9
10 update t set id = 2, txt = 'Goodbye';
11 commit;
12

well see upon working with the LOB locator and opened cursor that the data is presented as of the point in time we retrieved or opened them:

13 dbms_output.put_line( dbms_lob.substr( l_clob, 100, 1 ) );
14 fetch c into l_id;
15 dbms_output.put_line( 'id = ' || l_id );
16 close c;
17 end;
18 /

hello world
id = 1

PL/SQL procedure successfully completed.

But the data is most certainly updated/modified in the database:

ops$tkyte@ORA10G> select * from t;

ID TXT
---------- ---------------
2 Goodbye

The read-consistent images for the cursor C came from the undo segments, whereas the read-consistent images for the LOB came from the LOB segment itself.
So, that gives us a reason to be concerned: if the undo segments are not used to store rollback for LOBs and LOBs support read consistency, how can we prevent the dreaded ORA-1555: snapshot too old error from occurring? And, as important, how do we control the amount of space used by these old versions? That is where PCTVERSION comes into play.

PCTVERSION controls the percentage of allocated (used by LOBs at some point and blocks under the LOBSEGMENTs HWM) LOB space that should be used for versioning of LOB data. The default of 10 percent is adequate for many uses since many times you only ever INSERT and retrieve LOBs (updating of LOBs is typically not done; LOBs tend to be inserted once and retrieved many times). Therefore, not much space, if any, needs to be set aside for LOB versioning.

However, if you have an application that does modify the LOBs frequently, the default of 10 percent may be too small if you frequently read LOBs at the same time some other session is modifying them. If you hit an ORA-22924 while processing a LOB, the solution is not to increase the size of your undo tablespace, or increase the undo retention, or add more RBS space if you are using manual undo management. Rather you should use the following:

ALTER TABLE tabname MODIFY LOB (lobname) ( PCTVERSION n );

and increase the amount of space to be used in that LOBSEGMENT for versioning of data.

RETENTION Clause

This is a mutually exclusive alternative to the PCTVERSION clause, and it is valid when you are using automatic undo management in the database. Rather than reserving some percentage of space in the LOBSEGMENT to version LOBs, the RETENTION clause uses the same time-based mechanism to retain data. The database would have the parameter UNDO_RETENTION set to specify how long to retain undo information for consistent reads. This parameter would also apply to LOB data in this case.

It should be noted that you cannot specify the retention time using this clause; rather, it is inherited from the database setting of UNDO_RETENTION itself.
</quote>

A reader, March 10, 2006 - 1:39 pm UTC

On p. 544 of your book "Expert Oracle Database Architecture", you said, in reference to LOBs,

...
"If the table had millions of rows, and each row had a sizeable LOB associated with it, the LOB data would be huge. It would make sense to segregate the table from the LOB data just to facilitate backup and recovery and space management. You may well want a different unifrom extent size for your LOB data than you have for your regular table data, for example."
...

Can you please elaborate on how segregating the table from the LOB data can faciliate backup and recovery?

March 10, 2006 - 8:30 pm UTC

you break things up - for most people the unit of backup and recovery is historically "a tablespace".

If you have two large tablespaces
versus
one huge one...

BLOBs and Text Search

A reader, March 13, 2006 - 8:18 pm UTC

Would Oracle Text Search work for compressed BLOB data? Let's say we compress BLOBs in Java, store in the database, and then index the BLOB column with ctxsys.context type index. Would Oracle Text Search work in this case? If not, is there anyway we can create a function-based ctxsys.context type index to decompress the data before we perform the search?

March 14, 2006 - 10:05 am UTC

you would have to decompress the data, yes.

A reader, March 14, 2006 - 5:57 pm UTC

SQL> create table t
  2  ( id        int primary key,
  3    theBlob   blob
  4  )
  5  /

Table created.

SQL> 
SQL> CREATE INDEX t_blob_ind ON t(utl_compress.lz_uncompress(theBlob))
  2  INDEXTYPE IS ctxsys.context;
CREATE INDEX t_blob_ind ON t(utl_compress.lz_uncompress(theBlob))
                             *
ERROR at line 1:
ORA-30553: The function is not deterministic


Is there any way to get around this problem?  The goal is to be able to perform search using Oracle Text on compressed BLOB data (data were compressed using utl_compress.lz_compress API).

March 15, 2006 - 9:53 am UTC

write your own function.

</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:5533095920114 <code>

A reader, March 15, 2006 - 6:38 pm UTC

This is to follow up on the previous question of uncompressing a BLOB field before doing a context search.

I created a procedure as you suggested and indexed that procedure. However, when I query the table, no rows are returned. The PDF file does contain the word "test". Did I do something wrong?


SQL> 
SQL> drop table demo;

Table dropped.

SQL> 
SQL> create table demo
  2  ( id        INT PRIMARY KEY,
  3    theBlob   blob,
  4    dummy_col VARCHAR2(1)
  5  )
  6  /

Table created.

SQL> -- --------------------------------------------------------------
SQL> -- Load a PDF file into a BLOB field and compress the BLOB.
SQL> -- --------------------------------------------------------------
SQL> declare
  2      l_blob    blob;
  3      l_bfile   bfile;
  4  
  5  begin
  6      insert into demo (id, theBLOB) values ( 1, empty_blob() )
  7      returning theBlob into l_blob;
  8  
  9      l_bfile := bfilename( 'BLOB_DIR', 'Test.PDF' );
 10      dbms_lob.fileopen( l_bfile );
 11  
 12      dbms_lob.loadfromfile( l_blob,
 13                             l_bfile,
 14                             dbms_lob.getlength( l_bfile ) );
 15  
 16      UPDATE demo
 17      SET    theBlob = utl_compress.lz_compress(l_blob, 6)
 18      WHERE  id = 1;
 19  
 20      dbms_lob.fileclose( l_bfile );
 21      COMMIT;
 22  end;
 23  /

PL/SQL procedure successfully completed.

SQL> CREATE OR REPLACE PROCEDURE demo_proc (
  2     p_id  IN rowid,
  3     p_lob IN OUT blob
  4  )
  5  IS
  6     l_uncompressed_BLOB BLOB;
  7  
  8  BEGIN
  9     FOR c1 IN (select theBLOB from demo where rowid = p_id) LOOP
 10        l_uncompressed_BLOB := utl_compress.lz_uncompress(c1.theBLOB);
 11  
 12        -- assemble into p_lob ANY data you want indexed.
 13        dbms_lob.copy( p_lob,
 14                       l_uncompressed_BLOB,
 15                       dbms_lob.getlength( l_uncompressed_BLOB ));
 16     END LOOP;
 17  END;
 18  /

Procedure created.

SQL> begin
  2     ctx_ddl.drop_preference('demo_user_datastore');
  3     ctx_ddl.create_preference( 'demo_user_datastore', 'user_datastore' );
  4     ctx_ddl.set_attribute( 'demo_user_datastore', 'procedure', 'demo_proc' );
  5  end;
  6  /

PL/SQL procedure successfully completed.

SQL> 
SQL> begin
  2     ctx_ddl.drop_preference('my_lexer');
  3     ctx_ddl.create_preference( 'my_lexer', 'BASIC_LEXER' );
  4     ctx_ddl.set_attribute( 'my_lexer', 'base_letter', 'YES');
  5     ctx_ddl.set_attribute( 'my_lexer', 'mixed_case', 'YES' );
  6  end;
  7  /

PL/SQL procedure successfully completed.

SQL> CREATE INDEX demo_idx ON demo(dummy_col)
  2  INDEXTYPE IS CTXSYS.CONTEXT
  3  PARAMETERS('datastore demo_user_datastore lexer my_lexer');

Index created.

SQL> 
SQL> 
SQL> SELECT id FROM demo
  2  WHERE  CONTAINS(dummy_col, 'test') > 0;

no rows selected

SQL> SELECT COUNT(*) FROM demo;

  COUNT(*)
----------
         1

March 16, 2006 - 7:45 am UTC

never tried it with pdfs - is it getting filtered, when you query the dictionary tables text fills in with the keywords - does it look reasonably filled in?

A reader, March 16, 2006 - 1:13 pm UTC

The dictionary table did not return anything!

SQL> SELECT token_text FROM dr$demo_idx$i;

no rows selected

When you say "never tried it with pdfs," do you mean Oracle Text will not work with PDF files or do you think I used the wrong filter type or some other settings/parameters?

If I just load the PDF without compressing it and index the BLOB column, the query returns with the PDF file.

March 16, 2006 - 2:49 pm UTC

I've never done the custom function with pdfs.

A reader, March 16, 2006 - 6:19 pm UTC

Can you please give me some suggestions on how to get the custom function to work with PDF files? Thanks.

March 17, 2006 - 4:59 pm UTC

I'll have to take a look at it, but probably not until next week.

A reader, March 23, 2006 - 1:54 pm UTC

Hi Tom, sorry to bug you on this again. This is regarding writing custom functions on PDF files. As a matter of fact, I tried both Word doc as well as plain ASCII text files. None of them work with custom function. Would you be able to help?

A reader, April 01, 2006 - 5:14 pm UTC

Hi Tom. Did you get a chance to test custom functions with pdf/word/text files? thank you.

Oracle Text Search on Compressed BLOB Data

Su Baba, May 04, 2006 - 2:05 am UTC

Hi Tom,

Can you please give us pointers on how to get Oracle Text search to work on compressed BLOB data. I tried it with different file types: pdf, txt, word doc. None of them seem to work. I understand that pdf files may get filtered, but txt files should not get filtered right?

As mentioned by "A Reader" above (3/16/2006), Oracle Text seems to work with pdf files if the data is not compressed. However, I couldn't get it to work with compressed BLOBs.

May 04, 2006 - 2:17 am UTC

not sure that you can - the filter that can read pdf, doc, ppt, etc isn't expecting compressed data.

This might work

Tom, May 04, 2006 - 4:35 am UTC

Tom,

I think this can be done [though not in a way I would want to do it] by using the following metalink document

Note:261532.1 - Using Alternative Filters for Filtering Binary Files

essentially you write a batch / shell script which is called by the text filter [much as you suggested for debugging text errors]. This script could then unzip the file and pass it to the standard ctxhx executable for filtering. Messy but I think it would work.

May 04, 2006 - 7:10 am UTC

I thought about that - but - I'm assuming plsql and the use of utl_compress here, could be really messy.

But thanks for the note - if they want to experiment - they can.

How about a user datastore

Tom, May 04, 2006 - 11:09 am UTC

Following up the previous review I have a further possibility. In this article

</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:440419921146 <code>

you use a user datastore to concatenate columns together. Rather than doing this would it not be simply possible to use a user datastore and have the function uncompress the blob. This way the blob is stored compressed, but the index is on the uncompressed data.

I'd follow your mantra of "test it" but I am actually at work at the moment and supposed to be working on something else!

Su Baba, May 06, 2006 - 1:57 am UTC

What considerations should I take when making the architecture decision for the following requirements?

- Oracle Standard Edition
- The ability to search compressed and encrypted BLOBs.

What are the advantages and disadvantages of storing the BLOBs in the database as opposed to the file system given the above requirements? Is it even possible to do a Text search on compressed and encrypted BLOBs?

May 07, 2006 - 10:51 am UTC

you don't say what is in the blobs. does it need to be filtered.

And if it is encrypted, you'll really have to tell us the point of "search"? What does it meant to search an encrypted blobs.

Far far too many people (meaning more than zero) equate ENCRYPTION with ACCESS CONTROL.

encryption is good for protecting your data from theft. when someone steals your backup.

access control is *access control*. So, what are you using encryption here for?

separate tabelspace

A reader, June 21, 2006 - 10:29 am UTC

Hi

We have a new application whcih uses BLOB and CLOB. The application vendor gaves us the database creation scripts and it only creates two tablespaces, DATA and INDEX.

I was wondering what benefit would I gain if i put LOBs in their own tablespace?

June 22, 2006 - 11:24 am UTC

tablespaces are all about "administrative ease of use". They are not about "performance" (all tablespaces could be on same disk, we can stripe disk...)

The benefits you would be looking for would be "it makes MY life better to put them into a separate tablespace".

So, would it make your life better?

Pctversion & Retention on unstructured datatypes

Balakrishna, May 14, 2008 - 5:33 am UTC

Dear Tom,

Can you please tell the difference how Pctversion & retention works in 9i and obove releases , since undo_retention is auto. As per my understanding rentention period will be used in the 9i and above releases please correct me if i am wrong.

we are adding datafiles to segments which has lob datatypes continuously but we are not able toreuse the space. please guide me.

Regards

Balakrishna.

May 14, 2008 - 3:27 pm UTC

PCTVERSION Clause
\
The CREATE TABLE statement returned from DBMS_METADATA previously included the following:

LOB ("TXT") STORE AS ( ¿ PCTVERSION 10 ¿ )

This is used to control the read consistency of the LOB. In previous chapters, we¿ve discussed read consistency, multi-versioning, and the role that undo plays in that. Well, when it comes to LOBs, the way read consistency is implemented changes. The LOBSEGMENT does not use undo to record its changes; rather, it versions the information directly in the LOBSEGMENT itself. The LOBINDEX generates undo just as any other segment would, but the LOBSEGMENT does not. Instead, when you modify a LOB, Oracle allocates a new CHUNK and leaves the old CHUNK in place. If you roll back your transaction, the changes to the LOB index are rolled back and the index will point to the old CHUNK again. So the undo maintenance is performed right in the LOB segment itself. As you modify the data, the old data is left in place and new data created.
This comes into play for reading the LOB data as well. LOBs are read consistent, just as all other segments are. If you retrieve a LOB locator at 9:00 am, the LOB data you retrieve from it will be ¿as of 9:00 am.¿ Just like if you open a cursor (a resultset) at 9:00 am, the rows it produces will be as of that point in time. Even if someone else comes along and modifies the LOB data and commits (or not), your LOB locator will be ¿as of 9:00 am,¿ just like your resultset would be. Here, Oracle uses the LOBSEGMENT along with the read-consistent view of the LOBNIDEX to undo the changes to the LOB, to present you with the LOB data as it existed when you retrieved the LOB location. It does not use the undo information for the LOBSEGMENT, since none was generated for the LOBSEGMENT itself.
We can see that LOBs are read consistent easily. Consider this small table with an out-of-line LOB (it is stored in the LOBSEGMENT):
ops$tkyte@ORA10G> create table t
2 ( id int primary key,
3 txt clob
4 )
5 lob( txt) store as ( disable storage in row )
6 /
Table created.

ops$tkyte@ORA10G> insert into t values ( 1, 'hello world' );
1 row created.

ops$tkyte@ORA10G> commit;
Commit complete.
If we fetch out the LOB locator and open a cursor on this table as follows:
ops$tkyte@ORA10G> declare
2 l_clob clob;
3
4 cursor c is select id from t;
5 l_id number;
6 begin
7 select txt into l_clob from t;
8 open c;
and then we modify that row and commit:
9
10 update t set id = 2, txt = 'Goodbye';
11 commit;
12
we¿ll see upon working with the LOB locator and opened cursor that the data is presented ¿as of the point in time we retrieved or opened them¿:
13 dbms_output.put_line( dbms_lob.substr( l_clob, 100, 1 ) );
14 fetch c into l_id;
15 dbms_output.put_line( 'id = ' || l_id );
16 close c;
17 end;
18 /
hello world
id = 1

PL/SQL procedure successfully completed.
But the data is most certainly updated/modified in the database:
ops$tkyte@ORA10G> select * from t;

ID TXT
---------- ---------------
2 Goodbye
The read-consistent images for the cursor C came from the undo segments, whereas the read-consistent images for the LOB came from the LOB segment itself.
So, that gives us a reason to be concerned: if the undo segments are not used to store rollback for LOBs and LOBs support read consistency, how can we prevent the dreaded ORA-1555: snapshot too old error from occurring? And, as important, how do we control the amount of space used by these old versions? That is where PCTVERSION comes into play.
PCTVERSION controls the percentage of allocated (used by LOBs at some point and blocks under the LOBSEGMENT¿s HWM) LOB space that should be used for versioning of LOB data. The default of 10 percent is adequate for many uses since many times you only ever INSERT and retrieve LOBs (updating of LOBs is typically not done; LOBs tend to be inserted once and retrieved many times). Therefore, not much space, if any, needs to be set aside for LOB versioning.
However, if you have an application that does modify the LOBs frequently, the default of 10 percent may be too small if you frequently read LOBs at the same time some other session is modifying them. If you hit an ORA-22924 while processing a LOB, the solution is not to increase the size of your undo tablespace, or increase the undo retention, or add more RBS space if you are using manual undo management. Rather you should use the following:
ALTER TABLE tabname MODIFY LOB (lobname) ( PCTVERSION n );
and increase the amount of space to be used in that LOBSEGMENT for versioning of data.

RETENTION Clause

</quote>

"Snapshot Too Old" and undo_retention

A reader, May 09, 2011 - 4:34 pm UTC

Please see the following example. The first PL/SQL block hit the snapshot too old error as expected. The second PL/SQL block is the same as the first except that 2 lines of "dbms_output.put_line" calls were taken out. That, however, produced an unexpected result - the PL/SQL block completed successfully! Can you please explain this behavior? Thanks.

SQL> show parameter undo_retention

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
undo_retention                       integer     10
SQL> 
SQL> DROP TABLE t;

Table dropped.

SQL> 
SQL> CREATE TABLE t (
  2         id  INT PRIMARY KEY,
  3         txt CLOB
  4      )
  5  LOB(txt) STORE AS (DISABLE STORAGE IN ROW);

Table created.

SQL> 
SQL> INSERT INTO t
  2  SELECT level, 'Hello World ' || level
  3  FROM   dual
  4  CONNECT BY level <= 10;

10 rows created.

SQL> 
SQL> COMMIT;

Commit complete.

SQL> 
SQL> SELECT column_name, retention, pctversion, in_row
  2  FROM   user_lobs
  3  WHERE  table_name = 'T';

COLUMN_NAME           RETENTION PCTVERSION IN_
-------------------- ---------- ---------- ---
TXT                          10            NO

1 row selected.

SQL> 
SQL> -- -----------------------------------------------------------
SQL> -- Hit the snapshot too old error as expected.
SQL> -- -----------------------------------------------------------
SQL> DECLARE
  2   l_txt CLOB;
  3  
  4   CURSOR c_reader IS
  5      SELECT txt FROM t WHERE id = 10;
  6  
  7  BEGIN
  8   OPEN c_reader;
  9   FETCH c_reader INTO l_txt;
 10   dbms_output.put_line('txt=' || l_txt);
 11  
 12   UPDATE t
 13   SET    txt = LOWER(txt)
 14   WHERE  id = 10;
 15  
 16   COMMIT;
 17  
 18   dbms_lock.sleep(20);
 19  
 20   UPDATE t
 21   SET    txt = LOWER(txt)
 22   WHERE  id = 10;
 23  
 24   COMMIT;
 25  
 26   FETCH c_reader INTO l_txt;
 27   dbms_output.put_line('txt=' || l_txt);
 28   CLOSE c_reader;
 29  END;
 30  /
txt=Hello World 10
DECLARE
*
ERROR at line 1:
ORA-01555: snapshot too old: rollback segment number  with name "" too small
ORA-22924: snapshot too old
ORA-06512: at line 27


SQL> 
SQL> -- -----------------------------------------------------------
SQL> -- Took out the dbms_output.put_line lines. This time it
SQL> -- ran successfully.
SQL> -- -----------------------------------------------------------
SQL> DECLARE
  2   l_txt CLOB;
  3  
  4   CURSOR c_reader IS
  5      SELECT txt FROM t WHERE id = 10;
  6  
  7  BEGIN
  8   OPEN c_reader;
  9   FETCH c_reader INTO l_txt;
 10  
 11   UPDATE t
 12   SET    txt = LOWER(txt)
 13   WHERE  id = 10;
 14  
 15   COMMIT;
 16  
 17   dbms_lock.sleep(20);
 18  
 19   UPDATE t
 20   SET    txt = LOWER(txt)
 21   WHERE  id = 10;
 22  
 23   COMMIT;
 24  
 25   FETCH c_reader INTO l_txt;
 26   CLOSE c_reader;
 27  END;
 28  /

PL/SQL procedure successfully completed.

SQL> 
SQL> spool off

May 09, 2011 - 5:57 pm UTC

and what happened if you left the dbms_output's in. Why do you think they have any relation to the issue at all?

All this means is the second time around, the read consistent image we needed was still in the lob segment - you had made it grow enough with your prior update activity.

A reader, May 09, 2011 - 8:12 pm UTC

Because no matter what I did, I could not get it to produce the snapshot too old error as long as there is no dbms_output.put_line call in the code. It puzzled me and I couldn't figure it out. The example below shows just one case. I've played with this for a while now in different scenarios. The behavior (not hitting snapshot too old) seems consistent in the absence of the dbms_output.put_line call. It makes no sense but it's pretty consistent in all my test cases.

SQL> show parameter undo_retention

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
undo_retention                       integer     10
SQL> 
SQL> DROP TABLE t;

Table dropped.

SQL> 
SQL> CREATE TABLE t (
  2         id  INT PRIMARY KEY,
  3         txt CLOB
  4      )
  5  LOB(txt) STORE AS (DISABLE STORAGE IN ROW);

Table created.

SQL> 
SQL> INSERT INTO t
  2  SELECT level, 'Hello World ' || level
  3  FROM   dual
  4  CONNECT BY level <= 10;

10 rows created.

SQL> 
SQL> COMMIT;

Commit complete.

SQL> 
SQL> SELECT column_name, retention, pctversion, in_row
  2  FROM   user_lobs
  3  WHERE  table_name = 'T';

COLUMN_NAME           RETENTION PCTVERSION IN_
-------------------- ---------- ---------- ---
TXT                          10            NO

1 row selected.

SQL> 
SQL> -- -----------------------------------------------------------
SQL> -- Hit the snapshot too old error as expected.
SQL> -- -----------------------------------------------------------
SQL> DECLARE
  2   l_txt CLOB;
  3  
  4   CURSOR c_reader IS
  5      SELECT txt FROM t WHERE id = 10;
  6  
  7  BEGIN
  8   OPEN c_reader;
  9   FETCH c_reader INTO l_txt;
 10  
 11   UPDATE t
 12   SET    txt = LOWER(txt)
 13   WHERE  id = 10;
 14  
 15   COMMIT;
 16  
 17   dbms_lock.sleep(20);
 18  
 19   UPDATE t
 20   SET    txt = LOWER(txt)
 21   WHERE  id = 10;
 22  
 23   COMMIT;
 24  
 25   FETCH c_reader INTO l_txt;
 26   CLOSE c_reader;
 27  END;
 28  /

PL/SQL procedure successfully completed.

May 10, 2011 - 7:58 am UTC

After looking at this closer, it is obvious.

If you don't reference the clob, why would it throw an ora-1555? You only referenced the clob (actually tried to access the data it pointed to) when you tried to dbms_output it. If you didn't dbms_output it, you didn't reference it, it would never throw an ora-1555.

You have to reference it, it doesn't take dbms_output.

Your code on line 25 is in error by the way, you already fetched it, you are forgetting to check "c_reader%found" and do proper error handling.

This is all you need to see the 1555.

ops$tkyte%ORA11GR2> DECLARE
  2      l_txt CLOB;
  3      l_data long;
  4  
  5      CURSOR c_reader IS SELECT txt FROM t WHERE id = 10;
  6  BEGIN
  7      OPEN c_reader;
  8      FETCH c_reader INTO l_txt;
  9      CLOSE c_reader;
 10  
 11      UPDATE t SET    txt = LOWER(txt) WHERE  id = 10;
 12      COMMIT;
 13  
 14      dbms_lock.sleep(20);
 15  
 16      UPDATE t SET    txt = LOWER(txt) WHERE  id = 10;
 17      COMMIT;
 18  
 19      l_data := l_txt;
 20  END;
 21  /
DECLARE
*
ERROR at line 1:
ORA-01555: snapshot too old: rollback segment number  with name "" too small
ORA-22924: snapshot too old
ORA-06512: at line 19

A reader, May 10, 2011 - 12:05 pm UTC

So what does FETCH do exactly? Does it not reference the CLOB or is it simply holding the pointer to the CLOB?

I noticed that you use the long datatype for l_data. I tried your example but used CLOB for l_data instead. It did not produce the snapshot too old error. Why doesn't CLOB do the same thing?

thanks

May 10, 2011 - 12:29 pm UTC

The FETCH just gets the lob locator, it doesn't dereference it, it doesn't actually touch the lob segment at all.

LONG is a varchar2(32765) in plsql - it is a string.
CLOB is just a pointer

if you assign a clob to a string, it is like this:

str := TO_CHAR(clob);

it converts the clob into a string, in order to do that, it has to actually GET the data.

If you assign a clob to a clob, that is an "fast" thing, you are just assigning a pointer to a pointer. We don't have to copy anything else.

"Snapshot Too Old" and pctversion

A reader, May 12, 2011 - 8:01 pm UTC

How does pctversion for lob segments work? If it's set to 10, does that mean 10% of the space below the high water mark is allocated for versioning?

In the example below, I tried to hit the "snapshot too old" error. I thought the second UPDATE operation would wipe out the UNDO needed by the reader, but the script completed successfully.

SQL> 
SQL> DROP TABLE t;

Table dropped.

SQL> 
SQL> CREATE TABLE t (
  2         id  INT PRIMARY KEY,
  3         txt CLOB
  4      )
  5  LOB(txt) STORE AS (DISABLE STORAGE IN ROW);

Table created.

SQL> 
SQL> INSERT INTO t
  2  SELECT level, 'Hello World ' || level
  3  FROM   dual
  4  CONNECT BY level <= 10;

10 rows created.

SQL> 
SQL> COMMIT;

Commit complete.

SQL> 
SQL> ALTER TABLE t MODIFY LOB (txt) (PCTVERSION 10);

Table altered.

SQL> 
SQL> 
SQL> SELECT column_name, retention, pctversion, in_row
  2  FROM   user_lobs
  3  WHERE  table_name = 'T';

COLUMN_NAME                     RETENTION PCTVERSION IN_
------------------------------ ---------- ---------- ---
TXT                                               10 NO

1 row selected.

SQL> 
SQL> DECLARE
  2   l_txt  CLOB;
  3   l_data LONG;
  4  
  5   CURSOR c_reader IS
  6      SELECT txt FROM t WHERE id = 10;
  7  
  8  BEGIN
  9   OPEN c_reader;
 10   FETCH c_reader INTO l_txt;
 11   CLOSE c_reader;
 12  
 13   UPDATE t
 14   SET    txt = LOWER(txt)
 15   WHERE  id = 10;
 16  
 17   COMMIT;
 18  
 19   -- ------------------------------------------------------------
 20   -- Shouldn't this UPDATE operation wipe out the UNDO needed
 21   -- by the reader?
 22   -- ------------------------------------------------------------
 23   FOR i In 1..10 LOOP
 24      UPDATE t
 25      SET   txt = LOWER(txt)
 26      WHERE  id  = i;
 27  
 28      COMMIT;
 29   END LOOP;
 30  
 31   l_data := l_txt;
 32  END;
 33  /

PL/SQL procedure successfully completed.

SQL> 
SQL> spool off

May 13, 2011 - 12:45 pm UTC

It reserves about 10% of the allocated space for 'undo' - however, it is not a LIMIT on the maximum amount of undo. It can save as much as it wants.

There is also the additional fact that commits in plsql are asynchronous by default - you might not be able to emulated everything you want in a single plsql block like that.

Pctversion

A reader, May 13, 2011 - 2:51 pm UTC

I'm trying to get a grasp on pctversion. Can you come up with a simple example on how it actually works? What do I need to do to hit the "snapshot too Old" error? I can easily do that when pctversion is set to 0. However, when it's set to a value greater than 0, I have not been able to consistently hit the error. Obviously, my understanding is not complete. A simple example would definitely help clarify the confusion. THanks.

May 18, 2011 - 2:39 am UTC

pctversion saves about that percentage for undo from the total allocated space. If you have pctversion 10, it'll save about 10% of the allocated space for previous versions.

About 10%
Not precisely 10%

and it is under no obligation to immediately expire undo at any given point in time.

That you can sometimes hit it (and sometimes not) at precisely the point you think you should goes towards that "about X%"

A reader, May 13, 2011 - 2:51 pm UTC

"There is also the additional fact that commits in plsql are asynchronous by default - you might not be able to emulated everything you want in a single plsql block like that."

Can you please elaborate on this statement?

May 18, 2011 - 2:42 am UTC

commits in plsql do not wait for lgwr to finish writing to the redo log, they do not wait for the commit to "finish", they just post lgwr and ask him to commit.

therefore, your commits may not yet be done in your example until the plsql block completes.

for example:

begin
   for i in 1 .. 100
   loop
      insert into t values ( i );
      commit;  -- async, does not wait for commit to finish
   end loop;

   << IMPLICIT WAIT FOR LAST COMMIT TO ACTUALLY FINISH >>
end;

is how plsql would work under the covers - only when the block of code submitted by the client application is finished does plsql wait for the commits it performed to be "done".

So, the commits in the above example - which in theory would/could release the undo - might not yet be completed until the example is basically "done" already.

Binary data in CLOB and VARCHAR2

Rittick, December 17, 2011 - 8:41 pm UTC

I have data that varies from 2000 to 50000 bytes. 90 % of the data is less than 4000 bytes. The data is encrypted before storage. My understanding is that VARCHAR2 is more efficient in terms of storage & retrieval. My questions are:

1. What the potential problems of storing encrypted data in a varchar2 or clob
2. Would I get any performance gains by storing data less than 4k in a varchar2 instead of a clob. Data greater than 4k bytes will of course be stored in a clob.
3. Or Is Blob the only option

Thanks for your help.

December 18, 2011 - 5:23 am UTC

My understanding is that
VARCHAR2 is more efficient in terms of storage & retrieval.

that doesn't matter - you pick a data type based on your storage needs, nothing else.

You use dates for dates
You use numbers for numbers
You use strings for strings
You use blobs or raw for binary data

1) numerous. character set translation is the first one. Unless every client that ever connects to your database uses the same exact character set as the database uses - character set translation will take place, you'll scramble your encrypted data like an egg. It will effectively corrupt it.

even if you use the same - you will still have character set issues when you access this data. The data in the column will NOT conform to the rules of the character set you are using.

also, you have to use a clob, a clob is not a varchar2 - so it is not even relevant to thinks about varchar2.

and lastly, a varchar2 is not "more efficient in terms of storage and retrieval" in general. If you have a date, a number - it would be horribly inefficient to store them in a varchar2. If you have data that exceeds 4000 bytes, it would be pretty much not possible to store it in a varchar2. A varchar2 is used when you have text data of 4000 bytes or less - for nothing else.

2) we already inline data like that - in both a blob and a clob. Just use a blob. (raw and blob are 'faster' than varchar2 and clob by the way - they are the 'fastest' types, we don't have to do anything to them, just store them)

3) blob is the correct approach, yes.

More to Explore

CLOB/BLOB/etc

Complete documentation on Securefiles and Large Objects here

Breadcrumb

Question and Answer

You Asked

and Tom said...

Rating

Comments

LOB CHUNK size and index clause

Any BLOB limitations

Weird Error with BLOB datatype

lmts autoallocate / assm

Using BLOB instead of LONG!

imp blob

upload files

LOB contention

Elaboration on LOB contention

LOB contention

Re-initializing CLOBs to save space

BLOBs and Text Search

Oracle Text Search on Compressed BLOB Data

This might work

How about a user datastore

separate tabelspace

Pctversion & Retention on unstructured datatypes

"Snapshot Too Old" and undo_retention

"Snapshot Too Old" and pctversion

Pctversion

Binary data in CLOB and VARCHAR2

More to Explore

CLOB/BLOB/etc

efficient use of BLOB

Breadcrumb

Question and Answer

You Asked

and Tom said...

Rating

Comments

LOB CHUNK size and index clause

Any BLOB limitations

Weird Error with BLOB datatype

lmts autoallocate / assm

Using BLOB instead of LONG!

imp blob

upload files

LOB contention

Elaboration on LOB contention

LOB contention

Re-initializing CLOBs to save space

BLOBs and Text Search

Oracle Text Search on Compressed BLOB Data

This might work

How about a user datastore

separate tabelspace

Pctversion & Retention on unstructured datatypes

"Snapshot Too Old" and undo_retention

"Snapshot Too Old" and pctversion

Pctversion

Binary data in CLOB and VARCHAR2

More to Explore

CLOB/BLOB/etc