Skip to Main Content
  • Questions
  • Break down large text into chunks of different formats

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Dan.

Asked: May 14, 2006 - 11:40 am UTC

Last updated: May 15, 2006 - 8:26 pm UTC

Version: 9.2.0

Viewed 1000+ times

You Asked

Hi Tom,

I have 2 tables:
a)
Table BIGTEXT
2 columns:
ID NUMBER
TEXT CLOB

CREATE TABLE bigtext(id NUMBER, text CLOB);
INSERT INTO bigtext VALUES(1, 'This is a really long text that needs to be broken down into chunks of various sizes');

TEXT could also be VARCHAR2(4000).

b)
Table BROKENDOWN
4columns:
HOWMANY NUMBER
WORD1 VARCHAR2(200)
WORD2 VARCHAR2(200)
WORD3 VARCHAR2(200)

CREATE TABLE brokendown
(
howmany NUMBER,
word1 VARCHAR2(200),
word2 VARCHAR2(200),
word3 VARCHAR2(200)
);

Supposing table BIGTEXT only contains 1 row with ID = 1 and
TEXT = 'This is a really long text that needs to be broken down into chunks of various sizes'
I need to write a generic procedure that receives a parameter called HOWMANY, that does the following:

If HOWMANY = 1, it should perform INSERTs of the kind below:

INSERT INTO brokendown VALUES(1,'This',NULL,NULL);
INSERT INTO brokendown VALUES(1,'is',NULL,NULL);
INSERT INTO brokendown VALUES(1,'a',NULL,NULL);
INSERT INTO brokendown VALUES(1,'really',NULL,NULL);
INSERT INTO brokendown VALUES(1,'long',NULL,NULL);
..

If HOWMANY = 2:

INSERT INTO brokendown VALUES(2,'This','is',NULL);
INSERT INTO brokendown VALUES(2,'is','a',NULL);
INSERT INTO brokendown VALUES(2,'a','really',NULL);
INSERT INTO brokendown VALUES(2,'really','long',NULL);
INSERT INTO brokendown VALUES(2,'long','text',NULL);
..

If HOWMANY = 3:

INSERT INTO brokendown VALUES(3,'This','is','a');
INSERT INTO brokendown VALUES(3,'is','a','really');
INSERT INTO brokendown VALUES(3,'a','really','long');
INSERT INTO brokendown VALUES(3,'really','long','text');
INSERT INTO brokendown VALUES(3,'long','text','that');
..

I would appreciate a hint on how to best approach the problem and find a well-performing solution that consumes the least amount of resources.
Thanks,
-Dan


and Tom said...

This is just procedural code? You'll

a) read clob
b) parse clob
c) insert


psuedo code:


for x in ( select * from bigtext )
loop
while text is not yet null
loop
for i in 1 .. howmany
loop
array(i) := word from text
text := text minus that word (shrink the text as you parse it)
end loop
insert into brokendown values ( x.id, array(1), array(2), array(3) );
end loop
end loop


Rating

  (2 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

10g can use regexp_substr to parse words

Kirill, May 15, 2006 - 12:25 pm UTC

Hi,

If you were in 10g, this would be another way to do it using straight SQL:

select regexp_substr (
'This is a really long text that needs to be
broken down into chunks of various sizes',
'[[:alnum:]_]+',1,rownum)
from dual
connect by level <= :howmany

... and to get all of the words:

select regexp_substr (
'This is a really long text that needs to be
broken down into chunks of various sizes',
'[[:alnum:]_]+',1,rownum)
from dual
connect by level <= regexp_instr (
'This is a really long text that needs to be
broken down into chunks of various sizes',
'[[:alnum:]_]+',1,level);

REGEXP_SUBSTR('THISISAREALLYLONGTEXTTHATNEEDSTOBEBROKENDOWNINTOCHUNKSOFVARIOUSSI
--------------------------------------------------------------------------------
This
is
a
really
long
text
that
needs
to
be
broken
down
into
chunks
of
various
sizes

17 rows selected.

</code> http://download-east.oracle.com/docs/cd/B19306_01/appdev.102/b14251/adfns_regexp.htm#CHDIDJJC <code>



But what about...

Chris Poole, May 15, 2006 - 7:25 pm UTC

Hi Tom,

Am I right in thinking that creating a text index would also do the same? You could then query the underlying DR$<blah>$I table. Would this not be easier as the code is already written?

Just wondering,

Chris Poole


Tom Kyte
May 15, 2006 - 8:26 pm UTC

I think using a text index to get at the tokens would be "overkill"

besides, you would lose the order and the "cardinality", as well as stopwords ;)



More to Explore

CLOB/BLOB/etc

Complete documentation on Securefiles and Large Objects here