Skip to Main Content

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, San.

Asked: February 07, 2001 - 12:48 pm UTC

Last updated: December 20, 2005 - 6:51 pm UTC

Version: 8.1.5

Viewed 1000+ times

You Asked

Tom:

We are trying to store XML pages into CLOB datatype. How do I make a serch for a specific string on that XML page? Can we use Inter Media or some thing..if yes, please let me know How?

Regards,
San



and Tom said...

I asked Joel Kallman, our resident interMedia expert to take a look at this. He gave this very detailed example that shows how to use interMedia to index XML and do section searching on the attributes contained within an XML document:


InterMedia Text supports indexing of XML via
the specification of a section group. A section group
is a collection of predefined sections of a document.

interMedia Text sections let you search for text
within a particular named section, rather than
across an entire document. This can dramatically improve
the accuracy of searching across a set of tagged documents.




1) Firstly, create the table to store our XML documents

create table employee_xml(
id number primary key,
xmldoc clob )
/


2) Insert a sample document (the DTD is not required)

insert into employee_xml values( 1,
'<?xml version="1.0"?>
<!DOCTYPE employee [
<!ELEMENT employee (Name, Dept, Title)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Dept (#PCDATA)>
<!ELEMENT Title (#PCDATA)>
]>
<employee>
<Name>Joel Kallman</Name>
<Dept>Oracle Service Industries Technology Group</Dept>
<Title>Technologist</Title>
</employee>');



3) Create our interMedia Text section group called
'xmlgroup', Add the tags Name and Dept to the section
group.

(Caution: in XML, tag names are case-sensitive, but
tag names in section groups are case-insensitive)

begin
ctx_ddl.create_section_group('xmlgroup', 'XML_SECTION_GROUP');
ctx_ddl.add_zone_section( 'xmlgroup', 'Name', 'Name' );
ctx_ddl.add_zone_section( 'xmlgroup', 'Dept', 'Dept' );
end;
/



4) Create our interMedia Text index, specifying the section
group
we created above. Also, specify the
null_filter, as the Inso
filter is not required.

create index employee_xml_index on employee_xml( xmldoc )
indextype is ctxsys.context
parameters( 'filter ctxsys.null_filter section group xmlgroup' )
/



5) Now, execute a query, searching for a name within a
specific section

select id from employee_xml where
contains(xmldoc, 'Joel within Name') > 0;


6) Only non-empty tags will be indexed, but not the tag names
themselves.
Thus, the following queries will return
zero rows.


select id from employee_xml
where contains(xmldoc, 'title') > 0;

select id from employee_xml
where contains(xmldoc, 'employee') > 0;


7) But the following query will locate our document, even
though
we have not defined Title as a section.

select id from employee_xml where contains(xmldoc, 'Technologist') > 0;





Let's say you want to get going right away with indexing XML, and don't want to have to specify sections for every element in your XML document collection. You can do this very easily by using the predefined AUTO_SECTION_GROUP. This section group is exactly like the XML section group, but the pre-definition of sections is not required. For all non-empty tags in your document, a zone section will be created with the
section name the same as the tag name.

Use of the AUTO_SECTION_GROUP is also ideal when you may not know in advance all of the tag names that will be a part of your XML document set.



8) Drop our existing interMedia Text index.

drop index employee_xml_index
/


9) And this time, recreate it specifying the
AUTO_SECTION_GROUP.

We do not need to predefine the sections of our group, it is
handled for us automatically.

create index employee_xml_index on employee_xml( xmldoc )
indextype is ctxsys.context
parameters( 'filter ctxsys.null_filter section group ctxsys.auto_section_group' )
/


10) And once again, we should be able to locate our document using a section search

select id from employee_xml
where contains(xmldoc, 'Technologist within Title') > 0;





interMedia Text in 8.1.6 and 8.1.7 has added many exciting features in support of XML. For example, AUTO_SECTION_GROUP also automatically indexes tag attributes. Additionally, in 8.1.6 and later, you can define sections qualified with a doctype delimiter. This avoids the problem of tag "collisions" when you may have different document types with identical tag names.

These and many other new features can be reviewed at:


</code> http://technet.oracle.com/sample_code/products/text/htdocs/text_samples/Lite/Samples/imt_816_techover.html <code>
...

Rating

  (10 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Simply super

anthony, September 18, 2002 - 11:26 am UTC

Its a superp explanation.

Empty Tags

Rukshan Soza, December 10, 2002 - 5:58 am UTC

Tom,

Can you please explain how we can find documents with empty tags?. You explained here that empty tags are not indexes. We have a need to query based on empty tags.

Also, we already have an intermedia index to do free text search with "contains" clause. If we want to add increase functionality to search via tags then do we create a new index with auto xml sections or can we modify the existing index with alter command. An example would be much appreciated.

Many Thanks,

Rukshan

Tom Kyte
December 17, 2002 - 11:14 am UTC

I asked Sean Dillon, our local XML technologist, to take a look at this and here's what he had to say:
--

In 8.1.5, searching for empty tags will require you to use the XML Developer's Kits. This means in order to check an element to see if it's empty, you have to load it up into a DOM or SAX parser and perform an XPath expression against the document. This is a programmatic endeavor, and will be CONSIDERABLY processor and memory intensive (If your documents aren't too large, I would strongly urge you to use the DOM parser for this, just keep in mind the entire document needs to fit into memory). It's resource intensive because basically you'll have to full scan the table and for each row perform a series of operations. Parse the document into memory, then perform the XPath expression to check for an empty string. If you MUST do this in 8.1.5, I would think of alternative ways to accomplish what you're trying to do. This all depends on the behavior of your application of course, but maybe you want to try checking to see if an element is null when you insert the row, and then add a column to the table that indicates whether it's null or not. That way, you can index that column and quickly find those rows that satisfy the "null element" requirement.

In Oracle9i Release 2, this is exponentially easier. Not only do you not have to write the code to perform the parse and XPath search, all you have to do is use a built-in function to the XMLType data type known as "EXTRACT" or "EXISTSNODE". You can also create function-based indexes on XPath expressions in 9iR2, which would also speed up your searches considerably.

If you want to play with XPath expressions, a fellow Oracle employee, Steve Meunch, wrote a book entitled "Building Oracle XML Applications" for O'Reilly. In it, he has a utility called "XPathGrep" that lets you quickly perform an XPath expression against an XML document. As soon as you get the XPath expression you need to look for an empty tag, you've got your answer. You can find all the code for Steve's book on O'Reilly's website, or you can find the two classes needed (along with the XML Developer's Kits, of course) for XPathGrep here: </code> http://asktom.oracle.com/~sdillon/xml/xpathgrep.tar

For the intermedia index, I'm not so sure I understand what you mean by "search via tags".  This is precisely what the CONTAINS clause does for you, it lets you search for text or values within particular elements.

_____________________________________________________________________
Sean Dillon - Principal Technologist, Oracle Corporation
Author "Beginning Oracle Programming" 
http://www.amazon.com/exec/obidos/ASIN/186100690X <code>

CLOB

Art, December 17, 2002 - 7:39 pm UTC

Tom Kyte
December 18, 2002 - 11:00 am UTC

goto technet
goto documentation
goto database
goto text

its all there.

Great Explaination

Anirudh Sharma, March 17, 2003 - 7:16 am UTC

Hi Tom
Actually we also have a same sort of requiremnet, in the sense that we also need to store the XML file data in the database and then go for the search. I have not worked much on interMedia before what I would like to know is how can we store a really long XML file in the CLOB column(file with about 10K lines). Tried it using SQLLDR but it wasn't fruitful. It seems to be storing only the path that I gave. Kindly rectify.
ctl file for SQLLDR
load data
INFILE 'd:\ldr.dat'
INTO TABLE inter_XML
APPEND
FIELDS TERMINATED BY ','
(id,
ext_fname FILLER CHAR(80),
xml_doc LOBFILE(ext_fname) TERMINATED BY EOF)
-------
file ldr.dat has path to the xml file:-
1,D:\test.xml
-----

Or if there is any other way to acheive this coz simple insert statment gave me string too long error;
Thanks

Tom Kyte
March 17, 2003 - 9:28 am UTC

ops$tkyte@ORA920> create table inter_xml( id number, xml_doc clob );

Table created.

And then after running sqlldr:

ops$tkyte@ORA920> select * from inter_xml;

        ID XML_DOC
---------- --------------------------------------------------------------------------------
         1 load data
           INFILE *
           INTO TABLE inter_XML
           APPEND
           FIELDS TERMINATED BY ','
           (id,
            ext_fname FILLER CHAR(80),
            xml_doc LOBFILE(ext_fname) TERMINATED BY EOF)
           begindata
           1,/home/tkyte/t.ctl



I loaded my ctl file -- so you can see it.  You've done something wrong, such as using a control file different then the one posted here.  Your CTL file is dandy, you made some other mistake. 

Valar, January 28, 2004 - 9:29 am UTC

Simply Brillant Explanation with concise examples !!

How Use InterMedia to get this..

Arangaperumal, July 09, 2004 - 7:54 am UTC

Hi Tom,
My table and data are as below.
By using Intermedia text search i need to get records as given.




create table sample
(
id number PRIMARY KEY,
name CLOB
)


INSERT INTO SAMPLE VALUES(1,'Some data |need to| be inserted |here.It has to')
INSERT INTO SAMPLE VALUES(2,'Here| no need|to enter |somedata. because| testing ')
INSERT INTO SAMPLE VALUES(3,'|Why data? no data| and anything')
INSERT INTO SAMPLE VALUES(4,'InterMedia|search can be|used here| and |need more| information')
INSERT INTO SAMPLE VALUES(5,'all -search are not- equal to -sinter- media')


if contains(name,'\|need')>0 then


1 |need to
4 |need more


if contains(name,'\|some') > 0 then


2 |somedata. because



if contains(name,'\|s')>0 then

2 |some data because
4 |search can be



if contains(name,'\-s') then

5 -search are not
5 -sinter



how can achive this?
my output must be from first special character to next special character.
I can also achive as last one also.

Thank you


Tom Kyte
July 09, 2004 - 8:11 am UTC

you will have to parse it -- Oracle text finds text - if you have special processing needs (eg: there is no such thing as a "special character" beyond your definition of it -- in your problem domain, you have a "unique" problem here) you'll need to implement them.

text can mark up the tokens that it finds (that is documented), but you would have to find the "end of the special text" on your own after that.

Date stored in the XML issues

Jon, December 20, 2005 - 3:51 pm UTC

I have followed this thread and implemented a table with a CLOB field and an Oracle Text index on it. It works pretty well except for searching on a date value stored in the XML.

Here are some test cases:

create table test_table
(log_timestamp date not null,
xml_message clob not null);

insert into test_table values (sysdate, '<?xml version="1.0"?>
<LogInfo>
<LogTimeStamp>2005-08-25T10:19:28.2115478-04:00</LogTimeStamp>
</LogInfo>');

insert into test_table values (sysdate, '<?xml version="1.0"?>
<LogInfo>
<LogTimeStamp>2005/08/25T10:19:28.2115478_04:00</LogTimeStamp>
</LogInfo>');

insert into test_table values (sysdate, '<?xml version="1.0"?>
<LogInfo>
<LogTimeStamp>20050925</LogTimeStamp>
</LogInfo>');

insert into test_table values (sysdate, '<?xml version="1.0"?>
<LogInfo>
<LogTimeStamp>2005-09-25</LogTimeStamp>
</LogInfo>');

insert into test_table values (sysdate, '<?xml version="1.0"?>
<LogInfo>
<LogTimeStamp>2005/09/25</LogTimeStamp>
</LogInfo>');

insert into test_table values (sysdate, '<?xml version="1.0"?>
<LogInfo>
<LogTimeStamp>2005:09:25</LogTimeStamp>
</LogInfo>');

insert into test_table values (sysdate, '<?xml version="1.0"?>
<LogInfo>
<LogTimeStamp>2005|09|25</LogTimeStamp>
</LogInfo>');


commit;


create index dnl.test_table_idx on dnl.test_table( xml_message )
indextype is ctxsys.context
parameters( 'filter ctxsys.null_filter section group ctxsys.auto_section_group' );

--test cases:

select log_timestamp, xml_message
from test_table t
where contains(t.xml_message, '2005-08-25T10 within LogTimeStamp') > 0;

--4 rows but none contain 2005-08-25

select log_timestamp, xml_message
from test_table t
where contains(t.xml_message, '2005-08-25T10:19:28.2115478-04:00 within LogTimeStamp') > 0;

--4 rows but none contain 2005-08-25

select log_timestamp, xml_message
from test_table t
where contains(t.xml_message, '2005/08/25T10:19:28.2115478_04:00 within LogTimeStamp') > 0;

-- no rows

select log_timestamp, xml_message
from test_table t
where contains(t.xml_message, '2005/08/25T10 within LogTimeStamp') > 0;

--both rows that have 2005 and 08. It should only be one row.

select log_timestamp, xml_message
from test_table t
where contains(t.xml_message, '2005:09 within LogTimeStamp') > 0;

--four rows that have 2005 and 09. It should be just one row.

select log_timestamp, xml_message
from test_table t
where contains(t.xml_message, '2005|09 within LogTimeStamp') > 0;

-- all 6 rows

select log_timestamp, xml_message
from test_table t
where contains(t.xml_message, '2005/09 within LogTimeStamp') > 0;

--four rows that have 2005 and 09. it should be just one row.

It is obviously having problems with /, :, and | but how else can I create a date field in the XML that I can index properly? What other characters are "special" that I shouldn't have in the XML?


Tom Kyte
December 20, 2005 - 6:51 pm UTC

not sure exactly how you want the searching to do - what should match, but it'll be a combination of

o printjoins (to index the string properly) 
o text search options

For example, exact searching:

ops$tkyte@ORA10GR2> begin
  2     ctx_ddl.create_preference('my_lexer','BASIC_LEXER');
  3     ctx_ddl.set_attribute('my_lexer','printjoins','/-:');
  4  end;
  5  /

PL/SQL procedure successfully completed.

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> create index test_table_idx on test_table( xml_message )
  2      indextype is ctxsys.context
  3      parameters( 'filter ctxsys.null_filter section group ctxsys.auto_section_group lexer my_lexer' );

Index created.

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> select log_timestamp, xml_message
  2  from test_table t
  3  where contains(t.xml_message,  '{2005-08-25T10} within LogTimeStamp') > 0;

no rows selected

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> select log_timestamp, xml_message
  2  from test_table t
  3  where contains(t.xml_message,  '{2005-08-25T10:19:28.2115478-04:00} within LogTimeStamp') > 0;

LOG_TIMES XML_MESSAGE
--------- --------------------------------------------------------------------------------
20-DEC-05 <?xml version="1.0"?>
          <LogInfo>
            <LogTimeStamp>2005-08-25T10:19:28.2115478-04:00</LogTimeStamp>
          </LogInfo>


ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> select log_timestamp, xml_message
  2  from test_table t
  3  where contains(t.xml_message,  '{2005/08/25T10:19:28.2115478_04:00} within LogTimeStamp') > 0;

LOG_TIMES XML_MESSAGE
--------- --------------------------------------------------------------------------------
20-DEC-05 <?xml version="1.0"?>
          <LogInfo>
            <LogTimeStamp>2005/08/25T10:19:28.2115478_04:00</LogTimeStamp>
          </LogInfo>


ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> select log_timestamp, xml_message
  2  from test_table t
  3  where contains(t.xml_message,  '{2005/08/25T10} within LogTimeStamp') > 0;

no rows selected

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> select log_timestamp, xml_message
  2  from test_table t
  3  where contains(t.xml_message,  '{2005:09} within LogTimeStamp') > 0;

no rows selected

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> select log_timestamp, xml_message
  2  from test_table t
  3  where contains(t.xml_message,  '{2005|09} within LogTimeStamp') > 0;

LOG_TIMES XML_MESSAGE
--------- --------------------------------------------------------------------------------
20-DEC-05 <?xml version="1.0"?>
          <LogInfo>
            <LogTimeStamp>2005|09|25</LogTimeStamp>
          </LogInfo>


ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> select log_timestamp, xml_message
  2  from test_table t
  3  where contains(t.xml_message,  '{2005/09} within LogTimeStamp') > 0;

no rows selected
 

Jon, December 27, 2005 - 11:32 am UTC

begin
ctx_ddl.create_preference('my_lexer','BASIC_LEXER');
ctx_ddl.set_attribute('my_lexer','printjoins','/-:|');
end;

drop index test_table_idx;

create index test_table_idx on test_table( xml_message )
indextype is ctxsys.context
parameters( 'filter ctxsys.null_filter section group ctxsys.auto_section_group lexer my_lexer' );

--now running the query and escaping the - and adding a % for a wildcard:

select log_timestamp, xml_message
from test_table t
where contains(t.xml_message, '2005\-08\-25T10% within LogTimeStamp') > 0;

1 row returned and it is the right row! :)

thanks!


Use XML file and translate to INSERT

Adam, March 02, 2007 - 7:20 am UTC

Hi Tom,

If I have created an XML file holding data from a source table, looking like this:
<?xml version = '1.0'?>
<VALUESET>
<VALUES num="1">
<C1>1</C1>
<C2>1</C2>
</VALUES>
<VALUES num="2">
<C1>2</C1>
<C2>2</C2>
</VALUES>
<VALUES num="3">
<C1>3</C1>
<C2>3</C2>
</VALUES>
<VALUES num="4">
<C1 NULL="TRUE"/>
<C2 NULL="TRUE"/>
</VALUES>
<VALUES num="5">
<C1>5</C1>
<C2 NULL="TRUE"/>
</VALUES>
<VALUES num="6">
<C1 NULL="TRUE"/>
<C2>6</C2>
</VALUES>
</VALUESET>

Is it possible to then use this file to add the data into a table on the target schema? In other words I want something like this:
INSERT INTO t (C1, C2)
VALUES (NULL, 6);

Thanks

same question

Alex, April 01, 2010 - 9:18 am UTC

Hello Tom,

I have the same question as the one above. I guess the answer is actually already here in the former posts, but since I am a total newbie at oracle I probably need some steps more to be explained.

What I want to achieve is to insert some information from an xml file into an oracle table. Like the example above.

Thanks a lot in advance

More to Explore

CLOB/BLOB/etc

Complete documentation on Securefiles and Large Objects here