Skip to Main Content

Breadcrumb

May 4th

Question and Answer

Tom Kyte

Thanks for the question, Raymond.

Asked: April 29, 2009 - 1:16 pm UTC

Last updated: April 28, 2010 - 3:25 pm UTC

Version: 11.1.0.7

Viewed 10K+ times! This question is

You Asked

I have a word document .rtf that i want to convert to HTML and plain text for my oracle text search.

I tried this code that i found in your site but i'm getting error.

--I have created the preferenceexec ctx_ddl.create_preference('test_inso_filter', 'INSO_FILTER');
exec ctx_ddl.create_policy( 'my_policy', 'test_inso_filter' );

--I have created LOAD_DIR Directory as well.
-- my document name is 0400630.rtf


declare
l_bfile bfile;
l_clob clob;
begin
l_bfile := bfilename( 'LOAD_DIR', '0400630.rtf' );
dbms_lob.fileopen( l_bfile );
ctx_doc.policy_filter( 'my_policy', l_bfile, l_clob, true );

dbms_output.put_line( l_clob );
end;
/


ERROR at line 1:
ORA-20000: Oracle Text error:
DRG-11207: user filter command exited with status 1
ORA-06512: at "CTXSYS.DRUE", line 160
ORA-06512: at "CTXSYS.CTX_DOC", line 1050
ORA-06512: at line 7


and Tom said...

that is an old one - the inso filter was deprecated many releases ago...

Here is an 11.1.0.7 example

ops$tkyte%ORA11GR1> exec ctx_ddl.create_preference('test_filter', 'AUTO_FILTER');

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR1> exec ctx_ddl.create_policy( 'my_policy', 'test_filter' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> create or replace directory load_dir as '/home/tkyte'
  2  /

Directory created.

ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> declare
  2    l_bfile bfile;
  3    l_clob clob;
  4  begin
  5    l_bfile := bfilename( 'LOAD_DIR', 'test.rtf' );
  6    dbms_lob.fileopen( l_bfile );
  7    ctx_doc.policy_filter( 'my_policy', l_bfile, l_clob, false );
  8    dbms_output.put_line( l_clob );
  9  end;
 10  /
<HTML><BODY>

<p><font size="3" face="Times New Roman">This is a test
document</font></p>
<h1><font size="3" face="Times New Roman"><i>In
italic</i></font></h1>
<h2><font size="3" face="Times New Roman"><b>In
bold</b></font></h2>
<h3><font size="3" face="Times New
Roman"><u>Underlined</u></font></h3>
<p><font size="3" face="Times New
Roman">That is all?</font></p>















</BODY></HTML>


PL/SQL procedure successfully completed.


Rating

  (14 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

same error I got. Am i doing something wrong

Raymond Tan, April 29, 2009 - 5:02 pm UTC

SQL> exec ctx_ddl.create_preference('test_filter', 'AUTO_FILTER');

PL/SQL procedure successfully completed.

SQL> exec ctx_ddl.create_policy( 'my_policy', 'test_filter' );

PL/SQL procedure successfully completed.

SQL> CREATE OR REPLACE DIRECTORY
      LOAD_DIR AS     '/export/home/oracle/rtan/OpinionsRTF'
/    

Directory created.

SQL> declare
      l_bfile bfile;
      l_clob clob;
begin
   l_bfile := bfilename( 'LOAD_DIR', '0400630.rtf' );
   dbms_lob.fileopen( l_bfile );
   ctx_doc.policy_filter( 'my_policy', l_bfile, l_clob, false );
   dbms_output.put_line( l_clob );
end; 
/

declare
*
ERROR at line 1:
ORA-20000: Oracle Text error:
DRG-11207: user filter command exited with status 1
ORA-06512: at "CTXSYS.DRUE", line 160
ORA-06512: at "CTXSYS.CTX_DOC", line 1050
ORA-06512: at line 7

Tom Kyte
April 29, 2009 - 9:47 pm UTC

can you try with a very simple RTF document first, let's see if it "can work" and the problem is the RTF you have or not.

open word
create a very simple RTF like I did
save it
test with that.

same thing

Raymond Tan, April 30, 2009 - 11:39 am UTC

I open up a word and type " this is a test" and save it as .rtf and got the same error. I tried researching the error but there's not really good information about the error.

ERROR at line 1:
ORA-20000: Oracle Text error:
DRG-11207: user filter command exited with status 1
ORA-06512: at "CTXSYS.DRUE", line 160
ORA-06512: at "CTXSYS.CTX_DOC", line 1050
ORA-06512: at line 7

Tom Kyte
April 30, 2009 - 11:57 am UTC

can you do this

a) cd $ORACLE_HOME/ctx/bin
b) ./ctxhx ..../test.rtf /tmp/x.html ASCII8 utf8 H NOMETA 120 HEURISTIC FORMAT NOPDFROTATE


that is the command being executed - let's see what it says...

Nothing happen

Raymond Tan, April 30, 2009 - 12:56 pm UTC

$_oracle>cd $ORACLE_HOME/ctx/bin

$_oracle>pwd
/disk1/app/oracle/product/11.1.0/db_7/ctx/bin

$_oracle>./ctxhx /export/home/oracle/rtan/OpinionsRTF/test.rtf /tmp/x.html ASCII8 utf8 H NOMETA 120 HEURISTIC FORMAT NOPDFROTATE

$_oracle>ls -lrt
total 352
-rwxr-x--- 1 oracle oinstall 0 Sep 26 2008 ctxhxO
-rwxr-x--x 1 oracle oinstall 171292 Feb 4 09:28 ctxhx


Tom Kyte
April 30, 2009 - 1:24 pm UTC

cat /tmp/x.html

and how are you connecting to the database? (over sqlnet or direct)
and if direct, what user/group are you when you connect?

I con' have that file /tmp/x.html

Raymond Tan, April 30, 2009 - 5:22 pm UTC

I tried connecting as sysdba and a local user with all the oracle text privileges. I also did tried the direct connection and sqlnet using toad connection.

Tom Kyte
April 30, 2009 - 6:42 pm UTC

I don't know what you mean "I con' have" ???


I don't have the file cat /tmp/x.html

Raymond Tan, April 30, 2009 - 5:42 pm UTC

I made a type on the previous message.
Tom Kyte
April 30, 2009 - 6:42 pm UTC

I don't know what that means, why can't you have that file?

x.html

Raymond Tan, May 01, 2009 - 10:38 am UTC

x.html - i Don't have this file under /tmp directory.

cd $ORACLE_HOME/ctx/bin

./ctxhx /export/home/oracle/rtan/OpinionsRTF/test.rtf /tmp/x.html ASCII8 utf8 H NOMETA 120 HEURISTIC FORMAT NOPDFROTATE

I tried executing the about command but nothing happen.
Tom Kyte
May 01, 2009 - 11:02 am UTC

can you please verify for us that test.rft exists

cut and paste the output of this:

ls -l /export/home/oracle/rtan/OpinionsRTF/test.rtf
head /export/home/oracle/rtan/OpinionsRTF/test.rtf
id


I want to know that you can actually read that file.


here is the output

Raymond Tan, May 01, 2009 - 11:30 am UTC

$_oracle>ls -l /export/home/oracle/rtan/OpinionsRTF/test.rtf
-rwxrwxrwx 1 oracle oinstall 4123 Apr 30 08:23 /export/home/oracle/rtan/OpinionsRTF/test.rtf

$_oracle>head /export/home/oracle/rtan/OpinionsRTF/test.rtf
{\rtf1\adeflang1025\ansi\ansicpg1252\uc1\adeff0\deff0\stshfdbch0\stshfloch0\stshfhich0\stshfbi0\deflang1033\deflangfe1033{\fonttbl{\f0\froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman{\*\falt Times New Roman};}{\f141\froman\fcharset238\fprq2 Times New Roman CE{\*\falt Times New Roman};}
{\f142\froman\fcharset204\fprq2 Times New Roman Cyr{\*\falt Times New Roman};}{\f144\froman\fcharset161\fprq2 Times New Roman Greek{\*\falt Times New Roman};}{\f145\froman\fcharset162\fprq2 Times New Roman Tur{\*\falt Times New Roman};}
{\f146\fbidi \froman\fcharset177\fprq2 Times New Roman (Hebrew){\*\falt Times New Roman};}{\f147\fbidi \froman\fcharset178\fprq2 Times New Roman (Arabic){\*\falt Times New Roman};}
{\f148\froman\fcharset186\fprq2 Times New Roman Baltic{\*\falt Times New Roman};}{\f149\froman\fcharset163\fprq2 Times New Roman (Vietnamese){\*\falt Times New Roman};}}{\colortbl;\red0\green0\blue0;\red0\green0\blue255;\red0\green255\blue255;
\red0\green255\blue0;\red255\green0\blue255;\red255\green0\blue0;\red255\green255\blue0;\red255\green255\blue255;\red0\green0\blue128;\red0\green128\blue128;\red0\green128\blue0;\red128\green0\blue128;\red128\green0\blue0;\red128\green128\blue0;
\red128\green128\blue128;\red192\green192\blue192;}{\stylesheet{\ql \li0\ri0\widctlpar\wrapdefault\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \rtlch\fcs1 \af0\afs24\alang1025 \ltrch\fcs0 \fs24\lang1033\langfe1033\cgrid\langnp1033\langfenp1033
\snext0 Normal;}{\*\cs10 \additive \ssemihidden Default Paragraph Font;}{\*
\ts11\tsrowd\trftsWidthB3\trpaddl108\trpaddr108\trpaddfl3\trpaddft3\trpaddfb3\trpaddfr3\tblind0\tblindtype3\tscellwidthfts0\tsvertalt\tsbrdrt\tsbrdrl\tsbrdrb\tsbrdrr\tsbrdrdgl\tsbrdrdgr\tsbrdrh\tsbrdrv
\ql \li0\ri0\widctlpar\wrapdefault\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \rtlch\fcs1 \af0\afs20 \ltrch\fcs0 \fs20\lang1024\langfe1024\cgrid\langnp1024\langfenp1024 \snext11 \ssemihidden Normal Table;}}
{\*\latentstyles\lsdstimax156\lsdlockeddef0}{\*\rsidtbl \rsid5661317\rsid15806011}{\*\generator Microsoft Word 11.0.0000;}{\info{\title This is a test}{\author LDC}{\operator LDC}{\creatim\yr2009\mo4\dy30\hr8\min18}{\revtim\yr2009\mo4\dy30\hr8\min18}

$_oracle>id
uid=1000(oracle) gid=201(oinstall)

Tom Kyte
May 01, 2009 - 11:36 am UTC

I'll have to refer you to support, that should have filtered it. Not sure what is setup incorrectly on your system - it runs for me on mine with a standard install.

Sorry.

Thanks

Raymond Tan, May 01, 2009 - 11:38 am UTC

Appreciate your help.
Thanks a lot.

Set Load Library Path

Patrick Ford, May 16, 2009 - 3:55 pm UTC

> echo $LD_LIBRARY_PATH
LD_LIBRARY_PATH: Undefined variable.

SAND9DB-FORDP> declare
2 l_bfile bfile;
3 l_clob clob;
4 begin
5 l_bfile := bfilename( 'LOAD_DIR', 'test.rtf' );
6 dbms_lob.fileopen( l_bfile );
7 ctx_doc.policy_filter( 'my_policy', l_bfile, l_clob, false );
8 dbms_output.put_line( l_clob );
9 end;
10 /

ERROR at line 1:
ORA-20000: Oracle Text error:
DRG-11207: user filter command exited with status 1
ORA-06512: at "CTXSYS.DRUE", line 160
ORA-06512: at "CTXSYS.CTX_DOC", line 1050
ORA-06512: at line 7

Set LD_LIBRARY_PATH environment variable.
tcsh> setenv LD_LIBRARY_PATH $ORACLE_HOME/lib:$ORACLE_HOME/ctx/lib:$ORACLE_HOME/lib32:/usr/lib
or
bash> export LD_LIBRARY_PATH=$ORACLE_HOME/lib:$ORACLE_HOME/ctx/lib:$ORACLE_HOME/lib32:/usr/lib

> echo $LD_LIBRARY_PATH
/opt/app/oracle/product/1110/db1/lib:/opt/app/oracle/product/1110/db1/ctx/lib:/opt/app/oracle/product/1110/db1/lib32:/usr/lib

SAND9DB-FORDP> declare
2 l_bfile bfile;
3 l_clob clob;
4 begin
5 l_bfile := bfilename( 'LOAD_DIR', 'test.rtf' );
6 dbms_lob.fileopen( l_bfile );
7 ctx_doc.policy_filter( 'my_policy', l_bfile, l_clob, false );
8 dbms_output.put_line( l_clob );
9 end;
10 /

<HTML><BODY>

<p><font size="3" face="Times New Roman">This is an rtf file
testing with Oracle 11.1.0.7 on Solaris SPARC 64</font></p>















</BODY></HTML>


PL/SQL procedure successfully completed.
Tom Kyte
May 23, 2009 - 10:59 am UTC

that would not be it, do you see the command line test I asked him to do? If it were ld_library_path for the command line failing, we'd have seen an ERROR.


and this is why I asked him:

"and how are you connecting to the database? (over sqlnet or direct) "

he was using sqlnet, there the environment is inherited from the listener - hence the path must have been set else the listener itself would not start.

Set Load Library Path

Patrick Ford, June 11, 2009 - 1:17 am UTC

Not seeing that behavior, no visible ERROR with those commands

> env|grep LD
> lsnrctl start

LSNRCTL for Solaris: Version 11.1.0.7.0 - Production on 11-JUN-2009 04:36:02

Copyright (c) 1991, 2008, Oracle. All rights reserved.

Starting /opt/app/oracle/product/1110/db1/bin/tnslsnr: please wait...

The command completed successfully

> cd $ORACLE_HOME/ctx/bin
> ./ctxhx /export/home/oracle/script/test.rtf /tmp/x.html ASCII8 utf8 H NOMETA 120 HEURISTIC FORMAT NOPDFROTATE
>
Tom Kyte
June 11, 2009 - 8:30 am UTC

not sure what you are showing here. doesn't look like a cut and paste.

the listener stuff - not sure what that was for? It isn't involved in this at all - you are running in the foreground.


[ora10gr2@dellpe bin]$ env | grep LD
OLDPWD=/home/ora10gr2
[ora10gr2@dellpe bin]$ ls -l /tmp/x.html
ls: /tmp/x.html: No such file or directory
[ora10gr2@dellpe bin]$ ./ctxhx ./test.rtf /tmp/x.html ASCII8 utf8 H NOMETA 120 HEURISTIC FORMAT
[ora10gr2@dellpe bin]$ ls -l /tmp/x.html
-rw------- 1 ora10gr2 ora10gr2 507 Jun 11 08:04 /tmp/x.html
[ora10gr2@dellpe bin]$

all that is showing is that in this case, this program didn't need ld lib path. If ld lib path was the problem, the OS would have printed on stderr an error message when it could not find a library.

Set Load Library Path

Patrick Ford, June 11, 2009 - 8:34 pm UTC

Sorry for the confusion, Tom. That's just a copy paste of commands with the extraneous listener guts removed in response to your previous Followup, "...do you see the command line test I asked him to do? If it were ld_library_path for the command line failing, we'd have seen an ERROR....hence the path must have been set else the listener itself would not start."

I'm probably missing something but without the LD_LIBRARY_PATH set, the ctxhx can be executed and the listener started without error. Not having LD_LIBRARY_PATH produces the same error message Raymond encountered and setting it alleviates that error as posted on May 16. I'm still not clear why that's not a potential solution, similar to Doc ID 824319.1.

ctxhx error on windows 2003 OS

Ale, April 15, 2010 - 11:25 am UTC

Hi, I am facing similar problem on my oracle 10.2.0.4 on windows 2003 Server (either 32 or 64 bit).

set ORACLE_HOME=c:\oracle\product\10.2.0\db
set ORACLE_SID=ora
set SystemRoot=c:\Windows
set LD_LIBRARY_PATH=%ORACLE_HOME%\lib
set path=%path%;%LD_LIBRARY_PATH%;%ORACLE_HOME%\bin;%ORACLE_HOME%\database;C:\WinResKits\Tools\;%SystemRoot%\system32;%SystemRoot%;%SystemRoot%\System32\Wbem

when I try to use the following nothing happens:
ctxh c:\test.doc c:\x.html

if I change the source file to a pdf file it works:
ctxh c:\test_pdf.pdf c:\x.html

If i try to "index" a simple txt file it does not work either!

Can anyone help me out about this?

My goal is to create a table with a blob column and index it with a USER_FILTER which will call a .bat file containng a call to ctxhx.exe (see http://www.oracle.com/technology/products/text/htdocs/altfilters.htm#multifilter ) but it does not work for NON pdf files!

Thanks in advance for any suggestion

Paresh

Pareshkumar Tapania, April 28, 2010 - 1:52 pm UTC

Hi Tom,

I am using the package "ctx_doc.policy_filter" to fetch the plain text out ot RTF data. But somehow I am not getting the exact output what user enter from application, User enters data from Application (Textpad or editplus)which Application convert to respective RTF code and store this RTF text into database CLOB column. But when I use package ctx_doc.policy_filter to get the plain text from thie RTF text, Oracle is adding unwanted line feeds.

Following are the details:

1. Create Table text_test
CREATE TABLE TEXT_TEST
(
RMRK_KEY NUMBER NOT NULL,
PROV_ID VARCHAR2(10 BYTE) NOT NULL,
PROV_KEY NUMBER(10),
PROV_SEQ_NUM NUMBER(3),
RMRK_CATG_CODE VARCHAR2(10 BYTE) NOT NULL,
RMRK_TYPE_CODE VARCHAR2(10 BYTE) NOT NULL,
RMRK_DATE DATE NOT NULL,
RMRK_TEXT CLOB,
CREATE_OPER VARCHAR2(360 BYTE) NOT NULL,
CREATE_DATE DATE NOT NULL,
UPDATE_OPER VARCHAR2(360 BYTE) NOT NULL,
UPDATE_DATE DATE NOT NULL
)

2. Insert one test record.

Insert into TEXT_TEST
(RMRK_KEY, PROV_ID, PROV_KEY, PROV_SEQ_NUM, RMRK_CATG_CODE, RMRK_TYPE_CODE, RMRK_DATE, RMRK_TEXT, CREATE_OPER, CREATE_DATE, UPDATE_OPER, UPDATE_DATE)
Values
(358223, 'D01', 77277, 1, 'PRV', 'PRV', sysdate, '{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Microsoft Sans Serif;}}
\viewkind4\uc1\pard\f0\fs17 Testing Provision remarks \par
\par
for 11g\par
\par
\par
After carriage return\par
}
', 'XXXX', sysdate, 'XXXX', sysdate);

COMMIT;

3. Following code I used to get the plain text:

a. EXEC ctx_ddl.create_preference('test_filter', 'AUTO_FILTER');
b. EXEC ctx_ddl.drop_policy('my_policy');
c. EXEC ctx_ddl.create_policy('my_policy','test_filter');
d. CREATE TABLE test_clob_table (col1 CLOB);
e.
DECLARE
l_clob CLOB;
BEGIN
FOR x IN (SELECT RMRK_TEXT
FROM text_test)
LOOP
ctx_doc.policy_filter (policy_name => 'my_policy'
, document => x.rmrk_text_plain
, restab => l_clob
, plaintext => TRUE
, CHARSET => 'US7ASCII'
);

INSERT INTO test_clob_table
VALUES (l_clob
);
commit;
END LOOP;
END;
/

f. Output is:




Testing Provision remarks



for 11g



After carriage return



Its Adding 4 blank lines at the top of the remark, for every line feed its adding 3 blank lines. And at the end its adding two more blank lines.

But we are expecting output as:

Testing Provision remarks

for 11g


After carriage return
*****

Tom Kyte
April 28, 2010 - 3:25 pm UTC

do you see the \par's in there?

Guess what they are.

So, as I see it, it added a header space (as it should) - like a margin and output things with respect to the paragraph marks - the \par


You should not expect anything fancy from a text filter, just basically the text is all you can expect.


solution is good but..

pradeep, June 21, 2015 - 8:13 am UTC

hi...
the solution is good and working for me, but is there any way to get the tabular data, right now it is filtering text phrase by phrase or line by line. for ex.
if pdf contains values like

Name: Amount
Pradeep 100 USD

i want the output as it is but the current setup gives the output like
Name:
Amount
Pradeep
100 USD

is there any way to get the original format of text with in pdf?
100 USD

More to Explore

CLOB/BLOB/etc

Complete documentation on Securefiles and Large Objects here