Hi all,
I use CTXRULE to classify some text, but AB&B work and AT&T not work. Please support me on this:
Here is my search text
SQL> select text from test_lexer_special;
TEXT
--------------------
AB&B
{AT&T}
AT&T
SQL>
How I create index
SQL> EXEC CTX_DDL.CREATE_INDEX_SET('test_ctxrule_1');
PL/SQL procedure successfully completed.
SQL> EXEC CTX_DDL.CREATE_PREFERENCE ('testmy_lexer', 'BASIC_LEXER');
PL/SQL procedure successfully completed.
SQL> EXEC CTX_DDL.SET_ATTRIBUTE ('testmy_lexer', 'PRINTJOINS', '&');
PL/SQL procedure successfully completed.
SQL> CREATE INDEX test_ctxrule_index1 ON test_lexer_special (text)
2 INDEXTYPE IS CTXSYS.CTXRULE
3 PARAMETERS ('INDEX SET test_ctxrule_1 LEXER testmy_lexer STOPLIST CTXSYS.EMPTY_STOPLIST');
Index created.
SQL>
This works
SQL> SELECT * FROM test_lexer_special WHERE MATCHES(text, 'ab&b')> 0;
TEXT
--------------------
AB&B
SQL>
This does not work
SQL> SELECT * FROM test_lexer_special WHERE MATCHES(text, 'at&t')> 0;
no rows selected
SQL> select token_text,token_extra from DR$test_ctxrule_index1$I;
TOKEN_TEXT TOKEN_EXTRA
-------------------- --------------------
AB {B}
SQL>
Thanks.
AT is still part of the stoplist! You can see this by reporting on the index with ctx_report.describe_index:
create table test_lexer_special (
text varchar2(30)
);
set define off
insert into test_lexer_special values ('AB&B');
insert into test_lexer_special values ('{AT&T}');
insert into test_lexer_special values ('AT&T');
CREATE INDEX test_ctxrule_index1 ON test_lexer_special (text) INDEXTYPE IS CTXSYS.CTXRULE
PARAMETERS ('INDEX SET test_ctxrule_1 LEXER testmy_lexer STOPLIST CTXSYS.EMPTY_STOPLIST');
set long 2000000000
set head off
set pagesize 10000
select ctx_report.describe_index('TEST_CTXRULE_INDEX1') from dual;
===========================================================================
INDEX DESCRIPTION
===========================================================================
index name: "CHRIS"."TEST_CTXRULE_INDEX1"
index id: 1284
index type: ctxrule
base table: "CHRIS"."TEST_LEXER_SPECIAL"
primary key column:
text column: TEXT
text column type: VARCHAR2(30)
language column:
format column:
charset column:
configuration column:
Query Stats Enabled: NO
status: INDEXED
full optimize token:
full optimize count:
docid count: 3
nextid: 4
===========================================================================
INDEX OBJECTS
===========================================================================
datastore: DIRECT_DATASTORE
filter: NULL_FILTER
section group: NULL_SECTION_GROUP
lexer: BASIC_LEXER
wordlist: BASIC_WORDLIST
stemmer: ENGLISH
fuzzy_match: GENERIC
stoplist: BASIC_STOPLIST
stop_word: Mr
stop_word: Mrs
stop_word: Ms
stop_word: a
stop_word: all
stop_word: almost
stop_word: also
stop_word: although
stop_word: an
stop_word: and
stop_word: any
stop_word: are
stop_word: as
stop_word: at
... etc. ...
So why has this happened? You specified an empty stoplist, right?
Well you've also specified an index set. These only apply to CTXCAT indexes. You're using CTXRULE. Unfortunately this doesn't throw an exception...
Specify the stoplist on its own and everything will be hunky dory:
drop INDEX test_ctxrule_index1;
CREATE INDEX test_ctxrule_index1 ON test_lexer_special (text) INDEXTYPE IS CTXSYS.CTXRULE
PARAMETERS ('STOPLIST CTXSYS.EMPTY_STOPLIST');
SELECT * FROM test_lexer_special WHERE MATCHES(text, 'at&t',1)> 0;
TEXT
{AT&T}
AT&T