HI,
I need to perform fulltext searches in an xmltype XML field. The table in which this field is located has other fields that can be used to narrow the search in the fulltext index (for example, a date field). However, I noticed that by creating a composite domain index on the xmltype field, the optimizer ignores the possible filter on the data field, and performs a subsequent filter, once the records with the context index have been identified.
This only happens if the field where the context index is created is of type xmltype. I did a test by creating the same table, changing only the field in clob, and the plan seems correct to me, i.e. the records are searched for using only the context index, not doing a subsequent filter.
It can also be seen from the trace that in the second case, with the same data, the cost of the query is significantly lower.
the two table, one wih xmltype field, the other with clob field.
create table test_xml
(
id number,
dt date,
xmltext xmltype
);
create table test_xml_clob
(
id number,
dt date,
xmltext clob
);
Script to populate some data
declare
cnt number := 0;
begin
for x in 1..100 loop
for m in 1..12 loop
for d in 1..28 loop
for y in 2022..2024 loop
insert into test_xml values(cnt, to_date(to_char(m) || '/' || to_char(d) || '/' || to_char(y), 'mm/dd/yyyy'), '<?xml version="1.0" encoding="UTF-8"?><root><item>str1</item></root>');
insert into test_xml_clob values(cnt, to_date(to_char(m) || '/' || to_char(d) || '/' || to_char(y), 'mm/dd/yyyy'), '<?xml version="1.0" encoding="UTF-8"?><root><item>str1</item></root>');
cnt := cnt+1;
end loop;
end loop;
end loop;
commit;
end loop;
end;
The two indexes, both with the
filter by option
CREATE INDEX ndx_cdi_xmlft_test ON test_xml (xmltext)
INDEXTYPE IS CTXSYS.CONTEXT
FILTER BY dt PARAMETERS ('FILTER CTXSYS.NULL_FILTER');
CREATE INDEX ndx_cdi_xmlclobft_test ON test_xml_clob (xmltext)
INDEXTYPE IS CTXSYS.CONTEXT
FILTER BY dt PARAMETERS ('FILTER CTXSYS.NULL_FILTER');
Statistics:
exec dbms_stats.gather_table_stats('', tabname => 'TEST_XML_CLOB', cascade => TRUE);
exec dbms_stats.gather_table_stats('', tabname => 'TEST_XML', cascade => TRUE);
The queries used for testing. The first one has a much higher cost:
select * from test_xml where contains(xmltext, 'str1') > 0 and dt between to_date('01/01/2022', 'dd/mm/yyyy') and to_date('02/01/2022', 'dd/mm/yyyy') ;
select * from test_xml_clob where contains(xmltext, 'str1') > 0 and dt between to_date('01/01/2022', 'dd/mm/yyyy') and to_date('02/01/2022', 'dd/mm/yyyy') ;
select banner_full from v$version
BANNER_FULL
------------------------------------------------------------------------------------------
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production Version 19.15.0.0.0
I asked the XML team about this, big thanks to Rodrigo Hernandez for investigating this who has this to say:
This is a limitation on the way Extensible Optimizer interacts with Text Index query costing. It is not passing correct arguments to the Text Index costing function, which results in the predicate's selectivity and function cost to not be computed correctly.
We've filed a bug for this. In the meantime, there is an (undocumented! use with caution) DOMAIN_INDEX_FILTER hint to get the filtering you want.