Sorry for the delay - I got some good information back from the Text Product Manager
If you are happy with a count limit of 100, you can use a text index and a "query template" which specifies
<score algorithm="count">
SQL> create table t (id int primary key, doc varchar2(2000));
Table created.
SQL>
SQL> -- 1 to 5 occurrences of the word "oracle"
SQL> insert into t values (1, 'oracle');
1 row created.
SQL> insert into t values (2, 'oracle oracle');
1 row created.
SQL> insert into t values (3, 'oracle oracle oracle');
1 row created.
SQL> insert into t values (4, 'oracle oracle oracle oracle');
1 row created.
SQL> insert into t values (5, 'oracle oracle oracle oracle oracle');
1 row created.
SQL>
SQL> -- 110 occurrences of the word "oracle"
SQL> insert into t values (6, 'oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle o
racle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracl
e oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle or
acle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle oracle');
1 row created.
SQL>
SQL> create index t_index on t(doc) indextype is ctxsys.context;
Index created.
SQL>
SQL> select id, score(1) as occurrence_count from t where contains( doc, '
2 <query>
3 <textquery>
4 oracle
5 </textquery>
6 <score algorithm="count" />
7 </query>
8 ', 1) > 0;
ID OCCURRENCE_COUNT
---------- ----------------
1 1
2 2
3 3
4 4
5 5
6 100 -- there are actually 110
6 rows selected.
If you need to allow for counts larger than 100, it's probably best to dispense with the text index, and use CTX_DOC.POLICY_HIGHLIGHT. This uses a POLICY (sometimes described as an index definition without an index) and scans each document provided to it to look for search terms. It's designed to provide offset and length for the "hits" within the document, but we can dispense with them just count the number of highlights.
Obviously as this doesn't use an index it will be less efficient, but there is no limit on the count returned.
SQL>
SQL> set serverout on
SQL>
SQL> begin
2 ctx_ddl.create_policy (
3 policy_name => 'occurrence_policy'
4 );
5 end;
6 /
PL/SQL procedure successfully completed.
SQL>
SQL> create or replace function occurrence_count
2 (document varchar2,
3 search_term varchar2
4 )
5 return number is
6 highlight_tab ctx_doc.highlight_tab;
7 begin
8 for c in ( select id, doc from t ) loop
9 ctx_doc.policy_highlight( 'T_INDEX', document, search_term, highlight_tab);
10 return highlight_tab.count;
11 end loop;
12 end;
13 /
Function created.
SQL>
SQL> select id, occurrence_count( doc, 'oracle') from T;
ID OCCURRENCE_COUNT(DOC,'ORACLE')
---------- ------------------------------
1 1
2 2
3 3
4 4
5 5
6 110
6 rows selected.
SQL>