So questions I have are:
Why are the E-rows missing for operation 3 & does this help account for the large underestimate for line 2?
Why is operation 4 doing so much work? This is also the slowest part of the plan (From the A-Time).Good questions, I can only speculate:
operation 4: in our case the json column is a clob. Could this add the additional overhead? Also: I'm not sure how the contains expression is actually evaluated by oracle text. Would it parse the columns value again or lookup the parsed results in some internally managed tables?
operation 3: I did the following tests and notitced a difference between gather_table_stats and analyze table compute statistics, where the e-rows estimated seem to accurate after analyze table, only.
According to the Oracle Text Developer Guide both analyze sstatement and use of dbms_stats are valid:
https://docs.oracle.com/database/121/CCAPP/GUID-E4F013A8-E83C-44AF-B9A4-CA1FBF717730.htm#CCAPP9634 alter session set statistics_level = all;
create table t_json(json clob, constraint t_json_chk_json check (json is json) enable);
create index t_json_i_json on t_json(json)
indextype is ctxsys.context parameters ('section group ctxsys.json_section_group sync (on commit)');
insert into t_json values ('{}');
insert into t_json select '{"type": "t1", "id": 1}' from dual connect by level <= 10;
insert into t_json select '{"type": "t2", "id": 2}' from dual connect by level <= 10000;
commit;
execute dbms_stats.gather_table_stats(user, 'T_JSON', estimate_percent=>50) ;
select count(1) from t_json where contains(json,'{t2} INPATH(/type)') > 0;
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
>SQL_ID 0rwwkj8hq0526, child number 0
>------------------------------------
>select count(1) from t_json where contains(json,'{t2} INPATH(/type)') >
>0
>
>Plan hash value: 3915049437
>
>--------------------------------------------------------------------------------------------
>| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
>--------------------------------------------------------------------------------------------
>| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 83 |
>| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 83 |
>|* 2 | DOMAIN INDEX | T_JSON_I_JSON | 1 | 502 | 10000 |00:00:00.01 | 83 |
>--------------------------------------------------------------------------------------------
>
>Predicate Information (identified by operation id):
>---------------------------------------------------
>
> 2 - access("CTXSYS"."CONTAINS"("JSON",'{t2} INPATH(/type)')>0)
insert into t_json select '{"type": "t3", "id": 3}' from dual connect by level <= 30000;
commit;
select count(1) from t_json where contains(json,'{t3} INPATH(/type)') > 0;
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
>SQL_ID 7ucz1zm29hw3v, child number 0
>-------------------------------------
>select count(1) from t_json where contains(json,'{t3} INPATH(/type)') >
>0
>
>Plan hash value: 3915049437
>
>--------------------------------------------------------------------------------------------
>| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
>--------------------------------------------------------------------------------------------
>| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 231 |
>| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 231 |
>|* 2 | DOMAIN INDEX | T_JSON_I_JSON | 1 | 502 | 30000 |00:00:00.01 | 231 |
>--------------------------------------------------------------------------------------------
>
>Predicate Information (identified by operation id):
>---------------------------------------------------
>
> 2 - access("CTXSYS"."CONTAINS"("JSON",'{t3} INPATH(/type)')>0)
-- regather statistics - no change in plan
execute dbms_stats.gather_table_stats(user, 'T_JSON', estimate_percent=>50) ;
select count(1) from t_json where contains(json,'{t2} INPATH(/type)') > 0;
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
> SQL_ID 0rwwkj8hq0526, child number 0
>-------------------------------------
>select count(1) from t_json where contains(json,'{t2} INPATH(/type)') >
>0
>
>Plan hash value: 3915049437
>
>--------------------------------------------------------------------------------------------
>| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
>--------------------------------------------------------------------------------------------
>| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 79 |
>| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 79 |
>|* 2 | DOMAIN INDEX | T_JSON_I_JSON | 1 | 502 | 10000 |00:00:00.01 | 79 |
>--------------------------------------------------------------------------------------------
>
>Predicate Information (identified by operation id):
>---------------------------------------------------
>
> 2 - access("CTXSYS"."CONTAINS"("JSON",'{t2} INPATH(/type)')>0)
select count(1) from t_json where contains(json,'{t3} INPATH(/type)') > 0;
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
>SQL_ID 7ucz1zm29hw3v, child number 0
>-------------------------------------
>select count(1) from t_json where contains(json,'{t3} INPATH(/type)') >
>0
>
>Plan hash value: 3915049437
>
>--------------------------------------------------------------------------------------------
>| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
>--------------------------------------------------------------------------------------------
>| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 231 |
>| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 231 |
>|* 2 | DOMAIN INDEX | T_JSON_I_JSON | 1 | 502 | 30000 |00:00:00.01 | 231 |
>--------------------------------------------------------------------------------------------
>
>Predicate Information (identified by operation id):
>---------------------------------------------------
>
> 2 - access("CTXSYS"."CONTAINS"("JSON",'{t3} INPATH(/type)')>0)
-- use analyze instead of gather_table_stats - estimates in plan are changing
analyze table t_json compute statistics;
select count(1) from t_json where contains(json,'{t2} INPATH(/type)') > 0;
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
>SQL_ID 0rwwkj8hq0526, child number 0
>-------------------------------------
>select count(1) from t_json where contains(json,'{t2} INPATH(/type)') >
>0
>
>Plan hash value: 3915049437
>
>--------------------------------------------------------------------------------------------
>| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
>--------------------------------------------------------------------------------------------
>| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 79 |
>| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 79 |
>|* 2 | DOMAIN INDEX | T_JSON_I_JSON | 1 | 10000 | 10000 |00:00:00.01 | 79 |
>--------------------------------------------------------------------------------------------
>
>Predicate Information (identified by operation id):
>---------------------------------------------------
>
> 2 - access("CTXSYS"."CONTAINS"("JSON",'{t2} INPATH(/type)')>0)
select count(1) from t_json where contains(json,'{t3} INPATH(/type)') > 0;
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
>SQL_ID 7ucz1zm29hw3v, child number 0
>-------------------------------------
>select count(1) from t_json where contains(json,'{t3} INPATH(/type)') >
>0
>
>Plan hash value: 3915049437
>
>--------------------------------------------------------------------------------------------
>| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
>--------------------------------------------------------------------------------------------
>| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 231 |
>| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 231 |
>|* 2 | DOMAIN INDEX | T_JSON_I_JSON | 1 | 29999 | 30000 |00:00:00.01 | 231 |
>--------------------------------------------------------------------------------------------
>
>Predicate Information (identified by operation id):
>---------------------------------------------------
>
> 2 - access("CTXSYS"."CONTAINS"("JSON",'{t3} INPATH(/type)')>0)
July 28, 2022 - 2:05 pm UTC
OP 4: I guessed you're using a clob/blob ;)
There are lots of possible reasons - row chaining/migration, lob storage settings, getting read consistent data, ... - you'll need to investigate to find out exactly why.
One other thing I just noticed - it's using GLOBAL index to access the data. Is the table partitioned? Do you see similar effects (lots of work to fetch few rows) on non-partitioned tables?
I'm not sure how the contains expression is actually evaluated by oracle textI'm not sure either to be honest! You could try tracing the session to see what's happening when you run CONTAINS.
OP 3: ANALYZE is obsolete, you should use dbms_stats.
You should also set the estimate percent to AUTO
https://blogs.oracle.com/optimizer/post/setting-estimate-percent (though it's possible Oracle Text behaves differently).
That said, if ANALYZE enables the optimizer to come up with better estimates and therefore plans, you could use it while finding a better/more permanent solution.
Have you tried creating a function-based index to assist the event lookup?