Skip to Main Content
  • Questions
  • Using JSON_TABLE with aggregate functions

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, Sagar.

Asked: September 27, 2017 - 10:43 am UTC

Last updated: October 30, 2017 - 2:16 pm UTC

Version: 12.2.0.0.3

Viewed 1000+ times

You Asked

Hi Team

DB Version: 12.2.0.0.3

On a high level, on successful completion of the scripts (from LiveSQL) you would have:
1) Loaded 10k users data in JSON format in STAGE_USER table.
2) Loaded 500 roles data in JSON format in STAGE_ROLE table.
1) Loaded 10k user-role membership data in JSON format in STAGE_USERMEM table. Each user is assgined 100 roles.

[Note] ==> You might receive "ORA-06502: PL/SQL: numeric or value error" during Statement#9,19 and 29 executions, which converts the relational data to JSON format. Please ignore the error, the data will be loaded into the respective tables.

Gathered stats on all the three staging tables created post successful execution of the script.

EXEC dbms_stats.gather_table_stats(USER, '<TABLE_NAME>');


Below are the two use cases for which I see delay in response time with the JSON_TABLE operator being used:

1) Find total # of users assigned role 'DummyRole1' in the membership table

Avg response time of below SQL: ~3.9 secs.
With data increased to 100k in this membership table, the avg response time increases to ~20 secs.

SELECT COUNT(1)
FROM STAGE_USERMEM o,
        json_table(userrolemem_json, '$'
    COLUMNS 
        (
           NESTED PATH '$.user_mem_dtls.user_role_mem[*]'
             COLUMNS (role_id VARCHAR2(32 CHAR) PATH '$.role_id',
                      role_name  VARCHAR2(20 CHAR) PATH '$.role_name')
        )
    ) t
WHERE t.role_name='DummyRole1';


2) Find total no of roles assigned to each of the users in the membership table

Avg response time of below SQL: ~4.5 secs.
With data increased to 100k in this membership table, the avg response time increases to ~30 secs.

SELECT t.user_id,COUNT(1)
FROM STAGE_USERMEM o,
        json_table(userrolemem_json, '$'
    COLUMNS 
        (
           user_id VARCHAR2(32 CHAR) PATH '$.user_id',
           NESTED PATH '$.user_mem_dtls.user_role_mem[*]'
             COLUMNS (role_id VARCHAR2(32 CHAR) PATH '$.role_id',
                      role_name  VARCHAR2(20 CHAR) PATH '$.role_name')
        )
    ) t
GROUP BY t.user_id;


Could you suggest how to speed up the response time for the above two use cases, as when load increases the response time shoots up.

Thanks in advance!

with LiveSQL Test Case:

and Chris said...

Are the sample data representative of your real data?

DummyRole1 exists in every JSON document. So you're returning every row from the table.

If this is just a "for example", and in your production data each role only appears in a "few" documents, you may get some benefits from creating a JSON search index.

To do this, you first need to add the "is json" check constraint to the column. Then add the index:

alter table stage_usermem add constraint j check ( userrolemem_json is json );
create search index stage_usermem_ji on stage_usermem (userrolemem_json) for json;


But if roles appear in most/all the documents, adding this will give little or no benefit:

set serveroutput off
SELECT /*+ gather_plan_statistics */COUNT(1)
FROM STAGE_USERMEM o,
        json_table(userrolemem_json, '$'
    COLUMNS 
        (
           NESTED PATH '$.user_mem_dtls.user_role_mem[*]'
             COLUMNS (role_id    VARCHAR2(32 CHAR) PATH '$.role_id',
                      role_name  VARCHAR2(20 CHAR) PATH '$.role_name')
        )
    ) t
WHERE t.role_name='DummyRole1';

select * from table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST'));

Plan hash value: 2625201623                                                                                   
                                                                                                              
------------------------------------------------------------------------------------------------------------  
| Id  | Operation               | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  
------------------------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT        |               |      1 |        |      1 |00:00:01.97 |   20134 |  20000 |  
|   1 |  SORT AGGREGATE         |               |      1 |      1 |      1 |00:00:01.97 |   20134 |  20000 |  
|   2 |   NESTED LOOPS          |               |      1 |  10000 |  10000 |00:00:02.01 |   20134 |  20000 |  
|   3 |    TABLE ACCESS FULL    | STAGE_USERMEM |      1 |  10000 |  10000 |00:00:00.02 |     134 |      0 |  
|   4 |    JSONTABLE EVALUATION |               |  10000 |        |  10000 |00:00:01.93 |   20000 |  20000 |  
------------------------------------------------------------------------------------------------------------  
                                                                                                              
Note                                                                                                          
-----                                                                                                         
   - statistics feedback used for this statement 

alter table stage_usermem add constraint j check ( userrolemem_json is json );
create search index stage_usermem_ji on stage_usermem (userrolemem_json) for json;

SELECT /*+ gather_plan_statistics */COUNT(1)
FROM   STAGE_USERMEM o,
      json_table(userrolemem_json, '$'
       COLUMNS 
        (
           NESTED PATH '$.user_mem_dtls.user_role_mem[*]'
             COLUMNS (role_id    VARCHAR2(32 CHAR) PATH '$.role_id',
                      role_name  VARCHAR2(20 CHAR) PATH '$.role_name')
        )
    ) t
WHERE t.role_name='DummyRole1';

select * from table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST'));

Plan hash value: 3790423225                                                                                            
                                                                                                                       
---------------------------------------------------------------------------------------------------------------------  
| Id  | Operation                     | Name             | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  
---------------------------------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT              |                  |      1 |        |      1 |00:00:02.07 |   21148 |  20877 |  
|   1 |  SORT AGGREGATE               |                  |      1 |      1 |      1 |00:00:02.07 |   21148 |  20877 |  
|   2 |   NESTED LOOPS                |                  |      1 |    408 |  10000 |00:00:01.96 |   21148 |  20877 |  
|   3 |    TABLE ACCESS BY INDEX ROWID| STAGE_USERMEM    |      1 |      5 |  10000 |00:00:00.06 |    1147 |    877 |  
|*  4 |     DOMAIN INDEX              | STAGE_USERMEM_JI |      1 |        |  10000 |00:00:00.03 |    1020 |    877 |  
|   5 |    JSONTABLE EVALUATION       |                  |  10000 |        |  10000 |00:00:01.84 |   20001 |  20000 |  
---------------------------------------------------------------------------------------------------------------------  
                                                                                                                       
Predicate Information (identified by operation id):                                                                    
---------------------------------------------------                                                                    
                                                                                                                       
   4 - access("CTXSYS"."CONTAINS"("O"."USERROLEMEM_JSON",'{DummyRole1} INPATH                                          
              (/user_mem_dtls/user_role_mem/role_name)')>0)  


Notice that the value in the buffers column increases when using the index. So this makes the query do more work!

So if roles are common, you may be better suited to creating a materialized view over your query. Then querying that. For example:

create materialized view json_mv
as
SELECT t.role_name, COUNT(1)
FROM STAGE_USERMEM o,
        json_table(userrolemem_json, '$'
    COLUMNS 
        (
           NESTED PATH '$.user_mem_dtls.user_role_mem[*]'
             COLUMNS (role_id    VARCHAR2(32 CHAR) PATH '$.role_id',
                      role_name  VARCHAR2(20 CHAR) PATH '$.role_name')
        )
    ) t
group by t.role_name;

select /*+ gather_plan_statistics */* from json_mv
where  role_name = 'DummyRole1';

select * from table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST'));

Plan hash value: 445572365                                                                  
                                                                                            
------------------------------------------------------------------------------------------  
| Id  | Operation            | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  
------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT     |         |      1 |        |      1 |00:00:00.01 |       9 |  
|*  1 |  MAT_VIEW ACCESS FULL| JSON_MV |      1 |      1 |      1 |00:00:00.01 |       9 |  
------------------------------------------------------------------------------------------  
                                                                                            
Predicate Information (identified by operation id):                                         
---------------------------------------------------                                         
                                                                                            
   1 - filter("ROLE_NAME"='DummyRole1')                                                     
                                                                                            
Note                                                                                        
-----                                                                                       
   - dynamic statistics used: dynamic sampling (level=2)  


Notice the buffers (work) has dropped drastically. You should see much quicker execution times too.

And of course, it's worth asking:

Do you need to store the data as JSON? Can you use regular relational tables instead?

This makes indexing more straightforward.

And, if you look at the original plans closely, you'll see that most of the time goes on the "JSONTABLE EVALUATION" operation. i.e. converting your data from JSON to relational is where most of the time goes. Get rid of this and you could make your queries much faster.

Rating

  (2 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Thanks

Sagar, October 25, 2017 - 8:54 am UTC

Thanks for the response Chris.

We are exploring using document store with Hybrid JSON model. Using relational fields to store simple attribute data, and MVA, CMVA type data in dedicated column (JSON aware) to store such data in JSON format.
We will not be able to use SODA APIs if opting for above approach, but our use case of handling Flex fields data best fits this Hybrid Model.
Chris Saxon
October 30, 2017 - 2:16 pm UTC

"MVA, CMVA type data" ?

JSON Hybrid model

Sagar, October 30, 2017 - 4:36 pm UTC

MVA - Multi Value Attribute
CMVA - Complex Multi Value Attribute

Following youtube video from JavaOne 2017 conference by Beda was big help.

Top 10 tips for JSON processing in Database:
https://www.youtube.com/watch?v=yqSgV1IX7zE&index=7&t=33s&list=LL7FlmabGSMww1j6WN0T0YXA

Thanks.

More to Explore

SQL

The Oracle documentation contains a complete SQL reference.