Skip to Main Content
  • Questions
  • Query taking more time to split,compare against lookup table and to insert

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, Prasad.

Asked: September 22, 2017 - 4:22 pm UTC

Last updated: October 05, 2017 - 5:02 pm UTC

Version: 12C

Viewed 1000+ times

You Asked

We have a dimension table in which one column has mutlipul values like an Array.
I have to get all this individual values and should compare against a lookup table, if the value does not exist in the look up table
then i have to insert this value into lookup table along with an id,here the id should be sequence(select max(id)+1 from lookup table)

DIM TABLE:

CREATE TABLE TEST0922(CODE VARCHAR2(2000));
INSERT INTO TEST0922 VALUES('["REJECT","APPRVOE","HOLD","RESEARCH","REJECT","APPRVOE","HOLD","RESEARCH","REJECT","HOLD","ANALYSIS"]');
INSERT INTO TEST0922 VALUES('["REJECT","APPRVOE","HOLD","RESEARCH","REJECT","APPRVOE","HOLD","RESEARCH","REJECT","HOLD","TEST123"]');
COMMIT;
LOOKUP TABLE:

CREATE TABLE TEST_LOOKUP0922(ID NUMBER,CODE VARCHAR2(30));
INSERT INTO TEST_LOOKUP0922 VALUES(1,'REJECT');
INSERT INTO TEST_LOOKUP0922 VALUES(2,'APPRVOE');
INSERT INTO TEST_LOOKUP0922 VALUES(3,'HOLD');
commit;

I am using below query inside a procedure to get the values not in lookup table and then inserting these values.
ELECT CODE FROM TEST0922;
WITH T AS
(SELECT REPLACE(REGEXP_SUBSTR(CODE,'("[^"\]*\"+)', 1, LEVEL), '"', '') COL1
FROM
( SELECT CODE FROM TEST0922
)
--CONNECT BY LEVEL <= REGEXP_COUNT(audit_ruleaction, ',')+1
CONNECT BY level <=regexp_count(CODE,'("[^"\]*\"+)')
) SELECT DISTINCT COL1 FROM T WHERE COL1 NOT IN (SELECT CODE FROM TEST_LOOKUP0922);


But this query is running more than 45mins against just 250 rows in DIM table, but my actual Production environment will get 100k to 200k records on daily basis.

So please suggest any better query/approach as early as possible;

and Connor said...

Always test your components before bringing them all together - lets start with just the segmenting out of the labels first

SQL> CREATE TABLE TEST0922(CODE VARCHAR2(2000));

Table created.

SQL> INSERT INTO TEST0922 VALUES('["REJECT","APPRVOE","HOLD","RESEARCH","REJECT","APPRVOE","HOLD","RESEARCH","REJECT","HOLD","ANALYSIS"]');

1 row created.

SQL> INSERT INTO TEST0922 VALUES('["REJECT","APPRVOE","HOLD","RESEARCH","REJECT","APPRVOE","HOLD","RESEARCH","REJECT","HOLD","TEST123"]');

1 row created.

SQL> COMMIT;

Commit complete.


SQL> SELECT REPLACE(REGEXP_SUBSTR(CODE,'("[^"\]*\"+)', 1, LEVEL), '"', '') COL1
  2  FROM TEST0922
  3  CONNECT BY level <=regexp_count(CODE,'("[^"\]*\"+)');

COL1
----------------------------------------------------------------------------------------
REJECT
APPRVOE
HOLD
RESEARCH
REJECT
APPRVOE
HOLD
RESEARCH
..
...
TEST123
HOLD
ANALYSIS
TEST123

4094 rows selected.


I'm pretty sure you didnt plan on getting 4094 rows back ... which of course will cause all sorts of problems. So we will fix that

SQL> SELECT tst.rowid r, tag
  2  from TEST0922 tst,
  3  lateral
  4    ( select REPLACE(REGEXP_SUBSTR(tst.CODE,'("[^"\]*\"+)', 1, LEVEL), '"', '') tag
  5      FROM  dual
  6      CONNECT BY level <=regexp_count(tst.CODE,'("[^"\]*\"+)')
  7     );

R                  TAG
------------------ ------------------------------
AAAlUQAAHAAAT6/AAA REJECT
AAAlUQAAHAAAT6/AAA APPRVOE
AAAlUQAAHAAAT6/AAA HOLD
AAAlUQAAHAAAT6/AAA RESEARCH
AAAlUQAAHAAAT6/AAA REJECT
AAAlUQAAHAAAT6/AAA APPRVOE
AAAlUQAAHAAAT6/AAA HOLD
AAAlUQAAHAAAT6/AAA RESEARCH
AAAlUQAAHAAAT6/AAA REJECT
AAAlUQAAHAAAT6/AAA HOLD
AAAlUQAAHAAAT6/AAA ANALYSIS
AAAlUQAAHAAAT6/AAB REJECT
AAAlUQAAHAAAT6/AAB APPRVOE
AAAlUQAAHAAAT6/AAB HOLD
AAAlUQAAHAAAT6/AAB RESEARCH
AAAlUQAAHAAAT6/AAB REJECT
AAAlUQAAHAAAT6/AAB APPRVOE
AAAlUQAAHAAAT6/AAB HOLD
AAAlUQAAHAAAT6/AAB RESEARCH
AAAlUQAAHAAAT6/AAB REJECT
AAAlUQAAHAAAT6/AAB HOLD
AAAlUQAAHAAAT6/AAB TEST123

22 rows selected.


Now we can move onto the lookup and that will run a lot better

SQL> CREATE TABLE TEST_LOOKUP0922(ID NUMBER,CODE VARCHAR2(30));

Table created.

SQL> INSERT INTO TEST_LOOKUP0922 VALUES(1,'REJECT');

1 row created.

SQL> INSERT INTO TEST_LOOKUP0922 VALUES(2,'APPRVOE');

1 row created.

SQL> INSERT INTO TEST_LOOKUP0922 VALUES(3,'HOLD');

1 row created.

SQL> commit;

Commit complete.

SQL> WITH T AS
  2  (
  3  SELECT tag
  4  from TEST0922 tst,
  5  lateral
  6    ( select REPLACE(REGEXP_SUBSTR(tst.CODE,'("[^"\]*\"+)', 1, LEVEL), '"', '') tag
  7      FROM  dual
  8      CONNECT BY level <=regexp_count(tst.CODE,'("[^"\]*\"+)')
  9     )
 10  )
 11  SELECT DISTINCT tag FROM T WHERE tag NOT IN (SELECT CODE FROM TEST_LOOKUP0922);

TAG
------------------------------
TEST123
ANALYSIS
RESEARCH

3 rows selected.

SQL>


Rating

  (2 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

A reader, September 27, 2017 - 2:44 pm UTC

Can You please help me in Understanding the logic.

The logic I wrote and the logic you suggested are almost same except the "lateral" clause.

So want to understand how this is going to change the number of rows it returned.

Code I have written:
SELECT REPLACE(REGEXP_SUBSTR(CODE,'("[^"\]*\"+)', 1, LEVEL), '"', '') COL1
2 FROM TEST0922
3 CONNECT BY level <=regexp_count(CODE,'("[^"\]*\"+)');


Code you suggested:


SELECT tst.rowid r, tag
2 from TEST0922 tst,
3 lateral
4 ( select REPLACE(REGEXP_SUBSTR(tst.CODE,'("[^"\]*\"+)', 1, LEVEL), '"', '') tag
5 FROM dual
6 CONNECT BY level <=regexp_count(tst.CODE,'("[^"\]*\"+)')
7 );
Connor McDonald
September 29, 2017 - 1:08 pm UTC

Your code works fine for a *one* row table. Once you have more than one row, the CONNECT BY cycles through more than one row for each parse through a string. So you get WAY WAY more results than needed (and do lots more processing).

You hid this from yourself using DISTINCT

Lateral (in the way we're using it here) restricts the looping to the row in question.

A reader, September 27, 2017 - 4:46 pm UTC

I tried with the below logic.

Still it is taking more time to get splitted values.

with t as
(SELECT --tst.rowid r,
tag
from BDRMETA_CORE.JSON_PARSE_DATA_41P tst,
lateral
( select REPLACE(REGEXP_SUBSTR(tst.audit_ruleaction,'("[^"\]*\"+)', 1, LEVEL), '"', '') tag
FROM dual
CONNECT BY level <=regexp_count(tst.audit_ruleaction,'("[^"\]*\"+)')
)) select distinct tag from t;


Taking 20 mins for 50000 records.
Connor McDonald
October 05, 2017 - 5:02 pm UTC

You could try popping back to first principles with a home grown extraction

SQL> select count(*) from test0922;

  COUNT(*)
----------
   8388608

Elapsed: 00:00:00.23
SQL>
SQL> set serverout on
SQL> create or replace
  2  function parser return sys.odcivarchar2list pipelined is
  3    type slist is table of number(1) index by varchar2(30);
  4    s slist;
  5    r varchar2(100);
  6    t varchar2(100);
  7    idx pls_integer;
  8  begin
  9    for i in ( select rtrim(replace(substr(code,2),'"'),']')||',' s from TEST0922 ) loop
 10      r := i.s;
 11      loop
 12         idx := instr(r,',');
 13         t := substr(r,1,idx-1);
 14         if not s.exists(t) then
 15            pipe row ( t );
 16            s(t) := 1;
 17         end if;
 18         r := substr(r,idx+1);
 19         exit when r is null;
 20      end loop;
 21    end loop;
 22  end;
 23  /

Function created.

Elapsed: 00:00:00.01
SQL> select * from table(parser);

COLUMN_VALUE
----------------------------------------------------------------------------------------------------
REJECT
APPRVOE
HOLD
RESEARCH
ANALYSIS
TEST123

6 rows selected.

Elapsed: 00:00:50.48


More to Explore

Performance

Get all the information about database performance in the Database Performance guide.