Hi
First time asking a question here, I have tried to follow the guidelines, apologies for any errors on my part.
Any help you can give will be gratefully received, even if it's just to point me in the right direction rather than an answer
First a bit of context. We are in the process of removing technical debt and as such we performing a migration of data from customised tables back to the original base tables for the application where possible.
As an interim measure, we are migrating the data and replacing the custom tables with views which point to the new storage locations of the data. The view synonyms have the same names as the custom tables that they replace so that downstream systems don't fall over. Once all interfaces and other dependent code have been migrated to use the new locations, these interim views will be dropped.
One of these views has to take data which is now in a clob and produce a view which chunks the clob into 240 character strings to match how the data used to be stored producing
level text1 text string first 240
2 text string next 240
3 text string another 240
.
.
.
I used the following select to produce that output (and I need the surrogate ID from tab_a)
select a_surrogate_id,
v.lvl,
v.chnk,
from tab_a,
(select subj, crse, term, to_char(regexp_substr(s, '.{240}', 1, lvl)) chnk, lvl
from (select subj, crse, term, s, level lvl
from (select b_subj_code subj, b_crse_numb crse, b_valid_from term, substr(b_required_materials,1,4000) s from tab_b)
connect by level <= length(s) / 240) ) v
where a_subj_code = v.subj
and a_crse_numb = v.crse
and v.term = (select max(sy.b_valid_from)
from tab_b sy
where sy.b_subj_code = v.subj
and sy.b_crse_numb = v.crse
and sy.b_valid_from <= a_valid_from)
which works, but is really, really slow (add an infinite number of reallys) and when I do a select from the view with a where clause, I have time to run a marathon before it comes back (and I run slowly).
The explain plan is the following
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 16470 | 34307 |
| 1 | NESTED LOOPS | | 815 | 12M| 14222 |
| 2 | VIEW | | 22813 | 357M| 532 |
| 3 | CONNECT BY WITHOUT FILTERING | | | | |
| 4 | TABLE ACCESS FULL | tab_B | 22813 | 11M| 532 |
| 5 | TABLE ACCESS BY INDEX ROWID BATCHED| tab_A | 1 | 37 | 1 |
| 6 | INDEX RANGE SCAN | A_KEY_INDEX | 1 | | 1 |
| 7 | SORT AGGREGATE | | 1 | 17 | |
| 8 | FIRST ROW | | 1 | 17 | 1 |
| 9 | INDEX RANGE SCAN (MIN/MAX) | UNIQUE_B | 1 | 17 | 1 |
------------------------------------------------------------------------------------------
If I add a where clause to this bit
select subj, crse, term, to_char(regexp_substr(s, '.{240}', 1, lvl)) chnk, lvl
from (select subj, crse, term, s, level lvl
from (select b_subj_code subj, b_crse_numb crse, b_valid_from term, substr(b_required_materials,1,4000) s from tab_b <b>where clause</b>)
connect by level <= length(s) / 240)
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 16433 | 1 |
| 1 | VIEW | | 1 | 16433 | 1 |
| 2 | CONNECT BY WITHOUT FILTERING | | | | |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| tab_B | 1 | 522 | 1 |
| 4 | INDEX RANGE SCAN | UNIQUE_B | 1 | | 1 |
------------------------------------------------------------------------------------
and it returns in a flash, because it can use the indexes and no full table scan.
So, my questions are
- is there a better way of writing the view other than the regexp_substr ? I don't know how long the clob is so I discarded the infinite number of select union select union select union.... possibility. In the example I get the first 4000 characters but in reality it could be more.
- if there isn't, is there a way of avoiding the full table scan?
The full scan is only part of the problem!
The CONNECT BY LEVEL clause links every other row in the table to the hierarchy!
Here's an example with a ten-row table that should be split into five rows each:
create table t (
c1 int, c2 varchar2(1000)
);
insert into t
with rws as (
select level x from dual
connect by level <= 10
)
select x,
rpad ( 'line 1', 10, '1' ) ||
rpad ( ' line 2', 10, '2' ) ||
rpad ( ' line 3', 10, '3' ) ||
rpad ( ' line 4', 10, '4' ) ||
rpad ( ' line 5', 10, '5' )
from rws;
select count(*) from t
connect by level <= length ( c2 ) / 10;
COUNT(*)
----------
111110
The query returns over 100k rows!
Look at a few rows and you'll see it combines row one with the other rows:
select connect_by_root c1 rt, level dpth,
sys_connect_by_path ( c1, '-' ) pth,
t.*
from t
connect by level <= length ( c2 ) / 10
fetch first 10 rows only;
RT DPTH PTH C1 C2
---------- ---------- ---------- ---------- ----------------------------------------
1 1 -1 1 line 11111 line 2222 line 3333 line 4444 line 5555
1 2 -1-1 1 line 11111 line 2222 line 3333 line 4444 line 5555
1 3 -1-1-1 1 line 11111 line 2222 line 3333 line 4444 line 5555
1 4 -1-1-1-1 1 line 11111 line 2222 line 3333 line 4444 line 5555
1 5 -1-1-1-1-1 1 line 11111 line 2222 line 3333 line 4444 line 5555
1 5 -1-1-1-1-2 2 line 11111 line 2222 line 3333 line 4444 line 5555
1 5 -1-1-1-1-3 3 line 11111 line 2222 line 3333 line 4444 line 5555
1 5 -1-1-1-1-4 4 line 11111 line 2222 line 3333 line 4444 line 5555
1 5 -1-1-1-1-5 5 line 11111 line 2222 line 3333 line 4444 line 5555
1 5 -1-1-1-1-6 6 line 11111 line 2222 line 3333 line 4444 line 5555
To avoid this, add this clause to connect by:
( prior c1 = c1 or prior sys_guid() is null )
This ensures it only builds the tree from rows with the same value - this should be the primary key/unique constraint columns. The sys_guid() call is to add unique information to each row to prevent loops.
Do this and we get the 50 rows we expect:
select count(*) from t
connect by level <= length ( c2 ) / 10
and ( prior c1 = c1 or prior sys_guid() is null );
COUNT(*)
----------
50
As you're on an up-to-date version of 19c, you could improve this further using SQL macros! In effect, these enable you to create parameterized views.
So you can create a function that accepts the filter criteria for TAB_B and applies this in the START WITH clause of the CONNECT BY. Use START WITH instead of WHERE because this filters the data
before creating the tree. Using WHERE it'll generate the data for every row in the table, then filter to those rows you want.
Here's a quick example:
create or replace function split_rows (
tab_id integer
) return clob sql_macro as
stmt clob;
begin
stmt := q'[
select * from t
start with c1 = tab_id
connect by level <= length ( c2 ) / 10
and ( prior c1 = c1 or prior sys_guid() is null )
]';
return stmt;
end;
/
select * from split_rows ( 1 );
/*
C1 C2
---------- -----------------------------------------------------------
1 line 11111 line 2222 line 3333 line 4444 line 5555
1 line 11111 line 2222 line 3333 line 4444 line 5555
1 line 11111 line 2222 line 3333 line 4444 line 5555
1 line 11111 line 2222 line 3333 line 4444 line 5555
1 line 11111 line 2222 line 3333 line 4444 line 5555
*/
You'll need to update the code to pass the necessary parameters, so this may be something to consider later in the migration.