First up, some observations:
- You're selecting 70% of a 200M+ row table (~140M rows)
That's a chunk of data. It's going to take a while to process this whatever you do. And selecting such a large percentage of the table means a full scan of accounts is likely the best approach.
- Transferring that amount of data to the client takes time
If I use a smaller example that returns 75k rows, the query takes ~2 mins in SQL*Plus, etc.:
create table accounts as
select
replace(replace(timestamp,'-',''),':','') || replace(replace(timestamp,'-',''),':','') a
from all_objects
;
create table patterns as
select distinct substr(a,1,13) || '%' as p
from accounts
;
exec dbms_stats.gather_table_stats(user, 'accounts');
exec dbms_stats.gather_table_stats(user, 'patterns');
set term off
alter session set tracefile_identifier = chris;
exec dbms_monitor.session_trace_enable(waits => true);
select * from accounts acc
inner join patterns pat
on acc.a like pat.p;
But as the formatted tracefile shows, the vast majority of this time is back-and-forth with the client (see 114s on "SQL*Net message from client" value at the end):
select * from accounts acc
inner join patterns pat
on acc.a like pat.p
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 753 3.83 3.88 0 152089 0 75116
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 755 3.84 3.88 0 152089 0 75116
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 70
Number of plan statistics captured: 1
Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
75116 75116 75116 NESTED LOOPS (cr=152089 pr=0 pw=0 time=4030450 us cost=24262 size=6610208 card=150232)
75116 75116 75116 TABLE ACCESS FULL ACCOUNTS (cr=1105 pr=0 pw=0 time=117288 us cost=108 size=2178364 card=75116)
75116 75116 75116 TABLE ACCESS FULL PATTERNS (cr=150984 pr=0 pw=0 time=3432685 us cost=0 size=30 card=2)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 753 0.00 0.00
SQL*Net message from client 753 0.32 114.49
Whereas if I loop through the query in PL/SQL, the whole thing completes in ~2s!
begin
for rws in (
select * from accounts acc
inner join patterns pat
on acc.a like pat.p
) loop
null;
end loop;
end;
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.01 0.02 0 0 0 1
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.02 0.02 0 0 0 1
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 70
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 1 0.00 0.00
SQL*Net message from client 1 0.14 0.14
SELECT *
FROM
ACCOUNTS ACC INNER JOIN PATTERNS PAT ON ACC.A LIKE PAT.P
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 752 1.79 1.80 92 152087 0 75116
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 754 1.80 1.80 92 152087 0 75116
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 70 (recursive depth: 1)
Number of plan statistics captured: 1
Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
75116 75116 75116 NESTED LOOPS (cr=152087 pr=92 pw=0 time=1904147 us cost=24262 size=6610208 card=150232)
75116 75116 75116 TABLE ACCESS FULL ACCOUNTS (cr=1104 pr=92 pw=0 time=49706 us cost=108 size=2178364 card=75116)
75116 75116 75116 TABLE ACCESS FULL PATTERNS (cr=150983 pr=0 pw=0 time=1603095 us cost=0 size=30 card=2)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file sequential read 15 0.00 0.00
db file scattered read 30 0.00 0.00
This is because all those network round trips are gone.
What are you doing with all this data? Do you really need to select all of it? Can you process it in PL/SQL to reduce network effects?
- There's no other predicates in this query
So there's not much room for indexing strategies to help.
Bearing in mind all that, here's some things you could look into:
ParallelIf you've got the capacity on your server, running the query in parallel may help reduce the runtime
Materialized ViewPre-computing the join will reduce some of the work the query does. You still need to fetch gobs of data though, so don't expect miracles.
Manually Splitting It Up (DIY Parallelism)Fetch small amounts at a time and manually stitch the results back together later.