Well, not having the table defs makes it a bit of a guess but I'll guess that inv is defined as nullable which prevents us from doing some nice optimizations. Here is an example where I compare the results of three inserts and their performance. This should get you going. Suggest the hash anti-join method as being the probable "best". Depends on the size of the tables really. Also, I assume an index on invdatew for the NOT EXISTS query, if no such index, don't do that one!
ops$tkyte@ORA817DEV.US.ORACLE.COM> drop table invdatew;
Table dropped.
ops$tkyte@ORA817DEV.US.ORACLE.COM> drop table investor;
Table dropped.
ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> create table invdatew
2 as
3 select 1 inv, a.*, sysdate dt from all_objects a where 1=0;
Table created.
ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> create table investor
2 as
3 select 1 inv, a.* from all_objects a where 1=0;
Table created.
ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> create index invdatew_idx on invdatew(inv);
Index created.
ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> exec gen_data( 'INVDATEW', 5000);
PL/SQL procedure successfully completed.
ops$tkyte@ORA817DEV.US.ORACLE.COM> exec gen_data( 'INVESTOR', 5000);
PL/SQL procedure successfully completed.
ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze table invdatew compute statistics for table for all indexes for all indexed columns;
Table analyzed.
ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze table investor compute statistics for table for all indexes for all indexed columns;
Table analyzed.
ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> alter session set sql_trace=true;
Session altered.
ops$tkyte@ORA817DEV.US.ORACLE.COM> insert into invdatew
2 select a.*, sysdate from investor a
3 where inv not in (select inv from invdatew);
5000 rows created.
ops$tkyte@ORA817DEV.US.ORACLE.COM> rollback;
Rollback complete.
ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> insert into invdatew
2 select a.*, sysdate from investor a
3 where inv is not null
4 and inv not in (select /*+ HASH_AJ */ inv from invdatew where inv is not null );
5000 rows created.
ops$tkyte@ORA817DEV.US.ORACLE.COM> rollback;
Rollback complete.
ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> insert into invdatew
2 select a.*, sysdate from investor a
3 where not exists ( select *
4 from invdatew
5 where invdatew.inv = a.inv );
5000 rows created.
ops$tkyte@ORA817DEV.US.ORACLE.COM> rollback;
Rollback complete.
ops$tkyte@ORA817DEV.US.ORACLE.COM>
Now, the three queries are *different* in that if INV is null in invdatew -- the first one returns a different answer (no rows) where as the second two might return some rows -- see
</code>
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:442029737684 <code>
for the reasoning behind that...
So, I think the second two queries are OK for you (if you have null in inv in invdatew you would always get ZERO rows -- probably not what you wanted right?)
So, looking at the tkprof for these inserts we see hugely different performance characteristics:
insert into invdatew
select a.*, sysdate from investor a
where inv not in (select inv from invdatew)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.02 0.02 0 0 0 0
Execute 1 85.64 86.08 0 1974190 31754 5000
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 85.66 86.10 0 1974190 31754 5000
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 511
Rows Row Source Operation
------- ---------------------------------------------------
5001 FILTER
5001 TABLE ACCESS FULL INVESTOR
5000 TABLE ACCESS FULL INVDATEW
This query was processed like this:
for each row in investor
loop
FULL SCAN INVDATEW making sure that INV not in there
end loop
that is -- 5000 full scans of INVDATEW (5000 = number of rows in investor!)********************************************************************************
insert into invdatew
select a.*, sysdate from investor a
where inv is not null
and inv not in (select /*+ HASH_AJ */ inv from invdatew where inv is not null )
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.01 0 0 0 0
Execute 1 0.67 3.84 0 550 10631 5000
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.68 3.85 0 550 10631 5000
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 511
Rows Row Source Operation
------- ---------------------------------------------------
5001 HASH JOIN ANTI
5000 TABLE ACCESS FULL INVESTOR
5000 VIEW VW_NSO_1
5000 INDEX FAST FULL SCAN (object id 44573)
Wow -- that is different -- from over 85 cpu seconds down to .6, over 86 seconds runtime to under 4 seconds.
This was processed sort of like this:
for every row in outer join investor to invdatew
if invdatew.inv is null then insert this record
Much more efficient...********************************************************************************
insert into invdatew
select a.*, sysdate from investor a
where not exists ( select *
from invdatew
where invdatew.inv = a.inv )
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.72 4.32 0 10672 10623 5000
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.72 4.32 0 10672 10623 5000
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 511
Rows Row Source Operation
------- ---------------------------------------------------
5001 FILTER
5001 TABLE ACCESS FULL INVESTOR
5000 INDEX RANGE SCAN (object id 44573)
Now, not as good as the hash_aj but.... very close, very close. The logical IO's would make me avoid this approach however, rather do 550 LIO's then 10,672. This was processed like this:
for every record in investor
run a subquery that uses an index to find a row in invdatew
if not found
then insert
end if;
end loop
Hopefully, you can use that hash_aj and your query will run in minutes or less.