Skip to Main Content
  • Questions
  • Perf : Insert into Select Vs For All

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, Manish.

Asked: December 17, 2018 - 5:42 pm UTC

Last updated: June 17, 2022 - 5:31 pm UTC

Version: oracle11

Viewed 10K+ times! This question is

You Asked

I need to perform db operations where I need to inserts .5 million reocrds from another complex query involving joins,exists etc:


As of now I am using INSERT INTO SELECT inside a pl/sq package.

but some time I am observing for same data set this queries takes more time 

Will For All perform better thhan insert into select.

Please suggest.

Also do we have any way to clear process or connections in pl/sqls like java

and Connor said...

Will For All perform better thhan insert into select.

They will be very similar. Both are taking advantage of operations in bulk as opposed to row at a time.

SQL> create table t ( x int );

Table created.

SQL> set timing on
SQL> begin
  2    for i in 1 .. 100000
  3    loop
  4      insert into t values (i);
  5    end loop;
  6  end;
  7  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.36
SQL>
SQL> insert into t
  2  select rownum from dual
  3  connect by level <= 100000;

100000 rows created.

Elapsed: 00:00:00.04
SQL>
SQL> declare
  2    type numlist is table of number index by pls_integer;
  3    s numlist;
  4  begin
  5    for i in 1 .. 100000
  6    loop
  7      s(i) := i;
  8    end loop;
  9    forall i in 1 .. 100000
 10      insert into t values (s(i));
 11  end;
 12  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.03
SQL>
SQL>
SQL>


As you can see from above, *loading* 5 million records probably won't take much time - the time is most likely being in the complex query. You can easily verify this be comparing:

insert into target_table select ... [your complex query]

versus

exec for i in ( select ... [your complex query] ) loop null; end loop;


Also do we have any way to clear process or connections in pl/sqls like java


Not sure what you mean by that, besides simply disconnecting from the database.

Rating

  (12 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Manish, December 18, 2018 - 2:44 am UTC

Thanks.

One more thing will memory,i/o impact will also be same.

Actually i am facing issue that pl/sql containing insert into select becomes slower after few runs.

I am facing the same issue

Imran, December 18, 2018 - 10:36 am UTC

Hi Tom,

Instead of raising a another new query I though my query is related to current discussion.

Issue description:
I have written anonymous block in oracle plsql. This block simply includes a select statement with a limit of 1000 records that are fetched at a time using bulk and inserted into another table using For All. The select statement has calls to user defined functions. When I execute this block, every time it failed with “ORA-04030: out of process memory when trying to allocate 16328 bytes” error only after inserting 25,90,000 records.



Error:
ORA-04030: out of process memory when trying to allocate 16328 bytes
(koh-kghu sessi,pl/sql cursor meta-data)
ORA-06500: PL/SQL: storage
error
ORA-04030: out of process memory when trying to allocate 16328 bytes
(koh-kghu sessi,pmucalm coll)

I thought the error is due to insufficient memory space available to process plsql block hence I altered block to remove bulk logic and directly insert into … select … query(added where clause to select with rownum between :min and :max) such that it would return only 1000 records. This query is inside a loop, for every iteration the min and max values will keep increasing until all the records are inserted but the new logic also failed with same error as above.

My queries are
(a) What could be cause of error and how do I identify this cause?
(b) Can I resolve this issue without changing memory size?
(c) Why I am getting this error only after successfully inserting 25,90,000 records? Why does it throw error?

DB Version: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production

Select * From v$sga;
Fixed Size 2234960
Variable Size 2751464880
Database Buffers 1375731712
Redo Buffers 46137344

Thank you for answering queries.
Imran.

Connor McDonald
December 19, 2018 - 2:06 am UTC

That sounds like you are not releasing space from the array you are using to load the data with.

As we've seen, an efficient loading mechanism is:

- put 'n' rows in an array
- use FORALL to insert that array into a table.

That does *not* mean if you have 100million rows to load, you would put 100million rows into in an array. That's a sure fire way to run out of process memory (ora-4030).

To load 100million rows you would do something like:

loop
- load 5000 rows into an array
- insert 5000 rows
end loop

which each loop loads the "next" 5000 rows. The performance will be similar and you wont have memory issues.

A common error I see with nested tables is people do:

- load 'n' rows into array using the array.extend call
- insert them with FORALL

but then forget to re-initialize the array with something like: array := array();

so every extend call just makes the array larger and larger until the ora-4030 blows up.


But I am releasing space

Imran, December 19, 2018 - 9:08 am UTC

Thanks for the response Connor.

My SRC_TABLE has around 5 million records and the error is thrown after successfully inserting 2.5 million records.

Below is how my plsql block looks
Declare
…
Begin
 
Open l_refcur For Select ... From SRC_TABLE Where...;
Loop
 
Fetch l_refcur
Bulk Collect INTO l_tab LIMIT 1000;
 
FORALL IN l_tab
     Insert Into Dest_Table Using l_tab;
Commit;
Exit When l_tab.Count  < 1000;
<b>l_tab.Delete;</b>
End Loop;
 
End;

Just before End Loop there is call to Delete upon collection variable which should clear and release memory (as per, http://ksun-oracle.blogspot.com/2013/06/oracle-plsql-collection-memory-reclaim.html )
Hence, I am attempting to re-initialize the collection variable for each iteration therefore I cannot understand why this problem is re-occurring?

To check if the cause of the issue is memory leakage I have re-written the code such that there is no plsql processing but still the same error is thrown? which makes me to believe that the cause is not memory leakage.

re-written as
Declare
…
Begin
 
min_countr         := 1;
max_countr         := 1000;
rowcnt             := 1;

Loop

EXIT WHEN nvl(rowcnt,0) = 0; 
 
Insert into DEST_TABLE… Select … From SRC_TABLE Where … rownum between min_countr and max_countr; -- maximum of only 1000 records would be fetched and inserted
 
rowcnt         := SQL%ROWCOUNT;
commit;

min_countr     := min_countr + 1000;
max_countr     := max_countr + 1000;
End Loop;
 
End;

Please let me know what next can we do in analyzing the issue? is there a way to calculate how much memory I require to run this block and what if I do not have resources to increase memory ?

Thanks,
Imran.

wild guess

Racer I., December 19, 2018 - 10:49 am UTC

Hi,

@Imran :

Actually the delete (in your case) is not necessary. BULK COLLECT INTO automatically clears the collection before filling it.

> The select statement has calls to user defined functions.

My guess would be that the memory leak is in one of these functions. If they are normally only called in fresh sessions for small data sets you wouldn't notice this.

Its also more prevalent to EXIT on COUNT = 0 right after the
fetch instead of at the end although that means technically one fetch more...

regards,
Connor McDonald
December 20, 2018 - 7:41 am UTC

Agreed. The basics of the looping look sound, but without seeing the entire code we can't make many more assertions than that.

Issue Resolved.

Mohammed Imran, January 08, 2019 - 4:32 am UTC

Thanks for your inputs on the issue.

I was able to resolve the issue by altering user defined functions (used in select clause of query) as DETERMINISTIC but still could not understand the cause of issue.

Thanks,
Imran.

Optimize forall bulk collect or better yet single SQL

A reader, June 14, 2022 - 4:51 am UTC

Greetings!
This code basically read data from staging table by finding delta rows using minus operation by filtering the delta rows to insert/update on the main table.

IM.IA_LOAN_RAW_STAGE has about 200000 rows.
IM.IA_LOAN_DATA_RAW has 165 million rows.


1. Do you have any suggestions on how to improve/optimize this code from performance perspective.

2. The query identifies the DELTA rows using minus operator.. Can this be written in more efficient way.

2. If possible, Can this rewritten completely direct SQL without using PL/SQL array insert/bulk collect .

Thanks!


CREATE OR REPLACE EDITIONABLE PROCEDURE "IM"."DELTA_APPLY_TO_RAW" (UPDATED_ROWS OUT NUMBER, INSERTED_ROWS OUT NUMBER)
AS
COUNTER NUMBER := 0;
RECORDED_DATE_START_VAL TIMESTAMP := SYS_EXTRACT_UTC(SYSTIMESTAMP);
UTC_START VARCHAR2(20) := '1970-01-01 00:00:00';
INFINITY VARCHAR2(20) := '9999-12-31 23:59:59';
DEFAULT_EFFECTIVE_DATE_START TIMESTAMP := TO_DATE(UTC_START, 'YYYY-MM-DD HH24:MI:SS');
DEFAULT_EFFECTIVE_DATE_END TIMESTAMP := TO_DATE(INFINITY, 'YYYY-MM-DD HH24:MI:SS');
BATCH_SIZE NUMBER := 200000;
CURSOR BT_CURSOR IS
    SELECT
        RECORD_CREATED_TIME,
        RECORD_MODIFIED_TIME,
        .....
        QUALIFYING_RATE_PERCENT,
        TOTAL_INTEREST_PERCENTAGE,
        MI_AND_FUNDINGFEEFINANCED_AMT,
        BROKER_LENDER_COMPANY_ID,
        INITIAL_SUBMIT_DATE,
        BORROWERS_COUNT
    FROM
        IM.IA_LOAN_RAW_STAGE
    MINUS
    SELECT
        RECORD_CREATED_TIME,
        RECORD_MODIFIED_TIME,
        .....
        QUALIFYING_RATE_PERCENT,
        TOTAL_INTEREST_PERCENTAGE,
        MI_AND_FUNDINGFEEFINANCED_AMT,
        BROKER_LENDER_COMPANY_ID,
        INITIAL_SUBMIT_DATE,
        BORROWERS_COUNT
    FROM
        IM.IA_LOAN_DATA_RAW bt
    WHERE
        EXISTS (
            SELECT 1 FROM IM.IA_LOAN_RAW_STAGE SRC
            WHERE
                SRC.ENCOMPASS_ID_INSTANCE_ID = bt.ENCOMPASS_ID_INSTANCE_ID
        )
        AND RECORDED_DATE_START <= RECORDED_DATE_START_VAL
        AND RECORDED_DATE_END > RECORDED_DATE_START_VAL;
TYPE BT_REC IS RECORD (
    RECORD_CREATED_TIME            IM.IA_LOAN_DATA_RAW.RECORD_CREATED_TIME%TYPE,
    RECORD_MODIFIED_TIME           IM.IA_LOAN_DATA_RAW.RECORD_MODIFIED_TIME%TYPE,
    ....,
    QUALIFYING_RATE_PERCENT        IM.IA_LOAN_DATA_RAW.QUALIFYING_RATE_PERCENT%TYPE,
    TOTAL_INTEREST_PERCENTAGE      IM.IA_LOAN_DATA_RAW.TOTAL_INTEREST_PERCENTAGE%TYPE,
    MI_AND_FUNDINGFEEFINANCED_AMT  IM.IA_LOAN_DATA_RAW.MI_AND_FUNDINGFEEFINANCED_AMT%TYPE,
    BROKER_LENDER_COMPANY_ID       IM.IA_LOAN_DATA_RAW.BROKER_LENDER_COMPANY_ID%TYPE,
    INITIAL_SUBMIT_DATE            IM.IA_LOAN_DATA_RAW.INITIAL_SUBMIT_DATE%TYPE,
    BORROWERS_COUNT                IM.IA_LOAN_DATA_RAW.BORROWERS_COUNT%TYPE
);
TYPE BT_TABLE IS TABLE OF BT_REC;
BT_TABLE_VAR BT_TABLE;
BEGIN
    DBMS_OUTPUT.PUT_LINE('['
                         || SYSTIMESTAMP
                         || ']: APPLYING DELTAS TO RAW TABLE');
    UPDATED_ROWS := 0;
    INSERTED_ROWS := 0;
    OPEN BT_CURSOR;
    LOOP
        DBMS_OUTPUT.PUT_LINE('[' || SYSTIMESTAMP || ']: STARTING A NEW BATCH');
        FETCH BT_CURSOR BULK COLLECT INTO BT_TABLE_VAR LIMIT BATCH_SIZE;
        DBMS_OUTPUT.PUT_LINE('[' || SYSTIMESTAMP || ']: ROWS FETCHED: ' || BT_TABLE_VAR.COUNT);
        EXIT WHEN BT_TABLE_VAR.COUNT = 0;
        DBMS_OUTPUT.PUT_LINE('[' || SYSTIMESTAMP || ']: UPDATING RECORDS ');
        FORALL i IN 1..BT_TABLE_VAR.COUNT
            -- update old records
            UPDATE
                IM.IA_LOAN_DATA_RAW TGT
            SET
                RECORDED_DATE_END = RECORDED_DATE_START_VAL, UPDATE_TIME = SYS_EXTRACT_UTC(SYSTIMESTAMP)
            WHERE
                BT_TABLE_VAR(i).ENCOMPASS_ID_INSTANCE_ID = TGT.ENCOMPASS_ID_INSTANCE_ID
                AND RECORDED_DATE_START <= RECORDED_DATE_START_VAL
                AND RECORDED_DATE_END > RECORDED_DATE_START_VAL
            LOG ERRORS INTO IM.IA_LOAN_DATA_ALL_ERROR ('DELTA_TO_RAW_BT') REJECT LIMIT UNLIMITED;
        UPDATED_ROWS := UPDATED_ROWS + SQL%ROWCOUNT;
        DBMS_OUTPUT.PUT_LINE('[' || SYSTIMESTAMP || ']: INSERTING RECORDS');
        FORALL i IN 1..BT_TABLE_VAR.COUNT
        -- insert new records
            INSERT
                INTO
                    IM.IA_LOAN_DATA_RAW (
                        RECORD_CREATED_TIME,
                        RECORD_MODIFIED_TIME,
                        ....,
                        QUALIFYING_RATE_PERCENT,
                        TOTAL_INTEREST_PERCENTAGE,
                        MI_AND_FUNDINGFEEFINANCED_AMT,
                        BROKER_LENDER_COMPANY_ID,
                        INITIAL_SUBMIT_DATE,
                        BORROWERS_COUNT,
                        RECORDED_DATE_START,
                        RECORDED_DATE_END,
                        EFFECTIVE_DATE_START,
                        EFFECTIVE_DATE_END
                    )
                VALUES (
                    BT_TABLE_VAR(i).RECORD_CREATED_TIME,
                    BT_TABLE_VAR(i).RECORD_MODIFIED_TIME,
                    ....
                    BT_TABLE_VAR(i).QUALIFYING_RATE_PERCENT,
                    BT_TABLE_VAR(i).TOTAL_INTEREST_PERCENTAGE,
                    BT_TABLE_VAR(i).MI_AND_FUNDINGFEEFINANCED_AMT,
                    BT_TABLE_VAR(i).BROKER_LENDER_COMPANY_ID,
                    BT_TABLE_VAR(i).INITIAL_SUBMIT_DATE,
                    BT_TABLE_VAR(i).BORROWERS_COUNT,
                    RECORDED_DATE_START_VAL,
                    DEFAULT_EFFECTIVE_DATE_END,
                    DEFAULT_EFFECTIVE_DATE_START,
                    DEFAULT_EFFECTIVE_DATE_END
                )
            LOG ERRORS INTO IM.IA_LOAN_DATA_ALL_ERROR ('DELTA_TO_RAW_BT') REJECT LIMIT UNLIMITED;
        INSERTED_ROWS := INSERTED_ROWS + SQL%ROWCOUNT;
        COMMIT;
        DBMS_OUTPUT.PUT_LINE('['
                         || SYSTIMESTAMP
                         || ']: FINISHED APPLYING DELTAS TO RAW TABLE - INSERTED ROWS:'
                         || INSERTED_ROWS);
    END LOOP;
    CLOSE BT_CURSOR;
EXCEPTION
    WHEN OTHERS THEN
        DBMS_OUTPUT.PUT_LINE(SQLERRM);
        ROLLBACK;
END;
/


<code>

Chris Saxon
June 14, 2022 - 1:24 pm UTC

1. Assuming you're accessing real tables (not views) the query is pretty simple, so there's not much to do.

I'd check the EXISTS subquery - is this necessary? It means you're reading the rows in ia_loan_raw_stage (at least) twice.

It may also be unnecessary from a logical perspective. Essentially it says:

"Remove the values from STAGE that are already in RAW"

So you're only left with new values. Meaning the UPDATE also will have nothing to do.

Is this correct?

If so, going further - as this is a staging table - is it possible to NOT insert the existing values from RAW into STAGE in the first place? This means you can skip the MINUS check altogether.

I'd also check if there's index on:

im.ia_loan_data_raw ( recorded_date_end, recorded_date_start )

and - if not - try creating one.

There may also be optimizations you can do based on how likely it is the values in the staging table already exist in ia_loan_data_raw.

If 99% of the time

2. It might be faster to change it to use NOT EXISTS instead, e.g.:

select * from im.ia_loan_raw_stage
where  not exists (
  select * from im.ia_loan_data_raw
  where  ...
);


This is something you'll need to try out though.

3. You should be able to rewrite it to a single MERGE:

merge into im.ia_loan_data_raw r
using (
 select * from im.ia_loan_raw_stage
 minus 
 select * from im.ia_loan_data_raw
 where  ...
) s
on  ( ... )
when matched then
  update set ...
when not matched then
  insert ( ... ) values ( ... );

Optimize forall bulk collect or better yet single SQL

A reader, June 14, 2022 - 2:46 pm UTC

Hi Connor

Thanks for all your inputs and suggestions.

Here are some of my updates-

>> 1. Assuming you're accessing real tables (not views) the query is pretty simple, so there's not much to do.

They are all real tables.

>> I'd check the EXISTS subquery - is this necessary? It means you're reading the rows in ia_loan_raw_stage (at least) twice.
It may also be unnecessary from a logical perspective. Essentially it says:
"Remove the values from STAGE that are already in RAW"
So you're only left with new values. Meaning the UPDATE also will have nothing to do.
Is this correct?


Valid point. I will double check this.
"WHERE
EXISTS (
SELECT 1 FROM IMT.IADB_LOAN_DATA_RAW_STAGE SRC
WHERE
SRC.ENCOMPASS_ID_INSTANCE_ID = bt.ENCOMPASS_ID_INSTANCE_ID
)"

AND RECORDED_DATE_START <= RECORDED_DATE_START_VAL
AND RECORDED_DATE_END > RECORDED_DATE_START_VAL;

>> If so, going further - as this is a staging table - is it possible to NOT insert the existing values from RAW into STAGE in the first place? This means you can skip the MINUS check altogether.
I'd also check if there's index on:
im.ia_loan_data_raw ( recorded_date_end, recorded_date_start )
and - if not - try creating one.


I have below indexes on the raw table:

INDEX_NAME COLUMN_NAME COLUMN_POSITION
------------------------------ ------------------------------ ---------------
IA_LOAN_DATA_RAW_IDX01 ENCOMPASS_ID_INSTANCE_ID 1
IA_LOAN_DATA_RAW_IDX01 RECORDED_DATE_START 2
IA_LOAN_DATA_RAW_IDX01 EFFECTIVE_DATE_START 3

IA_LOAN_DATA_RAW_IDX02 DAY_OF_LOCK 1
IA_LOAN_DATA_RAW_IDX02 RECORDED_DATE_START 2

IA_LOAN_DATA_RAW_IDX03 GFE_APPLICATION_DATE 1
IA_LOAN_DATA_RAW_IDX03 RECORDED_DATE_START 2

IA_LOAN_DATA_RAW_IDX04 RECORDED_DATE_START 1
IA_LOAN_DATA_RAW_IDX04 RECORDED_DATE_END 2


>> There may also be optimizations you can do based on how likely it is the values in the staging table already exist in ia_loan_data_raw.

I assume that 99% of the time the values already exists in RAW table. I will check this out.

Does using Parallel Query (DOP - 16) will further speed up this search ?

So using NOT EXIST, the query will be look like this. I will give it a try.


So, the query will look like this:

select * from im.ia_loan_raw_stage SRC
where not exists (
select * from im.ia_loan_data_raw
SRC.ENCOMPASS_ID_INSTANCE_ID = bt.ENCOMPASS_ID_INSTANCE_ID
)
AND RECORDED_DATE_START <= RECORDED_DATE_START_VAL
AND RECORDED_DATE_END > RECORDED_DATE_START_VAL;

);



>> 3. You should be able to rewrite it to a single MERGE:

Thanks. This is what i was looking for. I will check with dev team if PL/SQL array processing can be completely eliminated.


merge into im.ia_loan_data_raw r
using (
select * from im.ia_loan_raw_stage
minus
select * from im.ia_loan_data_raw
where ...
) s
on ( ... )
when matched then
update set ...
when not matched then
insert ( ... ) values ( ... );

Chris Saxon
June 14, 2022 - 4:23 pm UTC

Does using Parallel Query (DOP - 16) will further speed up this search ?

Maybe.

For "fast" queries that process "few" rows, using parallel can make the overall process slower. Even if parallel processing does help, you get diminishing returns as you add more processes. Exactly how this kicks in depends on your data.

Rather than picking a specific degree of parallelism, it may be better to use Auto DOP and let the optimizer figure this out for you:

https://docs.oracle.com/en/database/oracle/oracle-database/21/vldbg/degree-parallel.html#GUID-76AC9190-43AC-4529-89AD-1BDA95C6DFBD

So, the query will look like this:

That's not quite the same!

Using MINUS you're comparing all the columns - you'll get a row if there are any differences. To have an equivalent NOT EXISTS, you need to compare ALL the columns selected in MINUS.

MINUS is also null safe - NULL MINUS NULL returns no rows - but when comparing in the NOT EXISTS you'll need to have null checks if these values are possible.

Optimize forall bulk collect or better yet single SQL

A reader, June 14, 2022 - 2:53 pm UTC

Hi Chris

Thanks for all your inputs and suggestions.

Optimize forall bulk collect or better yet single SQL

A reader, June 14, 2022 - 9:46 pm UTC

Hi Chris,

Thanks!

1)
-- I got little clarification from Dev team on the reason for the update statement.

IM.IA_LOAN_DATA_RAW is a bi-temporal table - on an update we close out the old record and insert a new one.

So, the update will be executed if there is matching row as well as the new row should be inserted. So for matching row, update+insert should be done.

I think merge can only do either update or insert.

Any thoughts ?


2)
>>
Using MINUS you're comparing all the columns - you'll get a row if there are any differences. To have an equivalent NOT EXISTS, you need to compare ALL the columns selected in MINUS.
MINUS is also null safe - NULL MINUS NULL returns no rows - but when comparing in the NOT EXISTS you'll need to have null checks if these values are possible.


I will run the query to check this. Thanks



Chris Saxon
June 17, 2022 - 5:31 pm UTC

Yes, MERGE is either or, not both.

So you'll need at least two statements: an INSERT and an UPDATE.

One thing that might help:

With UPDATE you can fetch details from the affected rows. So it's possible you could run the UPDATE and return the results into an array. Then use these to drive the INSERT along the lines of:

update ...
set ...
where ...
returning ...
bulk collect into arr

forall i in 1 .. arr.count 
  insert into ...

DBA

Sajan Varughese, June 16, 2022 - 6:08 pm UTC

Could you please point me tow oracle 11g client for windows 64 bit?; I am unable to get to the download in oracle.com

@Download client

A reader, June 18, 2022 - 11:32 pm UTC

https://www.oracle.com/database/technologies/instant-client/winx64-64-downloads.html

There are several packages, you need one of these
Either Instant Client Package - Basic
Or Instant Client Package - Basic Light

Dependent on what you are looking for, you may not need any other, above two are both fully working clients.

Why do you have "DBA" in your Title, looks good on your CV?

FORALL vs INSERT into SELECT .. FROM

Isaac William, June 19, 2022 - 1:49 am UTC

Hi Connor,

I often read what you write and have watched quite a few of your presentations. Very informative and thanks very much for the time you guys put into this.

You said that the performance for both would be similar. My question is the following:
- Will not the fact that you are moving rows from the database layer to PL/SQL, creating a memory structure in PL/SQL and then loading them back into the database be a bit of overhead?
- In particular, if you are **not** transforming anything in the PL/SQL layer but dumping it back into the target table as-is, will not the INSERT INTO SELECT .. FROM be faster?

(If you need to derive/ compute something complex row-by-row in the PL/SQL layer, that's another matter.)

Thanks- Isaac.

More to Explore

PL/SQL demos

Check out more PL/SQL tutorials on our LiveSQL tool.

PL/SQL docs

PL/SQL reference manual from the Oracle documentation library