Question and Answer

Connor McDonald

Thanks for the question, Tauseef.

Asked: February 04, 2016 - 10:36 am UTC

Last updated: February 09, 2016 - 5:51 pm UTC

Version: 11g

Viewed 10K+ times

You Asked

I have a stored procedure that collects information for around 4-5 million records (on a monthly basis).
The approach I have used is as below.

create or replace procedure monthly_report(report_date date) is
  v_start_date date;
  v_end_date   date;
  c_limit      pls_integer := 100000;

  type monthly_record is record
  (
    -- declarations here for the columns
  );
  type tab_month_rec is table of monthly_record;
  v_tab_month_rec tab_month_rec;

  i number;

  cursor example_cursor(start_date date, end_date date) is
    select c1, c2, c3
    from   t1, t2, t3, t4
    where  date_of_record > start_date
    and    date_of_record < end_date
    -- and all the join conditions here
    ;
begin
  dbms_output.put_line('Monthly Report Execution starts');

  -- logic for calculating v_start_date and v_end_date

  i := 0;
  open example_cursor(v_start_date, v_end_date);
  loop
    i := i + 1;
    dbms_output.put_line('Loop no: ' || i);

    fetch example_cursor
      bulk collect into v_tab_month_rec
      limit c_limit;
    exit when v_tab_month_rec.count = 0;

    forall indx in 1 .. v_tab_month_rec.count
      -- single DML statement here (insert into the staging table)
  end loop;
  close example_cursor;

  commit;
end monthly_report;

This procedure takes 1 to 2 hours, depending on data volume.
Can I improve upon the code?
For example, the value of C_LIMIT (100,000): should I make it larger or smaller, considering 4-5 million records?
The COMMIT statement: should it be inside the LOOP?
What other factors (apart from the query in the cursor) should be considered?

Execution plan for the cursor query:

------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 330 | 61710 | 127K (2)| 00:25:26 |
|* 1 | FILTER | | | | | |
| 2 | NESTED LOOPS | | | | | |
| 3 | NESTED LOOPS | | 331 | 61897 | 127K (2)| 00:25:26 |
| 4 | NESTED LOOPS | | 331 | 54284 | 126K (2)| 00:25:18 |
| 5 | VIEW | VW_SQ_1 | 6544 | 166K| 106K (3)| 00:21:23 |
|* 6 | FILTER | | | | | |
| 7 | HASH GROUP BY | | 6544 | 223K| 106K (3)| 00:21:23 |
|* 8 | FILTER | | | | | |
|* 9 | INDEX RANGE SCAN | ARLT_MORTDATA_PK | 37M| 1258M| 104K (1)| 00:20:59 |
| 10 | TABLE ACCESS BY INDEX ROWID | ARLT_MORTDATA | 1 | 138 | 3 (0)| 00:00:01 |
|* 11 | INDEX UNIQUE SCAN | ARLT_MORTDATA_PK | 1 | | 2 (0)| 00:00:01 |
|* 12 | INDEX UNIQUE SCAN | ARLT_MORTGAGE_PK | 1 | | 1 (0)| 00:00:01 |
| 13 | TABLE ACCESS BY INDEX ROWID | ARLT_MORTGAGE | 1 | 23 | 2 (0)| 00:00:01 |
| 14 | CONCATENATION | | | | | |
|* 15 | TABLE ACCESS BY INDEX ROWID | ARLT_HISTTRAN_ARCH | 1 | 44 | 10 (0)| 00:00:01 |
|* 16 | INDEX RANGE SCAN | ARLT_HISTTRAN_INDX | 1 | | 9 (0)| 00:00:01 |
|* 17 | TABLE ACCESS BY INDEX ROWID | ARLT_HISTTRAN_ARCH | 1 | 44 | 5 (0)| 00:00:01 |
|* 18 | INDEX RANGE SCAN | ARLT_HISTTRAN_LNBR_EFFDT_IDX | 1 | | 4 (0)| 00:00:01 |
|* 19 | FILTER | | | | | |
|* 20 | TABLE ACCESS BY INDEX ROWID | ARLT_MORTDATA | 1 | 38 | 4 (0)| 00:00:01 |
|* 21 | INDEX UNIQUE SCAN | ARLT_MORTDATA_PK | 1 | | 3 (0)| 00:00:01 |
| 22 | SORT AGGREGATE | | 1 | 35 | | |
|* 23 | FILTER | | | | | |
|* 24 | TABLE ACCESS BY INDEX ROWID| ARLT_MORTDATA | 12 | 420 | 214 (0)| 00:00:03 |
|* 25 | INDEX RANGE SCAN | ARLT_MORTDATA_LN_IDX | 215 | | 4 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------------
What is the DML statement exactly? It is an insert into a staging table for further processing: insert into staging (column names) values (values from the PL/SQL table variable).
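For illustration, a minimal sketch of what that FORALL insert could look like (the staging table name below is a placeholder, not the real one):

forall indx in 1 .. v_tab_month_rec.count
  -- insert the whole record; its fields must match the staging table's columns
  -- in number, order and type (this also avoids referencing individual record
  -- fields inside FORALL, which some releases do not allow)
  insert into staging_monthly_report
  values v_tab_month_rec(indx);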

If I add a commit before the END LOOP then I will probably have to add a delete statement just after the BEGIN block, because there should not be duplicate records for the same month and at any given point in time there will be only one month of data in this staging table. I asked about putting the commit before END LOOP only because it might have something to do with the redo logs.


and Chris said...

To tune the performance you first need to profile the code to figure out where it's spending most of its time. There are a number of tools you could use to do this, such as:

- tkprof - SQL execution stats

https://oracle-base.com/articles/misc/sql-trace-10046-trcsess-and-tkprof

- PL/SQL hierarchical profiler - PL/SQL execution stats

https://oracle-base.com/articles/11g/plsql-hierarchical-profiler-11gr1

- DIY instrumentation - recording the time each PL/SQL statement completes to a table, then analyzing this to find which statements consumed the most time (a minimal sketch follows the links below).

https://github.com/OraOpenSource/Logger
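As a rough illustration of the DIY instrumentation idea, a minimal sketch (the table and procedure names here are made up for the example):

create table proc_timings (
  step_name varchar2(100),
  logged_at timestamp default systimestamp
);

create or replace procedure log_step(p_step_name in varchar2) is
  pragma autonomous_transaction; -- logging commits independently of the main transaction
begin
  insert into proc_timings (step_name) values (p_step_name);
  commit;
end log_step;
/

Then bracket each phase of monthly_report with calls such as log_step('open cursor'), log_step('fetch batch '||i) and log_step('forall insert batch '||i), and compare consecutive timestamps afterwards to see where the time goes.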

The procedure is relatively simple (basically just a select followed by an insert into a staging table). But Oracle could be spending most of its time doing the select, the insert, or a roughly equal amount of time on each.

You need to do this analysis. Without these figures we're just guessing.

That said, the explain plan is estimating that step 9:

|* 9 | INDEX RANGE SCAN | ARLT_MORTDATA_PK | 37M| 1258M| 104K (1)| 00:20:59 |


takes 20 minutes and returns 37 million rows. If this is accurate then that's a big chunk of your time accounted for. Trace the SQL execution to verify this.
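As an alternative to a full trace, you can also compare estimated and actual row counts by running the query with rowsource statistics and pulling the plan afterwards, along these lines (run with SET SERVEROUTPUT OFF so the query stays the session's last cursor; the predicates below just mirror the cursor from the question):

select /*+ gather_plan_statistics */ c1, c2, c3
from   t1, t2, t3, t4
where  date_of_record > :start_date
and    date_of_record < :end_date
-- and all the join conditions here
;

select *
from   table(dbms_xplan.display_cursor(format => 'ALLSTATS LAST'));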

If you get an execution plan and still need help, then you can post it here. Please post the complete SQL statement and use the code tags to preserve the indentation!

A limit of 100,000 is rather high. The commit should be outside the loop:

https://asktom.oracle.com/pls/apex/f?p=100:11:0::NO::P11_QUESTION_ID:4951966319022


Comments

Helpful

Tauseef, February 08, 2016 - 11:22 am UTC

Thanks for your help.
I feel there is no problem with the commit statement being inside or outside the loop. I thought it would take less time if it were kept inside the loop.

100,000 is a large limit; 25,000 would probably be a more suitable limit size.

Somebody suggested that converting to a single SQL statement would make it faster. I guess if we convert it to a single SQL statement it will be slower, as it has to fetch all the records and then insert them into the new table.
I think the cursor, limit, collection and FORALL loop actually speed things up.

The links for profilers are very useful.
Chris Saxon
February 08, 2016 - 3:22 pm UTC

Placing the commit inside the loop will not make your code faster. Please read the link I provided at the end of the answer.

I would still say 25,000 is high. You would need to test in your environment to find the point where increasing it adds no performance benefit or even slows things down. This really is the last mile of tuning though. Optimize the SQL statements first.

Converting the loop to a single SQL statement is likely to be faster (though it's not guaranteed).

I guess if we convert it to a single SQL statement it will be slower, as it has to fetch all the records and then insert them into the new table.


This is what the code (cursor, bulk collect, forall) is doing anyway. But it has more SQL statements = more parsing, more SQL <-> PL/SQL context switches, etc.

Single SQL Statement

Tauseef, February 09, 2016 - 6:13 am UTC

Insert into staging_table select * from transaction_table where <conditions>.

The above is the format of the single SQL statement. If that query fetches 2-3 million records, then all the records are read first and inserted in one shot.

The process is simplified by limiting the fetch size (say 10,000), using a collection to temporarily hold the data, and then running a single FORALL with an insert statement.

I guess the second approach will be faster. If not, can you please point out where I am wrong?
Connor McDonald
February 09, 2016 - 7:16 am UTC

Insert-select is most likely going to be the fastest.

But if you want to really know why it's taking so long, you need to trace it:

- exec dbms_monitor.session_trace_enable(waits=>true)
- insert-select-where
- exec dbms_monitor.session_trace_disable
- select * from v$diag_info

which will tell you the name of the tracefile.
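A minimal sketch of that workflow from SQL*Plus (the statement to profile is a placeholder; substitute your real insert-select):

exec dbms_monitor.session_trace_enable(waits => true)

-- run the statement you want to profile here (your insert-select)

exec dbms_monitor.session_trace_disable

select value
from   v$diag_info
where  name = 'Default Trace File';

Then format the trace file from the operating system prompt, for example:

tkprof <tracefile>.trc report.txt sys=no sort=exeela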


Cheers,
Connor

Clarification

Tauseef, February 09, 2016 - 7:55 am UTC

Hi Connor,

I am still confused as to why insert-select would be faster.
If this is the case, why would people write stored procedures?
Just to remind you, I am working with around 4-5 million records.

I have personally seen that the current stored procedure takes less time than insert-into-select.

The first reason is that insert-into-select does not support LIMIT, so memory usage cannot be controlled.

I have seen that my procedure completes at least 20% faster than insert-into-select. Maybe if I change the value of LIMIT from 100,000 to a more reasonable value it will be faster still.

Please correct me if I am wrong.


Connor McDonald
February 09, 2016 - 5:51 pm UTC

You can place an insert-select in a stored procedure (in fact, I recommend it!).
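For illustration, a minimal sketch of what that could look like (the staging table and column names are placeholders, and the delete reflects the one-month-at-a-time staging table described earlier):

create or replace procedure monthly_report(report_date date) is
  v_start_date date;
  v_end_date   date;
begin
  -- logic for calculating v_start_date and v_end_date, as in the original

  delete from staging_monthly_report; -- staging holds only one month at a time

  insert into staging_monthly_report (c1, c2, c3)
  select c1, c2, c3
  from   t1, t2, t3, t4
  where  date_of_record > v_start_date
  and    date_of_record < v_end_date
  -- and all the join conditions here
  ;

  commit;
end monthly_report;
/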

The single-statement solution does less parsing (one statement vs many) and makes fewer SQL-PL/SQL context switches (one vs many). This doesn't guarantee the SQL-only solution is faster, but usually it is.

Yours may be one of the rare cases where bulk collect is faster. We need some figures and the trace output to help you figure out how to make this better.

When it comes to memory, the SQL solution will use temp space if the working set doesn't fit.

Using bulk collect, if you set the limit too high then you'll get an out of memory error. So you have to pick a limit low enough to ensure this doesn't happen. If your limit is 100,000 then you may find you run out (depending on the data and what else is going on in your system).

The SQL solution can adapt to how much is available. It won't fail.

Chris
