Joe, November 27, 2006 - 9:47 am UTC
Hmmm...
When would one use what I wrote vs. what you suggested? I attended a conference where the speaker, a well-known PL/SQL author, suggested that FORALL & BULK COLLECT were at times faster for data migrations, so I thought I would give it a whirl.
November 27, 2006 - 7:28 pm UTC
Never write code unless and until you have to.
truncate table PS_JRNL_LN;
insert /*+ APPEND */ into PS_JRNL_LN
select ... from PS_JRNL_LN@FSPROD;
so simple, so elegant, so much more efficient than CODING IT YOURSELF
Faster than What?
K Balfe, November 27, 2006 - 10:28 am UTC
Joe, Are you sure they said BULK COLLECT was faster than straight SQL? I suspect they said it was faster than the same kind of procedural code without BULK COLLECT.
is there something I'm missing???
mlot, November 27, 2006 - 11:15 am UTC
>> Hmmm...
When would one use what I wrote vs. what you suggested?
<<
umm, when you want it to run faster?
Joe, November 27, 2006 - 1:55 pm UTC
Yes, I am sure. The author stated that when dealing with large volumes of data, their benchmarks and test results in 10g showed BULK COLLECT with FORALL to be faster. This being my first attempt at PL/SQL development, I have seen incredible performance improvement over the ETL tools stated above, but I am looking for constructive criticism and direction to improve these ETL methods when moving data to our data warehouse.
Maybe I misunderstood the presentation, but the question I posed back to Tom was: when should I use BULK COLLECT and a FORALL loop vs. a single SQL statement, as he stated above? And what is the hardware impact of each method?
November 27, 2006 - 7:45 pm UTC
if you are moving data from table T1 to table T2 - doing it in any fashion other than a single insert would be "not smart", "not efficient".
Tell you what - try it out :)
(bulk collect/forall cannot even do direct path!)
If I can do it in a single SQL statement - I shall
I shall write code only when forced to
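(For contrast - a minimal sketch, not Joe's actual code, of what the hand-coded BULK COLLECT / FORALL version of such a copy typically looks like; note the extra moving parts:

declare
    cursor c is select * from ps_jrnl_ln@fsprod;
    type rows_t is table of c%rowtype index by pls_integer;
    l_rows rows_t;
begin
    open c;
    loop
        fetch c bulk collect into l_rows limit 1000;
        exit when l_rows.count = 0;
        -- conventional path only: FORALL cannot do a direct path (APPEND) insert
        forall i in 1 .. l_rows.count
            insert into ps_jrnl_ln values l_rows(i);
    end loop;
    close c;
    commit;
end;
/

Every line of that is code to write, test and maintain - versus the single insert above.)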
Joe, November 27, 2006 - 2:21 pm UTC
For those interested in timings (for PS_JRNL_LN), here is what I have found so far in my testing:
Informatica: 1 hr 49 mins. 35 secs.
PL/SQL (as designed above): 27 mins. 27 secs.
Suggestion: 27 mins. 19 secs.
Row Count Processed: 10.5 million and growing....
November 27, 2006 - 7:46 pm UTC
how long did it take you to code the suggested method versus the plsql method?
and you have parallel and nologging and other options available to you with the suggestion....
that you do not have with plsql.
do the Mantra Dance ;)
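(A minimal sketch of those options - a degree of 4 is just an illustrative choice:

alter table ps_jrnl_ln nologging;
alter session enable parallel dml;

insert /*+ APPEND PARALLEL(ps_jrnl_ln, 4) */ into ps_jrnl_ln
select /*+ PARALLEL(s, 4) */ * from ps_jrnl_ln@fsprod s;

commit;

None of this - direct path, nologging, parallel DML - is available to a FORALL insert.)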
Alberto Dell'Era, November 27, 2006 - 3:40 pm UTC
> PL/SQL (as designed above): 27 mins. 27 secs.
> Suggestion: 27 mins. 19 secs.
They're so close that, with 99.9% probability, the bottleneck is not the local insert but something else.
Probably the select on the remote system, or the network.
November 27, 2006 - 8:00 pm UTC
I would concur with that.
Joe, November 27, 2006 - 4:02 pm UTC
Thx "MLOT"... but my question still stands. When would you use a BULK COLLECT & FORALL LOOP as opposed to a single SQL statement? I was assuming (please be easy here, I am new to this) that using a FORALL loop with the LIMIT attribute, would grab a set amount data and process it. Some of the tables are constantly being updated, and I assumed this method would allow me to avoid snapshot too old error, since it was processing a limited set of data. Make sense?
November 27, 2006 - 8:07 pm UTC
you would use it only when you cannot use the single sql statement.
There - that is all.
when you CANNOT do it in a single sql statement, that is when you sit down and consider writing code.
Avoid Cursor processing
Vinny, November 27, 2006 - 4:47 pm UTC
Joe,
Use cursors (even with BULK COLLECT) only when you have a need to do something at the row level (row by row = slow by slow, as Tom often says). If you have no need for logic within a loop, use straight SQL. You can almost always do everything in straight SQL without resorting to cursor-based processing. A lot of the time, when you think you need a cursor, you can split the processing into a series of steps and still use a single SQL statement in each step (see the sketch below). This will be more efficient than cursor-based processing any day.
Vinny
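(A minimal sketch of that series-of-steps idea, with hypothetical table and column names: instead of looping row by row over SOURCE deciding what to apply, isolate the set first, then apply it - each step a single SQL statement:

-- step 1: isolate the rows of interest into a work table
create table work_set as
select s.id, s.col1 from source s where s.status = 'NEW';

-- step 2: apply the whole set at once
update target t
   set t.col1 = (select w.col1 from work_set w where w.id = t.id)
 where exists (select null from work_set w where w.id = t.id);
)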
BULK COLLECT vs insert/select
cosmin, November 27, 2006 - 4:50 pm UTC
well, I for one would use bulk collect when I have a pretty busy system with lots of activity and I want to make use of any little slices of cpu/disk that are available. insert/select is mainly used, at least by me, when the system is relatively idle and all or most resources can be devoted to this big insert, which, btw, could be optimized further as in
insert /*+ append */ into target_table
select /*+ parallel(p,4) */ * from source_table p
I'll let you guess what the above means (and when to do them), or you can google asktom ;-)
so, these are different tools, for different environments.
my $0.02 ;-)
November 27, 2006 - 8:09 pm UTC
eh - no. Think again, use a single sql statement, do not add unnecessary cpu and other overheads by trying to "outsmart the computer"
your logic does not compute
SQL vs PL/SQL
Richard, November 27, 2006 - 4:57 pm UTC
After using Warehouse Builder for over 5 years, I definitely agree with Tom that straight SQL is always faster than any row-level or bulk collect process you can write in PL/SQL. However, if you must have row-level logging of errors (duplicate rows, violation of business rules, etc.), your options are limited to PL/SQL cursor-based code unless you're using 10.2. Search this site, or check out OTN, for examples of the error logging clause.
A reader, November 27, 2006 - 4:57 pm UTC
Thx for the feedback.
I have used the insert/select statement for years with hints; however, as my tables grew, failures appeared. I began to explore PL/SQL and thought that if my tables don't have much idle time, if any at all, BULK COLLECT & FORALL with a LIMIT was my answer. All I can say is the numbers have spoken for themselves in tests.
I appreciate your input, along with everyone else's. I guess I need to do more research on my methods to see if I can get some of these improvements across the board.
November 27, 2006 - 8:10 pm UTC
"failures appeared"
what does that mean exactly.
Joe
Joe, November 27, 2006 - 5:00 pm UTC
Sorry, that was my reply above.
So I gather, without an exception clause or similar need, straight SQL is recommended? Is that a good rule to stick to?
November 27, 2006 - 8:11 pm UTC
and even in 10gr2 with log errors.... you have the chance to erase EVEN MORE CODE.
fight the slow by slow processing!
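(A minimal sketch of the 10gR2 error logging clause Tom mentions - DBMS_ERRLOG creates the ERR$_ table, and bad rows land there instead of failing the statement:

-- one-time setup: creates ERR$_PS_JRNL_LN
exec dbms_errlog.create_error_log('PS_JRNL_LN')

insert /*+ APPEND */ into ps_jrnl_ln
select * from ps_jrnl_ln@fsprod
log errors into err$_ps_jrnl_ln ('nightly load') reject limit unlimited;
)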
Yes
Rahul, November 27, 2006 - 5:43 pm UTC
http://tkyte.blogspot.com/2006/10/slow-by-slow.html
In there Tom says:
"
My mantra, that I'll be sticking with thank you very much, is:
* You should do it in a single SQL statement if at all possible.
* If you cannot do it in a single SQL Statement, then do it in PL/SQL.
* If you cannot do it in PL/SQL, try a Java Stored Procedure.
* If you cannot do it in Java, do it in a C external procedure.
* If you cannot do it in a C external routine, you might want to seriously think about why it is you need to do it
think in sets...
learn all there is to learn about SQL...
"
So, yes, single sql statement should be enough.
Rahul.
Joe, November 27, 2006 - 9:57 pm UTC
Ok I got it... thanks for the replies.
No code unless the single insert doesn't apply.
Thanks all.
Any other feedback is still appreciated. As for time to code, it took me an hour to figure out, and 10 minutes to clone for each of the 85 tables.
Updates and other DMLs
Vivek Gandhi, November 30, 2006 - 10:42 am UTC
Hi Tom
1. Do you feel that the "single SQL" mantra applies to all DML (not only inserts)?
2. I have seen that updates sometimes take a lot of time as a single SQL, and a row by row approach is used just so the process can be seen progressing. What are your thoughts on that?
thanks
Vivek
November 30, 2006 - 10:49 am UTC
1) absolutely.
2) you'd have to show me. It is very possible to write inefficient <ANY LANGUAGE HERE>, and it is very possible to write efficient <ANY LANGUAGE HERE>. So maybe you were comparing a poorly written, inefficient SQL statement with some finely tuned code.
Can we use PL/SQL to handle these conditions??
A reader, December 01, 2006 - 2:24 am UTC
I feel that using PL/SQL code instead of a single SQL statement might be useful only when we could possibly face:
1. a snapshot too old error
2. insufficient space in the target tablespace
Am I right, Tom??
December 01, 2006 - 5:32 am UTC
huh?????
no idea where you might be going with that one. I don't see how plsql would "avoid" any of that.
Can we use PL/SQL to handle these conditions??
Vinicius Pacheco, December 01, 2006 - 6:30 am UTC
I guess if you are facing a snapshot too old problem, you can avoid it by opening a cursor that retrieves a small set of data, processing it, marking it as processed, closing the cursor, and re-opening the same cursor in a loop that ends only when the cursor doesn't retrieve any rows.
I agree: it's much slower to process and more difficult code to write. But what other way is there to handle the snapshot too old?
December 01, 2006 - 6:50 am UTC
SIZE YOUR UNDO APPROPRIATELY.
it is really that simple, think of the money, time, effort, maintenance, debugging effort and so on you could save.
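(A rough sketch of where to start sizing it - V$UNDOSTAT records, per 10-minute interval, the peak undo consumption and the longest-running query the undo must cover:

select max(undoblks)    as peak_undo_blocks_per_interval,
       max(maxquerylen) as longest_query_seconds
  from v$undostat;

-- then set retention to cover your longest query; the value is illustrative:
alter system set undo_retention = 10800;
)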
Limited resources (memory, CPU, disk contention) added to picture
Mark Ruddock, December 19, 2006 - 10:16 am UTC
We have a similar situation, but constrained by too little memory, too little UNDO, and I/O waits: a daily move of 2 million rows to history. APPEND is not good or appropriate, due to the fact that some data is deleted after the bulk load.
If this were a one-time, never-look-back, you-have-the-DB-to-yourself situation, then yes, do the insert /*+ append nologging, parallel */ select * from ...
But since we need to play nice and not consume lots of undo and temporary space that is needed by the other 100 pieces of processing running at the same time, it seems better to do this in chunks - not because it's faster, but because it scales linearly and does not consume large amounts of memory and temp disk storage to hold all the pending rows before the commit.
It is not a matter of speed or efficiency: if there were plenty of resources and we were not looking at 2 million rows each day (and growing), we would be OK with the insert as select, as it is the fastest way (I agree with Tom on that).
We have found that with > 1 million rows, the process began to run extremely long. The select is from a 1 million row staging table into a growing history table.
To mitigate the performance degradation and failures, we have felt compelled to write this as PL/SQL, so that the limited resources are not taken away from the OLTP processing flow, where we have SLAs.
Is this still recommended to be insert as select? If so, at what row count would this become trouble, or would the recommendation be to increase undo and memory to always have capacity?
December 19, 2006 - 10:24 am UTC
fail to see how this consumes less memory.
If it were my choice, I would always go for "get the right resources for the job" - you waste so much money thinking about this problem, more than you would spend on the right resources.
Memory explanation
Mark Ruddock, December 19, 2006 - 11:53 am UTC
the insert into uses space in UNDO equal to the number of rows times the length of the rows.
Memory for Oracle is managed for the insert into, but I am using "memory" loosely, referring mainly to UNDO/ROLLBACK: millions of rows without a commit are expensive in terms of sheer UNDO space consumed. Our system resources are greatly impacted by large-scale inserts such as these, and so we are constrained by this.
A driving loop with a row limit and a FORALL insert will manage both UNDO/ROLLBACK and memory, to remain within the constraints of our resources.
Thanks for your response,
December 19, 2006 - 12:51 pm UTC
undo and rollback is not memory, not at all - very confusing. Sort of like people saying "my laptop has 80gb of 'memory'" when they mean "an 80gb disk drive"
so I'll say it again.... don't see the tie in to memory at all.
just an idea..
Graham Oakes, December 29, 2006 - 6:50 am UTC
Mark,
just an idea, and maybe it doesn't sit well with your architecture, but couldn't you just partition-exchange the staging table into a partition of your history table (see the sketch below)?
Graham
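(A minimal sketch of that idea, with hypothetical names, assuming HISTORY is partitioned and STAGING matches its column layout exactly - the exchange is a data dictionary operation, so no rows are copied and almost no undo is generated:

alter table history
  exchange partition p_current
  with table staging
  without validation;
)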
Creating large UNDOs is good.
Lars, January 03, 2007 - 11:02 pm UTC
Hi all,
A very prosperous 2007 to you all.
In a perfect world where money is no issue - like in the POKER business, or in a 24/7 application - the UNDOs could be as large as they should be.
But... sometimes you have to stick with the snapshot too old thingy, since the money is not there and the disk is not there.
And doing row by row with a COMMIT every 1000 rows or so will, as I see it, be a pretty good workaround for the snapshot too old. That only holds if you are maintaining an existing system; when doing NEW applications one could think twice and actually demand the proper disk space. But that's another issue.
Further, in several Oracle systems I have seen out here, it is very common to have row by row insertion in message broker systems sending data to and from applications.
Of course, they could or should perhaps be rewritten.
But these are well structured, well used, and widely used message handling systems that are not easy to rewrite - mostly due to lack of resources, money, people, and knowledge, and sometimes laziness...
Best Regards,
Lars
January 05, 2007 - 8:58 am UTC
if you commit every 1000 rows, you better spend the MONEY TO DEVELOP THE CODE TO BE RESTARTABLE - and that can be very expensive and most people ignore it. Until their procedure fails and then they are left with a mess to clean up (more money)
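(A minimal sketch, with hypothetical names, of what "restartable" costs you: progress has to be persisted in the same transaction as each batch, so a failed run can resume from its high-water mark instead of re-processing or missing rows:

declare
    l_last_id number;
    l_max_id  number;
begin
    select last_id into l_last_id from load_progress where job_name = 'MOVE_HIST';
    select max(id)  into l_max_id  from staging;
    while l_last_id < l_max_id loop
        insert into history
        select * from staging
         where id > l_last_id and id <= l_last_id + 1000;
        l_last_id := l_last_id + 1000;
        -- the batch of rows and the progress marker commit atomically, together
        update load_progress set last_id = l_last_id where job_name = 'MOVE_HIST';
        commit;
    end loop;
end;
/
)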
Commits and Undo
Mike, January 11, 2007 - 7:30 pm UTC
Lars,
In my experience, frequent commits do not prevent the "snapshot too old" error, so I don't think that's a good reason to use row by row processing. I've tried commits and they didn't work for me, so perhaps you've been lucky? (Or perhaps I'm mistaken? Though committing inside the loop can actually make ORA-01555 more likely, since the undo your open cursor needs for read consistency becomes eligible for reuse once you commit.)
If I run into that and the DBAs tell me the undo/rollback is already large, I usually break it down into smaller chunks. Perhaps a single SQL, but run twice - once for each half of the data, or whatever gets the job done. A parameter is used to tell it which part of the data to process (see the sketch below).
Just sharing in case you do get the snapshot too old and commits stop fixing it.
Mike
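(A minimal sketch of that parameter-driven chunking, with hypothetical names - still set-based SQL, just bounded by key range:

-- run once per chunk, with :lo/:hi covering each half (or quarter, etc.) of the keys
insert /*+ APPEND */ into history
select * from staging
 where id between :lo and :hi;
commit;
)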
single sql better than PL/SQL?
charlie, December 07, 2007 - 8:44 am UTC
http://tkyte.blogspot.com/2006/10/slow-by-slow.html
In there Tom says:
* You should do it in a single SQL statement if at all possible.
* If you cannot do it in a single SQL Statement, then do it in PL/SQL.
I was a true believer in that until this case. We have a huge table, and we need to update it with values from another table.
Let me give you an example
create table test (id number, col1 varchar2(20)) pctfree 99;
(pctfree 99 leaves each block 99% free - it is only there to simulate a huge table with lots of columns)
declare
v_number number :=1;
begin
loop
insert into test values (v_number, 'a');
v_number := v_number+1;
exit when v_number > 1000000;
end loop;
commit;
end;
create table test1 as select * from test where id < 400000;
update test1 set col1='b';
commit;
create index test_ind on test(id);
create index test1_ind on test1(id);
analyze table test compute statistics;
analyze table test1 compute statistics;
SQL> update test
set col1=(select col1 from test1
where test.id=test1.id )
where exists
(select 1 from test1
where test.id=test1.id);
399999 rows updated.
Elapsed: 00:04:15.01
SQL> declare
2 v_count number;
3 cursor test_curs is
4 select test.id,test.col1 from test,test1
5 where test.id=test1.id;
6 begin
7 for test_cur in test_curs
8 loop
9 update test
10 set col1=test_cur.col1
11 where id=test_cur.id;
12 end loop;
13 commit;
14 end;
15 /
PL/SQL procedure successfully completed.
Elapsed: 00:04:30.82
Here the difference is not huge yet, but in our case the single SQL takes an hour to finish, while the PL/SQL takes only 5 minutes. The bigger the table test, the bigger the difference.
And the reason is simple: since 40% of the rows need to be updated, Oracle picks a full table scan of table test (a huge table), while in PL/SQL the statement "UPDATE TEST SET COL1=:B2 WHERE ID=:B1" is able to use the index.
I would like to see your comments.
Thanks very much as usual for your wonderful support.
Charlie
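(One single-SQL alternative worth benchmarking for a case like this is a MERGE - it lets the optimizer join the two tables once, typically with a hash join, instead of running a correlated subquery per row. This assumes 10g or later, where MERGE no longer requires an insert branch:

merge into test t
using test1 s
   on (t.id = s.id)
when matched then
    update set t.col1 = s.col1;
)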
RE: single SQL better (above)
Duke Ganote, November 18, 2009 - 8:43 am UTC
@Charlie: not seeing it. It looks like the single SQL is using the index, and it's faster. Of course, I did a full analyze for all columns and indexes, and I was on 10gR2.
SQL*Plus: Release 10.2.0.1.0 - Production on Wed Nov 18 08:26:44 2009
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
insert into test ( ID, COL1 )
select level as ID, 'a' as COL1
from dual
connect by level <= 1000000;
Elapsed: 00:06:41.41
SQL> create table test1 as select * from test where id < 400000;
Elapsed: 00:07:21.68
SQL> update test1 set col1='b';
Elapsed: 00:00:03.11
SQL> commit;
Elapsed: 00:00:00.03
SQL> create index test_ind on test(id);
Elapsed: 00:03:00.02
SQL> create index test1_ind on test1(id);
Elapsed: 00:00:02.47
SQL> analyze table test compute statistics for all columns for all indexes;
Elapsed: 00:01:07.56
SQL> analyze table test1 compute statistics for all columns for all indexes;
Elapsed: 00:00:05.74
SQL> set autot on exp;
SQL> update test
set col1=(select col1 from test1
where test.id=test1.id )
where exists
(select 1 from test1
where test.id=test1.id);
Elapsed: 00:00:17.01
<<edited!>>
--------------------------------------------------
| Id | Operation | Name |
--------------------------------------------------
| 5 | TABLE ACCESS BY INDEX ROWID| TEST1 |
|* 6 | INDEX RANGE SCAN | TEST1_IND |
--------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("TEST"."ID"="TEST1"."ID")
6 - access("TEST1"."ID"=:B1)
Note
-----
- dynamic sampling used for this statement
SQL> rollback;
Elapsed: 00:00:08.92
SQL> ed
Wrote file afiedt.buf
declare
v_count number;
cursor test_curs is
select test.id,test.col1 from test,test1
where test.id=test1.id;
begin
for test_cur in test_curs
loop
update test
set col1=test_cur.col1
where id=test_cur.id;
end loop;
--commit;
end;
SQL> /
Elapsed: 00:00:28.55