Joe, November 27, 2006 - 9:47 am UTC
Hmmm...
When would one use what I wrote vs. what you suggested? I attended a conference where the speaker, a well-known PL/SQL author, suggested that FORALL & BULK COLLECT were at times faster for data migrations, so I thought I would give it a whirl.
November 27, 2006 - 7:28 pm UTC
Never write code unless and until you have to.
truncate table PS_JRNL_LN;
insert /*+ APPEND */ into PS_JRNL_LN
select ... from PS_JRNL_LN@FSPROD;
so simple, so elegant, so much more efficient than CODING IT YOURSELF
Faster than What?
K Balfe, November 27, 2006 - 10:28 am UTC
Joe, Are you sure they said BULK COLLECT was faster than straight SQL? I suspect they said it was faster than the same kind of procedural code without BULK COLLECT.
is there something I'm missing???
mlot, November 27, 2006 - 11:15 am UTC
>> Hmmm...
When would one use what I wrote vs. what you suggested?
<<
umm, when you want it to run faster?
Joe, November 27, 2006 - 1:55 pm UTC
Yes, I am sure. The author stated that when dealing with large volumes of data, their benchmarks and test results in 10g showed BULK COLLECT with FORALL to be faster. This being my first attempt at PL/SQL development, I have seen incredible performance improvement over the ETL tools stated above, but I am looking for constructive criticism and direction to improve these ETL methods when moving data to our data warehouse.
Maybe I misunderstood the presentation, but the question I posed back to Tom was: when should I use BULK COLLECT and a FORALL loop vs. a single SQL statement, as he stated above? And what is the hardware impact of each method?
November 27, 2006 - 7:45 pm UTC
if you are moving data from table T1 to table T2 - doing it in any fashion other than a single insert would be "not smart", "not efficient".
Tell you what - try it out :)
(bulk collect/forall cannot even do direct path!)
If I can do it in a single SQL statement - I shall
I shall write code only when forced to
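(For contrast - a minimal sketch, not Joe's actual code, of what the hand-coded BULK COLLECT / FORALL version of such a copy typically looks like; note the extra moving parts:

declare
    cursor c is select * from ps_jrnl_ln@fsprod;
    type rows_t is table of c%rowtype index by pls_integer;
    l_rows rows_t;
begin
    open c;
    loop
        fetch c bulk collect into l_rows limit 1000;
        exit when l_rows.count = 0;
        -- conventional path only: FORALL cannot do a direct path (APPEND) insert
        forall i in 1 .. l_rows.count
            insert into ps_jrnl_ln values l_rows(i);
    end loop;
    close c;
    commit;
end;
/

Every line of that is code to write, test and maintain - versus the single insert above.)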
Joe, November 27, 2006 - 2:21 pm UTC
For those interested in timings (for PS_JRNL_LN), here is what I have found so far in my testing:
Informatica: 1 hr 49 mins. 35 secs.
PL/SQL (as designed above): 27 mins. 27 secs.
Suggestion: 27 mins. 19 secs.
Row Count Processed: 10.5 million and growing....
November 27, 2006 - 7:46 pm UTC
how long did it take you to code the suggested method versus the plsql method?
and you have parallel and nologging and other options available to you with the suggestion....
that you do not have with plsql.
do the Mantra Dance ;)
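(A minimal sketch of those options - a degree of 4 is just an illustrative choice:

alter table ps_jrnl_ln nologging;
alter session enable parallel dml;

insert /*+ APPEND PARALLEL(ps_jrnl_ln, 4) */ into ps_jrnl_ln
select /*+ PARALLEL(s, 4) */ * from ps_jrnl_ln@fsprod s;

commit;

None of this - direct path, nologging, parallel DML - is available to a FORALL insert.)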
Alberto Dell'Era, November 27, 2006 - 3:40 pm UTC
> PL/SQL (as designed above): 27 mins. 27 secs.
> Suggestion: 27 mins. 19 secs.
They're so close that, with 99.9% probability, the bottleneck is not the local insert but something else.
Probably the select on the remote system, or the network.
November 27, 2006 - 8:00 pm UTC
I would concur with that.
Joe, November 27, 2006 - 4:02 pm UTC
Thx "MLOT"... but my question still stands. When would you use a BULK COLLECT & FORALL LOOP as opposed to a single SQL statement? I was assuming (please be easy here, I am new to this) that using a FORALL loop with the LIMIT attribute, would grab a set amount data and process it. Some of the tables are constantly being updated, and I assumed this method would allow me to avoid snapshot too old error, since it was processing a limited set of data. Make sense?
November 27, 2006 - 8:07 pm UTC
you would use it only when you cannot use the single sql statement.
There - that is all.
when you CANNOT do it in a single sql statement, that is when you sit down and consider writing code.
Avoid Cursor processing
Vinny, November 27, 2006 - 4:47 pm UTC
Joe,
Use cursors (even with BULK COLLECT) only when you have a need to do something at the row level (row by row = slow by slow, as Tom often says). If you have no need for logic within a loop, use straight SQL. You can almost always do everything in straight SQL without resorting to cursor-based processing. A lot of the time, when you think you need a cursor, you can split the processing into a series of steps and still use a single SQL statement in each step (see the sketch below). This will be more efficient than cursor-based processing any day.
Vinny
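(A minimal sketch of that series-of-steps idea, with hypothetical table and column names: instead of looping row by row over SOURCE deciding what to apply, isolate the set first, then apply it - each step a single SQL statement:

-- step 1: isolate the rows of interest into a work table
create table work_set as
select s.id, s.col1 from source s where s.status = 'NEW';

-- step 2: apply the whole set at once
update target t
   set t.col1 = (select w.col1 from work_set w where w.id = t.id)
 where exists (select null from work_set w where w.id = t.id);
)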
BULK COLLECT vs insert/select
cosmin, November 27, 2006 - 4:50 pm UTC
well, I for one would use bulk collect when I have a pretty busy system with lots of activity and I want to make use of any little slices of cpu/disk that are available. insert/select is mainly used, at least by me, when the system is relatively idle and all or most resources can be devoted to this big insert, which, btw, could be optimized further as in
insert /*+ append */ into target_table
select /*+ parallel(p,4) */ * from source_table p
I'll let you guess what the above means (and when to do them), or you can google asktom ;-)
so, these are different tools, for different environments.
my $0.02 ;-)
November 27, 2006 - 8:09 pm UTC
eh - no. Think again, use a single sql statement, do not add unnecessary cpu and other overheads by trying to "outsmart the computer"
your logic does not compute
SQL vs PL/SQL
Richard, November 27, 2006 - 4:57 pm UTC
After using Warehouse Builder for over 5 years, I definitely agree with Tom that straight SQL is always faster than any row-level or bulk collect process you can write in PL/SQL. However, if you must have row-level logging of errors (duplicate rows, violation of business rules, etc.), your options are limited to PL/SQL cursor-based code unless you're using 10.2. Search this site, or check out OTN, for examples of the error logging clause.
A reader, November 27, 2006 - 4:57 pm UTC
Thx for the feedback.
I have used the insert/select statement for years with hints; however, as my tables grew, failures appeared. I began to explore PL/SQL and thought that if my tables don't have much idle time, if any at all, BULK COLLECT & FORALL with a LIMIT was my answer. All I can say is the numbers have spoken for themselves in tests.
I appreciate your input, along with everyone else's. I guess I need to do more research on my methods to see if I can get some of these improvements across the board.
November 27, 2006 - 8:10 pm UTC
"failures appeared"
what does that mean exactly.
Joe
Joe, November 27, 2006 - 5:00 pm UTC
Sorry, that was my reply above.
So I gather, without an exception clause or similar need, straight SQL is recommended? Is that a good rule to stick to?
November 27, 2006 - 8:11 pm UTC
and even in 10gr2 with log errors.... you have the chance to erase EVEN MORE CODE.
fight the slow by slow processing!
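(A minimal sketch of the 10gR2 error logging clause Tom mentions - DBMS_ERRLOG creates the ERR$_ table, and bad rows land there instead of failing the statement:

-- one-time setup: creates ERR$_PS_JRNL_LN
exec dbms_errlog.create_error_log('PS_JRNL_LN')

insert /*+ APPEND */ into ps_jrnl_ln
select * from ps_jrnl_ln@fsprod
log errors into err$_ps_jrnl_ln ('nightly load') reject limit unlimited;
)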
Yes
Rahul, November 27, 2006 - 5:43 pm UTC
http://tkyte.blogspot.com/2006/10/slow-by-slow.html
In there Tom says:
"
My mantra, that I'll be sticking with thank you very much, is:
* You should do it in a single SQL statement if at all possible.
* If you cannot do it in a single SQL Statement, then do it in PL/SQL.
* If you cannot do it in PL/SQL, try a Java Stored Procedure.
* If you cannot do it in Java, do it in a C external procedure.
* If you cannot do it in a C external routine, you might want to seriously think about why it is you need to do it
think in sets...
learn all there is to learn about SQL...
"
So, yes, single sql statement should be enough.
Rahul.
Joe, November 27, 2006 - 9:57 pm UTC
Ok I got it... thanks for the replies.
No code unless the single insert doesn't apply.
Thanks all.
Any other feedback is still appreciated. As for time to code, it took me an hour to figure out, and 10 minutes to clone for each of the 85 tables.
Updates and other DMLs
Vivek Gandhi, November 30, 2006 - 10:42 am UTC
Hi Tom
1. Do you feel that the "single SQL" mantra applies to all DML (not only inserts)?
2. I have seen that updates sometimes take a lot of time as a single SQL, and a row by row approach is used just so the process can be seen progressing. What are your thoughts on that?
thanks
Vivek
November 30, 2006 - 10:49 am UTC
1) absolutely.
2) you'd have to show me. It is very possible to write inefficient <ANY LANGUAGE HERE>, and it is very possible to write efficient <ANY LANGUAGE HERE>. So maybe you were comparing a poorly written, inefficient SQL statement with some finely tuned code.
Can we use PL/SQL to handle these conditions??
A reader, December 01, 2006 - 2:24 am UTC
I feel that using PL/SQL code instead of a single SQL statement might be useful only when we could possibly face:
1. a snapshot too old error
2. insufficient space in the target tablespace
Am I right, Tom??
December 01, 2006 - 5:32 am UTC
huh?????
no idea where you might be going with that one. I don't see how plsql would "avoid" any of that.
Can we use PL/SQL to handle these conditions??
Vinicius Pacheco, December 01, 2006 - 6:30 am UTC
I guess if you are facing a snapshot too old problem, you can avoid it by opening a cursor that retrieves a small set of data, processing it, marking it as processed, closing the cursor, and re-opening the same cursor in a loop that ends only when the cursor doesn't retrieve any rows.
I agree: it's much slower to process and more difficult code to write. But what other way is there to handle the snapshot too old?
December 01, 2006 - 6:50 am UTC
SIZE YOUR UNDO APPROPRIATELY.
it is really that simple, think of the money, time, effort, maintenance, debugging effort and so on you could save.
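(A rough sketch of where to start sizing it - V$UNDOSTAT records, per 10-minute interval, the peak undo consumption and the longest-running query the undo must cover:

select max(undoblks)    as peak_undo_blocks_per_interval,
       max(maxquerylen) as longest_query_seconds
  from v$undostat;

-- then set retention to cover your longest query; the value is illustrative:
alter system set undo_retention = 10800;
)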
Limited resources (memory, CPU, disk contention) added to picture
Mark Ruddock, December 19, 2006 - 10:16 am UTC
We have a similar situation, but constrained by too little memory, too little UNDO, and I/O waits: a daily move of 2 million rows to history. APPEND is not good or appropriate, due to the fact that some data is deleted after the bulk load.
If this were a one-time, never-look-back, you-have-the-DB-to-yourself situation, then yes, do the insert /*+ append nologging, parallel */ select * from ...
But since we need to play nice and not consume lots of undo and temporary space that is needed by the other 100 pieces of processing running at the same time, it seems better to do this in chunks - not because it's faster, but because it scales linearly and does not consume large amounts of memory and temp disk storage to hold all the pending rows before the commit.
It is not a matter of speed or efficiency: if there were plenty of resources and we were not looking at 2 million rows each day (and growing), we would be OK with the insert as select, as it is the fastest way (I agree with Tom on that).
We have found that with > 1 million rows, the process began to run extremely long. The select is from a 1 million row staging table into a growing history table.
To mitigate the performance degradation and failures, we have felt compelled to write this as PL/SQL, so that the limited resources are not taken away from the OLTP processing flow, where we have SLAs.
Is this still recommended to be insert as select? If so, at what row count would this become trouble, or would the recommendation be to increase undo and memory to always have capacity?
December 19, 2006 - 10:24 am UTC
fail to see how this consumes less memory.
If it were my choice, I would always go for "get the right resources for the job" - you waste so much money thinking about this problem, more than you would spend on the right resources.
Memory explanation
Mark Ruddock, December 19, 2006 - 11:53 am UTC
the insert into uses space in UNDO equal to the number of rows times the length of the rows.
Memory for Oracle is managed for the insert into, but I am using "memory" loosely, referring mainly to UNDO/ROLLBACK: millions of rows without a commit are expensive in terms of sheer UNDO space consumed. Our system resources are greatly impacted by large-scale inserts such as these, and so we are constrained by this.
A driving loop with a row limit and a FORALL insert will manage both UNDO/ROLLBACK and memory, to remain within the constraints of our resources.
Thanks for your response,
December 19, 2006 - 12:51 pm UTC
undo and rollback is not memory, not at all - very confusing. Sort of like people saying "my laptop has 80gb of 'memory'" when they mean "an 80gb disk drive"
so I'll say it again.... don't see the tie in to memory at all.
just an idea..
Graham Oakes, December 29, 2006 - 6:50 am UTC
Mark,
just an idea, and maybe it doesn't sit well with your architecture, but couldn't you just partition-exchange the staging table into a partition of your history table (see the sketch below)?
Graham
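(A minimal sketch of that idea, with hypothetical names, assuming HISTORY is partitioned and STAGING matches its column layout exactly - the exchange is a data dictionary operation, so no rows are copied and almost no undo is generated:

alter table history
  exchange partition p_current
  with table staging
  without validation;
)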
Creating large UNDOs is good.
Lars, January 03, 2007 - 11:02 pm UTC
Hi all,
A very prosperous 2007 to you all.
In a perfect world where money is no issue - like in the POKER business, or in a 24/7 application - the UNDOs could be as large as they should be.
But... sometimes you have to stick with the snapshot too old thingy, since the money is not there and the disk is not there.
And doing row by row with a COMMIT every 1000 rows or so will, as I see it, be a pretty good workaround for the snapshot too old. That only holds if you are maintaining an existing system; when doing NEW applications one could think twice and actually demand the proper disk space. But that's another issue.
Further, in several Oracle systems I have seen out here, it is very common to have row by row insertion in message broker systems sending data to and from applications.
Of course, they could or should perhaps be rewritten.
But these are well structured, well used, and widely used message handling systems that are not easy to rewrite - mostly due to lack of resources, money, people, and knowledge, and sometimes laziness...
Best Regards,
Lars
January 05, 2007 - 8:58 am UTC
if you commit every 1000 rows, you better spend the MONEY TO DEVELOP THE CODE TO BE RESTARTABLE - and that can be very expensive and most people ignore it. Until their procedure fails and then they are left with a mess to clean up (more money)
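(A minimal sketch, with hypothetical names, of what "restartable" costs you: progress has to be persisted in the same transaction as each batch, so a failed run can resume from its high-water mark instead of re-processing or missing rows:

declare
    l_last_id number;
    l_max_id  number;
begin
    select last_id into l_last_id from load_progress where job_name = 'MOVE_HIST';
    select max(id)  into l_max_id  from staging;
    while l_last_id < l_max_id loop
        insert into history
        select * from staging
         where id > l_last_id and id <= l_last_id + 1000;
        l_last_id := l_last_id + 1000;
        -- the batch of rows and the progress marker commit atomically, together
        update load_progress set last_id = l_last_id where job_name = 'MOVE_HIST';
        commit;
    end loop;
end;
/
)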
Commits and Undo
Mike, January 11, 2007 - 7:30 pm UTC
Lars,
In my experience, frequent commits do not prevent the "snapshot too old" error, so I don't think that's a good reason to use row by row processing. I've tried commits and they didn't work for me, so perhaps you've been lucky? (Or perhaps I'm mistaken? Though committing inside the loop can actually make ORA-01555 more likely, since the undo your open cursor needs for read consistency becomes eligible for reuse once you commit.)
If I run into that and the DBAs tell me the undo/rollback is already large, I usually break it down into smaller chunks. Perhaps a single SQL, but run twice - once for each half of the data, or whatever gets the job done. A parameter is used to tell it which part of the data to process (see the sketch below).
Just sharing in case you do get the snapshot too old and commits stop fixing it.
Mike
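(A minimal sketch of that parameter-driven chunking, with hypothetical names - still set-based SQL, just bounded by key range:

-- run once per chunk, with :lo/:hi covering each half (or quarter, etc.) of the keys
insert /*+ APPEND */ into history
select * from staging
 where id between :lo and :hi;
commit;
)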
single sql better than PL/SQL?
charlie, December 07, 2007 - 8:44 am UTC
http://tkyte.blogspot.com/2006/10/slow-by-slow.html
In there Tom says:
* You should do it in a single SQL statement if at all possible.
* If you cannot do it in a single SQL Statement, then do it in PL/SQL.
I was a true believer in that until this case. We have a huge table, and we need to update it with values from another table.
Let me give you an example
create table test (id number, col1 varchar2(20)) pctfree 99;
(pctfree 99 leaves each block 99% free - it is only there to simulate a huge table with lots of columns)
declare
v_number number :=1;
begin
loop
insert into test values (v_number, 'a');
v_number := v_number+1;
exit when v_number > 1000000;
end loop;
commit;
end;
create table test1 as select * from test where id < 400000;
update test1 set col1='b';
commit;
create index test_ind on test(id);
create index test1_ind on test1(id);
analyze table test compute statistics;
analyze table test1 compute statistics;
SQL> update test
set col1=(select col1 from test1
where test.id=test1.id )
where exists
(select 1 from test1
where test.id=test1.id);
399999 rows updated.
Elapsed: 00:04:15.01
SQL> declare
2 v_count number;
3 cursor test_curs is
4 select test.id,test.col1 from test,test1
5 where test.id=test1.id;
6 begin
7 for test_cur in test_curs
8 loop
9 update test
10 set col1=test_cur.col1
11 where id=test_cur.id;
12 end loop;
13 commit;
14 end;
15 /
PL/SQL procedure successfully completed.
Elapsed: 00:04:30.82
Here the difference is not huge yet, but in our case the single SQL takes an hour to finish, while the PL/SQL takes only 5 minutes. The bigger the table test, the bigger the difference.
And the reason is simple: since 40% of the rows need to be updated, Oracle picks a full table scan of table test (a huge table), while in PL/SQL the statement "UPDATE TEST SET COL1=:B2 WHERE ID=:B1" is able to use the index.
I would like to see your comments.
Thanks very much as usual for your wonderful support.
Charlie
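(One single-SQL alternative worth benchmarking for a case like this is a MERGE - it lets the optimizer join the two tables once, typically with a hash join, instead of running a correlated subquery per row. This assumes 10g or later, where MERGE no longer requires an insert branch:

merge into test t
using test1 s
   on (t.id = s.id)
when matched then
    update set t.col1 = s.col1;
)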
RE: single SQL better (above)
Duke Ganote, November 18, 2009 - 8:43 am UTC
@Charlie: not seeing it. It looks like the single SQL is using the index, and it's faster. Of course, I did a full analyze for all columns and indexes, and I was on 10gR2.
SQL*Plus: Release 10.2.0.1.0 - Production on Wed Nov 18 08:26:44 2009
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
insert into test ( ID, COL1 )
select level as ID, 'a' as COL1
from dual
connect by level <= 1000000;
Elapsed: 00:06:41.41
SQL> create table test1 as select * from test where id < 400000;
Elapsed: 00:07:21.68
SQL> update test1 set col1='b';
Elapsed: 00:00:03.11
SQL> commit;
Elapsed: 00:00:00.03
SQL> create index test_ind on test(id);
Elapsed: 00:03:00.02
SQL> create index test1_ind on test1(id);
Elapsed: 00:00:02.47
SQL> analyze table test compute statistics for all columns for all indexes;
Elapsed: 00:01:07.56
SQL> analyze table test1 compute statistics for all columns for all indexes;
Elapsed: 00:00:05.74
SQL> set autot on exp;
SQL> update test
set col1=(select col1 from test1
where test.id=test1.id )
where exists
(select 1 from test1
where test.id=test1.id);
Elapsed: 00:00:17.01
<<edited!>>
--------------------------------------------------
| Id | Operation | Name |
--------------------------------------------------
| 5 | TABLE ACCESS BY INDEX ROWID| TEST1 |
|* 6 | INDEX RANGE SCAN | TEST1_IND |
--------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("TEST"."ID"="TEST1"."ID")
6 - access("TEST1"."ID"=:B1)
Note
-----
- dynamic sampling used for this statement
SQL> rollback;
Elapsed: 00:00:08.92
SQL> ed
Wrote file afiedt.buf
declare
v_count number;
cursor test_curs is
select test.id,test.col1 from test,test1
where test.id=test1.id;
begin
for test_cur in test_curs
loop
update test
set col1=test_cur.col1
where id=test_cur.id;
end loop;
--commit;
end;
SQL> /
Elapsed: 00:00:28.55