Datastage product doing inserts using arrays

Tom Kyte

Thanks for the question, Lynn.

Asked: February 01, 2013 - 2:42 pm UTC

Last updated: February 13, 2013 - 1:31 pm UTC

Version: 11.2.0.3

Viewed 1000+ times

You Asked

Tom,
I have tried, to no avail, to find out how our implementation of PeopleSoft EPM that uses DataStage is accomplishing inserts with arrays.

Searching the Oracle documentation and Google, I find no syntax where an insert with bind variables can be followed by 5,000 rows that all get inserted in one execution of the insert statement. Yet we are seeing that kind of activity with the DataStage parameters set to an array size of 5000 and a transaction size of 5000.

I get my statistics from gv$sql and gv$sql_plan. They sometimes show thousands of rows being inserted by very few executions of an insert statement.

Here are sample statistics.
SQL statement:


INSERT INTO ADMIN.PS_ITEM_XREF
  (BUSINESS_UNIT, COMMON_ID, SA_ID_TYPE, ITEM_NBR_CHARGE, ITEM_NBR_PAYMENT,
   EFFDT, SRC_SYS_ID, EFF_STATUS, XREF_AMT, XREF_STATUS, REFUND_NBR,
   PAYMENT_ID_NBR, SF_REVERSAL_IND, WRITEOFF_FLG, PRIORITY, PRIORITY_PMT_FLG,
   GL_ASSESSED_AMT, CONTRACT_NUM, ITEM_NBR_CONTRACT, SF_PRIORITY,
   SCC_ROW_ADD_OPRID, SCC_ROW_ADD_DTTM, SCC_ROW_UPD_OPRID, SCC_ROW_UPD_DTTM,
   LOAD_ERROR, DATA_ORIGIN, CREATED_EW_DTTM, LASTUPD_EW_DTTM, BATCH_SID)
VALUES
  (:1, :2, :3, :4, :5, TO_DATE(:6, 'YYYY-MM-DD HH24:MI:SS'), :7, :8, :9,
   :10, :11, :12, :13, :14, :15, :16, :17, :18, :19, :20, :21,
   TO_DATE(:22, 'YYYY-MM-DD HH24:MI:SS'), :23,
   TO_DATE(:24, 'YYYY-MM-DD HH24:MI:SS'), :25, :26,
   TO_DATE(:27, 'YYYY-MM-DD HH24:MI:SS'),
   TO_DATE(:28, 'YYYY-MM-DD HH24:MI:SS'), :29)


details from my report:



 d  ch car parsing
 b   # din user name    SQL_ID        load time           last active
-- --- --- ------------ ------------- ------------------- --------------------
sql text
-------------------------------------------------------------------------------

 1   0     ADMIN        7tyvy8y9rhrh3 2013-01-29/09:08:03 JAN-29-2013 16:21:30


overall    overall    overall   elapsed        cpu        i/o
 number     number    elapsed   per exec   per exec       avg
  loads executions    in secs    in secs    in secs  wait time
------- ---------- ---------- ---------- ---------- ----------

-

      1        769     146.43        .19        .13        .01


    other          avg          avg     avg                 rows
     avg    disk reads    buff gets    rows       total      per
wait time     per exec     per exec  p.exec        rows   second
--------- ------------ ------------ ------- ----------- --------

      .00            0        7,214    4989   3,836,624   26,202


The statistics report above says that this statement was run 769 times, inserted 3,836,624 rows, and averaged 4,989 rows per execution. Overall elapsed time was 146.43 seconds, each execution averaged .19 seconds, and the statement processed 26,202 rows per second.
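As a check on the arithmetic: 3,836,624 rows / 769 executions ≈ 4,989 rows per execution, and 3,836,624 rows / 146.43 seconds ≈ 26,202 rows per second.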


Here is the sql to get this report:


prompt sql72, order by last active

select instance_name, to_char(sysdate, 'MON-DD-YYYY HH24:MI:SS') "DATE / TIME" from v$instance;

col instid heading 'd|b' format 9
col child heading 'ch|#' format 99
col cardyn heading 'car|din' format a3
col usernm heading 'parsing|user name' format a12
col loadtime heading 'load time' format a19
col lastact heading 'last active' format a20
col execs heading 'overall|number|executions' format 99999999
col lds heading 'overall|number|loads' format 99999
col elapsed heading 'overall|elapsed|in secs' format 999999.99
col elap_per heading 'elapsed |per exec|in secs' format 999999.99
col cpu_per heading 'cpu |per exec|in secs' format 999999.99
col iowait heading ' i/o | avg |wait time' format 999999.99
col othwait heading 'other| avg |wait time' format 999999.99
col avg_reads heading 'avg|disk reads|per exec' format 999,999,999
col avg_bufg heading 'avg|buff gets|per exec' format 999,999,999
col avg_rows heading 'avg|rows|p.exec' format 999999
col tot_rows heading 'total|rows' format 99,999,999
col row_per_s heading 'rows|per|second' format 999,999
col sqltext heading 'sql text ' format a80

select
b.inst_id instid,
b.child_number child,
decode( nvl( dbms_lob.substr(other_xml,20,24),'zz'),'cardinality_feedback','yes',' ') cardyn, -- 'yes' if the plan used cardinality feedback, otherwise blank (a null other_xml maps to 'zz', which also shows blank)
parsing_schema_name usernm,
b.sql_id,
substr(b.first_load_time,1,20) loadtime,
to_char(b.last_active_time, 'MON-DD-YYYY HH24:MI:SS') lastact,
b.loads lds,
b.executions execs,
(b.elapsed_time/1000000) elapsed,
round(( elapsed_time / (decode( executions,0,1,executions)) / 1000000 ),2) elap_per,
round(( cpu_time / (decode( executions,0,1,executions)) / 1000000 ),2) cpu_per,
round(( user_io_wait_time / (decode( executions,0,1,executions)) / 1000000 ),2) iowait,
round(( (application_wait_time + concurrency_wait_time + cluster_wait_time ) / (decode( executions,0,1,executions)) / 1000000 ),2) othwait,
round(( disk_reads / (decode( executions,0,1,executions)) ),0) avg_reads,
round(( buffer_gets / (decode( executions,0,1,executions)) ),0) avg_bufg,
round(( rows_processed / (decode( executions,0,1,executions)) ),0) avg_rows,
rows_processed tot_rows,
round( rows_processed / (decode( elapsed_time,0,1,elapsed_time)) * 1000000 ,0) row_per_s,
substr(b.sql_text,1,3840) sqltext
from gv$sql b, gv$sql_plan p
where
parsing_schema_name not in ('SYS','SYSTEM','ORACLE_OCM','SYSMAN','DBSNMP','EXFSYS') and
b.hash_value = p.hash_value(+) and
b.address = p.address(+) and
b.sql_id = p.sql_id(+) and
b.child_number = p.child_number(+) and
1 = p.id(+)
order by b.last_active_time desc
/


I tried to see this in action with a Net8 SQL trace but could not decipher enough information. A regular SQL trace, to my knowledge, shows the SQL statement but not the data.

I am seeing similar statistics with update statements. I believe this product is able to update 5,000 rows with one update statement full of bind variables, followed by an array of data containing 5,000 rows' worth of updates.

What do you think?

Thanks for your time.




and Tom said...

we absolutely support bulk binds for all DML - select, insert, update, delete and merge.

it is very natural and normal. pretty much all APIs - from OCI in C, to JDBC in Java, to ODBC, to PL/SQL - support bulk processing.

The sql syntax is the same for each, the API calls are different based on language - but they are all pretty much the same.

this is good, normal and expected. You wouldn't see anything in the sql that looked different, but if you could see the source code, you'd see them making bulk binds of arrays procedurally.
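for illustration, here is a minimal sketch of what an array insert looks like in PL/SQL (the table t and its columns are hypothetical, not the PS_ITEM_XREF definition; OCI, JDBC and ODBC expose the same capability through array binds / batch execution):

create table t ( id number, val varchar2(40) );

declare
    type id_tab  is table of t.id%type  index by pls_integer;
    type val_tab is table of t.val%type index by pls_integer;
    l_ids  id_tab;
    l_vals val_tab;
begin
    -- build a 5000 row array in memory, as the load tool presumably does
    for i in 1 .. 5000
    loop
        l_ids(i)  := i;
        l_vals(i) := 'row ' || i;
    end loop;

    -- one execution of the insert, all 5000 rows bound as arrays;
    -- forall works the same way for update, delete and merge
    forall i in 1 .. l_ids.count
        insert into t ( id, val ) values ( l_ids(i), l_vals(i) );

    commit;
end;
/

run that and gv$sql will show one execution with 5,000 rows processed for the insert - the same ratio you are seeing in your report.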


Comments

Found the info in the Pro*C manual

Lynn Sattler, February 13, 2013 - 1:15 pm UTC

Tom,
Thanks for your answer. I made another attempt at a Google search after asking the question and did find the info in the Pro*C manual.

I do wonder whether doing commits every 20,000 or 40,000 rows would be OK. I am planning to test with some different numbers. Any opinions?

Regarding formatting a report in a post to you: in my original post I used the code button but still messed up the formatting. I've attempted it again below (I also learned I could expand the window). I'm wondering if I am doing it right: I select the code button before a paste, press Enter, paste a section of the report, then select the code button again followed by a return.
It looks good before I do a submit review.




 d  ch car parsing
 b   # din user name    SQL_ID        load time           last active
-- --- --- ------------ ------------- ------------------- --------------------
sql text
-------------------------------------------------------------------------------

 1   0     ADMIN        7tyvy8y9rhrh3 2013-01-29/09:08:03 JAN-29-2013 16:21:30


overall    overall    overall   elapsed        cpu        i/o
 number     number    elapsed   per exec   per exec       avg
  loads executions    in secs    in secs    in secs  wait time
------- ---------- ---------- ---------- ---------- ----------

-

      1        769     146.43        .19        .13        .01


    other          avg          avg     avg                 rows
     avg    disk reads    buff gets    rows       total      per
wait time     per exec     per exec  p.exec        rows   second
--------- ------------ ------------ ------- ----------- --------

      .00            0        7,214    4989   3,836,624   26,202


Thanks,
Lynn
Tom Kyte
February 13, 2013 - 1:31 pm UTC

20k or 40k would be ok, 100k or more would be a little better.

also - if you are using sequences, make sure to change the cache size from the default of 20 to something around the array size you use.
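for example, with an array size of 5,000 (the sequence name s here is hypothetical):

alter sequence s cache 5000;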

if you do commit periodically - it will be important to make sure your process is restartable - so that when it fails - you can resume from where it left off!! this is critical.
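one way to get that restartability - a minimal sketch only, with hypothetical names (load_checkpoint, stage_table, target_table, src_key) - is to commit a checkpoint row in the same transaction as each batch:

create table load_checkpoint ( job_name varchar2(30) primary key,
                               last_key number not null );
insert into load_checkpoint values ( 'ITEM_XREF_LOAD', 0 );
commit;

declare
    l_last_key  load_checkpoint.last_key%type;
    l_rows_done pls_integer := 0;
begin
    -- resume from the last key committed by any prior run
    select last_key into l_last_key
      from load_checkpoint
     where job_name = 'ITEM_XREF_LOAD';

    for x in ( select src_key, payload
                 from stage_table
                where src_key > l_last_key
                order by src_key )
    loop
        insert into target_table ( src_key, payload )
        values ( x.src_key, x.payload );

        l_last_key  := x.src_key;
        l_rows_done := l_rows_done + 1;

        if mod( l_rows_done, 100000 ) = 0
        then
            -- the batch and its checkpoint become durable together
            update load_checkpoint
               set last_key = l_last_key
             where job_name = 'ITEM_XREF_LOAD';
            commit;
        end if;
    end loop;

    -- final partial batch
    update load_checkpoint
       set last_key = l_last_key
     where job_name = 'ITEM_XREF_LOAD';
    commit;
end;
/

if the run dies mid-batch, the uncommitted rows roll back along with their checkpoint update, so rerunning the block picks up right after the last committed key. (in real code each batch would be an array insert as sketched earlier, not row by row.)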


the code should end up being surrounded by

<code>

</code>

Better formatted report?

Lynn Sattler, February 13, 2013 - 1:54 pm UTC

Tom,
Thanks for your response.
I am going to try to follow the code suggestion from your previous answer. If this post comes through OK, I would suggest changing your instructions
from:
Tip: USE THE CODE TAG around SQL, PL/SQL, or fixed width text
to something like:
Tip: To get columns lined up when pasting, use formatted text, select the code button, and do your paste between the symbols ><

                                                                              
 d  ch car parsing                                                            
 b   # din user name    SQL_ID        load time           last active         
-- --- --- ------------ ------------- ------------------- --------------------
sql text
------------------------------------------------------------------------------

 1   0     ADMIN        7tyvy8y9rhrh3 2013-01-29/09:08:03 JAN-29-2013 16:21:30


overall    overall    overall   elapsed        cpu        i/o 
 number     number    elapsed   per exec   per exec       avg 
  loads executions    in secs    in secs    in secs  wait time
------- ---------- ---------- ---------- ---------- ----------

-

      1        769     146.43        .19        .13        .01


    other          avg          avg     avg                 rows
     avg    disk reads    buff gets    rows       total      per
wait time     per exec     per exec  p.exec        rows   second
--------- ------------ ------------ ------- ----------- --------



      .00            0        7,214    4989   3,836,624   26,202


Thanks again,
Lynn
