You Asked
I have 2 ctl files :
$ cat E1.ctl
LOAD DATA
INFILE 'E.DAT'
APPEND
INTO TABLE E
WHEN (1:6) != 'HEADER' and (1:7) != 'TRAILER'
(
EMP_ID position (1:10) char,
EMP_NAME position (11:60) char,
DEP_ID position (61:63) char
)
INTO TABLE E_HEADER
WHEN (1:6) = 'HEADER'
(
ACTION_TYPE position (8:10) char,
EXTRACTION_DATE position (11:18) date "YYYYMMDD"
)
INTO TABLE E_TRAILER
WHEN (1:7) = 'TRAILER'
(
ACTION_TYPE position (8:10) char,
EXTRACTION_DATE position (11:18) date "YYYYMMDD"
)
$ cat E2.ctl
LOAD DATA
INFILE 'E.DAT'
APPEND
INTO TABLE E
WHEN (1:6) != 'HEADER' and (1:7) != 'TRAILER'
(
EMP_ID position (1:10) char,
EMP_NAME position (11:60) char,
DEP_ID position (61:63) char
)
I'm trying to insert N records using sqlldr; all of them are rejected because of "ORA-02291: integrity constraint (E_FK) violated".
It's the expected behaviour, although I detect that when I use the E1.ctl, which includes "INSERT INTO TABLES E_HEADER AND E_TRAILER", the execution becomes so slow.
For example:
64000 records to insert.
sqlldr with the E1.ctl takes 1 minute.
sqlldr with the E2.ctl takes 44 minutes.
Could you help me understand the situation?
$ exec_loader.sh 64000
64000
Elapsed time was: 00:44:29.94
64000
Elapsed time was: 00:01:14.98
$ cat prepare_data.sql
drop table e;
drop table d;
drop table e_header;
drop table e_trailer;
create table d(dep_id number(3), dep_name varchar2(50));
alter table d add constraint d_pk primary key (dep_id);
create table e(emp_id number(10), emp_name varchar2(50), dep_id number(3));
alter table e add constraint e_pk primary key (emp_id);
alter table e add constraint e_fk foreign key (dep_id) references d(dep_id);
insert into d
select rownum, 'D' || lpad(rownum,49,0) from dual connect by level <= 100;
insert into e
select rownum, 'E' || lpad(rownum,49,0), trunc(dbms_random.value(1,100)) from dual connect by level <= 100000;
create table e_header (action_type varchar2(3), extraction_date date);
create table e_trailer(action_type varchar2(3), extraction_date date);
SET VERI OFF HEADING OFF FEEDBACK OFF ECHO OFF PAGESIZE 0
spool E.DAT
select *
from
(
select 'HEADER ' || 'INS' || to_char(sysdate,'YYYYMMDD') lin
from dual
union
select lpad(100000+rownum,10,' ') || 'E' || lpad(rownum,49,0) || lpad(trunc(100 + dbms_random.value(1,100)),3,' ') lin
from dual connect by level <= &1
union
select 'TRAILER' || 'INS' || to_char(sysdate,'YYYYMMDD') lin
from dual
)
order by decode(substr(lin,1,1),'H',1,'T',null,2);
exit
$ cat parfile.txt
rows=999999
silent=(header,feedback)
errors=99999999
bindsize=8192000
readsize=8192000
$ cat exec_loader.sh
NUM_ROWS=${1}
USERID="xxx/xxx@xxx"
sqlplus ${USERID} @prepare_data ${NUM_ROWS} > prepare_data.log
sqlldr userid=${USERID} parfile=parfile.txt bad=E1.bad log=E1.log control=E1.ctl
grep 'ORA-02291' E1.log | wc -l
grep 'Elapsed time was' E1.log
sqlldr userid=${USERID} parfile=parfile.txt bad=E2.bad log=E2.log control=E2.ctl
grep 'ORA-02291' E2.log | wc -l
grep 'Elapsed time was' E2.log
exit
and Connor said...
Can you clarify something - you say
"although I detect that when I use the E1.ctl...the execution becomes so slow"
yet underneath that you have:
"sqlldr with the E1.ctl takes 1 minute.
sqlldr with the E2.ctl takes 44 minutes."
So which one are you saying is slow? E1 or E2?
============
Addenda: I don't think I need more info. Here's the issue:
When you get database errors (not just in sqlldr, but in any client), the database and client need to "reset", i.e., get themselves back in sync so they can resume.
So if I trace your sqlldr execution, you see this:
INSERT INTO E (EMP_ID,EMP_NAME,DEP_ID)
VALUES
(:EMP_ID,:EMP_NAME,:DEP_ID)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 3158 0.06 0.07 0 0 0 0
Execute 3158 2.44 2.32 0 6321 73837 0
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 6316 2.51 2.40 0 6321 73837 0
Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 102
Number of plan statistics captured: 3
Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
0 0 0 LOAD TABLE CONVENTIONAL E (cr=0 pr=0 pw=0 time=4 us)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net more data from client 3158 0.00 0.03
SQL*Net break/reset to client 6314 0.02 29.19
SQL*Net message to client 3157 0.00 0.00
SQL*Net message from client 3157 0.01 2.61
log file switch (checkpoint incomplete) 1 0.01 0.01
********************************************************************************
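As an aside, a trace like the one above can be captured by enabling SQL trace for the loading session before running sqlldr, then formatting the resulting trace file with tkprof. One way to do that is a temporary logon trigger - this is just a sketch: the trigger name and the user name LOAD_USER are assumptions, so adjust them to your environment and drop the trigger when you're done.

```sql
-- Hypothetical logon trigger: enable a 10046 trace (level 8 = include
-- wait events) for any session connecting as LOAD_USER.
create or replace trigger trace_loader_logon
after logon on database
begin
  if sys_context('USERENV','SESSION_USER') = 'LOAD_USER' then
    execute immediate
      q'[alter session set events '10046 trace name context forever, level 8']';
  end if;
end;
/
```

The trace file lands in the database's diagnostic trace directory; run it through tkprof to get output like the listing above.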
So all that time is being burned up in "SQL*Net break/reset to client" - you incur that wait each time you hit the FK error.
You want to have fewer errors (e.g. load without FKs and validate afterwards) when it comes to bulk data loading.
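For example, a common pattern is to disable the constraint, load, then re-validate - a sketch only; it assumes an EXCEPTIONS table created beforehand via $ORACLE_HOME/rdbms/admin/utlexcpt.sql:

```sql
-- Disable the FK before the bulk load
alter table e disable constraint e_fk;

-- ... run sqlldr here ...

-- Re-validate afterwards; if any rows violate the FK, this raises
-- ORA-02298 and records the offending ROWIDs in the EXCEPTIONS table.
alter table e enable validate constraint e_fk
  exceptions into exceptions;

-- Inspect the rows that failed validation
select e.*
from   e
where  e.rowid in (select row_id from exceptions);
```

That way the load itself never round-trips an error per row, and you deal with the bad rows in one set-based pass at the end.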
Hope this helps.