You Asked
I have 2 ctl files :
$ cat E1.ctl
LOAD DATA
INFILE 'E.DAT'
APPEND
INTO TABLE E
WHEN (1:6) != 'HEADER' and (1:7) != 'TRAILER'
(
EMP_ID position (1:10) char,
EMP_NAME position (11:60) char,
DEP_ID position (61:63) char
)
INTO TABLE E_HEADER
WHEN (1:6) = 'HEADER'
(
ACTION_TYPE position (8:10) char,
EXTRACTION_DATE position (11:18) date "YYYYMMDD"
)
INTO TABLE E_TRAILER
WHEN (1:7) = 'TRAILER'
(
ACTION_TYPE position (8:10) char,
EXTRACTION_DATE position (11:18) date "YYYYMMDD"
)
$ cat E2.ctl
LOAD DATA
INFILE 'E.DAT'
APPEND
INTO TABLE E
WHEN (1:6) != 'HEADER' and (1:7) != 'TRAILER'
(
EMP_ID position (1:10) char,
EMP_NAME position (11:60) char,
DEP_ID position (61:63) char
)
I'm trying to insert N records using sqlldr; all of them are rejected because of "ORA-02291: integrity constraint (E_FK) violated".
It's the expected behaviour, although I detect that when I use the E1.ctl, which includes "INSERT INTO TABLES E_HEADER AND E_TRAILER", the execution becomes so slow.
For example:
64000 records to insert.
sqlldr with the E1.ctl takes 1 minute.
sqlldr with the E2.ctl takes 44 minutes.
Could you help me understand the situation?
$ exec_loader.sh 64000
64000
Elapsed time was: 00:44:29.94
64000
Elapsed time was: 00:01:14.98
$ cat prepare_data.sql
drop table e;
drop table d;
drop table e_header;
drop table e_trailer;
create table d(dep_id number(3), dep_name varchar2(50));
alter table d add constraint d_pk primary key (dep_id);
create table e(emp_id number(10), emp_name varchar2(50), dep_id number(3));
alter table e add constraint e_pk primary key (emp_id);
alter table e add constraint e_fk foreign key (dep_id) references d(dep_id);
insert into d
select rownum, 'D' || lpad(rownum,49,0) from dual connect by level <= 100;
insert into e
select rownum, 'E' || lpad(rownum,49,0), trunc(dbms_random.value(1,100)) from dual connect by level <= 100000;
create table e_header (action_type varchar2(3), extraction_date date);
create table e_trailer(action_type varchar2(3), extraction_date date);
SET VERI OFF HEADING OFF FEEDBACK OFF ECHO OFF PAGESIZE 0
spool E.DAT
select *
from
(
select 'HEADER ' || 'INS' || to_char(sysdate,'YYYYMMDD') lin
from dual
union
select lpad(100000+rownum,10,' ') || 'E' || lpad(rownum,49,0) || lpad(trunc(100 + dbms_random.value(1,100)),3,' ') lin
from dual connect by level <= &1
union
select 'TRAILER' || 'INS' || to_char(sysdate,'YYYYMMDD') lin
from dual
)
order by decode(substr(lin,1,1),'H',1,'T',null,2);
exit
$ cat parfile.txt
rows=999999
silent=(header,feedback)
errors=99999999
bindsize=8192000
readsize=8192000
$ cat exec_loader.sh
NUM_ROWS=${1}
USERID="xxx/xxx@xxx"
sqlplus ${USERID} @prepare_data ${NUM_ROWS} > prepare_data.log
sqlldr userid=${USERID} parfile=parfile.txt bad=E1.bad log=E1.log control=E1.ctl
grep 'ORA-02291' E1.log | wc -l
grep 'Elapsed time was' E1.log
sqlldr userid=${USERID} parfile=parfile.txt bad=E2.bad log=E2.log control=E2.ctl
grep 'ORA-02291' E2.log | wc -l
grep 'Elapsed time was' E2.log
exit
and Connor said...
Can you clarify something - you say
"although I detect that when I use the E1.ctl...the execution becomes so slow"
yet underneath that you have:
"sqlldr with the E1.ctl takes 1 minute.
sqlldr with the E2.ctl takes 44 minutes."
So which one are you saying is slow? E1 or E2?
============
Addenda: I don't think I need more info. Here's the issue:
When you get database errors (not just in sqlldr, but in any client), the database and client need to "reset", i.e., get themselves back in sync so they can resume.
So if I trace your sqlldr execution, you see this:
INSERT INTO E (EMP_ID,EMP_NAME,DEP_ID)
VALUES
(:EMP_ID,:EMP_NAME,:DEP_ID)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 3158 0.06 0.07 0 0 0 0
Execute 3158 2.44 2.32 0 6321 73837 0
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 6316 2.51 2.40 0 6321 73837 0
Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 102
Number of plan statistics captured: 3
Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
0 0 0 LOAD TABLE CONVENTIONAL E (cr=0 pr=0 pw=0 time=4 us)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net more data from client 3158 0.00 0.03
SQL*Net break/reset to client 6314 0.02 29.19
SQL*Net message to client 3157 0.00 0.00
SQL*Net message from client 3157 0.01 2.61
log file switch (checkpoint incomplete) 1 0.01 0.01
********************************************************************************
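As an aside, a trace like the one above can be captured by enabling SQL trace for the loading session before running sqlldr, then formatting the resulting trace file with tkprof. One way to do that is a temporary logon trigger - this is just a sketch: the trigger name and the user name LOAD_USER are assumptions, so adjust them to your environment and drop the trigger when you're done.

```sql
-- Hypothetical logon trigger: enable a 10046 trace (level 8 = include
-- wait events) for any session connecting as LOAD_USER.
create or replace trigger trace_loader_logon
after logon on database
begin
  if sys_context('USERENV','SESSION_USER') = 'LOAD_USER' then
    execute immediate
      q'[alter session set events '10046 trace name context forever, level 8']';
  end if;
end;
/
```

The trace file lands in the database's diagnostic trace directory; run it through tkprof to get output like the listing above.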
So all that time is being burned up in "SQL*Net break/reset to client" - you incur that wait each time you hit the FK error.
You want to have fewer errors (e.g. load without FKs and validate afterwards) when it comes to bulk data loading.
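For example, a common pattern is to disable the constraint, load, then re-validate - a sketch only; it assumes an EXCEPTIONS table created beforehand via $ORACLE_HOME/rdbms/admin/utlexcpt.sql:

```sql
-- Disable the FK before the bulk load
alter table e disable constraint e_fk;

-- ... run sqlldr here ...

-- Re-validate afterwards; if any rows violate the FK, this raises
-- ORA-02298 and records the offending ROWIDs in the EXCEPTIONS table.
alter table e enable validate constraint e_fk
  exceptions into exceptions;

-- Inspect the rows that failed validation
select e.*
from   e
where  e.rowid in (select row_id from exceptions);
```

That way the load itself never round-trips an error per row, and you deal with the bad rows in one set-based pass at the end.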
Hope this helps.