Skip to Main Content

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Nag.

Asked: October 12, 2001 - 1:18 pm UTC

Answered by: Tom Kyte - Last updated: July 16, 2013 - 2:58 pm UTC

Category: Developer - Version: 8.1.7

Viewed 10K+ times! This question is

You Asked

I.At what point we should start considering a direct path load , Iam expecting the answer to be a figure, or is it possible to answer this question so specifically (i.e. if the number of rows to be inserted is 10 million then we should consider using direct path load etc).

II.What is teh difference between a direct path load and sql loader.

Thank you


and we said...

I. Nothing is a number. There is no number. It depends on your requirements more then anything.

If you have 10 million rows to load but need to apply a sql function to the input data to convert it -- you won't be using a direct path load with sqlldr.

If you have 10 million rows to load but you need to allow users to perform DML (writes) on the data as you are loading. You won't be using direct path load.

If you have 1 million rows to load and you need to do it fast, and you meet the restrictions for direct path loads -- go for it (to see the restrictions, read the server utilities guide, there is a whole chapter on this topic).

Consider it anytime you have a large volume of data to load in a short period of time.

So, no number from me.

II. A direct path load is a FEATURE of SQLLDR. SQLLDR is a tool that can do a direct path load. Its hard to "compare" them as one is a feature or option of the other. See the server utilities guide for info on exactly what takes place during a direct path load.


and you rated our response

  (10 ratings)

Is this answer out of date? If it is, please let us know via a Review

Reviews

October 12, 2001 - 3:59 pm UTC

Reviewer: A reader

When I said direct path load , I meant the insert with the
/*+ append */ hint.

In some of your anserws you have referred to them as direct path load( or was it direct path insert).

Tom Kyte

Followup  

October 12, 2001 - 4:22 pm UTC

Well, they are very similar EXCEPT that the direct path insert can use SQL all of the way (making it much more flexible).

Both write blocks directly. Both bypass UNDO for the table. Both can bypass REDO as well.

In 9i, with the addition of external tables ( the ability to query a FLAT FILE with SQL) -- SQLLDR might be "moot". Using the external table and the insert /*+ append */ I can basically do a direct path load from file to database without running a command outside of the database.

direct path load and sql loader

July 22, 2002 - 3:35 pm UTC

Reviewer: $ameer from NJ ,USA

Tom

I have one confusion.

When you say SQLLDR in Direct path load 'can bypass REDO',
does this mean REDO is bypassed only if the table being imported has nologging option??
Or the direct path load always bypass REDO??

Also how to check if a particular statement is using REDO / UNDO ?

Thanks in Advance



Tom Kyte

Followup  

July 22, 2002 - 6:36 pm UTC

Depends on the mode of the database even -- archive log vs no archive log mode.

In no archive log, it'll bypass redo.
In archive log, it'll generate redo unless it is done unrecoverable (nologgin)

AUTOTRACE is very useful to see the redo generated by a statement.

May 08, 2003 - 2:10 pm UTC

Reviewer: A reader


PK

June 10, 2004 - 6:37 pm UTC

Reviewer: A reader

Tom,
I am using direct path loading and noticed the following:
1) For delimited file, if I want to rtrim something, then I cannot use direct path loading (cannot use POSITION..)
2) Direct path loading doesn't check Primary Key and loaded duplicates.

For 1), is there an option to get around it?
For 2), is this because it is writing directly to database blocks so doesn't check PK? If so, maybe this is why the PK index is also unusable after the load and needs to rebuild.

Thanks,

Tom Kyte

Followup  

June 10, 2004 - 8:18 pm UTC

1) in 9i you can use sql in direct path. what do you mean by you cannot use position?

2) it checks at the end, the constraints would be left in a direct path load state (well, the indexes would be).

Direct Path Loading

June 10, 2004 - 8:54 pm UTC

Reviewer: A reader

Tom,
Thanks for your help. I am in Ora817.
1) What I meant was I cannot use POSITION(a:b) because it is fix length plus a delimiter (I don't like that at all, but this is what I got). If it is fix-length, and after POSITION(a:b) the trailing nulls are automatically removed, but if I use (first_name char "rtrim(:first_name)"
then direct path loading is not allowed.
Here's a sample (DEA_NO, FIRST_NAME, LAST_NAME)
A45023239~John ~Smith ~

2) Do you mean I should set reenable =true so that the unusable indexes will be rebuilt after the direct path loading?

Thanks,


Tom Kyte

Followup  

June 10, 2004 - 9:25 pm UTC

1) if you use:

LOAD DATA
INFILE *
INTO TABLE t
FIELDS TERMINATED BY '~'
trailing nullcols
( a position(1:9),
b position(11:28),
c position(30:48) )
BEGINDATA
A45023239~John ~Smith ~


then whitespace is automagically rtrimmed and no sql functions are needed?

who cares if there is a delimiter -- it is fixed width, that is all we need.


2) yes.....

Prevent Duplicated Record to Be Loaded

June 11, 2004 - 12:53 am UTC

Reviewer: A reader

Thanks Tom,
You solution works great, you are the best!
By the way, is there a way to prevent duplicate records from being loaded in the direct path mode?

Thanks,

Tom Kyte

Followup  

June 11, 2004 - 8:02 am UTC

not in direct path, the data loaded isn't really "visible". it would be detected at the end when the index structures are merged.

Append

September 20, 2004 - 12:20 am UTC

Reviewer: Neeraj Bindal from INDIA

Good Concepts

Append

September 20, 2004 - 12:22 am UTC

Reviewer: Neeraj Bindal from India

I have following doubts :
1.) I have one table with Following Structure :

CREATE TABLE TEST_DATA(TEST1 VARCHAR2(500),TEST2 VARCHAR2(500),TEST3
VARCHAR2(500),TEST4 VARCHAR2(500),TEST5 VARCHAR2(500),TEST6
VARCHAR2(500))NOLOGGING NOCACHE NOPARALLEL;

I am using the following Insert statement :

insert /* + APPEND */ into TEST_DATA VALUES ('AA','AA','AA','AA','AA','AA');

COMMIT;

This insert statement is not using APPEND hint .

2.) If there is an Index on the Table then Append Hint/Direct Load
drastically increases the Redo Log. ????

3.) Implication of Append Hint/Direct Load is that, No
Insert/Update/Select can be done by any successive Statement and also by any
other Session till the
Rollback/Commit is done ????????

Please help me in understanding these queries.

With Rgds..

Neeraj Bindal


Tom Kyte

Followup  

September 20, 2004 - 8:07 am UTC

it is not well formed.  you have a comment in there, nothing more.

ops$tkyte@ORA9IR2> create table t ( x int );
 
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> insert /* + append */ into t select rownum from dual;
 
1 row created.
 
ops$tkyte@ORA9IR2> select * from t;
 
         X
----------
         1
 
ops$tkyte@ORA9IR2> insert /*+ append */ into t select rownum from dual;
 
1 row created.
 
ops$tkyte@ORA9IR2> select * from t;
select * from t
              *
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel

<b>see, with "space", not a hint, with space -- hint</b>


but also:
http://docs.oracle.com/docs/cd/B10501_01/server.920/a96524/c21dlins.htm#10657

it doesn't even make sense to do a single row "append" (you would effectively serialize all access this table, put 1 row per block in this table - it would be really utterly horrible.  good thing we didn't do it)



2) no, where do you see that being said?  in fact, a LARGE bulk insert using append will significantly decrease the redo since:

a) undo is not generated for the table
b) the amount of redo generated for index maintainance is reduced to its minimum because of the large bulk merge that takes place, instead of row by slow row.
c) in noarchivelog mode, no redo for the table would be generated and 
d) in archivelog mode, if the table is nologging, c) would be true as well


so, no, it won't dramatically increase it, quite the opposite

3) in that transaction -- correct.

Read the entire chapter I've pointed you to above.  All is explained there.



 

Multithreading not working with direct path load

July 08, 2013 - 9:36 pm UTC

Reviewer: Lalit Kumar B from Hyderabad, India

Oracle DB version : Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

I am using direct path load to load data from a flat file into a table using SQL*Loader. I have also kept it as parallel. However, I do not see multithreading being used at all, based on the log file report.

I am using the following parameters set to true in the sqlldr:-
parallel=true , multithreading=true , skip_index_maintenance=true


Output in the sqlldr log:-
Path used: Direct 
Insert option in effect for this table: APPEND 
Trigger DEV."R_TM_BK_BORROWER" was disabled before the load. 
DEV."R_TM_BK_BORROWER" was re-enabled. 
The following index(es) on table "YO"."TM_BK_BORROWER" were processed: 
index DEV.I_NK_TM_BORR_1 loaded successfully with 1554238 keys 
index DEV.I_NK_TM_BORR_2 loaded successfully with 1554238 keys 
index DEV.I_NK_TM_BORR_3 loaded successfully with 1554238 keys 
index DEV.I_NK_TM_BORR_31 loaded successfully with 1554238 keys 

Bind array size not used in direct path. 
Column array rows : 5000 
Stream buffer bytes: 256000 
Read buffer bytes: 1048576 

Total logical records skipped: 1 
Total logical records read: 1554241 
Total logical records rejected: 48 
Total logical records discarded: 2 
Total stream buffers loaded by SQL*Loader main thread: 7695 
Total stream buffers loaded by SQL*Loader load thread: 0 


So, I still see in the sqlldr log that the stream buffers are loaded by main thread and load thread is still not being used.
SQL*Loader load thread did not offload the SQL*Loader main thread. If the load thread takes care of the current stream buffers, then it allows the main thread to build the next stream buffer while the load thread loads the current stream on the server. We have a 24 CPU server.

I am not able to find any clue over Google too. Any help is appreciated.
Tom Kyte

Followup  

July 16, 2013 - 2:58 pm UTC

that was multithreaded - it was just done as optimally as possible!

You could make it look different by de-optimizing the load.

from the doc:

http://docs.oracle.com/cd/E11882_01/server.112/e22490/ldr_modes.htm#i1008033

The goal is to have the load thread perform as many stream buffer loads as possible. This can be accomplished by increasing the number of column array rows, decreasing the stream buffer size, or both. You can monitor the elapsed time in the SQL*Loader log file to determine whether your changes are having the desired effect. See "Specifying the Number of Column Array Rows and Size of Stream Buffers" for more information.


so, for example, if I loaded up 3,000,000 emp records using the defaults:

$ sqlldr / emp direct=true


My log shows:

Bind array size not used in direct path.
Column array  rows :    5000
Stream buffer bytes:  256000
Read   buffer bytes: 1048576

Total logical records skipped:          0
Total logical records read:       2948274
Total logical records rejected:         0
Total logical records discarded:        0
Total stream buffers loaded by SQL*Loader main thread:      608
Total stream buffers loaded by SQL*Loader load thread:        0

Run began on Tue Jul 16 10:49:37 2013
Run ended on Tue Jul 16 10:50:07 2013

Elapsed time was:     00:00:30.42
CPU time was:         00:00:02.55



if I de-optimize it:

$ sqlldr / emp direct=true multithreading=true streamsize=100



i find:

Bind array size not used in direct path.
Column array  rows :    5000
Stream buffer bytes:   65536
Read   buffer bytes: 1048576

Total logical records skipped:          0
Total logical records read:       2948274
Total logical records rejected:         0
Total logical records discarded:        0
Total stream buffers loaded by SQL*Loader main thread:      608
Total stream buffers loaded by SQL*Loader load thread:     1701

Run began on Tue Jul 16 10:50:52 2013
Run ended on Tue Jul 16 10:51:24 2013

Elapsed time was:     00:00:31.62
CPU time was:         00:00:02.75




play around with those parameters - but it seems you already have an optimmized load.

That clarifies my question.

July 21, 2013 - 11:39 am UTC

Reviewer: Lalit Kumar B from Hyderabad, India

Thanks Tom, the explanation is just again saying the same golden words in Oracle "It depends".

1. I followed the doc as a ROT(rule of thumb) that "The goal is to have the load thread perform as many stream buffer loads as possible." At first look, the log shows no load threads are being used, however, your explanation proves it is already multithreaded and well optimized. De-optimizing brings in the load thread, surprising.

2. The doc also says, to optimize performance and minimize time "Specify the number of column array rows and the size of the stream buffer". So more the size of STREAMSIZE in bytes, the more we optimize it. It is also proved by your example, if we look at the Elapsed time in both cases.

If we look at Point 1 vs 2, doesn't it sound contradictory? The goal is to let the load thread perform more loads, but to achieve this we have to de-optimize the streamsize. So once again "It depends". Optimal solution in my case is to stick to point 1 because my goal is to have least elapsed time.

Thanks for your help Tom!