loading millions of records - speed/scalability


Question and Answer

Tom Kyte

Thanks for the question.

Asked: December 15, 2000 - 3:16 pm UTC

Last updated: May 12, 2003 - 7:02 pm UTC

Version: 8.1.5

Viewed 1000+ times

You Asked

hi tom:

I wanted to know whether there is any way faster than SQL*Loader (direct and parallel) to load data into Oracle tables.
I need to come up with a strategy to load literally hundreds of millions of records into 8-10 field tables in batch, as quickly as possible. I am aware of strategies like splitting data files, loading them with concurrent processes, etc.

However, are there any 3rd party tools that have strategies/methodologies to load this data faster than the native Oracle utility, SQL*Loader?

I realize this is a very generic question without much detail, but if you could throw out some suggestions I would greatly appreciate it.

Thanks in advance!




and Tom said...

I would say -- no, there is nothing I am aware of that is faster than SQLLDR in parallel direct path mode. Since it writes database blocks directly, without going through the database engine, it will be faster than other 3rd party tools which do not have the ability to write directly to the datafiles.
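A parallel direct path load is really just multiple SQLLDR sessions started with direct=true and parallel=true, each loading its own slice of the input into the same table. A minimal sketch, assuming hypothetical control file, account, file and column names (APPEND is required in the control file for parallel direct path loads):

-- big_table.ctl (hypothetical)
LOAD DATA
INFILE 'slice1.dat'
APPEND
INTO TABLE big_table
FIELDS TERMINATED BY ','
( col1, col2, col3, col4, col5, col6, col7, col8 )

# from the shell, start one session per input slice and wait for them all
sqlldr userid=scott/tiger control=big_table.ctl data=slice1.dat direct=true parallel=true &
sqlldr userid=scott/tiger control=big_table.ctl data=slice2.dat direct=true parallel=true &
wait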

If you are concerned about downtime on a live system or impacting a live system -- you could investigate transportable tablespaces. These would let you load onto a "non-live" system and then detach the already formatted datafiles (including indexes) and attach them to an existing instance in very little time.
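A rough sketch of the transportable tablespace flow (the tablespace, file and account names here are hypothetical, and the tablespace set must be self-contained):

ALTER TABLESPACE load_ts READ ONLY;

-- export just the tablespace metadata with a suitably privileged account
exp userid=system/manager transport_tablespace=y tablespaces=load_ts file=load_ts.dmp

-- copy load_ts's datafiles and load_ts.dmp to the target system, then plug it in
imp userid=system/manager transport_tablespace=y datafiles='/u01/oradata/prod/load_ts01.dbf' file=load_ts.dmp

ALTER TABLESPACE load_ts READ WRITE;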


Comments

A reader, November 27, 2002 - 11:55 am UTC

Would you prefer using parallel within sqlldr, or running multiple parallel streams of sqlldr at the O/S level? If we have a DOP set at the table level, will sqlldr automatically invoke a parallel load?

We are discussing creating load scripts for our data warehouse. Some of the DBAs say we should not use the parallel clause at all, and should instead run parallel streams of sqlldr. We have 36 partitions to load, and each of them will be loaded from a separate datafile. So what would be the best way to load them? We are planning on doing 36 parallel streams of sqlldr (without using parallel on the sqlldr command line). Thanks for your valuable suggestions.



Tom Kyte
November 27, 2002 - 12:02 pm UTC

External tables with CTAS or direct path inserts can (and do) use the DOP in 9i and above.
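For illustration, a minimal 9i sketch of that (the directory, table and column names are hypothetical):

CREATE DIRECTORY load_dir AS '/u01/app/load';

CREATE TABLE big_table_ext
( id   NUMBER,
  name VARCHAR2(30) )
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
  DEFAULT DIRECTORY load_dir
  ACCESS PARAMETERS ( FIELDS TERMINATED BY ',' )
  LOCATION ( 'slice1.dat', 'slice2.dat' )
)
PARALLEL 4;

-- either create the table in parallel, nologging...
CREATE TABLE big_table PARALLEL 4 NOLOGGING
AS SELECT * FROM big_table_ext;

-- ...or direct path insert into an existing table in parallel
ALTER SESSION ENABLE PARALLEL DML;
INSERT /*+ APPEND PARALLEL(big_table, 4) */ INTO big_table
SELECT * FROM big_table_ext;
COMMIT;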

SQLLDR, as a command-line scripting tool, would have to be SCRIPTED in order to do parallel loads. You figure out how many, you figure out how to start them all up, and you split up the inputs.

If you are loading into 36 segments (partitions) and you don't want to parallel load into each segment, don't use parallel. It sounds like you don't want parallel in your case -- you'll parallelize over segments instead of within a segment, which is just dandy.
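A sketch of that approach, with hypothetical table, partition, account and file names -- one control file per partition, each naming its target partition, and one sqlldr stream per control file started from the shell:

-- load_p01.ctl (one such control file per partition)
LOAD DATA
INFILE 'p01.dat'
APPEND
INTO TABLE sales PARTITION (p01)
FIELDS TERMINATED BY ','
( sale_id, cust_id, amount )

# start 36 concurrent streams, one per partition, and wait for them all
for p in $(seq -w 1 36)
do
  sqlldr userid=scott/tiger control=load_p$p.ctl direct=true &
done
wait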

A reader, November 27, 2002 - 1:00 pm UTC

Sqlldr during OLTP

Sandra, December 17, 2002 - 1:23 pm UTC

Hi Tom,

I have a large table (around 3 GB) with OLTP users relying on its indexes for adequate performance. Without these indexes, their queries take far too long.

However, at 1:00 am daily, I have to load (sqlldr) the records for that day -- a total of 40 thousand rows.

Up to row 20,000 the load time is OK, but after that it gets really slow. (Why does this happen?)

I thought of having the indexes dropped during the load, but then OLTP users would time out via the Web.

What would be a common solution for this concurrent load vs. OLTP situation?

We are using 9i. I thought of list-partitioning the table into 12 months, so we would have local indexes, which would be less complex (smaller) and load time would be less affected. What do you think of that? Is it a good workaround? Can you give me some reference or article on that?

Thank you, Tom.

Tom Kyte
December 18, 2002 - 10:52 am UTC

Look in your alert log and see if you see any "cannot allocate new log" messages.

40k rows is nothing -- your logs are probably too small (checkpoint not complete, cannot allocate new log), or you flooded ARCH with more than it can do (archival required, cannot allocate new log) and need more log files to get you through this busy period.


A load is just like a really fast typist in this case -- 40k rows is nothing. I seriously doubt it is your indexes here -- look at the alert log.
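If it does turn out to be redo, the follow-up is to look at the current online log configuration and add more (or larger) groups; the group numbers, sizes and paths below are hypothetical:

SELECT group#, bytes/1024/1024 AS mb, status FROM v$log;

ALTER DATABASE ADD LOGFILE GROUP 4 ('/u02/oradata/prod/redo04.log') SIZE 200M;
ALTER DATABASE ADD LOGFILE GROUP 5 ('/u02/oradata/prod/redo05.log') SIZE 200M;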

So there is a BAD file... but what about the records successfully loaded?

Robert, May 12, 2003 - 4:51 pm UTC

Tom,
So the BAD file tells me which records failed the load...
But is there a way I can retrieve/output the records that were just successfully loaded?
Thanks

Tom Kyte
May 12, 2003 - 7:02 pm UTC

unless you "tag them" somehow (say with sysdate or whatever) -- no, not really, they are just rows in a table.