Improve performance while loading data
Question and Answer

Tom Kyte

Thanks for the question, Pawan.

Asked: February 02, 2002 - 10:23 pm UTC

Last updated: August 09, 2005 - 10:42 am UTC

Version: 8.1.7

Viewed 1000+ times

You Asked

Tom,
I am presently working on a Data Warehouse project which is using Informatica for mapping and loading data into an Oracle database (8.1.7 on HP UX 11.0). The data is in flat files and some of the hundreds of tables in the Warehouse are HUGE (300-500 million records). The designers have not partioned the base tables into which this data is being loaded. Their contention is that since we are using SAN ( storage Area Network) that has multiple disks we don't need to partion the tables. I am not sure if this is right. Also they have primary keys on the tables. My question is - what is the best way to speed up the loading process? What init.ora parameters need to be set? If they agree to partition then whether it is better to partion by range or by hash? Is it possible to use parallel loading of a table if yes then How?
When I question this, I am told that since we are using Informatica we will have this kind of performance. I don't understand why one would use a half-million-dollar package and have such horrible performance. Please let me know, in your opinion, what is the optimal method of loading such a huge amount of data.

Thanks in advance.

and Tom said...

The designers of this system are not database guys, are they? Well, that is a rhetorical question -- they aren't.

Partitioning provides many things -- partition elimination can make your queries in a data warehouse go faster. Many times faster.

Partitioning allows for easy aging of data -- rolling windows -- something virtually every data warehouse does.

Partitioning allows for higher availability when that soft corruption kicks in -- and it will. Better to have a single, small partition offline to be rebuilt than the entire table.

Partitioning eases administrative burdens. It is a fact that performing operations on small objects is inherently easier, faster and less resource intensive than performing the same operations on a large object.

Partitioning makes mass updates -- something you might need to do in a data warehouse, might not -- faster. You can use parallel DML, but only if you partition.

And the list goes on.
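To make the rolling-window idea concrete, here is a minimal sketch of a range-partitioned fact table; the table, column and partition names are hypothetical:

```sql
-- Hypothetical fact table, range partitioned by month.
-- Aging out old data is then just:
--   ALTER TABLE sales DROP PARTITION sales_2002_01;
CREATE TABLE sales
( sale_date   DATE,
  product_id  NUMBER,
  amount      NUMBER
)
PARTITION BY RANGE (sale_date)
( PARTITION sales_2002_01 VALUES LESS THAN (TO_DATE('01-02-2002','DD-MM-YYYY')),
  PARTITION sales_2002_02 VALUES LESS THAN (TO_DATE('01-03-2002','DD-MM-YYYY')),
  PARTITION sales_max     VALUES LESS THAN (MAXVALUE)
);
```

Queries that filter on SALE_DATE can then be satisfied by reading only the relevant partitions (partition elimination), instead of the whole table.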

We've had this "multiple disk" thing from day 1 in databases. There has ALWAYS been more than one disk. Enter RAID striping -- a SAN didn't do anything new here. Your designers haven't worked on large data warehouses -- or if they have, they've never used partitioning and don't understand it.



Loading a huge amount of data -- I would be using sqlldr in parallel direct path mode. I would be using parallel index builds with NOLOGGING. It would not take very long at all.
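A sketch of what that could look like -- the file names, credentials and index are hypothetical:

```sql
-- Shell side: one sqlldr session per input file, direct path, in parallel:
--   sqlldr userid=scott/tiger control=load1.ctl direct=true parallel=true
--   sqlldr userid=scott/tiger control=load2.ctl direct=true parallel=true
--
-- Afterwards, build the indexes in parallel without generating redo:
CREATE INDEX sales_idx ON sales(product_id)
  PARALLEL 8 NOLOGGING;
```

Because NOLOGGING operations are not in the redo stream, the affected objects should be backed up after the load.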



Comments

Just a quick addition, if I may please

Ravi, February 03, 2002 - 1:46 pm UTC

Hi Tom, If I may add to what you said:
What about disabling constraints and dropping indexes before loading that much data? That way, if you have the comfort of rebuilding the indexes during a database-inactive window (like nights), you can reduce the time a lot.

And how about altering the table to NOLOGGING before doing the data load, since we have the data in ASCII files anyway?

Just a quick addition: parallel DML can also be done on a non-partitioned table, right? The only difference is that it will add above the HWM (high-water mark). Right? Please correct me if I'm wrong.
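Ravi's suggestions might look like this in practice; the table and constraint names are hypothetical:

```sql
-- Before the load:
ALTER TABLE sales NOLOGGING;                    -- direct-path loads then skip redo
ALTER TABLE sales DISABLE CONSTRAINT sales_pk;  -- drops the PK index, no maintenance during load

-- ... direct path load the flat files ...

-- After the load, during a quiet window:
ALTER TABLE sales ENABLE CONSTRAINT sales_pk;   -- rebuilds the PK index
ALTER TABLE sales LOGGING;                      -- and take a backup afterwards
```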


Tom Kyte
February 03, 2002 - 2:03 pm UTC

Yup -- parallel direct path loads without logging will get you all of that as well. The indexes will be left unusable and will have to be rebuilt.


PDML (parallel updates/deletes) need partitioning. Parallel INSERT INTO SELECT * FROM does not.
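A sketch of the parallel direct-path insert Tom is referring to, assuming a hypothetical STAGING_SALES source table:

```sql
-- Parallel DML must be enabled explicitly for the session:
ALTER SESSION ENABLE PARALLEL DML;

-- APPEND = direct path: data goes above the HWM, bypassing the buffer cache.
INSERT /*+ APPEND PARALLEL(sales, 8) */ INTO sales
SELECT * FROM staging_sales;

-- After a direct-path insert, the table cannot be re-read in the same
-- transaction; commit first.
COMMIT;
```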


Improve performance while loading data

Tony, February 03, 2002 - 7:19 pm UTC

Pawan,
I just joined an organisation that has almost the same environment as the one you mentioned in your scenario (Informatica / Oracle 8.1.6 / HP-UX 11.0). This site was also not using partitioning for its FACT tables, so I can really understand your situation.
In order to change the old beliefs (from when partitioning was not available), you have to convince them, maybe by simulating in a test environment. I had to do the same; once "THE REAL BENEFITS OF PARTITIONING" (as mentioned by Tom) are proved, the resistance will change very quickly.
Just to give an idea: with partitioning, some queries' response times improved by as much as 50%, and loading time was also reduced by 50%.
Also keep in mind the benefits of local and bitmap indexes. All the best!
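For example, a local bitmap index on a partitioned fact table (hypothetical names) could be created as:

```sql
-- LOCAL means one index segment per table partition, so a single partition
-- can be loaded, exchanged or rebuilt without touching the other partitions.
CREATE BITMAP INDEX sales_region_bix
  ON sales(region_id)
  LOCAL NOLOGGING;
```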


Thanks a lot

Pawan, February 05, 2002 - 4:09 pm UTC

Thanks Tom and Ravi,
You people have vindicated everything I have been saying, but the million-dollar question is: should I take the boss head-on, show her this answer, and pack my bags?

Thanks again

Partitioning Rules

Johan Snyman, February 07, 2002 - 9:51 am UTC

Another HUGE benefit of partitioning is that one can load new data into a table with the same structure as the partitioned table, but without any indexes.

Then you create, on the newly loaded table, all the indexes the partitioned table has, add an empty partition to the partitioned table, swap the newly loaded table with the empty partition (within a few seconds) and voila -- you have added millions of indexed rows in no (or very little) time at all.

It gives you the best of both worlds : loading without index maintenance AND only indexing the newly loaded data.
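A sketch of that exchange technique, with hypothetical table, index and partition names:

```sql
-- 1. Direct path load into a plain, unindexed work table (new_sales)
--    with the same column structure as the partitioned table.

-- 2. Index it the same way the partitioned table is locally indexed:
CREATE INDEX new_sales_idx ON new_sales(product_id) NOLOGGING;

-- 3. Add an empty partition and swap -- a data-dictionary operation,
--    near-instant regardless of the row count:
ALTER TABLE sales ADD PARTITION sales_2002_03
  VALUES LESS THAN (TO_DATE('01-04-2002','DD-MM-YYYY'));

ALTER TABLE sales EXCHANGE PARTITION sales_2002_03
  WITH TABLE new_sales INCLUDING INDEXES WITHOUT VALIDATION;
```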

A reader, March 01, 2004 - 8:08 am UTC

Tom,
I am presently working on a data warehouse project that is using Informatica for mapping and loading data into an Oracle database (8.1.7 on Windows 2000, with 4 CPUs at 2.40 GHz and 1 GB RAM). The data is in some relational tables, and some of the hundreds of tables in the Siebel data warehouse are huge. Loading 43 lakh (4.3 million) rows into the Siebel data warehouse takes a huge amount of time, even after disabling all constraints and dropping the indexes. Even rebuilding a single index takes 8 hours. We are not partitioning any table in the data warehouse. Please suggest the best way of loading such huge data.

Thanks in advance
MRao

Tom Kyte
March 01, 2004 - 8:57 am UTC

One cannot really comment on that.

1 GB of RAM is "sort of small", but you would have to identify at least what your bottlenecks are -- your major issues (Statspack might be a good place to start). I can say that I know Informatica to be infamously "row by row", which is the slow way to do things.

mohan, March 02, 2004 - 1:18 am UTC

Tom
I agree the RAM is small. If I use Statspack to identify the bottleneck, which of the top 5 wait events should I focus on?

Thanks in advance
Mrao

Tom Kyte
March 02, 2004 - 7:38 am UTC

the biggest one.

David Aldridge, July 13, 2005 - 12:39 pm UTC

quote -> 'I can say that I know informatica to be infamously "row by row" which is the slow way to do things.'

That certainly used to be the case. Bulk binding was late to the party with Informatica, and in order to get a direct path insert (i.e. NOLOGGING, compression on insert, etc.) we've always used its reasonably well-behaved interface to SQL*Loader.

However I understand that from version 7.1.2 Informatica supports direct path inserts natively. I haven't confirmed that myself (we're currently 7.1.1), but if so then it's cause for celebration. I'll post an update when I find out for sure.

Tom Kyte
July 13, 2005 - 1:10 pm UTC

thanks!

David Aldridge, July 20, 2005 - 5:22 pm UTC

Just tested Informatica 7.1.3, and target load types of "Bulk" do indeed use direct path insert.

The world's smallest hooray!

Tom Kyte
July 21, 2005 - 7:50 am UTC

Thanks for the followup, appreciate it.

Ordering a table

Biju C, August 08, 2005 - 5:47 pm UTC

My question is along the same lines of loading huge data. I have a staging table into which I load data from a flat file. My staging table does not have any unique key, so I cannot use an IOT. The issue is that the data is not ordered the same way I use the staging table.
The flat file is organized by
Region -> Product -> Period
whereas I load from staging to the data tables period by period.
So the block reads are really high when picking up each period. Is there a way I can sort this data in 9i? If not, do you think it makes sense to have such an option?



Tom Kyte
August 08, 2005 - 8:38 pm UTC

how do you load?

Ordering a table

Biju C, August 09, 2005 - 9:15 am UTC

We use SQLLDR to load into the table.

Ordering the table

A reader, August 09, 2005 - 9:18 am UTC

I thought I mentioned this before, but we load from a Windows server and Sort.exe was not able to sort the file properly (file size restrictions in DOS?). We do not load from AIX; if we did, I would have sorted the file there.

Ordering a table

Biju C, August 09, 2005 - 9:53 am UTC

Sorry for the previous 2 comments. Please ignore.

We use SQLLDR to load into the table. I thought I mentioned this before, but we load from a Windows server and Sort.exe was not able to sort the file properly (file size restrictions in DOS?). We do not load from AIX; if we did, I would have sorted the file there.

Tom Kyte
August 09, 2005 - 10:42 am UTC

don't use SQLLDR, use an external table and then you can use ORDER BY in a create table as select or insert into.
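A sketch of that approach -- the directory object, file name and columns are hypothetical:

```sql
-- Map the flat file as an external table:
CREATE TABLE staging_ext
( region   VARCHAR2(10),
  product  VARCHAR2(10),
  period   VARCHAR2(6)
)
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
  DEFAULT DIRECTORY load_dir
  ACCESS PARAMETERS ( FIELDS TERMINATED BY ',' )
  LOCATION ('data.csv')
);

-- The load can now be ordered to match the period-by-period access pattern:
INSERT /*+ APPEND */ INTO staging
SELECT * FROM staging_ext
 ORDER BY period, region, product;
```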

External Tables

Biju C, August 11, 2005 - 11:09 am UTC

Thanks for your input in this case and all the others. Also taking this opportunity to say I really like the way you organized your book Expert One-on-One Oracle, and I recommend it to everyone who wants to be good at Oracle.

Quick data loading

Srinivas Akula, April 20, 2007 - 6:36 am UTC

hi Tom,

Our situation is somewhat similar to the one above. I have to load the data from a view into a table (of similar structure) based on some column (like project id). I created a batch table which stores the batches with start and end numbers (project ids). We are processing the batches one by one using PL/SQL code. For 8 million records it is taking more than 8 hours. Can we speed up this process? Any useful tips? We are removing the indexes on the table while inserting and rebuilding them afterwards. Will any existing solution match our requirements? Please let me know. Thanks in advance.

bye
Srinivas A

More to Explore

VLDB

If you are new to partitioning, check out Connor McDonald's introduction series.

Documentation set on VLDB and Partitioning.