High volume delete, insert and updates with over 600 Million records - associated and obvious concerns


Question and Answer

Connor McDonald

Thanks for the question, Suresh V.

Asked: September 29, 2016 - 3:09 pm UTC

Last updated: October 01, 2016 - 3:59 am UTC

Version: Oracle 11gR2, hosted in Unix AIX 5

Viewed 1000+ times

You Asked

What's the best way to handle the following (feeling guilty for the tremendous lack of brevity):

Activities:
1. Client loads 10 files into separate tables daily. Loading starts at 4 AM and finishes by 9 AM.
2. Each of these 10 tables is partitioned on file landing date.
3. Volume stored per partition in each table is approx. 5 GB (6 million records).
4. Data retention set to keep latest 2 dumps.
5. No indexing in any of these tables.
6. Next, we have a staging table X, which is partitioned on logging date (since all summaries and reports are based on logging date).
7. X retains the last 6 months' records.
8. X is locally indexed on file type (one file type corresponding to each of the 10 tables).
9. X is compressed using Oracle basic compression (a sketch of this table appears after this list).
10. A procedure loads data from each of the 10 tables into X on a daily basis.
11. The procedure is scheduled to run at 9:30 AM daily.
12. In the very first run, the procedure picks up the 10 tables one after another and loads their data into X [processing is serial, i.e., steps 12, 13 and 14 are completed for one file type, then the next file type is processed, and so on until all 10 are done].
13. Analytical functions, decodes and ranking are all applied during each file load (bulk load with a 5000-row fetch limit).
14. From subsequent runs, the procedure checks whether a new dump with a more recent landing date has been received; if so, data for the previous dump is DELETED from X by file type (iterated per partition, INDEX FULL SCAN, parallel 8, with NOLOGGING applied to the table).
15. Once all 10 files are loaded, the rows are compared with another table (approx. 200 GB in size, partition schema similar to X) and a flag is updated in X. This update is fired partition-wise for performance and to reduce redo generation.
16. Next, the rows in X are re-compared with a separate table (approx. 70 GB in size, partition key similar to X but retention of 45 days) and the flag is updated with a different value in X.
17. At the end of each day, X contains approx. 200 million records, totalling about 200 GB (compressed). Average partition size in X is approx. 2 GB, but non-uniform: 20% are > 2 GB, 40% are between 1 GB and 2 GB, and the rest vary.
18. Logging is performed partition-wise to track row counts and performance for later review.
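
For illustration only, here is a minimal sketch of the staging table described in points 6-9; the column names, partition bounds and exact options are assumptions, not details taken from the actual system:

create table x (
  logging_date  date          not null,
  file_type     varchar2(30)  not null,
  flag          varchar2(1),
  payload       varchar2(4000)
)
compress                                   -- Oracle basic compression
partition by range (logging_date)
( partition p_20160928 values less than (date '2016-09-29'),
  partition p_20160929 values less than (date '2016-09-30')
  -- ... one partition per logging date, roughly 6 months retained
);

create index x_file_type_ix on x (file_type) local;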

Challenges:
1. No stats are collected after the deletes/inserts/updates; the time needed to analyze a compressed partition is a perceived concern.
2. NOLOGGING seems to have no effect since ARCHIVELOG mode is enabled. Redo has become troublesome to maintain.
3. Step 13 gets stuck during peak database load. When stuck, it takes 2 hours to complete per file; otherwise, about 5 minutes per file.
4. Step 14 takes on average 2 hours per file type.
5. Step 15 completes within 5 minutes per partition per file type, but partitions with > 2 GB of data take 1 hour.
6. Step 16 performance is more or less similar to Step 15.
7. The entire process takes 20 hours on average to complete, and on some days stretches to 23-25 hours.
8. The performance impacts other running processes and pushes memory and CPU utilization above 80%, sometimes to 99%.
9. Long-running processes cause ORA-01555 (snapshot too old) errors; sizing UNDO correctly has become difficult.

System Information
Tablespace is 300 GB, Index tablespace is 200 GB, free tablespace = 35 GB, No. of CPUs = 16, CPU Cores = 8, RAM = 64 GB.
Idle Time 1417981896, Busy Time 2479534845, User Time 2342396842, SYS Time 114377768, IO Wait Time 743065425.
Average no. of processes running is 9.



and Connor said...

"feeling guilty for the tremendous lack of brevity" is the least of our problems. I'd much rather have too much detail than too little.

How many file types do you have? If it's a moderate number, perhaps look at subpartitioning by file type. My thinking there is that you might be able to do away with the indexes on file type, and just do (sub)partition scans.
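
As a sketch of that idea, assuming a small and known set of file types (all names below are hypothetical), the staging table could be range-partitioned by logging date and list-subpartitioned by file type, so per-file-type processing becomes a subpartition scan rather than a local index range scan:

create table x_subpart (
  logging_date  date          not null,
  file_type     varchar2(30)  not null,
  flag          varchar2(1),
  payload       varchar2(4000)
)
compress
partition by range (logging_date)
subpartition by list (file_type)
subpartition template (
  subpartition sp_file01 values ('FILE01'),
  subpartition sp_file02 values ('FILE02'),
  -- ... one subpartition per file type
  subpartition sp_other  values (default)
)
( partition p_20160928 values less than (date '2016-09-29'),
  partition p_20160929 values less than (date '2016-09-30')
);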

When it comes to large data, "update" and "delete" are not your friends. Explore opportunities to copy data (and exchange partitions) instead.

eg update T set X = new_value

can be replaced with

create table TMP as select a,b,c,d, new_value X from T

followed by a rename or alter-exchange-partition
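
For example, here is a sketch of that copy-and-swap approach for a single partition of X; the table, column and partition names are illustrative only:

-- build a scratch table with the partition's rows and the new value already
-- computed, rather than updating the rows in place
create table x_swap compress nologging as
select logging_date,
       file_type,
       cast('Y' as varchar2(1)) as flag,   -- the recomputed value
       payload
from   x partition (p_20160929);

-- recreate the same structure as the partition's local index
create index x_swap_file_type_ix on x_swap (file_type);

-- swap the freshly built segment in place of the old partition
alter table x
  exchange partition p_20160929
  with table x_swap
  including indexes
  without validation;

The exchange itself is a data dictionary operation, so it avoids the row-by-row undo and redo that a large UPDATE or DELETE would generate.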

Hope this helps.


Comments

Agree

Ghassan, October 01, 2016 - 9:41 am UTC

Yes, Connor's suggestion is a good approach, but beware of the increase in local index size; you should review some of those indexes for pertinence.

I guess you're on Exadata? If not, you really ought to be for this kind of configuration.

Partition your X by an interval of 1 day and subpartition it by list on file type.
Do not delete rows; that way you can drop or truncate a subpartition ahead of the current set of inserts. The inserts themselves could be done via an exchange subpartition.
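
For instance, with the composite layout sketched in the answer above (names hypothetical), clearing the previous dump for one file type becomes a quick metadata operation rather than a parallel DELETE, and the new dump can be swapped in from a pre-built table:

alter table x_subpart truncate subpartition p_20160928_sp_file01;

alter table x_subpart
  exchange subpartition p_20160929_sp_file01
  with table new_file01_load    -- pre-loaded table with a matching structure
  without validation;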

Use incremental stats, or manage them so that stats are gathered only on the newly built partitions, since the six-month-old ones haven't changed. Global stats could be gathered, say, each week or month, unless you use stats copying, since the average file size for a given type remains approximately comparable.
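
A sketch of the incremental-statistics idea; the table name and granularity are assumptions:

begin
  -- with INCREMENTAL = TRUE, gathering stats touches only changed partitions
  -- and derives global stats from per-partition synopses
  dbms_stats.set_table_prefs(ownname => user, tabname => 'X',
                             pname   => 'INCREMENTAL', pvalue => 'TRUE');
  dbms_stats.gather_table_stats(ownname => user, tabname => 'X',
                                granularity => 'AUTO');
end;
/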

You could also consider a solution with 10 staging tables, one for each file type.

I don't know what the comparison steps 15 and 16 consist of; maybe they could be performed outside of X?

More to Explore

VLDB

If you are new to partitioning, check out Connor McDonald's introduction series here.

VLDB

Documentation set on VLDB and Partitioning.