Oracle Fact Table creation and loading strategy


Question and Answer

Connor McDonald

Thanks for the question, MURALI.

Asked: October 11, 2018 - 5:18 am UTC

Last updated: October 31, 2018 - 6:34 am UTC

Version: 12.1

Viewed 1000+ times

You Asked

Thanks in advance for the support. I have a few questions:

1. I will have to load 7 years' worth of history data with 128 billion rows into a new Fact Table, which will also have inserts as well as updates for the last 90 days (98% inserts and 2% updates on a daily basis). What should we use? Interval or range partitioning (day granularity)?

2. Can I create a History Fact with all the read-only data and keep the current year's data in another fact?
The possible columns for the table are: date, txn id, quality_id, interval id, meter number, read value, create date, modified date.

3. What would be the best index strategy on such a huge table? We will need a primary key as well as additional indexes for quick retrieval of at least 60 days of data.

The primary key will be date, txn id, interval id, meter number. We might also need indexes on meter number, date + meter number, as well as modified date.

4. We have a very short window of time to load the above 128 billion rows. Data comes from 3 different Oracle databases (5 years of data need an unpivot, and the remaining 2 years need some transformation and processing). We are currently using Informatica. What would be the best strategy to load the data?

and Connor said...

1. I will have to load 7 years' worth of history data with 128 billion rows into a new Fact Table, which will also have inserts as well as updates for the last 90 days (98% inserts and 2% updates on a daily basis). What should we use? Interval or range partitioning (day granularity)?

Interval seems an obvious choice, because it gives you all the benefits of range partitioning and none of the downsides. Granularity really is up to your business requirements. 7 x 365 days is starting to be a lot of partitions, but you might want to consider a hybrid model. See my blog post on that concept:

https://connor-mcdonald.com/2018/07/25/hyper-partitioned-index-avoidance-thingamajig/
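A daily interval-partitioned table might look like the sketch below. The table and column names are assumptions based on the columns listed in the question; the seed partition boundary is arbitrary.

```sql
-- Illustrative only: daily interval partitioning on the reading date.
-- Names (meter_readings, read_date, ...) are assumed from the question.
CREATE TABLE meter_readings (
  read_date      DATE    NOT NULL,
  txn_id         NUMBER  NOT NULL,
  quality_id     NUMBER,
  interval_id    NUMBER  NOT NULL,
  meter_number   NUMBER  NOT NULL,
  read_value     NUMBER,
  create_date    DATE,
  modified_date  DATE
)
PARTITION BY RANGE (read_date)
INTERVAL (NUMTODSINTERVAL(1, 'DAY'))
(
  -- one seed partition; Oracle creates daily partitions automatically
  -- as data arrives for later dates
  PARTITION p_initial VALUES LESS THAN (DATE '2012-01-01')
);
```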

2. Can I create a History Fact with all the read-only data and keep the current year's data in another fact?

I don't think you need to do this. A single table can have a mix of read-only and read-write data, which you control at the partition level. So a second table does not seem necessary.
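As a sketch of how that control can work on 12.1 (the version in the question), read-only status is set at the tablespace level, so a historical partition can be moved to its own tablespace which is then made read only. The partition and tablespace names here are hypothetical.

```sql
-- Illustrative: on 12.1, make historical data read only via its tablespace.
ALTER TABLE meter_readings MOVE PARTITION p_2016_01_01 TABLESPACE hist_2016;
ALTER TABLESPACE hist_2016 READ ONLY;

-- From 12.2 onward the same thing can be done at the partition level:
-- ALTER TABLE meter_readings MODIFY PARTITION p_2016_01_01 READ ONLY;
```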


3. What would be the best index strategy on such a huge table? We will need a primary key as well as additional indexes for quick retrieval of at least 60 days of data.

Generally, the fewer indexes the better if you want to load data fast. You have the date in the primary key, so you can probably make that a local index, which can be beneficial for partition maintenance operations. You might consider sub-partitioning by meter number, which *might* obviate the need for explicit indexes on that column.
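A local primary-key index along those lines could be sketched like this (index and constraint names are illustrative). Because the partition key (read_date) leads the index, the unique index can be LOCAL, so operations such as dropping or exchanging a partition do not invalidate it.

```sql
-- Illustrative: enforce the primary key with a LOCAL index so that
-- partition maintenance does not invalidate the whole index.
CREATE UNIQUE INDEX meter_readings_pk
  ON meter_readings (read_date, txn_id, interval_id, meter_number)
  LOCAL;

ALTER TABLE meter_readings
  ADD CONSTRAINT meter_readings_pk
  PRIMARY KEY (read_date, txn_id, interval_id, meter_number)
  USING INDEX meter_readings_pk;
```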


4. We have a very short window of time to load the above 128 billion rows. Data comes from 3 different Oracle databases (5 years of data need an unpivot, and the remaining 2 years need some transformation and processing). We are currently using Informatica. What would be the best strategy to load the data?

Exchange partition is probably your friend here, because it means you can

- load data into an empty, unindexed table (very fast)
- then index it
- then exchange it into the true fact table.

This is much more efficient than loading into an already indexed table.
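Those three steps could be sketched as follows. The staging table, source table, and date are hypothetical; the final step swaps segments in the data dictionary rather than moving rows.

```sql
-- 1. Load one day's data into an empty, unindexed staging table
--    using a direct-path insert.
CREATE TABLE meter_readings_stage AS
  SELECT * FROM meter_readings WHERE 1 = 0;

INSERT /*+ APPEND */ INTO meter_readings_stage
  SELECT read_date, txn_id, quality_id, interval_id, meter_number,
         read_value, create_date, modified_date
    FROM source_readings;   -- hypothetical source query
COMMIT;

-- 2. Index the staging table to match the fact table's local indexes.
CREATE UNIQUE INDEX meter_readings_stage_pk
  ON meter_readings_stage (read_date, txn_id, interval_id, meter_number);

-- 3. Swap the loaded segment into the matching daily partition.
ALTER TABLE meter_readings
  EXCHANGE PARTITION FOR (DATE '2018-10-01')
  WITH TABLE meter_readings_stage
  INCLUDING INDEXES WITHOUT VALIDATION;
```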



