I have around 2 million assets that send data at 15-minute intervals, i.e. 96 intervals per asset per day. What is the best database modeling approach? My target RDBMS is Oracle.
I'm partitioning by week, and for each interval I need to store the interval type (INT_TYPE), i.e. actual, estimated, etc.
Option 1 will be way too deep in terms of data: it will have around 60 million records. Option 2 can minimize the volume (around 2 million records), but how do I model the interval-type column in Option 2 so that it can still capture the interval type?
--OPTION 1
CREATE TABLE TRANS.ASSET_DAILY_INT (
INT_READ_DT DATE NOT NULL,
ASSET_ID VARCHAR2(30) NOT NULL,
MINUTES NUMBER(4,0) NOT NULL, -- minute of day (0..1425); NUMBER(2,0) can only hold 0-99
INT_VALUE FLOAT(126) NOT NULL,
INT_TYPE VARCHAR2(30) NOT NULL,
ASSET_TYPE NUMBER(38,0),
INT_SUM_DAY_USG FLOAT(126),
HIGHEST_INT_DAY_USG FLOAT(126),
CREATED_DT DATE,
MODIFIED_DT DATE
)
TABLESPACE TRANS
PARALLEL(DEGREE 4 INSTANCES 1)
PARTITION BY RANGE(INT_READ_DT)
INTERVAL (NUMTODSINTERVAL(7,'DAY'))
(PARTITION p_first VALUES LESS THAN (DATE '2010-01-01'));
--OPTION 2: how do I model the interval-type column so that it captures the interval type?
CREATE TABLE TRANS.ASSET_DAILY_INT(
INT_READ_DT DATE NOT NULL,
ASSET_ID VARCHAR2(30) NOT NULL,
INT_VALUE_0000 FLOAT(126),
INT_VALUE_0015 FLOAT(126),
INT_VALUE_0030 FLOAT(126),
INT_VALUE_0045 FLOAT(126),
INT_VALUE_0100 FLOAT(126),
INT_VALUE_0115 FLOAT(126),
INT_VALUE_0130 FLOAT(126),
INT_VALUE_0145 FLOAT(126),
INT_VALUE_0200 FLOAT(126),
INT_VALUE_0215 FLOAT(126),
INT_VALUE_0230 FLOAT(126),
INT_VALUE_0245 FLOAT(126),
INT_VALUE_0300 FLOAT(126),
INT_VALUE_0315 FLOAT(126),
INT_VALUE_0330 FLOAT(126),
INT_VALUE_0345 FLOAT(126),
INT_VALUE_0400 FLOAT(126),
INT_VALUE_0415 FLOAT(126),
INT_VALUE_0430 FLOAT(126),
INT_VALUE_0445 FLOAT(126),
INT_VALUE_0500 FLOAT(126),
INT_VALUE_0515 FLOAT(126),
INT_VALUE_0530 FLOAT(126),
INT_VALUE_0545 FLOAT(126),
INT_VALUE_0600 FLOAT(126),
INT_VALUE_0615 FLOAT(126),
INT_VALUE_0630 FLOAT(126),
INT_VALUE_0645 FLOAT(126),
INT_VALUE_0700 FLOAT(126),
INT_VALUE_0715 FLOAT(126),
INT_VALUE_0730 FLOAT(126),
INT_VALUE_0745 FLOAT(126),
INT_VALUE_0800 FLOAT(126),
INT_VALUE_0815 FLOAT(126),
INT_VALUE_0830 FLOAT(126),
INT_VALUE_0845 FLOAT(126),
INT_VALUE_0900 FLOAT(126),
INT_VALUE_0915 FLOAT(126),
INT_VALUE_0930 FLOAT(126),
INT_VALUE_0945 FLOAT(126),
INT_VALUE_1000 FLOAT(126),
INT_VALUE_1015 FLOAT(126),
INT_VALUE_1030 FLOAT(126),
INT_VALUE_1045 FLOAT(126),
INT_VALUE_1100 FLOAT(126),
INT_VALUE_1115 FLOAT(126),
INT_VALUE_1130 FLOAT(126),
INT_VALUE_1145 FLOAT(126),
INT_VALUE_1200 FLOAT(126),
INT_VALUE_1215 FLOAT(126),
INT_VALUE_1230 FLOAT(126),
INT_VALUE_1245 FLOAT(126),
INT_VALUE_1300 FLOAT(126),
INT_VALUE_1315 FLOAT(126),
INT_VALUE_1330 FLOAT(126),
INT_VALUE_1345 FLOAT(126),
INT_VALUE_1400 FLOAT(126),
INT_VALUE_1415 FLOAT(126),
INT_VALUE_1430 FLOAT(126),
INT_VALUE_1445 FLOAT(126),
INT_VALUE_1500 FLOAT(126),
INT_VALUE_1515 FLOAT(126),
INT_VALUE_1530 FLOAT(126),
INT_VALUE_1545 FLOAT(126),
INT_VALUE_1600 FLOAT(126),
INT_VALUE_1615 FLOAT(126),
INT_VALUE_1630 FLOAT(126),
INT_VALUE_1645 FLOAT(126),
INT_VALUE_1700 FLOAT(126),
INT_VALUE_1715 FLOAT(126),
INT_VALUE_1730 FLOAT(126),
INT_VALUE_1745 FLOAT(126),
INT_VALUE_1800 FLOAT(126),
INT_VALUE_1815 FLOAT(126),
INT_VALUE_1830 FLOAT(126),
INT_VALUE_1845 FLOAT(126),
INT_VALUE_1900 FLOAT(126),
INT_VALUE_1915 FLOAT(126),
INT_VALUE_1930 FLOAT(126),
INT_VALUE_1945 FLOAT(126),
INT_VALUE_2000 FLOAT(126),
INT_VALUE_2015 FLOAT(126),
INT_VALUE_2030 FLOAT(126),
INT_VALUE_2045 FLOAT(126),
INT_VALUE_2100 FLOAT(126),
INT_VALUE_2115 FLOAT(126),
INT_VALUE_2130 FLOAT(126),
INT_VALUE_2145 FLOAT(126),
INT_VALUE_2200 FLOAT(126),
INT_VALUE_2215 FLOAT(126),
INT_VALUE_2230 FLOAT(126),
INT_VALUE_2245 FLOAT(126),
INT_VALUE_2300 FLOAT(126),
INT_VALUE_2315 FLOAT(126),
INT_VALUE_2330 FLOAT(126),
INT_VALUE_2345 FLOAT(126),
ASSET_TYPE NUMBER(38,0),
INT_SUM_DAY_USG FLOAT(126),
HIGHEST_INT_DAY_USG FLOAT(126),
CREATED_DT DATE,
MODIFIED_DT DATE
)
TABLESPACE TRANS
PARALLEL(DEGREE 4 INSTANCES 1)
PARTITION BY RANGE(INT_READ_DT)
INTERVAL (NUMTODSINTERVAL(7,'DAY'))
(PARTITION p_first VALUES LESS THAN (DATE '2010-01-01'));
So for each 15-minute time period each day, you need to record an actual value, an estimated value, or both?
If so, your option 2 itself has two sub-options. Add an interval-type column as you've done with option 1, so you have two rows for each day: actuals and estimated. Or have two columns for each time period, one for actual values and one for estimated, e.g.:
create table asset_daily_int (
int_read_dt date not null,
asset_id varchar2( 30 ) not null,
int_type varchar2( 30 ) not null, -- actual/estimated
int_value_0000 float( 126 ),
int_value_0015 float( 126 ),
int_value_0030 float( 126 ),
...
);
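With this variant you'd presumably want the key to include the type, so the actual and estimated rows for a given asset and day can't collide. A sketch, assuming the column names above:

```sql
-- Sketch: one row per asset/day/type, enforced with a composite key.
alter table asset_daily_int
  add constraint asset_daily_int_pk
  primary key ( int_read_dt, asset_id, int_type );
```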
create table asset_daily_int (
int_read_dt date not null,
asset_id varchar2( 30 ) not null,
int_act_value_0000 float( 126 ),
int_est_value_0000 float( 126 ),
int_act_value_0015 float( 126 ),
int_est_value_0015 float( 126 ),
int_act_value_0030 float( 126 ),
int_est_value_0030 float( 126 ),
...
);
The question of which of these methods is "best" depends on your requirements and how you plan to query the tables.
Do you "always" query all of an asset's intervals for a day and display them as columns? If so, one of the option 2 variants is probably best.
Or will you regularly search for values in a particular time slot, or for the values between two intervals? Then option 1 is probably best.
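For example, a time-slot search against the narrow option 1 table is a plain range predicate (this sketch assumes MINUTES holds the minute-of-day offset, so 08:00 is 480 and 09:00 is 540):

```sql
-- Sketch: fetch all readings between 08:00 and 09:00 on one day.
select asset_id, minutes, int_value, int_type
from   trans.asset_daily_int
where  int_read_dt = date '2012-06-01'
and    minutes between 480 and 540;
```

The same search against the wide option 2 table would mean selecting a handful of the 96 named columns, and the column list would have to change whenever the time window does.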
Some other observations:
With a row per time slot and interval type, you add new data with inserts. With the wide tables you have one insert at the start of the day, and all other additions are updates. So rows will have mostly null columns at the beginning of the day and mostly not null at the end. This may lead to lots of row chaining/migration, unless you batch your data up into a single insert at the end of each day.
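One mitigation sketch for the wide-table case: reserve more free space in each block so rows that grow from mostly-null to mostly-populated can expand in place (the value 40 here is illustrative, not a tuned recommendation):

```sql
-- Sketch: leave 40% of each block free for row growth during the day.
alter table trans.asset_daily_int pctfree 40;
```

Note this only affects blocks used by future inserts; existing rows keep their current layout.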
If you want to switch to more granular time periods, you need to add columns to the wide table. This may lead to a lot of rework in your app to use them. And if you want to be as granular as every minute or finer, you'll hit the 1,000-columns-per-table limit in Oracle Database.
In general option 1 may lead to more rows, but is more flexible. I lean towards using this unless you have strong requirements which lead to option 2.
Whichever method you go for, you could build materialized views to pivot/unpivot the data as needed, which may help with performance in some cases.
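For instance, a materialized view could pivot the narrow option 1 table into a wide daily shape for reporting. A sketch showing just the first three slots (a real view would list all 96; this again assumes MINUTES is the minute-of-day offset):

```sql
-- Sketch: pivot one-row-per-interval into one-row-per-asset-per-day.
create materialized view asset_daily_int_wide
build deferred
refresh complete on demand
as
select int_read_dt,
       asset_id,
       max( case when minutes = 0  then int_value end ) as int_value_0000,
       max( case when minutes = 15 then int_value end ) as int_value_0015,
       max( case when minutes = 30 then int_value end ) as int_value_0030
from   trans.asset_daily_int
group  by int_read_dt, asset_id;
```

This keeps the flexible narrow table as the source of truth while still serving the "whole day as columns" query pattern.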