I have around 2 million assets that send data at 15-minute intervals, i.e. 96 intervals per asset per day. What is the best database modeling approach? My target RDBMS is Oracle.
I'm partitioning by week, and for each interval I need to store the interval type (INT_TYPE), i.e. actual, estimated, etc.
Option 1 will be way too deep in terms of data: it will have around 60 million records. Option 2 can minimize the volume (around 2 million records), but how do I model the interval-type column in Option 2 so that it can still capture the interval type?
--OPTION 1
CREATE TABLE TRANS.ASSET_DAILY_INT (
INT_READ_DT DATE NOT NULL,
ASSET_ID VARCHAR2(30) NOT NULL,
MINUTES NUMBER(4,0) NOT NULL, -- minute of day (0..1425); NUMBER(2,0) can only hold 0-99
INT_VALUE FLOAT(126) NOT NULL,
INT_TYPE VARCHAR2(30) NOT NULL,
ASSET_TYPE NUMBER(38,0),
INT_SUM_DAY_USG FLOAT(126),
HIGHEST_INT_DAY_USG FLOAT(126),
CREATED_DT DATE,
MODIFIED_DT DATE
)
TABLESPACE TRANS
PARALLEL(DEGREE 4 INSTANCES 1)
PARTITION BY RANGE(INT_READ_DT)
INTERVAL (NUMTODSINTERVAL(7,'DAY'))
(PARTITION p_first VALUES LESS THAN (DATE '2010-01-01'));
--OPTION 2: how do I model the interval-type column so that it captures the interval type?
CREATE TABLE TRANS.ASSET_DAILY_INT(
INT_READ_DT DATE NOT NULL,
ASSET_ID VARCHAR2(30) NOT NULL,
INT_VALUE_0000 FLOAT(126),
INT_VALUE_0015 FLOAT(126),
INT_VALUE_0030 FLOAT(126),
INT_VALUE_0045 FLOAT(126),
INT_VALUE_0100 FLOAT(126),
INT_VALUE_0115 FLOAT(126),
INT_VALUE_0130 FLOAT(126),
INT_VALUE_0145 FLOAT(126),
INT_VALUE_0200 FLOAT(126),
INT_VALUE_0215 FLOAT(126),
INT_VALUE_0230 FLOAT(126),
INT_VALUE_0245 FLOAT(126),
INT_VALUE_0300 FLOAT(126),
INT_VALUE_0315 FLOAT(126),
INT_VALUE_0330 FLOAT(126),
INT_VALUE_0345 FLOAT(126),
INT_VALUE_0400 FLOAT(126),
INT_VALUE_0415 FLOAT(126),
INT_VALUE_0430 FLOAT(126),
INT_VALUE_0445 FLOAT(126),
INT_VALUE_0500 FLOAT(126),
INT_VALUE_0515 FLOAT(126),
INT_VALUE_0530 FLOAT(126),
INT_VALUE_0545 FLOAT(126),
INT_VALUE_0600 FLOAT(126),
INT_VALUE_0615 FLOAT(126),
INT_VALUE_0630 FLOAT(126),
INT_VALUE_0645 FLOAT(126),
INT_VALUE_0700 FLOAT(126),
INT_VALUE_0715 FLOAT(126),
INT_VALUE_0730 FLOAT(126),
INT_VALUE_0745 FLOAT(126),
INT_VALUE_0800 FLOAT(126),
INT_VALUE_0815 FLOAT(126),
INT_VALUE_0830 FLOAT(126),
INT_VALUE_0845 FLOAT(126),
INT_VALUE_0900 FLOAT(126),
INT_VALUE_0915 FLOAT(126),
INT_VALUE_0930 FLOAT(126),
INT_VALUE_0945 FLOAT(126),
INT_VALUE_1000 FLOAT(126),
INT_VALUE_1015 FLOAT(126),
INT_VALUE_1030 FLOAT(126),
INT_VALUE_1045 FLOAT(126),
INT_VALUE_1100 FLOAT(126),
INT_VALUE_1115 FLOAT(126),
INT_VALUE_1130 FLOAT(126),
INT_VALUE_1145 FLOAT(126),
INT_VALUE_1200 FLOAT(126),
INT_VALUE_1215 FLOAT(126),
INT_VALUE_1230 FLOAT(126),
INT_VALUE_1245 FLOAT(126),
INT_VALUE_1300 FLOAT(126),
INT_VALUE_1315 FLOAT(126),
INT_VALUE_1330 FLOAT(126),
INT_VALUE_1345 FLOAT(126),
INT_VALUE_1400 FLOAT(126),
INT_VALUE_1415 FLOAT(126),
INT_VALUE_1430 FLOAT(126),
INT_VALUE_1445 FLOAT(126),
INT_VALUE_1500 FLOAT(126),
INT_VALUE_1515 FLOAT(126),
INT_VALUE_1530 FLOAT(126),
INT_VALUE_1545 FLOAT(126),
INT_VALUE_1600 FLOAT(126),
INT_VALUE_1615 FLOAT(126),
INT_VALUE_1630 FLOAT(126),
INT_VALUE_1645 FLOAT(126),
INT_VALUE_1700 FLOAT(126),
INT_VALUE_1715 FLOAT(126),
INT_VALUE_1730 FLOAT(126),
INT_VALUE_1745 FLOAT(126),
INT_VALUE_1800 FLOAT(126),
INT_VALUE_1815 FLOAT(126),
INT_VALUE_1830 FLOAT(126),
INT_VALUE_1845 FLOAT(126),
INT_VALUE_1900 FLOAT(126),
INT_VALUE_1915 FLOAT(126),
INT_VALUE_1930 FLOAT(126),
INT_VALUE_1945 FLOAT(126),
INT_VALUE_2000 FLOAT(126),
INT_VALUE_2015 FLOAT(126),
INT_VALUE_2030 FLOAT(126),
INT_VALUE_2045 FLOAT(126),
INT_VALUE_2100 FLOAT(126),
INT_VALUE_2115 FLOAT(126),
INT_VALUE_2130 FLOAT(126),
INT_VALUE_2145 FLOAT(126),
INT_VALUE_2200 FLOAT(126),
INT_VALUE_2215 FLOAT(126),
INT_VALUE_2230 FLOAT(126),
INT_VALUE_2245 FLOAT(126),
INT_VALUE_2300 FLOAT(126),
INT_VALUE_2315 FLOAT(126),
INT_VALUE_2330 FLOAT(126),
INT_VALUE_2345 FLOAT(126),
ASSET_TYPE NUMBER(38,0),
INT_SUM_DAY_USG FLOAT(126),
HIGHEST_INT_DAY_USG FLOAT(126),
CREATED_DT DATE,
MODIFIED_DT DATE
)
TABLESPACE TRANS
PARALLEL(DEGREE 4 INSTANCES 1)
PARTITION BY RANGE(INT_READ_DT)
INTERVAL (NUMTODSINTERVAL(7,'DAY'))
(PARTITION p_first VALUES LESS THAN (DATE '2010-01-01'));
So for each 15-minute time period each day, you need to record an actual value, an estimated value, or both?
If so, your option 2 itself has two sub-options. Add an interval-type column as you've done with option 1, so you have two rows for each day: actuals and estimated. Or have two columns for each time period, one for actual values and one for estimated, e.g.:
create table asset_daily_int (
int_read_dt date not null,
asset_id varchar2( 30 ) not null,
int_type varchar2( 30 ) not null, -- actual/estimated
int_value_0000 float( 126 ),
int_value_0015 float( 126 ),
int_value_0030 float( 126 ),
...
);
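With this variant you'd presumably want the key to include the type, so the actual and estimated rows for a given asset and day can't collide. A sketch, assuming the column names above:

```sql
-- Sketch: one row per asset/day/type, enforced with a composite key.
alter table asset_daily_int
  add constraint asset_daily_int_pk
  primary key ( int_read_dt, asset_id, int_type );
```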
create table asset_daily_int (
int_read_dt date not null,
asset_id varchar2( 30 ) not null,
int_act_value_0000 float( 126 ),
int_est_value_0000 float( 126 ),
int_act_value_0015 float( 126 ),
int_est_value_0015 float( 126 ),
int_act_value_0030 float( 126 ),
int_est_value_0030 float( 126 ),
...
);
The question of which of these methods is "best" depends on your requirements and how you plan to query the tables.
Do you "always" query all of an asset's intervals for a day and display them as columns? If so, one of the option 2 variants is probably best.
Or will you regularly search for values in a particular time slot, or for the values between two intervals? Then option 1 is probably best.
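For example, a time-slot search against the narrow option 1 table is a plain range predicate (this sketch assumes MINUTES holds the minute-of-day offset, so 08:00 is 480 and 09:00 is 540):

```sql
-- Sketch: fetch all readings between 08:00 and 09:00 on one day.
select asset_id, minutes, int_value, int_type
from   trans.asset_daily_int
where  int_read_dt = date '2012-06-01'
and    minutes between 480 and 540;
```

The same search against the wide option 2 table would mean selecting a handful of the 96 named columns, and the column list would have to change whenever the time window does.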
Some other observations:
With a row per time slot and interval type, you add new data with inserts. With the wide tables you have one insert at the start of the day, and all other additions are updates. So rows will have mostly null columns at the beginning of the day and mostly not null at the end. This may lead to lots of row chaining/migration, unless you batch your data up into a single insert at the end of each day.
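One mitigation sketch for the wide-table case: reserve more free space in each block so rows that grow from mostly-null to mostly-populated can expand in place (the value 40 here is illustrative, not a tuned recommendation):

```sql
-- Sketch: leave 40% of each block free for row growth during the day.
alter table trans.asset_daily_int pctfree 40;
```

Note this only affects blocks used by future inserts; existing rows keep their current layout.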
If you want to switch to more granular time periods, you need to add columns to the wide table. This may lead to a lot of rework in your app to use them. And if you want to be as granular as every minute or finer, you'll hit the 1,000-columns-per-table limit in Oracle Database.
In general option 1 may lead to more rows, but is more flexible. I lean towards using this unless you have strong requirements which lead to option 2.
Whichever method you go for, you could build materialized views to pivot/unpivot the data as needed, which may help with performance in some cases.
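For instance, a materialized view could pivot the narrow option 1 table into a wide daily shape for reporting. A sketch showing just the first three slots (a real view would list all 96; this again assumes MINUTES is the minute-of-day offset):

```sql
-- Sketch: pivot one-row-per-interval into one-row-per-asset-per-day.
create materialized view asset_daily_int_wide
build deferred
refresh complete on demand
as
select int_read_dt,
       asset_id,
       max( case when minutes = 0  then int_value end ) as int_value_0000,
       max( case when minutes = 15 then int_value end ) as int_value_0015,
       max( case when minutes = 30 then int_value end ) as int_value_0030
from   trans.asset_daily_int
group  by int_read_dt, asset_id;
```

This keeps the flexible narrow table as the source of truth while still serving the "whole day as columns" query pattern.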