Skip to Main Content
  • Questions
  • Handle different time zones in time dimension of data warehouse

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question.

Asked: September 11, 2015 - 7:18 pm UTC

Last updated: September 18, 2015 - 3:47 am UTC

Version: 11.2

Viewed 1000+ times

You Asked

How to design, handle time dimension in a datawarehouse to accommodate different time zone ?

I see standard time dimension with day as the lowest grain.
We have data coming from across the world as the application is used world wide. Head office is est time zone. But some in California have different date time stamp, some are in Japan, India, China, Europe.

How to design the time dimension table for multi time zone.

Any design suggestions ? I know oracle has time stamp with time zone but how would that fit in the time dimension table. We have last update date which would be different for different time zone.

Should all entries be converted to est and then tied to a particular time dimension entry? Or have additional time zone Id column in date dimension table ?

Just want some design suggestions on the implementation.

and Connor said...

Often the toughest part of the exercise of handling timezones is not the implementation, but getting appropriate definitions from business stakeholders.

For example, when someone says: "What were our global sales between 9am and 5pm today", is the question really:

- between 9am and 5pm in the timezone of the person asking the question, what sales came in from *all* regions ? (and of course, that might have been middle of the night for a region when the shop was closed).

OR

- between 9am and 5pm in the timezone of each region, what were the sales. Which leads to an additional question, what do you mean by "today" in that case ?
Somone asking that question at 7pm Friday, probably means "Friday" but for some of the timezones, Friday might not be completed yet etc...

Without consensus on what the definitions will be, often consumers of data from the warehouse will happily run queries in total ignorance of whether the answers they are getting are in alignment with their *perception* of what question they are trying to answer.


In any event, on your fact table(s), I'd strongly recommend capturing the timestamp AND timezone in all cases, because as requirements change, information that you might not have needed before might become relevant later.

Additionaly, when you're talking mllions or billions of rows, its not computationally trivial to be converting between timezones over and over and over, so it may well be the case that you'll store both the originating timestamp/timezone on the fact *plus* a "universal" timestamp as well. That *might* be UTC/GMT, but if some other timezone made sense for you (eg "All reporting is done with respect to New York head office"), then that could be more appropriate.


Storing a universal time as well as local time, then gives you more flexibility when it comes to mapping to a time dimension. You may opt to always link by the universal. That in itself takes discipline, because people will naturally issue queries in their own timezone, which means you might need to intercept them to ensure that boundaries are correct. For example, a "sales for the week" query might need universal date ranges *greater* than that of 1 week to capture the data from those timezones that "stretch outside" the week boundary defined by the local timezone.


I hope this helps. Reviewers are welcome to add their experiences.

Rating

  (2 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Sokrates, September 14, 2015 - 5:05 am UTC


Generic solution

David Aldridge, September 14, 2015 - 9:18 am UTC

Completely agree that there is a huge business logic question hanging over this, but it's worth considering how you might build a very generic solution.

It's worth noting first that it is pretty rare for the granularity of the date dimension to be more than "by day", and in fact the questioner specifically mentions a granularity of "day". Usually the definition of a day is not in question.

And generally, if there was a time component that was required as part of the facts, you might still use a day-level date dimension and use the date-time as a degenerate dimension, because building a datetime dimension to the granularity of one second is too much detail. In the case of different users having a different definition of when "Friday 27th April" starts and ends, the day-based date dimension is almost useless, though.

One compromise solution is a more detailed granularity in the datetime dimension, and if you go to a 15-minute granularity then every timezone's day is mappable to a set of datetime dimension records.

So anyway, that is one completely generic approach -- convert all datetimes to UTZ, and build a datetime dimension at 15 minute granularity. In the dimension table you might have different day definitions pre-calculated for major countries/markets "GB_BUSINESS_DAY", "US_EASTCOAST_BUSINESS_DAY", etc.

And ten years is 350,400 15-minute records instead of 3,650 day records -- not too bad.
Connor McDonald
September 16, 2015 - 3:50 am UTC

Thanks for your input David.