You Asked
How to design, handle time dimension in a datawarehouse to accommodate different time zone ?
I see standard time dimension with day as the lowest grain.
We have data coming from across the world as the application is used world wide. Head office is est time zone. But some in California have different date time stamp, some are in Japan, India, China, Europe.
How to design the time dimension table for multi time zone.
Any design suggestions ? I know oracle has time stamp with time zone but how would that fit in the time dimension table. We have last update date which would be different for different time zone.
Should all entries be converted to est and then tied to a particular time dimension entry? Or have additional time zone Id column in date dimension table ?
Just want some design suggestions on the implementation.
and Connor said...
Often the toughest part of the exercise of handling timezones is not the implementation, but getting appropriate definitions from business stakeholders.
For example, when someone says: "What were our global sales between 9am and 5pm today", is the question really:
- between 9am and 5pm in the timezone of the person asking the question, what sales came in from *all* regions ? (and of course, that might have been middle of the night for a region when the shop was closed).
OR
- between 9am and 5pm in the timezone of each region, what were the sales. Which leads to an additional question, what do you mean by "today" in that case ?
Somone asking that question at 7pm Friday, probably means "Friday" but for some of the timezones, Friday might not be completed yet etc...
Without consensus on what the definitions will be, often consumers of data from the warehouse will happily run queries in total ignorance of whether the answers they are getting are in alignment with their *perception* of what question they are trying to answer.
In any event, on your fact table(s), I'd strongly recommend capturing the timestamp AND timezone in all cases, because as requirements change, information that you might not have needed before might become relevant later.
Additionaly, when you're talking mllions or billions of rows, its not computationally trivial to be converting between timezones over and over and over, so it may well be the case that you'll store both the originating timestamp/timezone on the fact *plus* a "universal" timestamp as well. That *might* be UTC/GMT, but if some other timezone made sense for you (eg "All reporting is done with respect to New York head office"), then that could be more appropriate.
Storing a universal time as well as local time, then gives you more flexibility when it comes to mapping to a time dimension. You may opt to always link by the universal. That in itself takes discipline, because people will naturally issue queries in their own timezone, which means you might need to intercept them to ensure that boundaries are correct. For example, a "sales for the week" query might need universal date ranges *greater* than that of 1 week to capture the data from those timezones that "stretch outside" the week boundary defined by the local timezone.
I hope this helps. Reviewers are welcome to add their experiences.
Rating
(2 ratings)
Is this answer out of date? If it is, please let us know via a Comment