Skip to Main Content
  • Questions
  • DW - Slowly Changing "Facts" (not Dimensions) - How to

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question.

Asked: March 23, 2016 - 5:31 am UTC

Last updated: March 23, 2016 - 7:34 am UTC

Version: 12c

Viewed 1000+ times

You Asked

How we we handle Slow changing "Facts"
What would be the implications of that solution if we say use same SCD type 2 even for Fact Tables.

I am really surprised Kimball does not talk about this in any of his great books. He talks about Transaction fact , periodic snapshot fact and accumulation snapshot. But never about slowly changing Fact

We always hear about Slowly Changing Dimensions to be tacked with Type 1 , 2 ,3. I want to use typ2 even for the Facts.

Any Suggestion on possible solutions ?

A Fact measure say a Date value and say a count numeric value in fact table may have to be changed say as part of a valid business process or even as part of data correction which may happen after a few days or months.

Now I need to maintain history of this fact change over a period of time at a day's grain. How to do it.

I don't want to keep inserting the same data every day in the fact table if the value in fact has not changed

Once fact changes for particular dimensions then I need to maintain the history with the changes done to fact.

With SCF type 2 ( slowly changing fact type 2 )
So the current new values will be inserted in the fact and update those standard effective from and to dates will be updated with current flag Y and
the old flag will also be updated with Effective end date as new start date for new data - 1.

I don't know the implications of this approach as I said I never seen the Gurus talking about it or may be I was not paying attention to Kimball.

Any suggestions ? Alternatives ?


and Connor said...

There is a fair bit of literature out there for people's means of tackling the issue (eg http://brentgreenwood.blogspot.com.au/2011/12/slowly-changing-facts.html )

The approach you take is very much driven by business requirements (or typically, which requirements they'd like to prioritise highest).

For example, some options (driven by requirements) can include:

1) the history (ie, 'old' version of the fact row) is stored elsewhere, because its needed for compliance, but the core requirement is seeing the current data and interpreting it as "it was always that way". If people want to see "point in time" then they use the separate history table.

2) the snapshot concept, the facts are stored as they currently are every day. I've never seen this work well, because the fact table grows exponentially

3) the start/end concept as you've outlined above - the fundamental challenge with that is if you want *edit* old data, it means it gets a lot harder to manage, but you might never get to archive or retire old data, because of the "just in case" issue, ie, an update comes through months or years after the first instance of a fact is recorded. Similarly, physical implementation options (compression, read-only, etc) become more limited.

Still, even with that considered, there are still with variations of that, once again depending on requirements, eg using your case of:

march 1st => march 10th, fact1, count=100
march 10th => (current), fact1, count=80

I've seen options such as:

i) just as above, it becomes the job of the query write to ensure they don't (for example) arrive at a count of 180 etc.

ii) additional columns to reflect different time-based views, eg

march 1st => march 10th, fact1, count=100,delta=100
march 10th => (current), fact1, count=80,delta=-20

so queries can sum *all* of the records (for delta) prior to that of a requested point in time, rather than have to hunt for something marked as 'current'

iii) artificial correctional rows (finance people often like this, because it matches their thinking of ledger style operations, eg)

march 1st => march 10th, fact1, count=100
march 10th => (current), fact1, count=-100
march 10th => (current), fact1, count=80

once again, the benefit being summation across all rows

Which is best for you ? Only you can decide on that. Once you're into "scf" territory, everything is basically some sort of compromise between performance, complexity and manageability...just like most things :-)

Hope this helps.


Is this answer out of date? If it is, please let us know via a Comment