My first Publication Agile-Data-Warehouse-Design-eBook | Page 105
84
Chapter 3
Dimensional History
You must define
how dimensional
attributes handle
history
Slowly changing
dimensions
dramatically affect
how historical
events are reported
When you have discovered all the attributes of a dimension, described them using
examples, and modeled their hierarchical relationships, there is one more piece of
information that stakeholders must tell you about each one: how to handle its
history. This information is also known as a dimension’s temporal properties or
slowly changing dimension (SCD) policy.
Stakeholders instinctively feel the need to preserve event history — especially legally
binding financial transactions — but may think of dimensions as (relatively static)
reference data that simply has to be kept up to date when it does change. While it is
true that dimensions are relatively static compared to dynamic business events,
slowly changing dimension history, or rather the lack of it, can have a profound
effect on a data warehouse’s ability to accurately compare events over time and
meet stakeholders’ initial and longer-term needs. For example, Dean Moriarty, a
customer who was based in New York, relocates to California at the beginning of
this year. Should a BI query for “order totals by state, this year vs. last year” associ-
ate all of Moriarty’s orders to his current location: California? Or should Mori-
arty’s orders last year be associated with New York (last year’s location), and only
this year’s with California? What if BI users want to look at the biggest spenders in
California over the last two years, should their queries include Moriarty based on
his high spending while he was still in New York last year or exclude him because
he hasn’t spent so much since moving to L.A.? Another way of asking these ques-
tions is “Should queries use the current or historical values of customer state?” Is
there a simple answer?
Current Value Attributes
Operational systems
generally default to
current valued
reference data
Current value (CV)
attributes provide
“as is” reporting that
matches operational
reporting results
Operational systems generally concentrate on the “here and now” with an under-
standable bias towards current values for reference or master data. For example, an
order processing system will make sure that only “this year’s model” products are
available to be selected for shipping to customers’ current locations. Because of this
bias, operational database applications will often overwrite reference data when it
changes. If the same updates are applied to dimensional attributes they will contain
current values only.
Dimensional attributes that only contain the current value descriptions provide “as
is” reporting; i.e., they roll up event history using today’s descriptions, making it
seem as if everything has always been described as it is now. For the previous
example, current value (CV) attributes would roll up all of Moriarty’s orders to
California (his current location) regardless of where, on the road, he was living
when he placed them. This is typically the style of reporting that stakeholders are
used to from their attempts to analyze history directly from operational systems.