My first Publication Agile-Data-Warehouse-Design-eBook | Page 186
166
Chapter 6
Who and What
Customer Dimensions
Customer
dimensions are
typically very large
dimensions (VLDs)
Customer dimensions are particularly challenging because of their size. Business-
to-consumer (B2C) customer dimensions can be deep (millions of customers),
wide (many interesting descriptive attributes), and volatile (people are volatile).
This combination of high data volumes and high volatility is the reason customer
dimensions are often referred to as “monster dimensions”—they’re a little scary.
How large is a very large dimension (or table of any type)? Everything is relative.
Any absolute figure we quote will be trumped by future hardware and that
trumped again by unimagined requirements for capturing big data. The only
definition that stands the test of time is: “a very large table is one that does not
perform as well as you wish it to.”
Mini-Dimension Pattern
Problem/Requirement
Customer attributes
can be too volatile
to track using the
Type 2 SCD
technique
Stakeholders are very interested in tracking descriptive changes to the customer
base to help to explain changes in customer behavior. So they have defined many
historic value (HV) customer attributes. Unfortunately using the Type 2 SCD
technique for each HV attribute is likely to cause explosive growth in the customer
dimension; for example, a 10 million row CUSTOMER [HV] dimension with an
AGE [HV] attribute will grow by 10 million rows per year. Obviously AGE is a
poor choice as an HV attribute; it alone would turn CUSTOMER into a rapidly
changing dimension. This issue is quickly solved by replacing AGE in the dimen-
sion with the fixed value (FV) attribute DATE OF BIRTH [FV] and calculating the
historically correct age, at the time of the facts, in the BI query layer. Sadly, very
few customer dimension historical value requirements are as easy to solve as age.
If AGE is in constant use—perhaps with medical data—it can be treated as a non-
additive fact (NA) and stored with other facts in a fact table.
Customer HV
attributes must be
carefully chosen.
Not every change
should be tracked,
can be tracked, or
is worth tracking
It only takes 5 HV attributes (that cannot be calculated) that change on average
once every two years for each customer, for an initial 10M row CUSTOMER
dimension to grow by up to 25 million rows per year. With growth like that, you
will have to be careful about which attributes you define as HV, and what types of
change you track. You don’t want to track attributes that have no historical signifi-
cance—they should be defined as current value (CV). Nor do you want to track a
history of corrections that should be handled as simple updates. Corrections are
easy to spot for FV attributes, such as date of birth (cannot change, can only be
corrected), but may require group change rules (described in Chapter 3) that look
for combinations of HV and CV attributes changing together to detect genuine
change. You may also want to avoid tracking macro-level changes.