My first Publication Agile-Data-Warehouse-Design-eBook | Page 186

166 Chapter 6 Who and What Customer Dimensions Customer dimensions are typically very large dimensions (VLDs) Customer dimensions are particularly challenging because of their size. Business- to-consumer (B2C) customer dimensions can be deep (millions of customers), wide (many interesting descriptive attributes), and volatile (people are volatile). This combination of high data volumes and high volatility is the reason customer dimensions are often referred to as “monster dimensions”—they’re a little scary. How large is a very large dimension (or table of any type)? Everything is relative. Any absolute figure we quote will be trumped by future hardware and that trumped again by unimagined requirements for capturing big data. The only definition that stands the test of time is: “a very large table is one that does not perform as well as you wish it to.” Mini-Dimension Pattern Problem/Requirement Customer attributes can be too volatile to track using the Type 2 SCD technique Stakeholders are very interested in tracking descriptive changes to the customer base to help to explain changes in customer behavior. So they have defined many historic value (HV) customer attributes. Unfortunately using the Type 2 SCD technique for each HV attribute is likely to cause explosive growth in the customer dimension; for example, a 10 million row CUSTOMER [HV] dimension with an AGE [HV] attribute will grow by 10 million rows per year. Obviously AGE is a poor choice as an HV attribute; it alone would turn CUSTOMER into a rapidly changing dimension. This issue is quickly solved by replacing AGE in the dimen- sion with the fixed value (FV) attribute DATE OF BIRTH [FV] and calculating the historically correct age, at the time of the facts, in the BI query layer. Sadly, very few customer dimension historical value requirements are as easy to solve as age. If AGE is in constant use—perhaps with medical data—it can be treated as a non- additive fact (NA) and stored with other facts in a fact table. Customer HV attributes must be carefully chosen. Not every change should be tracked, can be tracked, or is worth tracking It only takes 5 HV attributes (that cannot be calculated) that change on average once every two years for each customer, for an initial 10M row CUSTOMER dimension to grow by up to 25 million rows per year. With growth like that, you will have to be careful about which attributes you define as HV, and what types of change you track. You don’t want to track attributes that have no historical signifi- cance—they should be defined as current value (CV). Nor do you want to track a history of corrections that should be handled as simple updates. Corrections are easy to spot for FV attributes, such as date of birth (cannot change, can only be corrected), but may require group change rules (described in Chapter 3) that look for combinations of HV and CV attributes changing together to detect genuine change. You may also want to avoid tracking macro-level changes.