My first Publication Agile-Data-Warehouse-Design-eBook | Page 188

168 Chapter 6 Who and What Figure 6-1 Removing volatile attributes from CUSTOMER Poorly designed mini-dimensions can be almost as large and volatile as the original dimension Create stable mini- dimensions by removing high cardinality attributes or reducing their cardinality by banding So, problem solved? Unfortunately, it might just be a case of problem moved. If the mini-dimension contains several high cardinality attributes, the number of unique demographic profiles may be almost as high as the number of customers and customer changes will create new profiles rather than reuse existing ones because they are too specific. The CV customer dimension might not grow but the so- called “mini-dimension” will, to become the new “monster dimension”. Mini-dimensions need to be mini and stay mini if they are to solve the VLD HV problem. Figure 6-2 shows a redesign of CUSTOMER DEMOGRAPHIC where some of the original high cardinality attributes (CITY, POST CODE) have been removed and the continuously valued attributes (INCOME, CREDIT SCORE) have been converted into low cardinality discrete bands. This dramatically reduces the number of unique profiles and increases the chances of reusing them when customers change. When you have defined a small stable customer mini-dimension you may be able to add additional frequently queried, low cardinality (GENDER, AGE BAND) attrib- utes without significantly increasing its size. These would increase the filtering power of the mini-dimension and reduce the need for many queries to access the much larger CUSTOMER dimension at all. Add mini-dimension keys to their main dimensions, to support efficient ETL processing Figure 6-2 also shows that CURRENT DEMO KEY, a CV foreign key to CUSTOMER DEMOGRAPHICS, has been added to CUSTOMER. This creates a single table containing the customer business key: CUSTOMER ID and the two customer surrogate keys: CUSTOMER KEY and CURRENT DEMO KEY needed to load customer facts. ETL processes would use this to build an efficient lookup.