My first Publication Agile-Data-Warehouse-Design-eBook | Page 188
168
Chapter 6
Who and What
Figure 6-1
Removing volatile
attributes from
CUSTOMER
Poorly designed
mini-dimensions can
be almost as large
and volatile as the
original dimension
Create stable mini-
dimensions by
removing high
cardinality attributes
or reducing their
cardinality by
banding
So, problem solved? Unfortunately, it might just be a case of problem moved. If the
mini-dimension contains several high cardinality attributes, the number of unique
demographic profiles may be almost as high as the number of customers and
customer changes will create new profiles rather than reuse existing ones because
they are too specific. The CV customer dimension might not grow but the so-
called “mini-dimension” will, to become the new “monster dimension”.
Mini-dimensions need to be mini and stay mini if they are to solve the VLD HV
problem. Figure 6-2 shows a redesign of CUSTOMER DEMOGRAPHIC where
some of the original high cardinality attributes (CITY, POST CODE) have been
removed and the continuously valued attributes (INCOME, CREDIT SCORE)
have been converted into low cardinality discrete bands. This dramatically reduces
the number of unique profiles and increases the chances of reusing them when
customers change.
When you have defined a small stable customer mini-dimension you may be able
to add additional frequently queried, low cardinality (GENDER, AGE BAND) attrib-
utes without significantly increasing its size. These would increase the filtering
power of the mini-dimension and reduce the need for many queries to access the
much larger CUSTOMER dimension at all.
Add mini-dimension
keys to their main
dimensions, to
support efficient
ETL processing
Figure 6-2 also shows that CURRENT DEMO KEY, a CV foreign key to
CUSTOMER DEMOGRAPHICS, has been added to CUSTOMER. This creates a
single table containing the customer business key: CUSTOMER ID and the two
customer surrogate keys: CUSTOMER KEY and CURRENT DEMO KEY needed
to load customer facts. ETL processes would use this to build an efficient lookup.