My first Publication Agile-Data-Warehouse-Design-eBook | Page 190
170
Chapter 6
Mini-dimensions
create additional
fact table keys
Banding causes
some loss of
dimensional detail
but creates better
attributes for report
grouping and
filtering
Who and What
Banding continuous valued attributes, such as INCOME or CREDIT SCORE,
causes historical detail to be lost. Having said that, these high cardinality details do
not make good dimensional attributes. Remember, the role of a dimensional is to
provide good report row headers and filters. A high cardinality INCOME attribute
is a poor report row header. Precise income figures (if you actually knew them) are
virtually unique to customers and would group very little data, resulting in unread-
able long reports—more like database dumps than BI reports. Low cardinality
banded attributes, such as INCOME BAND make far better row headers, just so
long as the bandings are carefully designed with the stakeholders. When
stakeholders give you examples of high cardinality dimensional attributes you
should ask them for the rollup descriptions or bands they would like to see on their
reports. When they define their favorite numeric bandings make sure the bands are
contiguous—no gaps and no overlaps.
If stakeholders need access to the history of continuous valued customer details,
you should model these not as dimensional attributes, but as facts in a customer
profile fact table.
Sensible Snowflaking Pattern
Problem
Large numbers of
low cardinality
attributes can waste
space in very large
dimensions
Stakeholders want to use customer first purchase date in conjunction with com-
mercially available geodemographic information to segment the customer base for
marketing purposes. Adding all the necessary descriptive attributes related to first
purchase date (first purchase quarter, first purchase on a holiday indicator etc,)
and geodemographic code would greatly increase the size of an already very large
customer dimension. Stakeholders are worried that existing queries, that don’t
need the new marketing attributes, will be adversely affected.
Solution
“Snowflake” the
CUSTOMER
dimension. Move
large collections of
low cardinality
attributes to
outriggers
Generally, it’s a good idea to denormalize as many descriptive attributes as possible
into a dimension to simplify the model and improve query performance. But
CUSTOMER dimensions, because of their size, are exceptional and can often
benefit from sensible normalization or “snowflaking”. The FIRST PURCHASE
DATE and GEODEMOGRAPHICS outriggers, shown in Figure 6-3, represent
sensible snowflaking because they avoid a large number of much lower cardinality
date and geodemographic attributes being embedded in the CUSTOMER dimen-
sion. Keeping these attributes separate will make a worthwhile storage saving that
will improve query performance—especially for all the queries that are not inter-
ested in the first purchase dates or geodemographics. In this specific case the use of
an outrigger for FIRST PURCHASE DATE makes even more sense as it can be
implemented as a role-playing view of the standard CALENDAR dimension.