My first Publication Agile-Data-Warehouse-Design-eBook | Page 297
Dimensional Design Patterns for Cause and Effect
277
Too Many Degenerate Dimensions?
Most transaction fact tables will contain at least one degenerate transaction ID that
cannot be stored more efficiently in a separate dimension table because its cardi-
nality approaches that of the fact table itself and it has no additional descriptive
attributes. However if a fact table contains many additional degenerates you should
try to prune these to keep the fact table record length under control using the
following guidelines:
Large sets of
degenerate
dimensions within
a fact table should
be remodeled as
separate
dimensions
If degenerates will be used to group or filter or will be browsed in combina-
tion, they should be remodeled as attributes in a separate dimension.
Any degenerates that contain large unstructured comments should be re-
placed by a surrogate key to a COMMENT dimension (as in Figure 9-2).
If a degenerate [Y/N] flag will be frequently counted it can be remodeled as a
low cardinality additive fact with the values 0, 1 that can be summed. This is
especially useful as aggregates can be built that use this fact.
Some degenerates
can be remodeled
as useful additive
and non-additive
If the degenerate is high cardinality and will be counted distinctly it should
remain in the fact table where it will act as a non-additive fact.
facts
If a degenerate flag describes the type of value in an adjacent fact it may repre-
sent data that would be better modeled as separate additive fact columns
without the flag. For example, a REVENUE fact and a flag REVENUE TYPE
with the values: ‘E’ for estimate and ‘A’ for actual, should instead be modeled
as two facts: ESTIMATED REVENUE and ACTUAL REVENUE.
Sometimes a degenerate meets more than one of these criteria. For example, a
flag may be frequently counted, and used for grouping and constraining. In
which case, you can model it as both a fact and a dimensional attribute.
Creating How Dimensions
If you identify degenerates that should be remodeled as dimensions, check to see if
any belong in existing dimensions. For any that do not, define a new dimension
with its own surrogate key and relocate the degenerates to it, replacing them in the
fact table with the new surrogate key. This new dimension is often called a “junk”
dimension, because of its tough-to-classify mix of attributes. But it really is not
junk at all. Instead, it is a non-conformed how dimension, specific to just this set of
facts, that can often be usefully named after its matching fact table. For example, a
CALL DETAILS FACT table may need a CALL DETAIL how dimension, and a
SALES FACT table may need a SALE TYPE dimension. If a fact table has multiple
small non-conformed dimensions—typically whys and hows— they can often be
merged to reduce the number of keys in the fact table.
Move degenerates
to a physical how
dimension named
after the fact table
and replace them in
the fact table with a
surrogate key