My first Publication Agile-Data-Warehouse-Design-eBook | Page 296
276
Chapter 9
Why and How
from scratch without the bridge and the fact table would still need to be altered to
add the OPTION PACK KEY.
Column flag dimensions should only be populated with observed combinations,
otherwise they can easily grow to be bigger than fact tables. A bit flag dimension
with only 20 columns has over a million possible combinations.
If bridge table rows
contain quantities,
a pivoted dimension
can contain count
columns
If the business problem was more complex and varying quantities of each option
could be chosen to configure a product, the OPTION PACK bridge table would
need to contain an OPTION QUANTITY attribute and the OPTION PACK [PD]
pivoted dimension would contain option count columns rather than [Y/N] flags.
Similarly, if small numbers of options and options quantities were handled as
separate fact rows (to avoid a bridge table) and comparisons or combinations were
constantly used then a pivoted fact table might be created with option count facts.
Consequences
Pivoted dimensions
are limited to
relatively small and
stable value
populations
Pivoted dimensions are limited by the maximum number of columns available in a
database table (usually between 256 and 1024) and the ETL involved in automating
the maintenance of volatile combination values is complex. A pivoted dimensions
works well for Pomegranate because there are only a few hundred relatively stable
options (with several new ones being add manually each year) but it could not cope
with a possible 155,000 ICD10 diagnosis codes.
How Dimensions
How dimensions are
often degenerate
(DD) IDs that
provide useful
links to operational
source records and
unique count
measures
How dimensions document any additional information about facts that are not
captured by other dimensions. The most common how dimensions are degenerate
(DD) transaction identifiers stored in fact tables. These dimensions describe how
facts come to exist by tying them back to the original source system transactions.
They can also be invaluable for providing unique transaction counts. For example,
an ORDER ID in an ORDERS FACT table can be used to count how many orders
contained at least one laptop product line item. Using COUNT(DISTINCT
Order_ID) ensures that individual orders with several line items for different
laptops will not be over-counted. As mentioned earlier, a degenerate ID that can be
uniquely counted is essential if a star schema has one or more multi-valued dimen-
sions.
Look out for conformed degenerate dimensions and add them to the event matrix
to help you discover event sequences and milestone dependencies that can be
modeled as evolving events.