My first Publication Agile-Data-Warehouse-Design-eBook | Page 266
246
Chapter 8
Atomic-level fact
tables are required
to provide fully
additive unique
counts
How Many
snapshot they may not be as additive as you hope, often turning out to be semi-
additive or non-additive when you try to sum them further. If so, the only way to
calculate a correct unique count is to go back to the transactions and count them
distinctly within the context of the query. The status counts SHIPPED and
DELIVERED, in Figures 8-5 and 8-9, do not suffer from this problem because they
count order item states uniquely at their atomic-level of detail, whereas the event
counts SHIPMENTS and DELIVERIES do, because they count shipments and
deliveries aggregated to the order item level. If stakeholders want the total number
of deliveries this month vs. last month they cannot get the answer from ORDER
FACT [AS] using sum(Deliveries) . Instead they need to use DELIVERY
FACT [TF] to count(distinct Delivery_Numbers) .
Think of degenerate dimensions as non-additive facts. They cannot be summed
but can be counted distinctly to produce useful additive measures. For example,
count(distinct Receipt_Number) provides an additive count of unique
sales transactions/shopping baskets.
Heterogeneous Facts Pattern
Heterogeneous
products, that are
described differently,
are often measured
in the same way
In Chapter 6, we discussed product dimension design for handling heterogeneous
products, which involved moving large sets of exclusive (Xn) attributes into their
own more efficient swappable subset dimensions. Thankfully, heterogeneously
described products are often measured homogenously; for example, a large retailer
might sell everything from milk to DVD players, but it doesn’t matter if items are
best described by fat content (“2% semi-skimmed”) or technical features (“Blu-ray
recording”), they are all measured the same way: by quantity sold, revenue, cost,
and margin, using the same sales fact table.
Problem/Requirement
Heterogeneous
products that have
heterogeneous facts
can give rise to
inefficient “one size
fits all” fact table
designs
However, in certain businesses—such as banking—heterogeneous products will
have heterogeneous facts: very different ways of being measured. This can make fact
table designs that attempt to provide an integrated view of the business, very
inefficient. For example, Figure 8-11 shows a small portion of a monthly account
snapshot that will allow all major product types (checking, saving, mortgages,
loans, and credit cards accounts) to be analyzed. Unfortunately this “one size fits
all” fact table will be very wide and sparsely populated. The dimensional keys and a
small set of core facts that measure all account types (ACCOUNT BALANCE and
TRANSACTION COUNT) would always be present, but the majority of the facts
are marked as exclusive (Xn) with their validity based upon the defining character-
istic PRODUCT CATEGORY [DC]. These will be null most of the time, making
the table “fact rich but data poor”. Depending on the database technology used, the
null facts may take up far less storage space than valid facts, but if there are hun-
dreds of facts in total across all lines of business this design will still be extremely
difficult to manage and likely to perform poorly.