My first Publication Agile-Data-Warehouse-Design-eBook | Page 266

246 Chapter 8 Atomic-level fact tables are required to provide fully additive unique counts How Many snapshot they may not be as additive as you hope, often turning out to be semi- additive or non-additive when you try to sum them further. If so, the only way to calculate a correct unique count is to go back to the transactions and count them distinctly within the context of the query. The status counts SHIPPED and DELIVERED, in Figures 8-5 and 8-9, do not suffer from this problem because they count order item states uniquely at their atomic-level of detail, whereas the event counts SHIPMENTS and DELIVERIES do, because they count shipments and deliveries aggregated to the order item level. If stakeholders want the total number of deliveries this month vs. last month they cannot get the answer from ORDER FACT [AS] using sum(Deliveries) . Instead they need to use DELIVERY FACT [TF] to count(distinct Delivery_Numbers) . Think of degenerate dimensions as non-additive facts. They cannot be summed but can be counted distinctly to produce useful additive measures. For example, count(distinct Receipt_Number) provides an additive count of unique sales transactions/shopping baskets. Heterogeneous Facts Pattern Heterogeneous products, that are described differently, are often measured in the same way In Chapter 6, we discussed product dimension design for handling heterogeneous products, which involved moving large sets of exclusive (Xn) attributes into their own more efficient swappable subset dimensions. Thankfully, heterogeneously described products are often measured homogenously; for example, a large retailer might sell everything from milk to DVD players, but it doesn’t matter if items are best described by fat content (“2% semi-skimmed”) or technical features (“Blu-ray recording”), they are all measured the same way: by quantity sold, revenue, cost, and margin, using the same sales fact table. Problem/Requirement Heterogeneous products that have heterogeneous facts can give rise to inefficient “one size fits all” fact table designs However, in certain businesses—such as banking—heterogeneous products will have heterogeneous facts: very different ways of being measured. This can make fact table designs that attempt to provide an integrated view of the business, very inefficient. For example, Figure 8-11 shows a small portion of a monthly account snapshot that will allow all major product types (checking, saving, mortgages, loans, and credit cards accounts) to be analyzed. Unfortunately this “one size fits all” fact table will be very wide and sparsely populated. The dimensional keys and a small set of core facts that measure all account types (ACCOUNT BALANCE and TRANSACTION COUNT) would always be present, but the majority of the facts are marked as exclusive (Xn) with their validity based upon the defining character- istic PRODUCT CATEGORY [DC]. These will be null most of the time, making the table “fact rich but data poor”. Depending on the database technology used, the null facts may take up far less storage space than valid facts, but if there are hun- dreds of facts in total across all lines of business this design will still be extremely difficult to manage and likely to perform poorly.