My first Publication Agile-Data-Warehouse-Design-eBook | Page 283

Dimensional Design Patterns for Cause and Effect 263 Figure 9-1 PROMOTION dimension While a PROMOTION dimension may be a small dimension, with only of few hundred promotional condition combinations, it can be challenging to build and assign to the facts because of its mix of direct and indirect causal factors. Direct causal factors are usually straightforward to assign because they are captured by the operational system but many of the interesting indirect causes may not be. For example, a sales system will not (reliably) record whether discounts are also pro- moted by TV ads or special (in-store or on-website) product displays because this information is not needed to complete each sale transaction and print a valid invoice/receipt (which must show any direct discount details). A richly descriptive promotion dimension will require this information to be sourced from elsewhere— typically from less formal data sources, like spreadsheets and word processing documents, and its ETL processing will need to be sophisticated enough to assign the full combination of promotional conditions correctly. Indirect causal values are often more difficult to source than direct causal values The DW/BI team may have to build small data entry applications to capture causal descriptions and timetables when this information is “known to business but not known to any system.” If BI users need to analyze promotion return on investment (ROI), the data ware- house will need an additional Promotion Spend Fact table—using the same con- formed PROMOTION dimension. BI users can then run drill-across queries against both PROMOTION SPEND FACT and SALES FACT to compare promo- tion costs to sales revenue uplift. PROMOTION may be conformed across sales and promotion cost star schemas Unstructured Why Dimensions In some cases valuable direct causal details are attached to transactions as unstruc- tured comments. These potentially large textual columns should be removed from fact tables and placed in separate dimensions, to maximize fact table performance for the majority of queries that just want to rapidly aggregate the additive facts. The resulting text dimensions are why dimensions. Figure 9-2 shows an example COMMENT dimension that contains reasons that salespeople have given for varying the price for specific customers. This could be turned into a better why dimension by adding additional attributes that codify the free-format text reasons based on interesting keywords. Embellishing the table with low cardinality sets of descriptive tags would provide better report row headers and more consistent filters. Direct causal factors are often captured as free-format text reasons. These non-additive text facts should be removed from fact tables and placed in text dimensions