My first Publication Agile-Data-Warehouse-Design-eBook | Page 301

Dimensional Design Patterns for Cause and Effect The STEP IN PURCHASE dimension role lets BI users analyze page visit sequences that lead to product purchases and ones that don’t: page views that don’t lead to a purchase would have a STEP IN PURCHASE KEY that points to the "Not Applicable" row 0 in STEP. 281 Events that are not part of a sequence use STEP row 0 Consequences STEP dimensions are relatively simple to populate from spreadsheets, but they grow surprisingly quickly as the maximum number of steps increases. The formula for calculating the number of rows needed for n total steps is: n  (n+1)/2 . There- fore, 200 steps = 20,100 rows, and 1,000 steps would be more than half a million rows! If 200 steps are more than adequate for 99% of all sequences, pre-populate your STEP dimension accordingly, and create special step number records greater than 200 if/when they are needed. These records would use special STEP KEY values (e.g. the negative step number) and would contain the STEP NUMBER but have missing values for the other attributes to denote that they are steps in "excep- tionally long" sequences. Often exceptionally long sequences are the result of ETL processing errors or poorly defined business rules that fail to spot the end of a normal sequence. Although designing and creating a STEP dimension is straightforward, attaching it to the facts can require significant additional ETL processing. The events that belong to the same sequence have to be identified by an appropriate business rule (for example, all the page visits from the same IP address that are no more than 10 minutes apart) and counted in a first pass of the data; only then can the correct STEP KEYs be assigned to each fact row in a second pass. STEP dimensions grow in size quickly. You should set a maximum number of steps for the majority of sequences STEP dimensions require additional ETL processing to make two passes of the data Overloading facts with STEP information and other richly descriptive why and how dimensions takes significant additional effort from the ETL team. You should make sure you take them to lunch—on a regular basis. Audit Dimension Pattern Problem/Requirement No treatment of how and why would be complete without covering the perplexed stakeholder’s questions: How did this data get into our data warehouse? Why are the figures so high/low? Too often the answers to these questions are locked away in an ETL tool metadata repository—inaccessible to BI users who need this information the most. Stakeholders want to query data lineage