My first Publication Agile-Data-Warehouse-Design-eBook | Page 269

Design Patterns for High Performance Fact Tables and Flexible Measures 249 To answer the “Who didn’t attend but should have?” question about seminars, there is a case for making SEMINAR ATTENDANCE FACT a normal fact table by adding an ATTENDANCE fact. This would be 1 if an invited prospect attends and 0 for a “no show”. Normally fact tables don’t record events that didn’t happen, because there are just too many of them. Airlines don’t record all the flights you didn’t take today—even if you are their best frequent flyer. But in the case of sales seminars Pomegranate didn’t invite the whole world, so the number of extra records for invitees that did not attend would be manageable. If the number of A dummy fact (always equal to 1) can be added to factless fact tables to provide an additive fact that can be summed. This makes it easier to build aggregates of large factless fact tables that can be used “invisibly” by aggregate navigation (see Aggrega- tion later in this chapter). The aggregate will have the same fact but it will hold values other than 1. Also, some BI tools only recognize a table as a fact table if it has at least one fact. A dummy additive non-events is not too high, a 0/1 fact can be added to count “what didn’t happen” fact (equal to 1) can be added to support aggregate navigation In Figure 8-13, the PRODUCT dimension is defined as a multi-level (ML) dimension and the PRODUCT KEY in SEMINAR ATTENDANCE FACT is marked as ML too, documenting that it makes use of the multi-level feature of the dimension. This design allows the star schema to record attendance for seminars that are single product launches and seminars that promote entire product categories. Fact Table Optimization Because fact tables are so large—accounting for the vast majority of the storage and I/O activity of a dimensional data warehouse—it is essential to design them for high performance. The techniques for optimizing fact table performance are downsizing, indexing, partitioning, and aggregation. Downsizing The first way to improve performance is to design fact tables that are as compact as possible without compromising their usability. The following checklist gathers together techniques for reducing fact table row width: Use integer surrogate keys as dimensional foreign keys. Keep business keys in dimensions. Use date keys instead of datetime data types—especially if time is unused. Reduce the number of dimension keys—combine small why and how dimen- sions (see Chapter 9). Improve fact table performance by reducing row width