My first Publication Agile-Data-Warehouse-Design-eBook | Page 269
Design Patterns for High Performance Fact Tables and Flexible Measures
249
To answer the “Who didn’t attend but should have?” question about seminars,
there is a case for making SEMINAR ATTENDANCE FACT a normal fact table by
adding an ATTENDANCE fact. This would be 1 if an invited prospect attends and
0 for a “no show”. Normally fact tables don’t record events that didn’t happen,
because there are just too many of them. Airlines don’t record all the flights you
didn’t take today—even if you are their best frequent flyer. But in the case of sales
seminars Pomegranate didn’t invite the whole world, so the number of extra
records for invitees that did not attend would be manageable. If the number of
A dummy fact (always equal to 1) can be added to factless fact tables to provide an
additive fact that can be summed. This makes it easier to build aggregates of large
factless fact tables that can be used “invisibly” by aggregate navigation (see Aggrega-
tion later in this chapter). The aggregate will have the same fact but it will hold
values other than 1. Also, some BI tools only recognize a table as a fact table if it
has at least one fact. A dummy additive
non-events is not
too high, a 0/1 fact
can be added to
count “what didn’t
happen”
fact (equal to 1) can
be added to support
aggregate
navigation
In Figure 8-13, the PRODUCT dimension is defined as a multi-level (ML) dimension
and the PRODUCT KEY in SEMINAR ATTENDANCE FACT is marked as ML too,
documenting that it makes use of the multi-level feature of the dimension. This
design allows the star schema to record attendance for seminars that are single
product launches and seminars that promote entire product categories.
Fact Table Optimization
Because fact tables are so large—accounting for the vast majority of the storage and
I/O activity of a dimensional data warehouse—it is essential to design them for
high performance. The techniques for optimizing fact table performance are
downsizing, indexing, partitioning, and aggregation.
Downsizing
The first way to improve performance is to design fact tables that are as compact as
possible without compromising their usability. The following checklist gathers
together techniques for reducing fact table row width:
Use integer surrogate keys as dimensional foreign keys. Keep business keys in
dimensions.
Use date keys instead of datetime data types—especially if time is unused.
Reduce the number of dimension keys—combine small why and how dimen-
sions (see Chapter 9).
Improve fact table
performance by
reducing row width