My first Publication Agile-Data-Warehouse-Design-eBook | Page 253
Design Patterns for High Performance Fact Tables and Flexible Measures
233
Fact Table Granularity
A fact table’s granularity is its level of detail: the meaning of each fact row in the
table. Granularity can be stated in business terms and/or dimensionally. For exam-
ple, the business definition of granularity for an order fact table is “one record per
order line item”, while the dimensional granularity is “orders by date, time, cus-
tomer, and product”. Transaction fact table and accumulating snapshot granularity
tends to be defined in business terms while periodic snapshot granularity is defined
dimensionally. Whichever approach you choose (often both, for the benefit of
stakeholders and the DW team), stating and clearly documenting the granularity,
is an essential step in fact table design. Fact tables that have fuzzy or mixed granu-
larity definitions are impossible to build and use correctly. Granularity
Granularity is documented in the model by recording the combination of granular-
ity dimensions (GD) that uniquely identify each fact. For most transaction fact
tables and accumulating snapshots the list of GD columns will include a degener-
ate transaction ID dimension; for example, a call detail fact table with a business
granularity of “one row per phone call”, can use a degenerate CALL REFERENCE
NUMBER [GD] to uniquely identify each row. This succinct granularity definition
is very useful for ETL processing, but for BI purposes it can be helpful if the granu-
larity can also be defined using dimensions that are more likely to be queried—
such as customer and call timestamp (assuming a customer can only make one call
at a time). These alternative granularity definitions can be documented using
numbered GD codes. For example, CALL REFERENCE NUMBER [GD1] and
CUSTOMER KEY [GD2], CALL DATE KEY [GD2], CALL TIME KEY [GD2]. Granularity can be
describes a fact
table’s level of
detail: the meaning
of each fact row. It
must be clearly
documented
stated in business
terms or
dimensionally by
listing GD columns
For accumulating snapshots and period-to-date snapshots that must be updated,
GD columns, especially degenerate IDs, are used to define unique update indexes
for fast ETL processing. For advice on fact table indexes see Indexing later in this
chapter.
Modeling Evolving Events
Evolving events represent business processes that are complex enough or take
enough time to complete that they are described as sequences of smaller milestone
events. You can think of them as multi-verb events because each milestone can
represent a discrete event (verb). These multiple verbs can be modeled as a single
evolving event in two ways:
Related events that
represent a process
sequence can be
modeled as multi-
verb evolving
events, initially or
Initially as evolving: An evolving event can emerge directly in response to a
modelstorming “Who does what?” question when stakeholders think of an
event as only the beginning of a time-consuming process that needs to be
retrospectively