My first Publication Agile-Data-Warehouse-Design-eBook | Page 272
252
Chapter 8
How Many
Some DBMSs allow you to partition on more than one dimension. This can be
useful when a particular dimension is frequently used to constrain queries or
represents the way source data extracts are organized for ETL processing; for
example, by organization, geography, or data provider.
Aggregation
Aggregates act as
group by indexes for
existing fact tables
DBMS aggregate
navigation
automates
aggregate usage
Small high
performance
aggregates can be
designed using the
lost, shrunken and
collapsed patterns
An aggregate (AG) (fact table) is a stored summary of a base fact table. It acts like a
group by index on the base facts—speeding up queries that do not need to return
detailed figures. They are an essential complement to traditional where clause
indexes. A star-join index optimizes highly constrained queries that need to sum-
marize smaller quantities of data, whereas aggregates optimize broad, loosely
constrained queries that need to summarize large quantities of data. Aggregates are
derived fact tables that are very similar to periodic snapshots, dimensionally and in
terms of granularity. They differ from periodic snapshots in that they do not
provide any new facts. Instead, they simply contain summarized versions of the
additive facts from base fact tables.
Historically, data warehouse queries were written to use specific aggregates in the
form of summary data marts. Today, many DBMSs provide aggregate navigation
that automatically redirects queries to the best (smallest) aggregate. When this
happens, the aggregates are invisible to the BI users and query tools.
Aggregates must be designed so that they match the GROUP BY and WHERE
clauses of the most popular queries, or they will be not be used. They also must be
designed so that they are many times smaller than existing fact tables—to provide
performance improvements that justify the cost of maintaining them. Twenty
times smaller is a useful guideline—which can lead to a corresponding query
performance boost. The three types of aggregate design in a dimensional data
warehouse are lost, shrunken, and collapsed.
Lost Dimension Aggregate Pattern
Lost dimension aggregates are created by summarizing a fact table using a subset of
its dimensions. Figure 8-14 shows a lost dimension aggregate formed by dropping
the customer and store dimensions.
Figure 8-14
Lost dimension
aggregate