My first Publication Agile-Data-Warehouse-Design-eBook | Page 271
Design Patterns for High Performance Fact Tables and Flexible Measures
251
If a fact is frequently used for ranking or range banding you should consider
indexing it to speed up sorting, and joining to a Range Band dimension (described
in Chapter 9).
Partitioning
Partitioning allows large tables to be stored as a number of smaller physical
datasets based on value ranges. If your DBMS supports table partitioning you
should consider partition large fact tables on the surrogate key of their primary
date dimension. Partitioning on date can be made simpler by carefully designing
your calendar dimension surrogate keys (see Chapter 7, Date Keys for details).
Partitioning has a number of benefits for ETL, query performance and administra-
tion:
Large fact tables
can be partitioned
on date key ranges
ETL performance: Partitions with local indexes that can be dropped and
rebuilt independently allow ETL process to use bulk/fast mode loads into an
empty partition while they are un-indexed. If only the most recent partitions of
accumulating and PTD snapshots are being updated, unique update indexes
(that are used for ETL, not queries) can be dropped on historic partitions. Par-
tition swapping allows ETL to update the data warehouse while queries con-
tinue to run. Loading into empty
Fact table pruning: Many fact tables need a fixed amount of history (24
months, 36 months). Monthly partitions allow older data to be efficiently re-
moved by truncating a partition rather than row-by-row deletion of millions of
records. Partitions can be
Real-time support: Fact tables that need to be refreshed frequently throughout
the day can be implemented using real-time “hot partitions”. These are special
un-indexed in-memory partitions that are trickle-fed from the operational
source. During the day queries use these like any normal partition, and at night
their data is merged with the fully indexed historical partitions. Un-indexed “hot
Query performance: DBMS optimizers will ignore partitions that are outside
of a query’s date range, and some can read multiple partitions in parallel. But
splitting a table into too many small partitions can also hurt performance, es-
pecially for broad queries that must “stitch” many partitions together. This can
be avoided by creating aggregates to answer the broad queries. Query optimizers
For more information on real-time partitions and ETL processing see:
The Data Warehouse Toolkit, Second Edition, Ralph Kimball, Margy Ross (Wiley,
2002), pages 135–139.
The Data Warehouse ETL Toolkit, Ralph Kimball and Joe Caserta (Wiley, 2004),
Chapter 11, “Real-Time ETL Systems.”
partitions speeds up
ETL processing.
Partition swapping
enables 24/7 BI
access
truncated to rapidly
delete unneeded
history
partitions” can
support real-time
ETL inserts
can use partition
pruning and parallel
access to speed up
certain queries