My first Publication Agile-Data-Warehouse-Design-eBook | Page 271

Design Patterns for High Performance Fact Tables and Flexible Measures 251 If a fact is frequently used for ranking or range banding you should consider indexing it to speed up sorting, and joining to a Range Band dimension (described in Chapter 9). Partitioning Partitioning allows large tables to be stored as a number of smaller physical datasets based on value ranges. If your DBMS supports table partitioning you should consider partition large fact tables on the surrogate key of their primary date dimension. Partitioning on date can be made simpler by carefully designing your calendar dimension surrogate keys (see Chapter 7, Date Keys for details). Partitioning has a number of benefits for ETL, query performance and administra- tion: Large fact tables can be partitioned on date key ranges ETL performance: Partitions with local indexes that can be dropped and rebuilt independently allow ETL process to use bulk/fast mode loads into an empty partition while they are un-indexed. If only the most recent partitions of accumulating and PTD snapshots are being updated, unique update indexes (that are used for ETL, not queries) can be dropped on historic partitions. Par- tition swapping allows ETL to update the data warehouse while queries con- tinue to run. Loading into empty Fact table pruning: Many fact tables need a fixed amount of history (24 months, 36 months). Monthly partitions allow older data to be efficiently re- moved by truncating a partition rather than row-by-row deletion of millions of records. Partitions can be Real-time support: Fact tables that need to be refreshed frequently throughout the day can be implemented using real-time “hot partitions”. These are special un-indexed in-memory partitions that are trickle-fed from the operational source. During the day queries use these like any normal partition, and at night their data is merged with the fully indexed historical partitions. Un-indexed “hot Query performance: DBMS optimizers will ignore partitions that are outside of a query’s date range, and some can read multiple partitions in parallel. But splitting a table into too many small partitions can also hurt performance, es- pecially for broad queries that must “stitch” many partitions together. This can be avoided by creating aggregates to answer the broad queries. Query optimizers For more information on real-time partitions and ETL processing see: The Data Warehouse Toolkit, Second Edition, Ralph Kimball, Margy Ross (Wiley, 2002), pages 135–139. The Data Warehouse ETL Toolkit, Ralph Kimball and Joe Caserta (Wiley, 2004), Chapter 11, “Real-Time ETL Systems.” partitions speeds up ETL processing. Partition swapping enables 24/7 BI access truncated to rapidly delete unneeded history partitions” can support real-time ETL inserts can use partition pruning and parallel access to speed up certain queries