My first Publication Agile-Data-Warehouse-Design-eBook | Page 270

250 Chapter 8 How Many Move free text comments and lengthy sets of degenerate flags into their own physical dimensions and replace them with short foreign keys (See Chapter 9). Don’t store a large number of facts that can easily be calculated intra-record; e.g., don’t store all the durations that can be calculated from a smaller number of milestone timestamps. Limit fact history to only the data that is useful for BI The next thing to consider is the length of each fact table. You should try to limit history to what the BI users really need. Don’t use fact tables as an expensive archival strategy. If the auditors need more history than the BI users, they should get that from the operational system of record, not the data warehouse. Regulatory requirements are not analytical requirements, so don’t automatically load 20 years of transactional history just because it exists. If the business has changed substan- tially in that time how far can queries go back and make valid comparisons? Also, the further back you go the harder it becomes to load the data because data quality challenges tend to increase. When modeling business events with stakeholders, ask for event stories describ- ing the earliest when details that BI users will need to work with. The most interesting data is the most recent. If you have years of history to load, start with the current year and work backwards—partitioning can help to do this efficiently. Don’t bother loading the oldest data until stakeholders ask for it. Indexing Create query indexes on foreign keys to support “star join optimization” More query indexes can improve BI performance but slowdown ETL Accumulating and period-to-date snapshots also need an ETL update index After you have done all that you can to control the size of a fact table the next issue to consider is how to index it for query performance. Here you should seek your DBMS vendor’s advice on defining some form of “star join index.” This generally involves creating a bitmap index on each dimensional foreign key—but techniques vary by DBMS and by version, with new data warehousing index strategies being added all the time (we hope). Whatever indexing techniques you use, there is inevitably a trade-off between query performance and ETL processing time. Your priorities should be heavily biased towards query performance—but BI users can only query what you can manage to load in the available time—so index thoughtfully! In addition to query indexes, accumulating snapshots and period-to-date (PTD) periodic snapshots need an ETL index to support efficient updates. This will be an OLTP-style unique index using GD columns such as the ORDER ID degenerate dimension in CUSTOMER ORDERS. Transaction fact tables and most periodic snapshots are insert-only so they do not require a unique index, as long as ETL processes can guarantee fact uniqueness.