My first Publication Agile-Data-Warehouse-Design-eBook | Page 231

Dimensional Design Patterns for Time and Location 211 While these calendar attributes help tremendously, there is still the question of what date the “year to date” should be. For data warehouses that are loaded nightly, common sense might suggest a “year to date” of yesterday (SYSDATE –1). How- ever, not every business process runs on the same schedule, and therefore not every fact table is loaded nightly. Some fact tables may be loaded weekly, monthly, or on- demand when source data extracts becomes available—a common requirement for external data feeds. This causes problems when trying to compare YTD figures for this year with YTD figures for last year. YTD figures for this year may not contain data up to yesterday whereas the YTD figures for last year will contain data right up to yesterday minus one year. You need to know Even when fact tables are loaded nightly, they may not be loaded completely. ETL errors will occur from time to time, and complete data will not be available for reporting until these errors are fixed. It may also be quite normal for some ETL processes to encounter “late-arriving data” where the complete set of events for a particular date will not be fully available until several days (or weeks) after that date; for example, roaming call charges from international mobile networks, or medical insurance claims submitted long after treatments were given. Comparisons between the current year and last year are inaccurate whenever data is complete for last year and the current year is still a work in progress. Because of ETL when the YTD facts were last loaded to make valid comparisons with previous years errors or “late- arriving data”, you also need to know the last complete load for YTD facts Solution Information about the status of each fact table—when it was last loaded and the last complete day’s worth of data it contains—should be stored in the data ware- house rather than in the heads of ETL support staff or BI users. It should be avail- able as data in a format that BI tools can readily use. The FACT STATE table (shown in Figure 7-4) supports valid YTD comparisons by storing the recency and completeness of each fact table in a format that can easily be used with the CALENDAR dimension. It contains the most recent load date and the last complete load date of each fact table. The most recent load dates should be updated automatically by all fact-loading ETL processes. For ETL processes that are subject to unpredictable late-arriving data you may have to manually set the LAST COMPLETE LOAD DATE. A FACT STATE table holds the most recent load date and the last complete load date of each fact table Figure 7-4 FACT STATE table