My first Publication Agile-Data-Warehouse-Design-eBook | Page 231
Dimensional Design Patterns for Time and Location
211
While these calendar attributes help tremendously, there is still the question of
what date the “year to date” should be. For data warehouses that are loaded nightly,
common sense might suggest a “year to date” of yesterday (SYSDATE –1). How-
ever, not every business process runs on the same schedule, and therefore not every
fact table is loaded nightly. Some fact tables may be loaded weekly, monthly, or on-
demand when source data extracts becomes available—a common requirement for
external data feeds. This causes problems when trying to compare YTD figures for
this year with YTD figures for last year. YTD figures for this year may not contain
data up to yesterday whereas the YTD figures for last year will contain data right
up to yesterday minus one year. You need to know
Even when fact tables are loaded nightly, they may not be loaded completely. ETL
errors will occur from time to time, and complete data will not be available for
reporting until these errors are fixed. It may also be quite normal for some ETL
processes to encounter “late-arriving data” where the complete set of events for a
particular date will not be fully available until several days (or weeks) after that
date; for example, roaming call charges from international mobile networks, or
medical insurance claims submitted long after treatments were given. Comparisons
between the current year and last year are inaccurate whenever data is complete for
last year and the current year is still a work in progress. Because of ETL
when the YTD facts
were last loaded to
make valid
comparisons with
previous years
errors or “late-
arriving data”, you
also need to know
the last complete
load for YTD facts
Solution
Information about the status of each fact table—when it was last loaded and the
last complete day’s worth of data it contains—should be stored in the data ware-
house rather than in the heads of ETL support staff or BI users. It should be avail-
able as data in a format that BI tools can readily use.
The FACT STATE table (shown in Figure 7-4) supports valid YTD comparisons by
storing the recency and completeness of each fact table in a format that can easily
be used with the CALENDAR dimension. It contains the most recent load date and
the last complete load date of each fact table. The most recent load dates should be
updated automatically by all fact-loading ETL processes. For ETL processes that
are subject to unpredictable late-arriving data you may have to manually set the
LAST COMPLETE LOAD DATE.
A FACT STATE
table holds the most
recent load date and
the last complete
load date of each
fact table
Figure 7-4
FACT STATE table