My first Publication Agile-Data-Warehouse-Design-eBook | Page 227
Dimensional Design Patterns for Time and Location
ISO Date Keys
Something else is unusual about the DATE KEYs in Figure 7-2: they are based on
the ISO date format YYYYMMDD, which breaks the rule that surrogate keys
should not be “intelligent.” For date keys—and only for date keys—the benefits for
breaking this rule outweigh any negatives:
207
ISO date keys are
in the format
YYYYMMDD
ETL benefits: Date keys can be derived from the source date values directly,
rather than with a surrogate key lookup table. This can be significant when
processing events that contain many when details that all need to be converted
into fact table date keys. ISO keys are easy
DBMS benefit: The readable ISO format makes it easier to set up fact table
partition ranges. ISO keys are easy
to generate
to read, which can
be good (for ETL)
BI benefits: None! BI queries should not use the YYYYMMDD format as a
quick way of filtering facts and avoiding joins to the Calendar dimension with
its consistent date descriptions. Best not to tell BI developers or users—keep
this little secret between the ETL team and the DBAs.
Epoch-Based Date Keys
Epoch-based date keys are generated by subtracting an epoch or origin date from
the date; for example, the DATE KEYs in Figure 7-5 have been generated using an
epoch of 1/1/1900. Epoch keys can be a good alternative to ISO format if your ETL
toolset is faster at date arithmetic than date reformatting. Epoch keys are also small
contiguous numbers that may work better with some DBMS query optimizers than
ISO keys, which are much larger—with a gap of 8770 between every December 31 st
and January 1 st . The downside of epoch keys is that they are harder to read when
setting up partition ranges. However, this may also be a BI benefit, because epoch
keys are far less likely to be abused by queries using them directly as filter or
decodes (instead of using the appropriate calendar dimension attributes). Which is
the best approach ISO or epoch? If performance is paramount, you should speed
test both with your ETL toolset and DBMS platform. You should also wait until
you have read Date Version Keys , later in this chapter. before you decide on your
date surrogate key strategy.
Create a version of your CALENDAR dimension that is keyed on a date rather than
a surrogate key (as a materialized view). This can be useful as an outrigger that
can be joined to date data type dimensional attributes such as FIRST PURCHASE
DATE in CUSTOMER or HIRE DATE in EMPLOYEE. This allows them to be
grouped and filtered using all the rich CALENDAR attributes for very little extra
ETL effort. A date-keyed CALENDAR can also be useful for prototyping BI queries
using sample data that has not yet been loaded into fact tables and converted to
DATE KEYs. But it should never be used in place of a surrogate key-based
CALENDAR for querying fact tables.
and bad (for BI)
Epoch date keys are
based on an origin
date; e.g., 1/1/1900
Epoch keys are also
easy to generate
Epoch keys are
more compact than
ISO keys but less
easy to read