My first Publication Agile-Data-Warehouse-Design-eBook | Page 227

Dimensional Design Patterns for Time and Location ISO Date Keys Something else is unusual about the DATE KEYs in Figure 7-2: they are based on the ISO date format YYYYMMDD, which breaks the rule that surrogate keys should not be “intelligent.” For date keys—and only for date keys—the benefits for breaking this rule outweigh any negatives: 207 ISO date keys are in the format YYYYMMDD ETL benefits: Date keys can be derived from the source date values directly, rather than with a surrogate key lookup table. This can be significant when processing events that contain many when details that all need to be converted into fact table date keys. ISO keys are easy DBMS benefit: The readable ISO format makes it easier to set up fact table partition ranges. ISO keys are easy to generate to read, which can be good (for ETL) BI benefits: None! BI queries should not use the YYYYMMDD format as a quick way of filtering facts and avoiding joins to the Calendar dimension with its consistent date descriptions. Best not to tell BI developers or users—keep this little secret between the ETL team and the DBAs. Epoch-Based Date Keys Epoch-based date keys are generated by subtracting an epoch or origin date from the date; for example, the DATE KEYs in Figure 7-5 have been generated using an epoch of 1/1/1900. Epoch keys can be a good alternative to ISO format if your ETL toolset is faster at date arithmetic than date reformatting. Epoch keys are also small contiguous numbers that may work better with some DBMS query optimizers than ISO keys, which are much larger—with a gap of 8770 between every December 31 st and January 1 st . The downside of epoch keys is that they are harder to read when setting up partition ranges. However, this may also be a BI benefit, because epoch keys are far less likely to be abused by queries using them directly as filter or decodes (instead of using the appropriate calendar dimension attributes). Which is the best approach ISO or epoch? If performance is paramount, you should speed test both with your ETL toolset and DBMS platform. You should also wait until you have read Date Version Keys , later in this chapter. before you decide on your date surrogate key strategy. Create a version of your CALENDAR dimension that is keyed on a date rather than a surrogate key (as a materialized view). This can be useful as an outrigger that can be joined to date data type dimensional attributes such as FIRST PURCHASE DATE in CUSTOMER or HIRE DATE in EMPLOYEE. This allows them to be grouped and filtered using all the rich CALENDAR attributes for very little extra ETL effort. A date-keyed CALENDAR can also be useful for prototyping BI queries using sample data that has not yet been loaded into fact tables and converted to DATE KEYs. But it should never be used in place of a surrogate key-based CALENDAR for querying fact tables. and bad (for BI) Epoch date keys are based on an origin date; e.g., 1/1/1900 Epoch keys are also easy to generate Epoch keys are more compact than ISO keys but less easy to read