My first Publication Agile-Data-Warehouse-Design-eBook | Page 171
150
Chapter 5
Use unique example ranges for the most common surrogate keys; e.g., 1–1000 for
customers, 2000–3000 for products. This can help the DW team read the foreign
key examples in fact tables (stakeholders would never look at these values). This
convention is just for human readability; reserving value ranges for specific
dimensions keys in the physical database is not recommended.
ETL and Audit Attributes
Add effective dating
attributes for
managing SCDs
If you have already modeled how the dimension should track history and discov-
ered its CV and HV attributes, you may already have the following SCD ETL
administrative attributes in your dimensions (if not you should add them now):
EFFECTIVE DATE
END DATE
CURRENT
Effective dating
attributes support
point in time
dimension queries
EFFECTIVE DATE and END DATE define the valid date range for each dimen-
sion row. For example, in Figure 5-9 employee Bond’s three MARITAL STATUS
(HV) changes have unique effective date ranges—with no overlaps or gaps. For the
current version of each EMPLOYEE there is no END DATE. But rather than
leaving END DATE as Null, make sure ETL processes set it to the maximum date
supported by the database. This allows query tools to use simple BETWEEN logic
when asking questions about the dimension population at a specific point in time.
For example, a query to count the number of employees in each city at the close of
2011 would be:
SELECT city, count(*)
FROM employee
WHERE TO_DATE('31/12/2011','DD/MM/YYYY') BETWEEN
effective_date AND end_date
GROUP BY city
Queries should use
a CURRENT flag
rather than the max
DBMS date
SCD administrative
attributes should be
Not Null
The CURRENT flag indicates whether a row version is current (Y). This could be
inferred from the value of END DATE but this saves stakeholders and query tools
from remembering the otherwise meaningless maximum date value, which can
vary by DBMS.
SCD administrative attributes should all be defined as Not Null. END DATE
should have a default value of the maximum database date, and CURRENT should
default to “Y”.
EFFECTIVE DATE and END DATE in Figure 5-10 are shown as dates. This would
allow the dimension to track one set of changes per day because the minimum
effective range for a historical version is one day. Multiple changes on the same
day (if they could be detected from the source system feed) would have to be
batched into a single update to the dimension. This is a reasonable approach if
multiple changes to the same attribute on the same day are corrections. If inter-
day changes are more significant and must be tracked to match inter-day facts,
EFFECTIVE DATE and END DATE need to be stored as full timestamps.