My first Publication Agile-Data-Warehouse-Design-eBook | Page 171

150 Chapter 5 Use unique example ranges for the most common surrogate keys; e.g., 1–1000 for customers, 2000–3000 for products. This can help the DW team read the foreign key examples in fact tables (stakeholders would never look at these values). This convention is just for human readability; reserving value ranges for specific dimensions keys in the physical database is not recommended. ETL and Audit Attributes Add effective dating attributes for managing SCDs If you have already modeled how the dimension should track history and discov- ered its CV and HV attributes, you may already have the following SCD ETL administrative attributes in your dimensions (if not you should add them now): EFFECTIVE DATE END DATE CURRENT Effective dating attributes support point in time dimension queries EFFECTIVE DATE and END DATE define the valid date range for each dimen- sion row. For example, in Figure 5-9 employee Bond’s three MARITAL STATUS (HV) changes have unique effective date ranges—with no overlaps or gaps. For the current version of each EMPLOYEE there is no END DATE. But rather than leaving END DATE as Null, make sure ETL processes set it to the maximum date supported by the database. This allows query tools to use simple BETWEEN logic when asking questions about the dimension population at a specific point in time. For example, a query to count the number of employees in each city at the close of 2011 would be: SELECT city, count(*) FROM employee WHERE TO_DATE('31/12/2011','DD/MM/YYYY') BETWEEN effective_date AND end_date GROUP BY city Queries should use a CURRENT flag rather than the max DBMS date SCD administrative attributes should be Not Null The CURRENT flag indicates whether a row version is current (Y). This could be inferred from the value of END DATE but this saves stakeholders and query tools from remembering the otherwise meaningless maximum date value, which can vary by DBMS. SCD administrative attributes should all be defined as Not Null. END DATE should have a default value of the maximum database date, and CURRENT should default to “Y”. EFFECTIVE DATE and END DATE in Figure 5-10 are shown as dates. This would allow the dimension to track one set of changes per day because the minimum effective range for a historical version is one day. Multiple changes on the same day (if they could be detected from the source system feed) would have to be batched into a single update to the dimension. This is a reasonable approach if multiple changes to the same attribute on the same day are corrections. If inter- day changes are more significant and must be tracked to match inter-day facts, EFFECTIVE DATE and END DATE need to be stored as full timestamps.