My first Publication Agile-Data-Warehouse-Design-eBook | Page 167

146 Chapter 5 RI prevents bad keys getting into good fact tables DBMS constraints can enforce RI but ETL SK lookups can often do this more efficiently Referential integrity means that every foreign key has a matching primary key. Without RI checks, facts could be loaded into fact tables with corrupt dimension foreign keys that do not match any existing dimensional values. If this happens, any query that uses these bad keys will fail to include those facts because they will not join to the appropriate dimensions. If these “bent needles” find there way into the giant haystack of fact tables the (SQL “NOT IN”) queries needed to find them are prohibitively expensive. RI can be enforced by defining foreign key constraints in the database. However, in practice, DBMSs can be too slow at loading data warehousing quantities of data with RI switched on. In contrast, ETL processes are optimized for performing the type of in-memory lookups required to check foreign keys against primary keys— this is exactly what ETL does when translating business keys into surrogate keys prior to loading fact tables. Effectively, the surrogate key processing provides “free” procedural referential integrity, which allows DBMS RI checking to be safely disabled. DBMS query optimizers often benefit from having fact table RI constraints defined. You can retain these optimization clues but still boost ETL performance by setting the constraints to unenforced (you might call this “trust me I know what my ETL process is doing” mode). This tells the optimizer what it needs to know about the relationships between facts and dimensions to speed up queries, but avoids unnecessary insert and update checks that would slow down ETL. Enable DBMS-enforced RI for fact tables during ETL development and initial data take-on to provide “belt and braces” data integrity assurance and test ETL surro- gate key lookups. If no DBMS RI errors are raised, the ETL processes are assign- ing valid surrogate keys to facts and the additional DBMS checks are unnecessary. You can then disable the DBMS RI (drop the constraints or set them to unenforced), if it is having an adverse effect on load times. Slowly Changing Dimensions Change stories match the behavior of slowly changing dimensions When you model dimensions with stakeholders you ask them how each dimen- sional attribute should handle change. You record their answers as change stories which illustrate the behavior of fixed value (FV) attributes that can only be cor- rected and have no valid history, current value (CV) attributes with no required history and historic value (HV) attributes that preserve their history. Figure 5-9 shows two Employee change stories for James Bond. The historical values of his MARITAL STATUS and CITY (his HV attributes) have been carefully tracked on each of his stories while his CV and FV attributes DEPARTMENT and DATE OF BIRTH show only the current or corrected single values for all his stories. This example data matches exactly how slowly changing dimensions (SCDs) are imple- mented by ETL processes in a dimensional data warehouse using surrogate keys.