My first Publication Agile-Data-Warehouse-Design-eBook | Page 164

Modeling Star Schemas Insulate the Data Warehouse from Business Key Change Surrogate keys protect the data warehouse from changes or glitches in the way business keys are administrated. For example, if business keys change when a business process is migrated to a new packaged application they can be updated in a dimension without affecting millions or billions of facts. If business keys are reused when products are discontinued or customers depart, new dimensional rows with new surrogate keys can be assigned to these reused codes so they remain distinct from their previous use. Business key instability like this is often hidden because it may not cause a problem in the operational systems if older transactions are archived, but the problem surfaces when the data warehouse has to take a longer-term view. Cope with Multiple Business Keys for a Dimension Surrogate keys allow the data warehouse to integrate events from multiple opera- tional sources that store information about the same conformed dimensions using different business keys. By using a surrogate key, you can sidestep the question of which business key is best for a dimension—a politically sensitive issue within organizations that have grown by merger or acquisition. The safest answer is “there is no best business key.” They are all important non-key attributes that should be stored in the dimension. The multiple business keys will be used by ETL processes to assign surrogate foreign keys to the facts derived from the multiple sources. They may also all have analytical value to various stakeholder groups, especially if some are natural keys that the stakeholders work with outside of their databases. Track Dimensional History Efficiently The data warehouse must provide history for the descriptive attributes of slowly changing dimensions (SCDs) as well as rapidly changing facts. Surrogate keys provide a simple mechanism for storing this history directly in dimension tables and efficiently joining the correct historical descriptions to the historical facts at query time. Handle Missing Dimensional Values Every dimension needs at least one special record that represents ‘missing’ or ‘not applicable’ to cope with errors or minor variations in business events. For example, in-store and telephone CUSTOMER ORDERS are handled by a SALESPERSON whereas online orders are not; they do not naturally have a SALESPERSON dimen- sion. When online orders are loaded into the same fact table as other orders their Null or missing SALESPERSON IDs are replaced with a special surrogate key value that points to a “No Salesperson” record in the SALESPERSON dimension. This allows order queries that group by SALESPERSON to still include online orders but display them using “Missing” labels defined by stakeholders. If the SALESPERSON foreign key was left as Null, online orders would always be ex- cluded. Having a special missing row in every dimension simplifies query joins; all joins can be inner joins as every fact will always find a matching dimension record. 143 When business keys are changed or reused, surrogate keys prevent facts from been affected Surrogate keys provide a single primary key for conformed dimensions that have multiple business keys originating from multiple source systems Surrogate keys allow dimensional history to be tracked and joined efficiently to facts Surrogate keys can represent special missing dimension values for which there are no business keys; e.g., “No Customer”, “Missing Date”