My first Publication Agile-Data-Warehouse-Design-eBook | Page 165

144 Chapter 5 Reserve the surrogate key value zero for the default Missing row in each dimen- sion. You use this row zero to hold the stakeholders’ Missing labels recorded in the BEAM✲ dimension table example data. If different types of missing are needed you can add additional special rows, using negative surrogate keys, to represent “Unknown”, “Not Applicable”, “Error” etc., leaving the normal dimension values to use positive integers. Being consistent in the use of special value surrogate keys can greatly simplify ETL processing. Surrogate keys enable multi-level dimensions to describe variable- level business processes Surrogate keys keep sensitive data anonymous. Unlike business keys they cannot be used to join sensitive data to source systems that might provide full disclosure Use integer surrogate keys instead of alphanumeric business keys to reduce the size of fact tables and indexes Support Multi-Level Dimensions Some business events have variable-level dimensional details. For example, tele- sales orders are normally attributed to individual salespeople, but occasionally an order is attributed to a sales team, when the salesperson is on probation or has left before the order is processed. Orders are a variable-level who event. By using an extension of the missing value SK technique, a number of additional special value rows can be added to the SALESPERSON dimension to represent teams, branches, regions, or other levels in the sales organization hierarchy, creating a multi-level dimension to which both normal and exceptional facts can be attached. The multi- level design pattern is covered in Chapter 6. Protect Sensitive Information For data protection or security reasons you might need to create anonymous customer or employee dimensions for analyzing sensitive purchase habits or salary payment facts. Anonymized dimensions obviously must not contain name, exact address, date of birth, or other descriptive attributes that could be used in combi- nation to identify individuals. However, if they use business keys (such as Cus- tomer ID or Employee Number) as primary keys, then fact tables will contain business key foreign keys that can be cross-referenced with other systems to provide full disclosure. You can prevent this by replacing business keys with surrogate keys, that do not exist outside of the data warehouse, and limiting access to the disclosing business key to surrogate key mapping tables, to secure ETL processes only. Reduce Fact Table Size Integer surrogate keys are more compact than datetime keys and most alphanu- meric business keys—especially so called ‘smart keys’ that have embedded business logic. This can lead to significant reductions in the size of fact tables. For example, a typical detailed fact table has 10 dimensions and 5 measures. If the average length of a business key is 10 characters (bytes), replacing each foreign key with a 4-byte integer would halve the size of the fact table and its corresponding indexes. Even if savings can only be made on a few foreign keys, every byte counts when it comes to fact tables.