My first Publication Agile-Data-Warehouse-Design-eBook | Page 89

68 Chapter 3 If stakeholders cannot agree on examples, you may have homonyms: multiple attributes with similar names The goal of using examples is to ensure that everyone is in clear agreement about the definition of each attribute. If the meaning or use of an attribute is unclear or contentious ask for additional examples. If stakeholders can’t agree on example values for an attribute it can indicate that you have discovered homonyms: two or more attributes with the same name but different meanings. If all the possible meanings are valid, then each set needs to be uniquely named, and modeled as a separate attribute with its own examples. If stakeholders struggle to provide examples for an attribute you should be alerted to the possibility that the attribute doesn’t exist yet, or is a “nice to have” attribute that isn’t currently well understood enough to be useful. Descriptive Attributes Check if codes such as business keys are smart keys: contain hidden meaning When documenting examples for business keys and any other cryptic codes check if the values contain any hidden meaning. For example, the data in Figure 3-2 suggests that CUSTOMER ID is a “smart key” that can be used to differentiate business and consumer customers. How does this tally with CUSTOMER TYPE? You would investigate this further via data profiling. It might prove useful as an additional quality assurance test during ETL processing. For every code you are given you should ask the stakeholders: Do any existing reports or spreadsheets decode [business keys / other cryptic codes] into more descriptive labels? Model descriptive attributes that decode all cryptic codes. No decodes If YES, you want to convert this report logic in to ETL code and define descriptive attributes for these labels in your dimensions, where they will be consistently maintained and available to everyone. Your motto should be “No SQL decodes at query time!” If BI applications need to decode dimensional attributes you and the stakeholders have not done a good enough job of defining the dimensions. in BI queries! Beware of “smart keys” with embedded meaning. They seldom remain smart over their lifetime, and often become overloaded with multiple meanings as business processes evolve. BI applications should not attempt to decode smart keys and other codes to provide descriptive labels. It is almost impossible for embedded report logic to keep up with these codes as their meaning morphs over time. It should be replaced by descriptive data in the data warehouse. Codes provide consistent sort order for multi-language text If you find more BI-friendly descriptive attributes for codes you can remove or hide the codes in the final version of a dimension, as long as stakeholders have no use for them on reports. However, if you are designing a multinational data ware- house that will translate descriptive attributes into several national languages, these otherwise cryptic codes are useful for consistently sorting reports that are re-run internationally. Chapter 7 covers techniques for handling national languages.