My first Publication Agile-Data-Warehouse-Design-eBook | Page 89
68
Chapter 3
If stakeholders
cannot agree on
examples, you may
have homonyms:
multiple attributes
with similar names
The goal of using examples is to ensure that everyone is in clear agreement about
the definition of each attribute. If the meaning or use of an attribute is unclear or
contentious ask for additional examples. If stakeholders can’t agree on example
values for an attribute it can indicate that you have discovered homonyms: two or
more attributes with the same name but different meanings. If all the possible
meanings are valid, then each set needs to be uniquely named, and modeled as a
separate attribute with its own examples.
If stakeholders struggle to provide examples for an attribute you should be alerted
to the possibility that the attribute doesn’t exist yet, or is a “nice to have” attribute
that isn’t currently well understood enough to be useful.
Descriptive Attributes
Check if codes such
as business keys
are smart keys:
contain hidden
meaning
When documenting examples for business keys and any other cryptic codes check
if the values contain any hidden meaning. For example, the data in Figure 3-2
suggests that CUSTOMER ID is a “smart key” that can be used to differentiate
business and consumer customers. How does this tally with CUSTOMER TYPE?
You would investigate this further via data profiling. It might prove useful as an
additional quality assurance test during ETL processing. For every code you are
given you should ask the stakeholders:
Do any existing reports or spreadsheets decode [business keys /
other cryptic codes] into more descriptive labels?
Model descriptive
attributes that
decode all cryptic
codes. No decodes
If YES, you want to convert this report logic in to ETL code and define descriptive
attributes for these labels in your dimensions, where they will be consistently
maintained and available to everyone. Your motto should be “No SQL decodes at
query time!” If BI applications need to decode dimensional attributes you and the
stakeholders have not done a good enough job of defining the dimensions.
in BI queries!
Beware of “smart keys” with embedded meaning. They seldom remain smart over
their lifetime, and often become overloaded with multiple meanings as business
processes evolve. BI applications should not attempt to decode smart keys and
other codes to provide descriptive labels. It is almost impossible for embedded
report logic to keep up with these codes as their meaning morphs over time. It
should be replaced by descriptive data in the data warehouse.
Codes provide
consistent sort order
for multi-language
text
If you find more BI-friendly descriptive attributes for codes you can remove or
hide the codes in the final version of a dimension, as long as stakeholders have no
use for them on reports. However, if you are designing a multinational data ware-
house that will translate descriptive attributes into several national languages, these
otherwise cryptic codes are useful for consistently sorting reports that are re-run
internationally. Chapter 7 covers techniques for handling national languages.