My first Publication Agile-Data-Warehouse-Design-eBook | Page 158

Modeling Star Schemas 137 Additional Data While profiling the candidate sources, it is extremely likely that you will discover relevant data that the stakeholder didn’t request. If any of it looks like potential facts, or dimensional attributes for the currently prioritized events, you should add them to the model for review. Additional business keys that represent further reuse of conformed dimensions are especially interesting. Use bold italics to highlight new columns. Use italics to highlight additional data Unavailable Data If you cannot find a data source, or the only available source conveys little or no information, use bold strikethrough on the unavailable column and its examples. Figure 5-2 shows that PRODUCT WEIGHT is unavailable. If an entire event or dimension is unavailable you should strikethrough the whole table and the appro- priate row or column on the matrix (and inform the stakeholders as soon as possible). Figure 5-3 shows the (thankfully unlikely) situation that there is no reliable source for a product dimension. If this really was the case you would also strikethrough all PRODUCT details in event tables—making them non-viable. Use strikethrough to highlight unavailable data. If an entire table is unavailable highlight this on the matrix too Figure 5-3 BEAM✲ diagram showing missing data source for an entire dimension Nulls and Mismatched Attribute Descriptions If you discover an attribute definition mismatch you should highlight these by using bold strikethrough on the appropriate column code. For example, PRODUCT SUBCATEGORY in Figure 5-2 was defined by the stakeholders as a mandatory (MD) description of all products (and would therefore be a good level in the default product hierarchy). However, data profiling shows that it is missing for 20% of products so it has been marked as MD. Highlight missing It can be very useful to point out ‘not Null’ sources for any event details and di- mensional attributes that the stakeholder did not explicitly identify as mandatory, by highlighting them as NN. These rare cases, where data is more reliably available than stakeholders thought, may open up new areas of analysis that they previously didn’t consider. Highlight mandatory Use the following notation to annotate a model with source definitions: {source} : Data source system, table, column, file or field name Value : Unavailable or incorrect data or conflicting definition NN : Not Null. Column cannot contain null values mandatory data using MD. Strikethrough other mismatched column codes source data as NN: Not Null