My first Publication Agile-Data-Warehouse-Design-eBook | Page 158
Modeling Star Schemas
137
Additional Data
While profiling the candidate sources, it is extremely likely that you will discover
relevant data that the stakeholder didn’t request. If any of it looks like potential
facts, or dimensional attributes for the currently prioritized events, you should add
them to the model for review. Additional business keys that represent further reuse
of conformed dimensions are especially interesting. Use bold italics to highlight
new columns.
Use italics to
highlight
additional data
Unavailable Data
If you cannot find a data source, or the only available source conveys little or no
information, use bold strikethrough on the unavailable column and its examples.
Figure 5-2 shows that PRODUCT WEIGHT is unavailable. If an entire event or
dimension is unavailable you should strikethrough the whole table and the appro-
priate row or column on the matrix (and inform the stakeholders as soon as
possible). Figure 5-3 shows the (thankfully unlikely) situation that there is no
reliable source for a product dimension. If this really was the case you would also
strikethrough all PRODUCT details in event tables—making them non-viable.
Use strikethrough
to highlight
unavailable data.
If an entire table is
unavailable highlight
this on the matrix
too
Figure 5-3
BEAM✲ diagram
showing missing
data source for an
entire dimension
Nulls and Mismatched Attribute Descriptions
If you discover an attribute definition mismatch you should highlight these by
using bold strikethrough on the appropriate column code. For example,
PRODUCT SUBCATEGORY in Figure 5-2 was defined by the stakeholders as a
mandatory (MD) description of all products (and would therefore be a good level
in the default product hierarchy). However, data profiling shows that it is missing
for 20% of products so it has been marked as MD. Highlight missing
It can be very useful to point out ‘not Null’ sources for any event details and di-
mensional attributes that the stakeholder did not explicitly identify as mandatory,
by highlighting them as NN. These rare cases, where data is more reliably available
than stakeholders thought, may open up new areas of analysis that they previously
didn’t consider. Highlight mandatory
Use the following notation to annotate a model with source definitions:
{source} : Data source system, table, column, file or field name
Value : Unavailable or incorrect data or conflicting definition
NN
: Not Null. Column cannot contain null values
mandatory data
using MD.
Strikethrough other
mismatched column
codes
source data as NN:
Not Null