My first Publication Agile-Data-Warehouse-Design-eBook | Page 183
162
Chapter 5
Summary
Agile data profiling targets the data sources implicated by the BEAM ✲ model. It is done early as
a data-driven modeling activity to validate the stakeholders data requirements before detailed
star schemas are designed. When data sources don’t yet exist, proactive DW/BI designs based
on the BEAM ✲ model can help define better BI data feeds from new operational systems.
Annotated models present data profiling results in a format stakeholders are familiar with. An
annotated table contains source names, data types and summary data profiling metrics. Data
source issues such as missing data and mismatched definitions are highlighted using
strikethrough. Additional data is highlighted using italics.
The DW/BI team uses the annotated model and detailed data profiling results to provide initial
task estimates for building and loading the proposed facts and dimensions. These ETL estimates
are added to the event matrix for use during model review and sprint planning.
During a model review, stakeholders use the annotated model and the DW/BI team estimates to
agree amendments to the design and reprioritize their requirements in light of the data realities
and available development resources.
BEAM ✲ models are easily translated into logical dimensional models and star schemas.
Dimension tables are updated by adding primary keys and administrative attributes. Event
tables are converted into fact tables by replacing dimensional details with foreign keys and
changing quantities (how many details) into fully-additive (FA), semi-additive (SA), or non-
additive (NA) facts with standardized (conformed) units of measure.
(Data warehouse) surrogate keys are used as dimension primary keys to insulate the data
warehouse from business keys, provide dimensional flexibility (manage SCDs, missing values,
multi-levels, etc.) and improve query efficiency.
Enhanced star schemas convey additional dimensional information. Consistent dimensional
layout documents dimensions by W-type and increases multi-star schema model readability.
BEAM ✲ short codes document table and column level dimensional properties not handled by
standard ER notation.
In addition to the standard documentation provided by modeling tools, a data warehouse
matrix provides an overview of all the star schemas and OLAP cubes in the data warehouse.
Similar in layout to an event modeling and planning matrix, this physical matrix provides
additional information about the actual warehouse for a technical audience. It is a vital tool for
managing the warehouse and must be kept up to date along with the event matrix and star
schema diagrams as the data warehouse design evolves.