My first Publication Agile-Data-Warehouse-Design-eBook | Page 156
Modeling Star Schemas
135
operational development team—while they are still in design mode. An agile
operational team will welcome your early input to ensure their system will capture
crucial business intelligence information needed by the data warehouse. If you
have this level of cooperation you can press on with your dimensional design and
ETL development.
When source database development lags behind data warehouse design, you can
avoid delaying ETL development by defining extract file layouts, based on your
BEAM✲ tables, and getting the operational development team to agree to their
scheduled delivery. The agile ETL team can then get on with mapping these
initially empty files to their star schema targets.
Once data take-on has begun for a new operational system you should profile the
initial data and the previously agreed-upon extract files as early as possible to help
the operational team keep to their promises. Trust no one!
Profile sources as
soon as they are
available
Annotating the Model with Data Profiling Results
The results of data profiling need to be presented to the stakeholders, so that they
can review the data issues, decide on next steps, and if necessary reprioritize
development based on the data realities. While data profiling tools can provide
many useful graphical reports for the warehouse team, the profiling results for a
BEAM ✲ model are best delivered to the modelstorming stakeholders in a format
that they are familiar with: the BEAM ✲ model itself. Present data
In Figure 5-2, the PRODUCT dimension has been extended with data profiling
results showing counts and percentages for missing, unique, minimum and maxi-
mum values for each column. These simple profiling measures are a great start for
highlighting potential issues, and can be augmented with more sophisticated
measures and graphics generated by data profiling tools. The Figure 5-3 table has
also been annotated to show data sources, unavailable details, new attributes and
definition mismatches. The following sections describe the model review notation
used. BEAM✲ tables are
profiles to
stakeholders using
the BEAM✲ format
familiar to them
extended to hold
profiling metrics and
annotated to
highlight source
data issues
Data Sources and Data Types
For each dimensional attribute and event detail, record its best candidate data
source within braces ({ }); for example, {ERP.Employee.Grade} identifies the
source system ERP, Employee table or file, and Grade column or field. If a single
source table or file is the source for all columns in a BEAM ✲ table you can add its
name once to table header and only name the individual source column or field in
each BEAM ✲ column; For example, in Figure 5- the table header shows that the
source for all PRODUCT attributes is the ERP system table PRD and the column
46#$"5&(03: source UBCMFcolumnJT PRD_4$"5. If a table or column will
be derived from multiple sources, you can comma delimit them or use source
For each column
add its source
name, data type
and length