Chapter 5
Agile Data Profiling
Profile candidate data sources for the prioritized events and dimensions, to discover their data structure, content and quality
The first step in translating a BEAM✲ model into a viable data warehouse design is to use agile data profiling to identify candidate data sources for the model’s prioritized events and dimensions. Data profiling is the process of examining data sources to learn about their structure, content, and data quality. Agile data profiling (see Figure 5-1) is also:
• Targeted to the candidate data sources for the business events and conformed dimensions that the stakeholders have prioritized for the next release, rather than all available data sources.
• Done early, as a data modeling task to help define the dimensional model.
• Done frequently, to ensure that the model responds to change; this is especially important for new data sources that are being developed in parallel with the data warehouse.
• Done by DW/BI team members who will load the data, to give them a feel for its complexity that will help them with their ETL task estimates.
• Recorded in the business model so that data profiles can be used to review that BEAM✲ BI data requirements model with the stakeholders, before any technical data models are proposed.
Agile data profiling is done early as a modeling activity, before a target DW schema is created
Figure 5-1: Agile data profiling
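As an illustration of examining a source for structure, content, and data quality, the following is a minimal sketch of a column-level profile, not a method prescribed by the text. It assumes the candidate source can be reached through Python's standard sqlite3 module; the function name profile_table, the table customer_source, and its columns are hypothetical.

```python
import sqlite3


def profile_table(conn, table):
    """Return one profile dict per column of `table` (hypothetical helper)."""
    total = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    profiles = []
    for _cid, name, declared_type, *_rest in columns:
        nulls, distinct, lowest, highest = conn.execute(
            f'SELECT SUM("{name}" IS NULL), COUNT(DISTINCT "{name}"), '
            f'MIN("{name}"), MAX("{name}") FROM "{table}"'
        ).fetchone()
        profiles.append({
            "column": name,            # structure: column name
            "type": declared_type,     # structure: declared data type
            "rows": total,
            "nulls": nulls or 0,       # quality: missing values (SUM is NULL on an empty table)
            "distinct": distinct,      # content: cardinality of non-null values
            "min": lowest,             # content: value range
            "max": highest,
        })
    return profiles


if __name__ == "__main__":
    # Assumed sample data standing in for an operational source.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_source (customer_id INTEGER, country TEXT, signup_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO customer_source VALUES (?, ?, ?)",
        [(1, "UK", "2011-01-05"), (2, "UK", None), (3, None, "2011-02-17"), (3, "US", "2011-03-01")],
    )
    for column_profile in profile_table(conn, "customer_source"):
        print(column_profile)
```

A profile like this is small enough to record alongside the business model. A distinct count lower than the row count on a supposed identifier, or a non-zero null count on a mandatory detail, is the kind of finding worth reviewing with stakeholders.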
The most expensive and painful way to discover the data profile of an operational source is to create an idealized target schema, attempt to ETL the source into the target, and record all the errors. Don’t make the mistake of profiling data extremely late or not at all. Agile data warehouse designers never create a detailed physical model before profiling a source, unless they are deliberately doing proactive DW/BI design to help define a brand new source.
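One way to avoid learning about a source through failed loads is to test the assumptions a target schema would make directly against the source, before any target exists. The sketch below is an assumption-laden illustration rather than the book's procedure: check_assumptions, its parameters, and the order_source table are hypothetical names, and the source is again assumed to be queryable through sqlite3.

```python
import sqlite3


def check_assumptions(conn, table, not_null=(), unique=()):
    """Count source rows or values that would violate intended target constraints."""
    findings = []
    for column in not_null:
        # Rows that a NOT NULL constraint on the target would reject.
        violations = conn.execute(
            f'SELECT COUNT(*) FROM "{table}" WHERE "{column}" IS NULL'
        ).fetchone()[0]
        findings.append(("not_null", column, violations))
    for column in unique:
        # Non-null values that occur more than once, so cannot serve as a unique key.
        violations = conn.execute(
            f'SELECT COUNT(*) FROM (SELECT "{column}" FROM "{table}" '
            f'WHERE "{column}" IS NOT NULL GROUP BY "{column}" HAVING COUNT(*) > 1)'
        ).fetchone()[0]
        findings.append(("unique", column, violations))
    return findings


if __name__ == "__main__":
    # Assumed sample data standing in for an operational source.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE order_source (order_id INTEGER, customer_id INTEGER)")
    conn.executemany(
        "INSERT INTO order_source VALUES (?, ?)",
        [(100, 1), (101, None), (101, 2), (102, 2)],
    )
    for rule, column, violations in check_assumptions(
        conn, "order_source", not_null=["customer_id"], unique=["order_id"]
    ):
        print(f"{rule} on {column}: {violations} violation(s)")
```

Each non-zero count is a design question raised while the model is still cheap to change, rather than an ETL error logged after a physical schema has been built.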