Agile Data Warehouse Design (eBook)

Chapter 5

Agile Data Profiling

Profile candidate data sources for the prioritized events and dimensions, to discover their data structure, content and quality.

The first step in translating a BEAM✲ model into a viable data warehouse design is to use agile data profiling to identify candidate data sources for the model’s prioritized events and dimensions. Data profiling is the process of examining data sources to learn about their structure, content, and data quality. Agile data profiling (see Figure 5-1) is also:

• Targeted to the candidate data sources for the business events and conformed dimensions that the stakeholders have prioritized for the next release, rather than all available data sources.
• Done early, as a data modeling task to help define the dimensional model.
• Done frequently, to ensure that the model responds to change; this is especially important for new data sources that are being developed in parallel with the data warehouse.
• Done by DW/BI team members who will load the data, to give them a feel for its complexity that will help them with their ETL task estimates.
• Recorded in the business model so that data profiles can be used to review that BEAM✲ BI data requirements model with the stakeholders, before any technical data models are proposed.

Agile data profiling is done early as a modeling activity, before a target DW schema is created.

Figure 5-1 Agile data profiling

The most expensive and painful way to discover the data profile of an operational source is to create an idealized target schema, attempt to ETL the source into the target and record all the errors. Don’t make this extremely late/non-existent data profiling mistake. Agile data warehouse designers never create a detailed physical model before profiling a source, unless they are deliberately doing proactive DW/BI design to help define a brand new source.
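To make "structure, content, and data quality" a little more concrete, here is a minimal Python sketch of a column-level profile that could be run against a candidate source before any target schema exists. It is an illustration only, not a method taken from the book: the function names (`profile_column`, `profile_rows`, `is_number`), the chosen statistics, and the sample customer rows are all hypothetical. It also assumes the source has already been read into a list of dictionaries of strings.

```python
from collections import Counter


def is_number(text):
    """Return True if the string parses as a float (a rough type check)."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def profile_column(values):
    """Summarize one source column: missing, distinct, uniqueness, length, type hint."""
    # Assumption: None and blank strings both count as missing.
    present = [v for v in values if v is not None and v.strip() != ""]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        # Unique only if every row has a value and no value repeats.
        "is_unique": bool(present) and len(counts) == len(values),
        "max_length": max((len(v) for v in present), default=0),
        "all_numeric": bool(present) and all(is_number(v) for v in present),
        "top_values": counts.most_common(3),
    }


def profile_rows(rows):
    """Profile every column of a list of dict rows (column names from the first row)."""
    columns = list(rows[0].keys()) if rows else []
    return {c: profile_column([r.get(c) for r in rows]) for c in columns}


# Hypothetical sample of a candidate customer source (invented for illustration).
rows = [
    {"customer_id": "1001", "customer_name": "Ann Lee", "country": "UK"},
    {"customer_id": "1002", "customer_name": "", "country": "UK"},
    {"customer_id": "1003", "customer_name": "Bo Chan", "country": "US"},
    {"customer_id": "1003", "customer_name": "Bo Chan", "country": None},
]

profile = profile_rows(rows)
for column, stats in profile.items():
    print(column, stats)
```

In this invented sample, the profile would flag a repeated `customer_id` and a blank `customer_name`. Findings of that kind are worth recording against the model and reviewing with stakeholders before a target schema is designed, rather than discovering them as ETL load errors.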