My first Publication Agile-Data-Warehouse-Design-eBook | Page 156

Modeling Star Schemas 135 operational development team—while they are still in design mode. An agile operational team will welcome your early input to ensure their system will capture crucial business intelligence information needed by the data warehouse. If you have this level of cooperation you can press on with your dimensional design and ETL development. When source database development lags behind data warehouse design, you can avoid delaying ETL development by defining extract file layouts, based on your BEAM✲ tables, and getting the operational development team to agree to their scheduled delivery. The agile ETL team can then get on with mapping these initially empty files to their star schema targets. Once data take-on has begun for a new operational system you should profile the initial data and the previously agreed-upon extract files as early as possible to help the operational team keep to their promises. Trust no one! Profile sources as soon as they are available Annotating the Model with Data Profiling Results The results of data profiling need to be presented to the stakeholders, so that they can review the data issues, decide on next steps, and if necessary reprioritize development based on the data realities. While data profiling tools can provide many useful graphical reports for the warehouse team, the profiling results for a BEAM ✲ model are best delivered to the modelstorming stakeholders in a format that they are familiar with: the BEAM ✲ model itself. Present data In Figure 5-2, the PRODUCT dimension has been extended with data profiling results showing counts and percentages for missing, unique, minimum and maxi- mum values for each column. These simple profiling measures are a great start for highlighting potential issues, and can be augmented with more sophisticated measures and graphics generated by data profiling tools. The Figure 5-3 table has also been annotated to show data sources, unavailable details, new attributes and definition mismatches. The following sections describe the model review notation used. BEAM✲ tables are profiles to stakeholders using the BEAM✲ format familiar to them extended to hold profiling metrics and annotated to highlight source data issues Data Sources and Data Types For each dimensional attribute and event detail, record its best candidate data source within braces ({ }); for example, {ERP.Employee.Grade} identifies the source system ERP, Employee table or file, and Grade column or field. If a single source table or file is the source for all columns in a BEAM ✲ table you can add its name once to table header and only name the individual source column or field in each BEAM ✲ column; For example, in Figure 5- the table header shows that the source for all PRODUCT attributes is the ERP system table PRD and the column 46#$"5&(03: source UBCMFcolumnJT PRD_4$"5. If a table or column will be derived from multiple sources, you can comma delimit them or use source For each column add its source name, data type and length