My first Publication Agile-Data-Warehouse-Design-eBook | Page 155

134 Chapter 5 Automating Your Own Data Profiling Checks Use SQL scripts to generate data profiling queries that write their results to a table If you don’t have a data profiling tool but you do have hundreds or thousands of source columns to check, you can use SQL-generated SQL to create data profiling tests that write their results to a table for analysis and presentation with BI tools. For example, the following SQL generates a set of INSERT statements that count Nulls for all columns in a schema, and write the results to a PROFILING _RESULTS table: SELECT 'INSERT INTO PROFILING_RESULTS(TABLE_NAME, COLUMN_NAME, MISSING_COUNT) SELECT ''' || Table_Name || ''', ''' || Column_Name || ''', COUNT(*) FROM ' || Table_Name || ' WHERE ' || Column_Name || ' IS NULL;' FROM SYS.All_Tab_Columns WHERE … Search online for ready-made profiling scripts Search online for “SQL data profiling script” and you should be able to find ready- made scripts that you can adapt for your database platform that will create all the tests recommended above and more and store the results in table form. For in-depth coverage of data profiling, data quality measurement, and ETL techniques for continuously addressing data quality read: Data Quality: The Accuracy Dimension, Jack E. Olsen (Morgan Kaufmann, 2003) The Data Warehouse ETL Toolkit, Ralph Kimball, Joe Caserta (Wiley, 2004) Chapter 4, pages 113–147 No Source Yet: Proactive DW/BI Design Proactive DW/BI designers have to cope without stable data sources to profile (yet) Use no source as an opportunity to define the perfect BI data source What if there is no source to profile? This might not be a disaster, just a timing issue that needs to be anticipated. When agile BI systems are developed in parallel with new operational systems, a proactive data warehouse design can preempt operational system development or the installation of a packaged solution. Initially, there will be no indicative source data, possibly not even a source data model. Even when there is a well documented data model, as in the case of a packaged source, it can provide little useful information until the system has been configured and real data migrated to it. ETL development is especially challenging when source data definitions are still fluid (non-existent), but this does present an opportunity for the agile data ware- house team to negotiate a better “data deal.” The BEAM ✲ model can be used to provide a detailed specification of business intelligence data requirements to the