Intelligent CIO Europe Issue 49 - Page 57

the analytics environment, the more complex the infrastructure becomes.
Technologies that work seamlessly to support a variety of processes will be key.
These include:
• The enterprise data warehouse (EDW) – The production analytics environment where routine analyses, reports and KPIs are produced on a regular basis using trusted, reliable data.
• The investigative computing platform (ICP) – Used for data exploration, data mining, modelling and cause and effect analyses. Also known as the data lake, this is the playground for data scientists and others who have unknown or unexpected queries.
• A data integration platform – That extracts, formats and loads structured data into the EDW and invokes data quality processing where needed.
• A data refinery – For ingesting raw structured and multi-structured data, distilling it into useful formats in the ICP for advanced analyses.
• Analytics tools and applications – To create reports, perform analyses and display results.
• A data catalogue – Which acts as an entry point for users where they can view what data is available and discover what analytical assets already exist. This needs to be meticulously maintained and updated.
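As a rough illustration of the catalogue's role as an entry point, the sketch below models it as a small searchable registry in Python. The class names, fields, sample entries and dates are all hypothetical and chosen only for illustration; a real catalogue product would hold far richer metadata.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class CatalogueEntry:
    """One data set or analytical asset in the catalogue (hypothetical schema)."""
    name: str
    component: str       # where the asset lives, e.g. "EDW" or "ICP"
    description: str
    last_updated: date
    tags: set = field(default_factory=set)


class DataCatalogue:
    """In-memory stand-in for a real catalogue product (illustrative only)."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Re-registering a name overwrites the old entry, keeping metadata current.
        self._entries[entry.name] = entry

    def search(self, keyword):
        """Return names of entries whose name, description or tags mention the keyword."""
        kw = keyword.lower()
        return sorted(
            e.name for e in self._entries.values()
            if kw in e.name.lower()
            or kw in e.description.lower()
            or any(kw in t.lower() for t in e.tags)
        )


# Sample entries are invented for the example.
catalogue = DataCatalogue()
catalogue.register(CatalogueEntry(
    "monthly_sales_kpi", "EDW", "Trusted monthly sales KPIs",
    date(2022, 1, 31), {"sales", "kpi"}))
catalogue.register(CatalogueEntry(
    "clickstream_raw", "ICP", "Raw web clickstream for exploration",
    date(2022, 2, 3), {"web"}))

print(catalogue.search("sales"))  # ['monthly_sales_kpi']
```

A user searching for "sales" is pointed at the existing KPI asset in the EDW rather than building a duplicate.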
Implementing a Data Fabric: The technical processes involved
Those responsible for building and maintaining a Data Fabric face a big task. The simpler they make the business community’s access and utilisation of
• Discovery – As well as detecting what data and assets already exist in the environment and obtaining the full metadata on data lineage (sources, integration techniques and quality metrics), technical teams can use usage statistics (who is using what and how often) and impact analysis to understand which data and analytical assets are affected if an integration programme changes.
• Data availability – If a user requests data that is not available, potential sources will need to be researched and assessed in terms of quality, accessibility and suitability for the requested purpose. All this information needs to be documented in the data catalogue for future use.
• Design and deploy – Populating the right analysis component (EDW, ICP and RT) with the right data and technologies from the appropriate source of data, using data integration and quality processes to ensure the data can be trusted. Sensitive data must be identified and protected by encryption or other masking mechanisms.
• Monitoring – The data catalogue must be updated with the latest additions, edits and changes made to the Data Fabric, its data or its analytical assets. Similarly, any changes in data lineage or usage should be monitored.
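The impact analysis mentioned under Discovery can be pictured as a walk over lineage metadata. The Python sketch below is a minimal illustration, assuming lineage is stored as a simple mapping from each source to the assets it feeds; the asset names and the `impacted_assets` helper are hypothetical.

```python
from collections import deque

# Hypothetical lineage: each key feeds the assets listed against it.
LINEAGE = {
    "crm_extract": ["customer_dim"],
    "customer_dim": ["churn_model", "sales_report"],
    "web_logs": ["churn_model"],
    "churn_model": ["retention_dashboard"],
}


def impacted_assets(changed, lineage):
    """Breadth-first walk returning every asset downstream of `changed`."""
    seen = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:  # visit each asset once, even with shared parents
                seen.add(child)
                queue.append(child)
    return sorted(seen)


print(impacted_assets("crm_extract", LINEAGE))
# ['churn_model', 'customer_dim', 'retention_dashboard', 'sales_report']
```

If the integration programme behind `crm_extract` changes, every asset in the returned list is a candidate for review.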
Top tips for success
For the Data Fabric to succeed, organisations must commit to maintaining the integrity of the architectural standards and components it is built on. So, if silos are created as temporary workarounds, these will need to be decommissioned when no longer needed.
Since the value of the Data Fabric depends on the strength of the information gathered in the data catalogue, out-of-date, stale or inaccurate metadata must not be allowed to leak into the catalogue. Finally, simply forklifting legacy analytic components, like an ageing data warehouse, into the fabric could result in integration problems. Ideally, these legacy components should be reviewed and redesigned.
While it is a big undertaking, successful Data Fabric environments are already proving their worth when it comes to enabling companies to leverage data more effectively and unlock the full potential of their data assets to gain competitive advantage.
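One simple way to guard against stale metadata is a periodic freshness check on catalogue entries. The Python sketch below is illustrative only: the 90-day threshold, the `stale_entries` helper, the entry names and the dates are all assumptions made for the example.

```python
from datetime import date, timedelta


def stale_entries(last_verified, today, max_age_days=90):
    """Return names whose metadata was last verified more than max_age_days ago."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(name for name, verified in last_verified.items() if verified < cutoff)


# Hypothetical record of when each entry's metadata was last verified.
last_verified = {
    "monthly_sales_kpi": date(2022, 3, 1),
    "legacy_warehouse_feed": date(2021, 6, 15),
}

print(stale_entries(last_verified, today=date(2022, 3, 31)))
# ['legacy_warehouse_feed']
```

Entries flagged this way can be re-verified or retired before users come to rely on them.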