The Doppler Quarterly Summer 2016 | Page 47

Mapping Data Lake Considerations to the Tenets
1. Strategy & Economics – Data lakes carry their own strategy and economics considerations because they enable better decision making within an organization, which can positively influence revenue and customer satisfaction.
2. Security & Governance – Because of the multitude of data stored in a data lake, security and governance must account for the risks of data being combined and analyzed outside of traditional organizational roles or workflows.
3. Application Portfolio Assessment – Any data lake project should include an evaluation of applications from a data usage perspective, including a documented source-of-record evaluation.
Application Migration – In the case of a data lake, very little application migration work will take place; rather, the focus will be on implementing new capabilities to support the data lake and on integration with existing systems.
4. DevOps – In the case of a data lake, DevOps models allow anyone within an organization to develop analytical models and access a repository of curated data about the organization, so that they can manage their business effectively and test theories.
5. CloudOps – Data lakes have many moving pieces and interconnected systems. Strong CloudOps models for monitoring, response, incident management and staff training ensure stability. CloudOps also includes cost control elements to ensure that services are properly started and stopped, and that management monitors costs for alignment with organizational goals and return on investment.
6. DataOps – Data quality is paramount in a data lake to ensure that decisions and recommendations are grounded in truth. DataOps practices, including metadata management, data linking, quality, curation and archiving, are key elements of all data lake deployments.
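As a rough illustration of the DataOps tenet, the sketch below pairs a catalog entry (metadata, source of record, linked data sets) with a simple completeness check. Every name in it, including `DatasetEntry`, `completeness`, the sample rows and the 0.9 threshold, is hypothetical and chosen only for illustration; a real deployment would rely on its own catalog and quality tooling.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """Hypothetical catalog record for one data set in the lake."""
    name: str
    source_of_record: str
    required_fields: list
    linked_datasets: list = field(default_factory=list)


def completeness(rows, required_fields):
    """Return the fraction of rows in which every required field is populated."""
    if not rows:
        return 0.0
    complete = sum(
        1 for row in rows
        if all(row.get(f) not in (None, "") for f in required_fields)
    )
    return complete / len(rows)


# Illustrative metadata: where the data comes from and what it links to.
orders = DatasetEntry(
    name="orders",
    source_of_record="order_management_system",
    required_fields=["order_id", "customer_id"],
    linked_datasets=["customers"],
)

# Illustrative sample; one row is missing its customer_id.
sample_rows = [
    {"order_id": 1, "customer_id": 17},
    {"order_id": 2, "customer_id": None},
    {"order_id": 3, "customer_id": 42},
    {"order_id": 4, "customer_id": 8},
]

QUALITY_THRESHOLD = 0.9  # assumed cut-off, not taken from the article

score = completeness(sample_rows, orders.required_fields)
is_curated = score >= QUALITY_THRESHOLD  # gate promotion to the curated zone
```

Here three of the four sample rows are complete, so the data set would fall below the assumed threshold and be held back from curated use until the gap is resolved.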
Data Quality & Modeling
The primary function of a data lake is to provide a single repository of diverse data sets that are easily accessible and of high quality and integrity. Data quality is paramount, as is the ability to easily find data sets and related data. A variety of best practices serve as measures of data quality within a data lake:
Schema on Read – Because of the diverse nature of workloads and analytics patterns in a data lake, all schemas should be applied on read. This schema-on-read model allows each analyst to optimize their own data views and relationships.
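A minimal sketch of schema on read, assuming raw events are kept as JSON lines: the data is stored untouched, and each analyst supplies a schema only when reading. The sample records, the two schemas and the `read_with_schema` helper are hypothetical and use only the Python standard library.

```python
import json
from datetime import date

# Raw events are stored exactly as they arrived; no schema is enforced on write.
RAW_EVENTS = [
    '{"customer_id": "17", "amount": "19.99", "ts": "2016-06-01", "channel": "web"}',
    '{"customer_id": "42", "amount": "5.00", "ts": "2016-06-02"}',
]

# Each schema maps a field name to a converter that is applied at read time.
REVENUE_SCHEMA = {"customer_id": int, "amount": float}
ACTIVITY_SCHEMA = {"customer_id": str, "ts": date.fromisoformat, "channel": str}


def read_with_schema(raw_lines, schema):
    """Parse raw JSON lines, keeping and casting only the fields in the schema."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field_name, convert in schema.items():
            value = record.get(field_name)
            # Missing fields become None instead of failing the whole read.
            row[field_name] = convert(value) if value is not None else None
        rows.append(row)
    return rows


# Two analysts read the same raw data through different schemas.
revenue_view = read_with_schema(RAW_EVENTS, REVENUE_SCHEMA)
activity_view = read_with_schema(RAW_EVENTS, ACTIVITY_SCHEMA)
```

The revenue view treats `customer_id` as an integer and ignores timestamps, while the activity view keeps it as a string and parses dates; neither choice constrains how the raw events are stored.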