Intelligent CIO APAC Issue 09 | Page 46

CIO OPINION
refining that raw data into an analytics-ready state, so the data is actionable and accessible for exploration and analysis.
When building a performant data lake, focus not only on the sources and types of data to ingest and the speed of data replication, but also on the ability to transform and refine that data and make it consumption-ready for analytics.
Universal support for a variety of source systems and target platforms, real-time incremental change data capture and pipeline automation (all the way from configuring and managing data pipelines to transforming and refining raw data into curated, analytics-ready data sets) are critical to accelerating value from your data lake.
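As a rough illustration of incremental change data capture, the minimal Python sketch below applies a stream of change events to a target table instead of reloading the source in full. The event shape (`op`, `key`, `row`) and the function name `apply_changes` are assumptions made for illustration, not any particular product's API.

```python
from typing import Any

# Hypothetical change-event shape (an assumption for this sketch):
# {"op": "insert" | "update" | "delete", "key": <primary key>, "row": {...}}
ChangeEvent = dict[str, Any]


def apply_changes(target: dict[Any, dict], events: list[ChangeEvent]) -> dict[Any, dict]:
    """Apply change events, in order, to a keyed copy of the target table."""
    result = dict(target)  # shallow copy so the caller's table is not mutated
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            # Merge the changed columns onto the existing row (empty for inserts).
            result[key] = {**result.get(key, {}), **event["row"]}
        elif op == "delete":
            result.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return result


target = {1: {"name": "Ana", "city": "Perth"}}
events = [
    {"op": "update", "key": 1, "row": {"city": "Sydney"}},
    {"op": "insert", "key": 2, "row": {"name": "Raj", "city": "Pune"}},
    {"op": "delete", "key": 1, "row": {}},
]
synced = apply_changes(target, events)
```

Only the rows that changed are touched, which is what keeps replication fast as source tables grow.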
Value can only be derived from data you trust
Data security, quality, consistency and governance are critical to data lake value. Data lakes can quickly become data swamps if data is dumped without consistent data definitions and metadata models. Check for the ability to auto-generate and augment metadata, tag and secure sensitive data and establish enterprise-wide access controls.
Data in a data lake is of value only if data consumers can understand and use it, verify its origin and trust its quality. An integrated catalog for automated data profiling and metadata generation, lineage, data security and governance is critical to building a successful data lake.
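To make the idea of automated profiling and sensitive-data tagging concrete, here is a minimal Python sketch, assuming rows arrive as dictionaries that share the same keys. The patterns in `SENSITIVE_PATTERNS` and the function names are hypothetical; a real catalog would rely on far richer classifiers.

```python
import re

# Assumed, deliberately simple patterns for illustration only.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "phone": re.compile(r"^\+?\d[\d\s-]{7,}$"),
}


def profile_column(name, values):
    """Generate basic metadata for one column and tag it if it looks sensitive."""
    non_null = [v for v in values if v is not None]
    # A tag applies only when every non-null value matches its pattern.
    tags = sorted(
        tag
        for tag, pattern in SENSITIVE_PATTERNS.items()
        if non_null and all(pattern.match(str(v)) for v in non_null)
    )
    return {
        "column": name,
        "null_ratio": 1 - len(non_null) / len(values) if values else 0.0,
        "distinct": len(set(non_null)),
        "tags": tags,
    }


def profile_table(rows):
    """Pivot rows into columns and profile each column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return [profile_column(name, values) for name, values in columns.items()]


rows = [
    {"customer": "Ana", "contact": "ana@example.com"},
    {"customer": "Raj", "contact": "raj@example.com"},
    {"customer": "Li", "contact": None},
]
catalog = profile_table(rows)
```

In this sketch the `contact` column would be tagged `email`, giving access controls something to act on before the data is exposed to consumers.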
Accessible data is key to unlocking value creation
A key reason a data lake fails to unlock value is the inability to access and consume data at the speed of the market. It is not enough to just store data in the data lake; data must also be usable and accessible to create value. Data consumers' inability to easily find, understand and self-provision desired datasets, or their dependence on data scientists or specialized programmers to extract data, means delayed and dated data.
A user-friendly marketplace capability for search and evaluation, as well as self-service preparation of derivative datasets, can fast-track data lake value realization.
While the original on-prem Hadoop-based model may have outlived its usefulness, cloud migration and advances in integration technologies have given users a new way of storing, processing and refining data, putting it to use in a much more cost- and time-effective way.
But value cannot be unlocked simply by dumping the data into a single pool and hoping for the best. Avoiding a data swamp involves many considerations, not least ensuring that the data poured into the lake is trustworthy and accessible.