Intelligent CIO APAC Issue 09

CIO OPINION


refining that raw data into an analytics-ready state, so the data is actionable and accessible for exploration and analysis.

When building a performant data lake, focus not only on the sources and types of data to ingest and the speed of data replication, but also on the ability to transform and refine that data and make it consumption-ready for analytics.

Universal support for a variety of source systems and target platforms, real-time incremental change data capture and pipeline automation, all the way from configuring and managing data pipelines to transforming and refining raw data into curated, analytics-ready data sets, are critical to accelerating value from your data lake.
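To make incremental change data capture concrete, here is a minimal sketch in Python of applying a log of captured changes to a keyed copy of a source table, rather than reloading the whole table. The event shape (`op`, `key`, `row`) and the function name `apply_changes` are assumptions made for illustration; they do not describe any particular product's format.

```python
from typing import Any

# Hypothetical change-event shape (an assumption for this sketch):
#   {"op": "insert" | "update" | "delete", "key": <primary key>, "row": {...}}


def apply_changes(target: dict[Any, dict], changes: list[dict]) -> dict[Any, dict]:
    """Apply captured changes, in order, to a keyed copy of the source table."""
    for change in changes:
        op, key = change["op"], change["key"]
        if op in ("insert", "update"):
            # Upsert: merge the changed columns over whatever is already stored.
            target[key] = {**target.get(key, {}), **change["row"]}
        elif op == "delete":
            # Deleting a key that is already absent is treated as a no-op.
            target.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return target


lake_table = {1: {"name": "Ana", "city": "Perth"}}
change_log = [
    {"op": "update", "key": 1, "row": {"city": "Sydney"}},
    {"op": "insert", "key": 2, "row": {"name": "Ben", "city": "Auckland"}},
    {"op": "delete", "key": 1},
]

apply_changes(lake_table, change_log)
print(lake_table)
```

Only the three changed records are touched, which is what keeps incremental replication cheap compared with a full reload. A production pipeline would also need ordering guarantees and recovery after failure, which this sketch leaves out.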

Value can only be derived from data you trust

Data security, quality, consistency and governance are critical to data lake value. Data lakes can quickly become data swamps if data is dumped without consistent data definitions and metadata models. Check for the ability to auto-generate and augment metadata, tag and secure sensitive data and establish enterprise-wide access controls.
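As a rough illustration of auto-generated metadata and sensitive-data tagging, the Python sketch below profiles a handful of records and flags columns whose values all look like email addresses. The function name `profile`, the metadata fields and the single regular expression are assumptions chosen for this example; a real catalog would rely on far richer rules and classifiers.

```python
import re

# Deliberately simple pattern, assumed for this sketch only.
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def profile(rows: list[dict]) -> dict[str, dict]:
    """Build per-column metadata: observed types, null count and sensitivity tags."""
    columns: dict[str, list] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)

    metadata = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        strings = [v for v in present if isinstance(v, str)]
        tags = []
        # Tag the column only if it holds strings and every one looks like an email.
        if strings and len(strings) == len(present) and all(EMAIL.match(s) for s in strings):
            tags.append("email")
        metadata[name] = {
            "types": sorted({type(v).__name__ for v in present}),
            "nulls": len(values) - len(present),  # counts explicit None values only
            "tags": tags,
            "sensitive": bool(tags),
        }
    return metadata


rows = [
    {"customer_id": 101, "email": "ana@example.com", "city": "Perth"},
    {"customer_id": 102, "email": "ben@example.com", "city": None},
]

catalog = profile(rows)
for column, info in catalog.items():
    print(column, info)
```

Metadata generated this way can then drive access controls, for example by masking any column marked `sensitive` for users without the appropriate role.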

Data in a data lake is of value only if data consumers can understand and use it, verify its origin and trust its quality. An integrated catalog for automated data profiling and metadata generation, lineage, data security and governance is critical to building a successful data lake.

Accessible data is key to unlocking value creation

A key reason for a data lake failing to unlock value is the inability to access and consume data at the speed of the market. It is not enough to just store data in the data lake; data should also be usable and accessible to create value. Data consumers' inability to easily find, understand and self-provision desired datasets, or their dependence on data scientists or specialized programmers to extract data, means delayed and dated data.

A user-friendly marketplace capability for search and evaluation, as well as self-service preparation of derivative datasets, can fast-track data lake value realization.

While the original on-prem Hadoop-based model might have outlived its usefulness, cloud migration and advances in integration technologies have given users a new way of storing, processing and refining data, and of putting it to use in a much more cost- and time-effective way.

But value from the data cannot simply be unlocked by dumping the data into a single pool and hoping for the best. Avoiding a data swamp involves many considerations, not least of all ensuring the data poured into the lake is trustworthy and accessible.

46 INTELLIGENTCIO APAC www.intelligentcio.com
