TECHNICAL GUIDE
How to Guide: Architecture Patterns to Consider When Designing an Enterprise Data Lake
Sudi Bhattacharya and Neal Matthews
This article focuses on the business value of enterprise data lakes, on designing for storage, security, and governance, and on how to use your big data as a core asset from which to extract valuable insights.
The Business Case
Let's start with the standard definition of a data lake:
"A data lake is a storage repository that holds a vast amount of raw data in its native format, including structured, semi-structured, and unstructured data. The data structure and requirements are not defined until the data is needed."
...and a question: Why should you care?

Speed
In today's dynamic business environment, new data consumption requirements and use cases emerge extremely rapidly. By the time a requirements document reflecting the requested changes to data stores or schemas has been prepared, users have often moved on to a different, or even contradictory, set of schema changes. In contrast, the entire philosophy of a data lake revolves around being ready for an unknown use case. When the source data sits in one central lake, with no single controlling structure or schema embedded within it, supporting a new use case can be much more straightforward.

Innovation
In a large enterprise, perhaps the most powerful impact of a data lake is the enablement of innovation. We have seen many multi-billion-dollar organizations struggling to establish a culture of data-driven insight and innovation. They get bogged down by the structural silos that isolate departmental or divisional data stores, silos that are mirrored by massive organizational politics around data ownership. While far from trivial to implement, an enterprise data lake provides the necessary foundation to clear away the enterprise-wide data access problem at its roots. The door to previously unavailable exploratory analysis and data mining opens up, enabling completely new possibilities.

Self Service
What is the average time between a request made to IT for a report and the eventual delivery of a robust, working report in your organization? In far too many cases, the answer is measured in weeks or even months. With a properly designed data lake and a well-trained business community, one can truly enable self-service business intelligence: give business users access to whatever slice of the data they need, and let them develop the reports they want, using any of a wide range of tools. IT becomes the custodian of the infrastructure
12 | THE DOPPLER | SUMMER 2017