Figure 4: Data Lake Layers and Consumption Patterns
(Diagram labels. Data lake layers: Raw Data; Processed, Standardized, Use Case Specific Data. Consumption patterns: Machine Learning, Ad-hoc Analysis, Reports, Dashboard, Enterprise Search, Interactive Fast Queries.)
lake into a column store platform. Examples of tools to accomplish this would be Google BigQuery, Amazon Redshift, or Azure SQL Data Warehouse.
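As a rough illustration of why a column store suits fast interactive queries, the following Python sketch pivots row-oriented records into per-column lists and answers an aggregate query by reading only the columns it needs. The records, field names, and helper functions are hypothetical, and the sketch does not use the API of any platform named above; real column stores add compression, partitioning, and distributed execution on top of this basic layout.

```python
from collections import defaultdict

# Hypothetical records as they might sit in the lake, one dict per row.
# Assumption: every record carries the same set of fields.
rows = [
    {"region": "east", "product": "a", "revenue": 120.0},
    {"region": "west", "product": "b", "revenue": 80.0},
    {"region": "east", "product": "b", "revenue": 50.0},
]


def to_columns(records):
    """Pivot row-oriented records into one list of values per column."""
    columns = defaultdict(list)
    for record in records:
        for name, value in record.items():
            columns[name].append(value)
    return dict(columns)


def revenue_by_region(columns):
    """Sum revenue per region, reading only the two columns the query needs."""
    totals = defaultdict(float)
    for region, revenue in zip(columns["region"], columns["revenue"]):
        totals[region] += revenue
    return dict(totals)


columns = to_columns(rows)
totals = revenue_by_region(columns)
print(totals)
```

The aggregate never touches the "product" column, which is the property that lets a columnar platform skip most of the stored data for a typical reporting query.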
Interactive Query and Reporting
There are still a large number of use cases that require support for regular SQL query tools to analyze these massive data stores. Apache Hive, Presto, Amazon Athena, and Impala were all developed specifically to support these use cases by creating or utilizing a SQL-friendly schema on top of the raw data.

Data Exploration and Machine Learning
Finally, a category of users who are among the biggest beneficiaries of the data lake are your data scientists, who can now access enterprise-wide data, unfettered by various schemas, and can then explore and mine the data for high-value business insights. Many data science tools are either based on or can work alongside Hadoop-based platforms that access the data lake.

Conclusion
When designed and built well, a data lake removes data silos and opens up flexible enterprise-level exploration and mining of results. The data lake is one of the most essential elements needed to harvest enterprise big data as a core asset, to extract model-based insights from data, and nurture a culture of data-driven decision making.

EDITOR’S NOTE
This is the second article in a multi-part series discussing the strategic considerations and crucial technical details that senior managers and CxOs need to consider in an enterprise-wide analytics infrastructure modernization strategy. We share observations and insights we’ve developed in our role as a partner in these journeys with multiple clients.
SUMMER 2017 | THE DOPPLER | 19