of tools for accessing data through SQL interfaces, tools for storing data as JSON objects, platforms optimized for read-only access, and tools for batch processing of unstructured data. These tools should be considered when designing a data lake, along with the interfaces needed to ingest and process data. Later in the paper we discuss specific AWS and Google technologies for data access and retrieval. A common metadata platform should also be designated to streamline data access.
• Security Controls, Logging & Auditing – Security is a key element of a data lake. Identity management, auditing, and access controls should be designed around the organization's risk tolerance and its compliance requirements. Access controls should be consistent across access methods, so that data a user cannot read through one interface is not reachable through another.
• Deployment & Automation – A major source of operational value in the cloud is the ability to automate deployment and recovery. All data lake functionality should be automated for deployment and recovery, which lowers the operational burden on the IT team when it makes changes and responds to incidents.
• Advanced Capabilities – Advanced capabilities include APIs for data analysis and development toolkits that let teams quickly mock up new analyses and reports.
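Two of the points above, a common metadata platform and access controls that are consistent between access methods, can be illustrated together. The following is a minimal sketch, assuming a single in-memory catalog (the hypothetical `MetadataCatalog` class) that both a SQL-style path and an object-store path consult before returning anything. All class and function names, the `objstore://` location, and the role names are illustrative assumptions, not an AWS or Google API; the SQL path only builds a query string rather than executing it.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """One catalog record: where a dataset lives, its schema, and who may read it."""
    name: str
    location: str                 # hypothetical object-store prefix
    schema: dict[str, str]        # column name -> type name
    readers: set[str] = field(default_factory=set)  # roles allowed to read


class MetadataCatalog:
    """Hypothetical single catalog shared by every access path."""

    def __init__(self) -> None:
        self._entries: dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        self._entries[entry.name] = entry

    def authorize(self, role: str, name: str) -> DatasetEntry:
        # The only place read permission is decided; every access path calls it,
        # so the SQL path and the object path cannot drift apart.
        entry = self._entries[name]
        if role not in entry.readers:
            raise PermissionError(f"{role!r} may not read {name!r}")
        return entry


def sql_select_columns(catalog: MetadataCatalog, role: str, dataset: str, columns: list[str]) -> str:
    """SQL-style path: checks access, validates columns, returns a query string."""
    entry = catalog.authorize(role, dataset)
    unknown = [c for c in columns if c not in entry.schema]
    if unknown:
        raise KeyError(f"unknown columns: {unknown}")
    return f"SELECT {', '.join(columns)} FROM {dataset}"


def object_get_path(catalog: MetadataCatalog, role: str, dataset: str, key: str) -> str:
    """Object-store path: checks access, then resolves a key under the dataset location."""
    entry = catalog.authorize(role, dataset)
    return f"{entry.location.rstrip('/')}/{key}"
```

Because both paths share `authorize`, revoking a role in the catalog removes access through SQL and through direct object reads at the same time.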
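The Deployment & Automation point depends on describing the environment declaratively, so that the same code can deploy it and also rebuild it after an incident. A minimal sketch, assuming the hypothetical `Resource` type and `plan`/`apply_actions` functions below rather than any specific provisioning tool, compares a desired description with the current state and emits only the actions needed to converge. The resource names and kinds are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """A hypothetical data lake component described declaratively."""
    name: str
    kind: str       # illustrative labels such as "bucket" or "catalog"
    config: tuple   # hashable (key, value) pairs


def plan(desired: list[Resource], actual: list[Resource]) -> list[tuple[str, str]]:
    """Return the create/update/delete actions that make `actual` match `desired`."""
    want = {r.name: r for r in desired}
    have = {r.name: r for r in actual}
    actions = []
    for name in sorted(want):
        if name not in have:
            actions.append(("create", name))
        elif have[name] != want[name]:
            actions.append(("update", name))
    for name in sorted(have.keys() - want.keys()):
        actions.append(("delete", name))
    return actions


def apply_actions(actions, desired, actual) -> list[Resource]:
    """Simulate applying actions to an in-memory view of the environment."""
    state = {r.name: r for r in actual}
    want = {r.name: r for r in desired}
    for verb, name in actions:
        if verb == "delete":
            del state[name]
        else:  # create and update both install the desired definition
            state[name] = want[name]
    return list(state.values())
```

Because `plan` compares desired and actual state instead of replaying a fixed script, running it against an empty environment yields a full rebuild, and running it against an already-converged environment yields no actions.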
Figure 5 shows the recommended design pattern for a cloud-based data lake, including connectivity to traditional enterprise systems.
Figure 5: Data Lake Functional Architecture (components shown: Data Integration; Data Lake Data Storage & Retrieval; Object Store; Long Term Archive; Data Lake Data Processing; ETL Engine; Batch Processing; Streaming Analytics; In-Memory Analytics; Search & Indexing; Rules / Matching Engine; Predictive Analytics / Machine Learning; Analytical Reporting; Metadata; Governance Policies; Online Transaction Processing; Data Consumers: Dashboards, Ecommerce, Data Science, BI, Mobile Apps)
SUMMER 2016 | THE DOPPLER | 39