[Diagram labels: Google Based Data Lake; Data Integration; Data Lake Data Processing; Data Lake Data Storage & Retrieval; Data Consumers; Pub/Sub; Metadata; Rules / Matching Engine; Governance Policies; Streaming Analytics; ETL Engine; Google Cloud Storage; Hadoop on Google Compute Engine; BigQuery; Big Table; In-Memory Analytics; Search & Indexing; GoogleML; Predictive API; Dashboards; Ecommerce; Data Science; BI; Mobile Apps]
Figure 6: Google Hosted Data Lake

Key Google data lake technologies and capabilities include:
Operational Aspects
Pub/Sub – Pub/Sub provides managed publish/subscribe messaging, giving developers a simple way to share data between systems and tools.

Scalability & Performance
BigQuery – BigQuery provides a highly scalable platform for analyzing data sets that are predominantly read-heavy. Because BigQuery is a PaaS offering, it places little operational overhead on the IT organization.

Data Access & Retrieval
Google Cloud Storage – Google Cloud Storage provides an object interface for storing historical and archive data.
Hadoop on Google Compute Engine – Google offers multiple vendor solutions for running Hadoop on Google Compute Engine. In a data lake, Hadoop can serve as a scalable batch processing environment that feeds processed, prepared data to other systems, including BigQuery.

Advanced Capabilities
Google Machine Learning – Google Machine Learning lets developers use pre-trained models, or train their own, for rapid analysis of data.
Predictive API – Google Predictive API identifies patterns in data quickly, without standing up additional servers or services.
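
To make the flow between these components concrete, the following is a minimal in-memory sketch in Python, not an implementation against any Google API. It assumes a publish/subscribe topic fans events out to a raw archive (standing in for object storage) and to a staging area, a batch step prepares the staged records (standing in for Hadoop), and the prepared rows are aggregated in a read-heavy query (standing in for BigQuery). Every name in it (Topic, prepare, total_by_channel, run_demo) and all sample data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Message = Dict[str, object]


class Topic:
    """In-memory stand-in for a publish/subscribe topic (not the Pub/Sub API)."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Message], None]] = []

    def subscribe(self, handler: Callable[[Message], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, message: Message) -> None:
        # Each subscriber gets its own copy, so producers never need to
        # know which systems consume the data.
        for handler in self._subscribers:
            handler(dict(message))


def prepare(raw: List[Message]) -> List[Message]:
    """Batch step: drop incomplete records and normalize the fields."""
    return [
        {"channel": str(m["channel"]).lower(), "amount": float(m["amount"])}
        for m in raw
        if m.get("channel") and m.get("amount") is not None
    ]


def total_by_channel(table: List[Message]) -> Dict[str, float]:
    """Read-heavy analytical query: total amount per channel."""
    totals: Dict[str, float] = defaultdict(float)
    for row in table:
        totals[row["channel"]] += row["amount"]
    return dict(totals)


def run_demo():
    topic = Topic()
    archive: List[Message] = []  # hypothetical raw history (object storage role)
    staging: List[Message] = []  # hypothetical input to the batch step
    topic.subscribe(archive.append)
    topic.subscribe(staging.append)

    # Made-up sample events; the last one is incomplete on purpose.
    for event in [
        {"channel": "Ecommerce", "amount": 20},
        {"channel": "Mobile", "amount": 5.5},
        {"channel": "ecommerce", "amount": 4.5},
        {"channel": None, "amount": 1},
    ]:
        topic.publish(event)

    analytics_table = prepare(staging)  # prepared data fed to the analytics store
    return archive, total_by_channel(analytics_table)
```

Calling run_demo() returns the untouched raw archive alongside the per-channel totals. The point of the design is that the archive keeps every event as received, while only cleaned, prepared rows reach the analytical query.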
42 | THE DOPPLER | SUMMER 2016