The Doppler Quarterly Summer 2016 | Page 36

Building a Platform for Machine Learning & Analytics

Joey Jablonski
Introduction
Predictive analytics and supporting technologies like machine learning require access to diverse data sets and powerful, scalable compute resources. These capabilities enable organizations to use large amounts of data from social media, online journeys, the Internet of Things (IoT) and other sources to make data-driven decisions across the organization. Storing the information that powers predictive analytics and machine learning workloads in a data lake empowers staff across the organization to analyze data, test theories and drive changes to business processes, the customer experience and products.
A data lake is not meant to replace existing systems. Rather, it is an integration point between existing data platforms that enables a seamless view into all of an organization’s data. A data lake complements existing systems by ensuring that analytical workloads, development, testing and machine learning model creation do not impact production workloads in other performance-optimized systems. Ultimately, a data lake is a concept, and while it involves some specific technologies and workflows, its value lies in the connectivity between the core of the data lake and the supporting business and operational systems.
Building a data lake requires organizations to assess data strategy, infrastructure architecture and workflows, to ensure that the available data is of high quality, is linked for rapid analysis, and neither exposes the organization to risk through data compromise nor creates compliance challenges. Figure 1 shows the common steps an organization takes as it begins a data lake project, and the key considerations, both technical and organizational, that must be addressed for a successful data lake implementation.