The Doppler Quarterly Winter 2018 | Page 59

The current state of the art in DL is AlphaGo Zero from Google’s DeepMind. The combination of DeepMind’s algorithms and Google’s vast amounts of raw data has been making great progress toward solving difficult problems, such as image and speech recognition. We also cannot forget that DeepMind’s earlier AlphaGo handily beat the Go world champion, and that AlphaGo Zero went on to beat that earlier program.

Elements of AI

Now that we’ve provided a breakdown of the larger topic, let’s take a look at some of the parts that need to be considered in tackling AI.

Data Ingestion

The large volumes of data we referred to earlier first need to be captured before we can do anything with them. This is where data ingestion comes into play. Think of data sources like social media streams, corporate transaction systems and sensor data (aka the Internet of Things). This data, whether in the form of files, transactions or streams, is often pulled into and stored in a repository. With virtually unlimited storage capacity and relatively low costs, the public cloud provides an attractive destination.

Data Munging

We use the term data munging to encompass a few concepts that generally account for 80-90% of the overall effort involved in AI. These include:

• ETL (extract, transform, load), to get the data into a common format
• Cleansing, to remove incomplete or corrupt data
• Deduplication, to remove duplicate data that might be pulled in from different sources
• Enrichment, to add in third-party data that may provide a more complete data set to analyze

Much of this process can be automated, but there is no magic way to avoid the still laborious job of getting all your data ready for the data scientists to start analyzing.

Data Analytics

Once your data has been ingested and munged into a usable state, you can begin to apply the computational techniques of Machine Learning and Deep Learning. This is not an exact science and generally involves quite a bit of trial and error.
It is also important to factor in a healthy dose of applicable domain knowledge. For example, if you’re looking at marketing data, you’d better be working with someone who understands the type of marketing you’re doing. Or if you’re looking to improve predictive maintenance for industrial machinery, you’d better include someone who knows the ins and outs of how those machines tick.
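
To make the data munging steps described earlier a little more concrete, here is a minimal sketch in Python of how the four concepts (ETL into a common format, cleansing, deduplication and enrichment) might fit together. Everything in it is an assumption made for illustration: the two sources (CRM_ROWS and WEB_ROWS), their field names, the REGION_BY_DOMAIN lookup standing in for third-party data, and all of the function names are hypothetical. Real pipelines typically rely on dedicated tooling and deal with far messier data.

```python
from datetime import datetime

# Hypothetical raw records from two sources that use different
# field names and date formats (illustrative data only).
CRM_ROWS = [
    {"Email": "Ana@Example.com ", "Amount": "19.99", "Date": "2018-01-05"},
    {"Email": "bob@example.com", "Amount": "", "Date": "2018-01-06"},  # incomplete
]
WEB_ROWS = [
    {"email": "ana@example.com", "total": 19.99, "ts": "01/05/2018"},  # duplicate
    {"email": "cy@example.com", "total": 5.00, "ts": "01/07/2018"},
]

# Hypothetical stand-in for third-party data used for enrichment.
REGION_BY_DOMAIN = {"example.com": "US"}


def transform_crm(row):
    """ETL step: map a CRM row into the common format."""
    return {
        "email": row["Email"].strip().lower(),
        "amount": float(row["Amount"]) if row["Amount"] else None,
        "date": datetime.strptime(row["Date"], "%Y-%m-%d").date(),
    }


def transform_web(row):
    """ETL step: map a web row into the same common format."""
    return {
        "email": row["email"].strip().lower(),
        "amount": float(row["total"]),
        "date": datetime.strptime(row["ts"], "%m/%d/%Y").date(),
    }


def cleanse(records):
    """Drop records that have any missing or empty field."""
    return [
        r for r in records
        if all(v is not None and v != "" for v in r.values())
    ]


def deduplicate(records):
    """Keep the first occurrence of each (email, amount, date) combination."""
    seen = set()
    unique = []
    for r in records:
        key = (r["email"], r["amount"], r["date"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def enrich(records, region_by_domain):
    """Add a region field looked up from the email domain."""
    enriched = []
    for r in records:
        domain = r["email"].split("@")[-1]
        enriched.append({**r, "region": region_by_domain.get(domain, "unknown")})
    return enriched


def munge(crm_rows, web_rows, region_by_domain):
    """Run the four steps in order: transform, cleanse, deduplicate, enrich."""
    common = [transform_crm(r) for r in crm_rows]
    common += [transform_web(r) for r in web_rows]
    return enrich(deduplicate(cleanse(common)), region_by_domain)


if __name__ == "__main__":
    for record in munge(CRM_ROWS, WEB_ROWS, REGION_BY_DOMAIN):
        print(record)
```

Even in this toy version, most of the code deals with reconciling formats and discarding bad rows rather than with analysis, which is in keeping with the share of effort the munging stage is said to consume.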