Analytics Magazine, January/February 2014 | Page 23

In our internal data science meetings, we love to think about, tinker with and invent our next-generation algorithms for when a customer is “at scale.” When at scale, a customer has run enough ad campaigns that have created enough data that we can finally apply some of our cutting-edge predictive analytics, machine learning and optimization algorithms. That is, we actually have some big data to work with.

EARLY ON: SMALL DATA

Long before a customer is at scale, they are essentially in a start-up phase. In this phase, terabytes and petabytes are replaced by mere megabytes. A/B/n testing is replaced by just … A. Predictive analytics is replaced by anecdotal evidence. And sample sizes are so small that the concept of statistical significance is, well, insignificant. From a data science perspective, we refer to this as small data.

But despite the lack of data during this start-up phase, customers still expect our platform to optimize their ad campaigns. So how do we approach this situation? We will address this and other similar situations in the “Big Data Dreams, Small Data Reality” column. A few other obvious examples include planning for new businesses, new products or services and new business processes.

A start-up almost certainly lacks the historical data that an established company has collected about its operations, finances or sales and marketing strategies. Yet a new business still needs to plan its future: which products or services to launch, which customers to target, how to set pricing policies, how to promote the brand, how to lay out the website, how much to staff up, and so on. All of these decisions could be aided by data, if only you had some.

In the absence of data, one of the most important parts of planning to make data-driven decisions is how you structure your decision model. Did you include the right objectives, constraints and other assumptions?
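To make the idea of a decision model concrete, here is a minimal sketch of one that runs before any internal data exists. It is an illustration under stated assumptions, not a method taken from this column: the objective (monthly profit), the inputs (visitors, conversion rate, price, fixed cost) and every number are hypothetical stand-ins for gut-feel or benchmark values. Each uncertain input is drawn from a triangular distribution defined by a low, high and most-likely guess, and a simple Monte Carlo loop turns those guesses into a range of outcomes.

```python
import random
import statistics


def simulate_monthly_profit(n_trials=10_000, seed=42):
    """Monte Carlo sketch of a start-up profit model built on assumed inputs.

    All values below are hypothetical placeholders (gut-feel or benchmark
    style guesses), not data from any real business.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same draws
    price = 30.0               # assumed price per order
    fixed_cost = 2_500.0       # assumed monthly fixed cost
    profits = []
    for _ in range(n_trials):
        # random.triangular takes (low, high, mode)
        visitors = rng.triangular(2_000, 10_000, 5_000)
        conversion = rng.triangular(0.01, 0.05, 0.02)
        profits.append(visitors * conversion * price - fixed_cost)
    return profits


profits = simulate_monthly_profit()
mean_profit = statistics.mean(profits)
prob_loss = sum(p < 0 for p in profits) / len(profits)

print(f"Mean simulated monthly profit: {mean_profit:,.0f}")
print(f"Share of trials with a loss:   {prob_loss:.1%}")
```

The point of the sketch is the structure rather than the numbers: once real data starts to arrive, each assumed distribution can be swapped for one estimated from your own observations without changing the rest of the model.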
Even though you have no data, you still have to populate your model with something, for example industry benchmark data, data from public company SEC filings, probability distributions (if you want to use something more sophisticated like Monte Carlo simulation), and yes, even gut-feel values. As you start gathering data, you can transition from those external data sources to your own internal data. But when do you make this transition? How much data is enough data?

In contrast to new businesses, well-established companies, such as the Fortune 500, have databases upon