Analytics Magazine, January/February 2014 | Page 63

SYSTEMATIC MISSING VALUES REALLY MATTER

From the above example we learned that systematic patterns in the occurrence of missing values really matter. To illustrate the quantitative effect of random or systematic missing values on model quality, simulation studies have been performed. For scenarios with varying proportions and different types of missing values, values were set to “missing” in the analysis data. In a next step, the missing data were imputed with the mean before being used in the predictive model. The results in Figure 4 show a remarkable difference in model quality between the types of missing values: the blue and green lines, which represent the scenarios with random missing values, lie higher than their counterparts from the scenarios with systematic missing values (red and brown lines).

How do I know that something is missing? This question may sound trivial; a missing value in a table can be recognized by the fact that a cell is empty. Aunt Susanne’s missing date of birth value is one example. Another customer may refuse to answer a question, so no value is entered in, for example, the field “number of children.” Such cases can be easily detected and selected by database queries. But not all missing values reveal themselves in such an explicit way. Consider Figure 5 and Figure 6, which show machine downtime in a factory over 12 weeks.

Figure 5 (top) and Figure 6 (bottom): Machine downtime in a factory over 12 weeks.
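The simulation design described in this section (set values to missing either at random or systematically, impute with the mean, then fit a model) can be sketched in a few lines of Python. This is a minimal illustration, not the setup behind Figure 4: the synthetic data, the simple linear regression, R² as the quality measure, and the systematic mechanism (the largest values of the input are the ones that go missing) are all assumptions made for the example, and the function name is hypothetical.

```python
import random
from statistics import fmean


def r2_after_mean_imputation(missing_fraction, systematic, n=5000, seed=0):
    """R^2 of a simple linear regression of y on a mean-imputed input x."""
    rng = random.Random(seed)
    # Assumed synthetic data: y depends linearly on x plus noise.
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [2.0 * xi + rng.gauss(0.0, 0.5) for xi in x]

    n_missing = round(missing_fraction * n)
    if systematic:
        # Assumed systematic mechanism: the largest x values are lost.
        order = sorted(range(n), key=lambda i: x[i])
        missing = set(order[n - n_missing:])
    else:
        # Random mechanism: every record is equally likely to be lost.
        missing = set(rng.sample(range(n), n_missing))

    # Mean imputation: replace each missing value with the observed mean.
    observed_mean = fmean(x[i] for i in range(n) if i not in missing)
    x_imp = [observed_mean if i in missing else x[i] for i in range(n)]

    # For a one-input linear model with intercept, R^2 equals the
    # squared Pearson correlation between the input and the target.
    mx, my = fmean(x_imp), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x_imp, y))
    sxx = sum((a - mx) ** 2 for a in x_imp)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


if __name__ == "__main__":
    for fraction in (0.1, 0.3, 0.5):
        r_random = r2_after_mean_imputation(fraction, systematic=False)
        r_system = r2_after_mean_imputation(fraction, systematic=True)
        print(f"{fraction:.0%} missing: random R^2={r_random:.3f}, "
              f"systematic R^2={r_system:.3f}")
```

Because both scenarios share the same seed, they start from identical data, so any gap in R² comes from the missingness mechanism alone. Under these assumptions the systematic scenario should score lower than the random one at the same proportion of missing values, which is the pattern the article describes.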