SYSTEMATIC MISSING VALUES REALLY MATTER
From the above example we learned that systematic patterns in the
occurrence of missing values really matter. To illustrate the
quantitative effect of random or systematic missing values on model
quality, simulation studies were performed. For scenarios with varying
proportions and different types of missing values, values in the
analysis data were set to “missing.” As a next step, the missing
values were imputed with the mean before the data were used in the
predictive model.
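
The simulation procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the study behind Figure 4: the data, the one-predictor linear model, the use of R² as the quality measure, and the function names `simulate` and `r_squared` are all assumptions made for this example. "Systematic" is assumed here to mean that the largest values of the predictor are the ones that go missing.

```python
import random
import statistics


def r_squared(x, y):
    """R^2 of a least-squares line y ~ x (equals the squared correlation)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def simulate(n=2000, proportion=0.3, systematic=False, seed=0):
    """Set a share of x to missing, impute with the mean, return model R^2."""
    rng = random.Random(seed)
    # Hypothetical data: one predictor with a linear effect plus noise.
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [2.0 * xi + rng.gauss(0.0, 0.5) for xi in x]

    k = int(proportion * n)
    if systematic:
        # Assumed systematic pattern: the k largest x values are lost.
        by_value = sorted(range(n), key=lambda i: x[i])
        missing = set(by_value[n - k:])
    else:
        # Random pattern: every record is equally likely to be lost.
        missing = set(rng.sample(range(n), k))

    # Mean imputation uses only the values that are still observed.
    observed_mean = statistics.fmean(
        x[i] for i in range(n) if i not in missing
    )
    x_imputed = [observed_mean if i in missing else x[i] for i in range(n)]
    return r_squared(x_imputed, y)


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5):
        print(p, simulate(proportion=p), simulate(proportion=p, systematic=True))
```

In this toy setup both scenarios start from the same data, so any gap in R² comes from the missing-value pattern alone. Proportions should stay below 1 so that some observed values remain for computing the mean.
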
From the results shown in Figure 4, there is clearly a remarkable
difference in model quality when dealing with different types of
missing values; the blue and green lines represent the scenarios with
random missing values and lie higher than their counterparts from the
scenarios with systematic missing values (red and brown lines).

How do I know that something is missing? This question may sound
trivial; a missing value in a table can be recognized by the fact that
a cell is empty. Aunt Susanne’s missing date of birth value is one
example. Another customer may refuse to answer a question so no value
is entered in, for example, the field “number of children.” Such cases
can be easily detected and selected by database queries. But not all
missing values reveal themselves in such an explicit way. Consider
Figure 5 and Figure 6, which show machine downtime in a factory over
12 weeks.

Figure 5 (top) and Figure 6 (bottom): Machine downtime in a factory
over 12 weeks.
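
For the explicit case mentioned above, where an empty field is found by a database query, a minimal sketch might look as follows. The table `customers`, its columns, and the sample rows are hypothetical and exist only for illustration; Python's built-in `sqlite3` module is used so the example runs without a database server.

```python
import sqlite3

# In-memory database holding a small, hypothetical customer table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "name TEXT, date_of_birth TEXT, number_of_children INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("Susanne", None, 2),              # date of birth was never entered
        ("Customer B", "1975-04-12", None),  # declined to answer
        ("Customer C", "1980-09-30", 1),   # complete record
    ],
)

# Explicit missing values are stored as NULL and can be selected directly.
rows = conn.execute(
    "SELECT name FROM customers "
    "WHERE date_of_birth IS NULL OR number_of_children IS NULL "
    "ORDER BY name"
).fetchall()
incomplete = [name for (name,) in rows]
print(incomplete)
conn.close()
```

A query like this only finds values that are explicitly empty; it cannot reveal records that are absent altogether.
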
ANALYTICS | JANUARY/FEBRUARY 2014 | 63