MISSING VALUES
record the variables AGE
and DATA TRAFFIC are
missing and the variable
GENDER is not missing.

Figure 3: Representation of missing records.

With such a representation (Figure 3), one can see at a glance that about 60 percent of the records do not have a single missing value (pattern 000000000000) and that another 30 percent have a missing value in only one of the variables (light blue). The small red cells show groups of records in which five or more variables are missing. This information is important for deciding whether missing values should be imputed by analytical methods or not.

Figure 4: The x-axis represents the increasing proportion of missing values; the y-axis shows the relative average response rate of the predictive model.

Such a representation method is well suited to detecting patterns of missing values in the variables. It answers the question of which missing values occur together, and it helps to define segments in the data. In our case we would probably find a segment “Aunt Susanne and her friends.” Customers in this segment should be treated differently in marketing actions and probably have demand for specific hardware (a phone with large keys, simple usage, etc.). They may also need special assistance through the customer care hotline.
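A simple way to answer “which missing values occur together” is to count, for each pair of variables, the records in which both are missing. Records that share an identical missing-value pattern can then be grouped as candidate segments. The sketch below does this with the standard library only. It is one possible approach under stated assumptions, not the article's own procedure. Missing values are again assumed to be `None`, and the function names, the `min_size` parameter and the variable `HANDSET` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_missing_counts(records, variables):
    """For each pair of variables, count records in which both are missing.
    Pairs are ordered as in `variables`, so keys are consistent."""
    counts = Counter()
    for r in records:
        missing = [v for v in variables if r.get(v) is None]
        for pair in combinations(missing, 2):
            counts[pair] += 1
    return counts


def segments_by_pattern(records, variables, min_size=1):
    """Group record indices by their set of missing variables.
    Groups smaller than `min_size` are dropped."""
    groups = defaultdict(list)
    for i, r in enumerate(records):
        key = tuple(v for v in variables if r.get(v) is None)
        groups[key].append(i)
    return {k: idx for k, idx in groups.items() if len(idx) >= min_size}
```

A large count for a pair points to variables that tend to be missing jointly. A sizeable group returned by `segments_by_pattern` is only a candidate segment, and whether it corresponds to a business segment such as the one named above still has to be checked against the other customer attributes.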
62 | ANALYTICS-MAGAZINE.ORG | WWW.INFORMS.ORG