Figure 7: Methods to detect and impute missing values in time series data.
• The MEAN VALUE: assuming that a missing value is best represented by the mean of the existing values in the time series
• The INTERPOLATED VALUE: here a most likely value is found based on spline interpolations
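The two replacement strategies listed above can be sketched in a few lines of Python. This is only an illustrative sketch, not the SAS procedure the article refers to: the function names `impute_mean` and `impute_linear` and the sample series are made up for this example, missing values are assumed to be encoded as `None`, and plain linear interpolation is used as a simpler stand-in for spline interpolation.

```python
from statistics import mean


def impute_mean(series):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in series if v is not None]
    if not observed:
        raise ValueError("series has no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in series]


def impute_linear(series):
    """Fill interior gaps by linear interpolation between observed neighbours.

    Linear interpolation is a stand-in for the spline interpolation named
    in the text. Leading and trailing gaps are left as None.
    """
    result = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = series[left] + step * (i - left)
    return result


# Hypothetical series with three missing observations.
sales = [10.0, None, 14.0, None, None, 20.0]
print(impute_mean(sales))
print(impute_linear(sales))
```

The mean replaces every hole with the same value and therefore flattens any trend, while interpolation follows the neighbouring observations. Which of the two is appropriate depends on the business interpretation of the holes.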
The analysis has to check where data points are missing (where the time series has “holes”) and how these holes should be interpreted from a business point of view. These considerations then determine how the missing values should be imputed.
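Finding the holes is a matter of comparing the observed time points with the time points that should exist. The following minimal sketch assumes a daily series and a known start and end date. The function name `find_holes` and the sample dates are hypothetical and chosen only for illustration.

```python
from datetime import date, timedelta


def find_holes(observed_days, start, end):
    """Return every day between start and end (inclusive) with no observation."""
    seen = set(observed_days)
    n_days = (end - start).days + 1
    expected = (start + timedelta(days=i) for i in range(n_days))
    return [d for d in expected if d not in seen]


# Hypothetical daily series in which two days were never recorded.
observed = [date(2014, 1, 1), date(2014, 1, 2), date(2014, 1, 5)]
holes = find_holes(observed, date(2014, 1, 1), date(2014, 1, 5))
print(holes)
```

Whether a hole means "no activity on that day" (a true zero) or "activity not recorded" (an unknown value) is the business question that decides how it should be filled.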
SUMMARY
In analytics, missing values are more than just a technical feature of the data. Business considerations are needed to decide how they shall be detected and handled. The aim is to get a more complete picture and to remove biases and patterns. Analytical methods help to detect missing values, to provide optimal replacement values and to simulate the consequences on model quality.

Gerhard Svolba ([email protected]) works for SAS Austria as an analytic solution architect. He is the author of the SAS Press books “Data Preparation for Analytics Using SAS” and “Data Quality for Analytics Using SAS” and speaks at international analytics conferences about the necessary pre-steps before statistical analyses can start. To download the presentations click here.
ANALYTICS | JANUARY/FEBRUARY 2014 | 65