MISSING VALUES
Looking at the course over time, we might assume that
the diagrams in Figure 5 and Figure 6 represent different
data (maybe different years or different factories).
In truth, both graphs are built from the same data,
which is shown in Table 1:
WEEK   DOWNTIMES
1      12
3      16
7      8
8      7
11     15
Table 1
From a purely technical point of view,
no values are missing: the “number of
failures” is filled in every record. However, from a
content point of view we do indeed have
missing data: there is no entry for weeks
2, 4, 5, 6, 9 and 10. It now has to be decided whether this means that in week 2,
for example, there were no failures or that no information
exists for week 2.
The red line in Figure 5 uses the data
as they are, so the value for a missing week
such as week 2 is simply interpolated with
a straight line. The blue line in Figure 6
treats the missing weeks in the data as
weeks with zero failures and interprets
the data as shown in Table 2:
WEEK   DOWNTIMES
1      12
2      0
3      16
4      0
5      0
6      0
7      8
8      7
9      0
10     0
11     15
12     0
Table 2
This example illustrates that in time-series data we can have missing data in
the form of missing records that are not
explicitly visible. Only by reviewing the
continuity of the time axis do we detect
these missing values.
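The check on the continuity of the time axis can be sketched in a few lines of Python. This is only an illustration using the Table 1 data; the function names are hypothetical, and the observation period of weeks 1 to 12 is an assumption taken from Table 2. It lists the weeks without a record and then builds both interpretations discussed above: zero filling (Figure 6) and straight-line interpolation (Figure 5).

```python
# Observed records from Table 1: week -> downtimes.
observed = {1: 12, 3: 16, 7: 8, 8: 7, 11: 15}


def find_missing_weeks(records, first_week, last_week):
    """Return the weeks in [first_week, last_week] that have no record."""
    return [w for w in range(first_week, last_week + 1) if w not in records]


def fill_with_zero(records, first_week, last_week):
    """Treat every absent week as a week with zero downtimes."""
    return {w: records.get(w, 0) for w in range(first_week, last_week + 1)}


def interpolate_linear(records, first_week, last_week):
    """Place absent weeks on the straight line between neighbouring records.

    Weeks before the first or after the last record cannot be
    interpolated and are left as None.
    """
    known = sorted(records)
    filled = {}
    for w in range(first_week, last_week + 1):
        if w in records:
            filled[w] = float(records[w])
            continue
        before = [k for k in known if k < w]
        after = [k for k in known if k > w]
        if not before or not after:
            filled[w] = None
            continue
        lo, hi = before[-1], after[0]
        share = (w - lo) / (hi - lo)
        filled[w] = records[lo] + share * (records[hi] - records[lo])
    return filled


# Assumption: the observation period runs from week 1 to week 12.
missing = find_missing_weeks(observed, 1, 12)
zero_filled = fill_with_zero(observed, 1, 12)
interpolated = interpolate_linear(observed, 1, 12)

print("weeks without a record:", missing)
print("week 2, zero filled:", zero_filled[2])
print("week 2, interpolated:", interpolated[2])
```

For week 2 the two interpretations give 0 and 14 downtimes respectively, which is why the two figures look so different although they start from the same five records.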
Figure 7 shows methods that are
frequently used to detect and impute
missing values for time series data. The
original data had only 12 records; the records for May, July and November 1949
were automatically inserted into the
data, and different replacement values
were filled in.
• ZERO VALUE: assuming that if a
period is not represented in the data,
no traffic occurred for that period
• LAST KNOWN VALUE: if no
information is available for a period,
the value is assumed to be the same
as in the previous period
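The two replacement strategies listed so far can be compared with a small Python sketch. It is a minimal illustration, not the method behind Figure 7: the helper name `impute` and its `method` argument are hypothetical, and the weekly downtimes from Table 1 stand in for the monthly series, with weeks 1 to 12 assumed as the observation period.

```python
# Observed records from Table 1: week -> downtimes.
observed = {1: 12, 3: 16, 7: 8, 8: 7, 11: 15}


def impute(records, periods, method):
    """Insert a record for every period, using the chosen replacement value.

    method="zero":        absent periods get the value 0.
    method="last_known":  absent periods repeat the most recent known value
                          (None if there is no earlier record).
    """
    filled = {}
    last_known = None
    for p in periods:
        if p in records:
            last_known = records[p]
            filled[p] = records[p]
        elif method == "zero":
            filled[p] = 0
        elif method == "last_known":
            filled[p] = last_known
        else:
            raise ValueError(f"unknown method: {method}")
    return filled


# Assumption: the observation period runs from week 1 to week 12.
weeks = range(1, 13)
zero_value = impute(observed, weeks, "zero")
last_known_value = impute(observed, weeks, "last_known")

for w in weeks:
    print(w, zero_value[w], last_known_value[w])
```

Which replacement is appropriate depends on what an absent record means in the business process: zero fits when no record implies that nothing happened, while the last known value fits quantities that persist until a new value is reported.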