Analytics Magazine, January/February 2014

MISSING VALUES

When looking at the course over time, we might assume that the diagrams in Figure 5 and Figure 6 represent different data (maybe different years or different factories). In truth, both graphs are built from the same data, which is shown in Table 1:

WEEK         1    3    7    8   11
DOWNTIMES   12   16    8    7   15

Table 1

From a purely technical point of view, no values are missing and the “number of failures” is always filled. However, from a content point of view we do indeed have missing data: there is no entry for weeks 2, 4, 5, 6, 9, 10 and 12. It now has to be decided whether this means that in week 2, for example, there were no failures or that no information exists for week 2.

The red line in Figure 5 uses the data as they are and treats a week without a record, such as week 2, as a missing value that is simply interpolated with a straight line. The blue line in Figure 6 treats the missing weeks in the data as weeks with zero failures and interprets the data as shown in Table 2:

WEEK         1    2    3    4    5    6    7    8    9   10   11   12
DOWNTIMES   12    0   16    0    0    0    8    7    0    0   15    0

Table 2

This example illustrates that in time series data we can have missing data in the form of missing records that are not explicitly visible. Only by reviewing the continuity of the time axis do we detect these missing values.

Figure 7 shows methods that are frequently used to detect and impute missing values for time series data. The original data only had 12 records; the records for May, July and November 1949 were automatically inserted into the data, and different replacement values were inserted.

• ZERO VALUE: assuming that if a period is not represented in the data, no traffic occurred for that period
• The LAST KNOWN VALUE: if no information for a period is available, the value is assumed to be the same as in the previous period
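
The detection and replacement steps described above can be sketched in a few lines of Python. This is only an illustration using the five records from Table 1. The function names are hypothetical and do not come from the article or from any particular software package, and the week range 1 to 12 is assumed from Table 2.

```python
# Records from Table 1: only weeks that have an entry are present.
observed = {1: 12, 3: 16, 7: 8, 8: 7, 11: 15}


def find_missing_periods(records, first, last):
    """Return every period in [first, last] that has no record."""
    return [p for p in range(first, last + 1) if p not in records]


def fill_zero(records, first, last):
    """ZERO VALUE: a period without a record is assumed to have the value 0."""
    return {p: records.get(p, 0) for p in range(first, last + 1)}


def fill_last_known(records, first, last):
    """LAST KNOWN VALUE: carry the most recent observed value forward.

    Periods before the first observation stay None.
    """
    filled = {}
    last_value = None
    for p in range(first, last + 1):
        if p in records:
            last_value = records[p]
        filled[p] = last_value
    return filled


def fill_linear(records, first, last):
    """Straight-line interpolation between the neighboring observed periods.

    Periods outside the observed range stay None (no extrapolation).
    """
    known = sorted(records)
    filled = {}
    for p in range(first, last + 1):
        if p in records:
            filled[p] = float(records[p])
            continue
        before = [k for k in known if k < p]
        after = [k for k in known if k > p]
        if not before or not after:
            filled[p] = None
            continue
        lo, hi = before[-1], after[0]
        fraction = (p - lo) / (hi - lo)
        filled[p] = records[lo] + fraction * (records[hi] - records[lo])
    return filled


if __name__ == "__main__":
    print("missing weeks:", find_missing_periods(observed, 1, 12))
    print("zero value:   ", list(fill_zero(observed, 1, 12).values()))
    print("last known:   ", list(fill_last_known(observed, 1, 12).values()))
    print("interpolated: ", list(fill_linear(observed, 1, 12).values()))
```

Under these assumptions, `fill_zero` reproduces the row of Table 2, while `fill_linear` corresponds to the straight-line reading of the same data. Which replacement is appropriate remains a content decision: it depends on whether a missing record means "nothing happened" or "nothing was recorded".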
