Analytics Magazine, January/February 2014 | Page 58

WORKING WITH DATA

Missing values

The origin, detection, treatment and consequences of missing values in analytics.

BY GERHARD SVOLBA

Missing values, and how to deal with them, are an inevitable problem for statisticians, data miners or anyone else working with analytical data. Missing values in the data create uncertainty for both the analyst and the information consumer, because decisions have to be made without the full picture. Missing values should also prompt a discussion about whether they occur randomly or follow systematic patterns, since they can introduce additional fuzziness and/or bias into the picture.

Missing values can also reduce the number of usable records for the analysis, or force analysts to eliminate variables from it. This happens for a technical reason: many analytical methods, such as regression techniques, neural networks or cluster algorithms, cannot deal with missing values, because they require numeric values that can be used in a mathematical equation. Consequently, if an observation has a missing value in any of the required variables, the whole observation (data record) has to be omitted from the analysis. The other options are to exclude the affected variable from the analysis as a whole, or to insert imputed values for the missing data points. (In descriptive statistics or with decision trees, however, missing values can simply be represented as a separate category.)

Along with the technical consideration of what to do with missing values, it is also important, from a business point of view, to