Analytics Brio April 2014 | Page 10

Data Analysis - Precautions
Gautam Banerjee [email protected]

The combination of large data sets and observational data means that data analysis is often at risk of drawing misleading inferences. In this blog we describe five of these dangers. Within the operations research and analytics community these problems are well understood. Unfortunately, the solutions are not easy: they often temper enthusiasm and require accepting that rather more complex models are necessary.

Hidden nuances of data collection - Many times the analyst is not aware of how the data were collected or of the hurdles and bottlenecks involved in collecting them. A detailed understanding of that process makes the analysis more holistic and improves the handling of observations that might otherwise lead to diametrically opposite inferences. Say, for example, we are studying telescope observations of galaxies to understand the distance of space objects from Earth. One of the primary assumptions here is that the farther an object is from Earth, the dimmer it appears through the telescope. However, there may be interstellar and intergalactic gas and dust clouds that attenuate radiation, so a dim object is not necessarily a distant one, and the primary assumption is violated. Of course, this phenomenon is known to scientists, but things get messier in situations where the data collection process is not so well understood or, even worse, where the possibility of such effects is not considered at all.

Process change or nonstationarity - Inferences from data analysis are not valid if there is any change in the underlying process or pattern of events. For example, a model of herbivorous animals' browsing propensity on grasslands will be quite useless when the animal population declines rapidly. Any historical data on browsing wil