Analytics Magazine, September/October 2014 | Page 14

Analyze This!

All of the students learn that these capabilities are truly essential if they are to answer even moderately challenging business questions whose answers are data-driven.

way to do a meaningful real-world analytics project without doing a significant amount of data manipulation work to get the data into shape for whatever analysis needs to be done. As the New York Times’ Steve Lohr recently reported [2], this need for extensive “data wrangling” is as great for professional data science teams as it is for my students.

This is where the fun begins. How do you manipulate data without writing a computer program? The most common first step is to use Excel to manipulate raw data into the required format, often in awkward and unnatural ways. Students start there not only because of Excel’s increasingly powerful capabilities for sorting, searching and summarizing but also because it is an environment with which they already have extensive experience and comfort. One of last year’s project teams, upon finally determining a particular statistical analysis that would provide the client with some unique insights, spent several hours perusing and posting on http://www.mrexcel.com/ before ultimately figuring out how to do the lookup/summary calculations that were needed to prepare the data.

As the limitations of the Excel platform become apparent, some teams have no choice but to get educated on other tools such as Python, SQL and R. In class, we often hold short workshops to help support this learning process, some led by me and others organized by the students. But all of this consumes valuable time on the project calendars, and this data wrangling is often the source of a great deal of stress,