Analyze This!
All of the students learn
that these capabilities are
truly essential in order
for them to be able to
answer even moderately
challenging business
questions whose answers
are data-driven.
way to do a meaningful real-world analytics project without doing a significant amount of data
manipulation work to get the data into shape
for whatever analysis needs to be done. As the
New York Times’ Steve Lohr recently reported
[2], this need for extensive “data wrangling” is as
great for professional data science teams as it is
for my students.
This is where the fun begins. How do you
manipulate data without writing a computer program? The most common first step is to use
Excel to manipulate raw data into the required
format, often in awkward and unnatural ways.
This is not only because of Excel’s increasingly
powerful capabilities for sorting, searching and
summarizing but also because this is an environment with which the students already have extensive
experience and comfort. One of last year’s project teams, upon finally determining a particular
statistical analysis that would provide the client
with some unique insights, spent several hours
perusing and posting on http://www.mrexcel.com/ before ultimately figuring out how to do the
lookup/summary calculations that were needed
to prepare the data.
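
For readers curious what a lookup/summary step of this kind might look like outside of Excel, here is a minimal Python sketch. It is not the team's actual calculation, which is not described here. The tables, column names and the function summarize_by_region are all hypothetical, invented purely for illustration. The idea is a spreadsheet-style lookup (attach a region to each order) followed by a conditional sum (total the amounts per region).

```python
from collections import defaultdict

# Hypothetical data, invented for illustration only.
orders = [
    {"order_id": 1, "customer_id": "C1", "amount": 120.0},
    {"order_id": 2, "customer_id": "C2", "amount": 75.5},
    {"order_id": 3, "customer_id": "C1", "amount": 30.0},
    {"order_id": 4, "customer_id": "C3", "amount": 210.0},
]
customers = [
    {"customer_id": "C1", "region": "East"},
    {"customer_id": "C2", "region": "West"},
    {"customer_id": "C3", "region": "East"},
]


def summarize_by_region(orders, customers):
    """Total order amounts per region (hypothetical helper)."""
    # Lookup step: build a customer_id -> region table, much as a
    # spreadsheet lookup formula would consult a reference range.
    region_of = {c["customer_id"]: c["region"] for c in customers}

    # Summary step: accumulate amounts by the looked-up region.
    totals = defaultdict(float)
    for order in orders:
        # Orders with no matching customer are grouped under "Unknown".
        region = region_of.get(order["customer_id"], "Unknown")
        totals[region] += order["amount"]
    return dict(totals)


region_totals = summarize_by_region(orders, customers)
print(region_totals)
```

The same two steps map onto a join followed by a GROUP BY in SQL, or a merge followed by an aggregation in R; the dictionary-based version above simply keeps the sketch free of extra libraries.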
As the limitations of the Excel platform become apparent, some teams have no choice
but to learn other tools such as Python, SQL and R. In class, we often hold short
workshops to help support this learning process,
some led by me and others organized by the
students. But all of this consumes valuable time
on the project calendars, and this data wrangling is often the source of a great deal of stress,
analytics-magazine.org
www.informs.org