European Policy Analysis Volume 2, Number 1, Spring 2016 | Page 101
the predicted results are compared with
the known values of the response of the
test data. This approach is called cross-validation.4
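The comparison of predicted and known test values can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in this paper: the data are synthetic, the nearest-centroid classifier is a stand-in for any learning algorithm, and all function names are hypothetical.

```python
import random

def fit_centroids(X, y):
    """Learn one mean feature vector (centroid) per class from training data."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, row):
    """Predict the class whose centroid is closest (squared Euclidean distance)."""
    def dist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda label: dist(centroids[label]))

def k_fold_accuracy(X, y, k=5, seed=0):
    """Hold out each of k folds once, train on the rest, and average test accuracy."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    fold_scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        model = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
        # Compare predictions with the known responses of the held-out data.
        hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
        fold_scores.append(hits / len(test_idx))
    return sum(fold_scores) / k

# Synthetic two-class data (an assumption for illustration, not real policy data).
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(100)]
y = [0] * 100 + [1] * 100

cv_accuracy = k_fold_accuracy(X, y)
print(f"Mean accuracy on held-out folds: {cv_accuracy:.3f}")
```

The important design choice is that the model never sees the held-out fold during training, so the reported accuracy reflects performance on data that were not used to fit it.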
Why is Machine Learning Useful in Political Science?
On the one hand, it is quite obvious
that a society that is affected by “big data”
(Mayer-Schönberger and Cukier 2013) in
so many ways needs political data science
capable of analyzing these processes.
Wherever we have to deal with a huge
amount of data that might be poorly
structured, machine learning should be at
hand. Network data—for example, from
social media—often falls in this category.
Machine learning is extremely powerful
on microdata like consumer behavior or
real-time sensor data (e.g., GPS data from
steadily moving targets). An additional
field of application that is perhaps closer
to political science is social media (e.g.,
data from Twitter or Facebook). Finally,
spoken language as data has been analyzed
with machine learning algorithms very
successfully (Suzuki 2009). In recent
years, machine learning has gained more
and more attention in social science, but
this process seems to be quite slow. Seven
years ago, Lazer et al. wrote in Science: “If
one were to look at the leading disciplinary
journals in economics, sociology, and
political science, there would be minimal
evidence of an emerging computational
social science engaged in quantitative
modeling of these new kinds of digital
traces” (Lazer et al. 2009, 721). And, in
2012, Jim Giles complained in Nature:
“Little data-driven work is making it into
top social-science journals” (Giles 2012,
450). Although this observation still
holds when set against the overwhelming
majority of non-machine-learning articles,
in recent years machine-learning-related
work has been published in top political
science journals (Cantú and Saiegh 2011;
Grabau and Hegelich 2016; Grimmer and
Stewart 2013; Hainmueller and Hazlett
2014; Hegelich, Fraune, and Knollmann
2015; Hill and Jones 2014; Hopkins and
King 2010; Montgomery, Hollenbach,
and Ward 2012). Nevertheless, machine
learning remains a rather exotic
approach in political science,
and there may be several reasons for
this. First, machine learning—as will be
demonstrated on the following pages—
is very different from “normal statistics.”
It is not about R² and significance levels,
and it requires some effort to get familiar
with these methods. Second, although
more and more statistical software packages
integrate machine learning algorithms,
state-of-the-art use of these methods requires a
good deal of computer science knowledge
to obtain, manipulate, and analyze
data (Abedin 2014; Ergül 2013). Third,
political scientists do not often have to
deal with data tables with more than
a million rows or datasets exceeding 1
TB. Comparing voter participation in
28 European countries, for example,
would probably not reveal any limits of
“conventional” statistical approaches.
Machine learning, therefore, is definitely
not a one-size-fits-all solution.
To demonstrate the scope of
machine learning approaches, this
paper takes data from the Policy Agendas
Project (PAP) (www.policyagendas.org)
as a test case.5 The paper focuses on the
question of which attention variables can
explain dramatic shifts in annual budgets
(punctuations). This is a supervised
classification task and, more specifically,
a rare-event classification problem.
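Why rare events need special care can be shown with a small Python sketch. All numbers below are invented for illustration and are not taken from the PAP data: we assume 1,000 budget observations of which 50 are punctuations, and we compare a trivial rule that never predicts a punctuation with a hypothetical detector that finds 40 of the 50 events at the cost of 60 false alarms.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives; label 1 marks the rare event."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def scores(y_true, y_pred):
    """Return accuracy, recall, and precision (0.0 when a ratio is undefined)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }

# Assumed data: 50 punctuations (1) among 1,000 observations (illustrative only).
y_true = [1] * 50 + [0] * 950

# Trivial baseline: never predict a punctuation.
always_negative = [0] * 1000

# Hypothetical detector: 40 hits, 10 misses, 60 false alarms, 890 correct rejections.
detector = [1] * 40 + [0] * 10 + [1] * 60 + [0] * 890

baseline_scores = scores(y_true, always_negative)
detector_scores = scores(y_true, detector)
print("baseline:", baseline_scores)
print("detector:", detector_scores)
```

In this constructed case the baseline reaches an accuracy of 0.95 without identifying a single punctuation, while the detector has a lower accuracy of 0.93 but a recall of 0.8. Accuracy alone is therefore a poor yardstick when the class of interest is rare.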