European Policy Analysis Volume 2, Number 1, Spring 2016 | Page 101

the predicted results are compared with the known values of the response of the test data. This approach is called cross-validation.⁴

Why is Machine Learning Useful in Political Science?

On the one hand, it is quite obvious that a society that is affected by “big data” (Mayer-Schönberger and Cukier 2013) in so many ways needs a political data science capable of analyzing these processes. Wherever we have to deal with a huge amount of data that might be poorly structured, machine learning should be at hand. Network data, for example from social media, often falls into this category. Machine learning is extremely powerful on microdata like consumer behavior or real-time sensor data (e.g., GPS data from steadily moving targets). An additional field of application that is perhaps closer to political science is social media (e.g., data from Twitter or Facebook). Finally, spoken language as data has been analyzed very successfully with machine learning algorithms (Suzuki 2009).

In recent years, machine learning has gained more and more attention in social science, but this process seems to be quite slow. Seven years ago, Lazer et al. wrote in Science: “If one were to look at the leading disciplinary journals in economics, sociology, and political science, there would be minimal evidence of an emerging computational social science engaged in quantitative modeling of these new kinds of digital traces” (Lazer et al. 2009, 721). And, in 2012, Jim Giles complained in Nature: “Little data-driven work is making it into top social-science journals” (Giles 2012, 450).
Although this observation is still true when compared with the overwhelming majority of non-machine-learning articles, machine-learning-related work has been published in top political science journals in recent years (Cantú and Saiegh 2011; Grabau and Hegelich 2016; Grimmer and Stewart 2013; Hainmueller and Hazlett 2014; Hegelich, Fraune, and Knollmann 2015; Hill and Jones 2014; Hopkins and King 2010; Montgomery, Hollenbach, and Ward 2012). Nevertheless, machine learning remains a rather exotic approach in political science, and there may be multiple reasons for this. First, machine learning, as will be demonstrated on the following pages, is very different from “normal statistics.” It is not about R² and significance levels, and it requires some effort to get familiar with these methods. Second, although more and more statistical software has integrated machine learning algorithms, state-of-the-art work with these methods requires a good deal of computer science knowledge to obtain, manipulate, and analyze data (Abedin 2014; Ergül 2013). Third, political scientists do not often have to deal with data tables with more than a million rows or datasets exceeding 1 TB. Comparing voter participation in 28 European countries, for example, would probably not reveal any limits of “conventional” statistical approaches. Machine learning, therefore, is definitely no one-size-fits-all solution.

To demonstrate the scope of machine learning approaches, this paper takes data from the policy agenda project (PAP) (www.policyagendas.org) as a test case.⁵ The paper focuses on the question of which attention variables can explain dramatic shifts in annual budgets (punctuations). This is a supervised classification task, specifically a rare-event classification problem.
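As a minimal sketch of the two ideas in this section (comparing predictions with the known responses of held-out test data, and classifying a rare event), the following Python example may help. It uses synthetic data, not PAP data. The single "attention" feature, the 5% event rate, the toy threshold classifier, and all function names are assumptions made purely for illustration; they are not the method used in this paper.

```python
import random

def make_data(n, rare_rate, seed):
    """Synthetic (feature, label) pairs; label 1 marks the rare event."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = 1 if rng.random() < rare_rate else 0
        # Hypothetical "attention" feature, shifted upward for rare events.
        feature = rng.gauss(2.0 if label else 0.0, 1.0)
        rows.append((feature, label))
    return rows

def stratified_split(rows, test_share, seed):
    """Hold out test_share of each class so both parts contain rare events."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in (0, 1):
        group = [r for r in rows if r[1] == cls]
        rng.shuffle(group)
        cut = int(len(group) * test_share)
        test.extend(group[:cut])
        train.extend(group[cut:])
    return train, test

def fit_threshold(train):
    """Toy classifier: midpoint between the two class means of the feature."""
    means = []
    for cls in (0, 1):
        values = [x for x, y in train if y == cls]
        means.append(sum(values) / len(values))
    return (means[0] + means[1]) / 2.0

def evaluate(predictions, labels):
    """Compare predictions with known labels: accuracy and rare-class recall."""
    pairs = list(zip(predictions, labels))
    correct = sum(1 for p, y in pairs if p == y)
    true_pos = sum(1 for p, y in pairs if p == 1 and y == 1)
    positives = sum(labels)
    recall = true_pos / positives if positives else 0.0
    return {"accuracy": correct / len(labels), "recall": recall}

rows = make_data(n=2000, rare_rate=0.05, seed=1)
train, test = stratified_split(rows, test_share=0.3, seed=2)

# Fit on the training data only, then predict the held-out test data.
threshold = fit_threshold(train)
test_labels = [y for _, y in test]
model_scores = evaluate([1 if x > threshold else 0 for x, _ in test], test_labels)

# Baseline that never predicts the rare event.
baseline_scores = evaluate([0] * len(test), test_labels)

print("model:   ", model_scores)
print("baseline:", baseline_scores)
```

In a rare-event setting, a baseline that never predicts the event can reach high accuracy while detecting no events at all, which is why the sketch reports recall on the rare class alongside accuracy.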