
rare event classification task, decision trees, bagging, random forests, and logistic regression are compared and approaches to visualize the results are discussed.
The paper ends with a conclusion summing up the advantages and pitfalls of machine learning in political science .
Statistical Theory and Data Explanation
Machine Learning — A Closer Look

The term “machine learning” means that a computer program (algorithm) changes its performance when new data is provided:

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E. (Mitchell 1997, 2)
Spam filters are a good everyday-life example of machine learning (Conway and White 2012, 73–92). If users add an email to their spam folder, the program analyzes the data of this email and will probably identify similar emails as spam from then on.
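The logic of this everyday example can be sketched in a few lines of code. The snippet below is purely illustrative and not taken from the paper: it assumes Python with scikit-learn, uses made-up email texts, and picks a naive Bayes classifier simply because it is a common choice for text; any supervised classifier would do.

```python
# Toy spam filter: learn from emails the user has already labeled,
# then apply the learned model to new mail.
# Minimal sketch assuming Python with scikit-learn; the texts are invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = [
    "win money now, click here",       # moved to the spam folder
    "cheap pills, limited offer",      # moved to the spam folder
    "meeting agenda for tomorrow",     # kept in the inbox
    "draft of the conference paper",   # kept in the inbox
]
is_spam = [True, True, False, False]

# Turn the email texts into word-count features and fit the classifier
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)
model = MultinomialNB().fit(X, is_spam)

# A new email arrives: messages similar to known spam are flagged as spam
new_mail = vectorizer.transform(["click here to win cheap pills"])
print(model.predict(new_mail))   # -> [ True]
```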
A common subdivision of machine learning is the differentiation between supervised and unsupervised learning. Supervised learning — as in the spam-filter example — relies on a training sample with known values of the response variable, while unsupervised learning algorithms search autonomously for similarities in the data, such as patterns or clusters.1 Within supervised learning techniques, it is common to differentiate between classification and regression problems. The former are tasks in which the response variable is categorical, like TRUE/FALSE or “American,” “European,” “Asian,” and so on. In regression tasks, the response variable is numeric. However, many machine learning algorithms can deal with classification as well as regression problems.
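To make these distinctions concrete, the following sketch fits a decision tree once as a classifier (categorical response) and once as a regressor (numeric response), and then runs a clustering algorithm that receives no response variable at all. It is a hypothetical illustration on simulated data, assuming Python with scikit-learn and NumPy; the paper itself does not prescribe these tools.

```python
# Supervised classification, supervised regression, and unsupervised
# clustering on simulated data (illustrative sketch, not from the paper).
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))                 # four predictor variables

# Classification: the response variable is categorical (TRUE/FALSE)
y_cat = X[:, 0] + X[:, 1] > 0
clf = DecisionTreeClassifier().fit(X, y_cat)
print(clf.predict(X[:2]))                     # predicted categories

# Regression: the response variable is numeric, same family of algorithm
y_num = 2 * X[:, 0] - X[:, 2] + rng.normal(scale=0.5, size=300)
reg = DecisionTreeRegressor().fit(X, y_num)
print(reg.predict(X[:2]))                     # predicted numbers

# Unsupervised learning: no response variable at all; the algorithm
# searches for structure (here: clusters) in the predictors alone
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_[:10])                        # cluster assignments
```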
One of the key advantages of machine learning is its biggest pitfall as well. Machine learning algorithms can easily handle great numbers of variables. Unlike in normal linear regression, it is even possible to have more predictor variables than cases in the data.2 Of course, this makes machine learning computationally intensive, but that is not really a problem, as long as the machine is doing the work.3 In most settings, this flexibility will lead to a situation in which machine learning outperforms the accuracy of classical statistical approaches. But the danger is that machine learning is just too good: “These more complex models can lead to a phenomenon known as overfitting the data, which essentially means they follow the errors, or noise, too closely” (James et al. 2013, 22). Fitting the noise instead of the signal (Silver 2012) means that the model will predict very accurately even those points in the data that deviate only randomly. In consequence, a model that seemed very sound would perform poorly on new data. To overcome this problem, it is common practice to do the work twice. The original data is divided (often randomly) into a training set and a test set. The model is then fitted only on the training data. In a second step, this model is used to predict the test data. To evaluate the model's performance,