European Policy Analysis Volume 2, Number 1, Spring 2016 | Page 115

ROC curve (AUC), the better the model. A purely random classifier has an AUC of 0.5 and is represented by a straight diagonal in the plot. Because the true and false positive rates do not depend on the type of classification model, we can use ROC curves to compare the performance of any classifiers. Figure 9 shows the ROC curves for all three decision tree models: single decision tree (dashed black line), bagging (dashed gray line), and random forest (black line). For comparison, a random classification is added (the dashed light gray diagonal) as well as a logistic regression model (gray line) fitted on exactly the same data.

As can be seen in the plot, the ensemble methods, bagging and random forest, clearly outperform the logistic regression and the single decision tree. These differences become even clearer when looking directly at the AUC. Figure 10 shows a bar plot of the different AUCs, with the critical values 0.5 (random classification) and 0.75 (standard for clinical tests) added as dashed lines.15 Now, we can conclude that the random forest model outperforms the other models.

In data mining, this result could be the end of the analysis.16 The best model is taken to run predictive analytics, and its accuracy leads to sound predictions. But in political science, the focus is normally not on the precision of predictions but on understanding relationships between variables. A good way to interpret a random forest model is to look at the variable importance. For every predictor variable, we can calculate its influence on the final result. As described earlier, splits in decision trees result from optimizing the classification error rate or the Gini coefficient. So, each split in every tree leads to a decrease in these measures. Predictors that lead to stronger decreases are therefore more important for the model. The variable importance plot (Figure 11) shows the mean decreases for all seven predictors.
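To make the idea of a mean decrease concrete, the following is a minimal sketch in plain Python, not the procedure used to produce Figure 11. It assumes the Gini measure is computed as the impurity 1 minus the sum of squared class shares, and it averages the per-split decreases of each predictor over the trees. The predictor names `x1` and `x2`, the toy forest, and the helper functions are hypothetical illustrations; real implementations may additionally weight each split by the share of observations reaching the node.

```python
def gini(labels):
    """Gini impurity of a list of class labels (assumed definition)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_decrease(parent, left, right):
    """Impurity decrease of one split, with children weighted by their size."""
    n = len(parent)
    children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - children


def mean_decrease(forest):
    """Sum the decreases per predictor and average over the number of trees."""
    totals = {}
    for tree in forest:
        for predictor, parent, left, right in tree:
            totals[predictor] = totals.get(predictor, 0.0) + gini_decrease(parent, left, right)
    return {name: total / len(forest) for name, total in totals.items()}


# Hypothetical toy forest: each tree is a list of splits given as
# (predictor name, labels in parent node, labels in left child, labels in right child).
parent = [1, 1, 1, 0, 0, 0]
forest = [
    [("x1", parent, [1, 1, 1], [0, 0, 0])],  # x1 separates the classes perfectly
    [("x2", parent, [1, 1, 0], [1, 0, 0])],  # x2 barely separates them
]

importance = mean_decrease(forest)
for name, value in sorted(importance.items(), key=lambda item: -item[1]):
    print(f"{name}: {value:.4f}")
```

In this toy example the split on `x1` removes all impurity in its tree, so `x1` receives a much larger mean decrease than `x2`, which is how a variable importance plot would rank them.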
Figure 10: Bar Plot of AUCs for Different Models
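AUC values of the kind compared in Figure 10 can be computed directly from a model's predicted scores. The sketch below is a minimal, assumption-laden illustration in plain Python: it sweeps a threshold over the scores to trace the ROC curve and integrates it with the trapezoidal rule. The labels and scores are made-up toy data, not the data of the study, and the function names are hypothetical.

```python
def roc_points(labels, scores):
    """Return the ROC curve as (false positive rate, true positive rate) pairs."""
    positives = sum(labels)
    negatives = len(labels) - positives
    # Sweep the classification threshold from the highest score downwards.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Process tied scores together so that ties produce a diagonal segment.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / negatives, tp / positives))
        i = j
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data (assumed for illustration): 1 = positive class, 0 = negative class.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
informative_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.55, 0.2]
constant_scores = [0.5] * len(labels)  # an uninformative scorer

auc_informative = auc(roc_points(labels, informative_scores))
auc_constant = auc(roc_points(labels, constant_scores))
print(auc_informative, auc_constant)
```

The uninformative scorer, which gives every case the same score, yields only the diagonal from (0, 0) to (1, 1) and therefore an AUC of 0.5, the lower reference line in the bar plot. Because the computation needs nothing but labels and scores, the same two functions apply to any classifier.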