European Policy Analysis Volume 2, Number 1, Spring 2016 | Page 115
ROC curve (AUC), the better the model.
A purely random classifier has an AUC of
0.5 and is represented by a straight diagonal
in the plot. Because the true and false
positive rates are independent of the
type of classification model, we can use
ROC curves to compare the performance
of any classifier.
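To make the link between the ROC curve and the AUC concrete, the following is a minimal sketch in plain Python and not the code used for the analysis in this article. It sweeps a threshold over predicted scores, collects the resulting false and true positive rates, and integrates the curve with the trapezoidal rule. The function names `roc_points` and `auc` and the toy labels and scores are assumptions made purely for illustration.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points of the ROC curve.

    labels: list of 0/1 class labels (1 = positive class)
    scores: list of predicted scores, higher = more likely positive
    """
    pos = sum(labels)
    neg = len(labels) - pos
    # Visit observations from the highest to the lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Observations with tied scores cross the threshold together.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def auc(points):
    """Area under the ROC points, computed with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical toy data: three positive and three negative cases.
labels = [1, 1, 0, 1, 0, 0]
informative_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
uninformative_scores = [0.5] * 6  # every case receives the same score

auc_informative = auc(roc_points(labels, informative_scores))
auc_uninformative = auc(roc_points(labels, uninformative_scores))
print(auc_informative, auc_uninformative)
```

In this sketch, the uninformative scores produce only the two end points of the diagonal and therefore an AUC of 0.5, while the informative scores yield a value closer to 1. Because the calculation needs only labels and scores, it applies in the same way to any classifier that outputs a score.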
Figure 9 shows the ROC curves
for all three decision tree models: single
decision tree (dashed black line), bagging
(dashed gray line), and random forest
(black line). For comparison, a random
classification is added (the dashed
light gray diagonal) as well as a logistic
regression model (gray line) fitted on
exactly the same data. As can be seen in
the plot, the ensemble methods bagging
and random forest clearly outperform the
logistic regression and the single decision
tree.
These differences become even
clearer when looking directly at the AUC.
Figure 10 shows a bar plot of the different
AUCs with the critical values 0.5 (random
classification) and 0.75 (standard for
clinical tests) added as dashed lines.15
Now, we can conclude that the
random forest model outperforms the
other models. In data mining, this result
could be the end of the analysis.16 The
best model would be used for predictive
analytics, relying on its accuracy to
produce sound predictions. But in political
science, the primary focus is usually not on
the precision of predictions but on
understanding relationships between
variables.
A good way to interpret a random
forest model is to look at the variable
importance. For every predictor variable,
we can calculate its influence on the
final result. As described earlier, splits
in decision trees result from optimizing
the classification error rate or the Gini
coefficient. So, each split in every tree will
lead to a decrease in these measures.
Predictors that lead to stronger decreases,
therefore, are more important for the
model. The variable importance plot
(Figure 11) shows the mean decreases for
all seven predictors.
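The idea behind the mean decrease can be sketched in a few lines of plain Python. This is an illustration under simplifying assumptions, not the implementation behind Figure 11: it uses the Gini impurity of a node as the split criterion, ignores the weighting of nodes by their share of the full sample, and works on a hypothetical two-tree forest with made-up predictor names `x1` and `x2`. The function names `gini`, `gini_decrease`, and `mean_decrease` are likewise assumptions.

```python
def gini(labels):
    """Gini impurity of a node: 1 minus the sum of squared class shares."""
    n = len(labels)
    shares = [labels.count(c) / n for c in set(labels)]
    return 1.0 - sum(s * s for s in shares)


def gini_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


def mean_decrease(forest):
    """Sum the decreases per predictor and average over the number of trees.

    forest: list of trees; each tree is a list of splits given as
            (predictor name, parent labels, left labels, right labels).
    """
    totals = {}
    for tree in forest:
        for name, parent, left, right in tree:
            totals[name] = totals.get(name, 0.0) + gini_decrease(parent, left, right)
    return {name: total / len(forest) for name, total in totals.items()}


# Hypothetical forest: x1 separates the classes cleanly, x2 barely at all.
root = [1, 1, 1, 0, 0, 0]
tree_1 = [("x1", root, [1, 1, 1], [0, 0, 0])]
tree_2 = [
    ("x2", root, [1, 1, 0], [1, 0, 0]),
    ("x1", [1, 1, 0], [1, 1], [0]),
]

importance = mean_decrease([tree_1, tree_2])
for name, value in sorted(importance.items(), key=lambda item: -item[1]):
    print(name, round(value, 3))
```

In this toy forest, `x1` produces the larger average decrease and would therefore appear at the top of a variable importance plot, which mirrors the reading of Figure 11: predictors whose splits purify the nodes more strongly matter more for the model.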
Figure 10: Bar Plot of AUCs for different Models