European Policy Analysis Volume 2, Number 1, Spring 2016

Decision Trees and Random Forests: Machine Learning Techniques to Classify Rare Events

Figure 7: Decision Tree on Training Set

Table 1: Cross-table Decision Tree / Training Set

Prediction	Real FALSE	Real TRUE	Total
FALSE	366	27	393
TRUE	5	8	13
Total	371	35	406

The cross-table of the results (Table 1) shows the predicted results in the rows and the real results in the columns. For example, the decision tree predicted the class “FALSE” 393 times. Of these cases, 366 were classified correctly (true negatives), while 27 have the real label “TRUE.” The decision tree thus predicted the right result in 374 (366 + 8) of 406 cases, which is a classification rate of 92 percent.
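The arithmetic behind the cross-table can be retraced in a few lines. The following is a minimal sketch, not the authors' code: the counts are copied from Table 1, while the names `table_1`, `classification_rate`, and `recall` are illustrative assumptions. Besides the overall classification rate, it computes the share of each real class that the tree recovers, which is the quantity of interest when the “TRUE” class is rare.

```python
# Counts copied from Table 1; keys are (predicted, real) label pairs.
# All names in this block are illustrative, not taken from the article.
table_1 = {
    (False, False): 366,  # predicted FALSE, real FALSE
    (False, True): 27,    # predicted FALSE, real TRUE (missed case)
    (True, False): 5,     # predicted TRUE, real FALSE (false alarm)
    (True, True): 8,      # predicted TRUE, real TRUE
}


def classification_rate(table):
    """Share of all cases whose predicted label equals the real label."""
    correct = sum(n for (pred, real), n in table.items() if pred == real)
    return correct / sum(table.values())


def recall(table, label):
    """Share of the cases with the given real label that were predicted as such."""
    real_total = sum(n for (pred, real), n in table.items() if real == label)
    return table[(label, label)] / real_total


print(f"classification rate: {classification_rate(table_1):.3f}")
print(f"recall for TRUE:     {recall(table_1, True):.3f}")
print(f"recall for FALSE:    {recall(table_1, False):.3f}")
```

With the counts of Table 1, the classification rate is 374 / 406, or about 0.92, whereas only 8 of the 35 cases with the real label “TRUE” are recovered (about 0.23). A high overall rate can therefore coexist with poor detection of the rare class.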

Unfortunately, these results are strongly overfitted. When the model is used to predict new data from the test set, the results are much weaker (Table 3). The classification rate now reaches “only” 87 percent. More dramatically, only one punctuation was correctly detected.
