European Policy Analysis Volume 2, Number 1, Spring 2016 | Page 112

Decision Trees and Random Forests : Machine Learning Techniques to Classify Rare Events
Figure 7 : Decision Tree on Training Set
Table 1 : Cross-table Decision Tree / Training Set
Prediction
Real
FALSE
TRUE
Total
FALSE
366
27
393
TRUE
5
8
13
Total
371
35
406
The cross-table of the results ( Table 1 ) shows the predicted results in the rows and the real results in the columns . For example , the decision tree has predicted 393 times the class “ FALSE ”. Three hundred sixty six of these cases have been correctly detected ( true positives ), while 27 have the real label “ TRUE .” We can see that the decision tree has predicted the right results in 374 ( 366 + 8 ) of 406 cases . This is the classification rate of 97 percent .
Unfortunately , these results are strongly overfitted . If the model is used to predict new data from the test set , the results are much weaker ( Table 3 ). Now , the classification rate reaches “ only ” 87 percent . More dramatic : only one punctuation was rightly detected .
112