In Figure 5, the predictor space is now divided in a way that puts either more "Punctuation" or more "Incremental" values in each region. For classification, we simply look for the dominant category in each region. For example, if Year is less than 1959, it is more likely that there was a punctuation in budgets. If we split the remaining area again, we see that frequent mentions of a topic in the State of the Union speeches of the previous years made punctuations less likely.
This division of the predictor space into regions can be visualized as a decision tree; Figure 6 shows the corresponding tree.
A great advantage of splitting the predictor space into regions is that the direction of a predictor's influence can change: if sou is greater than 1.5, the chance of punctuations increases, but if it is above 54, punctuations become less likely.
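To make this concrete, the following is a minimal sketch in Python, assuming scikit-learn and synthetic stand-ins for the paper's Year and sou predictors (the data and the generating rule below are invented for illustration, not the authors' dataset):

```python
# Minimal sketch: fit a shallow classification tree and inspect its regions.
# The data are synthetic stand-ins for the paper's predictors, not the real dataset.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(42)
n = 200
year = rng.integers(1948, 2011, size=n)   # hypothetical Year values
sou = rng.uniform(0, 60, size=n)          # hypothetical State of the Union mention counts

# Invented rule loosely mimicking the text: early years, or moderate sou
# values, make "Punctuation" (1) more likely than "Incremental" (0).
punctuation = ((year < 1959) | ((sou > 1.5) & (sou <= 54))).astype(int)

X = np.column_stack([year, sou])
tree = DecisionTreeClassifier(max_depth=2, criterion="gini").fit(X, punctuation)

# Each leaf is one region of the predictor space; the prediction in a
# region is simply its dominant class.
print(export_text(tree, feature_names=["Year", "sou"]))
```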
To divide the predictor space into regions, two things are necessary: a criterion to evaluate the quality of a possible split and an algorithm to find splits that optimize this criterion.
In classification trees, the classification error rate for each region would be a compelling evaluation criterion:

$$E = 1 - \max_k(\hat{p}_{mk}).$$

"Here, $\hat{p}_{mk}$ represents the proportion of training observations in the mth region that are from the kth class." (James et al. 2013, 311–312)
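As a quick illustration (a hand-rolled sketch, not code from the paper), the error rate of a single region can be computed directly from its class proportions:

```python
import numpy as np

def classification_error(labels):
    """Classification error rate E = 1 - max_k(p_mk) for one region."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()      # class proportions p_mk
    return 1.0 - p.max()

# Hypothetical region: 8 "Incremental" vs. 2 "Punctuation" observations.
print(classification_error(["Inc"] * 8 + ["Punc"] * 2))  # 0.2
```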
Because the classification error is not sensitive enough for tree growing, in practice the Gini index is often preferred. "The Gini index is defined by

$$G = \sum_{k=1}^{K} \hat{p}_{mk}\,(1 - \hat{p}_{mk}),$$

a measure of total variance across the K classes" (James et al. 2013, 312). The Gini index can be interpreted as a measure of node purity: the fewer observations from different classes a region contains, the lower the Gini index will be. For a region with only one class, the Gini index is zero.
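The purity interpretation is easy to verify numerically; the following sketch (again hand-rolled for illustration) shows the Gini index of an impure and a pure region:

```python
import numpy as np

def gini_index(labels):
    """Gini index G = sum_k p_mk * (1 - p_mk) for one region."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()      # class proportions p_mk
    return float(np.sum(p * (1.0 - p)))

print(gini_index(["Inc"] * 8 + ["Punc"] * 2))  # 0.32 -> impure region
print(gini_index(["Inc"] * 10))                # 0.0  -> pure region
```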
To find the best partition of the predictor space, recursive binary splitting is used, which is a top-down, greedy approach:
The approach is top-down because it begins at the top of the tree (at which point all observations belong to a single region) and then successively splits the predictor space; each split is indicated via two new branches further down on the tree. It is greedy because at each step of the tree-building process, the best split is made at that particular step, rather than looking ahead and picking a split that will lead to a better tree in some future step. (James et al. 2013, 306)
It is recursive because every branch of the tree can be seen as a tree in its own right, to which the same procedure is applied again (Siroky 2009).
The algorithm can be described as follows:

1. Try all possible cutpoints for the first predictor and calculate the resulting evaluation criterion.
2. Choose the cutpoint that reduces the criterion most.
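A minimal sketch of this greedy step, assuming the gini_index helper defined above as the evaluation criterion (this is a generic illustration of cutpoint search, not the authors' implementation): for one predictor, every midpoint between adjacent sorted values is tried as a cutpoint, and the one with the lowest weighted Gini index of the two resulting regions is kept.

```python
import numpy as np

def gini_index(labels):
    """Gini index of one region (as defined above)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * (1.0 - p)))

def best_cutpoint(x, labels):
    """Greedy search for the cutpoint on one predictor that minimizes
    the weighted Gini index of the two resulting regions."""
    order = np.argsort(x)
    x, labels = np.asarray(x)[order], np.asarray(labels)[order]
    best = (None, np.inf)
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue                    # no valid cut between equal values
        cut = (x[i] + x[i - 1]) / 2.0   # candidate cutpoint (midpoint)
        left, right = labels[:i], labels[i:]
        score = (len(left) * gini_index(left)
                 + len(right) * gini_index(right)) / len(x)
        if score < best[1]:
            best = (cut, score)
    return best

# Hypothetical data: the class changes around x = 5.
x = [1, 2, 3, 4, 6, 7, 8, 9]
y = ["Punc", "Punc", "Punc", "Punc", "Inc", "Inc", "Inc", "Inc"]
print(best_cutpoint(x, y))  # (5.0, 0.0): a perfect split
```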