European Policy Analysis Volume 2, Number 1, Spring 2016 | Page 105

Decision Trees

A commonly used class of machine learning algorithms goes by the name decision trees. Decision trees can be used for regression as well as for classification problems and are suitable for extreme event studies (Frohwein and Lambert 2000). In contrast to most methods in classical statistics, decision trees are not based on any probability density function. This means that no underlying distribution is assumed. Decision trees, therefore, belong to the field of nonparametric statistics.

In tree-based methods, the predictor space is segmented into a number of simpler regions. For each region, the most likely value of the response is calculated separately. To demonstrate this method, a simplified version of the data described earlier is used. The data are limited to a random sample of 50 data points with only the three variables sou, Year, and Punc.

Figure 4 shows the predictor space of this sample data, with the State of the Union speeches (sou) on the x-axis and the Year on the y-axis. Punctuations are represented by triangles and incremental changes by circles. No pattern is immediately apparent. The values “Punctuation” and “Incremental” rather seem to be randomly scattered across the plot. This picture changes, however, once the predictor space is divided into several regions.

Figure 4: Predictor Space of Sample Data
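The idea of segmenting the predictor space and assigning each region its most likely response can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the article: the observations in `SAMPLE` are invented toy values, the cut points `SOU_CUT` and `YEAR_CUT` are chosen by hand, and the function names are hypothetical. A real tree algorithm would search for the cut points itself rather than take them as given.

```python
from collections import Counter

# Hypothetical toy observations (sou, year, label); NOT the article's data.
SAMPLE = [
    (1.0, 1955, "Incremental"),
    (2.5, 1962, "Incremental"),
    (3.0, 1971, "Punctuation"),
    (4.0, 1975, "Incremental"),
    (1.5, 1984, "Incremental"),
    (3.5, 1993, "Incremental"),
    (6.0, 1958, "Punctuation"),
    (7.5, 1966, "Punctuation"),
    (8.0, 1973, "Incremental"),
    (5.5, 1982, "Punctuation"),
    (6.5, 1988, "Punctuation"),
    (9.0, 1995, "Incremental"),
    (7.0, 2001, "Punctuation"),
]

SOU_CUT = 5.0    # assumed cut point on the sou axis
YEAR_CUT = 1980  # assumed cut point on the Year axis


def region_of(sou, year):
    """Map a point to one of four rectangular regions of the predictor space."""
    return (sou >= SOU_CUT, year >= YEAR_CUT)


def fit_regions(rows):
    """Count labels per region and keep the most frequent one as its prediction."""
    counts = {}
    for sou, year, label in rows:
        counts.setdefault(region_of(sou, year), Counter())[label] += 1
    return {region: c.most_common(1)[0][0] for region, c in counts.items()}


def predict(model, sou, year):
    """Predict the label of a new point from the region it falls into."""
    return model[region_of(sou, year)]


model = fit_regions(SAMPLE)
print(predict(model, 2.0, 1960))
print(predict(model, 8.0, 1990))
```

In this toy setting, the two cuts produce four rectangles, and each rectangle predicts whichever label occurs most often among the points it contains. No distributional assumption enters at any step, which is the nonparametric property described above.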