European Policy Analysis Volume 2, Number 1, Spring 2016 | Page 105
Decision Trees
A commonly used class of machine
learning algorithms goes by the name
decision trees. Decision trees can be used
for regression as well as for classification
problems and are suitable for extreme
event studies (Frohwein and Lambert
2000). In contrast to most methods in
classical statistics, decision trees are not
based on any probability density function.
This means that there is no assumption
of any underlying distribution. Decision
trees, therefore, belong to the field of
nonparametric statistics. In tree-based
methods, the predictor space is segmented
into a number of simpler regions. For
each region, the most likely value of the
response is calculated separately.
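The idea of segmenting the predictor space and assigning each region its most likely response can be illustrated with a minimal sketch. The cut points (`sou_cut`, `year_cut`), the helper names, and the toy data points below are invented for illustration and are not taken from the data set used in this article. A real decision tree algorithm would choose the cut points from the data; here they are fixed by hand to keep the example short.

```python
from collections import Counter

def assign_region(sou, year, sou_cut, year_cut):
    """Return a label for the rectangular region that contains the point."""
    return (
        "sou_low" if sou < sou_cut else "sou_high",
        "early" if year < year_cut else "late",
    )

def fit_regions(points, sou_cut, year_cut):
    """Map each region to the most frequent response among its points."""
    votes = {}
    for sou, year, punc in points:
        region = assign_region(sou, year, sou_cut, year_cut)
        votes.setdefault(region, Counter())[punc] += 1
    # The "most likely value" of a categorical response is the majority class.
    return {region: counts.most_common(1)[0][0] for region, counts in votes.items()}

def predict(model, sou, year, sou_cut, year_cut):
    """Look up the majority response of the region a new point falls into."""
    return model.get(assign_region(sou, year, sou_cut, year_cut))

# Hypothetical toy observations: (sou, Year, Punc). Values are made up.
points = [
    (2.0, 1950, "Incremental"),
    (3.0, 1960, "Incremental"),
    (2.5, 1955, "Punctuation"),
    (8.0, 1958, "Punctuation"),
    (9.0, 1965, "Punctuation"),
    (1.0, 1990, "Incremental"),
    (7.5, 1995, "Incremental"),
    (8.5, 2000, "Incremental"),
]

# Hand-picked cut points (an assumption, not estimated from data).
SOU_CUT, YEAR_CUT = 5.0, 1980

model = fit_regions(points, SOU_CUT, YEAR_CUT)
prediction = predict(model, 2.2, 1952, SOU_CUT, YEAR_CUT)
```

With these toy values, each of the four rectangular regions receives its own majority label, so a new observation is classified simply by checking which region it falls into.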
To demonstrate this method, a
simplified version of the data described
earlier is used. The data are limited to a random
sample of 50 data points with only the
three variables sou, Year, and Punc.
Figure 4 shows the predictor
space of this sample data with the State
of the Union speeches (sou) on the x-axis
and the Year on the y-axis. Punctuations
are represented with triangles and
incremental changes with circles. There
is no eye-catching pattern. The values
“Punctuation” and “Incremental” rather
seem to be randomly scattered in the plot.
This picture changes, however, if the
predictor space is divided into several
regions.
Figure 4: Predictor Space of Sample Data