European Policy Analysis Volume 2, Number 1, Spring 2016 | Page 102

Decision Trees and Random Forests: Machine Learning Techniques to Classify Rare Events
Data Explanation
The PAP data use two different coding systems. Budget data use Office of Management and Budget (OMB) functions and subfunctions, whereas attention data are coded according to the PAP's own coding scheme. Eleven topics correspond closely across both coding schemes and were therefore selected for the analysis. The topics are: “National Defense,” “International Affairs,” “Energy,” “Natural Resources and Environment,” “Agriculture,” “Transportation,” “Education, Training, Employment, and Social Services,” “Health,” “Social Security,” “Administration of Justice,” and “General Government.”
For each of the 11 selected topics, the annual percentage budget shifts were recorded together with the corresponding year and legislative period. Figure 1 shows the histogram of the annual percentage budget shifts for all 11 topics. The distribution clearly does not follow the bell shape of the normal distribution (black line). Instead, there are far too many incremental changes and many extreme values. To decide whether a budget shift counts as a punctuation, the interquartile range is calculated for each topic, following the approach of Hegelich, Fraune, and Knollmann (2015). The data contain 553 cases that are not punctuations and 57 cases that are counted as punctuations, so punctuations make up about 9% of the 610 cases.
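The per-topic interquartile-range rule can be illustrated with a minimal sketch. It assumes the conventional Tukey fence, where a shift is flagged if it lies more than 1.5 interquartile ranges below the first or above the third quartile; the exact threshold used by Hegelich, Fraune, and Knollmann (2015) is not stated here, so the multiplier `k`, the function name `flag_punctuations`, and the example numbers are all hypothetical.

```python
from statistics import quantiles


def flag_punctuations(shifts, k=1.5):
    """Return one boolean per shift: True if it lies outside the IQR fences.

    k=1.5 is the conventional Tukey multiplier and an assumption here,
    not a value taken from the article.
    """
    # Quartiles of this topic's annual percentage budget shifts.
    q1, _, q3 = quantiles(shifts, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [s < lower or s > upper for s in shifts]


# Made-up annual percentage budget shifts, for illustration only.
shifts_by_topic = {
    "Energy": [1.0, 2.0, -1.0, 0.5, 3.0, 45.0, 1.5, -0.5],
    "Health": [4.0, 5.0, 3.5, 4.5, 6.0, 5.5, 4.0, -20.0],
}

# The fences are computed separately for each topic, as in the text.
flags_by_topic = {
    topic: flag_punctuations(shifts)
    for topic, shifts in shifts_by_topic.items()
}

n_punctuations = sum(sum(flags) for flags in flags_by_topic.values())
n_cases = sum(len(flags) for flags in flags_by_topic.values())
print(f"{n_punctuations} punctuations among {n_cases} cases")
```

Because the fences are computed per topic, a shift that is ordinary for a volatile budget function can still count as a punctuation for a stable one. This is also what produces the strong class imbalance between punctuations and non-punctuations.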
Figure 1: Annual Percentage Changes in US Budget Functions