Journal on Policy & Complex Systems Volume 3, Issue 2 | Page 133

ranges of values for both ordinal and nominal features, the number of two-way interactions (i.e., bivariate models) is on the order of hundreds of thousands (Table 2). For fifth-order interactions, there are over one trillion possible models for two of the three datasets.
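To illustrate why the number of candidate models grows so quickly with interaction order, the following minimal sketch counts clauses under stated assumptions. It assumes a nominal feature may take any non-empty proper subset of its values and an ordinal feature any contiguous range short of the full range; the function names and the example value counts are hypothetical and are not the counts behind Table 2.

```python
from typing import List


def nominal_choices(n_values: int) -> int:
    """Non-empty proper subsets of a nominal feature's values (assumed)."""
    return 2 ** n_values - 2


def ordinal_choices(n_values: int) -> int:
    """Contiguous ranges of an ordinal feature, excluding the full range (assumed)."""
    return n_values * (n_values + 1) // 2 - 1


def count_models(choices_per_feature: List[int], order: int) -> int:
    """Count models that combine exactly `order` distinct features.

    This is the elementary symmetric polynomial of the per-feature
    choice counts, built up one feature at a time.
    """
    # e[j] holds the number of models using exactly j of the features seen so far.
    e = [1] + [0] * order
    for c in choices_per_feature:
        # Iterate downward so each feature is used at most once per model.
        for j in range(order, 0, -1):
            e[j] += e[j - 1] * c
    return e[order]


# Hypothetical example: two nominal features and one ordinal feature.
choices = [nominal_choices(3), nominal_choices(4), ordinal_choices(5)]
pairwise_models = count_models(choices, order=2)
```

Because each added feature multiplies in its own number of value sets, the count rises far faster than the number of feature combinations alone.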
Conjunctive Clause Evolutionary Algorithm
We designed the CCEA to search the survey data for multivariate interactions across multiple data types (i.e., nominal, ordinal, and continuous) (Hanley et al., 2016, 2017). The CCEA is a nonparametric statistical tool that searches across the entire range of multivariate interactions for each feature, where each feature can comprise feature sets (ranges of values) that vary in size. The only assumption inherent in the models evolved by the CCEA is that ordinal and continuous features must be monotonic or unimodal. The CCEA evolves feature sets and the range of feature values using conjunctive clauses in the following form:
$$CC_k := F_i \in a_i \wedge F_j \in a_j \wedge \dots$$

where the term := means "is defined as," $F_i$ represents a feature $i$ that may be nominal, ordinal, or continuous and whose value lies in $a_i$, and $\wedge$ represents conjunction (i.e., logical AND). Note that $a_i$ is a specified range or set of values that is a proper non-empty subset of a pre-specified universal set or a maximum range of each feature. The meaning of such a clause is interpreted as "if $CC_k$ is true for a given input feature vector, then the class outcome is predicted to be associated with $k$." Association in this case means that the clause is more often associated with $k$ than one would expect given the global distribution of $k$. Each one of the clauses, or groups of clauses, could be used as a classifier by stating that if $CC_k$ matches an input feature vector, then classify it as $k$, else classify it as $\neg k$.
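A conjunctive clause of this kind can be sketched as a small data structure. The encoding below is an assumption made for illustration, not the authors' implementation: nominal conditions are stored as sets, ordinal or continuous conditions as inclusive (low, high) ranges, and the feature names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple, Union

# A condition a_i: a set of allowed categories (nominal) or an inclusive
# (low, high) range (ordinal or continuous). Hypothetical encoding.
Condition = Union[FrozenSet[str], Tuple[float, float]]


@dataclass
class ConjunctiveClause:
    outcome: str                      # the class k associated with the clause
    conditions: Dict[str, Condition]  # feature name -> a_i

    def matches(self, x: Dict[str, object]) -> bool:
        """Return True only if every feature in the clause lies in its a_i (logical AND)."""
        for feature, allowed in self.conditions.items():
            value = x[feature]
            if isinstance(allowed, frozenset):
                if value not in allowed:
                    return False
            else:
                low, high = allowed
                if not (low <= value <= high):
                    return False
        return True

    def classify(self, x: Dict[str, object]) -> str:
        """Classify as k when the clause matches, otherwise as 'not k'."""
        return self.outcome if self.matches(x) else f"not {self.outcome}"


# Hypothetical clause over one nominal and one ordinal feature.
clause = ConjunctiveClause(
    outcome="k",
    conditions={"region": frozenset({"north", "east"}), "age_group": (2, 4)},
)
respondent = {"region": "north", "age_group": 3, "household_size": 5}
label = clause.classify(respondent)
```

Features that do not appear in the clause, such as `household_size` here, are simply ignored when matching, so a clause constrains only the features it names.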
The CCEA is implemented using a customized version of an Age-Layered Population Structure (ALPS) (Hornby, 2006), with five linearly spaced age layers and an age gap of 5 (Figure 2). In this study, we restrict each CCEA layer to a population size of L (where L is 64, the total number of features in the input vectors). In the CCEA, an additional sixth layer is used as an archive of probabilistically significant clauses. The CCEA was run for 200 generations and five repetitions.
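The run configuration described above can be written down as a short sketch. The age limits are an assumption: they take "linearly spaced" to mean that layer n admits individuals up to age n times the age gap, which may differ from the customized ALPS actually used. All names are hypothetical.

```python
from typing import List

NUM_AGE_LAYERS = 5   # evolving layers; the archive is an additional sixth layer
AGE_GAP = 5
L = 64               # population size per layer = number of input features
GENERATIONS = 200
REPETITIONS = 5


def linear_age_limits(num_layers: int, age_gap: int) -> List[int]:
    """Assumed linear scheme: the maximum age of layer n is n * age_gap."""
    return [age_gap * (n + 1) for n in range(num_layers)]


age_limits = linear_age_limits(NUM_AGE_LAYERS, AGE_GAP)

# Maximum number of clauses held in the five evolving layers at once.
evolving_capacity = NUM_AGE_LAYERS * L
```

Under these assumptions the five evolving layers hold at most 320 clauses in total, not counting the archive.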
At the start of the first generation (and every 5 generations thereafter), a novel population of clauses, each with age 1, is introduced into the first age layer. During each generation, all of the individuals in layers 1–5, plus up to L × 5 of the youngest individuals from the archived layer 6 (or fewer, if the archive does not yet hold this many individuals), are selected to reproduce with variation. The ages of these selected parents are incremented by 1, and they remain in the population. Variation is introduced either through crossover (with probability $P_x = 0.5$) or through mutation. If selected for crossover, a second parent is selected from the same or preceding (if one exists) age layer, using tournament selection with replacement (tournament size of 3); the age of the second parent is not incremented.
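The parent-selection step can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the `fitness` attribute and the rule that the highest-fitness contestant wins the tournament are assumptions, since the fitness measure is not specified here, and the crossover and mutation operators themselves are left out. The class and function names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Individual:
    clause: object          # placeholder for a conjunctive clause
    age: int = 1
    fitness: float = 0.0    # hypothetical score; higher is assumed better


def select_parents(layers: List[List[Individual]], L: int) -> List[Tuple[int, Individual]]:
    """All individuals in the evolving layers plus up to L * 5 of the youngest archived ones.

    `layers` holds the evolving layers followed by the archive as its last element.
    """
    parents = [(idx, ind) for idx, layer in enumerate(layers[:-1]) for ind in layer]
    archive_idx = len(layers) - 1
    youngest = sorted(layers[-1], key=lambda ind: ind.age)[: L * 5]
    parents += [(archive_idx, ind) for ind in youngest]
    return parents


def tournament(pool: List[Individual], rng: random.Random, size: int = 3) -> Individual:
    """Draw `size` contestants with replacement and return the fittest (assumed criterion)."""
    contestants = [rng.choice(pool) for _ in range(size)]
    return max(contestants, key=lambda ind: ind.fitness)


def plan_variation(layers, rng, p_x: float = 0.5, L: int = 64):
    """Return (operator, parent, second_parent) triples for one generation."""
    plans: List[Tuple[str, Individual, Optional[Individual]]] = []
    for idx, parent in select_parents(layers, L):
        parent.age += 1  # selected parents age by 1 and remain in the population
        if rng.random() < p_x:
            # Second parent comes from the same layer or the preceding one, if it exists.
            pool = layers[idx] + (layers[idx - 1] if idx > 0 else [])
            mate = tournament(pool, rng)  # acting as second parent does not add to its age
            plans.append(("crossover", parent, mate))
        else:
            plans.append(("mutation", parent, None))
    return plans
```

The sketch allows a parent to be drawn as its own second parent, because the text does not say whether this is excluded.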
If selected for mutation, each feature from the parent is selected with probability 1/L (if zero features were initially selected, we select one at random). For each feature $i$ that was selected, if the selected feature is not present in the clause, then the feature is added to the clause, and $a_i$ is randomly initialized to a nonempty set or range of allowable values that does not include the entire allowable set or range of values. However, if the feature is present in the