Journal on Policy & Complex Systems Volume 3, Issue 2

A Novel Evolutionary Algorithm

Figure 2. Flowchart for the ALPS-based CCEA. For a given target class k, we use the CCEA to evolve an archive of conjunctive clauses (CCs) that have a statistically significant probability of being associated with outcome class k; the CCs can be of arbitrary order, thus representing feature interactions. After running the CCEA, there is an optional post-processing step that can be used to tease out feature associations with target class k that may aid in the selection of causal CCs.

conjunctive clause, then with probability $P_{wc}$ the feature is removed. For this work, we selected a high probability ($P_{wc} = 0.75$) so that mutation favors order reduction and thus aids in evolving parsimonious clauses that contain as few features as possible. If feature $F_i$ is not removed, the corresponding $a_i$ is mutated as follows. If $F_i$ is nominal, we randomly change, add, or delete a categorical value in $a_i$, ensuring that the set remains nonempty and smaller than the universal set of allowable values. If $F_i$ is ordinal or continuous, we randomly change the lower or upper bound of $a_i$, ensuring that the range remains nonempty and narrower than the maximum allowable range.
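The mutation step for a feature already present in a clause can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a clause is assumed to be a dict from feature name to condition, a nominal condition is assumed to be a set of admitted values, and an ordinal or continuous condition is assumed to be an index range over a sorted tuple of allowable values. The names `Nominal`, `Ranged`, `mutate_nominal`, `mutate_range`, and `mutate_feature` are hypothetical; only the removal probability P_wc = 0.75 comes from the text.

```python
import random
from dataclasses import dataclass

P_WC = 0.75  # probability of removing a feature already in the clause (value from the text)


@dataclass
class Nominal:
    values: set          # categorical values admitted by a_i (hypothetical layout)
    universe: frozenset  # every value feature F_i can take


@dataclass
class Ranged:
    levels: tuple  # sorted allowable values of an ordinal/continuous F_i (assumption)
    lo: int        # index in levels of the lower bound of a_i
    hi: int        # index in levels of the upper bound of a_i


def mutate_nominal(cond, rng):
    """Change, add, or delete one value, keeping 0 < |a_i| < |universe|.

    Assumes cond is already valid, so at least one value lies outside a_i.
    """
    inside = sorted(cond.values)
    outside = sorted(cond.universe - cond.values)
    ops = []
    if outside:
        ops.append("change")  # swap one admitted value for one not admitted
    if outside and len(inside) + 1 < len(cond.universe):
        ops.append("add")     # grow the set but keep it smaller than the universe
    if len(inside) > 1:
        ops.append("delete")  # shrink the set but keep it nonempty
    op = rng.choice(ops)
    if op == "change":
        cond.values.remove(rng.choice(inside))
        cond.values.add(rng.choice(outside))
    elif op == "add":
        cond.values.add(rng.choice(outside))
    else:
        cond.values.remove(rng.choice(inside))


def mutate_range(cond, rng):
    """Move the lower or upper bound, keeping lo <= hi and the range narrower than the full range."""
    n = len(cond.levels)
    full = (0, n - 1)
    candidates = []
    for new_lo in range(0, cond.hi + 1):        # candidate moves of the lower bound
        if new_lo != cond.lo and (new_lo, cond.hi) != full:
            candidates.append((new_lo, cond.hi))
    for new_hi in range(cond.lo, n):            # candidate moves of the upper bound
        if new_hi != cond.hi and (cond.lo, new_hi) != full:
            candidates.append((cond.lo, new_hi))
    if candidates:
        cond.lo, cond.hi = rng.choice(candidates)


def mutate_feature(clause, feature, rng=random):
    """Mutate a feature that is already in the clause (dict: feature name -> condition)."""
    if rng.random() < P_WC:
        del clause[feature]  # order reduction: drop F_i from the clause
        return
    cond = clause[feature]
    if isinstance(cond, Nominal):
        mutate_nominal(cond, rng)
    else:
        mutate_range(cond, rng)
```

Drawing uniformly among all valid single-bound moves is just one simple way to honor the constraints; the text does not specify how the new bound or value is drawn.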

Every fifth generation, individuals in layers 1–5 age out of their layers into the next higher age layer, and a new random population is created for layer 1. Those aging out of layer 5 are discarded from the population.
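The age-out rule can be sketched as a cohort shift. This Python sketch assumes that, at each age-out, the whole population of each layer moves up one layer and the occupants of the last layer in the list are the ones discarded; the number of layers is left to the caller. The functions `age_out` and `run_alps`, and the callbacks `evolve_layer` and `new_random_population`, are hypothetical placeholders for the CCEA's own selection, variation, and initialization steps.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")

AGE_GAP = 5  # "every fifth generation" in the text


def age_out(layers: List[List[T]],
            new_random_population: Callable[[], List[T]]) -> List[List[T]]:
    """Move each layer's individuals up one age layer and reseed layer 1.

    layers[0] is layer 1 (youngest). The occupants of the last layer age out
    of the structure and are discarded (assumption about the top layer).
    """
    return [new_random_population()] + [list(layer) for layer in layers[:-1]]


def run_alps(layers: List[List[T]], generations: int,
             evolve_layer: Callable[[List[T]], List[T]],
             new_random_population: Callable[[], List[T]]) -> List[List[T]]:
    """Hypothetical driver: evolve every layer each generation, aging out every AGE_GAP generations."""
    for gen in range(1, generations + 1):
        layers = [evolve_layer(layer) for layer in layers]
        if gen % AGE_GAP == 0:
            layers = age_out(layers, new_random_population)
    return layers
```

Restarting layer 1 with random individuals keeps injecting fresh genetic material, while the older layers retain clauses that have survived longer.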

Fitness Function

To determine whether a conjunctive clause is statistically significant, the CCEA uses the hypergeometric probability mass function (PMF) (Kendall, 1952) as the fitness function. For conjunctive clauses evolved in the CCEA, the hypergeometric PMF is defined as follows:

$$P(x_{match}) = \frac{\binom{X_{tot}}{x_{match}} \binom{N_{tot} - X_{tot}}{n_{match} - x_{match}}}{\binom{N_{tot}}{n_{match}}} \qquad (2)$$

where $N_{tot}$ is the total number of observations that have non-missing values for the feature combination, $X_{tot}$ is the total number of observations in the target class, k, that have non-missing values for the feature combination, $n_{match}$ is the number of observations that match the clause, and $x_{match}$ is the number of observations that match the clause and are in target class k.
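The fitness computation can be sketched directly with Python's `math.comb`. The helper `clause_counts` is hypothetical: it assumes each observation is a dict mapping feature names to values, with `None` marking a missing value, and that a clause is a dict mapping each of its features to a predicate on that feature's value. Observations missing any feature in the clause are excluded from all four counts, following the definitions of N_tot and X_tot above.

```python
from math import comb


def hypergeom_pmf(x_match, n_match, X_tot, N_tot):
    """C(X_tot, x_match) * C(N_tot - X_tot, n_match - x_match) / C(N_tot, n_match)."""
    if x_match < 0 or n_match - x_match < 0:
        return 0.0
    # math.comb returns 0 when k > n, which covers impossible counts.
    return comb(X_tot, x_match) * comb(N_tot - X_tot, n_match - x_match) / comb(N_tot, n_match)


def clause_counts(rows, labels, clause, k):
    """Return (x_match, n_match, X_tot, N_tot) for one clause and target class k.

    rows: list of dicts feature -> value (None = missing); labels: class labels.
    clause: dict feature -> predicate (hypothetical representation).
    """
    N_tot = X_tot = n_match = x_match = 0
    for row, y in zip(rows, labels):
        if any(row.get(f) is None for f in clause):
            continue  # skip observations missing any feature in the clause
        in_k = (y == k)
        N_tot += 1
        X_tot += in_k
        if all(pred(row[f]) for f, pred in clause.items()):
            n_match += 1
            x_match += in_k
    return x_match, n_match, X_tot, N_tot
```

For example, with four usable observations, three of them in class k, and a clause matched by two observations that are both in class k, the counts are (2, 2, 3, 4) and Equation (2) gives 3/6 = 0.5.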

Equation (2) quantifies the likelihood that the observed association be-

