Journal on Policy & Complex Systems Volume 3, Issue 2 | Page 134

A Novel Evolutionary Algorithm
Figure 2. Flowchart for the ALPS-based CCEA. For a given target class k, we use the CCEA to evolve an archive of conjunctive clauses (CCs) that have a statistically significant probability of being associated with outcome class k; the CCs can be of arbitrary order, thus representing feature interactions. After running the CCEA, there is an optional post-processing step that can be used to tease out feature associations with target class k that may aid in the selection of causal CCs.
conjunctive clause, then with probability $P_{wc}$ the feature is removed. For this work, we selected a high probability ($P_{wc} = 0.75$) so that mutation favors order reduction and thus aids in evolving parsimonious clauses that contain as few features as possible. If the feature $F_i$ is not removed, then the corresponding $a_i$ is mutated as follows. If $F_i$ is nominal, we randomly change, add, or delete a categorical value in $a_i$, ensuring that the set remains nonempty and smaller than the universal set of allowable values. If $F_i$ is ordinal or continuous, we randomly change the lower or upper bound of $a_i$, ensuring that the range remains nonempty and narrower than the maximum allowable range.
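The mutation step described above can be sketched in Python as follows. This is a minimal illustration under assumptions not stated in the text: a clause is a dictionary mapping a feature to either a frozenset of categories (nominal) or a closed `(lo, hi)` interval (ordinal or continuous); one feature already in the clause is chosen uniformly at random; the last remaining feature is never removed; and new bounds are drawn uniformly from the allowable range. The names `mutate_clause`, `_mutate_nominal`, and `_mutate_range` are hypothetical.

```python
import random

P_WC = 0.75  # probability of removing the selected feature (value used in the text)


def _mutate_nominal(values, universe, rng):
    """Change, add, or delete one category, keeping a nonempty proper subset."""
    outside = sorted(universe - values)  # categories not yet in the set
    ops = ["change"] if outside else []
    if len(values) + 1 < len(universe):
        ops.append("add")      # still a proper subset after adding one value
    if len(values) > 1:
        ops.append("delete")   # still nonempty after deleting one value
    op = rng.choice(ops)
    if op == "add":
        return values | {rng.choice(outside)}
    if op == "delete":
        return values - {rng.choice(sorted(values))}
    # "change": swap one member for one non-member
    dropped = rng.choice(sorted(values))
    return (values - {dropped}) | {rng.choice(outside)}


def _mutate_range(bounds, domain, is_ordinal, rng, max_tries=100):
    """Move the lower or upper bound, keeping lo <= hi and a range narrower than the domain."""
    lo, hi = bounds
    vmin, vmax = domain
    draw = rng.randint if is_ordinal else rng.uniform  # integer vs real-valued bounds
    for _ in range(max_tries):
        if rng.random() < 0.5:
            cand = (draw(vmin, hi), hi)   # new lower bound in [vmin, hi]
        else:
            cand = (lo, draw(lo, vmax))   # new upper bound in [lo, vmax]
        if cand[0] <= cand[1] and cand[1] - cand[0] < vmax - vmin:
            return cand
    return bounds  # assumption: keep the old range if no valid move was found


def mutate_clause(clause, feature_types, domains, rng=random):
    """Return a mutated copy of a clause (hypothetical representation).

    clause:        feature -> frozenset of categories, or (lo, hi) interval
    feature_types: feature -> "nominal" | "ordinal" | "continuous"
    domains:       feature -> frozenset of all categories, or (vmin, vmax)
    """
    child = dict(clause)
    f = rng.choice(sorted(child))  # assumption: pick one clause feature at random
    # Remove F_i with probability P_WC (assumption: never empty the clause).
    if len(child) > 1 and rng.random() < P_WC:
        del child[f]
    elif feature_types[f] == "nominal":
        child[f] = _mutate_nominal(child[f], domains[f], rng)
    else:
        is_ordinal = feature_types[f] == "ordinal"
        child[f] = _mutate_range(child[f], domains[f], is_ordinal, rng)
    return child
```

Because removal is attempted before any value change, setting `P_WC` high biases the search toward shorter, lower-order clauses, which is the stated rationale for choosing 0.75.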
Every fifth generation, individuals in layers 1–5 age out of their layers into the next higher age layer, and a new random population is created for layer 1. Individuals aging out of layer 5 are discarded from the population.
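Read literally, this turnover shifts each layer up one level every fifth generation. The sketch below assumes the five layers are stored as a Python list of lists, with `layers[0]` as layer 1, and that all individuals in a layer move up together; a full ALPS implementation may handle migration into an occupied layer differently. `age_layers` and `new_random_population` are hypothetical names.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")

AGE_GAP = 5     # layers turn over every fifth generation (from the text)
NUM_LAYERS = 5  # age layers 1-5 (from the text)


def age_layers(layers: List[List[T]], generation: int,
               new_random_population: Callable[[], List[T]]) -> List[List[T]]:
    """Move every layer up one level on every AGE_GAP-th generation.

    layers[0] is layer 1 and layers[-1] is layer 5. On a turnover generation,
    layer j's individuals become layer j+1, layer 5's are discarded, and
    layer 1 is refilled with a fresh random population.
    """
    assert len(layers) == NUM_LAYERS
    if generation <= 0 or generation % AGE_GAP != 0:
        return layers  # not a turnover generation
    return [new_random_population()] + layers[:-1]


layers = [[f"L{j}"] for j in range(1, NUM_LAYERS + 1)]
layers = age_layers(layers, 5, lambda: ["fresh"])
print(layers)  # [['fresh'], ['L1'], ['L2'], ['L3'], ['L4']]
```

The periodic injection of a fresh random layer 1 is what lets the search keep introducing new genetic material instead of converging on one lineage.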
Fitness Function
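To determine whether a conjunctive clause is statistically significant, the CCEA uses the hypergeometric probability mass function (PMF) (Kendall, 1952) as the fitness function. For conjunctive clauses evolved in the CCEA, the hypergeometric PMF is defined as follows:

$$H(N_{tot}, X_{tot}, n_{match}, x_{match}) = \frac{\binom{X_{tot}}{x_{match}}\binom{N_{tot}-X_{tot}}{n_{match}-x_{match}}}{\binom{N_{tot}}{n_{match}}} \qquad (2)$$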
where $N_{tot}$ is the total number of observations that have non-missing values for the feature combination, $X_{tot}$ is the total number of observations in the target class $k$ that have non-missing values for the feature combination, $n_{match}$ is the number of observations whose features match a given clause, and $x_{match}$ is the number of observations that match the clause and are in target class $k$.
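As a sketch of how these four counts and the hypergeometric PMF could be computed, the Python code below counts them for one clause and then evaluates $\binom{X_{tot}}{x_{match}}\binom{N_{tot}-X_{tot}}{n_{match}-x_{match}} / \binom{N_{tot}}{n_{match}}$ with `math.comb`. The record format (dictionaries with `None` marking a missing value), the clause format, the function names, and the toy data are all invented for illustration. "Non-missing values for the feature combination" is interpreted here as every feature used by the clause being present.

```python
from math import comb


def matches(value, condition):
    """A nominal condition is a set of categories; otherwise a closed (lo, hi) interval."""
    if isinstance(condition, (set, frozenset)):
        return value in condition
    lo, hi = condition
    return lo <= value <= hi


def clause_counts(records, labels, clause, target):
    """Return (N_tot, X_tot, n_match, x_match) for one clause and target class.

    Records missing any feature used by the clause are skipped entirely.
    """
    n_tot = x_tot = n_match = x_match = 0
    for rec, y in zip(records, labels):
        if any(rec.get(f) is None for f in clause):
            continue
        in_class = int(y == target)
        n_tot += 1
        x_tot += in_class
        if all(matches(rec[f], cond) for f, cond in clause.items()):
            n_match += 1
            x_match += in_class
    return n_tot, x_tot, n_match, x_match


def hypergeometric_pmf(n_tot, x_tot, n_match, x_match):
    """C(X_tot, x_match) * C(N_tot - X_tot, n_match - x_match) / C(N_tot, n_match)."""
    return (comb(x_tot, x_match) * comb(n_tot - x_tot, n_match - x_match)
            / comb(n_tot, n_match))


# Hypothetical toy data: one record is missing "age".
records = [
    {"age": 34, "smoker": "yes"}, {"age": 51, "smoker": "no"},
    {"age": 47, "smoker": "yes"}, {"age": None, "smoker": "yes"},
    {"age": 62, "smoker": "yes"}, {"age": 29, "smoker": "no"},
]
labels = [1, 0, 1, 1, 1, 0]
clause = {"smoker": frozenset({"yes"}), "age": (30, 65)}
counts = clause_counts(records, labels, clause, target=1)
print(counts)                       # (5, 3, 3, 3)
print(hypergeometric_pmf(*counts))  # 0.1
```

In this toy example, the record with a missing `age` is excluded from all four counts, so $N_{tot} = 5$. All three matching observations fall in the target class, and the PMF value of 1/10 is the probability of that exact overlap arising when 3 of the 5 observations are drawn at random.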
Equation (2) quantifies the likelihood that the observed association be-