Journal on Policy & Complex Systems Volume 3, Issue 2 | Page 178

Simulating Heterogeneous Farmer Behaviors
an ambient-based policy is in place, the agent’s income may be affected by a tax or subsidy based on the target level and the total environmental damage. This influence on income further affects agent decisions in the next year. An agent’s production and adoption decisions are modeled based on the production and adoption deviations from the target levels. These deviations are modeled in two phases as demonstrated in the next section.
Experimental Data Analysis
We conducted a statistical analysis of the data from the experiment documented in Wu, Palm-Forster, and Messer (2017). The analysis was done in two phases. First, we classify participants into different behavior groups. The idea is to capture inherent behavioral differences among people (e.g., some people are more environmentally friendly; some are more self-oriented). Second, after classifying participants into behavior groups, we estimate how agent production and adoption decisions are influenced by their location, size, information treatment, and type. We use the results to calibrate agent decision rules in the ABM.
Cluster Analysis
Since we have no prior knowledge of, and do not want to impose any assumption on, how many groups participant behavior should be clustered into, the goal of this analysis is to identify the number of behavior types and cluster agents into that number of groups. With no pre-determined grouping structure, meaning that we do not observe a response variable, cluster analysis is suitable for this purpose. As a popular unsupervised statistical learning method, cluster analysis can generate grouping structures based on patterns in the predictors. The first key question is how many clusters the agents should be grouped into.
Clustering Metric
To account for the fixed effects of different treatments, the difference between an agent’s actual pollution level and the Nash optimal strategy level in that treatment was used as a measure of the agent’s behavior in each round. Therefore, clustering analysis was implemented based on five variables (diff1, diff2, diff3, diff4, diff5), the agents’ deviations from Nash over five rounds. These variables are defined as
\[ \mathrm{Diff}_{ijt} = \mathrm{Pollution}_{ijt} - \mathrm{TargetPollution}_{ijt}, \]
where \mathrm{Diff}_{ijt} denotes the difference between participant i’s pollution level and the target pollution level in treatment j, round t.
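As an illustration only, the five clustering variables could be built from round-level records as sketched below. The record layout, the participant and treatment labels, and all numbers are hypothetical and are not taken from the experiment; the sketch also assumes one feature vector per participant-treatment pair.

```python
from collections import defaultdict

# Hypothetical long-format observations, one per participant-treatment-round:
# (participant i, treatment j, round t, pollution, target pollution).
# All labels and numbers are invented purely for illustration.
observations = [
    ("p1", "info", 1, 12.0, 10.0),
    ("p1", "info", 2, 11.0, 10.0),
    ("p1", "info", 3, 10.0, 10.0),
    ("p1", "info", 4, 9.0, 10.0),
    ("p1", "info", 5, 8.0, 10.0),
    ("p2", "control", 1, 15.0, 12.0),
    ("p2", "control", 2, 15.0, 12.0),
    ("p2", "control", 3, 14.0, 12.0),
    ("p2", "control", 4, 16.0, 12.0),
    ("p2", "control", 5, 13.0, 12.0),
]


def diff_features(rows, n_rounds=5):
    """Return {(participant, treatment): [diff1, ..., diff5]}.

    Each entry is Pollution_ijt - TargetPollution_ijt for round t.
    Rounds with no observation are left as None.
    """
    features = defaultdict(lambda: [None] * n_rounds)
    for participant, treatment, rnd, pollution, target in rows:
        features[(participant, treatment)][rnd - 1] = pollution - target
    return dict(features)


features = diff_features(observations)
for key, diffs in features.items():
    print(key, diffs)
```

Each resulting list plays the role of (diff1, ..., diff5) for one participant and can be passed to a clustering routine as a five-dimensional point.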
There are a number of clustering methods available; the most popular ones include K-means clustering, hierarchical clustering, and Gaussian mixture models. No clustering method is definitively right or wrong. We chose K-means clustering because it generated the most informative grouping structure.
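For reference, the textbook K-means objective is given below; it is the standard formulation, not a formula stated by the authors. Here x_i is assumed to be participant i’s vector (diff1, ..., diff5), C_k the set of participants assigned to cluster k, and \mu_k the cluster mean.

```latex
\min_{C_1,\dots,C_K} \; \sum_{k=1}^{K} \sum_{i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{i \in C_k} x_i
```

The minimized quantity is the within-groups sum of squares, which is why that statistic is a natural diagnostic when comparing different values of K.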
For K-means clustering, the most important task is to determine the number of groups. This depends on both statistical criteria and knowledge of what a sensible grouping structure is. We perform various statistical procedures to determine the number of clusters.
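One way to compare candidate numbers of groups is to run K-means for several values of K and record the within-groups sum of squares each time. The sketch below uses a plain implementation of Lloyd's algorithm with a few random restarts; it is not the authors' code, and the nine five-dimensional points are invented toy data standing in for (diff1, ..., diff5).

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm. Returns (labels, centroids, within-groups SS)."""
    rng = random.Random(seed)
    # Initialize centroids at k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        converged = new_labels == labels
        labels = new_labels
        if converged:
            break
    wss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, wss


def best_wss(points, k, n_init=10):
    """Smallest within-groups SS over n_init random restarts."""
    return min(kmeans(points, k, seed=s)[2] for s in range(n_init))


# Toy data (hypothetical): three loose groups of deviations from target.
points = [
    [-3.0, -3.0, -2.0, -3.0, -2.0],
    [-2.0, -3.0, -3.0, -2.0, -3.0],
    [-3.0, -2.0, -3.0, -3.0, -3.0],
    [0.0, 1.0, 0.0, -1.0, 0.0],
    [1.0, 0.0, 0.0, 0.0, -1.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [3.0, 4.0, 3.0, 3.0, 4.0],
    [4.0, 3.0, 4.0, 4.0, 3.0],
    [3.0, 3.0, 4.0, 3.0, 3.0],
]

wss_by_k = {k: best_wss(points, k) for k in range(1, 7)}
for k, wss in wss_by_k.items():
    print(k, round(wss, 2))
```

Plotting `wss_by_k` against K gives the kind of curve used to judge where adding another cluster stops paying off. Restarts are included because Lloyd's algorithm only finds a local optimum that depends on the initial centroids.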
The Elbow Method
The most intuitive way to determine the number of groups is the “Elbow Method”. Figure 5 depicts the within-groups sum of squares versus the number of clus-