in applications as well. Predicted risk influences police action in child protective services calls (Church & Fairchild, 2017; Cuccaro-Alamin, Foust, Vaithianathan, & Putnam-Hornstein, 2017). Algorithms determine parole (Berk, 2017).
Methods for data-driven or evidence-based policy decision making have the benefit of increased transparency and provenance. However, models interacting with complex systems, like those seen in the social sciences, face huge challenges to replicability, reproducibility, and model evaluation (cf. Open Science Collaboration, 2015). Well-trained models, if placed in situ, may still reveal unintended consequences. There are multiple sides to these issues, and there is no one clear solution. In what follows, we simply advance the idea that downstream applications of the model should affect the way error measures are constructed and represented contextually. Borrowing a metaphor from the logistics community, we can think of it as "final mile" analysis. Our work is motivated by the following research questions:
1. How does an intended application differ from and affect model training error?
2. Can complex modeling be used to refine baselines that define acceptable levels of error?
To address these questions, we have constructed a stylized scenario where a machine learning model is used to make decisions about student placement in classrooms at a hypothetical school. The model predicts a student's GPA, which is then binned into three different levels of classroom rigor. Students are then placed into their assigned classrooms, but given model error, the assignment may be incorrect and have a significant negative impact on the student's GPA performance. Here, there are two interesting factors that play into the success of the implementation. First, we assume that the model is trained to approximate a student's GPA performance as closely as possible. Therefore, a probabilistic error measure (a distance measure; see Ferri, Hernández-Orallo, & Modroiu, 2009) is used for model training. Second, the model is trained on individual observational data of prior student performance, and that assumption is maintained during this data repurposing. However, we add in hierarchical effects at the classroom and school levels that are not captured by the available data. These effects take the form of "disruptors" that deter learning throughout the school and are modeled through a simple contagion process. Students placed incorrectly in their class are more susceptible, thereby making it more likely that the entire school performs at a lower level.
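To make the scenario concrete, the minimal Python sketch below walks through its three moving parts: a GPA prediction with error, binning into three rigor levels, and a simple contagion of disruption in which misplaced students are more susceptible. All numbers (the noise level, the cut points, the infection probabilities) and all variable names are our own hypothetical choices for illustration, not values from this study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cohort: true GPAs and noisy model predictions.
n_students = 300
true_gpa = np.clip(rng.normal(3.0, 0.5, n_students), 0.0, 4.0)
pred_gpa = np.clip(true_gpa + rng.normal(0.0, 0.3, n_students), 0.0, 4.0)

# Training error: a distance measure over predicted GPA (here, MAE).
mae = np.mean(np.abs(pred_gpa - true_gpa))

# Application step: bin GPA into three levels of classroom rigor.
# Cut points are illustrative, not taken from the paper.
cuts = [0.0, 2.5, 3.3, 4.01]
assigned = np.digitize(pred_gpa, cuts) - 1   # placement the school acts on
correct = np.digitize(true_gpa, cuts) - 1    # placement the student merited
misplaced = assigned != correct              # error the MAE does not expose

# Simple contagion of "disruption": misplaced students are more susceptible.
p_base, p_misplaced, n_seeds, n_steps = 0.02, 0.15, 5, 10
disrupted = np.zeros(n_students, dtype=bool)
disrupted[rng.choice(n_students, n_seeds, replace=False)] = True  # seed disruptors
for _ in range(n_steps):
    exposure = disrupted.mean()  # well-mixed school; no classroom network here
    p_infect = np.where(misplaced, p_misplaced, p_base) * exposure
    disrupted |= rng.random(n_students) < p_infect

print(f"training MAE: {mae:.3f}")
print(f"misplaced students: {misplaced.mean():.1%}")
print(f"disrupted after {n_steps} steps: {disrupted.mean():.1%}")
```

Under these assumptions, the training MAE can look respectable while a nontrivial share of students near bin boundaries is still misplaced; it is this gap between training error and placement error that motivates the "final mile" analysis.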
Although stylized in nature, this example illustrates how an application can influence the way model training error is interpreted and refined. In what follows, we will present the scenario in more detail. Different types of error will be discussed, and error-type constraints will be established from the application. The focus will be to establish methods for adjusting model baselines for ac-