Continued from previous page

specific behaviors. For example, AI might disproportionately categorize behavior by Black or Latino individuals as “aggressive” or “non-compliant” if these associations are overrepresented in the training data. This contributes to a feedback loop where biased reports inform future AI training, perpetuating structural inequalities. Research on natural language processing has also revealed biases in sentiment analysis and text generation, where language associated with certain demographic groups is often framed more negatively (Caliskan et al., 2017).
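The kind of association bias described above can be measured. The sketch below illustrates, in simplified form, the word-embedding association test (WEAT) used by Caliskan et al. (2017). The vectors are random placeholders standing in for real pretrained embeddings (e.g., GloVe or word2vec vectors for the listed terms), so the printed result is illustrative only.

```python
# Minimal WEAT-style sketch. The vectors below are random placeholders;
# a real audit would load pretrained embeddings for actual words.
import numpy as np

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(word_vec, attr_a, attr_b):
    """Mean similarity to attribute set A minus mean similarity to set B."""
    return (np.mean([cosine(word_vec, a) for a in attr_a])
            - np.mean([cosine(word_vec, b) for b in attr_b]))

rng = np.random.default_rng(0)
dim = 50
# Placeholders for embeddings of demographic-associated names (targets)
# and of "hostile" vs. "neutral" descriptive language (attributes).
targets_x = [rng.normal(size=dim) for _ in range(5)]  # e.g., group X names
targets_y = [rng.normal(size=dim) for _ in range(5)]  # e.g., group Y names
attr_a = [rng.normal(size=dim) for _ in range(5)]     # e.g., hostile terms
attr_b = [rng.normal(size=dim) for _ in range(5)]     # e.g., neutral terms

# An effect size far from zero indicates one group's names sit
# systematically closer to the hostile vocabulary in embedding space.
diffs_x = [association(w, attr_a, attr_b) for w in targets_x]
diffs_y = [association(w, attr_a, attr_b) for w in targets_y]
pooled = np.std(diffs_x + diffs_y, ddof=1)
effect_size = (np.mean(diffs_x) - np.mean(diffs_y)) / pooled
print(f"WEAT effect size: {effect_size:.3f}")
```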
Case studies involving other AI applications in the criminal justice system have highlighted the dangers of biased algorithms in real-world scenarios. For example, the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) system has faced criticism for assigning higher risk scores to Black defendants than to white defendants with similar backgrounds (Dressel & Farid, 2018). While COMPAS is not directly used for report writing, the underlying issue of biased algorithmic logic is equally pertinent. In AI-driven report generation, even subtle biases in language can have profound effects on how incidents are interpreted by courts, the media, and the public, potentially leading to discriminatory outcomes.
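The specific disparity at issue in the COMPAS debate, unequal false positive rates across groups, is straightforward to compute once ground truth is available. The sketch below uses a handful of invented records, not COMPAS data, to show the calculation.

```python
# Illustrative check of the disparity Dressel & Farid (2018) examined:
# comparing a risk tool's false positive rates across groups.
# All records here are synthetic placeholders.

def false_positive_rate(records):
    """Share of people who did NOT reoffend but were still scored high risk."""
    non_reoffenders = [r for r in records if not r["reoffended"]]
    flagged = [r for r in non_reoffenders if r["high_risk"]]
    return len(flagged) / len(non_reoffenders)  # assumes the subset is non-empty

records = [
    # group, high_risk (score above threshold), reoffended (ground truth)
    {"group": "A", "high_risk": True,  "reoffended": False},
    {"group": "A", "high_risk": True,  "reoffended": True},
    {"group": "A", "high_risk": False, "reoffended": False},
    {"group": "B", "high_risk": False, "reoffended": False},
    {"group": "B", "high_risk": True,  "reoffended": True},
    {"group": "B", "high_risk": False, "reoffended": False},
]

for group in ("A", "B"):
    subset = [r for r in records if r["group"] == group]
    print(f"Group {group} false positive rate: {false_positive_rate(subset):.2f}")
```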
To mitigate these risks, experts recommend the use of diverse training datasets, regular algorithmic audits conducted by independent third parties, and the inclusion of interdisciplinary oversight panels comprising legal scholars, technologists, civil rights advocates, and community representatives (O’Neil, 2016). However, these solutions demand resources, expertise, and institutional commitment, all of which may be unevenly distributed across jurisdictions. Without a concerted effort to address algorithmic bias, AI in police report writing risks becoming a high-tech conduit for the very injustices it aims to solve.
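As one illustration of what an independent algorithmic audit might include, the sketch below runs a counterfactual (paired-prompt) test: narratives identical except for a demographic cue are fed to a drafting tool, and the charged language in the resulting drafts is compared. `draft_report` is a hypothetical stand-in for whatever system is under review; everything else is standard-library Python.

```python
# Sketch of a counterfactual (paired-prompt) audit. `draft_report` is a
# hypothetical placeholder for the AI drafting tool under review.
CHARGED_TERMS = {"aggressive", "hostile", "non-compliant", "belligerent"}

def charged_term_count(text: str) -> int:
    """Count occurrences of charged descriptors in a draft."""
    return sum(1 for word in text.lower().split()
               if word.strip(".,;:!") in CHARGED_TERMS)

def counterfactual_audit(draft_report, narrative_template, name_pairs):
    """Feed the tool narratives identical except for a demographic cue
    (here, a name) and compare charged language across the drafts."""
    gaps = []
    for name_x, name_y in name_pairs:
        draft_x = draft_report(narrative_template.format(name=name_x))
        draft_y = draft_report(narrative_template.format(name=name_y))
        gaps.append(charged_term_count(draft_x) - charged_term_count(draft_y))
    return sum(gaps) / len(gaps)  # a mean gap near 0 is expected of an unbiased tool

# Demonstration with a dummy tool that simply echoes its input:
mean_gap = counterfactual_audit(
    draft_report=lambda narrative: narrative,
    narrative_template="Officer observed {name} ignoring repeated commands.",
    name_pairs=[("DeShawn", "Connor")],
)
print(f"Mean charged-term gap: {mean_gap:.2f}")
```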
Evaluating the Accuracy of ChatGPT-4 in Police Report Writing

Recent advancements in natural language processing have positioned AI models like ChatGPT-4 as potential assistants in various administrative and documentation functions within law enforcement. The accuracy of ChatGPT-4 in writing police reports is a topic of increasing academic and operational interest. Empirical studies suggest that while ChatGPT-4 demonstrates proficiency under controlled conditions, its utility in real-world criminal justice settings remains contingent on multiple variables, including the complexity of the task, the quality and nature of data inputs, and the extent of human oversight.
A 2025 study published in the Egyptian Journal of Forensic Sciences (Elsayed & Ibrahim, 2025) evaluated the performance of ChatGPT-4 in forensic report writing. In structured, retrospective analyses, the model achieved an accuracy rate of 96.6%, while prospective assessments yielded a 96.2% success rate in specific condition categories. These figures underscore ChatGPT-4’s capability in generating coherent, standardized documentation when operating within well-defined parameters. Such outcomes suggest the model’s potential to assist with the more mechanical aspects of police report writing, such as spelling, grammar, formatting, and summarizing events based on clearly provided data. However, it is crucial to note that forensic report writing often involves more structured and technical language compared to the narrative and contextual complexities frequently found in police incident reports.
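Headline figures such as 96.6% are most meaningful alongside their uncertainty, which depends on the size of the evaluation sample. The sketch below attaches a standard Wilson score confidence interval to an accuracy estimate; the counts are invented for illustration and are not drawn from the Elsayed & Ibrahim study.

```python
# Attaching a 95% Wilson score interval to a reported accuracy rate.
# The counts below are invented for illustration.
import math

def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """95% Wilson score interval for a binomial proportion."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return center - half, center + half

low, high = wilson_interval(successes=290, trials=300)  # ~96.7% accuracy
print(f"Accuracy 96.7%, 95% CI: [{low:.3f}, {high:.3f}]")
```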
The same study (Elsayed & Ibrahim, 2025) also highlighted critical limitations. ChatGPT-4 exhibited reduced reliability in tasks requiring deep interpretive analysis or contextual inference. For instance, when reports involved ambiguous language, emotional nuance, or culturally specific behavior patterns, the AI was less effective in maintaining the narrative coherence and accuracy expected of legal documents. This is particularly problematic in criminal justice, where precision in language can significantly influence prosecutorial decisions, judicial outcomes, and civil rights implications. The potential for misinterpreting slang, sarcasm, or non-verbal cues transcribed from body-worn cameras remains a significant concern.
Concerns about overreliance on generative AI tools like ChatGPT-4 are further evidenced by real-world incidents. In 2025, a complaint was reportedly filed with the Office of Police Accountability in Seattle, alleging that a police sergeant had used AI, possibly ChatGPT, to generate official reports (Seattle Times, 2025). In response, the agency emphasized the necessity of formal policies to regulate the use of generative AI in law enforcement documentation. Simultaneously, the King County Prosecuting Attorney’s Office reportedly instituted a prohibition against AI-generated reports being submitted as evidence, citing potential inaccuracies and the challenge of verifying authorship and content integrity (King County Prosecutor’s Office, 2025). These developments underscore the critical need for a cautious, policy-driven approach to integrating AI into report-writing processes. While ChatGPT-4 and similar models may serve as useful adjuncts for standardizing language and improving efficiency in certain aspects of documentation, they are not a substitute for human judgment, particularly in the high-stakes legal environment of police reporting. The inherent risks of misrepresentation, contextual error, and evidentiary inadmissibility necessitate rigorous oversight, validation, and clearly defined policies.
Recommendations

Given the potential benefits and significant risks associated with AI-assisted police report writing, a set of comprehensive policy recommendations is essential to guide ethical and lawful implementation. First, all AI tools used in law enforcement must undergo rigorous, independent testing and validation prior to deployment. These evaluations should assess not only efficiency but also fairness, accuracy (including specific error rates for different demographic groups), and robustness against bias. Agencies should mandate transparent documentation from vendors detailing training data sources, error metrics disaggregated by relevant demographic factors, and the AI’s decision-making processes.
Second, legislative reform at the state level, particularly in New Jersey, is imperative. Lawmakers should establish clear guidelines governing the use of AI in police documentation, including mandating human review of all AI-generated reports, requiring the retention of original audio and video recordings alongside AI-generated drafts for evidentiary comparison, and ensuring the preservation of AI-generated drafts as part of the official record. These laws should also explicitly clarify the extent to which defense counsel can request discovery of AI processes, including source code and algorithmic logic, to uphold due process and confrontation rights, potentially through the establishment of legal standards for AI transparency in criminal proceedings. The creation of a dedicated AI Oversight Board in New Jersey, comprising legal experts, technologists, civil rights advocates,
Continued on next page