ABSTRACT
Algorithmic decision-making (ADM) promises to strengthen evidence-based decisions, particularly for better managing risks in various domains. Its use also extends to the criminal justice system, where algorithmic risk assessments can provide valuable evidence to inform highly sensitive decisions. Yet such algorithmic tools also introduce intricate problems tied to the fundamental question of exactly what kind, and what quality, of evidence they offer. This paper illustrates this problem through a comparison of pretrial risk assessments that have been implemented statewide in the USA. The authors highlight the empirical variation in the construction, evaluation and documentation of these tools to carve out the considerable discretion involved along these dimensions. They also point to further possible ways of assessing the performance of these tools and show why evaluating the quality of the evidence delivered by algorithmic risk assessments is far from straightforward.
Acknowledgements
We thank the anonymous reviewers for their very helpful comments and suggestions. Thanks also go to Malin Grüninger and Louisa Prien for their assistance in researching the information used in this article.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Notes
3 And it could lead to differential treatment of groups of people. When certain groups are disproportionately arrested despite having the same crime rate as other groups, criminal history as a predictor can reinforce existing discrimination: ‘[r]acial bias in arrests leads to racial bias in risk scores’ (Eckhouse et al., 2019, p. 196).
4 On a more technical note, a trained statistical model underlying an algorithmic risk assessment tool may still be useful overall for predicting outcomes even if individual predictors are not statistically significant. If many of the included predictors are correlated with each other, they share variation and can thus partly substitute for each other. This also means that the unique variation and information each predictor contributes to explaining the outcome is reduced. This multicollinearity of predictors makes it less likely that their coefficients are statistically significant at conventional levels. If the primary aim is to build a well-performing prediction model, one could give less weight to the Type I error, largely ignore the statistical significance of predictors, and instead aim to avoid the Type II error: accepting the null hypothesis of ‘no effect’ even though there is an effect. However, such models will be inefficient and substantively misleading, because some predictors could be dropped without losing predictive power and because one does not know which predictors count as relevant explanatory factors. Consequently, when accepting a high Type I error, one might keep predictors in the model that have no predictive power of their own and do not contribute to a high risk of failure (the outcome). Yet some individuals may well score on these statistically irrelevant features in ways that earn them higher risk scores.
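The substitutability of correlated predictors can be sketched in a few lines of code. The data below are purely illustrative (simulated, not drawn from any tool discussed here): two nearly collinear predictors achieve almost the same predictive fit when used individually, because they share most of their variation.

```python
# Illustrative sketch: two highly correlated predictors largely
# substitute for each other in prediction (simulated data).
import random

random.seed(0)
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [a + random.gauss(0, 0.1) for a in x1]   # x2 is nearly collinear with x1
y  = [a + random.gauss(0, 1) for a in x1]     # outcome driven by x1 only

def r2_single(x, y):
    """R2 of a one-predictor least-squares fit (closed form)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy ** 2) / (sxx * syy)

# Either predictor alone yields nearly the same fit.
print(round(r2_single(x1, y), 3), round(r2_single(x2, y), 3))
```

Because x1 and x2 carry almost identical information, neither contributes much unique variation once the other is in the model, which is exactly the situation in which coefficients fail to reach conventional significance levels.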
5 The PTRA, however, does build on the analyses by VanNostrand and Keebler (2009), who present regression tables for determinants of pretrial outcomes.
6 Moreover, the AUC-ROC is calculated over the full range of thresholds above which a predicted score counts as belonging to the positive outcome class. However, not all thresholds are equally plausible or adequate, meaning that only part of the AUC-ROC region is relevant. As noted earlier, choosing a certain threshold directly amounts to giving a false positive greater (or equal) weight than a false negative, or vice versa. Which thresholds are plausible therefore depends on this weight ratio; ideally, the AUC-ROC is interpreted against a clear definition of the relative weights of the two classification errors.
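To make the threshold-dependence concrete, here is a minimal sketch with hypothetical scores and labels (assumed for illustration, not taken from any of the tools discussed): each threshold fixes one trade-off between false positives and true positives, i.e., one point on the ROC curve.

```python
# Hypothetical risk scores and observed outcomes (1 = positive class).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1,   1,   0,   1,   0,   1,   0,   0]

def roc_point(threshold):
    """(FPR, TPR) when scores at or above the threshold count as positive."""
    tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 1)
    fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 0)
    p = sum(labels)
    n = len(labels) - p
    return fp / n, tp / p

for t in (0.25, 0.5, 0.75):
    print(t, roc_point(t))   # each threshold -> one ROC point
```

Here the threshold 0.25 yields (FPR, TPR) = (0.5, 1.0), 0.5 yields (0.25, 0.75), and 0.75 yields (0.0, 0.5); the AUC-ROC averages over all such points, including thresholds whose implied error weighting may be implausible for the decision at hand.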
7 If one were to get a classifier performance below 0.5, e.g., 0.3, one could simply swap the positive and negative classes and obtain the complement, i.e., 0.7.
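This label-flipping identity can be verified directly. The sketch below uses the standard rank-based formulation of the AUC; the scores and labels are hypothetical:

```python
def auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative
    (ties count as one half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

scores  = [1, 2, 3, 4]
labels  = [1, 0, 1, 0]            # a classifier that mostly ranks backwards
flipped = [1 - l for l in labels] # swap positive and negative class

print(auc(scores, labels), auc(scores, flipped))  # 0.25 and 0.75
```

Swapping the classes turns an AUC of a into 1 - a, so any score below 0.5 maps onto its complement above 0.5.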
8 It should be noted that these R2-values reflect the fit of models containing features that were pre-selected for relevance on the basis of bivariate analyses. This means the models do not contain totally irrelevant predictors, for which the model fit would be penalized by the Pseudo-R2, as these values are deflated depending on the total number of included predictors. In other words, without first winnowing the field of predictors, the overall model fit would have been lower.
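The deflation mechanism can be illustrated with one common variant, McFadden's adjusted pseudo-R2, which subtracts the number of predictors K from the model's log-likelihood; all log-likelihood values below are hypothetical, not the fitted models from the paper.

```python
# McFadden's (adjusted) pseudo-R2 with hypothetical log-likelihoods:
# adding predictors raises K and deflates the adjusted value.
def mcfadden(ll_model, ll_null, k=0):
    """Adjusted McFadden pseudo-R2: 1 - (LL_model - K) / LL_null."""
    return 1 - (ll_model - k) / ll_null

ll_null  = -500.0   # intercept-only model
ll_model = -420.0   # model with predictors

print(mcfadden(ll_model, ll_null))         # unadjusted: 0.16
print(mcfadden(ll_model, ll_null, k=10))   # with K = 10 predictors: 0.14
```

Carrying ten irrelevant predictors thus costs the model fit directly, which is why pre-selecting features before fitting tends to yield higher reported values.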