Weights matter: Improving the predictive validity of risk assessments for criminal offenders: Journal of Offender Rehabilitation: Vol 58, No 2

Abstract

This article explored whether the use of optimal weights can improve the predictive validity of risk assessments for criminal offenders, an issue on which the literature has long been divided. By analyzing the unweighted instrument Level of Service Inventory – Revised (LSI-R), the study provided evidence in favor of optimal weights, showing that a weighted instrument clearly outperforms an unweighted one in predicting recidivism. The magnitude of the difference in performance is larger than some of the literature would lead one to expect. The study also showed that objective and easily measurable items of such instruments were associated with larger weights than subjective items.

Acknowledgments

I would like to thank my advisors Donald Wittman and Carlos Dobkin for their invaluable help. I am also grateful to Elizabeth Drake of the Washington State Institute for Public Policy for providing me with the data sets used in this article. I benefited from comments and suggestions from Jean Paul Rabanal, Julian Caballero, and two anonymous referees. I thank them all wholeheartedly.

Notes

1 Items need not be objectively measurable, as “any prior offenses” is; subjective items (e.g., “attitude toward sentence”) can also be part of an instrument, as long as they can be coded in a standardized way. In addition, items need not be dichotomous, taking only 0 (no) or 1 (yes) answers. They can measure a continuous variable, such as age, and use cut-off points to separate different levels (e.g., if age is between 18 and 30, 1 point is added to the score, etc.). Moreover, if an item is negatively associated with an outcome, properly structured scales can add negative numbers, such as −1, −2, etc.
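The scoring rules described in Note 1 can be sketched in a few lines of Python. This is only an illustration: the function names, the item list, and the point values are hypothetical and are not taken from the LSI-R or any other instrument; only the age cut-off (1 point for ages 18 to 30) follows the example in the note.

```python
def score_age(age):
    """Hypothetical cut-off item: 1 point if age is between 18 and 30, else 0."""
    return 1 if 18 <= age <= 30 else 0


def total_score(item_points):
    """An additive scale simply sums the points of all items."""
    return sum(item_points)


# Hypothetical individual, scored on three illustrative items.
points = [
    1,              # dichotomous item, e.g. "any prior offenses": yes -> 1
    score_age(24),  # continuous variable coded through a cut-off -> 1
    -1,             # item negatively associated with the outcome -> -1
]

total = total_score(points)
print(total)
```

An item that is negatively associated with the outcome lowers the total, which is the role of the negative points mentioned in the note.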

2 The Method section provides more information on the distinction between “static” and “dynamic” predictors.

3 The authors compared the predictive validities of those instruments and found the Structured Assessment of Violence Risk in Youth (SAVRY) to be the best, followed by the Violence Risk Appraisal Guide (VRAG) and the Spousal Assault Risk Assessment (SARA). Note that VRAG is one of the instruments that use a form of weighting, while the other two instruments do not.

4 This debate has more recently expanded to include the effectiveness and suitability of machine learning in predicting recidivism outcomes (for a synopsis see Wormith, 2017).

5 Laughlin (1978), though, showed that the loss in accuracy was actually twice as great as previously thought and, therefore, equal weighting schemes could not be considered a panacea. Wainer (1978) conceded this point.

6 However, Cohen (1990) argued that even in those circumstances the optimal weights will not perform much better than equal weights.

7 For example, the Results section shows that the majority of LSI-R items are positively correlated with outcomes. However, there is also a minority of items that are negatively correlated with outcomes.

8 Some questions can be answered on a scale from 0 to 3, but the responses are also converted to a binary 0 or 1 form by way of a simple conversion method.

9 The full set of LSI-R items can be found in Andrews and Bonta (1995) or Holsinger, Lowenkamp, and Latessa (2003).

10 The subcomponents are (with number of items in parentheses): criminal history (10), education/employment (10), financial (2), family/marital (4), accommodation (3), leisure/recreation (2), companions (5), alcohol/drug problems (9), emotional/personal (5), attitudes/orientation (4).

11 For example, the criminal history subcomponent, with its 10 items, plays a more important role in the determination of an offender’s score than the attitudes/orientation subcomponent, with its 4 items.
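The point made in Notes 10 and 11 can be checked with simple arithmetic. The sketch below uses the item counts listed in Note 10 and assumes, for illustration, that every item contributes at most one point to an unweighted total; the variable names are hypothetical.

```python
# Item counts per subcomponent, as listed in Note 10.
SUBCOMPONENT_ITEMS = {
    "criminal history": 10,
    "education/employment": 10,
    "financial": 2,
    "family/marital": 4,
    "accommodation": 3,
    "leisure/recreation": 2,
    "companions": 5,
    "alcohol/drug problems": 9,
    "emotional/personal": 5,
    "attitudes/orientation": 4,
}

total_items = sum(SUBCOMPONENT_ITEMS.values())

# Share of the maximum unweighted score that each subcomponent can contribute.
shares = {name: n / total_items for name, n in SUBCOMPONENT_ITEMS.items()}

print(total_items)  # 54
print(round(shares["criminal history"], 3))       # 0.185
print(round(shares["attitudes/orientation"], 3))  # 0.074
```

Under this assumption, an unweighted instrument still weights subcomponents implicitly, through the number of items each one contains.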

12 First generation (1G) assessment is clinical judgment. Second generation (2G) instruments include only static items, thus focusing on the risk posed by an offender. Third generation (3G) instruments, such as the LSI-R, include static and dynamic items, thus targeting both the risk and the needs of an offender. Fourth generation (4G) instruments also measure the responsivity (motivation and abilities) of an offender to treatment (Andrews & Bonta, 2006; Drake, 2014; Hamilton et al., 2016).

13 The Static Risk Assessment (SRA) succeeded the LSI-R in 2008 (Barnoski & Drake, 2007). In 2013, Washington State authorities commissioned the development of a 4G assessment. The process led to the creation of the Static Risk Offender Need Guide for Recidivism (STRONG-R; Hamilton et al., 2016). Additionally, the Static Risk Assessment, revised (SRA2) and the Ohio Risk Assessment System (ORAS) were tested on a Washington State population (Drake, 2014).

14 A discrepancy in favor of the construction sample, known in the literature as shrinkage, is nonetheless to be expected.

15 Technical violations of parole or probation conditions are not considered recidivism events. The use of convictions as a recidivism measure is generally preferred to arrests since the latter “do not represent a determination of guilt by a court” (Barnoski, 1997, p. 3). Note, though, that when the follow-up period is short (12 months), the use of arrests is accepted by some of the literature as a second-best option (Latessa, Lemke, Makarios, & Smith, 2010).

16 The term “offender” is used loosely here because, as noted previously, individual observations are offenses and not offenders, since several offenders in the sample have committed multiple offenses.

17 For robustness purposes, the OLS results were verified by estimating separate nonlinear logit and probit models.

18 The nonlinear logit and probit models yield extremely similar weights, which have been omitted for brevity.

19 Such items include having insufficient peer interactions, reliance on social assistance, ever having alcohol problems, etc.

20 It should be noted that this strong link between recidivism and community supervision violations may also be due to unaccounted-for factors related to differences in the way authorities supervise different individuals in the community. I owe this observation to a referee’s comment.

21 The large shrinkage for the WI, from 0.7246 in the construction sample to 0.6851 in the validation sample, should also be noted. Shrinkage was also observed for the other two types of recidivism, but it was much milder. The larger shrinkage in the case of violent felonies can only be attributed to the fact that the predicted event became increasingly rare as one moved from general recidivism to more specific types. The difference with the LSI-R (0.6397 for the construction sample and 0.6291 for the validation sample) was still large and significant, though.
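The arithmetic behind Note 21 can be laid out explicitly. The sketch below uses only the four figures quoted in the note; the dictionary layout and variable names are hypothetical, and shrinkage is computed simply as the construction-sample value minus the validation-sample value.

```python
# Predictive validity figures quoted in Note 21 (violent felonies).
validity = {
    "WI":    {"construction": 0.7246, "validation": 0.6851},
    "LSI-R": {"construction": 0.6397, "validation": 0.6291},
}

# Shrinkage: drop in performance from construction to validation sample.
shrinkage = {
    name: round(v["construction"] - v["validation"], 4)
    for name, v in validity.items()
}

# Advantage of the weighted instrument (WI) over the LSI-R in each sample.
gap = {
    sample: round(validity["WI"][sample] - validity["LSI-R"][sample], 4)
    for sample in ("construction", "validation")
}

print(shrinkage)  # {'WI': 0.0395, 'LSI-R': 0.0106}
print(gap)        # {'construction': 0.0849, 'validation': 0.056}
```

The WI loses more between samples than the LSI-R does, yet it remains ahead by about 0.056 in the validation sample, which is the difference the note refers to.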

22 Subjective items do, however, allow authorities to exercise more judgment in their risk assessment decisions, as shown by Georgiou (2017).

23 It is always possible to reconstruct the unweighted scores of the offenders on a particular scale, if all the information is available to the authorities.
