
Forensic Mental Health Practitioners’ Use of Structured Risk Assessment Instruments, Views about Bias in Risk Evaluations, and Strategies to Counteract It


Abstract

The use of structured risk assessment instruments (SRAIs) has increased significantly over the past decades, with research documenting variation between countries. The use of SRAIs, their perceived utility, and their potential for mitigating bias in forensic risk evaluations (FREs) were investigated in a survey of Dutch forensic mental health practitioners (N = 110). We found generally positive views regarding SRAI utility. Bias in FREs was of concern to respondents. We found no evidence of a bias blind spot (the belief that oneself is less prone to bias than peers/colleagues). SRAIs were rated as the most effective debiasing strategy, but respondents also endorsed introspection. There were few differences in beliefs about sources of bias or debiasing strategies between respondents who had received bias training and those who had not, suggesting a need to develop effective strategies to mitigate bias and training related to bias in FREs.

Forensic mental health practitioners play an important role in advising courts about the treatment and management of people who suffer from a mental illness and have been accused or convicted of a crime. Forensic mental health evaluations are a critical component of legal decision making because judges tend to follow the recommendations made by a behavioral expert (Gowensmith et al., Citation2012, Citation2014; Leij et al., Citation2001; Messina et al., Citation2019). Specifically, an offender’s estimated risk of violent or sexual reoffending is a key consideration in determining what conditions are necessary—and legally justified—to minimize the risk of harm to potential victims and to make the best use of limited resources (Harte & Breukink, Citation2010).

Use of structured risk assessment instruments in forensic risk evaluations

Research conducted over the past decades indicates that using structured approaches in forensic risk evaluations (FREs)—particularly structured risk assessment instruments (SRAIs)—increases the accuracy of predictions about the likelihood of future violent or sexual offending, as compared to unstructured clinical judgment (Ægisdóttir et al., Citation2006; Andrews et al., Citation2006; Dawes et al., Citation1989; Grove & Meehl, Citation1996; Hanson & Morton-Bourgon, Citation2009). Unstructured clinical judgment (UCJ) is a method of FRE in which the clinician uses only their experience and intuition to evaluate an examinee’s risk of future violence (Grove et al., Citation2000). In a seminal monograph, Monahan (Citation1981) revealed that UCJs about the likelihood of violent reoffending were correct in about one out of three cases, which led to criticisms of UCJ as being inaccurate and unreliable (Douglas & Kropp, Citation2002).

Since the early 1980s, a wide variety of standardized and evidence-based SRAIs have been developed for evaluating the potential for different types of violence (Shepherd & Sullivan, Citation2017). For example, in a large-scale international survey of forensic mental health professionals (N = 2,135), respondents reported using over 200 different commercially available SRAIs and more than 200 different locally developed SRAIs (Singh et al., 2014). SRAIs provide information about the probability of recidivism, the severity of the consequences if reoffending occurs, and whether the reoffending risk is imminent or more remote. SRAIs can also help to identify offender treatment needs that, if addressed, may reduce the risk of reoffending (Bonta & Andrews, Citation2007).

Actuarial risk assessment instruments are one type of SRAI and contain factors empirically related to an increased risk of violent, sexual, or general criminal recidivism (Doyle & Dolan, Citation2007), depending on the instrument’s purpose. Risk factors are scored and combined according to an algorithm determined by the tool developers. The final risk score thereby provides a recidivism risk estimate based on the recidivism rates of groups of individuals with the same score (Doyle & Dolan, Citation2007). A second type of SRAI employs structured professional judgment (SPJ), a method in which the evaluator considers the presence or absence of empirically based risk factors as well as risk factors they deem relevant based on their expertise. The evaluator uses their professional judgment to determine the relative importance of these factors to the individual case (Douglas & Kropp, Citation2002). Actuarial and SPJ instruments generally exhibit similar predictive validity (Campbell et al., Citation2009; Yang et al., Citation2010), but the type of SRAI an evaluator chooses may depend on the setting and purpose of the risk evaluation (Brown & Rakow, Citation2016).
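To make the actuarial approach concrete, the sketch below shows, in simplified form, how a tool of this kind might sum item scores and map the total onto a normed risk category. It is a minimal illustration only: the items, scores, and cut-offs are hypothetical and do not correspond to any published instrument.

```python
# Minimal, hypothetical sketch of an actuarial scoring algorithm: item scores
# are summed and the total is mapped onto a risk category. Real instruments
# define their own items, weights, cut-offs, and normed recidivism estimates.
from typing import Dict

# Hypothetical scored items (0-2), as a rater might record them.
item_scores: Dict[str, int] = {
    "prior_violent_convictions": 2,
    "young_age_at_first_offense": 1,
    "substance_use_problems": 1,
    "employment_instability": 0,
}

# Hypothetical total-score bins mapped onto risk categories.
risk_bins = [(0, 2, "low"), (3, 5, "moderate"), (6, 8, "high")]

def actuarial_risk_category(scores: Dict[str, int]) -> str:
    """Sum the item scores and return the matching risk category."""
    total = sum(scores.values())
    for low, high, label in risk_bins:
        if low <= total <= high:
            return label
    return "out of range"

print(actuarial_risk_category(item_scores))  # total = 4 -> "moderate"
```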

Numerous national and international surveys of forensic mental health professionals indicate that the use of SRAIs to assess the risk of violent recidivism is increasingly common in FREs (Archer et al., Citation2006; Hurducas et al., Citation2014; Kelley et al., Citation2020; Lally, Citation2003; Neal & Grisso, Citation2014b; Singh et al., 2014). Findings from more recent surveys also indicate that SRAI use and FRE practices can vary widely across countries (e.g., Canada: McLaughlin & Kan, Citation2014; Denmark: Nielsen et al., Citation2015; Belgium: Pham et al., 2016; Israel: Singh et al., Citation2019). Yet there is relatively little country-specific research about which SRAIs forensic evaluators are required to use in practice, which SRAIs they choose to use, and their perceptions about the usefulness of those tools (Hurducas et al., Citation2014).

Use of SRAIs in forensic risk evaluations in the Netherlands

SRAIs were introduced into the Dutch forensic mental health system in the late 1990s (de Ruiter & Hildebrand, Citation2007) and are now commonly used in FREs. Dutch forensic mental health experts evaluate approximately four to five thousand criminal defendants a year and produce a written report for a court (Messina et al., Citation2019). In the Netherlands, a person who has committed a crime and has a mental disorder can be ordered into treatment in a secure psychiatric facility, usually after time served in prison, a disposition known as maatregel van terbeschikkingstelling (TBS; de Ruiter & Hildebrand, Citation2007). Although the goal of treatment under TBS is to successfully reintegrate the offender into the community, a TBS order can entail lifelong mandatory treatment (Bogaerts et al., Citation2018). SRAIs are therefore used to assess offenders both for potential commitment to TBS and for regular reviews to determine whether the individual’s risk has been lowered to a degree sufficient to warrant release (de Ruiter, Citation2016).

We are unaware of any published surveys of Dutch forensic mental health practitioners about the SRAIs they most commonly use and the perceived utility of those instruments. Two relatively recent studies published in the Dutch language reported on SRAIs commonly used in forensic settings in the Netherlands (Harte & Breukink, Citation2010; van Horn et al., Citation2016), but neither surveyed practitioners about which SRAIs they use in practice, nor about the perceived utility of those SRAIs. Therefore, in the current study, we surveyed forensic mental health practitioners about the SRAIs they are required or choose to use and the perceived usefulness of specific SRAIs.

Bias awareness and sources of bias in forensic risk evaluations

Cognitive bias refers to systematic errors in logic or reasoning that occur outside of conscious awareness (Wilson & Brekke, Citation1994) and result from the mind’s automatic processing of information based on experience and prior expectations (Tversky & Kahneman, Citation1974). Because it operates outside of awareness, cognitive bias represents a threat to the objectivity and validity of a forensic mental health evaluation (Neal & Grisso, Citation2014a; Zapf & Dror, Citation2017). In addition, the reliability and accuracy of FREs depend to a large extent on the evaluator’s ability to minimize the influence of cognitive bias and reach objective conclusions (MacLean et al., Citation2019).

A number of recently published surveys suggest that forensic mental health practitioners are aware of, and concerned about, the potential for bias in forensic mental health evaluations (Kukucka et al., Citation2017; Neal & Brodsky, Citation2016; Neal & Grisso, Citation2014b; Zapf et al., Citation2018). However, many remain skeptical about bias affecting their own work, as evidenced by a bias blind spot (Pronin et al., Citation2002), that is, the belief that they are less prone to bias than their colleagues (Boccaccini et al., Citation2017; Kukucka et al., Citation2017; Neal & Brodsky, Citation2016; Zapf et al., Citation2018; Zappala et al., Citation2018). For example, Zapf and colleagues (Citation2018) surveyed 1,099 mental health practitioners who conduct forensic evaluations, and just over half (52.2%) agreed that their own judgments can be influenced by cognitive bias.

In order for forensic evaluators to take appropriate steps to counter bias, they must first be aware of it, have accurate perceptions about how bias operates (e.g., Kukucka et al., Citation2017; MacLean et al., Citation2019; Zapf & Dror, Citation2017), and accept the potential for it to affect their work (Dror, Citation2018; Lilienfeld et al., Citation2009; Wilson & Brekke, Citation1994). Unfortunately, it also appears that most forensic evaluators do not receive formal training about how various types of cognitive bias can affect FREs (Zapf et al., Citation2018). Finally, forensic evaluators need guidance about specific, effective debiasing strategies that they can employ to increase the objectivity and validity of their conclusions (MacLean et al., Citation2019; Neal & Brodsky, Citation2016; Zapf et al., Citation2018).

Debiasing strategies

Research related to potential debiasing strategies is substantial in medical, business, and policy applications (Soll et al., Citation2015). Some progress has also been made in seeking to counteract bias that may occur in the physical forensic sciences (Dror, Citation2018; Dror et al., Citation2015; Jeanguenat et al., Citation2017; Kassin et al., Citation2013). Yet, despite indications that bias can affect forensic evaluations (e.g., Beckham et al., Citation1989; Boccaccini et al., Citation2008; Boccaccini et al., Citation2017; Guarnera et al., Citation2017; Murrie et al., Citation2013), research regarding forensic mental health evaluators’ awareness of various types of cognitive bias and potential strategies to counteract them has only recently emerged (Dror & Murrie, Citation2018; MacLean et al., Citation2019; Neal & Brodsky, Citation2016; Neal & Grisso, Citation2014a; Zapf et al., Citation2018; Zapf & Dror, Citation2017).

In the first study to examine forensic mental health evaluators’ ideas for potential debiasing strategies, Neal and Brodsky (Citation2016) conducted interviews with 20 forensic psychologists certified by the American Board of Forensic Psychology. The interview prompts were designed to obtain information about the psychologists’ awareness of the potential for bias in forensic evaluations and different strategies they believed could minimize bias in their own work. Qualitative analysis of the interviews revealed 25 strategies the participants believed may be useful to mitigate bias in forensic evaluations (see Table 4 for a complete list). In the second stage of their study, Neal and Brodsky asked 351 members of the American Psychological Association to rate the 25 strategies on their usefulness as potential bias correction measures. Overall, participants in the second stage of the study rated 22 of the strategies as useful or very useful. We sought to add to these findings by eliciting Dutch forensic evaluators’ beliefs about potential sources of bias and the effectiveness of various strategies to counteract it in FREs (Neal & Brodsky, Citation2016; Zapf et al., Citation2018).

Table 4. Means and Frequencies of Effectiveness Ratings for Debiasing Strategies (N = 110).

The current study

In this study, we surveyed Dutch forensic mental health evaluators regarding their use and perceived effectiveness of SRAIs for evaluating violent, sexual, criminal, and intimate partner violence risk. Based on previous research conducted by Zapf et al. (Citation2018), we also measured Dutch evaluators’ awareness about the potential for bias in forensic risk evaluations. This study further adds to the previous research conducted by Neal and Brodsky (Citation2016) by examining evaluators’ perceptions of the efficacy of various debiasing strategies to reduce the potential for cognitive bias in FREs.

Method

Participants

We aimed to obtain a representative sample of Dutch mental health practitioners who conduct forensic risk evaluations. To be eligible to participate, respondents had to be a mental health professional who works in the Netherlands and conducts risk evaluations. Of the 154 respondents who began the survey, 44 did not complete it (71.4% completion rate). We excluded incomplete surveys from our analyses. Therefore, our sample comprised 110 respondents with current or previous experience conducting risk evaluations. Of these respondents, 60.9% were women and 39.1% were men. Most had obtained a Master’s degree (n = 77; 70.0%) and 25 had a doctoral degree (22.7%). Six respondents held a Bachelor’s degree (5.5%), and two had a 2-year degree (1.8%).

Experience in mental health settings among respondents ranged from 0 to 46 years (M = 18.9, SD = 11.8, Mdn = 18). Respondents worked in a variety of settings, including forensic psychiatric hospitals (5.5%), private practice (14.5%), and mental health clinics (9.1%); one person indicated working in a hospital setting. Half of respondents selected “other” for their work environment, with most indicating that they worked in some type of clinical or correctional setting (e.g., forensic outpatient clinic, prison psychiatric center, or detention facility) and several indicating that they worked solely as court-appointed evaluators or consultants.

Procedure

This study was reviewed and approved by the Ethics Review Committee for XXX (reference 185_06_11_2017). Data collection took place between late March and early June of 2018. Participants were recruited via advertisements in the newsletter of the Nederlands Register Gerechtelijk Deskundigen (NRGD; Dutch Registry of Court Experts) and of the Expertisecentrum Forensische Psychiatrie (EFP; Expertise Center for Forensic Psychiatry). The NRGD is a registry of forensic professionals created by statute in 2010. Forensic experts who wish to register with the NRGD are evaluated based on field-specific requirements before they are approved (Nederlands Instituut voor Forensische Psychiatrie en Psychologie, Citation2018). The EFP is a professional organization that facilitates cooperation between researchers and practitioners in the field of forensic psychiatry in the Netherlands (Expertisecentrum Forensische Psychiatrie, n.d.).

We also recruited participants via social media with Facebook posts in groups of former forensic psychology Master’s students from XXX University, and via personal invitations by e-mail to professional contacts of one of the authors. This same author also shared the survey announcement on their professional LinkedIn page. The survey was also advertised on KNAPP, which is an online site dedicated to forensic psychiatric care in the Netherlands, and permits messages to be posted to members to facilitate collaboration and knowledge-sharing (https://www.knapp-efp.nl). In return for completing the study, respondents were offered an opportunity to enter a raffle for a chance to win a voucher in the amount of €50 from an online shopping site. The winner was randomly selected from those who completed the survey and provided their e-mail address.

It is difficult to provide an accurate estimate of the number of potential survey respondents we reached. There were 489 forensic mental health experts listed on the NRGD website at the time we conducted the survey, and all of them were sent the NRGD newsletter with the study announcement. We sent a personal e-mail invitation and two follow-up reminders to the NRGD behavioral experts specializing in adult or juvenile forensic psychology or psychiatry whose e-mail addresses were available online (n = 270). We have no information about how many potential respondents may have viewed the EFP newsletter online. We estimate that approximately 400 people eligible to participate in the survey were reached via Facebook and LinkedIn. Finally, there are approximately 1,880 members on the KNAPP website, although it is unknown how many of them were eligible to participate in the survey. There is also overlap between NRGD experts, professionals who receive the EFP newsletter, the author’s professional network, and users of the KNAPP website. We were unable to determine the number of participants recruited from each individual platform we used to advertise the study. Thus, the exact response rate is unknown.

Survey

The survey was conducted online using Qualtrics and was available to respondents in both English and Dutch. We used the cookies-based Qualtrics feature to prevent respondents from taking the survey multiple times. We opted not to provide a back button in order to prevent respondents from changing their answers after they completed a page.

Respondents were provided with a brief description of the survey and acknowledged their consent to participate. The survey was comprised of five major sections as outlined below.

Demographics, FRE experience, general frequency of SRAI use, concerns about errors, and consulting about FREs

In the first section, we collected demographic information regarding gender, education level, years of experience in mental health settings, and current work environment. We asked respondents to specify how many forensic risk evaluations they had performed over the past two years. Next, respondents were asked to indicate on a 5-point Likert-type scale how useful they thought SRAIs are in conducting risk evaluations (1 = not at all, 2 = slightly, 3 = moderately, 4 = very, 5 = extremely). We also asked how often an SRAI was used when conducting a risk evaluation (1 = never, 2 = sometimes, 3 = about half the time, 4 = most of the time, 5 = always).

Next, respondents were asked to rate their concerns about the possibility of false positives and false negatives on two separate 5-point Likert scales (1 = not at all, 2 = a little, 3 = a moderate amount, 4 = a lot, 5 = a great deal). In the survey, we defined a false negative as occurring when an individual is classified as having a low risk to reoffend when in reality he/she has a high risk. We defined a false positive as occurring when an individual is classified as having a high risk to reoffend when in reality he/she has a low risk.

The next two questions related to respondents’ frequency of consulting with third parties about FREs (1 = never to 5 = always). We defined consulting in the survey as “seeking advice about the evaluation.” We also asked respondents to specify with whom they consulted from a list of options (colleagues, supervisor, other treatment provider(s) of the evaluee, prison/jail staff, evaluee’s family members, probation/parole officer(s), or other). Respondents could select multiple options, and if other was selected, we asked them to specify in free-text.

Use of specific SRAIs and usefulness ratings

In the subsequent section, respondents were asked which SRAIs they were required by their employer or jurisdiction to use, and which SRAIs they chose to use. We created a list of commonly used SRAIs in the Netherlands (see Table 1 for the complete list); respondents could select multiple SRAIs and also provide free-text responses. For the SRAIs that respondents indicated they were required or chose to use, they rated the usefulness of each separately on a 5-point Likert scale (1 = not at all useful to 5 = extremely useful).

Table 1. Risk assessment instruments: required and optional users.

Cognitive bias concerns in FREs

We asked respondents about their views regarding cognitive bias in forensic risk evaluations (FREs). The extent to which respondents thought cognitive bias is a problem in FREs was rated on a 5-point Likert scale (1 = not at all, 2 = a little, 3 = a moderate amount, 4 = a lot, 5 = a great deal). We defined cognitive bias in the survey as an error in reasoning, evaluating, remembering, or processing information. Respondents then indicated yes or no in response to questions about whether they had received any specific training about cognitive bias in FREs and whether they had ever been concerned about bias in an FRE conducted by (a) themselves or (b) someone else.

Ratings of effectiveness of potential debiasing strategies

In the next section, respondents were asked to rate on a 5-point Likert scale the efficacy of each of the 25 debiasing strategies (Neal & Brodsky, Citation2016) to reduce the potential that cognitive bias will influence an evaluator’s judgment about future violence risk (1 = not effective at all to 5 = extremely effective). In contrast to Neal and Brodsky, we did not give participants an option to provide a rating of unsure, but used moderately effective as the midpoint of our scale. We were interested in respondents’ beliefs about how effective the strategies are, not whether the strategies are effective in reality. A response of unsure would have made it impossible to distinguish between a respondent who was unsure about whether the debiasing strategy was effective and a respondent who was unsure about their own opinion.

Potential sources of bias in FREs

Finally, respondents indicated on a 5-point Likert scale (1 = strongly disagree to 5 = strongly agree) the extent to which they believed that various situations have the potential to bias an evaluator when conducting an FRE. Seven items were derived from Zapf et al.’s (Citation2018) survey. The statements included situations such as whether an evaluator’s prior beliefs affect how they analyze a case or their ultimate opinion about the case, whether making a conscious effort to set aside prior beliefs can reduce the likelihood the evaluator will be influenced by them, and whether evaluators know in advance what conclusion they are expected to reach in a case and whether that affects their conclusion (see Table 3 for the complete list).

Table 3. Mean ratings and frequencies of beliefs about potential sources of cognitive bias in forensic risk evaluations.

One difference that should be noted between the current study and Zapf et al.’s study is that we eliminated questions about the effects of irrelevant contextual information (Zapf et al., Citation2018, p. 5, #5-7). Although research suggests irrelevant contextual information can result in bias and errors in other areas of forensic science (Dror, Citation2012, Citation2018; Dror et al., Citation2006, Citation2015; Kukucka & Kassin, Citation2014; Nakhaeizadeh et al., Citation2014), there is debate about what type of information should be considered irrelevant contextual information in these fields (Curley et al., Citation2019, Citation2020; Dror & Murrie, Citation2018; Gardner et al., Citation2019; Thompson et al., Citation2020). Further, we were unable to locate any agreed-upon definition of exactly what constitutes irrelevant contextual information specifically for FREs. Therefore, we opted not to include items based on this concept.

Table 2. Group mean and differences in usefulness ratings of most frequently used SRAIs.

Data analysis

All analyses were conducted using IBM SPSS software version 25.
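As an illustrative sketch only (the analyses reported below were conducted in SPSS, and the arrays here are placeholder ratings rather than study data), the following Python snippet shows how a group comparison of the kind reported in the Results (Welch's independent samples t-test with a Hedges' g effect size) could be computed.

```python
# Illustrative re-creation (not the authors' SPSS procedure) of the kind of
# group comparison reported in the Results: Welch's independent samples t-test
# plus a bias-corrected Hedges' g. The two arrays are placeholder ratings.
import numpy as np
from scipy import stats

def hedges_g(x: np.ndarray, y: np.ndarray) -> float:
    """Bias-corrected standardized mean difference for two independent groups."""
    nx, ny = len(x), len(y)
    dof = nx + ny - 2
    pooled_sd = np.sqrt(((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / dof)
    d = (x.mean() - y.mean()) / pooled_sd  # Cohen's d
    return d * (1 - 3 / (4 * dof - 1))     # small-sample correction factor J

required_users = np.array([5, 4, 4, 5, 3, 4, 5, 4])  # placeholder usefulness ratings
optional_users = np.array([4, 4, 3, 4, 4, 3, 4])

t, p = stats.ttest_ind(required_users, optional_users, equal_var=False)  # Welch's test
print(f"t = {t:.2f}, p = {p:.3f}, Hedges' g = {hedges_g(required_users, optional_users):.2f}")
```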

Results

Experience conducting FREs and consulting about FREs

In the two years preceding the survey, the majority of respondents (59%) conducted 10 or more forensic risk evaluations (Mdn = 12; range of 0–250), seven (6.4%) of whom had conducted 100 or more. Forty-six respondents (41.8%) had performed 10 or more presentencing evaluations, and six (5.5%) had conducted 100 or more. Twenty-two respondents (20.0%) had performed at least one inpatient TBS evaluation in the past two years, 11 (10.0%) of whom had performed 10 or more. Fifteen (13.6%) respondents had conducted TBS extension evaluations within the previous two years, eight (7.3%) of whom had done 10 or more of these evaluations.

Nearly all respondents (n = 104; 94.5%) said they consulted with others about FREs at least some of the time. Over half (57.3%) said they consulted with others always or most of the time. More than two-thirds of respondents (69.1%) indicated that they consulted with colleagues, and a substantial minority (41.8%) said they consulted with another treatment provider of the examinee. More than one-third consulted with either the examinee’s probation/parole officer or prison staff (29.1% and 6.4%, respectively), while fewer than 10% consulted with the examinee’s family members (9.1%).

Frequency of SRAI use and required and optional use of specific SRAIs

Almost all respondents indicated that they used an SRAI for risk evaluations always or most of the time (n = 107; 97.3%). Only one respondent indicated that they never used an SRAI when conducting a risk evaluation, one said they did about half the time, and one said they did sometimes. Respondents also mostly agreed that SRAIs are useful in conducting risk evaluations, with 14 (12.7%) rating them as extremely useful and 69 (62.7%) rating them as very useful. However, 19 respondents (17.3%) rated SRAIs as moderately useful, and eight (7.3%) rated them as only slightly useful. Most of our respondents were required to use a specific SRAI, with only about 8% indicating they were not required to do so. Five respondents did not indicate choosing to use any SRAIs; however, it is unknown whether this was because they used only those SRAIs they were required to use.

Many respondents indicated that they were both required to use and chose to use the same SRAIs. In other words, these respondents were in both the required and optional users groups and provided duplicate ratings for the same SRAI. We therefore created two conceptually distinct groups: required users and optional users. If respondents indicated they were required, or both required and chose, to use an SRAI, we considered them required users. If respondents indicated only that they chose to use an SRAI, we considered them optional users (see Tables 1 and 2).

The Historical Clinical Risk Management-20 (HCR-20V2; Webster et al., Citation1997; HCR-20V3; Douglas et al., Citation2014) and the Historische, Klinische, Toekomstige-30 (HKT-30; Werkgroep Pilotstudy Risicotaxie, Citation2002) or its successor, the HKT-R (Spreen et al., Citation2014), were the two most commonly reported required SRAIs (48.2% and 47.3%, respectively). The HCR-20 is an SPJ tool designed to assess psychosocial functioning as it relates to violence risk among adults and includes 10 historical, five clinical, and five risk management items (Douglas et al., Citation2014). The HKT-30 is a Dutch-language SPJ tool that was created in the Netherlands for the evaluation of violence risk in forensic psychiatric settings. The HKT-30 has a structure similar to the HCR-20, and consists of 11 historical, 13 clinical and dynamic, and six risk management items rated on a 5-point scale (Werkgroep Pilotstudy Risicotaxie, Citation2002).

Similarly, nearly half (47.3%) of our respondents were required to use the Structured Assessment of PROtective Factors for violence risk (SAPROF; de Vogel, de Ruiter, Bouman, & de Vries Robbé, Citation2009; Citation2012). The SAPROF is designed to be used in conjunction with an SPJ SRAI and comprises 17 factors considered to be protective against violent behavior (de Vries Robbé et al., Citation2013). Each protective factor is rated on a three-point scale (0 = clearly absent, 1 = somewhat present, 2 = clearly present). An Integrative Final Risk Judgment is obtained by integrating information about the protective factors assessed with the SAPROF and the risk factors measured with an SPJ risk assessment tool (de Vries Robbé et al., Citation2013).

Nearly half of respondents (44.5%) were required to use the Psychopathy Checklist-Revised (PCL-R; Hare, Citation2003) or the Psychopathy Checklist: Screening Version (PCL:SV; Hart, Cox, & Hare, 1995). The PCL-R and PCL:SV are psychometric tools to assess an individual’s level of psychopathy. Although the PCL-R and PCL:SV are not violence risk assessment instruments per se, psychopathic traits are linked with an increased risk of violent recidivism (Barbaree et al., Citation2001; Hawes et al., Citation2013; Lanterman et al., Citation2014). Therefore, the PCL-R or PCL:SV is commonly used in combination with SRAIs to assess the risk of violent or sexual reoffending.

To estimate sexual reoffending risk, more than half of respondents (53.6%) were required to use the Static-99 (Hanson & Thornton, Citation1999) or the Static-99R (Helmus et al., Citation2012). The Static-99R is an actuarial tool comprising 10 static, historical risk factors. Although not specifically listed as a choice, two respondents reported they were required to use an SPJ tool, the Sexual Violence Risk-20 (SVR-20; Boer et al., Citation1997), to assess sexual reoffending risk. The SVR-20 consists of 20 items related to three domains: psychosocial adjustment, sexual offenses, and future plans (Rettenberger et al., Citation2017).

A number of participants indicated by free-text response that they used a combination of the Static-99R, the STABLE-2007 (Hanson et al., Citation2007), and the ACUTE-2007 (Hanson et al., Citation2007) to evaluate the risk of sexual reoffending. Some were required to use the STABLE (Hanson et al., Citation2007) or the ACUTE (Hanson et al., Citation2007). The STABLE-2007 is used in sexual offending recidivism risk assessment and contains 13 risk factors related to the offender’s ability to regulate his sexual behavior (Hanson et al., Citation2015). The ACUTE-2007 is a measure of dynamic risk factors and contains seven items related to sexual reoffending risk that can change over the short-term (Hanson et al., Citation2007).

Perceived usefulness of most frequently used SRAIs

An independent samples t-test revealed that the only significant difference between required and optional users in their usefulness ratings of the five most commonly used SRAIs (i.e., HCR-20, HKT-30/R, SAPROF, PCL-R/SV, and Static-99/99R) was for the Static-99/99R. Optional users rated the Static-99/99R as significantly less useful (M = 3.90, SD = 0.32) than required users (M = 4.31, SD = 0.75), t(30.02) = 2.90, p = .007, Hedges’ g = 0.58 (medium effect), 95% CI [-0.10, 1.26]. Mean usefulness ratings of the five most commonly used SRAIs for required and optional users are presented in Table 2, as are the results of the independent samples t-tests.

Concerns about cognitive bias and errors in FREs

Overall, respondents rated cognitive bias as a moderate problem in FREs in general (M = 3.32, SD = 0.75). McNemar’s test revealed no significant difference in the proportion of respondents who were concerned about bias in an FRE conducted by someone else compared to those concerned about bias in an FRE conducted by themselves, p = .388. A paired samples t-test revealed that respondents were significantly more concerned about the possibility of a false negative (M = 3.07, SD = 1.00) than a false positive (M = 2.85, SD = 0.81) in a risk evaluation, t(109) = −2.38, p = .019, Hedges’ gavg = 0.31 (small effect), 95% CI [0.05, 0.57].
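For readers who wish to see the form of these two paired comparisons, the sketch below applies McNemar's test and a paired samples t-test to made-up example responses; the counts and ratings are illustrative only and are not the study data, and the analyses themselves were run in SPSS.

```python
# Sketch of the two paired comparisons above, run on made-up example data:
# McNemar's test for the paired yes/no concern items, and a paired samples
# t-test for the false negative vs. false positive concern ratings.
import numpy as np
from scipy import stats
from statsmodels.stats.contingency_tables import mcnemar

# Hypothetical 2x2 table of paired yes/no answers: rows = concerned about bias
# in someone else's FRE (yes/no), columns = concerned about bias in own FRE.
concern_table = np.array([[40, 22],
                          [15, 33]])
result = mcnemar(concern_table, exact=True)  # exact test on the discordant cells
print(f"McNemar p = {result.pvalue:.3f}")

# Hypothetical paired concern ratings (1-5) for the two error types.
false_negative = np.array([3, 4, 3, 2, 4, 3, 5, 3])
false_positive = np.array([3, 3, 2, 2, 3, 3, 4, 3])
t, p = stats.ttest_rel(false_negative, false_positive)
print(f"paired t = {t:.2f}, p = {p:.3f}")
```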

Potential sources of bias in FREs

Similar to the approach of Zapf and colleagues (Citation2018), we asked respondents about situations that may have the potential to bias evaluators when they are conducting risk evaluations. Mean ratings, response frequencies, and modal responses for each level of agreement regarding potential sources of bias in forensic evaluations are presented in Table 3.

A substantial majority (83.6%) of our respondents did not agree that cognitive bias is less of a problem in forensic psychology than in other forensic sciences (e.g., fingerprint analysis, hair matching). They generally agreed that evaluators sometimes know what conclusions they are expected to reach (57.2%) and that this affects their conclusions (61.8%). Most of our respondents agreed that an evaluator’s prior beliefs and expectations can affect how they analyze a case (90.9%) or their ultimate opinion on the case (90.0%). Yet, a substantial majority (71.8%) agreed that if an evaluator makes a conscious effort to set aside their prior beliefs and expectations, they are less likely to be influenced by them.

Perceived effectiveness of debiasing strategies

We asked our respondents to rate how effective they thought the 25 strategies identified in Neal and Brodsky (Citation2016) are in reducing the potential for cognitive bias in a risk evaluation. The mean effectiveness ratings for all debiasing strategies are presented in Table 4. We calculated the sample mean rating across all six strategies that have been suggested in the literature as potentially effective in mitigating bias, and the sample mean rating for the 19 strategies identified by Neal and Brodsky (Citation2016) as ineffective or not specifically suggested previously. Of these, only introspection has been shown to be ineffective, because people are often unaware of the existence of a particular biasing stimulus, that their response is influenced by the stimulus, or both (Nisbett & Wilson, Citation1977; Pronin & Kugler, Citation2007; Wilson & Brekke, Citation1994).

Strategies previously suggested in scientific literature include the following: critically examining conclusions, consulting with colleagues, taking time to think about the evaluation information rather than immediately writing the report, receiving explicit didactic training about objectivity, exposure to the importance of objectivity through reading professional literature, and using structured evaluation methods. A paired samples t-test revealed that respondents gave significantly higher ratings to strategies that have been suggested in the literature (M = 3.47, SD = 0.57) than to those identified by Neal and Brodsky as not having been specifically suggested previously (M = 3.09, SD = 0.59), t(109) = 10.51, 95% CI [0.31, 0.45], p < .001, Hedges’ gavg = 0.65 (medium effect), 95% CI [0.50, 0.80].

Effects of cognitive bias training

A minority of respondents (27%) indicated that they had received training on cognitive bias. An independent samples t-test revealed no significant difference between respondents who had received bias training (M = 3.37, SD = 0.77) and those who had not (M = 3.30, SD = 0.75) with regard to the extent to which they believed cognitive bias is a problem in FREs, t(108) = −0.41, Hedges’ g = 0.09, 95% CI [-0.33, 0.51]. Overall, this finding indicates respondents think cognitive bias is a moderate problem in FREs, regardless of whether they have had cognitive bias training or not.

We also examined whether training about cognitive bias in forensic evaluations affected ratings of the debiasing strategies. Welch’s t-test revealed that practitioners who had some training related to bias gave significantly higher effectiveness ratings (M = 4.10, SD = 0.61) to “taking personal responsibility to continue learning after completing formal training and education” as a debiasing strategy than those who had not received such training (M = 3.71, SD = 0.83), t(71.02) = −2.68, p = .009, Hedges’ g = 0.50 (medium effect), 95% CI [0.08, 0.93]. We are unaware of research that tests whether this strategy is effective in counteracting cognitive bias.

Discussion

Risk evaluation practices and use of SRAIs may vary by country, by type of legal system (i.e., adversarial vs. inquisitorial), and other system factors, such as recommendations by professional organizations, organizational or statutory requirements to use (specific) SRAIs, and regulations governing recognition as an expert in the legal system (McLaughlin & Kan, Citation2014). In addition, awareness and education about the potential for cognitive bias in forensic risk evaluations and potentially useful debiasing strategies may also vary between countries. Therefore, the purpose of this survey was threefold: (a) to identify which SRAIs are commonly used by forensic mental health evaluators who conduct FREs in the Netherlands and their perceptions of the usefulness of those SRAIs; (b) to gain insight into evaluators’ views about potential sources of bias when conducting FREs; and (c) to examine evaluators’ concerns about cognitive bias and their views of potential strategies to diminish bias.

Frequency of use and usefulness ratings of SRAIs

Similar to previous surveys (Hurducas et al., Citation2014; Neal & Brodsky, Citation2016; Zapf et al., Citation2018), respondents in the current study were quite experienced in their respective fields, averaging approximately 19 years of experience in mental health settings. The majority of our respondents had conducted 10 or more risk evaluations in the two years preceding our survey, with a small proportion having conducted 100 or more.

Our respondents reported a very high rate of SRAI use, as nearly all of them indicated that they use an SRAI always or most of the time. Further, the vast majority of respondents were required to use a specific SRAI. These high rates of SRAI use likely reflect professional recommendations in the Netherlands that urge the use of SRAIs in forensic assessments (Nederlands Instituut voor Forensische Psychiatrie en Psychologie, Citation2018). Our results are in line with findings in other recent international surveys suggesting the increasing use of SRAIs (Neal & Grisso, Citation2014b; Singh et al., 2014). For example, Neal and Grisso (Citation2014b) reported that 96.9% of their respondents reported using an SRAI for sexual offender risk assessment and 89.0% for violence risk assessment. Similarly, among an international sample of mental health professionals from Europe, Singh and colleagues (2014) reported that over the 12 months preceding the survey, approximately 63% of respondents used SRAIs to conduct risk assessments.

In addition, in the Netherlands, a strong emphasis is placed on using an SPJ approach to FREs (Nederlands Instituut voor Forensische Psychiatrie en Psychologie, Citation2018). Our findings indicate that SPJ tools—specifically, the HCR-20 and the HKT-30/R—are the SRAIs most commonly used among Dutch forensic evaluators. In fact, most of our respondents reported that they are required to use one of these two tools. Regardless of whether they were required or optional users, evaluators rated the HCR-20 and the HKT-30/R as moderately to highly useful. About two-thirds of our respondents also used the SAPROF and rated it as moderately to very useful.

It is common practice for forensic evaluators to use a psychopathy measure when performing a (sexual) violence risk assessment. Psychopathy is generally considered relevant in psycholegal contexts because of its relationship with risk for general criminal, violent, and sexual recidivism (DeMatteo et al., Citation2020). Nearly half of our respondents reported that they were required to use a psychopathy measure (PCL-R or PCL:SV) and a substantial majority of these required users rated the measure as very or extremely useful. Similarly, in their survey of forensic psychologists, Viljoen and colleagues (2010) reported that nearly 65% of respondents who conducted any type of adult risk assessment used the PCL-R or PCL:SV.

Interestingly, the majority of our respondents indicated they were required or chose to use an actuarial (rather than SPJ) SRAI—the Static-99/99R—for evaluating the risk of sexual reoffending. This is not entirely unexpected as the Static-99/R is one of the most commonly used SRAIs for estimating the likelihood of sexual recidivism (Archer et al., Citation2006; Chevalier et al., Citation2015; Kelley et al., Citation2020; Neal & Grisso, Citation2014b). Furthermore, previous surveys have reported that actuarial SRAIs are used more often than SPJ SRAIs for evaluating sexual recidivism risk (Kelley et al., Citation2020; Neal & Grisso, Citation2014b).

Despite its widespread use and popularity, recent field studies indicate that the Static-99R can result in significant overestimates of sexual recidivism risk (Boccaccini et al., Citation2017). The Static-99R has also been criticized because it considers only historical risk factors that cannot be changed with treatment (Cauley, Citation2007; Craig et al., Citation2005). The Risk-Needs-Responsivity (RNR; Bonta & Andrews, Citation2007) model of rehabilitation for people who have committed criminal offenses indicates that addressing treatment needs requires the assessment of dynamic factors that can be targets for treatment (Bonta & Andrews, Citation2007; Mann et al., Citation2010). In fact, some respondents in our survey wrote a free-text response that said they used the Static-99/99R in combination with the ACUTE-2007/STABLE-2007 to measure static and dynamic factors and treatment needs. Some respondents also wrote that they use the Sexual Violence Risk-20 (SVR-20; Boer et al., Citation1997), which is the most commonly used SPJ instrument for evaluating sexual recidivism risk.

Concerns about bias and errors in forensic risk evaluations

Respondents to our survey were significantly more concerned about the possibility of a false negative than a false positive outcome in their risk evaluations. This finding suggests that Dutch forensic evaluators, like evaluators elsewhere (Bonta & Motiuk, Citation1990), may tend to err on the side of caution (i.e., being more averse to the potential outcome of improperly classifying someone as low risk who then reoffends than improperly classifying someone as high risk who would not have reoffended). Furthermore, erring on the side of caution can increase the likelihood of false positives, the consequence of which is that people are unjustly deprived of their freedom and limited mental health and correctional resources are needlessly wasted (Bonta & Motiuk, Citation1990; Harris, Citation2006). Of course, false positives and false negatives both carry harmful consequences and neither error is desirable. Yet it is much easier to identify a false negative because the crime the person commits is likely to come to the attention of law enforcement, and potentially, the media. On the other hand, a false positive is unlikely to be identified because the absence of reoffending is likely to be attributed to incapacitation and treatment, not a misclassification of the individual’s risk.

We were also interested in how often and with whom our respondents consulted regarding FREs. We defined consulting for our respondents as “seeking advice about the evaluation,” thereby leaving open the possibility of accounting for collateral interviews. Previous studies indicate that risk judgments reached by a consensus method can be more accurate than individual evaluator ratings (de Vogel & de Ruiter, Citation2006; Huss & Zeiss, Citation2004; McNiel et al., Citation2000), and this method is commonly used in the Netherlands when conducting FREs (Harte & Breukink, Citation2010). Our findings confirm that it is common practice for Dutch evaluators to seek advice from others about an FRE always or most of the time. In addition, over 40% of our respondents indicated that they consult with other treatment providers of the evaluee who is the subject of the FRE.

Beyond consultations with colleagues, we also note that our respondents indicated they “consult” with other parties about FREs, including probation and parole officers, the evaluee’s family, and prison staff. Because of how we defined “consulting” in our survey, the responses elicited from our respondents likely include inter-professional consultations as well as interviews with collateral informants. In fact, collateral sources of information appear fairly common in FREs. Neal and Grisso (Citation2014b) reported that their international respondents (N = 434) conducted collateral interviews, both with other professionals (54.5% for FREs for violence and 25.0% for FREs for sexual violence) and with nonprofessionals (27.0% and 35.2%, respectively).

Guidelines from The Netherlands Institute of Forensic Psychiatry and Psychology indicate forensic evaluators should include relevant information in their reports, but the decision about relevant and irrelevant information is left to the subjective opinion of the evaluator (de Ruiter & Kaser-Boyd, Citation2015). On the one hand, collateral interviews may provide evaluators with important information needed to conduct a complete FRE. On the other hand, there is a concomitant risk that the evaluator will be exposed to potentially biasing, irrelevant contextual information (Zapf & Dror, Citation2017). There is a growing body of empirical studies providing evidence that exposure to task-irrelevant contextual information can bias forensic evaluators (Dror, Citation2012, Dror, Citation2018; Dror et al., Citation2006, Dror et al., Citation2015; Kukucka & Kassin, Citation2014; Nakhaeizadeh et al., Citation2014) and should be avoided. In fact, avoiding potentially biasing, irrelevant information may be one of the most effective strategies to prevent bias before it occurs (Dror et al., Citation2015; Gardner et al., Citation2019; National Research Council, Citation2009; Wilson & Brekke, Citation1994).

Potential sources of bias in forensic evaluations

Similar to Zapf et al.’s (Citation2018) findings, our respondents were inclined to agree that an evaluator’s prior beliefs and expectations can affect how they analyze a forensic case and formulate their ultimate opinion, suggesting some awareness of confirmation bias among our respondents (Kassin et al., Citation2013). Nevertheless, less than half of our respondents viewed bias as a problem in FREs. This contrasts with Zapf et al.’s survey, in which nearly 86% of forensic evaluators expressed concern about cognitive bias in forensic evaluations as a whole, and about 79% said they were concerned about cognitive bias in their specific domain of forensic evaluations.

Furthermore, a substantial majority of our respondents agreed that a conscious effort to set aside prior beliefs or expectations makes it less likely an evaluator will be influenced by them. Yet, conscious efforts to set aside prior beliefs and expectations are unlikely to be effective in eliminating bias, because bias operates outside of awareness (Wilson & Brekke, Citation1994). Furthermore, even when an evaluator is aware of the potential for preexisting motivations and emotions to affect their evaluation, efforts to counteract them are not necessarily effective (Kassin et al., Citation2013).

Perhaps our respondents were not as concerned about bias in FREs because they commonly used SRAIs (de Ruiter, Citation2016; de Ruiter & Hildebrand, Citation2007). As applied to FREs, SRAIs may help reduce the effects of bias, although a growing body of evidence suggests this is not always the case (Chappell et al., Citation2013; Gowensmith & McCallum, Citation2019; Guay & Parent, Citation2018; Guy et al., Citation2014; Miller & Maloney, Citation2013; Murrie et al., Citation2008, Citation2009; Murrie & Balusek, Citation2008; Schmidt et al., Citation2016; Shepherd & Sullivan, Citation2017; Storey et al., Citation2012; Wormith et al., Citation2012). Therefore, evaluators should still remain aware of the potential for bias if they are to take steps to effectively minimize its effects on FREs (Dror, Citation2018; Lilienfeld et al., Citation2009; Wilson & Brekke, Citation1994).

Perceived effectiveness of debiasing strategies

Neal and Brodsky (Citation2016) identified six debiasing strategies that have previously been suggested as potentially effective debiasing techniques for use in forensic evaluations. These strategies are: using structured methods to gather and analyze data (Croskerry et al., Citation2013; Graber et al., Citation2012; Neal & Grisso, Citation2014b; Zapf & Dror, Citation2017), consulting with colleagues (Croskerry et al., Citation2013; Graber et al., Citation2012), critically examining conclusions (e.g., considering alternative hypotheses; Galinsky et al., Citation2000; Galinsky & Moskowitz, Citation2000; Grisso, Citation2010; Lord et al., Citation1984; Mumma & Wilson, Citation1995; Soll et al., Citation2015; Zapf & Dror, Citation2017), receiving explicit didactic training about objectivity (Bridge & Marić, Citation2019; Croskerry et al., Citation2013; Graber et al., Citation2012; Soll et al., Citation2015), taking time to think about the evaluation before writing the report (Croskerry et al., Citation2013; Lilienfeld et al., Citation2009), and reading professional literature about the importance of objectivity (Croskerry et al., Citation2013). Our respondents gave significantly higher effectiveness ratings to the six debiasing strategies with some empirical support than they did to the remaining 19 strategies. However, we note that the empirical support for several of these ‘effective’ debiasing strategies is derived from medical research related to reducing errors and improving accuracy in clinical and diagnostic decision-making. Therefore, we cannot say with certainty that the debiasing strategies suggested in other fields will be effective in mitigating bias in FREs (Fischhoff, Citation1982; Soll et al., Citation2015).

In addition to the six strategies identified by Neal and Brodsky (Citation2016) as having been suggested in scientific literature, we think there are at least four others (from the 25 listed) that have been suggested as potentially effective in countering bias specific to forensic evaluations. These four strategies are: investigating all relevant data before forming an opinion (Grisso, Citation2010; Zapf & Dror, Citation2017), basing conclusions and opinions on sound data (Grisso, Citation2010; Zapf & Dror, Citation2017), taking careful notes during an evaluation (Arkes, Citation1981; Borum et al., Citation1993; Mumma & Wilson, Citation1995), and examining patterns of personal decision-making (e.g., agreement with referral party preferences; comparing one’s decisions over time to base rates; Brodsky, Citation2013; DeClue & Rice, Citation2016; Gowensmith & McCallum, Citation2019; Murrie & Balusek, Citation2008; Murrie & Warren, Citation2005; Parker, Citation2016). However, we note that very few strategies to mitigate bias have been empirically tested. Therefore, field and task-specific research regarding the effectiveness of potential strategies to mitigate bias in FREs would be of significant benefit for the discipline, forensic examinees, and legal decision-makers.

Limitations and conclusions

There are a number of limitations to our findings that relate to our sample. First, we note the relatively small sample size of our study (N = 110), although the sample size is in line with previous similar surveys conducted in other countries (e.g., Archer et al., Citation2006; Hill & Demetrioff, Citation2019; Viljoen et al., Citation2010). Second, practitioners who chose to participate in the survey may have done so because they possess more awareness and/or concern about the potential for bias than evaluators who chose not to participate, thus introducing (self-)selection bias. Third, eight of our respondents had not attained a graduate degree, which may have limited the types of risk assessment tools they were qualified or trained to use. Finally, we are aware of one previous international survey by Singh and colleagues (2014) that differentiated between forensic psychologists’ and forensic psychiatrists’ use of violence SRAIs. Their findings indicated that forensic psychologists may use violence SRAIs more frequently than forensic psychiatrists. However, we did not ask our respondents to specify their discipline. Future researchers may consider explicitly differentiating the use and perceived utility of SRAIs between these disciplines and others (e.g., social workers, nurses). For these reasons, our sample may not be representative of all forensic mental health practitioners who conduct FREs in the Netherlands.

In addition, the actual response rate to our survey is unknown, and any conclusions about the broader population of forensic mental health practitioners in the Netherlands are therefore tentative. Furthermore, we do not know the context in which our respondents are performing FREs: for example, are they conducting FREs for the court, for the purpose of treatment planning and risk management, for decisions related to patient restrictions? The context in which an FRE is undertaken may have an effect on SRAI use and how concerns about potential bias are managed. Therefore, our findings may not generalize across all forensic contexts and purposes for which FREs are conducted.

Our survey results are also limited with respect to the use of sexual recidivism SRAIs. Unfortunately, we failed to include SPJ instruments for sexual violence, such as the SVR-20, in our survey list of SRAI options. Still, a number of respondents included the SVR-20 in free text, but we cannot exclude the possibility that more respondents would have selected the SVR-20 if it had been listed as an option.

Finally, we cannot rule out the possibility of socially desirable responding. Although the survey was completed anonymously, and participants were assured of anonymity, it is possible that some people were concerned about negative portrayals of their profession, their work, or their employer. Therefore, it is possible that there is a disparity between what our respondents say they do in practice and what they actually do.

The findings from this study also point to potential areas for future research. For example, we did not ask participants about “irrelevant contextual information” as in Zapf et al.’s (Citation2018) study, because of the lack of consensus about what constitutes irrelevant contextual information in risk evaluations. A recent survey of forensic scientists (N = 189) from several forensic disciplines (biology, pattern evidence, chemistry, and crime scene investigation) is illustrative of the challenges in obtaining agreement regarding what constitutes irrelevant contextual information in their tasks (Gardner et al., Citation2019). Gardner and colleagues reported that among crime scene investigators, a substantial majority agreed that a description of the evidence, how it was collected, the type of offense, and a synopsis of the case were essential to their tasks. Conversely, 25% said that a suspect’s statement or confession was essential, whereas more than half said they would review that information if it was available (58.3% and 66.7%, respectively). A small minority agreed that the suspect’s statement or confession was irrelevant (16.7% and 8.3%, respectively).

Similar to crime scene investigation, the process of a forensic risk evaluation involves gathering information and evidence. Information that is deemed relevant in that process is likely to vary considerably between cases and individual experts. Given the potential for irrelevant contextual information to introduce bias into FREs (Neal & Saks, Citation2016; Zapf & Dror, Citation2017), surveying forensic mental health professionals about their views of what information is irrelevant to the FRE task seems an important avenue for future research. Only within the past decade or so have researchers begun to investigate the role that cognitive bias may play in FREs (Camilleri et al., in press; Charman, Citation2013). Furthermore, many potentially effective debiasing techniques have not been empirically tested with respect to forensic evaluations generally, or FREs specifically. This point is worth noting because not all debiasing strategies are appropriate and/or effective across domains or tasks (Soll et al., Citation2015). It is not necessarily the case that a debiasing strategy that is effective in countering one type of bias (e.g., confirmation bias) will be effective in countering a different type of bias (Fischhoff, Citation1982). Therefore, direct investigation of the effect of structured evaluation methods on countering bias in FREs may shed light on the debiasing potential of these methods.

Conflict of interest

We have no conflicts of interest to disclose.

Data availability statement

The data that support the findings of this study are available from the corresponding author, JK, upon reasonable request.

Additional information

Funding

This research was supported by a fellowship from the Erasmus Mundus Joint Doctorate Program, The House of Legal Psychology [grant no. FPA 2013-0036] and [grant no. SGA 532473-EM-5-2017-1-NL-ERAMUNDUS-EPJD].

References