Research Article

The use of alternative scenarios in assessing the reliability of victims’ statements

Received 16 Nov 2022, Accepted 07 Jul 2023, Published online: 20 Jul 2023

ABSTRACT

The use of alternative scenarios has been advocated as a method to mitigate bias when evaluating the reliability of testimonies. In two experiments, undergraduate students acted as expert witnesses when reading an alleged child sexual abuse case file and evaluated the reliability of the statements. In the first experiment, a subgroup of participants were encouraged to think about alternative scenarios (i.e. the statements are fabricated) when evaluating statements (N = 150). Contrary to our expectations, these participants were not more skeptical about the reliability of the alleged victim’s testimony than the control participants. In the second experiment (N = 205), we tested whether scenario-thinking protected against context effects (i.e. the unintended influence of irrelevant information) from a defense lawyer or prosecutor. We found no support that being sensitized to alternative scenarios made participants more skeptical of the reliability of testimonies. However, when we performed an internal joint analysis of Experiments 1 and 2, we did find some evidence that considering alternative scenarios made participants more skeptical of the suspect’s guilt than those in the control group. We discuss the use of alternative scenarios in expert witness work and potential ways to empirically test the alternative scenario approach in the future.

Introduction

Testimonies from witnesses often serve as important evidence in legal decision-making. Police, lawyers, or prosecutors sometimes ask psychologists to provide their expert opinions on the credibility of such testimonies (Dodier & Denault, Citation2018; Otgaar et al., Citation2020; Volbert & Steller, Citation2014). The main task of such expert witnesses is to assess the credibility of victim and eyewitness statements in an objective way. However, when serving as an expert witness, psychologists might be affected by context information that is irrelevant to the case at hand, such as hearing from a prosecutor that the suspect confessed (Bogaard et al., Citation2014; Rassin, Citation2018). Such irrelevant information can bias expert witness work, and this bias can undermine accuracy (O’Brien, Citation2009). In the current set of two experiments, we tested whether the use of alternative scenarios can raise skepticism and reduce bias.

Biases in expert witness work

Confirmation bias is a person’s tendency to seek, interpret, and create new evidence in ways that verify their pre-existing beliefs (Kassin et al., Citation2013). Confirmation bias arises when an individual seeks consistent evidence and downplays conflicting evidence to bolster a trusted premise (Nickerson, Citation1998; O’Brien, Citation2009). This psychological process entails unconscious information processing, which can result in incorrect conclusions (O’Brien, Citation2009).

When confirmation bias occurs in legal casework, researchers speak of forensic confirmation bias (Kassin et al., Citation2013). Such bias can lead investigators to maintain a sole focus on a single suspect, prompting them to seek and support incriminating evidence while cherry-picking evidence against the primary suspect and ignoring potentially exonerating evidence (Kassin et al., Citation2013). Confirmation bias can likewise skew the decisions and testimony of forensic examiners (police officers, fingerprint examiners, DNA experts, psychologists, psychiatrists, etc.). Biased expert testimony could lead to incorrectly excluding other suspects, and different pieces of biased evidence can influence one another, resulting in a bias cascade effect and a biased case file. This biased information might in turn influence the decision-making of judges or jurors, thereby impacting legal decisions (Kassin et al., Citation2013; O’Brien, Citation2009).

Bias can affect everyone involved in an investigation, including experts and expert witnesses (Dror & Cole, Citation2010). Expert witnesses are required to deliver their expert testimony in a court proceeding in an objective manner. However, in adversarial legal systems, experts can become biased in favor of the side that hired them, a bias termed allegiance bias (Blair et al., Citation2008; Munder et al., Citation2011; Murrie et al., Citation2008, Citation2009; Thase, Citation1999). Murrie and Boccaccini (Citation2015) reviewed studies on mental health evaluations by forensic experts and concluded that working for one side in an adversarial case led experts to score these instruments in a way that benefited that side. For example, Murrie et al. (Citation2013) invited a group of forensic assessors to conduct a risk assessment of sex offenders using the Psychopathy Checklist-Revised and an actuarial risk-assessment instrument designed to predict sexual recidivism among sex offenders. Ninety-nine participants were randomly assigned to either a prosecution-allegiance (50 participants) or a defense-allegiance group (49 participants) and were deceived into believing that they were part of a formal forensic consultation paid for by either a public-defender service or a specialized prosecution unit. Participants were instructed to score the offenders’ interpersonal and emotional traits and sexual behavior using the two risk instruments. Prosecution-retained evaluators assigned significantly higher risk assessment scores than defense-retained evaluators.

Although allegiance bias among forensic experts has been widely studied (e.g. Blair et al., Citation2008; Engel & Glöckner, Citation2013; Gowensmith & McCallum, Citation2019; McAuliff & Arter, Citation2016; Murrie et al., Citation2009, Citation2013), to date, only one set of studies (three experiments) has investigated whether allegiance bias can affect the way psychologists assess the reliability (see Note 1) of victims’ statements (Sauerland et al., Citation2020). In the criminal justice system, the evaluation of these statements can impact the outcome of criminal investigations (Bogaard et al., Citation2014; Volbert & Steller, Citation2014; Vrij, Citation2005), especially when no other evidence is available, as is sometimes the case in sexual abuse allegations (Köhnken, Citation2004; Volbert & Steller, Citation2014). Psychologists use a variety of methods to evaluate the reliability of testimonies from witnesses, victims, or suspects.

A frequently used method for evaluating the reliability of verbal statements is statement validity assessment (SVA; Vrij, Citation2005; Vrij et al., Citation2000). One key component of SVA is Criteria-Based Content Analysis (CBCA), which evaluates the quality of a statement’s content (Blandon-Gitlin et al., Citation2009). The CBCA is based on the idea that statements generated from real-life experiences differ in quantity and quality from false accounts. The CBCA consists of 19 criteria (e.g. logical structure, unstructured production, quantity of details), and the presence or absence of those criteria is thought to differentiate between true and false statements (Steller & Kohnken, Citation1989; Vrij, Citation2005). To distinguish true statements from fabricated ones, an evaluator considers which criteria are present and how many: the more criteria a statement meets, the more likely it is assessed as truthful, because true statements tend to meet more criteria than fabricated ones.
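To make the counting logic concrete, here is a toy sketch; the criteria subset, the function name, and the example ratings are invented for illustration, and real CBCA scoring is performed by trained raters across all 19 criteria, with no fixed numerical cutoff.

```python
# Toy sketch of the CBCA counting logic described above. Only a subset of
# criteria is listed; names and example ratings are hypothetical.

CRITERIA = [
    "logical_structure",
    "unstructured_production",
    "quantity_of_details",
    "contextual_embedding",
    "reproduction_of_conversation",
]

def cbca_count(ratings: dict) -> int:
    """Count how many of the listed criteria a rater judged to be present."""
    return sum(bool(ratings.get(criterion)) for criterion in CRITERIA)

# A statement judged to satisfy three of the five listed criteria:
ratings = {
    "logical_structure": True,
    "quantity_of_details": True,
    "contextual_embedding": True,
}
# Higher counts are taken as more indicative of an experience-based account.
print(cbca_count(ratings))  # 3
```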

Bogaard et al. (Citation2014) focused on how biases might affect CBCA scores, although these authors did not focus on allegiance bias per se. Participants viewed four statements supplemented with positive and negative context information to increase or decrease the credibility of the statements. For example, in the positive context information condition, participants learned that other eyewitnesses confirmed certain details of the statement. In the negative context information condition, participants learned details about the victim’s criminal background that implied a history of lying. As expected, and supporting the presence of forensic confirmation bias, participants’ CBCA scores in the positive context information condition were higher than in the negative context information condition.

To the best of our knowledge, only one set of studies has examined allegiance bias in the field of statement validity assessment (Sauerland et al., Citation2020). Students acted as expert witnesses in evaluating interviews involving a case of child sexual abuse. Each participant received a case file; some files included a letter of request from the defense, some a letter from the prosecution, and some no letter at all. The defense lawyer’s letter emphasized elements in the file that questioned the veracity of the accusation, whereas the prosecutor’s letter emphasized information that supported the reliability of the abuse allegations. After reading the letter, participants read the case file and then assessed the case using several case evaluation questions. As expected, across all three experiments, defense-appointed participants rated the allegations as less reliable than prosecutor-appointed participants did.

Reducing bias in expert witnesses

In recent years, several methods have been suggested that might mitigate the detrimental effect of biases on expert witnesses’ decision-making. One suggested method is asking people to actively consider alternative hypotheses that contradict the preferred primary hypothesis (Heuer, Citation1999; O’Brien, Citation2009; Otgaar et al., Citation2017; Rassin, Citation2018; van Koppen & Mackor, Citation2019; Vredeveldt et al., Citation2022). Consideration of alternatives may trigger a bias-reducing mindset: individuals who actively consider an alternative scenario would be sensitized to different options, making them more critical and less susceptible to confirmation or allegiance bias (O’Brien, Citation2009; Rassin, Citation2018). Thinking in terms of alternative scenarios is rooted in Popper’s (Citation1959, Citation1963) concept of falsification. According to Popper, one of the critical steps in determining whether a hypothesis is (provisionally) correct is to engage in falsification rather than mere confirmation. Falsification refers to searching for facts that disconfirm an existing hypothesis. Popper (Citation2005) argued that, even with supporting evidence, a hypothesis can only be maintained until evidence that disconfirms it is detected through falsification attempts. When trying to determine whether a hypothesis possesses explanatory value, one must look not only at the supporting evidence but also at the falsifying evidence. If serious attempts at falsification fail, this can be considered support for the hypothesis.

Based on this reasoning, scholars have advocated a scenario approach in expert witness work (Otgaar et al., Citation2017; van Koppen & Mackor, Citation2019; Vredeveldt et al., Citation2022). This approach proposes that integrating alternative scenarios in the decision-making process might protect against tunnel vision or confirmation bias (Otgaar et al., Citation2017; Rassin, Citation2018; van Koppen & Mackor, Citation2019). Thus, elaborating scenarios can protect expert witnesses from the effects of bias, which can be detrimental to their performance (Kassin et al., Citation2013). Although alternative scenarios are the standard approach required in technical forensic expert witness reports (AFSP, Citation2009), scenario approaches are rare in psychological expert witness reports. One issue here is that there is no consistent scientific evidence at present that a scenario approach helps to mitigate bias. That is, the studies that have been conducted in this domain have yielded conflicting results.

Three studies supported the idea that the alternative scenario method can prevent bias (Griffith, Citation2019; O’Brien, Citation2009; Rassin, Citation2018), and at least three experiments did not find such evidence (Maegherman et al., Citation2021; Sauerland et al., Citation2020, Experiments 2 and 3). O’Brien (Citation2009, Experiment 2) had participants review a criminal investigation file on a murder case and then answer several questions about the case under several conditions, including a counter-hypothesis condition. Findings suggested that instructions to elaborate on why the suspect might be innocent (the counter-hypothesis condition) reduced bias; that is, participants gave lower guilt ratings to the suspect compared with conditions in which alternative scenarios were not actively considered.

Another experiment examined the use of alternative scenarios in legal decision-making (Rassin, Citation2018). Laypeople and criminal justice experts read a case vignette about a murdered psychiatrist. In the control condition, participants received a basic version of the case vignette, which described one suspect. In the second condition, participants received an alternative version of the case vignette describing the alternative scenario of another suspect. In the last condition, a pen-and-paper condition, participants received the alternative-suspect version and were additionally asked to imagine that the victim was not killed by the suspect but by another perpetrator. Following this manipulation, participants assessed the extent to which the police findings fitted the alternative scenario. Participants also decided whether the primary suspect should be charged. Neither laypeople nor criminal justice experts in the alternative scenario conditions differed from those in the control condition in their mean ratings of the incriminating strength of the evidence or in their decision of whether the suspect should be convicted. However, compared with the control condition, the pen-and-paper tool significantly decreased scores on these dependent measures. Thus, the alternative scenario approach reduced bias, but only when participants actively and explicitly considered the alternative.

Griffith (Citation2019) used a debiasing strategy called ‘considering the opposite’. In her study, forensic clinicians were asked to read a case scenario (vignette), select one of two hypotheses, and then rate their confidence (0–100) in the selected hypothesis. Next, participants viewed six confirmatory, disconfirmatory, or filler information items and were asked to evaluate the importance of each item in constructing their hypothesis. Overall, participants preferred confirmatory information significantly more than disconfirmatory information. However, the intervention (consider-the-opposite debiasing) significantly reduced confirmatory information ratings compared with the control condition. This suggests that the debiasing strategy reduces the value forensic clinicians place on hypothesis-confirming information.

Other studies failed to find that the alternative scenario technique can reduce bias. For example, an experiment including an alternative scenario instruction (i.e. the Analysis of Competing Hypotheses, ACH) failed to detect any debiasing potential (Maegherman et al., Citation2021). Students in the control condition were asked to read general information about bias, while participants in the ACH group read general information about bias and were told how the ACH procedure works. Then, participants were given a case file about a woman suspected of murdering her husband. After reading the case file, participants were asked to rate how likely it was that the woman was guilty, whether or not they would convict her of the murder of her husband, and how confident they felt about their decision. Participants were asked to write down their hypotheses about what happened; the number of scenarios formulated and perpetrators named in the scenarios were counted. Participants were then given a list of confirming and disconfirming investigative questions. Participants in the ACH condition did not differ significantly from participants in the control condition in their importance ratings for the exonerating and incriminating evidence, nor in their guilt ratings. These findings show that it is difficult for people to apply ACH, and it may therefore not be effective in reducing bias. This result is also consistent with recent findings showing mixed evidence for the effectiveness of ACH in reducing confirmation bias among trained analysts instructed to use the technique (Dhami et al., Citation2019). ACH-trained analysts were no more likely than untrained analysts to identify the four alternative hypotheses. However, the ACH group was more apt than untrained analysts to rate evidence as inconsistent or consistent with each hypothesis (rather than simply more or less consistent).

Two experiments by Sauerland et al. (Citation2020, Experiments 2 and 3) also yielded null results. In their experiments, participants were asked to read a case file about a father who was suspected of sexually abusing his daughter. Participants were randomly assigned to one of two bias conditions (defense vs. prosecution) induced by one of two letters. Experiment 2 also included a control condition (participants were merely told they could use a form to take notes) and an experimental condition (participants were asked to look for elements in the case file speaking for or against the statements’ reliability, again with a note-taking form). Experiment 3 included a control group and another condition in which participants were asked to consider three specified possible scenarios. After studying the case file and reading the instructions, participants in Experiments 2 and 3 answered pertinent questions about the case (e.g. how likely is it that the child was sexually abused? how likely is it that the child’s testimony is reliable?). In Experiment 2, instruction (standard vs. two-sided) had no effect on participants’ case assessment scores, irrespective of the cover letter they received. Experiment 3 likewise provided no evidence for the idea that emphasizing alternative scenarios may reduce allegiance bias.

To sum up, research so far has provided mixed results about the usefulness of alternative scenario instructions for debiasing participants. In previous experiments (e.g. Sauerland et al., Citation2020), participants were provided with alternative scenario instructions after they had read the case file. This sequence may allow participants to form an opinion while reading the case file, so that they are no longer open-minded when thinking of alternative scenarios afterwards. In the current experiments, we therefore provided participants with scenarios before they read the case file. Indeed, Otgaar et al. (Citation2017) suggested that scenario consideration should ideally happen prior to viewing the case file. According to these authors, the idea of scenario-building aligns well with the idea of preregistration in science, in which researchers report their hypotheses and planned analyses before data collection (Wagenmakers et al., Citation2012). Additionally, Otgaar et al. (Citation2017) argued that the scenario-building concept is strongly connected to hypothesis testing in many fields, where one must formulate a null hypothesis and an alternative hypothesis prior to conducting an experiment.

The present experiments

In two experiments, we examined the potential of alternative scenarios for debiasing expert witnesses’ reliability assessments of testimonies in an abuse case. Student participants acted as expert witnesses and received mock case files concerning alleged child sexual abuse. Slightly modifying Sauerland et al.’s (Citation2020) procedure, we asked participants to think about the alternative scenario before they read the case file. We expected participants who received alternative scenarios to be more skeptical about the reliability of the testimony than participants in the no-scenario condition. In Experiment 2, in addition to the scenario manipulation, we attempted to bias participants by providing irrelevant context information about the case from a defense lawyer or prosecutor. We expected this manipulation to result in allegiance bias (Sauerland et al., Citation2020), with participants in the defense condition being more skeptical about the reliability of the statements than participants in the prosecution condition. We also expected that considering alternative scenarios would protect against allegiance bias. We report both experiments in tandem because their procedures and materials largely overlap.

Method

Participants

In Experiment 1, we aimed for a minimum sample size of 102 participants, based on an a priori power analysis using G*Power version 3.1 for an independent-samples t-test with power = .80, α = .05, and a medium effect size of Cohen’s d = 0.50 (Faul et al., Citation2009). The medium effect size was based on Rassin (Citation2018), who reported a Cohen’s d of 0.34 for the difference between the basic-scenario and the alternative-scenario condition and a Cohen’s d of 1.61 (large) for the difference between the basic scenario and the pen-and-paper alternative-scenario tool. One hundred and fifty Law and Criminology students participated in this experiment, aged 18–25 years (M = 19.53, SD = 5.70); 90% of participants were female (n = 135). Participants were randomly assigned to the alternative-scenario group (n = 67) or the no-scenario control group (n = 83).

For Experiment 2, based on an a priori power analysis using G*Power version 3.1 for an ANOVA (fixed effects, main effects, and interactions) with power = .80, α = .05, a medium effect size of f = .25, numerator df = 5, and six groups (Faul et al., Citation2009), we required 211 participants. The medium effect size for allegiance bias was based on Sauerland et al. (Citation2020), who found a Cohen’s d of 0.47 (medium) for the difference between defense-appointed and prosecutor-appointed participants. In Experiment 2, the final sample consisted of 205 Law or Criminology students, aged 18–29 years (M = 19.96, SD = 5.80), 79% of whom were female (n = 162). We used a 2 (scenario: alternative scenario vs. no scenario) × 3 (letter: prosecutor letter vs. defense lawyer letter vs. no cover letter) between-subjects design. Participants were randomly assigned to one of six groups. Sample sizes per condition were: alternative scenario – prosecutor letter (n = 30), alternative scenario – defense lawyer letter (n = 34), alternative scenario – no cover letter (n = 26), no scenario – prosecutor letter (n = 47), no scenario – defense lawyer letter (n = 31), and no scenario – no cover letter (n = 37). Both experiments took place as part of a course on Criminological Psychology. We preregistered these experiments on the Open Science Framework, where the materials and data are also available (https://osf.io/h4ksz). The study was approved by the standing ethical committee of the Faculty of Psychology and Neuroscience, Maastricht University.
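Both sample-size targets can be reproduced approximately outside G*Power. The sketch below uses the power classes from statsmodels under the same assumptions; the exact integer may differ by a participant or two between programs.

```python
# Reproducing the two a priori power analyses with statsmodels
# (an alternative to G*Power; results may differ marginally).
import math
from statsmodels.stats.power import TTestIndPower, FTestAnovaPower

# Experiment 1: one-tailed independent t-test, d = 0.50, alpha = .05, power = .80.
n_per_group = TTestIndPower().solve_power(effect_size=0.50, alpha=0.05,
                                          power=0.80, alternative='larger')
print(math.ceil(n_per_group) * 2)  # ~102 participants in total (51 per group)

# Experiment 2: the 2 x 3 design treated as a six-group ANOVA,
# f = 0.25, alpha = .05, power = .80 (numerator df = 5).
n_total = FTestAnovaPower().solve_power(effect_size=0.25, alpha=0.05,
                                        power=0.80, k_groups=6)
print(math.ceil(n_total))          # ~211 participants in total
```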

Materials

Case file

In Experiments 1 and 2, we used the case file from Sauerland et al. (Citation2020), which can be found at https://osf.io/n2k5u/. The case file describes a sexual abuse case against the father of a seven-year-old girl named Victoria. The case file includes police interviews with Victoria, her mother, and her father. Apart from serious abuse accusations, Victoria also mentioned that her father was very strict.

The case files contained ambiguous and conflicting information in order to give rise to cognitive bias. Specifically, in the police interview files with the alleged victim’s mother, the file mentioned that the mother thought that her ex-husband (Victoria’s father) was a difficult man. She was the first to ask Victoria if her father had touched her. The interview with the victim (Victoria) showed that she could not remember everything and could not tell much because it made her feel uncomfortable. Interviews with the victim’s father (the suspected perpetrator) showed that his relationship with his ex-wife was difficult and that the parents had different parenting styles. The father denied the accusations against him.

Case assessment

Participants in each group provided two general assessments: ‘From 0 to 100, in your opinion, how reliable is the victim’s statement in the case?’ and ‘From 0 to 100, in your opinion, how likely is it that the suspect in the case is guilty?’ In addition, participants answered four case assessment questions, rated on a seven-point Likert scale (1 = very unlikely, 7 = very likely): How likely is it that …

1. … the child was physically abused?
2. … the child was sexually abused?
3. … the child’s testimony is reliable?
4. … the events happened as the child described?

In Experiment 2, we also asked participants about the reliability of the victim’s statements about sexual abuse. This fifth item was rated on a seven-point Likert scale (1 = very unreliable, 7 = very reliable): 5. How reliable is the child’s statement about sexual abuse? We not only examined scores for the separate items, but also created a case assessment score by summing items 2, 3, and 4 (plus item 5 in Experiment 2). Across experiments, internal consistency was good to excellent (Cronbach’s α for the case assessment score varied between .91 and .93).
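As an illustration of how such a summed composite and its internal consistency can be computed, here is a minimal sketch; the ratings are invented, and only the item summing and the standard Cronbach’s α formula follow the description above.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_participants x n_items) rating matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_variance = items.sum(axis=1).var(ddof=1)     # variance of the sum score
    return k / (k - 1) * (1 - item_variances / total_variance)

# Invented 1-7 Likert ratings for five participants on items 2, 3, and 4.
ratings = np.array([
    [5, 6, 5],
    [2, 3, 2],
    [6, 6, 7],
    [4, 4, 5],
    [3, 2, 3],
])
case_assessment_score = ratings.sum(axis=1)  # summed composite per participant
print(case_assessment_score, round(cronbach_alpha(ratings), 2))
```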

Attention check

After participants answered all the questions, they answered three additional questions that checked whether they had paid attention and read the case file carefully (the victim’s name, the perpetrator’s name, and the victim’s age). To ensure that participants paid attention while the case information was presented, we also added a couple of extra attention checks (basic questions, e.g. ‘what is 3 + 7?’) before participants moved to each new page. Participants who incorrectly answered two of the three attention check questions and the basic questions were excluded from the study.

Cover letter

In Experiment 2, we added letters from the defense and the prosecution to induce allegiance bias (these can be found at https://osf.io/n2k5u/). The letters were very similar to those used by Sauerland et al. (Citation2020). Participants received one of two cover letters: one from the defense lawyer questioning the truth of the accusation, or one from the prosecutor supporting the allegation. The defense attorney’s letter emphasized aspects of the file that cast doubt on the veracity of the accusation, including that the mother’s questioning of the child (Victoria) was highly suggestive, that Victoria’s parents were divorced, that her mother attempted to influence her, and that the police interview with Victoria contained suggestive interviewing techniques. The prosecutor’s letter stressed components of the file that supported the veracity of the complaint, including Victoria’s spontaneous and detailed statement and the fact that she was interviewed in a child-friendly investigative interview setting. The cover letter was presented before participants read the case file.

Design and procedure

Experiment 1 used a between-subjects design in which participants were randomly assigned to the experimental (i.e. alternative scenario) or control group. Experiment 2 employed a 2 (scenario: alternative scenario vs. no scenario) × 3 (letter: prosecutor letter vs. defense lawyer letter vs. no cover letter) between-subjects design in which participants were randomly assigned to one of six groups. The scenario factor refers to the presence or absence of instructions encouraging participants to consider an alternative scenario; the letter factor, intended to induce allegiance bias, refers to receiving a prosecutor’s letter, a defense lawyer’s letter, or no cover letter.

Data were collected online using the survey tool Qualtrics. After reading and providing informed consent, participants in each group received instructions to imagine that they had to serve as an expert witness. Prior to reading the case file, participants in the alternative scenario group had to carefully consider two possible scenarios that might underlie the child’s statements and to look for elements that did or did not support one or the other scenario. The scenarios were described as follows:

  • Scenario 1: the child describes events that she really experienced (the abuse took place);

  • Scenario 2: the child did not experience the events but fell prey to suggestion from someone else (the abuse did not take place, but she thinks it did).

While reading the case file, participants in the alternative scenario condition were asked to write down elements that supported or contradicted the reliability of the victim’s statements in the space provided online after each page of the case file, focusing on how the evidence fit each scenario. Participants in the no-scenario control condition received standard instructions stating that they could take notes in the space provided online and then received access to the case file; they were given no information concerning possible scenarios. In Experiment 2, before reading the case file, participants were additionally presented with a cover letter from the defense lawyer questioning the truth of the accusation, a cover letter from the prosecutor supporting the allegation, or no cover letter.

The case file was presented on three separate pages online (i.e. the interviews with the mother, the father, and the child). Each interview file was displayed for a minimum of three minutes (participants could not click ‘next’ until the three minutes had passed), ensuring that every participant had enough time to read the entire interview file and would not accidentally skip ahead. Participants in each group then provided the case assessment. Participants could submit the questionnaires after completing all tasks, which took approximately 30–45 min, and received a debriefing afterwards.

Results and discussion

Experiment 1

Using one-tailed independent Welch t-tests, we found no statistically significant differences between the two conditions. Table 1 provides the means, standard deviations, and inferential statistics for the comparisons.

Table 1. Means, standard deviations, and inferential statistics for the six assessment ratings and case assessment scores in the alternative-scenario and no-scenario conditions (Experiment 1).

Although not preregistered, we also computed Bayes factors (BFs) with a Cauchy prior of 0.707, quantifying the evidence for the alternative hypothesis (see Table 2; Jarosz & Wiley, Citation2014).

Table 2. Bayes factor for all assessment questions.
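The BFs with a Cauchy scale of 0.707 are presumably JZS Bayes factors of the kind Jarosz and Wiley (Citation2014) describe (the formula stems from Rouder et al., 2009). Below is a minimal sketch of that computation from a two-sample t statistic; this is not the authors’ analysis script, the example t value is hypothetical (the group sizes are those of Experiment 1), and for simplicity it uses pooled rather than Welch-corrected degrees of freedom.

```python
# Sketch of a JZS Bayes factor (BF10) for an independent-samples t statistic,
# using a Cauchy(0, r) prior on effect size.
import numpy as np
from scipy import integrate

def jzs_bf10(t: float, nx: int, ny: int, r: float = 0.707) -> float:
    n_eff = nx * ny / (nx + ny)   # effective sample size
    nu = nx + ny - 2              # degrees of freedom

    def integrand(g):
        a = 1.0 + n_eff * g * r**2
        # likelihood under H1 for mixing variance g, weighted by the
        # InverseGamma(1/2, 1/2) prior density of g
        return (a**-0.5
                * (1.0 + t**2 / (a * nu))**(-(nu + 1) / 2)
                * (2 * np.pi)**-0.5 * g**-1.5 * np.exp(-1.0 / (2 * g)))

    m1, _ = integrate.quad(integrand, 0, np.inf)   # marginal likelihood, H1
    m0 = (1.0 + t**2 / nu)**(-(nu + 1) / 2)        # likelihood, H0
    return m1 / m0

# Hypothetical t value; BF10 < 1 indicates evidence for the null.
print(round(jzs_bf10(t=1.2, nx=67, ny=83), 3))
```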

In sum, we found no statistically significant differences between participants who were sensitized to the alternative scenario and those who were not. Thus, considering scenarios when evaluating cases did not affect participants’ case evaluations. This null finding is in line with literature showing that debiasing techniques are often fragile and highly domain-specific (Lilienfeld et al., Citation2009). However, it might be that our approach did not elicit any biased ratings to begin with, making any debiasing approach superfluous. It is possible that the use of alternative scenarios is only effective when expert witnesses are biased toward the side that retains them. To address this issue, in Experiment 2 we attempted to induce allegiance bias by providing a letter indicating that a specific party employed the expert witness (cf. Sauerland et al., Citation2020). We hypothesized that alternative scenarios could reduce bias when expert witnesses had to make judgments about a case.

Experiment 2

We conducted 2 (scenario: alternative scenario vs. no scenario) × 3 (letter: prosecutor letter vs. defense lawyer letter vs. no cover letter) ANOVAs for the seven assessment questions and the overall case assessment score. In general, we predicted that the alternative-scenario groups would endorse lower, more cautious ratings than the no-scenario groups regardless of induced bias. However, we did not find a significant difference between scenario conditions on the case assessment score, F(1, 199) = 0.77, p = .382, d = 0.004 (Note 2). We also predicted that the group that received a cover letter from the prosecutor and the group that received a cover letter from the defense lawyer would make different case assessments. However, the main effect of letter, F(2, 199) = 0.08, p = .920, d = 0.001 (Note 3), was statistically non-significant. The interaction between scenario and letter was also statistically non-significant, F(2, 199) = 0.98, p = .377, d = 0.01 (Note 4).
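For readers who want to see how such a 2 × 3 between-subjects ANOVA can be specified, here is a minimal sketch using statsmodels; the data frame is simulated stand-in data (the real data are available via the OSF link above), so the output will not reproduce the statistics reported here.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Simulated stand-in data: one row per participant, with the two
# between-subjects factors and the summed case assessment score.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "scenario": rng.choice(["alternative", "none"], size=205),
    "letter": rng.choice(["prosecutor", "defense", "none"], size=205),
    "case_assessment": rng.normal(16, 5, size=205),
})

# 2 (scenario) x 3 (letter) ANOVA with interaction, Type II sums of squares.
model = smf.ols("case_assessment ~ C(scenario) * C(letter)", data=df).fit()
print(anova_lm(model, typ=2))
```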

Taken together, we found no statistically significant difference in case assessment scores between the groups that used the alternative scenario and those that did not: considering scenarios when evaluating cases did not affect participants’ evaluations. We also found no evidence for an allegiance bias. Table 3 provides the means and standard deviations for the seven assessment ratings and the case assessment score in each condition.

Table 3. Means and standard deviations for the seven assessment ratings and case assessment score in each condition (Experiment 2).

We also performed an exploratory internal joint analysis in which we collapsed the data of the no-scenario versus scenario conditions of Experiments 1 and 2 (N = 355) and performed ANOVAs on our key dependent variables. Table 4 provides the means, standard deviations, and inferential statistics for the comparisons. In general, we found no significant differences in case assessment between the groups, except for one item, assessing the suspect’s guilt. Considering scenarios affected the assessment of the suspect’s guilt, F(1, 353) = 5.72, p = .017, d = 0.127, although the effect size was very small. Specifically, participants who received alternative scenarios were more skeptical about the suspect’s guilt (M = 40.79, SD = 17.60) than participants in the no-scenario condition (M = 45.44, SD = 19.41). For the other items and the total case assessment score, considering scenarios did not affect participants’ evaluations.

Table 4. Means, standard deviations, and inferential statistics for the six assessment ratings and case assessment scores in the alternative-scenario and no-scenario conditions (Experiment 1 and Experiment 2).

General discussion

The aim of the current two experiments was to examine the value of using alternative scenarios for reducing bias. Participants received a case file concerning alleged child sexual abuse and evaluated the reliability of a sexual abuse accusation using several case evaluation questions. One group received instructions to consider two alternative scenarios, while the other group received no such instructions. In the second experiment, we added a manipulation to induce allegiance bias by providing a cover letter from a defense lawyer or prosecutor along with the case file. Across both experiments, we did not find that alternative scenarios made participants more critical about the reliability of the statements, nor did we observe allegiance bias in participants’ responses.

The alternative scenario approach encourages decision makers to consider and evaluate all information, which should protect them from making erroneous inferences (Rassin, Citation2018) as a result of being less prone to bias (Kassin et al., Citation2013; O’Brien, Citation2009). Our findings did not support this idea. Our null findings are, however, in line with two other experiments (Sauerland et al., Citation2020, Experiments 2 & 3). Sauerland et al. (Citation2020) provided scenarios after participants read the case files. We assumed that asking participants to consider alternative scenarios only after reading the case file allowed them to form early judgments about the case and the reliability of the statements, so that, when later asked to consider an alternative scenario, they were almost certainly stuck on the primary scenario they had constructed themselves. However, providing scenario instructions before reading the case file did not significantly affect case assessment in the current experiments either.

Our findings are also in line with Maegherman et al. (Citation2021), who evaluated whether the use of competing hypotheses could protect against confirmation bias in the context of criminal cases. Participants using competing hypotheses did not differ significantly from participants in the control condition in their initial assessment of the likelihood of guilt, their perception of guilt, or their search for more information. Similar to the current study, participants in neither of the conditions in Maegherman et al. (Citation2021) showed the expected bias, which also meant that the findings could not support the protective effect of using alternative scenarios.

One potential explanation for why the alternative scenario approach was not effective is that participants in both groups may already have considered alternative scenarios spontaneously. Moreover, this study provided only two scenarios: one in which the abuse occurred and one in which it did not. Participants in the control group may also have thought of scenarios beyond those provided, making them equally critical in assessing the case.

Another explanation for why debiasing in the form of considering the alternative did not work is that debiasing techniques are often fragile and highly domain-specific (Lilienfeld et al., Citation2009). According to Lilienfeld et al. (Citation2009), debiasing may fail due to a bias blind spot (i.e. people do not perceive themselves as biased and therefore see no need for improvement). Furthermore, many people may reject debiasing efforts because they do not consider them relevant to their personal lives; some debiasing attempts may succeed only if participants are convinced that their bias affects real-world decisions. In addition, because the manipulation failed to induce bias, Experiment 2 does not allow us to make inferences about the capacity of the alternative scenario method to minimize bias: the advantages of a debiasing strategy can only become apparent under biased conditions.

In our exploratory joint analysis of Experiments 1 and 2, we found weak support for the idea that considering alternative scenarios can affect the assessment of the suspect’s guilt. As expected, considering the alternative scenario made participants more skeptical in their suspect guilt ratings, with an effect size of d = 0.127. In other words, an estimated 55.2% of the no-scenario group rated the suspect’s guilt above the mean of the scenario group (Magnusson, Citationn.d.). It thus seems that our scenario manipulation may have exerted only a small effect on our dependent variables.
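The percentage above comes from converting Cohen’s d to Cohen’s U₃, the proportion of one group scoring above the other group’s mean, via the standard normal CDF (the conversion implemented in Magnusson’s visualization); a one-line check:

```python
from scipy.stats import norm

d = 0.127          # Cohen's d from the joint analysis
u3 = norm.cdf(d)   # Cohen's U3: proportion of one group above the other group's mean
print(f"{u3:.1%}")  # ~55.1%; the reported 55.2% reflects rounding of d
```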

In Experiment 2, we did not find evidence for an allegiance bias. This finding contradicts previous studies (McAuliff & Arter, Citation2016; Murrie et al., Citation2013; Sauerland et al., Citation2020) that demonstrated allegiance bias for evaluations in child sexual abuse cases, where experts can be biased by who hires them. On the one hand, the finding that participants were not biased in assessing the case is promising, considering that these students may become experts in the future. On the other hand, there are some possible explanations for why the allegiance bias did not occur in the case assessment. For example, we presented a case file containing interviews between the police and the victim, the alleged perpetrator, and an eyewitness (the victim’s mother). Contrary to other studies (McAuliff & Arter, Citation2016), however, our file did not explicitly indicate the quality of the evidence.

Furthermore, there might be an effect of the legal system in which the experiments were conducted. This study was conducted in Belgium and the Netherlands, both civil-law countries with an inquisitorial system. The legal system may affect whether allegiance bias can be induced through the cover letters provided, because the role of the public prosecutor differs between legal traditions. Common-law countries use an adversarial system to determine facts in the adjudication process: the prosecution and defense compete with each other, and judges serve as arbiters who ensure justice for the accused and that the rules of criminal procedure are followed. The inquisitorial system, by contrast, is associated with the civil-law tradition and is characterized by extensive pre-trial investigations and interrogations aimed at avoiding bringing innocent people to trial. An inquisitorial process can be described as a formal inquiry to ascertain the truth, whereas an adversarial system uses a competitive process between prosecution and defense to determine facts; the inquisitorial process gives more power to the judge, who oversees the process, while the judge in the adversarial system functions as an arbiter between prosecution and defense demands (Reichel, Citation2017). Because our data were collected in countries that adopt an inquisitorial system, participants may have been less inclined to write reports in favor of one side, thereby reducing allegiance bias. This is corroborated by the finding that many participants failed the manipulation check because they did not indicate that they were hired by one of the parties (these participants were excluded from the analysis).

There are some further limitations worth mentioning. First, we collected the data in an online experiment. Debiasing experiments require considerable interest and cognitive involvement, which participants may lack when they participate online. In online experiments, participants may be less invested, and dropout rates tend to be much higher than in laboratory-based experiments (Dandurand et al., Citation2008). This was the case here, with a sizeable overall dropout rate of 47%. These experiments are also quite difficult, require a significant time commitment, and involve complex tasks: participants had to read the case file, think about and write down the evidence that supported and contradicted each scenario, and then assess the case. Dandurand et al. (Citation2008) noted that studies with higher difficulty levels or significant time commitments are prone to increased dropout rates. In addition, Finley and Penningroth (Citation2015) concluded that for lengthy or complex tasks, data loss is attributable to participant distraction, forgetfulness, and inattention, which may occur to a greater degree in online than in lab-based experiments. This limitation may also be associated with the failure of the Experiment 2 materials to induce allegiance bias, even though similar materials have induced allegiance bias outside an online setting (Sauerland et al., Citation2020).

Related to the online setting, it is also possible that participants were not sufficiently invested in the case or focused on the cover letter. A lack of investment may have reduced the cognitive dissonance that would otherwise arise from contradictory information, such as the conflict between the primary and the alternative scenario. A lack of attention may likewise have kept participants from fully internalizing that a particular party had hired them. This assumption is supported by our finding that the ratings in each group tended to fall in the middle of the rating scale (close to the hypothetical mean). Krosnick (Citation1999) described the tendency to choose a middle response as satisficing: participants choose the first acceptable option rather than trying to identify the most appropriate one. This tendency occurs more frequently among participants who are less motivated, or when questions are less salient and require more cognitive effort.

Second, the current experiments clearly possess lower ecological validity than past studies relying on professional samples (McAuliff & Arter, Citation2016; Murrie et al., Citation2013; Rassin, Citation2018). We asked students to act as experts based on the cover letter they received; that is, they merely imagined being an expert witness. Their background did not allow them to judge the evidence based on experience as a real expert witness. Expert witnesses’ experience and expertise could make them more likely to understand the case when assessing it, and real expert witnesses may think about scenarios more carefully and make case judgments more thoroughly than students.

In our experiments, participants were asked to assess attributes such as reliability and guilt on a Likert scale. Such assessments do not necessarily reflect the types of tasks that would be asked of an actual expert, which further reduces ecological validity. Yet the purpose of this research was to test whether alternative scenario thinking and allegiance bias affect case assessment. So, even though the ecological validity of the tasks used as case assessment measures was low, we opted for them as dependent measures to test the relation between the independent variables (allegiance bias and alternative scenarios) and the dependent variable (case assessment); this also allowed us to safeguard the internal validity of the research. With experimental methods such as those used here, we could examine whether there was evidence of allegiance bias and whether the alternative scenario method was effective in reducing it. We chose this method because it allowed us to control secondary variables, such as work experience as an expert witness, the types of cases generally handled, and years of working as an expert witness.

With the exception of the guilt ratings in our joint analysis, our findings showed no convincing evidence that alternative scenarios affect statement reliability assessment. However, the current findings should not be interpreted as supporting the abandonment of the alternative scenario method in statement reliability assessment. There are solid reasons to maintain the use of alternative scenarios in expert witness work. First, in line with the idea of falsification (Popper, Citation1963), it is good practice to attempt to falsify hypotheses. Falsification is a cornerstone of science and hence should also guide expert witness work. Second, the idea of alternative scenarios comes close to the idea of preregistration, which scientists recommend for combatting (confirmation) biases in research (Hardwicke & Wagenmakers, Citation2021). Although there is currently no experimental evidence that preregistration reduces biases in research, the approach does make science more transparent. Transparency is similarly relevant for expert witness work: using scenarios and describing them in expert witness reports increases the transparency of expert witness conclusions, and such transparency can assist triers of fact in reaching well-founded legal decisions in cases of alleged sexual abuse. Further research should therefore examine whether alternative scenarios can serve as an appropriate method to reduce bias in the legal context. In the future, face-to-face experiments with real expert witnesses would increase the ecological validity of this research; with stronger manipulations and rigorous manipulation checks, such studies can test whether considering alternative scenarios affects expert witnesses’ assessments of statement validity.

Acknowledgment

We sincerely thank Dr. Carey Marr (UNSW Sydney) for proofreading the article.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statements

All data and supplemental materials that support the findings of this study can be found at https://osf.io/n2k5u/.

Additional information

Funding

This work was supported by the Direktorat Jenderal Pendidikan Tinggi (Grant Number: T/966/D3.2/KD.02.01/2019) through BPPLN scholarship granted to the first author.

Notes

1 In practice, the terms ‘reliability’ and ‘validity’ are often used interchangeably, but in psychometrics they have different meanings (Arbiyah et al., Citation2021). Reliability refers to consistency, such as temporal consistency, meaning that a measurement should yield the same outcome if repeated. Validity is the degree to which a test measures what it purports to measure (Anastasi & Urbina, Citation1997). In the legal arena, however, the term reliability is frequently used when examining whether statements of victims or eyewitnesses refer to events that they truly experienced. We use the term reliability in this article.

2 The ANOVAs of the separate items all yielded a non-significant main effect of scenario, all Fs(1, 199) < 3.668, all ps > .057.

3 The ANOVAs of the separate items all yielded a non-significant main effect of letter, all Fs(2, 199) < 1.877, all ps > .156.

4 The ANOVAs of the separate items all yielded a non-significant interaction between scenario and letter, all Fs(2, 199) < 1.328, all ps > .267.

References

  • Anastasi, A., & Urbina, S. (1997). Psychological testing. Prentice Hall/Pearson Education.
  • Arbiyah, N., Otgaar, H., & Rassin, E. (2021). Are victim or eyewitness statements credible? Several ways to check them. In-Mind, 46. https://www.in-mind.org/article/are-victim-or-eyewitness-statements-credible-several-ways-to-check-them
  • Association of Forensic Science Providers (AFSP). (2009). Standards for the formulation of evaluative forensic science expert opinion. Science & Justice, 49(3), 161–164. https://doi.org/10.1016/j.scijus.2009.07.004
  • Blair, P. R., Marcus, D. K., & Boccaccini, M. T. (2008). Is there an allegiance effect for assessment instruments? Actuarial risk assessment as an exemplar. Clinical Psychology: Science and Practice, 15(4), 346. https://doi.org/10.1111/j.1468-2850.2008.00147.x
  • Blandon-Gitlin, I., Pezdek, K., Lindsay, D. S., & Hagen, L. (2009). Criteria-based content analysis of true and suggested accounts. Applied Cognitive Psychology, 23, 901–917. https://doi.org/10.1002/acp.1504
  • Bogaard, G., Meijer, E. H., Vrij, A., Broers, N. J., & Merckelbach, H. (2014). Contextual bias in verbal credibility assessment: Criteria-based content analysis, reality monitoring and scientific content analysis. Applied Cognitive Psychology, 28(1), 79–90. https://doi.org/10.1002/acp.2959
  • Dandurand, F., Shultz, T., & Onishi, K. (2008). Comparing online and lab methods in a problem-solving experiment. Behavior Research Methods, 40(2), 428–434. https://doi.org/10.3758/BRM.40.2.428
  • Dhami, M. K., Belton, I. K., & Mandel, D. R. (2019). The “analysis of competing hypotheses” in intelligence analysis. Applied Cognitive Psychology, 33(6), 1080–1090. https://doi.org/10.1002/acp.3550
  • Dodier, O., & Denault, V. (2018). The Griffiths question map: A forensic tool for expert witnesses’ assessments of witnesses and victims’ statements. Journal of Forensic Sciences, 63(1), 266–274. https://doi.org/10.1111/1556-4029.13477
  • Dror, I. E., & Cole, S. A. (2010). The vision in blind justice: Expert perception, judgment, and visual cognition in forensic pattern recognition. Psychonomic Bulletin & Review, 17(2), 161–167. https://doi.org/10.3758/PBR.17.2.161
  • Engel, C., & Glöckner, A. (2013). Role-induced bias in court: An experimental analysis. Journal of Behavioral Decision Making, 26(3), 272–284. https://doi.org/10.1002/bdm.1761
  • Faul, F., Erdfelder, E., Buchner, A., & Lang, A. G. (2009). Statistical power analyses using G* Power 3.1: Tests for correlation and regression analyses. Behavior Research Methods, 41(4), 1149–1160. https://doi.org/10.3758/BRM.41.4.1149
  • Finley, A., & Penningroth, S. (2015). Online versus in-lab: Pros and cons of an online prospective memory experiment. In A. M. Columbus (Ed.), Advances in psychology research (pp. 135–161). Nova.
  • Gowensmith, W. N., & McCallum, K. E. (2019). Mirror, mirror on the wall, who’s the least biased of them all? Dangers and potential solutions regarding bias in forensic psychological evaluations. South African Journal of Psychology, 49(2), 165–176. https://doi.org/10.1177/0081246319835117
  • Griffith, R. L. (2019). Forensic confirmation bias: Is consider-the-opposite an effective debiasing strategy? [Master’s thesis]. Washburn University Repository. https://wuir.washburn.edu/bitstream/handle/10425/1962/Griffith%2C%20Rebecca%20-%202019.pdf?sequence=1&isAllowed=y.
  • Hardwicke, T. E., & Wagenmakers, E. (2021, April 23). Preregistration: A pragmatic tool to increase transparency, reduce bias, and calibrate confidence in scientific research. https://doi.org/10.31222/osf.io/d7bcu
  • Heuer, R. J. (1999). Analysis of competing hypotheses. In Psychology of intelligence analysis (pp. 95–110). CQ Press. https://doi.org/10.1007/s10606-008-9080-9
  • Jarosz, A. F., & Wiley, J. (2014). What are the odds? A practical guide to computing and reporting Bayes factors. The Journal of Problem Solving, 7(1), 2–9. https://doi.org/10.7771/1932-6246.1167
  • Kassin, S. M., Dror, I. E., & Kukucka, J. (2013). The forensic confirmation bias: Problems, perspectives, and proposed solutions. Journal of Applied Research in Memory and Cognition, 2, 42–52. https://doi.org/10.1016/j.jarmac.2013.01.001
  • Köhnken, G. (2004). Statement validity analysis and the ‘detection of the truth’. In P. A. Granhag & L. A. Strömwall (Eds.), The detection of deception in forensic contexts (pp. 41–63). Cambridge University Press. https://doi.org/10.1017/CBO9780511490071.003
  • Krosnick, J. A. (1999). Survey research. Annual Review of Psychology, 50, 537–567. https://doi.org/10.1146/annurev.psych.50.1.537
  • Lilienfeld, S. O., Ammirati, R., & Landfield, K. (2009). Giving debiasing away: Can psychological research on correcting cognitive errors promote human welfare? Perspectives on Psychological Science, 4(4), 390–398. https://doi.org/10.1111/j.1745-6924.2009.01144.x
  • Maegherman, E., Ask, K., Horselenberg, R., & van Koppen, P. J. (2021). Test of the analysis of competing hypotheses in legal decision-making. Applied Cognitive Psychology, 35(1), 62–70. https://doi.org/10.1002/acp.3738
  • Magnusson, K. (n.d.). Interpreting Cohen’s d effect size: An interactive visualization. Retrieved November 10, 2022, from https://rpsychologist.com/cohend/
  • McAuliff, B. D., & Arter, J. L. (2016). Adversarial allegiance: The devil is in the evidence details, not just on the witness stand. Law and Human Behavior, 40(5), 524–535. https://doi.org/10.1177/0956797613481812
  • Munder, T., Gerger, H., Trelle, S., & Barth, J. (2011). Testing the allegiance bias hypothesis: A meta-analysis. Psychotherapy Research, 21(6), 670–684. https://doi.org/10.1080/10503307.2011.602752
  • Murrie, D. C., & Boccaccini, M. T. (2015). Adversarial allegiance among expert witnesses. Annual Review of Law and Social Science, 11, 37–55. https://doi.org/10.1146/annurev-lawsocsci-120814-121714
  • Murrie, D. C., Boccaccini, M. T., Guarnera, L. A., & Rufino, K. A. (2013). Are forensic experts biased by the side that retained them? Psychological Science, 24(10), 1889–1897. https://doi.org/10.1177/0956797613481812
  • Murrie, D. C., Boccaccini, M. T., Johnson, J. T., & Janke, C. (2008). Does interrater (dis)agreement on Psychopathy Checklist scores in sexually violent predator trials suggest partisan allegiance in forensic evaluations? Law and Human Behavior, 32(4), 352–362. https://psycnet.apa.org/doi/10.1007/s10979-007-9097-5
  • Murrie, D. C., Boccaccini, M. T., Turner, D. B., Meeks, M., Woods, C., & Tussey, C. (2009). Rater (dis)agreement on risk assessment measures in sexually violent predator proceedings: Evidence of adversarial allegiance in forensic evaluation? Psychology, Public Policy, and Law, 15(1), 19–53. https://psycnet.apa.org/doi/10.1037/a0014897
  • Nickerson, R. S. (1998). Confirmation bias: A ubiquitous phenomenon in many guises. Review of General Psychology, 2, 175–220. https://psycnet.apa.org/doi/10.1037/1089-2680.2.2.175
  • O’Brien, B. (2009). Prime suspect: An examination of factors that aggravate and counteract confirmation bias in criminal investigations. Psychology, Public Policy, and Law, 15, 315–334. https://doi.org/10.1037/a0017881
  • Otgaar, H., Arbiyah, N., & Mangiulli, I. (2020). The toolbox of memory experts working as expert witnesses. In R. Horselenberg, V. Van Koppen, & J. De Keijser (Eds.), Bakens in de rechtspsychologie: Liber amicorum voor Peter van Koppen (pp. 477–488). Boom criminologie.
  • Otgaar, H., de Ruiter, C., Howe, M. L., Hoetmer, L., & van Reekum, P. (2017). A case concerning children’s false memories of abuse: Recommendations regarding expert witness work. Psychiatry, Psychology and Law, 24(3), 365–378. https://doi.org/10.1080/13218719.2016.1230924
  • Popper, K. (2005). The logic of scientific discovery. Routledge.
  • Popper, K. R. (1959). The propensity interpretation of probability. The British Journal for The Philosophy of Science, 10(37), 25–42. https://doi.org/10.1093/bjps/X.37.25
  • Popper, K. R. (1963). Conjectures and refutations: The growth of scientific knowledge. Routledge & Keagan Paul.
  • Rassin, E. (2018). Reducing tunnel vision with a pen-and-paper tool for the weighting of criminal evidence. Journal of Investigative Psychology and Offender Profiling, 15(2), 227–233. https://doi.org/10.1002/jip.1504
  • Reichel, P. (2017). Comparative criminal justice systems: A topical approach (7th ed.). Pearson.
  • Sauerland, M., Otgaar, H., Maegherman, E., & Sagana, A. (2020). Allegiance bias in statement reliability evaluations is not eliminated by falsification instructions. Zeitschrift für Psychologie, 228(3), 210–215. https://doi.org/10.1027/2151-2604/a000416
  • Steller, M., & Kohnken, G. (1989). Criteria-based statement analysis. In D. Raskin (Ed.), Psychological methods in criminal investigation and evidence (pp. 217–245). Springer Publishing Company, Inc.
  • Thase, M. (1999). Commentary. What is the investigator allegiance effect and what should we do about it? Clinical Psychology: Science and Practice, 6, 113–115. https://doi.org/10.1093/clipsy/6.1.113
  • van Koppen, P. J., & Mackor, A. R. (2019). A scenario approach to the Simonshaven case. Topics in Cognitive Science, 1–20. https://doi.org/10.1111/tops.12429
  • Volbert, R., & Steller, M. (2014). Is this testimony truthful, fabricated, or based on false memory? Credibility assessment 25 years after Steller and Köhnken (1989). European Psychologist. https://doi.org/10.1027/1016-9040/a000200
  • Vredeveldt, A., van Rosmalen, E. A. J., van Koppen, P., Dror, I. E., & Otgaar, H. (2022). Legal psychologists as experts: Guidelines for minimizing bias. Psychology, Crime & Law, 1–25. https://doi.org/10.1080/1068316X.2022.2114476
  • Vrij, A. (2005). Criteria-based content analysis: A qualitative review of the first 37 studies. Psychology, Public Policy, and Law, 11(1), 3–41. https://doi.org/10.1037/1076-8971.11.1.3
  • Vrij, A., Kneller, W., & Mann, S. (2000). The effect of informing liars about criteria-based content analysis on their ability to deceive CBCA-raters. Legal and Criminological Psychology, 5(1), 57–70. https://doi.org/10.1348/135532500167976
  • Wagenmakers, E., Wetzels, R., Borsboom, D., van der Maas, H. L. J., & Kievit, R. A. (2012). An agenda for purely confirmatory research. Perspectives on Psychological Science, 7(6), 632–638. https://doi.org/10.1177/1745691612463078