ABSTRACT
The current study examined the concurrent and predictive validity of four families of lineup-fairness measures – mock-witness measures, perceptual ratings, face-similarity algorithms, and resultant assessments (assessments based on eyewitness participants’ responses) – with 40 mock crime/lineup sets. A correlation analysis demonstrated weak or non-significant correlations between the mock-witness measures and the algorithms, but the perceptual ratings correlated significantly with both the mock-witness measures and the algorithms. These findings may reflect different task characteristics – pairwise similarity ratings of two faces versus overall similarity ratings for multiple faces – and suggest how to use algorithms in future eyewitness research. The resultant assessments did not correlate with the other families, but a multilevel analysis showed that only the resultant assessments – which are based on actual eyewitness choices – predicted eyewitness performance reliably. Lineup fairness, as measured using actual eyewitnesses, differs from lineup fairness as measured using the three other approaches.
Open Scholarship
This article has earned the Center for Open Science badges for Open Data and Preregistered. The data and materials are openly accessible at https://osf.io/swdfu/ and https://bit.ly/3zaJQMk.
Author note
Much of this study occurred while the second author was at Queen Margaret University, Edinburgh, UK.
This study was preregistered (see https://bit.ly/3zaJQMk).
The data file and analysis code are available at https://osf.io/swdfu/, and materials are available from the corresponding author upon reasonable request.
Author contributions
All persons who meet authorship criteria are listed as authors. The contributions that each author made to the current study and manuscript are described below.
The first author: Conceptualization, Funding acquisition, Methodology, Formal analysis, Visualization, and Writing – original draft.
The second author: Conceptualization, Methodology, Software, Data collection, and Writing – review & editing.
The third author: Conceptualization, Methodology, Supervision, and Writing – review & editing.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Notes
1 The list of searched articles published from 2020 to 2022 is provided as Supplemental Material 1.
2 We preregistered our analytic approach here: https://bit.ly/3zaJQMk.
3 The authors used the dual-video & single-lineup paradigm (Oriet & Fitzgerald, 2018). That is, they selected four versions of a videoed event, each depicting the same event with a different person as the target. Two pairs of videos were selected such that the targets within each pair were similar looking. They then created a TP lineup for each of the four targets; each TP lineup served as the TA lineup for the other member of that target's pair.
4 For the mock-witness paradigm and perceptual ratings, we managed data quality from MTurk/CloudResearch using the following steps. (1) We admitted only vetted, high-quality participants who had passed CloudResearch's attention and engagement measures. (2) We applied CloudResearch's additional data-quality settings to exclude anyone whom we had previously identified as a bad participant. We blocked ‘suspicious geocode locations’ and ‘duplicate IP addresses’ and used the ‘verify worker country and state location’ option. (3) We excluded participants who reported responding randomly, cheating, or using a mobile device; we also assured them that their payment or credit would not be affected, so there was no reason for them to lie. (4) We excluded bots and previews. (5) We excluded people who did not complete the study. (6) We excluded people who reported relevant technical difficulties.
5 When no mock witness chose a suspect, that suspect's choosing frequency was adjusted to 0.5 for the calculation of the mock-witness indices.
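This zero-count adjustment can be illustrated with a minimal Python sketch of two standard mock-witness indices, functional size and Tredoux's E. The function names, the input format, and the choice to apply the adjustment before computing both indices are our illustrative assumptions, not the authors' actual analysis code:

```python
def adjusted_counts(choices, suspect):
    """Copy mock-witness choice counts, replacing a zero suspect
    count with 0.5 (the adjustment described in this footnote)."""
    counts = dict(choices)
    if counts.get(suspect, 0) == 0:
        counts[suspect] = 0.5  # avoids a degenerate (undefined) index value
    return counts

def functional_size(choices, suspect, n_witnesses):
    """Functional size: total mock witnesses / number choosing the suspect."""
    counts = adjusted_counts(choices, suspect)
    return n_witnesses / counts[suspect]

def tredoux_e(choices, suspect, n_witnesses):
    """Tredoux's E: 1 / sum of squared choosing proportions."""
    counts = adjusted_counts(choices, suspect)
    props = [c / n_witnesses for c in counts.values()]
    return 1.0 / sum(p * p for p in props)
```

For example, a perfectly fair six-member lineup with 60 mock witnesses (10 choices per member) yields a functional size of 6 and a Tredoux's E of 6, while a suspect chosen by nobody receives the 0.5 stand-in count rather than producing a division by zero.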
6 We selected the ten features based on a preliminary examination in which we reviewed eighteen research papers on eyewitness descriptions and listed the features commonly reported in them. Each of the eighteen papers is marked with an asterisk in the reference list of the present study. The features were sex/gender, age, hair, clothing, height, weight, body build/shape, race, and complexion. However, some of these features (e.g. height, weight, and clothing) are impossible to rate from our stimuli, which show only faces. We therefore replaced them with other features frequently described by mock-witnesses, such as hair length, facial hair, and nose size.
7 For ease of computation, we calculated the Euclidean Distance estimates assuming equal weights on all features. The computation could be elaborated by assigning different weights to each feature.
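The equal-weight Euclidean Distance described above, and its weighted elaboration, can be sketched as follows. This is an illustrative Python implementation under assumed inputs (two equal-length vectors of per-feature ratings); the function name and rating format are not taken from the authors' materials:

```python
import math

def feature_distance(ratings_a, ratings_b, weights=None):
    """Euclidean distance between two faces' feature-rating vectors.

    With weights=None, every feature receives weight 1.0 (the
    equal-weight case used in the footnote); passing unequal weights
    implements the suggested elaboration.
    """
    if len(ratings_a) != len(ratings_b):
        raise ValueError("rating vectors must have the same length")
    if weights is None:
        weights = [1.0] * len(ratings_a)
    return math.sqrt(sum(w * (a - b) ** 2
                         for w, a, b in zip(weights, ratings_a, ratings_b)))
```

Identical rating vectors yield a distance of 0, and larger per-feature discrepancies (or larger weights on discrepant features) increase the distance.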
8 Although the analysis with the resultant mock-witness indices was not part of our preregistration, we included them based on the suggestions of two anonymous reviewers.
9 However, given that researchers may be interested in the degree of bias in TP lineups and that it is possible to hold memory constant experimentally, we additionally calculated resultant assessments of TP lineup fairness, based on Penrod's (2003) classification of guessers and reliable eyewitnesses, for interested readers. The calculation methods are available in Supplemental Material 3.
10 Even when estimates of the resultant mock-witness measures for TP lineups were included in the same correlation analysis, the correlation patterns did not differ considerably (see Supplemental Material 3).
11 Because estimates of the resultant mock-witness measures were calculated only in TA lineups, those measures were excluded from the moderation analysis. However, an additional multilevel analysis including the resultant mock-witness measures of TP and TA lineups demonstrated that the resultant Lineup Size indices predicted suspect and filler ID rates only in TP lineups, whereas the resultant Lineup Bias indices predicted those DVs in both TP and TA lineups (see Supplemental Material 3).
12 We would like to express thanks to an anonymous reviewer for raising this point.
13 The distribution of the major variables in the meta-analytic review is as follows: M_TP = 4.08, SD_TP = 1.62, and M_TA = 3.51, SD_TA = 1.24 for Tredoux’s E; M_TP = 3.66, SD_TP = 1.24, and M_TA = 5.39, SD_TA = 1.24 for Effective Size; M_TP = 0.45, SD_TP = 0.15, and M_TA = 0.25, SD_TA = 0.17 for suspect ID rates; M_TP = 0.27, SD_TP = 0.16, and M_TA = 0.31, SD_TA = 0.16 for filler ID rates; and M_TP = 0.28, SD_TP = 0.11, and M_TA = 0.44, SD_TA = 0.13 for rejection rates.