Optimizing CBCA and RM research: recommendations for analyzing and reporting data on content cues to deception

Pages 1-39 | Received 20 May 2019, Accepted 09 Mar 2020, Published online: 05 May 2020
 

ABSTRACT

For more than a century, verbal content cues to deception have been investigated to assess the credibility of statements in judicial contexts. Among the many cues investigated, Criteria-based Content Analysis (CBCA) and criteria based on the reality monitoring (RM) approach have been most prominent. However, research with these cues used as ‘tools’ has not fully exploited their potential. We critically discuss statistical approaches used in past research and recommend a series of 12 principles or guidelines researchers should follow to design, analyze and report future studies on detecting deception with verbal content cues. To illustrate some of these points, we present analyses from two separate studies: A quasi-experiment in a field setting conducted with adults with intellectual disabilities who truthfully or deceptively described a negative autobiographical event to an interviewer, and a large-scale simulation study where adults wrote an account of either an experienced or an invented significant life event. Accounts in both studies were rated with CBCA and RM criteria, as well as by ‘naive’ raters. The guidelines should help to increase the quality and transparency of research in this area.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Correction Statement

This article has been republished with minor changes. These changes do not impact the academic content of the article.

Notes

1 Actually, we believe that Details characteristic of the offense could have been used in many more studies if this criterion had been defined more generally as Details characteristic of the event, which could be adapted to specific classes of events. For example, if someone were to describe a scuba diving experience, an event-specific detail would be the 'buddy check', which every diving pair is supposed to perform before entering the water. Only scuba divers (i.e. experts) would be likely to know this.

2 If an item is predicted by theory to be negatively related to truth status, it has to be reverse-scored.

3 The present study is a complete re-analysis of the original data from two previous studies by Manzanero et al. (Citation2015, Citation2019). None of the results presented here were presented there.

4 IQ was measured with the Wechsler Adult Intelligence Scale (WAIS-IV; Wechsler, Citation2008).

5 Some authors have arrived at a summary score by counting the specific occurrences of each criterion in a statement. For the resulting frequency distribution of a given criterion, the median is calculated and a 0 is assigned to all participants below the median, and a 1 to all participants above it. These 0/1 values for each criterion could then be added up to arrive at a summary score. We do not recommend this practice for the following reasons: (1) By dichotomizing a criterion, information about the frequency (or intensity) of occurrence is lost. (2) Reliability of coding using kappa (not percentage agreement) is likely to be low and depends on the base rate of each criterion (Hauch et al., Citation2017). (3) Because each criterion has either a value of 0 or 1, the resulting summary score weighs each criterion equally, which ignores the possibility of giving higher weights to criteria demonstrated to have higher validities in a given domain of application (for a similar argument, see Maier et al., Citation2018). (4) This method does not address the problem of criteria that are negatively associated with truth status, which would have to be subtracted from a summary score.
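
To make point (1) concrete, here is a minimal Python sketch of the median-split procedure; the frequency counts are invented for illustration and are not data from either study reported here.

    import numpy as np

    # Invented frequency counts of one criterion across eight statements.
    freq = np.array([0, 1, 1, 2, 3, 5, 8, 12])

    # Median split as described above: 0 below the median, 1 above it.
    dichotomized = (freq > np.median(freq)).astype(int)

    print(dichotomized)  # [0 0 0 0 1 1 1 1]
    # The statements with 3 and 12 occurrences receive the same score of 1:
    # all information about frequency (intensity) of occurrence is lost.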

6 Note that in Sporer's (Citation1997) study, 13 CBCA criteria and eight RM criteria were used for classifying N = 72 accounts, thus using far fewer than the available degrees of freedom, in contrast to the small sample of Study 1 reported here.

7 The study by Sporer and Küpper (Citation1995) concerned only RM criteria. Here, we present new, previously unpublished data from ratings and credibility judgments obtained with these accounts.

8 Actually, the ratings were conducted with the 39 items of the Judgments of Memory Characteristics Questionnaire (reprinted in Sporer, Citation2004), which were then integrated into the eight RM scales based on factor analyses conducted with self-ratings of the 200 accounts, separately for lies and truths (see Sporer & Küpper, Citation2004). In a pilot study of N = 40 accounts rated by the present and two additional raters, Spearman-Brown corrected inter-rater reliabilities were satisfactory for most criteria except for Cognitive operations (see Sporer, Citation2004, Table 4.1).
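
For orientation, the Spearman-Brown correction referred to here is the standard formula for the reliability of a composite of k raters; the numerical example is purely illustrative and not a value from the pilot study.

    R_{SB} = \frac{k \bar{r}}{1 + (k - 1)\bar{r}}

For example, with k = 3 raters and a mean single-rater correlation of \bar{r} = .60, R_{SB} = (3 × .60) / (1 + 2 × .60) ≈ .82.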

9 The naive ratings A and B were taken from a published German study by Reinhard et al. (Citation2002) and from an unpublished conference presentation by Sporer et al. (Citation1995). The naive ratings C and D were from an unpublished conference presentation by Sporer and Bursch (Citation1996). The RM ratings were published in German by Sporer and Küpper (Citation1995), while the CBCA ratings E and F are new data.

10 There is some evidence for a truth bias, that is, accuracy for truths is more often higher than accuracy for lies (though not always significantly so). Judgments of truths are also more often above the 50% chance level, indicating a veracity effect.

11 Here we only provide the formulae for Cohen's d, as is general practice in reports of primary research. Cohen's d has a small upward bias compared to Hedges' g_u, which we report here. The corrections for bias also affect the estimates of the variances of g_u and the confidence intervals. The correction formulae for between-participants and repeated measures designs also differ (see Borenstein, Citation2009; Lipsey & Wilson, Citation2001).
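
For convenience, the standard between-participants forms are reproduced here (they follow, e.g., Borenstein, Citation2009, and are not copied from the original article):

    d = \frac{M_1 - M_2}{s_p}, \qquad s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}

    g_u = \left(1 - \frac{3}{4(n_1 + n_2 - 2) - 1}\right) d

where M_1, M_2 are the group means, s_1^2, s_2^2 the group variances, and n_1, n_2 the group sizes.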

12 The variety of topics in Study 2 was chosen to test whether CBCA criteria are also applicable to events reported by adults that do not necessarily fulfill these three characteristics. Thus, the events could be either positive or negative, and may or may not have been very emotional (see Sporer & Küpper, Citation2004; Sporer & Sharman, Citation2006, on the effects of emotionality and valence).

13 In Study 2 reported here, CBCA and RM codings were made by different raters and hence the correlations observed should more accurately reflect overlapping definitions of different criteria sets.

14 Actually, this criterion has been used for over a hundred years in Germany and other central European countries by forensic experts in their expert testimony (e.g. Stern, Citation1903–1906, Citation1926; see Sporer, Citation2008).

15 Admittedly, the sample size in our Study 1 (N = 29) was very small relative to the number of predictors for conducting MDAs. This is one reason why running Study 2 with a much larger sample (N = 200) was so important. Still, we showed in both studies that adding an increasing number of predictor variables (even strings of random numbers) until exhausting all degrees of freedom artificially increased classification rates.
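
As a purely illustrative sketch of this degrees-of-freedom problem (assuming Python with scikit-learn, not the software actually used in the studies), the following simulation shows in-sample hit rates for purely random predictors rising toward 100%:

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(42)
    n = 29                              # sample size as in Study 1
    y = np.repeat([0, 1], [14, 15])     # hypothetical truth status

    # Random predictors, added in increasing numbers, inflate the
    # *in-sample* classification rate of a linear discriminant analysis.
    for k in (1, 5, 10, 20, 27):
        X = rng.normal(size=(n, k))     # strings of random numbers
        acc = LinearDiscriminantAnalysis().fit(X, y).score(X, y)
        print(f"{k:2d} random predictors: in-sample hit rate = {acc:.2f}")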

16 In one of the most widely used classical texts on multiple regression analysis, Cohen et al. (Citation2003) elaborate on the statistical procedures for constructing internally consistent psychological (sub-)scales using corrected item-total correlations (CITCs). When attempting to create criteria or scales from different approaches, factor analysis is generally used to reduce the number of items making up subscales measuring psychological constructs. The few studies that have done this so far show that CBCA (and RM) criteria do not form a unidimensional construct but consist of several facets that need to be considered separately.
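
A minimal sketch of the CITC computation mentioned here (the function name and ratings are invented for illustration; rows are statements, columns are criteria):

    import numpy as np

    def corrected_item_total_correlations(items):
        # CITC: correlate each item (column) with the sum of all *other*
        # items, so that the item does not correlate with itself.
        total = items.sum(axis=1)
        return np.array([np.corrcoef(items[:, j], total - items[:, j])[0, 1]
                         for j in range(items.shape[1])])

    # Invented ratings: 50 statements x 6 criteria sharing a common factor.
    rng = np.random.default_rng(1)
    ratings = rng.normal(size=(50, 1)) + rng.normal(size=(50, 6))
    print(corrected_item_total_correlations(ratings).round(2))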

17 A related problem concerning individual criteria arises when different operationalizations or definitions of the same criteria are used in different studies.

18 An anonymous reviewer pointed out that in the risk assessment domain, actuarial judgments based on standardized risk assessment instruments are preferable to clinical, idiographic judgments, and questioned why it should be otherwise in the credibility assessment domain. The main reason is that risk assessment instruments often incorporate variables strongly related to the outcome according to empirical studies, some of which were conducted on very large data sets (e.g. Babchishin & Helmus, Citation2016). It is no wonder, then, that assessments based on these predictors are much more likely to be accurate than clinical impressions. Also, the direction-of-effect issue we emphasize here is sometimes side-stepped in the risk assessment literature, where all predictors are re-scaled in the same direction (Babchishin & Helmus, Citation2016). Unfortunately, the situation is very different in the credibility assessment domain. Conducting large-scale studies is not possible, as ground truth is often unknown in field situations. Running laboratory simulation studies with several hundred or thousands of participants is also not feasible. The extant small-scale (though numerous) studies often show only moderate or weak associations between credibility criteria and truth status (such as in the two studies reported here), and meta-analyses on verbal and nonverbal deception cues show that a large number of moderator variables (such as the variables mentioned in the text, which are considered within the SVA framework) have a strong influence, either in isolation or through complex interactions, on credibility indicators (see, e.g. Amado et al., Citation2015, Citation2016; DePaulo et al., Citation2003). In any case, SVA is not akin to unstructured clinical judgment in violence risk assessment, but to structured professional judgment alongside actuarial classification applied to an individual case.

19 It is an empirical question to what extent the availability of CBCA criteria in many publications and on the Internet may have increased misunderstandings and improper use.
