Abstract
While the concept of sampling variation is well understood by most researchers in the field of deception detection, previous studies have failed to account for the multiple sources of sampling variation present in typical experimental designs, instead using participant-level aggregates as the dependent measure in analyses. These aggregated data, however, contain inherent biases that can mislead researchers. We argue that, to appropriately test hypotheses and make inferences beyond a particular sample of participants, the decision-level data must be modelled directly. To illustrate how this can be achieved, we provide an introduction to generalized linear mixed models (GLMMs) for the analysis of deception data and present Monte Carlo simulations demonstrating both the seriousness of the biases inherent in participant-level data and the benefits of the GLMM approach. These simulations suggest that the empirical Type 1 and Type 2 error rates associated with tests of main effects in deception research may be as high as 35% when data are aggregated ‘by-judge’ and as high as 60% when data are aggregated ‘by-sender’, respectively. When decision-level data are modelled directly, however, these rates are likely to be close to nominal levels (6% and 28%, respectively). Implications for past and future research are discussed.
Disclosure statement
No potential conflict of interest was reported by the authors.
ORCID
Kristy A. Martire http://orcid.org/0000-0002-5324-0732
Notes
1. In an analysis using by-sender data, the design would be specified as a 2×2 fully between-subjects factorial design in which senders either tell the truth or tell a lie (first between-subjects factor) while being interviewed with one of two protocols, either the new protocol or the old protocol (second between-subjects factor).
2. Also of interest would be the interaction between interview protocol and veracity, but for the sake of clarity we do not discuss this here.
3. We note that different authors use different notations. Here we have followed the notation used by Goldstein (2003).
4. Calculating precise p values for the fixed-effect estimates produced by GLMMs is difficult, given the uncertainty in identifying the correct degrees of freedom for the t and F distributions on which the p values are based. Numerous approximations are available, but for the purposes of this paper we can reasonably expect the t distribution to approximate the normal distribution (the data set is fairly large and balanced); we therefore treat a coefficient as ‘significant’ if its absolute t value is greater than 2 (see Baayen, Davidson, & Bates, 2008).
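The rationale behind the |t| > 2 rule of thumb in Note 4 can be checked with a short sketch using only the Python standard library; the function name below is ours, not from the paper. Under the normal approximation, the two-tailed p value for a test statistic t is erfc(|t|/√2), and |t| = 2 falls just inside the conventional .05 threshold.

```python
import math

def approx_p_value(t):
    """Two-tailed p value for a test statistic under the standard-normal
    approximation to the t distribution (reasonable when the data set is
    large and balanced, as assumed in Note 4)."""
    return math.erfc(abs(t) / math.sqrt(2))

# |t| = 2 corresponds to p ~= .0455, just below the .05 criterion,
# whereas |t| = 1.9 corresponds to p ~= .0574, just above it.
print(approx_p_value(2.0) < 0.05)
print(approx_p_value(1.9) < 0.05)
```

This is why treating |t| > 2 as ‘significant’ is a slightly conservative stand-in for the exact .05 cut-off of 1.96 under the normal approximation.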