
The language of lies: a preregistered direct replication of Suchotzki and Gamer (2018; Experiment 2)

Pages 1310-1315 | Received 17 Jul 2018, Accepted 25 Nov 2018, Published online: 02 Dec 2018

ABSTRACT

Is lying in a different language easier or more difficult? The emotional distance and the cognitive load hypotheses give competing answers. Suchotzki and Gamer (2018) measured the time native German speakers needed to initiate honest and deceptive answers to German and English questions. Lie-truth differences in RTs were much smaller for the foreign than for the native language. In our preregistered replication study in native Dutch speakers, we found that lie-truth differences in RTs were moderately smaller when participants were tested in English than when tested in Dutch. These findings indicate that people struggle with quickly retrieving the truth in another language, and that foreign language use may diminish lie-truth differences.

Is lying in a different language easier or more difficult? Speaking in another language is typically less emotional. Advertisements are judged to be less emotional when presented in a foreign language (Puntoni, de Langhe, & van Osselaer, 2009), and emotional stimuli elicit less pronounced autonomic nervous system activity when presented in a foreign language (Harris, Ayçiçeǧi, & Gleason, 2003). The emotional distance hypothesis predicts that the reduced emotionality associated with foreign language use may facilitate lying.

Speaking in another language is typically also cognitively demanding. According to the cognitive load theory of lying, this increased cognitive load may hamper lying. Liars, being mentally taxed already, are expected to have greater difficulty with the additional task than truth tellers, amplifying lie-truth differences. Several studies have indeed reported that lie-truth differences were amplified by imposing cognitive load, e.g. by having to tell a story in reverse order or by being asked unexpected questions (Vrij & Granhag, 2012).

Suchotzki and Gamer (2018) recently found smaller lie-truth differences in response times (RTs) when native German speakers were tested in English than when tested in German. This effect was mostly driven by a slowing of truth RTs in the foreign language, and was observed across three studies. At the same time, the findings conflict with the popular cognitive load theory of lying. The findings also contrast with a study that found a similar lie-truth difference in RTs for native versus non-native speakers (Verschuere & Kleinberg, 2017). Language effects in that study, however, were confounded by ethnicity, and power was reduced by the use of an imbalanced, between-subjects design (i.e. 95 native speakers vs 519 non-native speakers). We therefore set up a preregistered, direct replication of Suchotzki and Gamer (2018), Experiment 2. We chose Experiment 2 because the original authors argued it had higher ecological validity than their Experiment 1, and because the validity of RTs in their Experiment 3 may have been hampered by the concurrent physiological recordings. We chose to follow the design of Experiment 2 as closely as possible, the most notable differences being the participants' country of residence and native language. This allowed us to ascertain that the original effect indeed originates from foreign language use and is not restricted to a German-English effect. We assessed native Dutch speakers in Dutch and English. From the emotional distance hypothesis it follows that foreign language use would diminish the emotionality of lying, reducing lie RTs and thereby diminishing lie-truth differences. The cognitive load hypothesis, in contrast, predicts that foreign language use would particularly hamper the more effortful process, lying, thereby increasing lie RTs and increasing lie-truth differences. Here we test the directional hypothesis from Suchotzki and Gamer (2018) that the lie-truth difference in RTs will be smaller when participants are tested in a foreign language than when tested in their native language.

Method

The study was approved by the local ethical committee and registered as number 2018-CP-8840. The method was approved by the first author of the original report, and preregistered: https://osf.io/z9kme/register/565fb3678c5e4a66b5582f67. All materials, data, and analytic scripts are available at https://osf.io/x4rfk/. We report all measures in the study, all manipulations, any data exclusions, and the sample size determination rule.

Participants

Using G*Power with α = .05 and power (1 − β) = .95 for the target effect d = .97 (Note 1) [95% CI: .58, 1.35], we found that we needed 13 participants for the paired samples t-test to be able to pick up a significant effect. To assure sufficient power, we re-ran G*Power with power = .99 for the lower bound of the 95% confidence interval surrounding the original effect (d = .58), showing that we needed n = 49 to pick up the effect of interest if it exists. We therefore planned to collect data until we had 49 participants after exclusions. However, due to a misunderstanding of the preregistered exclusion criteria within the research team, we have 63 included participants.
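For readers who want to check these numbers without G*Power, the sketch below reproduces the same sample-size calculation with statsmodels. It assumes a one-tailed paired-samples t-test (in line with the directional hypothesis); results may differ from G*Power by rounding.

```python
# Minimal sketch of the power analysis described above (assumption: one-tailed
# paired-samples t-test); not the authors' actual G*Power run.
from statsmodels.stats.power import TTestPower

analysis = TTestPower()  # power calculations for one-sample / paired t-tests

# n needed to detect the original effect (d = .97) with alpha = .05, power = .95
n_target = analysis.solve_power(effect_size=0.97, alpha=0.05, power=0.95,
                                alternative="larger")

# n needed for the lower bound of the original 95% CI (d = .58) with power = .99
n_lower_bound = analysis.solve_power(effect_size=0.58, alpha=0.05, power=0.99,
                                     alternative="larger")

print(round(n_target), round(n_lower_bound))  # roughly 13 and 49
```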

Sixty-seven individuals were recruited at the University of Amsterdam. Participants self-identified as (1) being a native Dutch speaker, (2) having had a minimum of 4 years of English at school, and (3) not having been raised English-Dutch bilingually. Participants were rewarded with 10 euro or 1 course credit per hour. We excluded 3 participants with less than 50% valid trials in one of the four conditions considered in our analyses (lie-native, truth-native, lie-nonnative, truth-nonnative) after the exclusion of errors and RT outliers. The data of 1 participant were deleted at the request of the participant. The final sample consisted of 63 participants (66.66% female) with a mean age of 23.29 years (SD = 7.32; range: 18–61). Fifty-nine per cent of the sample majored in psychology.

Procedure

After providing informed consent, participants provided their age, gender, major, mother tongue, and self-estimated level of English according to the Common European Framework of Reference for Languages (North, 2014), which contains the levels A1, A2, B1, B2, C1, and C2, coded here as 1–6, respectively. Participants then completed the Windows speech recognition wizard, which required verbally repeating several sentences. Speech recognition set-up was followed by the lie test. Afterwards, participants were given a questionnaire in which they were asked to rate each question (presented once in Dutch and once in English) as being either emotional or not. Furthermore, they were asked to indicate the true answer to each question (on average, participants reported yes as the true answer for 49.53% of the questions; SD = 6.72; range: 34.82–63.39%).

Additionally, and outside the scope of the direct replication, participants were asked to indicate their self-estimated level of German (Note 2). Finally, participants performed the Lexical Test for Advanced Learners (LexTale), administered via MatLab, assessing language proficiency for English, then for German, and finally for Dutch. When time ran out, the Dutch LexTale was not administered (leaving n = 58 for that measure).

Lie test

During the lie task, 56 different questions (e.g. “Is your mother from Japan?”) were presented via headphones, prerecorded by a Dutch native speaker who was highly fluent in English. Responses were given verbally and transcribed using the speech recognition function of Inquisit 5. The intertrial interval between the given response and the presentation of a new question varied randomly between 1000, 1100, 1200, 1300, 1400, and 1500 milliseconds. Half of the questions were chosen to be emotional (e.g. “Have you ever lost someone you love?”) and the other half to be neutral (e.g. “Do you have an aquarium at home?”) (Note 3). Each question had a Dutch and an English version. Every question had to be answered either truthfully or deceitfully, depending on the colour of a dot in the middle of the screen, presented after hearing the question. The dots were either blue or yellow, and colour assignment to the truth-telling and lying conditions was counterbalanced across participants. Questions were assigned randomly and there were 224 trials per participant, with 28 trials in each of the 8 cells of the Language (Dutch vs. English) × Veracity (lie vs. truth) × Emotionality (emotional vs. neutral) design. The different trial types were presented in random order using the “noreplacenorepeat” function of Inquisit 5, which “Randomly selects without replacement or consecutive selection of a value”.
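To make the trial structure concrete, the following sketch generates a trial list honouring these constraints. It is illustrative only: the actual experiment was programmed in Inquisit 5, and the question identifiers and the simple shuffle are assumptions.

```python
# Illustrative sketch of the 224-trial structure: 56 questions (28 emotional,
# 28 neutral), each presented in both languages and under both veracity
# instructions, with a random intertrial interval per trial.
import itertools
import random

LANGUAGES = ("dutch", "english")
VERACITY = ("truth", "lie")
EMOTIONALITY = ("emotional", "neutral")
ITIS_MS = (1000, 1100, 1200, 1300, 1400, 1500)


def build_trial_list(seed=0):
    rng = random.Random(seed)
    trials = []
    for emotionality in EMOTIONALITY:
        for question in range(28):  # 28 questions per emotionality category
            for language, veracity in itertools.product(LANGUAGES, VERACITY):
                trials.append({
                    "question_id": f"{emotionality}_{question:02d}",  # hypothetical id
                    "language": language,
                    "veracity": veracity,
                    "emotionality": emotionality,
                    "iti_ms": rng.choice(ITIS_MS),
                })
    # Simple shuffle; Inquisit's "noreplacenorepeat" additionally prevents
    # immediate repetitions of the same trial type.
    rng.shuffle(trials)
    return trials


trials = build_trial_list()
assert len(trials) == 224  # 28 trials in each of the 8 design cells
```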

LexTale

The LexTale (Lemhöfer & Broersma, 2012) is a brief vocabulary test. For each of 60 stimuli, presented one by one on the computer screen, the participant judges whether the stimulus is an existing word or not. The test consists of 20 pseudowords (e.g. crumper) and 40 existing words (e.g. savoury). The LexTale score is the percentage of correctly judged stimuli, calculated as ((number of words correct / 40 × 100) + (number of non-words correct / 20 × 100)) / 2. Each LexTale assesses one language and uses language-matched non-words. We used the English, German, and Dutch LexTales.
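The scoring rule quoted above translates directly into a small helper; this is a sketch of that formula, not the test's published scoring script.

```python
def lextale_score(n_words_correct, n_nonwords_correct):
    """LexTale score: percent correct for 40 words and 20 non-words, weighted equally."""
    return ((n_words_correct / 40 * 100) + (n_nonwords_correct / 20 * 100)) / 2


# Example: 36/40 words and 18/20 non-words correct -> (90 + 90) / 2 = 90.0
print(lextale_score(36, 18))
```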

Deviations from preregistered protocol

There were 3 deviations from our preregistered protocol. First, due to a misunderstanding of the exclusion criteria within the research team, instead of collecting data until we reached 49 included participants, we have 63 included participants. Second, during piloting we noticed that it was necessary to run the speech recognition wizard for each participant to obtain reliable speech recognition; this step was not part of the preregistered procedure. Third, due to a miscommunication, the questions were verbalised by a Dutch native speaker who was very fluent in English (self-rated: C2; LexTale English score: 98.75%) rather than by someone who was raised bilingually (i.e. having both Dutch and English as native languages), as requested by the original authors.

Deviations from original study

Our preregistered analysis plan deviated in two minor ways from the original study. First, we only considered behavioural errors as errors (i.e. saying No when a Yes answer was required; as is common for the Sheffield Lie Test; see e.g. Debey, Verschuere, & Crombez, 2012), thereby including trials on which participants gave the correct answer in the wrong language (i.e. “Yes” instead of “Ja”). This was a fortunate decision, because we found that the voice recognition did not systematically code the correct language (i.e. it could code “Yes” as “Ja”). Second, RTs were excluded as outliers when deviating more than 2.5 SDs from the mean per participant per condition. Whereas the original authors considered condition to involve all 8 cells of their 2 × 2 × 2 design, we ignored stimulus emotionality in our preregistered analyses and hence only considered the 4 cells of interest (Dutch-Truth, Dutch-Lie, English-Truth, English-Lie) to define outliers. In the exploratory analyses, we report the results when analysing the data following the original approach.

Results

Descriptive statistics and manipulation checks (not preregistered)

The manipulation check confirmed that participants were more proficient in Dutch than in English, both for self-report and for the LexTale, see Table 1.

Table 1. Objective and subjective proficiency in Dutch and English.

Preregistered analysis

Response time, expressed in milliseconds, was measured as the duration between question offset and microphone triggering. Behavioural errors – answers that did not conform to the required veracity condition based upon the ground truth provided by the participants (10.10% of all trials) – were excluded from the RT analysis. Exploratory analyses on behavioural errors can be found at https://osf.io/x4rfk/. Additionally, RT outliers deviating more than 2.5 SDs from the mean per subject and per each of the 4 conditions of interest (Dutch-truth, Dutch-lie, English-truth, English-lie) were excluded from the data (2.74% of all trials).
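A minimal sketch of this preprocessing step in pandas follows; column names such as subject, language, veracity, error, and rt are assumptions and are not taken from the released analysis scripts.

```python
import pandas as pd


def preprocess_rts(trials: pd.DataFrame) -> pd.DataFrame:
    """Drop behavioural errors, then RTs beyond 2.5 SD per subject and condition."""
    correct = trials[trials["error"] == 0].copy()

    def drop_outliers(cell: pd.DataFrame) -> pd.DataFrame:
        mean, sd = cell["rt"].mean(), cell["rt"].std()
        return cell[(cell["rt"] - mean).abs() <= 2.5 * sd]

    # The four preregistered cells of interest: Dutch-truth, Dutch-lie,
    # English-truth, English-lie (i.e. emotionality is ignored here).
    return (correct
            .groupby(["subject", "language", "veracity"], group_keys=False)
            .apply(drop_outliers))
```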

As is clear from Table 2, the paired samples t-test showed that the lie-truth difference in RTs for the non-native language (M = 155.59, SD = 195.30) was smaller than the lie-truth difference in RTs for the native language (M = 307.94, SD = 268.98), t(62) = 4.26, p < .001, d = .54 [95% CI: .27, .80]. As a rule of thumb, Cohen (1988) proposed .20, .50, and .80 as thresholds for “small”, “medium”, and “large” effects, respectively. This implies that the observed language effect is of medium size. Another way to describe this language effect is that for 78% of our participants the lie-truth difference for the non-native language was smaller than the average lie-truth difference for the native language.
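The confirmatory test amounts to a paired t-test on two per-participant difference scores. The sketch below (with assumed variable names) shows the computation of t, Cohen's d for paired data, and the descriptive 78% figure.

```python
import numpy as np
from scipy import stats


def lie_effect_comparison(diff_native: np.ndarray, diff_foreign: np.ndarray):
    """Compare per-participant lie-truth RT differences between languages."""
    t, p = stats.ttest_rel(diff_native, diff_foreign)
    paired_diff = diff_native - diff_foreign
    d = paired_diff.mean() / paired_diff.std(ddof=1)  # Cohen's d for paired data
    # Share of participants whose foreign-language lie effect falls below the
    # average native-language lie effect (reported as 78% above).
    prop_below = np.mean(diff_foreign < diff_native.mean())
    return t, p, d, prop_below
```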

Table 2. Average response time (in ms) for lying and truth-telling to emotional and neutral questions, presented in English or Dutch.

Exploratory analyses

While the focus of the replication was on the lie effect for the native vs non-native language, the full ANOVA including all factors of the design is of interest. The 2 (Language: native vs. foreign) × 2 (Veracity: truth vs. lie) × 2 (Emotionality: emotional vs. neutral) repeated measures ANOVA on RTs showed main effects of Veracity, F(1,62) = 95.57, p < .001, ηp² = .61, and Language, F(1,62) = 5.93, p = .018, ηp² = .09, which were qualified by the predicted Veracity × Language interaction, F(1,62) = 18.27, p < .001, ηp² = .23, mimicking the result of the preregistered t-test. The ANOVA further showed a main effect of Emotionality, F(1,62) = 12.01, p = .001, ηp² = .16, with greater RTs to emotional than to neutral questions. This effect was qualified by language: the significant Language × Emotionality interaction, F(1,62) = 11.08, p = .001, ηp² = .15, indicated that the emotionality effect was reduced in the foreign language. The Veracity × Emotionality interaction was not significant, F(1,62) = 2.47, p = .121, ηp² = .04, and neither was the 3-way interaction, F(1,62) = 0.03, p = .869, ηp² < .01. What drove the smaller lie effect for the foreign language, as evidenced by the predicted Veracity × Language interaction? Truth RTs in the foreign language were greater than truth RTs in the native language, t(62) = 4.70, p < .001, d = .59 [.32, .86]. Lie RTs did not differ with language, t(62) = .67, p = .50, d = .08 [−.16, .33].
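A sketch of such a repeated-measures ANOVA using statsmodels is given below; the original analysis was not necessarily run this way, and the input is assumed to contain one mean RT per participant per design cell.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM


def full_design_anova(cell_means: pd.DataFrame):
    """cell_means columns (assumed): subject, language, veracity, emotionality, rt."""
    model = AnovaRM(cell_means, depvar="rt", subject="subject",
                    within=["language", "veracity", "emotionality"])
    result = model.fit()
    return result.anova_table  # F values, degrees of freedom, and p values
```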

Analyses using outlier and exclusion criteria of original study

We reanalysed the data, taking stimulus emotionality into account when calculating RT outliers and including only trials that were answered in the same language as the question, as was done by the original authors. This led to the exclusion of 25 participants, leaving n = 41. Even with these stricter criteria, which are sensitive to the imperfect speech recognition, the lie effect for the non-native language (M = 127.29, SD = 219.60) was again smaller than that for the native language (M = 292.64, SD = 258.36), t(40) = 3.25, p < .01, d = .51 [.14, .88].

Outliers

When inspecting the data, we found one participant with an extreme difference between the lie effect in the native (Dutch: +1245 ms) and the foreign language (English: −397 ms), exceeding 5 SDs from the average difference. Removing this participant showed that the key difference (lie effect native vs lie effect non-native) remained highly similar, t(61) = 4.76, p < .001, d = .60 [.35, .86].

Association with language proficiency

Pearson correlations between the language effect (lie effect non-native minus lie effect native) and the English LexTale score, r = −.08, p = .51, or self-rated English proficiency, r = −.01, p = .93, were negligible.

Mini meta-analysis

A meta-analytic combination of the current study and Suchotzki and Gamer (2018; Experiment 2), using a random-effects model in JASP, estimates the difference between the native- and foreign-language lie-truth effects at 152.71 milliseconds [95% CI: 111.94, 193.49].
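The JASP analysis itself is not reproduced here, but a DerSimonian-Laird random-effects combination of two study estimates can be sketched as follows; inputs are the per-study mean differences in milliseconds and their standard errors, and no values from the original study are assumed.

```python
import numpy as np


def random_effects_meta(means, standard_errors):
    """DerSimonian-Laird random-effects estimate and 95% CI for k studies."""
    means = np.asarray(means, dtype=float)
    variances = np.asarray(standard_errors, dtype=float) ** 2

    w = 1.0 / variances                              # fixed-effect weights
    fixed = np.sum(w * means) / np.sum(w)
    q = np.sum(w * (means - fixed) ** 2)             # heterogeneity statistic Q
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(means) - 1)) / c)      # between-study variance

    w_re = 1.0 / (variances + tau2)                  # random-effects weights
    estimate = np.sum(w_re * means) / np.sum(w_re)
    se = np.sqrt(1.0 / np.sum(w_re))
    return estimate, (estimate - 1.96 * se, estimate + 1.96 * se)
```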

Discussion

Is lying in a different language easier or more difficult? Suchotzki and Gamer (2018) found that lie-truth differences in RTs were less pronounced when native German speakers were tested in English than when tested in German. Our direct replication in native Dutch speakers, tested in Dutch and English, served to assess the robustness of that finding. Inspired by the guidelines of Valentine et al. (2011), we used several preregistered criteria to judge the replication results. (1) Our replication study produced a statistically significant effect in the same direction as the original. (2) The effect size observed in the replication, however, was smaller than that observed in the original study: the point estimate of the effect size (d = .54) obtained in the replication study fell below the lower limit of the 95% confidence interval of the original effect [.58, 1.35]. (3) Nonetheless, the subjective estimate of all members of the replication team was that the results replicated the original effect. In sum, while the effect in the replication study is smaller than that in the original, the replication study, run in a different country and using another native language, also found that speaking in another language reduces the lie-truth difference in RTs.

As has been observed for many other studies (Lehrer, 2010), the replication effect was smaller than the original effect. We see at least four possible explanations. First, it is possible that our participants were more proficient in English than participants in the original study. Indeed, self-judged proficiency in English was 0.96 points higher on the 6-point Common European Framework of Reference scale. Then again, as also observed in the original study, we found no association between self-judged language proficiency and the extent to which the lie-truth difference decreased from the native to the foreign language. To further explore this possibility, we recommend that future research include greater variation in language proficiency (e.g. by testing more languages and/or by including participants with lower proficiency). Second, biases by the original investigators may have inflated their estimate of the effect. Preregistration can protect researchers from such biases in future studies. Third, when excluding one outlier with an extreme language effect, the effect size, d = .60, fell just within the 95% CI of the original effect. Fourth, as reliability determines the upper limit of validity, it is possible that the reliability of the replication was lower than that of the original. The correlation between the lie-truth difference in the native vs foreign language – which enters the equation for calculating the targeted effect size – was indeed lower in our study (r = .28) than in the original (r = .66). Differences in the functionality of the speech recognition software may be partly responsible for this difference, and future research may consider using keyboard responses (as in Suchotzki & Gamer, 2018; Experiment 1).
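For reference, the standard formula for the paired-samples effect size (a textbook identity, not quoted from either paper) makes explicit why this correlation matters:

$$
d_z = \frac{M_{\text{native}} - M_{\text{foreign}}}{SD_{\text{diff}}},
\qquad
SD_{\text{diff}} = \sqrt{SD_{\text{native}}^{2} + SD_{\text{foreign}}^{2} - 2\, r \, SD_{\text{native}}\, SD_{\text{foreign}}}
$$

With a lower correlation r between the native- and foreign-language lie effects, SD_diff grows and the standardised effect d_z shrinks, even if the mean difference stays the same.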

Despite its strengths – which include the preregistration of predictions, analyses, and inferential criteria – this study is not without its limitations. Most importantly, our study suffers from the same limitations as the original, including a lack of ground truth and the limited ecological validity of the computerised deception paradigm.

That lie-truth differences in RTs are smaller in a foreign than in the native language has applied as well as theoretical implications. Imposing cognitive load may not always have the intended effect of increasing lie-truth differences (see also Verschuere, Köbis, Bereby-Meyer, Rand, & Shalvi, 2018). These findings fuel the concern that cognitive load techniques can backfire, making truth tellers look more like liars, and suggest that recommending the implementation of cognitive load techniques in practice may have been premature. Theoretically, it is of interest what drives the observed effects. As also observed in the original studies, we found no evidence that lie RTs decrease with foreign language use, a finding that speaks against the emotional distance hypothesis, which posits that foreign language use diminishes the emotionality of lying and thereby reduces lie RTs. It is also not in line with the cognitive load hypothesis that foreign language use would particularly affect the more effortful process, lying, thereby increasing lie RTs. Suchotzki and Gamer (2018) reasoned that the extra cognitive load associated with foreign language use (which is apparent for truth telling) may be cancelled out for lying by the emotional distance effect. A dual process explanation (Kahneman, 2003) could also explain the findings if foreign language use not only hampers automatic (truthful) responding, but at the same time facilitates effortful (deceptive) responding. A process dissociation approach (Jacoby, 1991) may help to shed light on how speaking in another language affects automatic versus effortful processing.

To conclude, for both native German speakers and native Dutch speakers, the validity of RT-based deception paradigms – used mostly for the study of deception but having applied potential for lie detection (Meijer, Verschuere, Gamer, Merckelbach, & Ben-Shakhar, 2016) – is hampered by the use of a non-native test language. This is worrisome because, in an ever more global world, lie tests are increasingly conducted in a foreign language (often English), and the findings indicate that lie-truth discrimination may be less accurate for statements in a foreign language than for statements in the speaker's native language.

Acknowledgments

The authors would like to thank Kristina Suchotzki for sharing materials and data and for discussing and approving the protocol of the replication attempt. We would also like to thank Marieke Pluis, Franziska Yasrebi-de Kom, and Dian van Huijstee for their help in developing the materials. Bruno Verschuere has a long and ongoing collaboration with Kristina Suchotzki and Matthias Gamer.

Disclosure statement

No potential conflict of interest was reported by the authors.

Notes

1. Cohen’s d is the standardised paired difference for within-subjects designs (Cohen, 1988). In the original study ηp² was reported, and Cohen’s d was provided to us by the authors. The Excel sheet used for calculating Cohen’s d can be found at https://osf.io/x4rfk/.

2. We assessed proficiency in German for possible follow-up research, anticipating that German proficiency would be lower and more variable than English proficiency.

3. Conducting a direct replication, we chose to use the stimuli of the original study and include stimulus emotionality as a factor in the design. As the original study found no Veracity × Emotionality interaction and effects of stimulus emotionality are not the focus of our replication attempt, we excluded stimulus emotionality from the confirmatory analyses. We report on stimulus emotionality in the exploratory analyses.

References

  • Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Erlbaum.
  • Debey, E., Verschuere, B., & Crombez, G. (2012). Lying and executive control: An experimental investigation using ego depletion and goal neglect. Acta Psychologica, 140(2), 133–141. doi: 10.1016/j.actpsy.2012.03.004
  • Harris, C. L., Ayçiçeǧi, A., & Gleason, J. B. (2003). Taboo words and reprimands elicit greater autonomic reactivity in a first language than in a second language. Applied Psycholinguistics, 24, 561–579. doi: 10.1017/S0142716403000286
  • Jacoby, L. L. (1991). A process dissociation framework: Separating automatic from intentional uses of memory. Journal of Memory and Language, 30(5), 513–541. doi: 10.1016/0749-596X(91)90025-F
  • Kahneman, D. (2003). Maps of bounded rationality: Psychology for behavioral economics. The American Economic Review, 93(5), 1449–1475. doi: 10.1257/000282803322655392
  • Lehrer, J. (2010). The truth wears off: Is there something wrong with the scientific method? The New Yorker. doi: 10.1037/e672852010-001
  • Lemhöfer, K., & Broersma, M. (2012). Introducing LexTALE: A quick and valid lexical test for advanced learners of English. Behavior Research Methods, 44(2), 325–343. doi: 10.3758/s13428-011-0146-0
  • Meijer, E. H., Verschuere, B., Gamer, M., Merckelbach, H., & Ben-Shakhar, G. (2016). Deception detection with behavioral, autonomic, and neural measures: Conceptual and methodological considerations that warrant modesty. Psychophysiology, 53(5), 593–604. doi: 10.1111/psyp.12609
  • North, B. (2014). Putting the common European framework of reference to good use. Language Teaching, 47(2), 228–249. doi: 10.1017/S0261444811000206
  • Puntoni, S., de Langhe, B., & van Osselaer, S. M. J. (2009). Bilingualism and the emotional intensity of advertising language. Journal of Consumer Research, 35(6), 1012–1025. doi: 10.1086/595022
  • Suchotzki, K., & Gamer, M. (2018). The language of lies: Behavioral and autonomic costs of lying in a native compared to a foreign language. Journal of Experimental Psychology: General, 147(5), 734–746. doi: 10.1037/xge0000437
  • Valentine, J. C., Biglan, A., Boruch, R. F., Castro, F. G., Collins, L. M., Flay, B. R., … Schinke, S. P. (2011). Replication in prevention science. Prevention Science, 12(2), 103–117. doi: 10.1007/s11121-011-0217-6
  • Verschuere, B., & Kleinberg, B. (2017). Assessing autobiographical memory: The web-based autobiographical implicit association test. Memory, 25(4), 520–530. doi: 10.1080/09658211.2016.1189941
  • Verschuere, B., Köbis, N., Bereby-Meyer, Y., Rand, D. G., & Shalvi, S. (2018). Taxing the brain to uncover lying? Meta-analyzing the effect of imposing cognitive load on the reaction-time costs of lying. Journal of Applied Research in Memory and Cognition, 7(3), 462–469. doi: 10.1016/j.jarmac.2018.04.005
  • Vrij, A., & Granhag, P. A. (2012). Eliciting cues to deception and truth: What matters are the questions asked. Journal of Applied Research in Memory and Cognition, 1(2), 110–117. doi: 10.1016/j.jarmac.2012.02.004