Research Article

Symptom coaching and symptom validity tests: An analog study using the Structured Inventory of Malingered Symptomatology, Self-Report Symptom Inventory, and Inventory of Problems-29

Abstract

In this pilot and exploratory study, we tested the robustness of three self-report symptom validity tests (SVTs) to symptom coaching for depression, with and without additional information available on the Internet. Specifically, we divided our sample (N = 193) so that each subject received either the Structured Inventory of Malingered Symptomatology (SIMS; n = 64), the Self-Report Symptom Inventory (SRSI; n = 66), or the Inventory of Problems-29 (IOP-29; n = 63). Within each of the three subgroups, approximately one third of participants were instructed to respond honestly (Genuine Condition, nSIMS = 21; nSRSI = 24; nIOP-29 = 26) and approximately two-thirds were instructed to feign depression. One half of the feigners were presented with a vignette to increase their compliance with instructions and were given information about symptoms of depression (Coached Feigning, nSIMS = 25; nSRSI = 18; nIOP-29 = 21), and the other half were given the same vignette and information about symptoms of depression, plus two Internet links to review before completing the test (Internet-Coached Feigning, nSIMS = 18; nSRSI = 24; nIOP-29 = 16). Overall, the results showed that the genuine conditions yielded the lowest total scores on all three measures, while the two feigning conditions did not significantly differ from each other. Looking at the detection rates for all feigning participants, all three measures showed satisfactory results, with IOP-29 performing slightly better than SIMS and SIMS performing slightly better than SRSI. Internet-Coached Feigners scored slightly lower on all three measures than feigners who were coached without the Internet links. Taken together, the results of this preliminary and exploratory study suggest that all three SVTs examined are sensitive to feigned depression even in the presence of symptom coaching, both with and without additional Internet-based information.

Whether physical or mental, feigning illness and/or symptoms occurs at a non-trivial rate in both civil (e.g., personal injury claims) and criminal settings. The Diagnostic and Statistical Manual of Mental Disorders-Fifth Edition defines “the intentional production of false or grossly exaggerated physical or psychological symptoms motivated by external incentives” as malingering (DSM-5; American Psychiatric Association, 2013). The “external incentives” are a key part of the definition. This allows professionals (i.e., clinicians and researchers) to distinguish malingering from factitious disorder, which is motivated by internal incentives (Bass & Wade, 2019; Boskovic, 2019). External incentives related to malingering can be financial (e.g., compensation for disability), legal (e.g., avoidance of criminal prosecution, avoidance of civil obligations such as military service), or related to the receipt of privileges (e.g., additional retakes for failed/missed exams) (Akca et al., 2020; see also Merten & Merckelbach, 2020; Sherman et al., 2020). However, due to the complexity of verifying incentives (see van Impelen et al., 2017), researchers generally prefer the terms “feigning” and “overreporting” to “malingering” because they do not involve assumptions about intentionality or possible incentives (Rogers, 2018b).

The exact prevalence (i.e., base rate) of feigning is difficult to determine and therefore tends to be based on estimates. One widely cited study found that feigning was suspected in approximately 7–31% of more than 33,000 annual examinations by practitioners, depending on the setting (i.e., civil, criminal, or medical/psychiatric; Mittenberg et al., 2002; see also Dandachi-FitzGerald et al., 2020). Indeed, factors such as ambiguous definitions of feigning, referral context (e.g., clinical, forensic), criteria for detection, methods, and standardized measures used, among others, contribute to the variability of the frequency with which feigning occurs, ranging from 3% to 64% (Young, 2015). A review of the literature considering these contributing factors found a likely base rate of feigning to be 15% ± 15% (Young, 2015).

Prominent researchers and associations have emphasized the importance of paying attention to possible feigning and developing enhanced methods for detecting feigning (e.g., Bush et al., 2005; Chafetz et al., 2015; Heilbronner et al., 2009; Slick et al., 1999). Most commonly, the assessment of possible feigning is conducted using both Symptom Validity Tests (SVTs) and Performance Validity Tests (PVTs; Larrabee, 2012). PVTs essentially assess underperformance, following the rationale that feigners, when presented with a cognitive task (typically with a dichotomous response option of which only one is correct), sometimes deliberately choose the wrong answer(s) to present themselves as impaired (Fazio et al., 2015; Merten & Merckelbach, 2012, 2013). In contrast, SVTs are used to detect over-endorsement of genuine or fabricated symptoms, as well as the presentation of implausible combinations of symptoms. The rationale behind most SVTs is that individuals who are feigning tend to overendorse illness-related complaints and that they may overreport bizarre or extremely unlikely problems (Boskovic et al., 2020). In turn, this also has the potential to generate unlikely patterns of symptom endorsements.

Some SVTs are administered in the form of symptom inventories that examinees must complete on their own, while others take the form of a (semi)structured interview (for an overview, see Rogers, 2018a). The most commonly used stand-alone SVT is probably the Structured Inventory of Malingered Symptomatology (SIMS; Dandachi-Fitzgerald et al., 2013; Martin et al., 2015; Smith & Burger, 1997), which consists of 75 true/false items across five domains: amnestic disorder, psychosis, low intelligence, neurological impairment, and affective disorders. The majority of items refer to extreme, bizarre, or atypical complaints, although some of them describe more common symptoms (Martin et al., 2015). The SIMS has been shown to be a psychometrically sound measure in a variety of samples (for a meta-analysis, see van Impelen et al., 2014), and has high consistency across different cultures (Nijdam-Jones & Rosenfeld, 2017; see also Boskovic et al., 2017). However, it has some drawbacks, such as overestimation of feigning in certain psychopathological groups (for a systematic review, see van Impelen, 2014). In addition, it has been argued that the SIMS has high face validity due to the easily recognizable bizarre quality of some items, and also limited applicability in a civil context where people might frequently report symptoms related to pain or anxiety (Merten et al., 2016).

To overcome some of these limitations, a new measure was developed, the Self-Report Symptom Inventory (SRSI; Merten et al., 2016). The SRSI includes 107 items, most of which are divided into two main scales, one capturing genuine symptoms, and the other capturing pseudo-complaints. The bizarre quality of the SIMS items has been replaced with more plausible content, making the SRSI less recognizable as a symptom validity measure. In addition, each of the main SRSI scales includes five subscales to measure commonly reported psychological and physical complaints, which improves its use in the civil context as well. The SRSI includes items that capture cognitive problems, depression, anxiety/post-traumatic stress disorder (PTSD), nonspecific somatic symptoms, and pain symptoms. To date, research has provided strong empirical support for the use of the SRSI in a variety of cultures and contexts (see Giger & Merten, 2019; Merten et al., 2016; van Helvoort et al., 2019; for a review, see Merten et al., 2021).

Another relatively new stand-alone SVT is the Inventory of Problems-29 (IOP-29; Viglione et al., 2017). The IOP-29 includes 29 self-administered items that assess the credibility of psychiatric and cognitive symptom presentations. Specifically, the items tap into various symptom presentations associated with conditions such as depression, PTSD, cognitive impairment, schizophrenia, and combinations thereof. Since its introduction, many studies have aimed to test its psychometric properties and cross-cultural validity. The results of these studies have been particularly encouraging (e.g., Banovic et al., 2021; Carvalho et al., 2021; Giromini, Barbosa, et al., 2019; Giromini, Lettieri, et al., 2019; Ilgunaite et al., 2020; Roma et al., 2020; Šömen et al., 2021; Winters et al., 2020; for a quantitative review, see Giromini & Viglione, 2022).

Although each of the SVTs described above is successful in detecting feigning, coaching presents a problem that could threaten their success. Coaching implies the use of information about validity testing (e.g., through self-preparation) or the provision of information about validity measures (e.g., by third parties) that could confound the detection of feigning (Rogers & Bender, 2018). In a recent meta-analysis on neurocognitive testing, data showed that coaching about target symptoms was significantly more detrimental to the utility of SVTs than test coaching (i.e., providing information about the SVTs; Crişan et al., 2021). Some of the potential sources of coaching are (a) feedback provided by professionals (e.g., psychologists or psychiatrists) on symptom presentations or (b) attorneys themselves (see Essig et al., 2001), but also (c) the Internet (Suhr & Gunstad, 2007). Thanks to the Internet, information about various symptom presentations, clinical criteria for disorders, and physical or mental sequelae of deficits is readily available to anyone. Further, research has shown that the validity of PVT results was compromised by online information. Specifically, results showed that 8–26% of websites visited by researchers posed a threat (i.e., a moderate to high level threat) to the validity of test scores (Bauer & McCaffrey, 2006; see also Ruiz et al., 2002).

Given the impact of coaching on the validity of symptom inventories, it is important to determine the extent to which commonly used SVTs are robust to Internet-facilitated coaching. To our knowledge, no study has yet examined the effects of Internet coaching on SIMS, SRSI, and IOP-29 scores. However, several studies have investigated the psychometric robustness of SIMS, SRSI, and IOP-29 to other types of coaching. For example, Jelicic et al. (2007) tested the utility of SIMS in detecting feigned cognitive decline between honest, naïve (i.e., uncoached), and coached participants. Results showed that SIMS correctly classified 86% of feigners, who were informed and warned about SVTs, suggesting that SIMS is resistant to coaching (Jelicic et al., 2011). In a recent study that included students either responding honestly or being coached to feign depression or ADHD, the SIMS showed perfect specificity (i.e., no false positive errors), but low to medium sensitivity (36–52%), depending on the cutoff point applied. Because feigning groups in this study received, in addition to disorder-specific symptom information, a warning that one or more tests were designed to detect feigning (Grant et al., 2020), the authors concluded that coaching instructions may result in lower sensitivity (see also Jelicic et al., 2011). Regarding the SRSI and coaching, Boskovic et al. (2019) examined the ability of the SRSI to detect fabricated PTSD-related complaints. In addition to honest participants, the study included a group of college students who were instructed to feign PTSD without any additional information and a group of actors who were required to watch videos of people talking about trauma-related experiences before completing the SRSI. Findings showed that the SRSI performed well, correctly classifying 89% of the coached actors. Finally, the IOP-29 was also tested with respect to the effects of coaching. For instance, Gegner et al. (2021) included honest participants and two groups of participants who were instructed to feign mild traumatic brain injury (TBI), half of whom also received a warning to avoid detection. The IOP-29 was able to detect over 90% of feigning participants, with only a small difference (approximately 2%) in favor of the coached participants. In summary, previous studies have supported the robustness of the SIMS, SRSI, and IOP-29 to coaching. However, no study has used the exact same coaching instructions for all three tests, and no study has focused specifically on Internet coaching.

In light of the above, the current study was designed with the goal of separately examining the SIMS, SRSI, and IOP-29 with respect to their robustness to coaching, using the same simulation paradigm and instructions for all three measures (see below). We chose these stand-alone SVTs because they are frequently researched and used in practice (see Giromini et al., 2022). Specifically, we investigated whether symptom coaching and additional Internet coaching (i.e., providing additional Internet links to gain information concerning the disorder) had an impact on SIMS, SRSI, and IOP-29 scores. We included three conditions: one served as a control condition (Genuine condition); two included instructions to fake depression, information about symptoms of depression (i.e., symptom coaching), and a warning “not to overdo it” (see Appendix). In one of the two feigning conditions (Internet-Coached Feigning condition) but not in the other (Coached Feigning condition), participants additionally received Internet links to websites containing information concerning depression. We hypothesized that (i) participants in the Genuine condition would generate the most credible results on all three measures, meaning that they would not endorse pseudosymptoms or would endorse them at a much lower rate, compared with the participants in the two feigning conditions. For the Internet-Coached Feigning condition, we made two mutually exclusive predictions because of the lack of previous research on this topic. That is, we speculated that (ii) participants in the Internet-Coached Feigning condition could show more credible results on all three measures (i.e., lower endorsement of pseudosymptoms) compared to participants in the Coached Feigning condition. This is because, by using the Internet, participants who were faking depression could better understand which symptoms are characteristic of real depression and thus avoid endorsing pseudosymptoms. Yet, the available literature suggests that searching for health information online increases health anxiety (i.e., cyberchondria), leading to higher endorsement of pathological items in symptom inventories (e.g., Brown et al., 2020; Jungmann et al., 2020; Starcevic & Berle, 2013). We therefore also hypothesized that (iii) Internet-Coached Feigning might actually lead to higher pseudosymptom scores due to an increased tendency to exaggerate/overreport.

Method

Participants

A total of 197 psychology students of the Erasmus University Rotterdam were recruited as participants, with the inclusion criterion being that they were pursuing their studies in English. Four students failed the check for inattentive responding, which we included in all of the measures used (see below). Thus, the final sample consisted of 193 students (88.6% female, Mage = 20.77, SD = 2.97; age range 18–44 years). The vast majority of participants were undergraduate students (99%); only two participants reported being master’s students. On average, participants rated their English proficiency as very good (M = 4.25, SD = .79). More specifically, the frequency of English proficiency ratings was as follows: 43.5% “native/excellent” (point 5), 42% “good” (point 4), 11% “average” (point 3), and 3.5% “poor” (point 2). Nobody indicated “terrible” (point 1) proficiency. Of the seven students with poor proficiency, one was given the SIMS (feigning), one the SRSI (feigning), and five the IOP-29 (genuine group). This study was approved by the standing ethical committee of the Erasmus University Rotterdam, the Netherlands.

According to the research design (see below), participants were randomly assigned to one of three groups: SIMS (n = 64), SRSI (n = 66), or IOP-29 (n = 63). The random allocation of participants was performed automatically by the research platform used to conduct the study, i.e., Qualtrics. Within these groups, each person was randomly assigned to receive either Genuine, Coached Feigning, or Internet-Coached Feigning instructions (see Table 1 for more details). Overall, participants’ ages did not differ between the three SVT groups, F(2, 192) = .82, p = .440, nor did participants differ in their self-reported English proficiency, F(2, 192) = 1.24, p = .293. However, the mental health ratings of participants who received different measures differed significantly, F(2, 192) = 5.15, p = .002. Specifically, participants in both the SIMS and SRSI groups had higher ratings, indicating better mental health, than participants in the IOP-29 group (p = .002 and p = .025, respectively). However, participants receiving the SIMS and those receiving the SRSI did not differ in their mental health ratings (p ≈ 1.00). Only one participant rated their mental health as extremely bad; this person was assigned to the Genuine condition and was administered the SRSI.

Table 1. Age, English proficiency, and mental health check by group and condition.

Measures

Structured Inventory of Malingered Symptomatology (SIMS; Smith & Burger, 1997)

The SIMS consists of 75 true/false self-report items gauging probable feigning. SIMS includes five categories of psychopathology, each portrayed by 15 items: (1) Affective Disorders, (2) Neurological Impairment, (3) Amnesia, (4) Low Intelligence, and (5) Psychosis. The total number of endorsed symptoms represents the total SIMS score, which is the only important “credibility” index of the SIMS. Indeed, the test authors have indicated that the subscales of the SIMS are not useful for detecting feigned psychopathology, as they are only used to assess what type of psychopathology the respondent is attempting to feign once it has been determined that the total SIMS score exceeds the cutoff value (Widows & Smith, 2005). Furthermore, a high SIMS score does not indicate feigning per se. Rather, the SIMS is recommended as a screening tool for symptom overreporting (i.e., negative response bias), which may require further assessment in case of positive outcomes. Higher scores indicate unlikely symptomatology, with recommended cutoff scores of >14 or >16 (i.e., scores exceeding 14 or 16). In this study, the cutoff >16 was used because this cutoff point has received the most empirical support (van Impelen et al., 2017). The internal consistency estimate of reliability (Cronbach’s alpha) of the SIMS was .90. For this study, we included only the total score in our analyses, consistent with the recommendations in the SIMS manual.

Following recommendations to include checks for inattentive responding in survey measures (e.g., Meade & Craig, 2012; Ziegler, 2015), we chose to include approximately one check per 20 items. Therefore, three additional items (e.g., Please respond to this question with “T”) were randomly inserted into the SIMS. Only two participants failed this check, and they were therefore excluded from the final sample.

Self-Report Symptom Inventory (SRSI; Merten et al., 2016)

The SRSI consists of 100 items divided into two main scales: the genuine symptoms scale and pseudosymptoms scale. Genuine symptom subscale domains are (1) cognitive problems, (2) depression, (3) pain, (4) nonspecific somatic hardship, and (5) PTSD/anxiety. Correspondingly, (1) cognitive/memory, (2) neurological motoric ailment, (3) neurological sensory problems, (4) pain, and (5) anxiety/depression (incl. PTSD) subscales relate to pseudosymptoms. Each subscale includes 10 items. The pseudosymptoms scale items assess symptom over-endorsement. For diagnostic purposes, a cutoff score of >9 pseudosymptom items is recommended, whereas for screening purposes, a score of >6 is applicable. In this study, the Cronbach’s alphas for the genuine symptom scale and pseudosymptoms scale were .94 and .93, respectively. For the whole SRSI, the Cronbach’s alpha was .96.

In addition to the items of the main scales, the SRSI also contains seven items related to cooperativeness and consistency. For the consistency items (five in total), careless responding would be evident if individuals affirmed them in a way that contradicted their symptom report (Boskovic et al., 2020). In this sample, participants in the Genuine condition (M = 1.92, SD = 1.91) endorsed significantly more consistency items than participants in the Coached Feigning condition (M = .28, SD = .67) and Internet-Coached Feigning condition (M = .54, SD = .88) (Welch’s F(2, 40.48) = 7.49, p = .002, ηp² = .24). In contrast, no significant differences were observed between the two feigning conditions, p = .519. Thus, participants in this study consistently responded according to their instructions. For the purposes of this study, we included only the total scores of the two main scales in our analyses.

Five items that screen for random responding were included randomly throughout the scale. Only one participant failed the checks and was removed from the final sample.

Inventory of Problems-29 (IOP-29; Viglione et al., 2017)

The IOP-29 contains 29 items, 26 of which use an SVT-like format, and three of which use a PVT-like format. Each of the 26 SVT-like items provides three response options: “true,” “false,” and “does not make sense.” The developers of the IOP-29 argue that the option “does not make sense” allows for a more refined understanding of the true versus false response selection, by eliminating the dichotomous forced-choice nature of most available SVTs. The IOP-29 generates a False Disorder probability Score (FDS) based on two sets of reference data (i.e., from patients and from experimental feigners) rather than the more standard, T-metric, healthy controls-based normed score. The computation of the FDS is performed via the website https://www.iop-test.com. The FDS is to be interpreted as a likelihood value, with higher scores reflecting more implausible presentations of a disorder and lower scores nearing bona fide presentations. This format arguably facilitates decision making in real-life settings (Giromini et al., 2018). A cutoff score of FDS = .50 allows classifying a given IOP-29 as mainly noncredible (if ≥.50) or mainly credible (if <.50). As noted above, even though the IOP-29 is a relatively new SVT, several international studies have already demonstrated that it has good psychometric properties for use in forensic assessments (Young et al., 2020).

A check for inattentive responses was also added to this scale (“To this item respond with T”). One subject failed the check and was removed from the data.
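
The decision rules described for the three measures can be summarized in a short sketch. The function names and default arguments below are illustrative assumptions, not part of any test manual or scoring software; only the cutoff values (>16 for the SIMS, >6 or >9 for the SRSI pseudosymptoms scale, and FDS ≥ .50 for the IOP-29) come from the text above.

```python
# Illustrative sketch of the cutoff rules described in the Measures section.
# Function names are hypothetical; only the cutoff values come from the text.

def sims_flag(total_score: int, cutoff: int = 16) -> bool:
    """Flag a SIMS protocol when the total score exceeds the cutoff (>16 here)."""
    return total_score > cutoff


def srsi_flag(pseudosymptoms: int, cutoff: int = 9) -> bool:
    """Flag an SRSI protocol when endorsed pseudosymptoms exceed the cutoff.

    The text gives >9 for diagnostic purposes and >6 for screening.
    """
    return pseudosymptoms > cutoff


def iop29_flag(fds: float, cutoff: float = 0.50) -> bool:
    """Flag an IOP-29 protocol as mainly noncredible when FDS >= .50."""
    return fds >= cutoff
```

Note that the SIMS and SRSI rules use a strict inequality, whereas the IOP-29 rule includes the cutoff value itself.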

Procedure

The study was conducted online. Participants signed up for the study via the Research Participation System (SONA and ERPS), where they received a Qualtrics link that directed them to the study. After providing general information and consent, participants answered demographic questions. In addition, given that this study was conducted in the Netherlands, we asked participants to rate their English proficiency and current mental health on a 5-point scale (1 for terrible proficiency/extremely poor mental health; 5 for excellent English proficiency/extremely good mental health). Participants were then randomly assigned to either the Genuine (i.e., honest participants), Coached Feigning, or Internet-Coached Feigning condition. While the genuine participants were asked to respond honestly, for the participants in the two feigning conditions we used a simulation research paradigm. That is, coached feigning participants received a vignette in which they were instructed to simulate a particular condition and respond as if they were the protagonist of a scenario (see Rogers, 2018c). Specifically, they received instructions to feign depression (see Appendix; adapted from Giromini, Lettieri, et al., 2019; see also Pignolo et al., 2021), which also included information about the key symptoms of depression (i.e., symptom coaching) and a warning not to “overdo it.” Participants in the Internet-Coached Feigning condition were also given links to two websites found in online searches for depression (i.e., searches conducted by the authors at the planning stage of the study) and asked to carefully review their content (see Appendix). These participants were not allowed to continue with the study unless they clicked on the presented links. Participants in the three conditions were then randomly assigned to one of three groups, each of which completed one of the following three inventories: the SIMS, the SRSI, or the IOP-29.
After the main task, participants were presented with several exit questions (e.g., difficulty, motivation, clarity of instructions) and a debriefing form. Permission to use and store data anonymously was obtained. Participants were also asked to provide a randomly generated personal identification code in case they wished to withdraw their responses in the future. All participants who completed the study were granted one research participation credit as compensation.

Analysis

Due to the small sample sizes, for the main analyses, we opted for the non-parametric version of analyses of variance (ANOVA). Hence, within each group (i.e., for each SVT), we conducted Kruskal–Wallis tests to assess score differences across the three research conditions (Genuine, Coached Feigning, and Internet-Coached Feigning). When the main effect was statistically significant, pairwise contrasts were examined using Bonferroni correction. Epsilon squared (ε² = H / ((n² − 1)/(n + 1)); Tomczak & Tomczak, 2014) was used for the effect size computations. To facilitate future meta-analytic research, we also present means and standard deviations. Classification accuracy was also tested. The data and outputs are available on the Open Science Framework (OSF) platform (https://osf.io/a9km3/).
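
As a worked illustration of the effect size formula, the following minimal sketch computes epsilon squared from a Kruskal–Wallis H statistic and the number of observations n. The function name is an assumption introduced here for illustration; the H values and group sizes are those reported in this article.

```python
def epsilon_squared(h: float, n: int) -> float:
    """Epsilon squared for a Kruskal-Wallis H statistic based on n observations.

    Implements H / ((n^2 - 1) / (n + 1)), which simplifies to H / (n - 1).
    """
    return h / ((n**2 - 1) / (n + 1))


# H statistics and group sizes as reported in the Results section.
reported = {
    "SIMS total (n = 64)": (20.81, 64),
    "SRSI genuine symptoms (n = 66)": (18.17, 66),
    "SRSI pseudosymptoms (n = 66)": (8.92, 66),
    "IOP-29 FDS (n = 63)": (41.38, 63),
}

for label, (h, n) in reported.items():
    print(f"{label}: epsilon squared = {epsilon_squared(h, n):.2f}")
```

Because (n² − 1)/(n + 1) reduces to n − 1, the effect size is simply H divided by one less than the number of observations in the respective SVT group.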

Results

Preliminary analyses: motivation, clarity of instructions, and task difficulty

Ratings of motivation to participate were obtained on a five-point scale (1 = “Extremely unmotivated”; 5 = “Extremely motivated”). The mean value for the whole sample (N = 193) was M = 4.07 (SD = .65; range 2–5). We also asked the participants to rate the clarity of instructions on a five-point scale (1 = “Extremely unclear”; 5 = “Extremely clear”), obtaining a mean value of M = 4.47 (SD = .76; N = 193). Lastly, participants rated the difficulty in filling out the inventories (1 = “Extremely easy”; 5 = “Extremely difficult”), and the mean value for the whole sample (N = 193) was M = 2.27 (SD = 1.01).

We then inspected whether, within each SVT group, participants assigned to the three different conditions differed in terms of motivation, clarity of instructions, and difficulty of the task (for descriptive statistics, see Table 2). Genuine, coached feigning, and Internet-coached feigning participants who received the SIMS did not significantly differ from each other in terms of motivation (H(2) = 3.93, p = .140, ε² = .06), nor in the ratings of difficulty of the task (H(2) = 2.92, p = .232, ε² = .04). However, they did provide different ratings of clarity of instructions (H(2) = 11.38, p = .003, ε² = .18). Pairwise comparisons using Bonferroni correction indicated that participants in the Genuine condition rated the clarity of the instructions significantly higher than those in the Coached Feigning condition (p = .003), whereas those in the Genuine condition did not differ in their ratings from those in the Internet-Coached Feigning condition (p = .108), nor did the two feigning conditions differ from each other (p = .939).

Table 2. Motivation, clarity of instructions, and difficulty scores by group and condition.

A similar trend was observed among the participants who filled out the SRSI. Namely, the main effects were statistically significant for both ratings of clarity (H(2) = 8.99, p = .011, ε² = .14) and ratings of difficulty (H(2) = 9.10, p = .011, ε² = .14). Participants in the Genuine condition rated the clarity of instructions higher than both those in the Coached Feigning condition (p = .041) and those in the Internet-Coached Feigning condition (p = .004), whereas the two feigning conditions did not significantly differ from each other (p = .524). When asked about the difficulty of the task, participants in the Genuine condition rated it significantly lower than both coached feigners (p = .004) and Internet-coached feigners (p = .035). Again, the two feigning conditions did not significantly differ from each other (p = .353).

There were no significant differences between participants in the Genuine, Coached Feigning, and Internet-Coached Feigning conditions who received the IOP-29 in terms of motivation (H(2) = .56, p = .755, ε² = .009), clarity of instructions (H(2) = 2.65, p = .266, ε² = .04), or difficulty of the task (H(2) = 3.38, p = .185, ε² = .05).

SIMS: comparison between genuine, coached feigning, and internet-coached feigning

To test whether the conditions differed in their endorsement of SIMS items, we ran a Kruskal–Wallis test with condition (Genuine vs. Coached Feigning vs. Internet-Coached Feigning) as the independent variable and SIMS total score as the dependent variable. Results showed an overall significant difference between participants assigned to the three conditions, H(2) = 20.81, p < .001, ε² = .33. More specifically, participants in the Genuine condition (M = 11.95, SD = 7.41) endorsed fewer items than those in the Coached Feigning (M = 24.96, SD = 7.85; p < .001) and Internet-Coached Feigning (M = 23.78, SD = 9.82; p = .001) conditions. The two feigning conditions did not significantly differ from each other in their SIMS scores, p = .590.

Detection rate

We employed the recommended cutoff of >16 items (see van Impelen et al., 2014) in order to assess the detection accuracy of the SIMS (for the Area Under the Curve (AUC) results for all three measures, see Supplemental Table 1). As shown in Table 3, 88% of participants in the Coached Feigning condition and 78% of participants in the Internet-Coached Feigning condition were correctly detected as over-endorsers. Because the two feigning conditions did not yield statistically significantly different results, we also computed the detection rate for feigners overall (i.e., combined), which was 84%. Results also showed that 29% of the genuine participants were falsely classified as over-endorsers (i.e., false positives).
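
The combined detection rate follows from pooling the two feigning conditions. The sketch below reconstructs it from the per-condition rates and the group sizes given in the abstract (n = 25 and n = 18); the detected counts are an assumption obtained by rounding the reported percentages, not values taken from the article's tables, and the function name is hypothetical.

```python
def combined_rate(groups):
    """Pool detection rates across groups.

    groups: list of (n, reported_rate) pairs. Detected counts are
    reconstructed by rounding n * rate, which is an assumption because
    the reported percentages are themselves rounded.
    """
    detected = sum(round(n * rate) for n, rate in groups)
    total = sum(n for n, _ in groups)
    return detected / total


# SIMS: Coached Feigning (n = 25, 88%) and Internet-Coached Feigning (n = 18, 78%).
sims_combined = combined_rate([(25, 0.88), (18, 0.78)])
print(f"SIMS combined detection rate: {sims_combined:.0%}")  # prints 84%
```

Under this reconstruction, 22 of 25 coached feigners and 14 of 18 Internet-coached feigners exceed the cutoff, giving 36 of 43, which matches the reported 84%.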

Table 3. Minimum and maximum score on SIMS across conditions, and sensitivity and specificity for >16 cutoff SIMS.

SRSI: comparison between genuine, coached feigning, and internet-coached feigning

As with the SIMS, we conducted Kruskal–Wallis tests to investigate whether the three conditions (Genuine vs. Coached Feigning vs. Internet-Coached Feigning) yielded different scores on the SRSI genuine symptoms and pseudosymptoms scales. Participants assigned to the three conditions significantly differed on the genuine symptoms scale, H(2) = 18.17, p < .001, ε² = .28, and on the pseudosymptoms scale, H(2) = 8.92, p = .012, ε² = .14. Pairwise comparisons with Bonferroni correction indicated that participants in the Genuine condition endorsed significantly fewer items on the genuine symptoms scale (M = 17.46, SD = 9.70) than those in the Coached Feigning (M = 30.11, SD = 9.49, p = .001) and Internet-Coached Feigning conditions (M = 30.38, SD = 10.97, p < .001). Participants in the Coached Feigning and Internet-Coached Feigning conditions did not exhibit significantly different scores on the genuine symptoms scale, p = 1.00. For the pseudosymptoms scale, pairwise comparisons indicated a significant difference between genuine participants (M = 4.46, SD = 4.04) and coached feigners (M = 10.33, SD = 7.85; p = .018), and a non-significant difference between genuine participants and Internet-coached feigners (M = 9.75, SD = 10.36, p = .061). The two feigning conditions obtained similar pseudosymptoms scores (p = 1.00).

Detection rate SRSI

Employing the screening cutoff score of >6 pseudosymptoms items, 60% of the feigners were detected. More specifically, 67% of the participants in the Coached Feigning condition and 54% of those in the Internet-Coached Feigning condition were correctly detected as over-endorsers. However, 25% of genuine participants also endorsed more than six pseudosymptoms. Using the recommended standard cutoff (>9), 41% of the participants in the combined feigning conditions were detected correctly. Specifically, 39% of over-endorsers in the Coached Feigning condition and 42% of those in the Internet-Coached Feigning condition were detected. In addition, 17% of genuine participants endorsed more than nine pseudosymptoms. The sensitivity and specificity rates at these cutoff points are reported in Table 4.

Table 4. Minimum and maximum number of endorsed pseudosymptoms across conditions, and sensitivity and specificity for >6 and >9 cutoff SRSI.
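The trade-off between the screening (>6) and standard (>9) cutoffs can be illustrated with a short sketch. The scores below are hypothetical and invented purely for illustration; they are not the study data, and the helper name `flag_rate` is likewise an assumption.

```python
def flag_rate(scores, cutoff):
    """Fraction of scores strictly above the cutoff (flagged as over-endorsing)."""
    return sum(score > cutoff for score in scores) / len(scores)

# Hypothetical pseudosymptom counts, invented for illustration only.
feigners = [2, 5, 7, 8, 10, 12, 15, 20]
genuine = [0, 1, 2, 3, 4, 5, 7, 11]

for cutoff in (6, 9):
    # Sensitivity: share of feigners flagged; specificity: share of genuine not flagged.
    sensitivity = flag_rate(feigners, cutoff)
    specificity = 1 - flag_rate(genuine, cutoff)
    print(f">{cutoff}: sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```

In this toy example, as in the results above, raising the cutoff flags fewer genuine respondents at the cost of detecting fewer feigners.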

IOP-29: comparison between genuine, coached feigning, and internet-coached feigning

As with the other two measures, we ran a Kruskal–Wallis test with the FDS as the dependent variable and condition as the independent variable. Overall, the three conditions (Genuine vs. Coached Feigning vs. Internet-Coached Feigning) yielded significantly different scores, H(2) = 41.38, p < .001, ε² = .67. Pairwise comparisons with Bonferroni correction indicated that participants in the Genuine condition (M = .14, SD = .11) obtained a lower FDS score than those in the Coached Feigning (M = .77, SD = .20, p < .001) and Internet-Coached Feigning (M = .69, SD = .21, p < .001) conditions. The two feigning conditions did not differ from each other (p = .851).

Detection rate

We tested the detection accuracy of the proposed IOP-29 cutoff score, i.e., FDS ≥ .50 (Viglione & Giromini, 2020). Employing this cutoff led to the correct classification of 96% of genuine participants, with one participant being classified as non-credible. Further, the proposed cutoff led to the detection of 86% of participants in the Coached Feigning condition and 88% of those in the Internet-Coached Feigning condition (for more details, see Table 5).

Table 5. Minimum and maximum of FDS scores for all conditions and sensitivity and specificity for FDS >.50.

Scores on SIMS, SRSI, and IOP-29, and mental health check and English proficiency

Lastly, we inspected how well the mental health self-reports of participants in the Genuine condition associated with their scores on all three measures. Such information could inform on the extent to which genuine psychopathology could generate false positive results. The correlation between mental health report (lower number indicating poorer mental health) and total score on SIMS was significant and in a negative direction, Spearman’s rho = −.580, p = .006 (n = 21). A similar result was observed for the total genuine score of the SRSI, Spearman’s rho = −.560, p = .004 (n = 24), whereas the correlation between mental health ratings and SRSI pseudosymptoms scale scores was not significant, Spearman’s rho = −.367, p = .078. Finally, there was no significant correlation between mental health reports and IOP-29 FDS scores, Spearman’s rho = −.227, p = .264 (n = 26).

We also investigated whether there is a relationship between genuine participants’ self-reported English proficiency and their scores on all three measures. The results showed that English proficiency of our participants did not significantly correlate with the total score on SIMS, Spearman’s rho = −.060, p = .797, nor with scores on SRSI’s genuine symptom scale, Spearman’s rho = .145, p = .500, and pseudosymptoms scale, Spearman’s rho = .007, p = .972. However, the scores of the IOP-29 FDS significantly (positively) correlated with participants’ proficiency in English, Spearman’s rho = .414, p = .036. Yet, once we excluded genuine participants in the IOP-29 condition who scored below 3 on their English proficiency (n = 5), the correlation was no longer significant, Spearman’s rho = .238, p = .299 (n = 21).
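The correlations in this section, including the re-analysis after excluding low-proficiency participants, follow the pattern sketched below. This is a minimal pure-Python illustration of Spearman’s rho (the Pearson correlation of average ranks); the data are hypothetical and invented for illustration, and the function names are assumptions rather than the authors’ analysis code.

```python
def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical data, invented for illustration only.
proficiency = [1, 2, 2, 4, 5, 5]
fds = [0.05, 0.10, 0.12, 0.20, 0.15, 0.30]
rho_all = spearman_rho(proficiency, fds)

# Re-run after excluding participants who rated their proficiency below 3.
kept = [(p, f) for p, f in zip(proficiency, fds) if p >= 3]
rho_kept = spearman_rho([p for p, _ in kept], [f for _, f in kept])
print(rho_all, rho_kept)
```

Because rho depends only on ranks, a handful of participants at the low end of the proficiency scale can account for much of an observed correlation, which is why a re-analysis without them is informative.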

Discussion

In this simulation study, we aimed to test the robustness of three symptom validity tests (SVTs), namely the SIMS, the SRSI, and the IOP-29, to symptom coaching of psychology students who were asked to feign depression. In addition to information about symptoms of depression, participants who feigned depression also received clear instructions not to exaggerate their symptoms. Further, a group of experimental feigners also received Internet links for additional information about depression. Overall, our preliminary results suggest that all three measures performed decently, but that there is a need to continue this line of research and find ways to further improve the robustness of SVTs to coaching.

First, the results of our study showed that the total scores of all three measures significantly differed by condition, with participants in the Genuine condition generating the most credible symptom presentations (i.e., scoring lowest on all instruments) and those in the feigning conditions generating the least credible ones (i.e., scoring highest on all instruments), supporting our primary hypothesis. Differences between participants in the genuine versus feigning conditions were significant on the total SIMS score, on the main genuine symptoms and pseudosymptoms scales of the SRSI, and on the “false disorder probability” (FDS) score of the IOP-29.

The SIMS scores of our feigning conditions were relatively similar to those reported in previous SIMS studies. Conversely, our genuine participants endorsed more symptoms than a control group in a prior similar project (Clegg et al., 2009). A similar trend was also noted on the SRSI, with our participants endorsing more items than in previous research on depression, regardless of condition membership (Stevens et al., 2018). This tendency toward over-endorsement, even among genuine participants, may explain their non-significant differences from Internet-coached feigners’ scores on the SRSI’s pseudosymptoms scale. However, our participants in the Genuine condition scored similarly to the low anxiety control group of a previous study on anxiety-related problems (Boskovic et al., 2019). Moreover, Zahid et al. (2022) recently reported a false positive rate of 40% when administering the SIMS to an undergraduate student sample and using the standard cutscore of ≥16. As such, albeit somewhat puzzling, our results concerning specificity are not completely in contrast with recent research.

As for the IOP-29, the average scores obtained in the Genuine condition are in line with previous research on the utility of the IOP-29 in the detection of fabricated depression (Giromini, Lettieri, et al., 2019; Ilgunaite et al., 2020; Šömen et al., 2021). Thus, in this study, the IOP-29 performed slightly better than the SIMS, which in turn performed slightly better than the SRSI. However, when considering the differences in response style between the three measures, it is important to also note the differences in their length. The IOP-29 is the shortest, followed by the SIMS, and then the SRSI, which contains more than one hundred items. It is therefore possible that in the Genuine conditions our participants’ attention waned over time, resulting in higher symptom overendorsement on lengthier measures such as the SIMS and SRSI.

Second, the addition of information available on the Internet did not result in statistically significant differences between the two feigning conditions. On the one hand, it did not jeopardize the effectiveness of the SVTs we studied; on the other hand, it did not make detection easier either, as one might have expected given that searching for symptoms on the Internet is known to increase symptom endorsement in symptom inventories (Brown et al., 2020; Jungmann et al., 2020; Starcevic & Berle, 2013). Because both conditions received a detailed description of depression symptoms, it is likely that adding information available on the Internet had little additional value. All in all, however, looking at all mean values, Internet-coached participants scored slightly lower than coached feigners on all three measures, suggesting that information available via the Internet may have made participants somewhat more familiar with the symptoms of genuine depression, and therefore more cautious in item endorsement.

Third, and related to the previous point, the detection rates of SIMS and SRSI suggest that the Internet-coached feigners may have been more reluctant to endorse symptoms than participants in the Coached Feigning condition who had not received additional Internet information. Indeed, participants with Internet coaching were less likely to be detected than coached feigners without Internet information (SIMS: 78% vs 88%; SRSI: 54% vs 67%, respectively). With respect to SIMS, these results are somewhat consistent with those previously reported by Merten et al. (2010) and Jelicic et al. (2011). Nevertheless, SRSI detection rates are lower than in previous studies examining symptom coaching (e.g., Boskovic et al., 2019). This discrepancy might be related to our instruction not to overdo the symptom presentation, which was intended as a protection against acquiescence response bias (Ray, 1983), and is unlikely to resemble real-life situations. Still, this type of instruction could be considered strategy-coaching, which has been shown to be more successful in reducing symptom endorsement than symptom coaching (Jelicic et al., 2011). Because SIMS and SRSI heavily rely on the evaluees’ tendency to overendorse symptoms, instructing participants about the symptoms of depression while also counterbalancing their overreporting by telling them not to “overdo it” potentially diminished their effectiveness. In particular, it is possible that the SRSI was more affected by such instructions because it contains both genuine symptoms and pseudosymptoms. Thus, it may have been easier for participants who were informed about the symptoms of depression to compare the items presented and to limit their endorsement to genuine items only. This assumption is supported by the evidently higher scores generated by the two feigning conditions on the genuine symptom scale as well, with participants coached via the Internet endorsing more genuine items.

The decrease in detection of Internet-coached participants compared to feigners without additional coaching was the least apparent on the IOP-29 (86% vs 88%). Our results are very similar to those previously reported on the effects of coaching and performance on the IOP-29 (Gegner et al., 2021). However, it is important to note that the coaching instructions of both conditions of feigners were taken from a previously published paper using the IOP-29 (Giromini et al., 2019). Thus, further research with other instructions is needed to more fairly compare the effectiveness of the three SVTs included in this study, as it may be that the IOP-29 was at an advantage in this study, especially considering that not all of its items follow the usual SVT rationale.

Finally, we also examined whether scores on SIMS, SRSI, and IOP-29 correlated with the mental health ratings that genuine participants had provided before the study began. The results showed that the better the mental health of the participants, the lower the scores on SIMS, and a similar trend was observed for the SRSI genuine symptoms scale. There was no significant correlation between mental health self-report ratings and IOP-29 scores. We also checked whether the self-reported English proficiency of our genuine participants was related to their scores on all three measures, and only the score on the IOP-29 was positively related to language proficiency. That is, the better the language proficiency, the higher the FDS. However, when the few outlier participants who self-rated their language skills as “poor” were excluded, this correlation was no longer significant. This finding is nevertheless surprising, as previous research has shown that speakers with lower proficiency tend to score higher on SVTs (van der Heide et al., 2020). Since the IOP-29 also contains tasks similar to those of the PVTs, this result needs further investigation.

It is important to reflect on the limitations of this study. First, despite the ambitious goal of including SIMS, SRSI, and IOP-29 in this project, we did not administer the tests together to the same participants. Thus, while we attempted to examine the comparative validity of the SIMS, SRSI, and IOP-29, we did not do so in a sufficiently exhaustive manner, and future research should focus on this. Second, our sample size may have been too small to detect small to medium sized differences, not only between genuine participants and feigners, but specifically between the two feigning conditions. More importantly, our study revealed an unexpectedly high number of false positive results on SIMS and SRSI compared to previous studies conducted in person using the standard paper-and-pencil format. It should be noted that we administered the tests online, which, according to the test developers, represents a significant deviation from standard conditions of use (Merten et al., 2021). On the other hand, it is also worth noting that our study was conducted during the COVID-19 lockdown period, and that it is likely that our participants experienced higher levels of general distress than samples used in the past. Third, and relatedly, our sample consisted of undergraduates, particularly female psychology students, which limits the ecological validity of our findings (Rogers, 2018a). We also suggest that future investigations include checks for prior knowledge about the symptoms participants are asked to fabricate. In this regard, psychology students may not be a representative sample. Considering that we did not have a control condition consisting of patients with depression, our results can only be considered in terms of sensitivity and potential indications for future studies. In addition, it is possible that our genuine participants did not fully follow the instructions and perhaps tended to exaggerate their psychological problems.

Fourth, because our study was conducted in The Netherlands, our participants were most likely not native English speakers, and their language proficiency was self-reported. Although they receive education in English, there is still a possibility that some of them did not fully understand the items they were responding to. Another important omission is that we did not specifically ask about the nationality of the participants, so we cannot determine the percentage of students who were native English speakers. Finally, this study was conducted online using Qualtrics. Inattentive responding is known to be more common in studies conducted in this format, so we included inattentiveness checks in all three measures. However, we could not ensure, for example, that participants in the Internet-Coached Feigning condition actually read/listened to the content of the Internet links, because the program could only check that they clicked on the links to continue with the study.

Taken together, our preliminary findings suggest that all three SVTs included in the study, i.e., SIMS, SRSI, and IOP-29, are sensitive to feigned depression even in the presence of symptom coaching, both with and without additional internet-based information. However, additional research using a clinical comparison simulation or criterion groups study design in which all three SVTs are administered to the same individuals, as well as the inclusion of different or alternative instructions, patient samples, and objective tests of language proficiency, would be beneficial.

Ethical approval

This study was conducted in accordance with The Code of Ethics of the World Medical Association (Declaration of Helsinki).

Supplemental material

R2_Supplemental_Table_.docx

Disclosure statement

The first and second authors declare no conflict of interest. The third author declares that he owns a share in the corporate (LLC) that possesses the rights to Inventory of Problems instruments.

Notes

1 Four participants reported motivation below the mid-point (3); one received the SIMS and three received the IOP-29. We re-ran the main analyses without these participants, but the results did not differ; hence, we kept them in the dataset.

References

Appendix

Instructions

Feigning condition

Dear participant

Soon, we will ask you to fill out a questionnaire. However, we want you to fill out that questionnaire as someone who is diagnosed with major depression disorder. We would like you to put yourself in the shoes of a person who has had an accident at work and is now suffering from mental health problems—namely, depression—related to that accident and for which he has requested to be put on disability. To help you provide a credible presentation, please read the following text, and try to pretend that you are the person depicted in this scenario.

You are an administrator at a small, well-established firm. Your boss has been trying to cut expenses by having the cleaning crew work before regular work hours are over, thus getting the job done at a cut rate. You have repeatedly informed him that this is not a safe working condition for the employees, but he has not changed the procedure. One day, near the end of the day, you are leaving to do a special errand for your boss. As you cross a freshly mopped floor, you slip and fall, landing hard on your tailbone. As a result, you have been out of work for 2 weeks on disability and continue to experience a fair amount of pain, particularly when you sit for any length of time. The workers compensation physician insists that he can find nothing to explain the pain and refuses to authorize any more time off or disability payments, stating that you are able to return to work, a job that requires long periods of time sitting at your computer. Before terminating your case, the physician refers you to the staff psychologist for a routine evaluation. You realize that this evaluation is your only opportunity to remain on disability under your employer’s obligation. You have no additional coverage and need an income until you are fully recovered. You also feel that your boss is responsible, and that money should come from the company through workers compensation. So, your only choice is to present yourself as having significant depression on the tests that the psychologist is going to give you. You therefore decide to attempt to present yourself as having a major depression as the result of your accident, to remain on disability.

Here the symptoms of the Major Depression Disorder. Keep in mind that depressed patients typically have 5 or more of the following symptoms, but most likely not all of them: 1. Depressed mood most of the day, nearly every day (e.g., feeling sad, empty, hopeless), 2. Markedly diminished interest or pleasure in all, or almost all, activities most of the day, nearly every day, 3. Significant weight loss when not dieting or weight gain, or decrease or increase in appetite nearly every day, 4. Insomnia or hypersomnia nearly every day, 5. Psychomotor agitation or retardation nearly every day, 6. Fatigue or loss of energy nearly every day, 7. Feelings of worthlessness or excessive or inappropriate guilt nearly every day, 8. Diminished ability to think or concentrate, or indecisiveness, nearly every day, 9. Recurrent thoughts of death, recurrent suicidal ideation without a specific plan, or a suicide attempt or a specific plan for committing suicide.

When you take the tests and try to pretend you suffer from a Major Depressive Disorder, please keep in mind that if you present your condition in an extremely dramatic way, your performance may not be believable, and the examiner might understand that you do not suffer from depression but are only faking it. So, try to not over-do it. If you will be able to produce test results that are consistent with those produced by people who really suffer from Major Depression Disorder and you will not look like a feigner, you may win a small prize consisting of a 10€ gift card in the lottery!

Internet-coached feigning condition

The same as for the Feigning Condition, but plus:

It is known that people are better at feigning depression if they check the internet for available information. So, we found two links for you and ask you to click on the following links and read the presented information. Please do not close this tab, as the study will stop, but just open the links in a new tab and come back once you are done reading the information. It is of high importance that you actually check these links as you will not be allowed to proceed with the study if you do not. But please do not continue the study unless you read the provided information.

The information provided on these links is brief and it will not take you a long time to read it:

  1. Watch a 1-minute video and read the text: https://www.webmd.com/depression/guide/detecting-depression

  2. Read about psychological, physical, and other symptoms of depression: https://www.nhs.uk/conditions/clinical-depression/symptoms/