Research Article

Faking self-reports of health behavior: a comparison between a within- and a between-subjects design

Pages 895-916 | Received 17 Jun 2021, Accepted 06 Oct 2021, Published online: 22 Oct 2021

ABSTRACT

Background

This study examines people's ability to fake their reported health behavior and explores the magnitude of such response distortion for both preventive health behavior and health risk behavior. As health behavior is a sensitive topic, people often prefer to keep it private or wish to present a better image of themselves (Fekken et al., 2012; Levy et al., 2018). Nevertheless, health behavior is often assessed with self-report questionnaires that are prone to faking. Therefore, it is important to examine the possible impact of such faking.

Methods

To replicate the findings and test their robustness, two study designs were implemented. In the within-subjects design, 142 participants answered a health behavior questionnaire three times, under the instructions to answer honestly, to fake good, and to fake bad. In the between-subjects design, 128 participants were randomly assigned to one of three groups, each of which filled out the health behavior questionnaire under only one of the three instructions.

Results

Both studies showed that successful faking of self-reported preventive and health risk behavior was possible. The magnitude of such faking effects was very large in the within-subjects design and somewhat smaller in the between-subjects design.

Conclusion

Even though each design has its inherent merits and problems, caution regarding faking effects is warranted.

Preventing non-communicable diseases and improving people's health behavior is a central goal, not only for researchers but also for general practitioners and health care workers (World Health Organization [WHO], 2013). Health behavior is defined as “overt behavioral patterns, actions, and habits that relate to health maintenance, to health restoration and to health improvement” (Gochman, 1997, p. 3). This definition therefore covers a wide range of behaviors. Preventive behavior that improves and protects health, such as eating a healthy diet and performing sufficient physical activity, is often distinguished from risk behavior, such as smoking or excessive alcohol use, which endangers health and should be prevented or reduced to a minimum (Kasl & Cobb, 1966).

Although health researchers can use many innovative techniques, such as wearables, physiological measures, or ambulatory assessment, to assess health behavior, self-reports are still the most frequent measure (Sattler et al., 2021). Self-reports are easy to use, economical in terms of administration and time, and cost-efficient (Foa, Cashman, Jaycox, & Perry, 1997). At the same time, they are criticized, among other things, for their susceptibility to errors and willful response distortion, which limits the validity of self-reported data (Griffith, Chmielowski, & Yoshita, 2007). Objective and subjective measures often correlate significantly in relative terms; for example, correlations of r = .21 to r = .52 have been reported between physical activity measured by self-reports and by accelerometers (Atienza & King, 2005; Nelson, Taylor, & Vella, 2019). Strikingly, however, the absolute values differ significantly. For example, people report about twice as much physical activity time in self-reports as is measured objectively (Atienza & King, 2005). Systematic reviews suggest that self-reported health behavior questionnaires may succeed at ranking individuals by their health behavior but cannot provide valid results on the absolute quantity of physical activity (Helmerhorst, Brage, Warren, Besson, & Ekelund, 2012). A similar picture emerges for diet, smoking, and alcohol consumption. Participants overreport fruit and vegetable consumption in self-reports compared with objectively assessed intake (Lechner, Brug, & De Vries, 1997). In the German National Nutrition Survey II, the correlations between subjective and objective measures of fruit and vegetable consumption ranged from r = .24 to r = .40 (Straßburg, Eisinger-Watzl, Krems, Roth, & Hoffmann, 2019).
In their review, Gorber, Schofield-Hurwitz, Hardt, Levasseur, and Tremblay (2009) conclude that smoking is underreported in self-reports compared with objective measures. For example, smoking prevalence is often underestimated when assessed by self-report rather than by blood cotinine levels (Lewis et al., 2003) or urinary cotinine concentration (Hwang, Kim, Lee, Jung, & Park, 2018). For self-reported versus objectively measured alcohol consumption, a correlation of r = .27 has been reported. Yet over 50% of participants who denied consuming alcohol in the last 30 days tested positive for phosphatidylethanol, an objective indicator of drinking (Littlefield et al., 2017).

Besides memory effects and biases due to reference points, a frequently discussed explanation for the discrepancies between objective and subjective measures is the deliberate distortion of self-report answers in order to appear more socially desirable (Atienza & King, 2005). It is well documented that people sometimes alter their responses to benefit from creating a desired impression (Crowne & Marlowe, 1960; Edwards, 1957; Furnham, 1986; Locander, Sudman, & Bradburn, 1976; Mazar & Ariely, 2006; McCrae & Costa, 1983; Mensch & Kandel, 1988; Nederhof, 1985; Norman, 1967).

The prevailing assumption about dishonest behavior is that people act entirely purposively in every situation, following the maxim of the greatest gain. To the extent that dishonest behavior maximizes profit, people behave dishonestly (Henrich et al., 2001; Morgan, 2006). In doing so, they weigh three factors: the benefit that could be gained from dishonest behavior, the probability of being caught, and the expected punishment if caught. The action alternative that maximizes personal gain guides behavior (Becker, 1968).
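A minimal sketch of this cost–benefit reasoning, assuming a risk-neutral actor and a simple linear form (expected net gain = benefit − probability of detection × punishment). This is a common textbook simplification of Becker's (1968) model rather than its full utility-based formulation, and all parameter values below are hypothetical.

```python
def expected_gain_of_dishonesty(benefit: float, p_caught: float, punishment: float) -> float:
    """Expected net gain of a dishonest act for a risk-neutral actor (simplified sketch)."""
    if not 0.0 <= p_caught <= 1.0:
        raise ValueError("p_caught must be a probability between 0 and 1")
    return benefit - p_caught * punishment


def chooses_dishonesty(benefit: float, p_caught: float, punishment: float) -> bool:
    # Honest reporting is taken as the reference alternative with a net gain of 0,
    # so dishonesty is chosen only when its expected net gain is positive.
    return expected_gain_of_dishonesty(benefit, p_caught, punishment) > 0


# Hypothetical values: an unverifiable self-report vs. a verifiable objective measure
print(chooses_dishonesty(benefit=1.0, p_caught=0.05, punishment=10.0))  # self-report questionnaire
print(chooses_dishonesty(benefit=1.0, p_caught=0.90, punishment=10.0))  # e.g. a blood test
```

With identical benefit and punishment, only the detection probability differs between the two calls; the low chance of being caught in an unverifiable self-report is what makes dishonesty the "rational" choice in the first case.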

Health behavior is a highly delicate topic and may therefore be particularly susceptible to dishonest reporting, not least because it may involve information that is socially unacceptable or even illegal (Fekken, Holden, McNeill, & Wong, 2012). Dishonest reporting of health behavior may thus yield significant benefits. Because the validity of self-reports often cannot be checked, the probability of being caught reporting dishonestly is relatively low, so people might create a desired impression of themselves when reporting their health behavior. Previous research has shown that the majority of patients admit to withholding information from their clinicians and to not being entirely honest about their health behavior (Levy et al., 2018), and that self-report measures of health behavior are susceptible to response distortion (Fekken et al., 2012). It is therefore questionable whether self-reported health behavior has diagnostic value. Although reducing nondisclosure and faking seems an obvious aim, the degree of dishonesty and the extent of response distortion remain unclear.

Therefore, the following research investigates people's practical ability to distort their responses in a health behavior questionnaire to create a desired impression and then estimates the magnitude of such response distortion. To test the robustness of the findings, the customary within-subjects design of faking studies is complemented by a between-subjects design, so that the advantages and insights of both designs can be brought to bear on the research question. The rationale of this study is thus to investigate how large faking effects may be when people are instructed to alter their responses accordingly. The studies do not investigate whether faking happens in practice or how large such effects may be in the field.

Dishonesty and faking

DePaulo, Kashy, Kirkendol, Wyer, and Epstein (1996) report that people are dishonest in about 30% of their social interactions each week. Dishonesty can take different forms: not only can its extent vary from outright untruths to slight self-promotion, but its direction can also vary from creating a favorable impression (fake good) to creating an unfavorable one (fake bad) (Cook, 2004). Faking is a response bias in which individuals consciously manipulate their responses to create a desired impression (Griffith et al., 2007; Komar, Brown, Komar, & Robie, 2008; McFarland & Ryan, 2000; Van Hooft & Born, 2012). As a form of other-deceptive enhancement, faking as conscious response biasing must be distinguished from self-deceptive enhancement, in which individuals believe their positive self-descriptions to be true (Paulhus, 1984).

For people to control aspects relevant to the impression they make, the capacity, the willingness, and the opportunity to control the information given are important factors (Levashina & Campion, 2006). These factors are assumed to be linked multiplicatively, so faking is possible only if all three are present. The capacity to control the information given consists of cognitive capacity as well as social and verbal competencies. The willingness to modify the information given depends on personality traits and integrity, but also on a cost–benefit analysis that weighs the benefits of creating the desired impression against the negative consequences of being caught being more or less dishonest. The opportunity to modify the information given can be illustrated by comparing two assessment methods: whereas it can be easy to create a desired impression in an interview or a self-report questionnaire, it is nearly impossible to influence objective measurements such as blood parameters. Self-report questionnaires on health behavior are susceptible to faking because all three factors can be present.
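The multiplicative link between capacity, willingness, and opportunity can be sketched as a simple product. Scaling each factor to the interval from 0 to 1 is an assumption made here for illustration; Levashina and Campion (2006) do not prescribe a numeric scale. The key property is that a zero on any one factor, such as the lack of opportunity to influence a blood parameter, yields zero faking potential regardless of the other two, which an additive combination would not guarantee.

```python
def faking_potential(capacity: float, willingness: float, opportunity: float) -> float:
    """Multiplicative combination of the three factors (each on a hypothetical 0-1 scale)."""
    factors = {"capacity": capacity, "willingness": willingness, "opportunity": opportunity}
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1, got {value}")
    return capacity * willingness * opportunity


# Hypothetical respondent with the same capacity and willingness in two assessment settings
questionnaire = faking_potential(capacity=0.8, willingness=0.6, opportunity=1.0)
blood_test = faking_potential(capacity=0.8, willingness=0.6, opportunity=0.0)
print(questionnaire, blood_test)
```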

Considerations about the adequate study design

Since there is often no way of checking the validity of self-reports, directed-faking designs are employed to study faking (Viswesvaran & Ones, 1999). In these studies, participants are instructed to distort their responses in a particular manner. Although it is debatable whether directed faking accurately represents faking in practice, directed-faking designs benefit from a high degree of control and allow a direct comparison of honest and faked scores (Furnham, 1990). Most studies on response distortion rely on a within-subjects design, in which participants are tested multiple times with different external stimuli (e.g. Fell & König, 2016). The advantages of this design are obvious; for example, the within-subjects design usually has higher internal validity and higher statistical power. However, given the practical relevance of the topic, it seems plausible to consider a between-subjects design as well, in which each participant is assigned to one faking condition. Between-subjects designs are often discarded because they rely on randomization and risk baseline differences between experimental groups, which may introduce substantial noise and thereby reduce statistical power. Yet the strength of between-subjects designs cannot be ignored: they have higher external validity and may be more naturally aligned with the phenomenon in practice (Charness, Gneezy, & Kuhn, 2012). With regard to response distortion, it seems highly implausible that a person would repeatedly answer the same questions with different responses in real settings. A situation in which participants are exposed to a motivational cue to present themselves either more positively or more negatively seems closer to reality.
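Why the within-subjects design usually has more statistical power can be shown with the standard error of a mean difference. The sketch below assumes equal variances σ² in both conditions and a correlation ρ between a person's responses under the two conditions: the within-subjects standard error is √(2σ²(1 − ρ)/n), whereas two independent groups of n each give √(2σ²/n). Any positive ρ therefore shrinks the error term, although in a faking study the correlation between honest and faked answers may be low, which would reduce this advantage. The numbers used are hypothetical.

```python
import math


def se_within(sigma: float, n: int, rho: float) -> float:
    """SE of the mean difference when the same n people respond under both instructions."""
    return math.sqrt(2 * sigma**2 * (1 - rho) / n)


def se_between(sigma: float, n_per_group: int) -> float:
    """SE of the difference between the means of two independent groups of equal size."""
    return math.sqrt(2 * sigma**2 / n_per_group)


# Hypothetical setting: unit variance, 40 responses per condition, rho = .5
print(round(se_within(1.0, 40, 0.5), 3), round(se_between(1.0, 40), 3))
```

Note also that the between-subjects comparison needs 2n participants to obtain n responses per condition, whereas the within-subjects comparison needs only n.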

Both designs have scientific merits with respect to the research topic. Therefore, in the following, the extent to which health behavior can be faked in a self-report questionnaire is examined both with a within-subjects design (Study 1) and with a between-subjects design (Study 2). The two designs permit answering slightly different research questions: while the within-subjects design allows determining the maximum limits of response distortion on the health behavior scales, the between-subjects design sheds light on the operational level of faking with higher external validity (Viswesvaran & Ones, 1999). Further, this approach allows the results of the two study designs to be compared and thus leads to a deeper understanding of faking self-reported health behavior. For example, it has previously been shown that responses to a stimulus differ significantly depending on whether participants evaluate that stimulus alone or evaluate multiple stimuli (Hsee & Zhang, 2004). This referencing effect might lead participants in a within-subjects design to alter their responses according to the instructions. As participants in the between-subjects design are confronted with only one instruction, they probably do not perform the same careful balancing as participants in the within-subjects design, who may have to adjust their responses to those they gave previously. As a result, participants in the within-subjects design may report not their honest health behavior but a constructed concept of behavior that lies midway between fake good and fake bad behavior.

Therefore, we investigate the following hypotheses: When instructed to report favorable health behavior, people report significantly healthier dietary habits (H1), more physical activity (H2), less smoking (H3), and less alcohol consumption (H4) than people instructed to report their actual health behavior. Analogously, people instructed to report unfavorable health behavior report significantly unhealthier dietary habits (H1), less physical activity (H2), more smoking (H3), and more alcohol consumption (H4) than people instructed to report their actual health behavior.

In addition, we compare the results of the two study designs and investigate first indicators of referencing effects concerning the reported honest behavior.

Method 1

Sample

As it was not clear whether the effect sizes of directed-faking studies on personality inventories are comparable to faking effects in health behavior questionnaires, small effects were anticipated (Cohen, 1988). Accordingly, the intended sample size calculated with G*Power was 134 participants (Faul, Erdfelder, Lang, & Buchner, 2007). The final sample included 142 German participants (73.2% female) aged between 18 and 67 years (M = 25.5, SD = 11.4). Of the sample, 93% had graduated from high school, and nearly 20% of those participants had a university degree.

Instrument

To assess health behavior in a way tailored to the German sample, a questionnaire covering diet, physical activity, smoking, and alcohol consumption was compiled from existing questionnaires and newly developed items. Independent experts checked that responses to all items could potentially be distorted.

Diet was measured with 15 items based on the recommendations of the German Nutrition Society (German Nutrition Society [DGE], 2010). Analogous to the assessment of physical activity, the items assessed the average number of days per week on which participants consumed a given food category (vegetables, fruit, grains, dairy products, meat, fish, and eggs), as well as the amount of food eaten. In addition, the amount drunk per day was assessed. For comparability, and following other inventories for the assessment of dietary habits, the amount of food was assessed in portions and drinking in liters (e.g. Emanuel, McCully, Gallagher, & Updegraff, 2012).

Physical activity was measured with seven items based on the German version of the International Physical Activity Questionnaire – Short Form but presented in written format (IPAQ-SF; Booth, Owen, Bauman, & Gore, 1996). The IPAQ-SF is a retrospective self-report questionnaire that assesses physical activity over the past seven days. It assesses the number of days and the average time (hours and minutes) spent on physical activity in an open response format. More specifically, it covers moderate and vigorous physical activity as well as walking and sitting behavior. The IPAQ-SF was chosen because of its good psychometric qualities and its use in multiple previous studies (Craig et al., 2003; Hagströmer, Oja, & Sjöström, 2006).

Smoking and alcohol consumption were assessed only if participants answered a filter question on basic consumption affirmatively (e.g. ‘Do you/did you ever smoke?’). If the answer was ‘yes’, participants were asked about the frequency of consumption, the number of alcoholic beverages, and the amount smoked. The items are listed in the questionnaire in the supplementary material (Questionnaire).

Design

Following most previous studies on response distortion, faking was investigated in a repeated-measures design with three conditions. The online questionnaire was implemented using SoSci Survey (Leiner, 2019) and made available to participants at www.soscisurvey.de. Participants were recruited via notices on campus and various online platforms. Participation was not monetarily rewarded. In total, the questionnaire was accessed 756 times; 314 participants started it, and 142 participants completed the entire survey.

In one condition, participants were asked to answer the questionnaire honestly (hereafter: honest condition). In the two other conditions, participants were asked to fake their responses to appear as healthy as plausible (fake good condition) or as unhealthy as plausible (fake bad condition). A pilot study confirmed the effectiveness of the instructions and indicated that the phrase ‘as (un-)healthy as plausible’ had to be added to prevent unrealistic ceiling or floor effects in responding. The order of conditions was fully randomized to prevent order effects. The study complied with the ethical standards of the responsible committee (the Ethics Committee of the Faculty of Empirical Human and Economic Sciences of Saarland University). Written informed consent was obtained from all participants before the study.
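Fully randomizing the order of three conditions amounts to drawing one of the 3! = 6 possible sequences for each participant. The sketch below illustrates that logic in plain Python; it is not how SoSci Survey implements randomization internally, and the condition labels and seed are hypothetical.

```python
import itertools
import random

CONDITIONS = ("honest", "fake good", "fake bad")
# All 3! = 6 possible orders in which a participant could work through the conditions
ALL_ORDERS = list(itertools.permutations(CONDITIONS))


def assign_order(rng: random.Random) -> tuple[str, ...]:
    """Draw one of the six condition orders with equal probability."""
    return rng.choice(ALL_ORDERS)


rng = random.Random(2021)  # fixed seed only to make the illustration reproducible
orders = [assign_order(rng) for _ in range(142)]
print(orders[0])
```

Drawing orders at random balances order effects only in expectation; a counterbalanced scheme that cycles through the six sequences would equalize their frequencies exactly.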

Analytic strategies

Data analysis was conducted using IBM SPSS Statistics 24. First, descriptive measures were calculated. For vegetables, fruit, grains, dairy products, and eggs, the average number of portions per day was calculated by multiplying the number of days of consumption by the amount eaten and dividing the product by seven. For meat and fish, the DGE recommendations refer to weekly consumption; therefore, the average number of portions of meat and fish per week was calculated by multiplying the number of days of consumption by the reported number of portions. Analogously, the weekly time spent on physical activity was calculated separately for vigorous physical activity, moderate physical activity, and walking by multiplying the number of days per week by the reported time spent. Drinking and sitting were assessed as daily behavior, so these measures were not transformed. To detect intraindividual differences in reported diet and physical activity, two repeated-measures multivariate analyses of variance (MANOVA) were conducted. Individual comparisons on each facet of the constructs as well as planned contrasts were conducted to specify the results. For smoking, the frequency of smoking and the number of cigarettes per day were assessed. A repeated-measures analysis of variance (ANOVA) was conducted to analyze intraindividual differences. Similarly, for alcohol consumption, the frequency and quantity of consumption were compared across the three conditions with two ANOVAs.
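The score computations described above can be sketched as three small functions. The parameter names are hypothetical, and the sketch assumes that the reported amount refers to a typical consumption day (for food) or a typical activity day (for physical activity, entered as hours and minutes). It follows only the simple multiplication described in the text and applies no further scoring rules.

```python
def portions_per_day(days_per_week: int, portions_per_occasion: float) -> float:
    """Average daily portions (vegetables, fruit, grains, dairy products, eggs)."""
    return days_per_week * portions_per_occasion / 7


def portions_per_week(days_per_week: int, portions_per_occasion: float) -> float:
    """Weekly portions (meat, fish), since the DGE recommendations are weekly."""
    return days_per_week * portions_per_occasion


def activity_minutes_per_week(days_per_week: int, hours: int, minutes: int) -> int:
    """Weekly minutes of vigorous activity, moderate activity, or walking."""
    return days_per_week * (60 * hours + minutes)


# Hypothetical answers: fruit on 5 days (2 portions), meat on 3 days (1 portion),
# walking on 7 days for 30 minutes
print(round(portions_per_day(5, 2), 2), portions_per_week(3, 1), activity_minutes_per_week(7, 0, 30))
```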

Results 1

To examine intraindividual differences in reported dietary habits, a repeated-measures MANOVA was conducted. In line with the hypotheses, the reported dietary habits differed significantly across the three conditions, Wilks' Λ = .19, F(16, 125) = 32.47, p < .001, ηp² = .81. Furthermore, all subscales showed similar differences: vegetables, F(1.70, 238.04) = 185.87, p < .001, ηp² = .57; fruit, F(1.89, 264.23) = 181.52, p < .001, ηp² = .57; grains, F(1.42, 199.39) = 59.19, p < .001, ηp² = .30; dairy products, F(1.82, 255.00) = 37.96, p < .001, ηp² = .21; meat, F(1.44, 201.28) = 158.18, p < .001, ηp² = .53; fish, F(1.57, 220.02) = 32.55, p < .001, ηp² = .19; eggs, F(1.49, 209.04) = 26.85, p < .001, ηp² = .16; and drinking, F(1.34, 188.12) = 61.00, p < .001, ηp² = .31.
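The multivariate effect sizes of the repeated-measures MANOVAs follow directly from Wilks' Λ. In general, the multivariate partial eta squared is 1 − Λ^(1/s), where s = min(number of dependent variables, effect degrees of freedom) for between-subjects effects; for a within-subjects factor tested in a single group, as in Study 1, s = 1, so ηp² = 1 − Λ. The sketch below is a minimal illustration, not SPSS output, and reproduces ηp² = .81 from Λ = .19 and, for the physical activity MANOVA reported in the next paragraphs, ηp² = .73 from Λ = .27.

```python
def partial_eta_sq_from_wilks(wilks_lambda: float, s: int = 1) -> float:
    """Multivariate partial eta squared, 1 - Lambda**(1/s)."""
    if not 0.0 < wilks_lambda <= 1.0:
        raise ValueError("Wilks' lambda must lie in (0, 1]")
    return 1 - wilks_lambda ** (1 / s)


# Study 1 values as reported in the text (s = 1 for a single within-subjects factor)
print(round(partial_eta_sq_from_wilks(0.19), 2))  # diet
print(round(partial_eta_sq_from_wilks(0.27), 2))  # physical activity
```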

Across all variables, participants reported significantly different dietary habits under the instruction to fake good than under the instructions to answer honestly or to fake bad. For all dietary facets except meat consumption, the honest and fake good conditions differed significantly. Responses under the instruction to fake bad also differed significantly from those in the honest condition, confirming the hypotheses (H1). For example, when instructed to fake good, participants reported eating significantly more fruit and vegetables than in the honest and fake bad conditions. Similarly, when instructed to fake bad, participants reported eating significantly less fruit and vegetables per day than when instructed to answer honestly. The descriptive values and the planned contrasts are shown in Table 1.

Table 1. Means, standard deviations, and planned contrasts statistics for reported diet and physical activity in the within-subjects design.

For physical activity, differences in vigorous physical activity, moderate physical activity, walking, and sitting were investigated. A repeated-measures MANOVA showed significant differences between the three conditions in reported levels of physical activity, Wilks' Λ = .27, F(8, 133) = 44.41, p < .001, ηp² = .73. All facets of physical activity showed the expected differences (H2). The three conditions differed significantly in reported vigorous physical activity, F(2, 289) = 68.28, p < .001, ηp² = .33; moderate physical activity, F(1.76, 246.92) = 56.00, p < .001, ηp² = .29; walking, F(1.42, 271.40) = 116.46, p < .001, ηp² = .30; and sitting, F(1.41, 197.37) = 116.46, p < .001, ηp² = .45. Again, Table 1 displays the descriptive values and the planned contrasts between the three conditions. In support of our hypotheses, the responses given in each condition differed significantly. For vigorous and moderate physical activity as well as walking, higher values indicate healthier behavior, whereas for sitting, higher values indicate unhealthier behavior. Thus, when instructed to fake bad and report an unfavorable level of physical activity, participants reported less time spent on vigorous and moderate physical activity and walking, and more time spent sitting, than when instructed to respond honestly or to fake good. Also consistent with our hypotheses, the fake good condition differed significantly from the honest condition. For example, when instructed to fake good, participants reported spending less time sitting and significantly more time walking and engaging in moderate and vigorous physical activity than when instructed to be honest.

Assuming that the experimental instruction influences reported smoking behavior, a one-way repeated-measures ANOVA was conducted (H3). The expected differences were found, F(1.65, 232.07) = 304.23, p < .001, ηp² = .68. Contrast tests were consistent with the previous findings: when instructed to fake bad, participants reported significantly more smoking (M = 3.22, SD = 1.09) than when instructed to respond honestly (M = 1.52, SD = 0.91), F(1, 141) = 264.58, p < .001, ηp² = .63, or to fake good (M = 1.13, SD = 0.41), F(1, 141) = 481.76, p < .001, ηp² = .77. The fake good condition also differed significantly from the honest condition, F(1, 141) = 35.42, p < .001, ηp² = .20.
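Partial eta squared can be recovered from any reported F statistic via ηp² = F·df_effect / (F·df_effect + df_error). The fractional degrees of freedom suggest a sphericity correction such as Greenhouse–Geisser, although the text does not name it; because such a correction multiplies both degrees of freedom by the same factor ε, it cancels out of the formula. A minimal sketch, checked against values reported in the paragraph above:

```python
def partial_eta_squared(f: float, df_effect: float, df_error: float) -> float:
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return f * df_effect / (f * df_effect + df_error)


# Values reported above for the smoking frequency analysis
print(round(partial_eta_squared(304.23, 1.65, 232.07), 2))  # omnibus test
print(round(partial_eta_squared(481.76, 1, 141), 2))        # fake bad vs. fake good
print(round(partial_eta_squared(35.42, 1, 141), 2))         # fake good vs. honest
```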

Regarding the number of cigarettes smoked per day, the three conditions differed significantly, F(1.13, 98.3) = 59.92, p < .001, ηp² = .41. Contrast tests were consistent with the previous findings: when instructed to fake bad (M = 8.47, SD = 0.86), participants reported smoking more cigarettes per day than when instructed to respond honestly (M = 0.73, SD = 2.83), F(1, 87) = 61.01, p < .001, ηp² = .41, or to fake good (M = 0.05, SD = 0.43), F(1, 87) = 63.60, p < .001, ηp² = .42. The fake good condition also differed significantly from the honest condition, F(1, 87) = 5.76, p = .019, ηp² = .06.

A one-way repeated-measures ANOVA was conducted to investigate whether reported alcohol consumption differed according to the experimental instruction (H4). In support of our hypotheses, the reported frequency of alcohol consumption differed significantly across the three conditions, F(1.68, 237.10) = 161.86, p < .001, ηp² = .53. When instructed to fake good, participants reported a lower frequency of alcohol consumption (M = 1.56, SD = 0.87) than when instructed to report their health behavior honestly (M = 2.29, SD = 0.60), F(1, 141) = 89.32, p < .001, ηp² = .39, and than when instructed to fake bad (M = 3.09, SD = 0.58), F(1, 141) = 232.89, p < .001, ηp² = .62. The reported frequency of alcohol consumption also differed significantly between the honest and fake bad instructions, F(1, 141) = 119.56, p < .001, ηp² = .46.

Regarding the amount of alcoholic beverages, a repeated-measures ANOVA again showed that the three conditions differed significantly, F(2, 40) = 55.22, p < .001, ηp² = .73. On average, participants reported consuming fewer portions of alcoholic beverages when instructed to fake good (M = 1.83, SD = 1.14) than when instructed to respond honestly (M = 2.90, SD = 1.43), F(1, 20) = 16.54, p = .001, ηp² = .45, or to fake bad (M = 4.33, SD = 0.86), F(1, 20) = 111.70, p < .001, ηp² = .85. They also reported consuming fewer portions of alcoholic drinks when instructed to respond honestly than when instructed to fake bad, F(1, 20) = 44.78, p < .001, ηp² = .69. Thus, the responses given in each condition indicated different amounts as well as different frequencies of alcohol consumption.

Discussion 1

The results of this study match previous findings on response distortion showing that people succeed at altering their responses according to instructions. For example, Viswesvaran and Ones (1999) showed in a meta-analysis that people are very successful at altering their responses to personality questionnaires in directed-faking designs. In the health context, the results also align with the respective literature. For example, Fekken et al. (2012) showed that self-reported health behavior on the Health Behavior Checklist (HBC; Vickers, Conway, & Hervig, 1990) was susceptible to response distortion. Their results indicated that all dimensions of the HBC were susceptible to reporting unfavorable behavior, whereas faking good succeeded only on the preventive health subscale and not on the health risk behavior subscales. In contrast, the present study showed successful response distortion in both directions not only for preventive health behavior but also for health risk behavior such as smoking and alcohol consumption. Thus, the current study is broadly consistent with the existing literature and extends its findings.

Regarding the magnitude of response distortion in the current study, the effects for reporting more favorable health behavior (.03 ≤ η² ≤ .55) were smaller than those for reporting more unfavorable health behavior (.03 ≤ η² ≤ .63) on nearly all facets of health behavior. This might indicate an egocentric bias in the perception of one's own health behavior. More specifically, the above-average effect holds that people rate themselves more favorably than comparable others (Alicke, Klotz, Breitenbecher, Yurak, & Vredenburg, 1995; Taylor & Brown, 1988). Following this assumption, people would rate their health behavior as above average, leaving little room for improvement. This ceiling effect might explain why the effect sizes for positive response distortion were smaller than those for negative response distortion. A similar asymmetry between the two directions of response distortion has previously been reported for faking personality inventories (Viswesvaran & Ones, 1999).

Yet, compared with previous findings on faking personality inventories, the response distortion detected in the current study seems larger. For example, Birkeland, Manson, Kisamore, Brannick, and Smith (2006), as well as Viswesvaran and Ones (1999), report that response distortion in personality inventories may amount to up to one standard deviation. Conscientiousness and neuroticism scores could be changed by nearly an entire standard deviation, whereas scores for extraversion, openness, and agreeableness tended to change by around half a standard deviation (corresponding to η² = .20 and η² = .06, respectively) (Viswesvaran & Ones, 1999). The effect sizes in the present study, however, ranged from η² = .03 to η² = .63 and can mostly be interpreted as very large (Cohen, 1988; Funder & Ozer, 2019). Although it is unclear whether personality inventories and health behavior questionnaires are comparable, two conclusions can be drawn: first, health behavior scales in general seem to be very susceptible to faking. Second, some facets of health behavior seem to be more prone to response distortion than others. Perhaps knowledge about some facets of health behavior is greater than about others, making faking more difficult for facets about which less is known (Levashina & Campion, 2006). The sample may also have contributed to the magnitude of the faking effects: as it had a relatively high level of education, participants might have been particularly good at distorting their responses because their capacity to fake was high. Another explanation for the magnitude of the effect sizes in the present study relates to characteristics of the within-subjects design.
Because participants had to contrast multiple scenarios, they might have been sensitized to distort their responses more drastically in line with the instructions, increasing the effect sizes (Charness et al., 2012). Viswesvaran and Ones (1999) made similar assumptions and also highlighted the demand characteristics of within-subjects designs.
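The conversion between standardized mean differences and η² used in the comparison with personality inventories can be made explicit. For two independent groups of equal size, η² = d²/(d² + 4), so d = 1 corresponds to η² = .20 and d = 0.5 to about .06, matching the values cited. Applying this two-group formula to partial η² from repeated-measures contrasts, as a direct comparison implicitly does, is only a rough heuristic, because within-subjects effect sizes also depend on the correlation between conditions.

```python
import math


def eta_sq_from_d(d: float) -> float:
    """eta^2 for two equal-sized independent groups with standardized difference d."""
    return d**2 / (d**2 + 4)


def d_from_eta_sq(eta_sq: float) -> float:
    """Inverse conversion: standardized mean difference implied by eta^2."""
    return 2 * math.sqrt(eta_sq / (1 - eta_sq))


print(round(eta_sq_from_d(1.0), 2), round(eta_sq_from_d(0.5), 2))
# Largest effect in Study 1 (eta^2 = .63), read as a two-group d for rough comparison only
print(round(d_from_eta_sq(0.63), 2))
```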

Method 2

To address the interpretive difficulties inherent in the within-subjects design and to replicate the findings, the research questions of Study 1 were also examined with a between-subjects design. The questionnaire was the same in both studies; only the set-up was adjusted.

Sample

For Study 2, the intended sample size was calculated on the basis of the effects found in the within-subjects design, and at least medium effects were anticipated for the between-subjects design (Cohen, 1988). Accordingly, the intended sample size calculated with G*Power was 123 participants (Faul et al., 2007). The final sample included 128 German participants (55.5% female) with a mean age of 26 years (SD = 7.71, range 18–58). Of the participants, 90% had at least graduated from high school, and 36% of those had a university degree. Participants were randomly assigned to one of three groups. Due to selective drop-out, the fake good group comprised 38 participants, the honest group 49, and the fake bad group 41. The three groups did not differ significantly in age, gender, or education (p > .2).

Design

The same questionnaires as in study 1 (see the Method 1 section) were employed, except that each participant was randomly assigned to only one of the three instructions. Thus, study 2 was based on a three-group between-subjects design. The online questionnaire was again implemented using SoSci Survey (Leiner, 2019) and made available to participants at www.soscisurvey.de. The conduct of study 2 complied with the ethical standards of the responsible committee (the Ethics Committee of the Faculty of Empirical Human and Economic Sciences of Saarland University). Written informed consent was obtained from all participants before the study.

Analytic strategies

Descriptive statistics were calculated analogously to study 1. For diet and physical activity, one MANOVA per construct was conducted to test for overall group differences. Univariate comparisons of each facet as well as planned contrasts were then used to specify the results. An ANOVA was applied to analyze group differences in the reported frequency of smoking. As the number of participants reporting any smoking was very small (n = 31), differences in the reported number of cigarettes per day were not analyzed. Similarly, the frequency and quantity of alcohol consumption were compared across the three groups with two ANOVAs.

Results 2

To examine differences in the reported dietary habits, a MANOVA was conducted. The three groups differed significantly in their reported diet, Wilks' Lambda = .59, F(14, 238) = 5.22, p < .001, ηp² = .24. Most facets showed similar differences: the three groups differed significantly in the reported amount of vegetables, F(2, 122) = 22.29, p < .001, ηp² = .26, fruit, F(2, 122) = 14.12, p < .001, ηp² = .18, grains, F(2, 122) = 3.98, p = .021, ηp² = .07, meat, F(2, 122) = 20.62, p < .001, ηp² = .25, eggs, F(2, 122) = 3.54, p = .032, ηp² = .05, and drinking, F(2, 122) = 8.03, p < .001, ηp² = .12. Only for dairy products, F(2, 122) = 0.20, p = .819, ηp² = .00, and fish, F(2, 122) = 0.74, p = .477, ηp² = .02, did the reported portions not differ significantly between the three groups. The descriptive measures and the group contrasts are displayed in Table 2. Overall, both the fake good group and the honest group differed significantly from the fake bad group. Yet, the fake good group did not differ significantly from the honest group; thus, H1 was only partly confirmed. For example, the fake good group and the honest group reported eating significantly more fruit and vegetables per day than the fake bad group, but the honest group did not differ significantly from the fake good group in the reported amount of fruit and vegetables eaten.
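Because the univariate results above are all reported as F(2, 122), the partial effect sizes and p-values can be roughly recomputed from the F statistics: ηp² = (F · df_effect)/(F · df_effect + df_error), and for two numerator degrees of freedom the F distribution has the closed-form upper tail p = (1 + 2F/df_error)^(−df_error/2). The Python sketch below is a reader-side check under these textbook formulas, not the authors' analysis script; the function names are hypothetical. Recomputed values are only approximate, since the inputs are the rounded published statistics.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


def p_value_f_two_df(f: float, df_error: int) -> float:
    """Exact upper-tail p-value for an F statistic with 2 numerator degrees of freedom.

    For F(2, df_error), P(F > f) = (1 + 2f / df_error) ** (-df_error / 2).
    """
    return (1.0 + 2.0 * f / df_error) ** (-df_error / 2.0)


if __name__ == "__main__":
    # Univariate diet facets from the between-subjects design, each tested with F(2, 122)
    reported_f = {"meat": 20.62, "eggs": 3.54, "dairy": 0.20}
    for facet, f in reported_f.items():
        eta = partial_eta_squared(f, df_effect=2, df_error=122)
        p = p_value_f_two_df(f, df_error=122)
        print(f"{facet}: partial eta^2 = {eta:.2f}, p = {p:.3f}")
```

For the meat, eggs, and dairy facets, this reproduces the reported ηp² values (.25, .05, .00) and, for eggs and dairy, the reported p-values (.032, .819).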

Table 2. Means, standard deviations, and planned contrast statistics for reported diet and physical activity in the between-subjects design.

A MANOVA showed significant differences between the three groups in their reported levels of physical activity, Wilks' Lambda = .64, F(10, 242) = 6.15, p < .001, ηp² = .20. With the exception of moderate physical activity, F(2, 125) = 1.30, p = .138, ηp² = .02, all variables showed the hypothesized effects. The reported level of physical activity differed for vigorous physical activity, F(2, 125) = 16.11, p < .001, ηp² = .21, walking, F(2, 125) = 5.61, p = .022, ηp² = .08, and time spent sitting, F(2, 125) = 6.56, p = .002, ηp² = .10. Again, the descriptive values and the contrasts between the groups are displayed in Table 2. Consistent with our hypotheses, both the honest group and the fake good group differed from the fake bad group (H2): the fake bad group reported less time spent on physical activity and more time spent sitting than the honest group and the fake good group. Contrary to the hypotheses, however, the fake good group mostly did not differ from the honest group; the only exception was the reported time spent sitting. Thus, the fake good group reported spending less time sitting, but not significantly more time on physical activity, than the honest group.
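For the multivariate tests, one common definition of a multivariate partial effect size based on Wilks' Lambda is ηp² = 1 − Λ^(1/s), with s = min(number of dependent variables, df_effect). With three groups, df_effect = 2, so s = 2 whenever at least two dependent variables enter the MANOVA. The Python sketch below assumes that this definition was used; the function name is hypothetical, and the five dependent variables in the example are inferred from the numerator degrees of freedom of F(10, 242) (10 = 5 × 2), not stated in the text.

```python
def multivariate_partial_eta_squared(wilks_lambda: float, n_dvs: int, df_effect: int) -> float:
    """Multivariate partial eta squared from Wilks' Lambda, defined as 1 - Lambda ** (1 / s)."""
    # s is the number of non-zero eigenvalues: the smaller of the number of
    # dependent variables and the effect degrees of freedom
    s = min(n_dvs, df_effect)
    return 1.0 - wilks_lambda ** (1.0 / s)


if __name__ == "__main__":
    # Physical activity MANOVA: Wilks' Lambda = .64, three groups (df_effect = 2);
    # 5 dependent variables inferred from the numerator df of F(10, 242)
    eta = multivariate_partial_eta_squared(0.64, n_dvs=5, df_effect=2)
    print(f"multivariate partial eta^2 = {eta:.2f}")
```

With Λ = .64, the sketch gives .20, the value reported for physical activity.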

An ANOVA on the reported smoking behavior revealed significant differences between the three groups, F(2, 125) = 21.51, p < .001, ηp² = .26. The contrast tests were consistent with the previous findings: the fake bad group (M = 2.31, SD = 0.93) reported unhealthier smoking behavior than both the honest group (M = 1.35, SD = 0.69), t(125) = −5.86, p < .001, ηp² = .26, and the fake good group (M = 1.34, SD = 0.71), t(125) = −5.53, p < .001, ηp² = .26. Yet, the fake good group again did not differ significantly from the honest group, t(125) = −0.03, p = .977, ηp² < .01 (H3).

An ANOVA was also conducted to investigate whether the three groups differed in their reported frequency of alcohol consumption. In support of our hypotheses, the three groups differed significantly, F(2, 125) = 5.65, p = .004, ηp² = .08. Participants of the fake good group reported a healthier level of alcohol consumption (M = 2.66, SD = .58) than participants of the honest group (M = 2.96, SD = .46), t(125) = −2.69, p = .008, ηp² = .08, and than participants of the fake bad group (M = 3.02, SD = .52), t(125) = −3.15, p = .002, ηp² = .10. However, the reported frequency of alcohol consumption did not differ significantly between the honest group and the fake bad group, t(125) = .59, p = .553, ηp² = .01 (H4). Thus, participants instructed to fake good reported consuming alcoholic beverages less frequently than participants instructed to report honestly, but only the fake good group, not the honest group, differed significantly from the fake bad group.

Concerning the amount of alcoholic drinks consumed, the three groups also differed significantly, F(2, 107) = 9.60, p < .001, ηp² = .15. Participants of the fake good group (M = 1.11, SD = 0.53) reported fewer portions of alcoholic drinks than participants of the honest group (M = 1.50, SD = 0.69), t(107) = −2.36, p = .020, ηp² = .09. As expected, the fake good group also differed from the fake bad group (M = 1.86, SD = 0.76), t(107) = −4.37, p < .001, ηp² = .26, as did the honest group, t(107) = −2.37, p = .02, ηp² = .08. Thus, the three groups differed both in the reported frequency and in the reported amount of alcohol consumption.

Discussion 2

Study 2 also supports the assumption that participants are able to distort their responses in a health behavior self-report questionnaire when instructed to do so. On all four dimensions of health behavior, covering both preventive behavior and risk behavior, significant differences were found between the experimental instructions.

The effect sizes of study 2 are in line with previous findings. Again, large faking effects were found for most dimensions of health behavior (.08 ≤ ηp² ≤ .25). Yet, although participants were generally able to distort their responses, the reported health behavior of the fake good group did not differ significantly from that of the honest group on at least some of the health behavior dimensions. However, except for very few facets, both the fake good group and the honest group differed significantly from the fake bad group.

The absence of differences between the fake good group and the honest group might result from several processes. For example, participants in the honest group might have practiced response distortion, too. According to Mazar and Ariely (2006; Mazar, Amir, & Ariely, 2008), people can behave dishonestly to a certain extent without challenging their self-concept as honest persons. The Theory of Self-Concept Maintenance assumes that people solve the motivational dilemma between profiting from dishonest behavior and risking its extrinsic and intrinsic costs by balancing both elements. The theory claims that there is a range of dishonesty within which people can behave dishonestly enough to profit from it without endangering their positive self-view (Mazar, Amir, & Ariely, 2008). In the present study, this mechanism would allow participants to report slightly improved health behavior without conflicting with the experimental instruction to report their health behavior honestly.

The absence of differences between the two experimental groups might also be explained by an egocentric bias of all participants. Again, the above-average effect (Alicke et al., 1995; Taylor & Brown, 1988) might have led participants to believe that their health behavior is healthier than that of other people. Thus, when asked to fake good and report particularly positive health behavior, participants in this experimental group might have adjusted their reports insufficiently because they perceived their behavior as already healthier than average. Correspondingly, participants in the fake bad group would have adjusted their responses substantially to report health behavior even more unfavorable than that of most people.

A third explanation for the similarity between the reported health behavior of the fake good group and the honest group arises from design characteristics. A possible, although improbable, design-inherent flaw is that the three groups differed from one another in their actual health behavior. However, as participants were randomly assigned to the three groups, there should not have been significant differences in health behavior between the groups. As Rost (2013) states, if participants are randomly assigned and the experimental groups are sufficiently large, all confounding variables should be equally distributed across the experimental groups, which minimizes the probability of systematic a priori group differences. Yet, significant differences in the actual health behavior of the participants in the three experimental groups cannot be ruled out entirely.

Comparison of main results of the two studies

To obtain first indications of design-inherent differences, the honest responses given in study 1 were compared with the honest responses given in study 2. MANOVAs and ANOVAs indicated that the honest responses from the within-subjects design were significantly healthier than the honest responses from the between-subjects design for diet, Wilks' Lambda = .60, F(8, 173) = 14.15, p < .001, ηp² = .40, physical activity, Wilks' Lambda = .94, F(4, 178) = 2.96, p = .021, ηp² = .06, smoking, F(1, 181) = 23.95, p < .001, ηp² = .12, and alcohol consumption, F(1, 181) = 9.09, p = .003, ηp² = .05.

To secure the internal validity of this comparison, potential differences in the demographic characteristics of the two samples were investigated. The mean age was comparable in the two samples: the mean age in the within-subjects design (M = 25.49, SD = 11.39) did not differ significantly from that in the between-subjects design (M = 26.16, SD = 7.71), t(249.28) = −0.57, p = .568, ηp² < .01. However, the gender ratio differed significantly between the two samples, χ²(1) = 10.45, p < .001: there were significantly more female participants in the within-subjects sample (74.3%) than in the between-subjects sample (55.5%). Moreover, the education level differed slightly between the two samples. Participants of the between-subjects design tended to have a higher level of education (M = 5.97, SD = 1.05) than participants of the within-subjects design (M = 5.40, SD = 0.91), t(268) = −4.76, p < .001, ηp² = .08.
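The non-integer degrees of freedom reported for the age comparison are characteristic of Welch's t-test, which does not assume equal variances and approximates the degrees of freedom with the Welch–Satterthwaite formula. The Python sketch below recomputes t and df from the reported means and standard deviations, assuming that the full samples of 142 and 128 participants entered the comparison (consistent with t(268) for education); the function name is hypothetical. Because the inputs are rounded, the output only approximates the published values.

```python
import math


def welch_t_test(m1: float, sd1: float, n1: int,
                 m2: float, sd2: float, n2: int) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom from summary statistics."""
    v1 = sd1 ** 2 / n1  # squared standard error of the first mean
    v2 = sd2 ** 2 / n2  # squared standard error of the second mean
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


if __name__ == "__main__":
    # Mean age: within-subjects sample (n = 142) vs. between-subjects sample (n = 128)
    t, df = welch_t_test(25.49, 11.39, 142, 26.16, 7.71, 128)
    print(f"t({df:.2f}) = {t:.2f}")
```

With these inputs, the sketch yields t ≈ −0.57 with about 249.3 degrees of freedom, in line with the reported comparison.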

General discussion

The current studies aimed to explore people's ability to fake self-reported health behavior, concerning both preventive health behavior and health risk behavior. Both the conventional within-subjects design and the between-subjects design yielded evidence that people can distort their responses and fake reports of their health behavior. The effect sizes of faking in self-report measures of health behavior indicate that the phenomenon should not be underestimated. The results are thus consistent with and extend previous findings.

Design-related differences

Comparing the results of both studies, the patterns of differences between the instructions to fake good, be honest, or fake bad look similar. However, the differences between the instructions are consistently larger and more pronounced in the within-subjects design. The observation that the responses to the different instructions are more similar and closer together in the between-subjects design is also supported by the non-significant differences between the fake good group and the honest group on most facets. The more distinct response pattern in the within-subjects design might indicate a participant × treatment interaction (Viswesvaran & Ones, 1999). It seems plausible that there are interindividual differences in faking; that is, participants with broader knowledge or a higher need for approval from others might fake more than others (Levashina & Campion, 2006; Rzewnicki, Auweele, & De Bourdeaudhuij, 2003). Alternatively, the more pronounced effects might be a design characteristic. For example, demand effects might have caused participants to give rather extreme answers under the two faking instructions, as participants tend to alter their responses when the conditions are directly contrasted (Charness et al., 2012).

The more pronounced effects in the within-subjects design might also result from comparison effects. The study design might have caused participants of the within-subjects design to adjust their responses carefully according to the instructions and to their previously given answers. In the honest condition, these participants might then not have reported their actual health behavior but figures calculated relative to their other answers. A first indication of such referencing is that participants in the honest condition of the within-subjects design reported significantly healthier behavior than participants in the honest group of the between-subjects design. Yet, as stated above, the samples of the two studies differed in some characteristics. In study 1, there were significantly more female participants than in study 2. As women usually show healthier behavior than men, this might also explain why participants of the within-subjects design gave healthier honest reports than those of the between-subjects design (Dehghan, Akhtar-Danesh, & Merchant, 2011; Wardle et al., 2004). In addition, the education level differed slightly between the two samples: participants of the between-subjects design tended to have a higher level of education than participants of the within-subjects design. Higher education is usually positively correlated with healthier behavior (Cowell, 2006) and with the ability to fake (Levashina & Campion, 2006). It therefore seems rather surprising that participants of the within-subjects design reported healthier behavior in the honest condition. Nevertheless, most participants of both studies had at least graduated from high school, so the range of educational levels in the present studies may have been restricted to highly educated participants.
How these potential sample differences affect the interpretation of the diverging findings of the two studies remains unclear. The most important comparison between the two samples would undoubtedly be their self-reported health behavior in a context that does not involve any faking instruction. This would probably call for an innovative research design that future studies might employ. Ideally, such a design would not only rule out a priori group differences in the health behavior of the participants in the two study designs but would also shed light on the processes of response distortion under the respective design conditions.

Implications for research and practice

Although a few issues remain to be resolved, the current research contributes important insights to the body of knowledge. For future research, it seems crucial to be aware of design-related differences in studies on response distortion. Whereas quite a few previous studies concluded that the within-subjects design might be better suited to investigate faking, the ecological validity of both designs should also be taken into consideration. As noted previously, the mental processes evoked might differ between within- and between-subjects designs (Hsee & Zhang, 2004). Thus, it would be important to examine the cognitive processes that prevail during faking. Future studies should therefore include qualitative methods such as the Thinking Aloud Method, in which participants verbalize their thoughts while faking a health behavior questionnaire, to determine which design is more naturally aligned with the mental processes at work in practice (Eccles & Arsal, 2017).

Moreover, it is important to emphasize the implications of the current research for previous and future research as well as for practitioners. Data on health behavior based on self-report measures have to be interpreted cautiously, as there is a very real possibility that the reports have been distorted by faking. For example, nationwide assessments of dietary habits or physical activity are often based on telephone interviews. It seems plausible that these results are prone to faking. In their large-scale study assessing the level of physical activity worldwide, Guthold et al. (2018) acknowledge the possibility of faked responses in self-reports. They attempted to correct for it by applying a correction factor derived from comparing the results of the IPAQ with another self-report questionnaire, the Global Physical Activity Questionnaire (Armstrong & Bull, 2006). Assuming that people wish to create a favorable image of themselves, it is quite conceivable that participants augmented their reported levels of physical activity in both questionnaires. If that were the case, the true extent of insufficient physical activity would still be underestimated. The current research suggests that this underestimation might be a serious problem, as faking self-reported levels of physical activity proved easy. Faking in self-report questionnaires might also lead to faulty interventions, either because the need for an intervention is not recognized or because the interventions implemented are not optimally tailored to the need.

A useful strategy would be to counteract faking in the first place by targeting the willingness to fake. As Levashina and Campion (2006) claim, the willingness to modify the information given to create a desired impression depends largely on the potential benefits of that impression. The avoidance of negative feelings such as shame and guilt, as well as fear of judgment, have been identified as major reasons for dishonesty in practice (Levy et al., 2018). Thus, researchers and practitioners should attempt to create an environment of trust and acceptance to minimize the initial willingness to fake. Highlighting the benefits of honest responses might also decrease the willingness to fake (Law, Bourdage, & O'Neill, 2016).

Researchers and practitioners would probably also benefit from a critical pluralism of methods when assessing subjective constructs such as mental and physical health status and behavior. For example, Rzewnicki et al. (2003) showed that interviewing participants who had previously completed a written self-report questionnaire on their physical activity level led them to correct their reports. Thus, comparing different self-report measures might be a simple solution; as noted earlier, this approach was applied by Guthold et al. (2018), for example. Yet, assuming that all self-report measures might be prone to faking, the additional use of more objective measurement methods seems advisable, for example, wearable activity trackers to assess physical activity (Wong, Mentis, & Kuber, 2018). Although such indicators have their own faults and weaknesses, combining them with self-report measures might yield findings that are less prone to errors due to faking and response distortion.

Limitations

An important limitation is that the current studies only investigated whether successful faking of self-reported health behavior is possible in principle. As is customary in directed-faking studies, we explicitly instructed participants to distort their responses. Levashina and Campion (2006) claim that the capacity, the willingness, and the opportunity to control the information given are essential for faking. By controlling the willingness and the opportunity to fake, we artificially simplified the faking process and investigated only people's capacity to fake their responses. The results therefore do not answer the important question of whether people tend to fake their self-reported health behavior in practice. Yet, the lack of differences between the instructions to be honest and to fake good in the between-subjects design might indicate that participants practiced response distortion unsolicited, as there were clear differences in the within-subjects design, where the responses to the different instructions could be compared directly. As previous research indicates a high probability of such faking in practice (DePaulo et al., 1996; Levy et al., 2018) and the current studies suggest that faking is a substantial threat to the assessment of health behavior, future studies should investigate the occurrence of faking in self-reported health behavior in practice.

Conclusion

This research yields evidence that people are able to distort their responses and fake their reported health behavior, concerning both preventive health behavior and health risk behavior. As faking is linked with important considerations about an adequate study design, a major benefit of this research is that the research question was examined with two research designs, which made it possible to test the robustness of the results and to profit from the scientific insights of each design. The findings of the two studies call for caution when interpreting health behavior data based on self-report measures. Future research should take faking into account to clarify its impact on the interpretation of self-reported health behavior in research and practice.

Supplemental material

Supplemental Material


Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

The author(s) reported there is no funding associated with the work featured in this article.

References