Incidental disgust does not cause moral condemnation of neutral actions

Pages 96-109 | Received 19 Nov 2019, Accepted 11 Aug 2020, Published online: 25 Aug 2020

ABSTRACT

Emotivism in moral psychology holds that making moral judgements is at least partly an affective process. Three emotivist hypotheses can be distinguished: the elicitation hypothesis (that moral transgressions elicit emotions); the amplification hypothesis (that disgust amplifies moral judgments); and the moralisation hypothesis (that affect moralises the non-moral). Even though the moralisation hypothesis is the strongest and most radical form of emotivism, it has not been systematically experimentally tested. Most previous studies have used as stimuli morally wrong actions, and thus they cannot answer whether disgust is sufficient to moralise an otherwise neutral action. In Experiment 1 (N = 87) we tested the effect of incidental disgust on morally neutral scenarios, and in Experiment 2 (N = 510) the differential effect of disgust on neutral and wrong scenarios. The results did not support either the moralisation or the amplification hypothesis. Instead, Bayesian analyses provided substantial evidence for the null hypothesis that incidental disgust does not affect moral ratings. The results are in line with a recent meta-analysis suggesting that disgust has no effect on moral ratings.

Traditional theories of moral judgement (e.g. Kohlberg, 1981; Piaget, 1997) consider people as rational beings, who make moral judgements through rational reflection. They weigh the pros and cons of their actions, consider various rules that they should or should not follow, or even engage in highly abstract considerations such as what would happen if everyone followed the same maxim as they do. Recent findings in moral psychology question this position, indicating that, at least in some cases such as sexual morality, people's moral judgements are driven not by rational reflection but by emotion. We may call moral psychological theories that posit some role for emotion in moral judgement emotivist theories of moral judgement. Following Avramova and Inbar (2013) and Landy and Goodwin (2015), we may distinguish between the following three emotivist claims:

  1. Emotions follow from moral judgements (the elicitation hypothesis)

  2. Emotions amplify moral judgements (the amplification hypothesis)

  3. Emotions moralise the non-moral (the moralisation hypothesis)

The elicitation hypothesis is the least disputed emotivist claim: for instance, reading about moral transgressions elicits facial expressions of disgust (Cannon et al., 2011). There is also evidence for the amplification hypothesis: for example, artificially induced disgust may amplify negative ratings of moral transgressions (e.g. Eskine et al., 2011; Inbar et al., 2011; Olatunji et al., 2016; Schnall et al., 2008; Ugazio et al., 2012; Wheatley & Haidt, 2005; but see Landy & Goodwin, 2015). The moralisation hypothesis, in turn, has received the least support, and is not directly addressed in the literature (Avramova & Inbar, 2013; Landy & Goodwin, 2015; Pizarro et al., 2011). This is surprising, since the moralisation hypothesis represents the strongest and most radical form of emotivism and is explicitly advocated by some researchers (e.g. Haidt, 2001; Prinz, 2006). In contrast to the amplification hypothesis, which implies that an emotion like disgust can amplify the perceived wrongness of an action, the moralisation hypothesis implies that disgust could render morally condemnable an action that is otherwise considered morally neutral. As Avramova and Inbar (2013) write, “on this view, a briefly experienced flash of disgust can make the difference between finding (for example) smoking or homosexuality morally objectionable or acceptable” (p. 170).

We will next review the previous studies addressing the role of emotions in moral judgment, focusing particularly on whether they support strong emotivism. We will focus on disgust in particular, which has been the most intensively studied emotion (for the differential effect of emotions besides disgust, see e.g. Hutcherson & Gross, 2011; Rozin et al., 1999; Russell & Giner-Sorolla, 2011; Russell & Piazza, 2015).

Previous research

Correlational studies

The early studies on the role of emotion in moral judgement were interview-based. For instance, Haidt et al. (1993) investigated how disrespectful but harmless actions are judged morally by participants of low and high socioeconomic status (SES) in the U.S. and Brazil. The scenarios depicted cleaning a toilet with one's national flag, consensual incest between siblings, or eating one's dead pet dog. The authors found that high-SES U.S. participants from elite universities considered the actions to be matters of social convention or personal preference, whereas participants of low SES (particularly low-SES Brazilians) judged the actions to be immoral. Importantly, for those participants who judged the actions to be wrong, the judgement was better predicted by ratings of offensiveness than by ratings of harmfulness (which can be taken to reflect conceptual processes). In contrast, for those participants who did not moralise the actions, the judgement was better predicted by ratings of harm than by ratings of offensiveness (Haidt et al., 1993).

In a similar study, Haidt and Hersh (2001) interviewed American Democrats and Republicans about three kinds of sexual acts: homosexual sex, unusual forms of masturbation, and consensual incest between an adult brother and sister. They found that Republicans were less tolerant than Democrats towards the actions, and that the Republicans relied more on offensiveness than harm in justifying their answers. Regression analysis indicated that, in both groups, harm did not significantly predict moral condemnation in any of the scenarios, but instead affect and religious strength did, affect being the best predictor. The authors also described a phenomenon they termed “moral dumbfounding”, defined as “a confused inability to explain one's position” (Haidt & Hersh, 2001, p. 209), characterised by puzzlement, laughter, and stuttering (see also Haidt et al., unpublished manuscript). Importantly, however, Haidt and Hersh (2001) did not find any moral dumbfounding in the classical Heinz dilemma (Kohlberg, 1981), whose ratings were best predicted by harm (see Footnote 1). It can be argued that the Heinz case was less emotional and more reflectively judged than the emotional scenarios.

There is also evidence that disgust sensitivity is associated with moral judgments (e.g. Chapman & Anderson, 2014; Inbar, Pizarro, Knobe, et al., 2009b; Jones & Fitness, 2008). For instance, Chapman and Anderson (2014) presented evidence that participants with high disgust sensitivity rated moral transgressions more harshly. Relatedly, moral transgressions have been found to elicit facial expressions characteristic of disgust (Cannon et al., 2011; Chapman et al., 2009).

All these studies are correlational and mainly support the elicitation hypothesis. They leave open the direction of causality – whether affect causes moral condemnation or vice versa. The moralisation hypothesis is more directly assessed by experimental studies, which have typically focused on the effect of artificially induced disgust on moral ratings.

Experimental studies

Disgust has been found to amplify moral judgements (e.g. Eskine et al., 2011; Inbar et al., 2011; Olatunji et al., 2016; Schnall et al., 2008; Ugazio et al., 2012; Wheatley & Haidt, 2005; but see Landy & Goodwin, 2015). For instance, Eskine et al. (2011) induced disgust in a group of participants by having them drink a bitter-tasting liquid. The disgust group made more severe moral ratings than controls in scenarios depicting moral transgressions, such as consensual incest or shoplifting. To take another example, Schnall et al. (2008) induced disgust in participants by exposing them to a bad smell, by having them complete the test in a dirty office, by having them recall a disgusting experience, or by showing them a disgusting video. In all conditions, the disgusted participants made more severe moral judgements about the described moral transgressions than the non-disgusted participants. The effect was mediated by the participants’ sensitivity to feel disgust.

The previous studies are limited in two respects. First, they have mainly used scenarios depicting moral transgressions, and thus addressed only the amplification hypothesis and not the moralisation hypothesis. That is, they have tested whether incidental disgust amplifies condemning judgments of actions that would have been judged as morally wrong even without the emotional induction. Second, the studies have been plagued by small sample sizes and possibly by publication bias. A recent meta-analysis by Landy and Goodwin (2015) found only a very small amplifying effect of disgust (d = .11), which disappeared completely after controlling for publication bias (but see Schnall et al., 2015, for a critical discussion).

To our knowledge, no previous studies have systematically addressed the moralisation hypothesis by focusing on scenarios that are not judged to be immoral without emotional induction. Landy and Goodwin (2015) found evidence for a moralising effect of disgust in their meta-analysis, but the effect was small (d = .21) and came from a small number of studies (k = 13). Moreover, there was evidence of publication bias: the published studies (k = 6) supported a larger effect size than the unpublished studies (k = 7). Evidence for or against the moralisation hypothesis mainly comes from individual cases. For example, Wheatley and Haidt (2005) found evidence that hypnotically induced disgust moralised an otherwise neutral action (the student council case), whereas Schnall et al. (2008) did not find any effect of disgust on the otherwise neutral scenario of driving vs. walking to work (see also Ugazio et al., 2012). Most of the studies mentioned in Landy and Goodwin (2015) that tested the moralisation hypothesis are unpublished manuscripts, theses, or raw data (see column “Nonmoral” in Landy and Goodwin (2015) and the corresponding references). Systematically testing the moralisation hypothesis is also warranted because it is the strongest form of emotivism: it implies that disgust alone could moralise an otherwise neutral action.

Table 1. Mean ratings for the permissibility of the actions in the pre-test scenarios. The selected scenarios are marked with an asterisk (*). The scale is from −5 (morally totally wrong) to +5 (morally totally right).

It is worth bringing up here the study by Nichols (2002), which did not use emotion induction, but instead probed participants on three types of transgressions: conventional non-disgusting (e.g. wearing pajamas to class), conventional disgusting (e.g. spitting in one's water glass and then drinking it at a dinner party), and moral non-disgusting (a child hitting another). Disgusting conventional transgressions were moralised more often than non-disgusting ones. It is unclear whether this study can be considered experimental, as it did not use a control group or emotion induction. Thus, the difference between the ratings of the scenarios could be due to factors other than disgust (e.g. the participants could judge that spitting at a dinner party disturbs other people and is therefore wrong based on reasoning about harm). However, at least at face value the results support the moralisation hypothesis (but see Royzman et al., 2011).

Aims and hypotheses

We tested the moralisation hypothesis in two experiments. We defined as “morally neutral” an action that is judged as “neutral” on a scale running from morally right through neutral to morally wrong. The probes were selected in a pre-test. In Experiment 1, we examined whether disgust, induced through viewing disgusting pictures, moralises neutral actions, as predicted by the moralisation hypothesis. In Experiment 2, we examined the differential effect of disgust, induced through viewing disgusting videos, on wrong and neutral scenarios. The moralisation hypothesis implies that disgust would moralise neutral scenarios (i.e. shift their estimates from neutral to condemnable), but the amplification hypothesis predicts that disgust would only amplify judgments of morally wrong scenarios, without affecting judgments of neutral scenarios. To control for possible moderating variables, we also measured the political orientation (in both experiments) and disgust sensitivity (in Experiment 1) of the participants. Earlier research suggests that conservatives are more sensitive to disgust than liberals, and that disgust sensitivity may be associated with moral ratings (e.g. Haidt & Hersh, 2001; Inbar, Pizarro, and Bloom, 2009a).

Experiment 1

Method

Pre-test

Twelve scenarios were formulated involving actions that could in principle be moralised but that would nevertheless be judged morally neutral. In addition, four control scenarios were included: two scenarios involving a morally wrong action and two scenarios involving a morally right action. The control scenarios were included to break the pattern and to enable the participants to use the full rating scale. The scenarios were presented to the participants (N = 54) in a pseudo-random order, so that the control scenarios were presented at roughly equal intervals. The participants were instructed to “evaluate the morality of the described actions”. They rated the scenarios on an 11-point Likert scale ranging from −5 (morally totally wrong) through 0 (neither right nor wrong) to +5 (morally totally right). The survey was conducted on the Internet, using the web-based survey utility Webropol. Participants were recruited through a psychology students’ mailing list at the University of Turku, Finland.

Six of the 12 morally neutral scenarios were chosen for the experiment. The selection criterion was that the mean, mode, and median of the ratings of the scenario all lay between −1 and +1. The ratings of the pre-test scenarios are presented in Table 1.
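The selection rule can be illustrated with a minimal Python sketch. The function name `is_neutral`, the inclusive bounds, the handling of ties between modes, and the example ratings are all assumptions made for illustration; they are not taken from the study's data or analysis scripts.

```python
from statistics import mean, median, multimode


def is_neutral(ratings, low=-1, high=1):
    """Return True if the mean, the median and every mode lie within [low, high].

    Assumption: the bounds are inclusive and, when several values tie as the
    mode, all of them must fall inside the interval.
    """
    summary = [mean(ratings), median(ratings), *multimode(ratings)]
    return all(low <= value <= high for value in summary)


# Hypothetical ratings on the -5..+5 scale (illustrative only, not study data).
pretest = {
    "scenario_a": [0, 0, 1, -1, 0, 2, -2, 0],
    "scenario_b": [-4, -3, -5, -4, -2, -4, -3, -1],
}

selected = [name for name, ratings in pretest.items() if is_neutral(ratings)]
print(selected)
```

Requiring all three summary statistics to fall in the interval guards against a scenario whose mean is near zero only because strongly positive and strongly negative ratings cancel out.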

The selected morally neutral scenarios were the following: Simo is a married man who secretly flirts with his female colleagues, although he never crosses the line; Sofia gossips to her friends that their 47-year-old male professor is dating a 24-year-old female student; Maj-Lis walks past a charity worker on the street but does not donate because she intends to use her change for a cup of coffee; Laura smokes at her workplace in secret from her husband, who disapproves of smoking; Maija reads a magazine addressed to her neighbour, accidentally put in her mailbox by the postman, before returning it; Kauko knows about the carbon dioxide emissions of airplanes and their effect on the climate, but nevertheless flies to Tenerife on his summer vacation. The morally wrong control scenario selected for the actual experiment was about Heli, who intentionally gave wrong directions to a group of gypsies who were looking for a spa; the morally right control scenario selected was about Stefan, who saved his neighbour's cat from a tree.

As we planned to use the mean of the neutral scenarios as the dependent variable in the proper experiments, we checked whether the mean morality rating differed from zero (morally neutral). A one-sample t-test indicated that the ratings did not differ from neutral (t = −1.16, p = .25) and Bayesian analysis indicated substantial evidence that the ratings did not differ from morally neutral (BF01 = 3.57; see Footnote 2).

Participants

The experiment was conducted with university students (N = 87, 37 male) attending three different lectures: educational psychology (University of Turku, n = 19, 2 male), change management (Turku School of Economics, n = 37, 19 male), and political history (University of Turku, n = 31, 16 male). The participants were randomised into an experimental group (n = 45, 19 male) and control group (n = 42, 18 male). The participants were warned both on the title sheet and verbally about the possibility of seeing potentially disturbing pictures, and it was emphasised that participating in the experiment was voluntary. The sample size was not determined a priori; instead, we aimed to gather as many participants as possible.

Materials and procedure

Two variants of a paper questionnaire were used, one for the experimental (disgust) group and the other for the control group. The questionnaires were given randomly to the students during the lectures, except for the last test group (political history), in which more experimental questionnaires were given to men to balance the number of males and females in the experimental group.

For emotional induction, participants in the experimental group were asked to evaluate the interestingness and disgustingness of six coloured photographs on a 0–4 scale. The interestingness ratings were included to conceal the experimental purpose of the pictures. The pictures were chosen using Google image search. We did not use standardised image databases such as the International Affective Picture System (IAPS; Lang et al., 2008), because we judged that the IAPS pictures were either not disgusting enough, or they referred to morally relevant actions (e.g. pictures of victims of violence). In the control group, the pictures were emotionally neutral pictures from the same or higher-level semantic categories as the experimental stimuli. The stimuli are listed in Table 2.

Table 2. Stimuli used for emotional induction.

The experimental stimuli were the six morally neutral scenarios chosen in the pre-test, plus a wrong and right control scenario, making up 8 scenarios altogether. The control scenarios were included to encourage the participants to use the full rating scale. The participants were asked to “evaluate the morality of the action” in the scenarios. A 160-mm visual analogue (VA) scale was used, with the end points titled “morally totally wrong” and “morally totally right”, and the middle point (marked with a short vertical line) titled “neither morally right nor wrong”.

To measure the participants’ political orientation and disgust sensitivity, they were asked to rate their position on the political left-right axis and their position on the liberal-conservative axis on VA-scales, and to fill in a disgust sensitivity scale (DS-R, Haidt et al., 1994; modified by Olatunji et al., 2007). The VA-scales for the political questions were 160 mm long, with the end points titled “left” and “right”, and “liberal” and “conservative”. The centre point of the scale was marked with a short vertical line. The sum score from DS-R was used as a measure of disgust sensitivity.

The effectiveness of the disgust manipulation was tested using ratings of faces depicting Ekman's six basic emotions (joy, surprise, disgust, fear, sorrow, and anger). The participant was asked to rate on a 0–4 Likert scale how much their mood corresponded to each of the photographs. Photographs were used because we assumed they would probe the participants’ mood more directly and validly than verbal representations.

The order of the sections in the sheet was the following: 1. Political orientation, 2. DS-R, 3. Picture Rating, 4. Moral Rating, and 5. Mood Rating. The emotional induction (and control) pictures were positioned on the same spread with the moral scenarios to maximise possible effect on the scenarios. The Mood Rating appeared on the last page to confirm that the induced emotion remained throughout the scenarios.

Results

Prior to the analyses, the data were checked for corrupt responses (e.g. a null answer to all the moral probes); no such responses were found. The experimental and control groups were matched in terms of sex, age, DS, and political orientation (Left-Right and Liberal-Conservative), with threshold levels of ps ≥ .3. In the initial sample (N = 87), participants were more on the political right in the control group than in the experimental group. Thus, we excluded from the control group the six participants with the highest scores (i.e. furthest to the Right) on the Left-Right measure. In the final sample (n = 81), the groups did not differ on sex (χ2 = .040, p = .84, BF10 = .28), age (t = −.086, p = .93, BF10 = .23), DS (t = .44, p = .67, BF10 = .26), Left-Right scale (t = −1.05, p = .30, BF10 = .38), or Liberal-Conservative scale (t = −.70, p = .48, BF10 = .29). All subsequent analyses were performed on the sample with matched groups. The key participant characteristics in the final sample are summarised in Table 3.

Table 3. Key participant characteristics in the matched sample (n = 81).

The Left-Right rating correlated positively with the Liberal-Conservative rating (r = .51, p < .001; i.e. conservatism was associated with being on the political right), but neither the Left-Right nor Liberal-Conservative rating correlated with disgust sensitivity (rs < .16, ps > .17).

Effectiveness of the emotional induction procedure

The Mood Ratings in the experimental and control groups are summarised in Table 4. Both disgust and fear were experienced more in the experimental group (ps ≤ .05), but disgust clearly showed the largest difference. In terms of Bayes factor (BF), evidence for greater disgust in the experimental group was strong (BF10 > 10), but there was only weak evidence for a difference in other emotions (BF10s < 3).

Table 4. Differences between the experienced affect between the experimental and control group. Rank-Biserial correlation (rRB) was used as an estimate of standardised effect size.

The experimental pictures were rated as more disgusting (M = 3.01, SD = .80) than the control pictures (M = .25, SD = .38; U = 1491, p < .001, BF10 > 100). The experimental pictures were rated as less interesting (M = .78, SD = .83) than the control pictures (M = 1.62, SD = .79; U = 354, p < .001, BF10 > 100).
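The Table 4 caption names the rank-biserial correlation as the standardised effect size. One common way to obtain it from a Mann-Whitney U statistic is sketched below in Python. The sign convention (U counted for the first group) and the group sizes of 45 and 36 (the 42 controls minus the six excluded participants) are assumptions inferred from the text; the source does not state them for this particular test, so the printed value is illustrative rather than a reported result.

```python
def rank_biserial(u, n1, n2):
    """Rank-biserial correlation from a Mann-Whitney U statistic.

    Assumes `u` is the U value for the first group, so that u = n1 * n2
    (every observation in group 1 exceeds every observation in group 2)
    maps to +1 and u = 0 maps to -1.
    """
    return 2 * u / (n1 * n2) - 1


# Disgustingness ratings of the pictures: U = 1491 as reported above;
# group sizes are an assumption (see lead-in).
r_disgust = rank_biserial(1491, 45, 36)
print(round(r_disgust, 2))  # about 0.84 under these assumptions
```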

Morality ratings

Morality ratings for each scenario are summarised in Table 5. To test for the overall effect of induced disgust, the sum variable of Mean Morality was computed for the ratings of the six neutral scenarios. Before proceeding to between-groups analyses, we tested whether the participants considered the depicted actions as genuinely neutral without any emotional induction. A one-sample t-test was conducted to test whether Mean Morality in the control group differed from 0 (which represented morally neutral on the scale); there was no difference (t = 1.42, p = .17, BF10 = .45). Next, we proceeded to testing for group differences. Mean Morality did not differ between the experimental group (M = 3.56, SD = 17.11) and the control group (M = 3.17, SD = 13.42; W = 766, p = .68, ηp2 = .00, BF10 = .23). The inverted Bayes Factor was BF01 = 4.28, indicating substantial evidence for the null hypothesis.
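The relation between BF10 and BF01, and the verbal labels attached to them, can be made concrete with a short Python sketch. The cut-offs of 3 and 10 are an assumption chosen to match how the text uses "weak", "substantial", and "strong"; the function names are hypothetical.

```python
def invert_bf(bf10):
    """Convert evidence for H1 over H0 (BF10) into evidence for H0 over H1 (BF01)."""
    return 1.0 / bf10


def evidence_label(bf):
    """Verbal label for a Bayes factor expressed in favour of one hypothesis.

    Assumed cut-offs: above 10 is "strong", above 3 is "substantial",
    anything else is "weak".
    """
    if bf > 10:
        return "strong"
    if bf > 3:
        return "substantial"
    return "weak"


bf01 = invert_bf(0.23)  # BF10 as rounded in the text
print(round(bf01, 2), evidence_label(bf01))
```

Inverting the rounded BF10 of .23 gives roughly 4.35 rather than the reported 4.28; the small gap is presumably due to rounding of BF10 before inversion, and both values fall in the "substantial" band under the assumed cut-offs.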

Table 5. Morality ratings for all scenarios in the experimental and control conditions. Positive values represent morally right and negative morally wrong judgments.

To test for the possible mediating effects of the participants’ Disgust Sensitivity (DS) and their position on the Liberal-Conservative (LibCon) and Left-Right (LeftRight) scales on Mean Morality, an ANCOVA was performed with Mean Morality as the dependent variable, Group as the fixed factor, and DS, LibCon, and LeftRight as covariates. To test for possible moderating effects, the model included interactions between Group and each of the covariates. Group did not interact with any of the covariates (Fs(1,65) < 2, ps > .16), and none of the main effects were significant (Fs(1,65) < 2.2, ps > .14), except for the main effect of LeftRight (F(1,65) = 4.34, p = .041; the further a participant was to the political right, the more they considered the actions as morally correct). Bayesian analyses did not support the alternative hypothesis; instead, there was strong evidence for the lack of interaction between Group and LibCon or DS (BF01s > 15), and substantial evidence for the lack of interaction between Group and LeftRight (BF01 = 3.91). Likewise, there was substantial evidence for the lack of a main effect of Group, LibCon, and DS (BF01s > 3.70), but no evidence for or against a main effect of LeftRight (BF10 = .98).

Discussion of Experiment 1

Experiment 1 examined the effect of induced disgust on morally neutral scenarios to test the moralisation hypothesis, which implies that disgust can moralise an action. The disgust induction procedure was successful, as the participants in the experimental group reported experiencing more disgust than participants in the control condition. Despite this, there were no differences in aggregated moral judgments between the experimental and control groups, even after controlling for possible interactions with the participants’ disgust sensitivity and political orientation. On the contrary, the Bayesian analyses provided substantial support for the null hypothesis that disgust does not affect moral ratings, and strong-to-moderate evidence for the lack of interaction between Group and the background variables. These findings are evidence against the moralisation hypothesis, but they do not lend support to any specific hypothesis; instead, they are compatible with the amplification hypothesis or even a purely rationalist account of moral judgment. To more directly test whether disgust has a differential effect on neutral and wrong moral scenarios, we conducted Experiment 2, which included both neutral and wrong scenarios.

Experiment 2

Method

Participants

We aimed to gather at least 500 participants, which allows detection of a small effect (d = .25) at an alpha level of .05 with power of .80. The participants were recruited through e-mail lists and Facebook advertising. As a cover story we said that the experiment was about associations between visual attention and reading comprehension. Because deception was used, we sought ethical approval for the experiment from the institutional research ethics committee at Åbo Akademi University, Finland. The participants were warned about the possibility of seeing disturbing material during the experiment. When the participants clicked on the recruiting link, they were randomly redirected to a Google Form that included the moral scenarios and either disgusting or emotionally neutral videos. Altogether 510 people responded (268 in the experimental group). However, 451 (88%) of the respondents were female, which made it impossible to generalise across the whole tested population. Thus, we decided to exclude all non-women. Additionally, two participants were excluded because all their moral ratings were “1”, which we considered a random answer pattern. Thus, the final sample was n = 448.
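The sample-size target can be checked with a rough Python sketch. It assumes a two-tailed comparison of two independent, equally sized groups and uses the normal approximation to the t-test; the text does not say which test or software the power calculation was based on, so the function `n_per_group` is illustrative only.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate participants needed per group to detect effect size d.

    Assumptions: two-tailed test, two independent groups of equal size,
    normal approximation (an exact t-based calculation gives a slightly
    larger number).
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # critical value for the two-tailed alpha
    z_beta = z(power)           # quantile corresponding to the desired power
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)


n = n_per_group(0.25)
print(n, 2 * n)  # participants per group, participants in total
```

Under these assumptions the approximation yields about 252 participants per group, or roughly 504 in total, which is consistent with the stated aim of at least 500.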

Materials and procedure

The participants first filled in a background questionnaire, which probed their gender, age, education, which political party best represented their values, and net income. DS was not included in this experiment, because we had double the number of moral scenarios compared to Experiment 1 and wanted to keep the experiment short to maximise the number of participants. The moral scenarios were the same as in Experiment 1, with the addition of morally wrong variants of each, which were the following: Simo flirts with his female colleagues and, one evening after the workplace's pre-Christmas party, passionately kisses a female colleague and does not tell his wife about this (in the neutral version Simo never crosses the line); Sofia posts gossip about her professor in a Facebook group, from where the rumours start to spread (in the neutral version Sofia privately gossips about the professor); Maj-Lis walks past a charity worker on the street and angrily shouts at him that he should spend his time more wisely (in the neutral version Maj-Lis passively passes the worker); Laura smokes secretly at her workplace and lies to her husband that she hasn't smoked when he wonders why her clothes smell (in the neutral version the husband doesn't ask and Laura doesn't tell); Maija takes the magazine addressed to her neighbour and doesn't return it (in the neutral version she does return it); Kauko flies to Thailand each month because he can afford it, despite knowing that it accelerates climate change (in the neutral version he flies to Tenerife on his summer vacation). The neutral scenarios and their wrong variants totalled 12 scenarios (6+6). In addition, we included two extremely wrong scenarios to avoid a ceiling effect in the evaluations of the moderately wrong actions. The extremely wrong scenarios were the following: Pekka becomes violent when he uses alcohol, and once while drinking he hits a stranger, who falls, hits his head and dies; Anu hurries to work by car, simultaneously reading e-mail on her cell phone, when she hits a child on a crosswalk, killing the child. The extremely wrong scenarios were always presented first and in the middle of the other scenarios, which were counterbalanced in order. The scenarios were evaluated on a seven-point Likert scale (1 = “completely wrong”; 7 = “completely right”).

For emotional induction, we used film clips validated by Hewig et al. (2005). For disgust induction, two film clips were used: a scene from the movie Pink Flamingos (dir. John Waters), where a dog defecates on the street and a person picks up the feces and eats them; and the horse head scene from The Godfather (dir. Francis Ford Coppola), where a person finds the decapitated head of a horse in his bed and starts screaming. In the neutral emotion group the following clips were used: an excerpt from Hannah And Her Sisters (dir. Woody Allen) depicting discussions between different people; and a scene from All The President's Men (dir. Alan J. Pakula), depicting two journalists trying to crack the Watergate conspiracy. The films were presented in counterbalanced order, one before any of the scenarios and the other in the middle of the scenarios. The participants evaluated how arousing and disgusting the videos were on a Likert scale from 1 (not at all disgusting/arousing) to 5 (extremely disgusting/arousing). The question about arousal was included to conceal the purpose of the study.

After all the moral scenarios, the participants evaluated how strongly they felt joy, surprise, disgust, fear, sorrow, and anger on a scale from 1 (don't feel at all) to 5 (feel very strongly). Finally, the purpose of the study was revealed to them (after thanking the participant, they were informed that “the real purpose of the study was to examine the effect of disgust on moral ratings”) and they got to answer whether they had guessed its purpose (“did you guess the real purpose of the study while filling in the questionnaires?”).

Results

Political orientation on the left-right (LeftRight) and liberal-conservative (LibCon) axes was coded based on Kivikangas’ (2017) analysis of politicians’ answers to questions about values in the communal elections test (“match your vote”) of Helsingin Sanomat. The politicians were grouped based on which party they belonged to, and their mean answers determined the party's overall mean in LeftRight and LibCon.

As in Experiment 1, the experimental and control groups were matched in terms of age, education, income level, LeftRight, and LibCon with threshold levels of ps ≥ .3 (sex was not matched because only females were included). Average income was higher (U = 34324, p = .24) in the control group than in the experimental group. Because this p-value fell below the matching threshold, we removed the four participants with the highest income in the control group and the three participants with the lowest income in the experimental group, resulting in a final sample of n = 441 (228 in the experimental and 213 in the control group). In the matched sample, the groups did not differ with respect to age (t = .22, p = .83, BF10 = .11), education (U = 24679, p = .75, BF10 = .11), income (U = 25553, p = .32, BF10 = .17), LeftRight (U = 17033, p = .39, BF10 = .21), or LibCon (U = 17365, p = .59, BF10 = .14). Key participant characteristics in the final sample are summarised in Table 6.

Table 6. Key participant characteristics by Group in Experiment 2.

LibCon and LeftRight correlated positively (r = .65, p < .001); that is, participants who were on the political right were also more conservative. Income and education also correlated positively (r = .24, p < .001), as did age and income (r = .42, p < .001). Other associations between the background variables were not significant. At the end of the study, 148 (33.6%) of the participants reported having guessed its purpose.

Effectiveness of the emotional induction procedure

Mood ratings in the experimental and control groups are summarised in Table 7. In terms of p-values, happiness, surprise, and disgust were experienced more in the experimental than in the control group, but the effect size was by far the largest for disgust. Bayes Factors indicated evidence for a difference only in disgust (BF10 > 100) but not in the other emotions (BF10s < 2.3).

Table 7. Self-reported mood at the end of Experiment 2.

The experimental videos were rated as more disgusting than the control videos (M = 3.78, SD = .86 vs. M = 1.50, SD = .68; U = 1590, p < .001, BF10 > 100, d = 2.92), as well as more arousing (M = 2.65, SD = .90 vs. M = 2.15, SD = .73; U = 16322, p < .001, BF10 > 100, d = .60).
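As an illustration of how the standardised effect sizes above relate to the reported means and standard deviations, the following minimal sketch computes Cohen's d with a pooled standard deviation. The exact formula used for d is not stated in the text, so the pooled-SD version is an assumption, as are the group sizes (taken from the matched sample: 228 experimental, 213 control). The function name `cohens_d` is introduced here for illustration only.

```python
import math


def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardised mean difference using the pooled standard deviation."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)


# Assumption: group sizes are those of the matched sample reported above.
N_EXPERIMENTAL = 228
N_CONTROL = 213

# Disgust ratings of the videos (experimental vs. control).
d_disgust = cohens_d(3.78, 0.86, N_EXPERIMENTAL, 1.50, 0.68, N_CONTROL)

# Arousal ratings of the videos (experimental vs. control).
d_arousal = cohens_d(2.65, 0.90, N_EXPERIMENTAL, 2.15, 0.73, N_CONTROL)

print(f"d (disgust) = {d_disgust:.2f}")
print(f"d (arousal) = {d_arousal:.2f}")
```

Under these assumptions the computed values land close to the reported d = 2.92 and d = .60; small discrepancies are to be expected because the inputs are rounded summary statistics.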

Morality ratings

Before examining the morality ratings in detail, we checked whether guessing the purpose of the study was associated with the moral ratings. To this end, we conducted a two-way ANOVA with the average moral rating (averaged across the neutral and wrong scenarios) as the dependent variable and Group (experimental/control) and Guessed (yes/no) as the between-subjects factors. There was no main effect of Group (F < .001, p > .99, BF01 = 8.91) or Guessed (F = .74, p = .39, BF01 = 6.44), nor an interaction between them (F = .33, p = .57, BF01 = 5.52 [the null model included main effects of Group and Guessed]). In terms of Bayes factors, there was substantial evidence for the null hypothesis that guessing the purpose of the study did not affect the results; thus, we proceeded to the main analysis without further exclusions.

Morality ratings in each of the scenarios and means for the neutral and wrong scenarios per group are summarised in Table 8. To test whether the neutral scenarios were really rated as neutral and the wrong scenarios as wrong independently of any emotional induction (i.e. in the control group), one-sample t-tests were used to compare the mean morality ratings with the value representing neutral on the scale. In terms of the p-value, Mean Neutral (M = 3.83) was significantly lower (more wrong) than 4, which represented morally neutral on the scale (t = −2.48, p = .014), but there was no Bayesian evidence for such a difference (BF10 = 1.54). We considered BFs more trustworthy than p-values, which easily reach significance in large samples, and concluded that the neutral scenarios were indeed considered morally neutral on average. Mean Wrong, in turn, was significantly lower (more wrong) than 4 (t = −37.47, p < .001), and the difference was supported by the Bayesian analysis (BF10 > 100).

Table 8. Ratings of the individual scenarios and means by Group.

Main effects of Scenario Type (neutral vs. wrong), Group, and their interaction were examined in an ANOVA with the mean morality rating as the dependent variable, Scenario Type as the within-subjects factor, and Group as the between-subjects factor. There was no main effect of Group (F = .13, p = .72, ηp² = .00, BF01 = 12), but a significant effect of Scenario Type (F = 896, p < .001, ηp² = .67, BF10 > 100). There was no interaction between Group and Scenario Type (F = .086, p = .77, ηp² = .00, BF01 = 8.77 [the null model included the main effects of Group and Scenario Type]). That is, in terms of the Bayes Factors, there was substantial evidence for the null hypothesis that disgust does not affect moral ratings.
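The partial eta squared values above can be related to the F statistics through the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to the reported F values. The degrees of freedom are not reported in the text; the error df of 439 (final sample of 441 minus two groups) and the effect df of 1 are assumptions made for illustration, and `partial_eta_squared` is a hypothetical helper name.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Assumption: error df = n - number of groups = 441 - 2; each effect has 1 df.
DF_EFFECT = 1
DF_ERROR = 441 - 2

eta_scenario = partial_eta_squared(896, DF_EFFECT, DF_ERROR)      # Scenario Type
eta_group = partial_eta_squared(0.13, DF_EFFECT, DF_ERROR)        # Group
eta_interaction = partial_eta_squared(0.086, DF_EFFECT, DF_ERROR) # Group x Scenario Type

print(f"Scenario Type: {eta_scenario:.2f}")
print(f"Group:         {eta_group:.2f}")
print(f"Interaction:   {eta_interaction:.2f}")
```

With these assumed degrees of freedom, the Scenario Type effect comes out at roughly two thirds of the variance not attributable to other effects, whereas the Group and interaction effects round to zero, in line with the values reported above.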

Next, we examined whether the background variables moderated the effect of Group. We used an ANCOVA with Group as the between-subjects factor, Scenario Type as the within-subjects factor, and age, income, education, LeftRight, and LibCon as the covariates. In this model, there was again no main effect of Group (F = .24, p = .62, ηp² = .00, BF01 = 10), but there was a main effect of Scenario Type (F = 21.75, p < .001, ηp² = .055, BF10 > 100). There was no interaction between Group and Scenario Type (F = .082, p = .78, ηp² = .00, BF01 = 7.98 [the null model included the main effects of Group and Scenario Type]). None of the covariates predicted the moral ratings (ps > .06, BF10s < 1.2), nor did they interact with Group (ps > .06, BF01s > 1.80 [each Covariate × Group interaction was added individually and tested against the null model that included the main effects of all the Covariates, Group, and Scenario Type]).

Discussion of Experiment 2

In Experiment 2, we aimed to test the differential effect of disgust on neutral and wrong moral scenarios. The amplification hypothesis predicts that incidental disgust makes judgments of morally wrong actions more severe but does not affect the neutral scenarios; the moralisation hypothesis, as the stronger theory, predicts that disgust can make otherwise morally neutral scenarios condemnable. The emotional induction was successful, as participants in the experimental group felt substantially more disgusted than participants in the control group, with no notable differences in other emotions. Despite this, there was neither a main effect of induced disgust nor an interaction between disgust induction and type of scenario. On the contrary, Bayes Factors provided substantial evidence against an effect of disgust. These findings conflict with both the stronger moralisation hypothesis and the weaker amplification hypothesis.

General discussion

The moralisation hypothesis (that disgust can moralise otherwise neutral actions) has not been systematically tested before, because previous studies have mainly used as stimuli scenarios that are judged to be morally wrong even without any emotional induction. Previous evidence for the moralisation hypothesis is mainly correlational (e.g. Haidt et al., Citation1993; Haidt & Hersh, Citation2001) or derived from individual cases (e.g. the student council case of Wheatley & Haidt, Citation2005). Thus, the present study was the first in which it was systematically tested with large samples. We tested the moralisation hypothesis in two experiments that involved morally neutral scenarios, that is, scenarios that are judged as morally neither wrong nor right when no emotional induction is used. Additionally, we tested the differential effect of disgust on neutral and wrong variants of the same scenarios. The results of Experiment 1 provided substantial evidence against the hypothesis that disgust would moralise neutral actions. Experiment 2 provided substantial evidence against both the moralisation hypothesis and the amplification hypothesis: disgust did not moralise neutral actions, nor did it amplify the condemnation of morally wrong actions.

These results are in line with the meta-analysis of Landy and Goodwin (Citation2015), which indicated that incidental disgust does not make moral judgments more severe. On the other hand, Landy and Goodwin found some evidence for the moralisation hypothesis, which is surprising, given that the moralisation hypothesis is stronger than the amplification hypothesis: if the latter is incorrect, it is very improbable that the former would be true. However, there was evidence of publication bias in the studies addressing the moralisation hypothesis. Moreover, most of the evidence was based on raw data or unpublished theses or manuscripts. Finally, the previous studies did not systematically test whether the assumedly non-moral scenarios were indeed considered non-moral when no emotional induction was used. In contrast, in the present experiments the hypothetically neutral scenarios were in fact considered morally neutral (i.e. their ratings did not statistically differ from “neutral”). This allows a more direct test of the moralisation hypothesis than was possible in the previous studies.

Even though incidental disgust did not affect moral ratings in our experiments, this does not imply that emotion played no role in the moral judgments. It is possible that emotion elicited by the stimulus itself can affect its moral evaluation (in contrast to incidental disgust). For example, a depiction of the married man Simo kissing his female colleagues could have elicited more negative emotions than a depiction of him simply flirting with his colleagues, leading to the moralisation of the former action. However, this possibility cannot be ruled out or confirmed based on the present experiments; it would require an independent empirical test. A central problem in testing the effect of non-incidental disgust (or any other emotion) elicited by the stimulus itself is how to hold the action constant while manipulating only the emotion elicited by it, since the two are usually entangled (cf. the probes used by Nichols, Citation2002). One way to address this problem could be to manipulate the appearance of the agent performing a depicted action: we would compare an unattractive vs. an attractive person performing the same action. The hypothesis would be that actions performed by unattractive persons would elicit disgust and be evaluated as more morally wrong.

All in all, our results could be interpreted in line with classical rationalist theories of moral judgment (e.g. May, Citation2018; Royzman et al., Citation2009; Turiel, Citation1983) instead of moral emotivism. In our experiments, scenario type had a very large effect size – a trivial finding that can be taken to indicate that what mainly affects moral judgment are the conceptual processes related to the interpretation of the scenario. The effect of the scenario (wrong vs. neutral) could be interpreted in terms of conceptual processes, such as harm elicited by the action or its accordance with social norms. However, since we did not assess perceived harm or normativity, it is impossible to know what aspects of the scenarios determined the judgments. Moreover, we cannot rule out the emotivist hypothesis that some emotional processes elicited by the scenario itself could have affected the moral ratings.

Limitations and strengths

One could argue that the disgust induction was too weak to influence the aggregated moral judgments. However, the standardised effect sizes (partial eta squared .11 in Experiment 1 and .12 in Experiment 2) were similar to those in previous studies. For instance, in the Schnall et al. (Citation2008) study, effect size was .19 in Experiment 1, non-significant in Experiment 2 (effect size not reported), and .06 in Experiment 3, but they nevertheless found an amplifying effect of disgust on aggregated moral ratings in each of the experiments. Another possible limitation of the current study is that making strong conclusions based on null effects can be problematic, because absence of evidence is not typically evidence of absence. To counter this problem, we used Bayesian statistics, which provides a continuous measure of evidence for the alternative or the null hypothesis.

A central limitation of the current study pertains to the type of scenarios used. Haidt and Graham (Citation2007) divide morality into five domains: harm/care, fairness/reciprocity, ingroup/loyalty, authority/respect, and purity/sanctity. There is evidence that disgust is mainly related to actions that are evaluated on the purity/sanctity dimension (e.g. Cameron et al., Citation2013; Dasgupta et al., Citation2009; Horberg et al., Citation2009). It could be argued that disgust can only moralise an action that can potentially appear in the purity/sanctity domain. To address this question, future studies could use as probes scenarios that are considered as morally neutral but that are mildly impure (e.g. unusual forms of sexual behaviour). The challenge for this approach is, however, how to select cases that are impure but are nevertheless judged as morally neutral.

Our study could also be criticised for operationalising moralisation as a continuous variable that was assessed on a scale from morally right to wrong. Our theoretical approach was that if an action was judged as morally neither right nor wrong (i.e. on the midpoint of the scale), it was considered non-moral. However, it could be objected that the moral/non-moral difference is not a matter of degree but instead dichotomous. One could claim that actions evaluated on the midpoint of the scale were not really morally neutral, but instead morally ambivalent. In this case, a shift in the rating would not reflect the moralisation of the action, but instead something closer to an amplification effect. As a response to this criticism, we note that it is not clear to what extent morally neutral and non-moral are distinct on the conceptual level. For an action A to be non-moral, it is arguably necessary that A is also rated as morally neutral on our scale, for otherwise it would be either wrong or right (i.e. not non-moral). It is another question whether the rating of A as neutral on our scale is a sufficient condition for its being non-moral. It could be argued that it is not, because A could be morally ambivalent. For example, one might judge that Kauko's flying to Tenerife for a holiday despite knowing about the emissions is indeed a moral action (i.e. evaluable on the right-wrong axis), but that its moral value is negligible. To address these questions, future studies could more directly probe the normativity of the action (i.e. ask whether it is normative or just conventional; see Nichols, Citation2002; Rottman & Kelemen, Citation2012).

Another limitation is that we did not include a measure of body consciousness (e.g. Private Body Consciousness Scale; Miller et al., Citation1981), which has in some studies moderated effects of disgust on morality (Schnall et al., Citation2015). However, we did include the related Disgust Sensitivity scale (DS-R) in Experiment 1, which did not moderate our effects. One could also raise the concern that we did not provide the participants with a definition of what it means for an action to be morally wrong. For instance, Turiel (Citation1983) defines an immoral action as one that is wrong universally and independently of authorities, and that deserves punishment. We did not use such a definition in our operationalisation, but instead simply asked the participants to rate the depicted actions on a scale from morally wrong to right.

The main strength of the current study is that it was the first to use pre-tested neutral moral scenarios, which enabled us to test the moralisation hypothesis directly and systematically. Previous studies have mainly used morally wrong actions, which makes it impossible for them to test the moralisation hypothesis directly. Additionally, a strength of Experiment 2 is that it used neutral and wrong variants of the same moral scenarios. This reduces random variance due to the scenario formulations and enables more direct tests of the differential effect of disgust on morally neutral and wrong behaviour.

Conclusions

Earlier studies on the effect of disgust on moral ratings have used as stimuli actions that are judged to be wrong even without any emotional manipulation. Thus, they cannot test whether disgust is sufficient to moralise an action, but instead can only test the amplification hypothesis. In the present study, we tested the moralisation hypothesis by examining the effect of artificially induced disgust on neutral moral scenarios, as well as the differential effect of disgust on neutral and wrong scenarios. The results did not support the moralisation or the amplification hypotheses. Instead, Bayesian analysis lent substantial evidence for the null hypothesis that disgust has no effect on moral judgment. The results are consistent with the meta-analysis of Landy and Goodwin (Citation2015), which found no support for the effect of incidental disgust on moral judgment. Future studies should aim to replicate the findings using probes in the purity domain, which could be more sensitive to emotional manipulation.

Data availability statement

The data and materials that support the findings of this study are openly available in OSF at https://osf.io/wbyfu/, reference number DOI 10.17605/OSF.IO/WBYFU.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1 The Heinz dilemma is one of many fictitious scenarios used by Kohlberg (Citation1981) to study the stages of moral development:

A woman was near death from a special kind of cancer. There was one drug that the doctors thought might save her. It was a form of radium that a druggist in the same town had recently discovered. The drug was expensive to make, but the druggist was charging ten times what the drug cost him to produce. He paid $200 for the radium and charged $2,000 for a small dose of the drug. The sick woman's husband, Heinz, went to everyone he knew to borrow the money, but he could only get together about $1,000 which is half of what it cost. He told the druggist that his wife was dying and asked him to sell it cheaper or let him pay later. But the druggist said: ‘No, I discovered the drug and I’m going to make money from it.’ So Heinz got desperate and broke into the man's store to steal the drug for his wife. Should Heinz have broken into the laboratory to steal the drug for his wife? Why or why not? (Kohlberg, Citation1981)

2 The Bayes Factor (BF10) is an estimate of how many times more plausible the data are under the alternative hypothesis than under the null hypothesis. A BF above 1 indicates evidence for the alternative hypothesis, whereas a BF below 1 indicates evidence for the null hypothesis. The inverted BF (BF01 = 1/BF10) was used as a more intuitive measure of evidence for the null effect, because the values of the inverted BF increase as the null hypothesis receives more support. A BF between 1 and 3 can be considered weak evidence for the alternative hypothesis, 3–10 substantial, and 10–30 strong. The respective values apply to the inverted BF and the null hypothesis (Jeffreys, Citation1961).
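The interpretation scheme in this note can be sketched as a small helper that takes a BF10, inverts it when it falls below 1, and assigns the verbal label from the ranges given above. This is only an illustration: the function name `describe_bayes_factor` is hypothetical, the handling of values exactly at the category boundaries is an assumption, and values above 30 are simply flagged as lying beyond the ranges listed in the note.

```python
def describe_bayes_factor(bf10):
    """Return (favoured hypothesis, strength, verbal label) for a given BF10.

    Strength is BF10 when the alternative is favoured and the inverted
    factor BF01 = 1 / BF10 when the null is favoured.
    """
    if bf10 <= 0:
        raise ValueError("A Bayes factor must be positive.")
    if bf10 == 1:
        return ("neither", 1.0, "no evidence")

    if bf10 > 1:
        favoured, strength = "alternative", bf10
    else:
        favoured, strength = "null", 1 / bf10  # inverted Bayes factor, BF01

    # Boundary handling (e.g. exactly 3 or 10) is an assumption of this sketch.
    if strength < 3:
        label = "weak"
    elif strength < 10:
        label = "substantial"
    elif strength <= 30:
        label = "strong"
    else:
        label = "beyond the ranges listed in the note"
    return (favoured, strength, label)


# BF10 = 1.54 for the neutral scenarios versus the scale midpoint.
neutral_check = describe_bayes_factor(1.54)

# BF01 = 8.77 for the Group x Scenario Type interaction, expressed as BF10.
interaction_check = describe_bayes_factor(1 / 8.77)

print(neutral_check)
print(interaction_check)
```

Applied to two values reported in Experiment 2, the helper labels BF10 = 1.54 as weak evidence for the alternative hypothesis and BF01 = 8.77 as substantial evidence for the null hypothesis.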

References