
Negative content in auditory verbal hallucinations: a natural language processing approach

Pages 139-149 | Received 26 Feb 2021, Accepted 07 Jun 2021, Published online: 21 Jun 2021

ABSTRACT

Introduction

Negative content of auditory verbal hallucinations (AVH) is a strong predictor of distress and impairment. This paper quantifies emotional voice-content in order to explore both subjective (i.e. perceived) and objective (i.e. linguistic sentiment) negativity, and investigates their associations with distress.

Methods

Clinical and non-clinical participants with frequent AVH (n = 40) repeated and recorded their AVH verbatim directly upon hearing. The AVH were analyzed for emotional valence using Pattern, a rule-based sentiment analyzer for Dutch. The AVH of the clinical individuals were compared to those of non-clinical voice-hearers on emotional valence and associated with experienced distress.

Results

The mean objective valence of AVH in patients was significantly more negative than that of non-clinical voice-hearers. In the clinical individuals, a larger proportion of the voice-utterances was negative in objective valence (34.7% versus 18.4%). The linguistic valence of the AVH showed a significant, strong association with the perceived negativity, amount of distress and disruption of life, but not with the intensity of distress.

Conclusions

Our results indicate that AVH of patients have a more negative linguistic content than those of non-clinical voice-hearers, which is associated with the experienced distress. Thus, patients not only perceive their voices as more negative, objective analyses confirm this.

Introduction

Auditory verbal hallucinations (AVH) are a cardinal feature of psychosis and one of the most common positive symptoms in schizophrenia (Baethge et al., 2005). They also occur in individuals without a psychiatric or neurological disorder, with a median reported prevalence of around 9.6% (Maijer et al., 2018). A recent population study found that up to 29.4% of the general population reported the experience of AVH over the course of a month (Linszen et al., in press) when a sensitive questionnaire was used (Schutte et al., 2018). AVH in non-clinical and clinical individuals are similar in terms of loudness, personification and number of voices heard, but the perceived emotional content differs, with a tendency towards negatively valenced content in patients (Daalman et al., 2011). Negative voice-content appears to be one of the major differences between clinical and non-clinical voice-hearers (Larøi, 2012) and is a strong predictor of experienced distress and impairment in daily functioning (Larøi et al., 2019).

It is not yet clear how we should define “negative” voice-content. Linguistic voice-content assessments, based on emotional valence estimations for individual words, may not lead to valid estimations of the negative content, since words are best interpreted in their context. Personal memories or experiences can give a certain passage a negative meaning, although the meanings of its constituent words might appear neutral or even positive. For example, one patient strongly disliked that his AVH addressed him by his last name, as the children who bullied him at school used to do.

In a previous study on this topic (van der Gaag et al., 2003), voice-content was rated by two independent raters. Their results indicate that both positive and negative voice-content assessed by the raters is interpreted as such by voice-hearers. However, content assessed to be neutral by independent raters could still be interpreted as either positive or negative by the voice-hearers. This finding indeed confirms that seemingly neutral voices can have a personal negative/positive valence, perhaps depending on adverse life experiences or affective processing alterations in clinical voice-hearers (Aleman & Kahn, 2005; Cohen & Minor, 2010; Reiff et al., 2012). This could lead to the hypothesis that clinical and non-clinical individuals with AVH have similar voices in terms of linguistic emotional valence, but differ in the processing or interpretation of the voice-content. The cause for more severe distress from AVH in clinical voice-hearers would then lie in affective processing, rather than in the objective valence of the AVH.

Little is known about the objective linguistic emotional valence of AVH. A recent preliminary study (n = 6) explored the emotional content of AVH compared to general inner thoughts based on linguistic emotional valence, suggesting that AVH were more negative than inner thoughts (Turkington et al., 2019). A previous study by our group (De Boer et al., 2016) showed that the AVH of individuals with a psychotic disorder contained more terms of abuse than AVH of non-clinical individuals.

Given the consistent association between negative emotional content and distress engendered by AVH, reducing negative (interpretations of) voice-content is an often applied approach for cognitive behavioural therapy (CBT) in patients with distressing hallucinations. To further inform such lines of treatment, detailed knowledge about the emotional content of AVH is essential.

The current study has two aims. First, we examine the emotional valence of voice-content using linguistic sentiment analysis in clinical and non-clinical voice-hearers. Sentiment analysis is a method in natural language processing that aims to quantify the emotional polarity or valence of text, which can be negative, neutral or positive. Second, we assess the relation between linguistic sentiment and perceived negativity and distress in both clinical and non-clinical voice-hearers. By assessing linguistic emotional valence as well as self-rated perceived negativity, we aim to establish whether voice-content is objectively more negative in clinical voice-hearers, or whether they process their voice-content in such a way that it leads to a more subjectively negative perception.

Based on previous work by our group (Daalman et al., 2011; De Boer et al., 2016), we hypothesise that both objective and subjective voice-content in clinical voice-hearers is more negative than in non-clinical voice-hearers. We further expect objective voice-content to be predominantly negative in the clinical voice-hearers, and predominantly positive in the non-clinical group. Finally, we expect objective voice-content to be strongly associated with subjective negativity, distress and disruption of life in both groups.

Methods

Participants

All participants experienced persistent AVHs (i.e., at least once a month for over a year). A total of 40 participants were included: 21 patients with a psychotic disorder and 19 non-clinical participants who experienced AVH. Participants were included if they heard voices at least daily. Non-clinical participants were recruited through a Dutch website; for full methodology see previous reports on this sample (Daalman et al., 2011; De Boer et al., 2016). The non-clinical voice-hearers were screened for the absence of a psychiatric disorder by psychiatrists using the Comprehensive Assessment of Symptoms and History (CASH) interview (Andreasen et al., 1992) and the Structured Clinical Interview for Personality Disorder (SCID-II) (First et al., 1997). Non-clinical voice-hearers were excluded if (1) they had a diagnosis of, or treatment for, psychiatric disorders other than depressive or anxiety disorders in complete remission; or (2) they had a history of alcohol or drug abuse in the past 3 months. The Psychotic Symptom Rating Scale (PSYRATS) for auditory hallucinations was used to assess the phenomenological characteristics of the hallucinations (Haddock et al., 1999). All procedures were approved by the Ethical Review Board of the University Medical Center Utrecht. All participants gave written informed consent.

Procedures

Shadowing procedure

The shadowing procedure was conducted at the University Medical Center Utrecht. Participants were instructed to repeat their AVH out loud, verbatim, directly upon hearing them for the duration of one minute. They were further instructed to repeat their AVH with the same intonation, loudness, and pronunciation as the voice(s) they perceived. Their verbatim repetitions were recorded using a voice-recording device. Voice recording started with the onset of the participants' repetition of the AVH and was stopped after one minute. This procedure was repeated three times per participant in the same session, resulting in three minutes of recorded voice-speech. Some participants experienced AVH almost continuously, whereas others had less frequent AVH on the day of the recording. The procedure lasted between 10 and 30 min, depending on the frequency of the AVH. Recordings were saved as .wav files.

Language analyses

The shadowing audio files were transcribed using CLAN software, according to the CHILDES manual (MacWhinney, 2000). All transcriptions were made by trained linguistics students who were native speakers of Dutch, and were blinded to the presence of a clinical disorder. The transcriptions were divided into utterances. Utterance boundaries were determined on the basis of prosodic and semantic coherence.

Sentiment analyses were performed using Pattern (https://github.com/clips/pattern), an open-source Python package for natural language processing. The Dutch submodule contains a rule-based sentiment analyser, which is based on a lexicon of about 4000 Dutch lemmas. The algorithm takes into account downtoners, amplifiers and negations. Downtoners are adverbs that diminish the sentiment of an adjective (e.g., “nearly dark”), whereas amplifiers strengthen it (e.g., “very dark”), and negations assert that something is not the case (e.g., “not dark”). Following previous research (Nazareth et al., 2019), this lexicon was expanded using the lexicon of Moors et al. (2013), which contains valence scores for approximately 4300 Dutch words. These valence scores were rescaled to the [−1; 1] range Pattern uses and were added to the lexicon, along with their corresponding part of speech (POS) tags. The final lexicon contained 8218 different Dutch nouns, verbs, adverbs and adjectives. This lexicon is a selection of the Dutch language vocabulary, which is estimated to consist of at least 1 million words. Pattern’s “parse” and “split” functions were used to annotate words with their POS tags. The “sentiment” function was used to calculate the sentiment of each utterance. Mean valence scores were calculated per participant by averaging over all utterances. The variance of valence was calculated as the standard deviation of the valence of all utterances per participant. The minimal and maximal valence scores were taken as the valence of the utterances with the lowest and highest valence per participant, respectively. Valence scores in the [−.03; .03] range were considered neutral, conforming to previous research (Nazareth et al., 2019). On average, 31 utterances were obtained per participant, of which 17 received a valence score that was used in the analyses. One clinical voice-hearer only heard English hallucinations during the shadowing procedure and was therefore excluded from the analyses.
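The scoring pipeline described above can be illustrated with a minimal sketch. It is not Pattern's implementation and not the authors' code: the toy lexicon, the modifier weights, the sign-flip treatment of negation and all function names are hypothetical, and the 1 to 7 source scale assumed for rescaling is not stated in the text. Only the [−.03; .03] neutral band and the per-participant summary measures are taken from the description above. The sample standard deviation is used here, which is also an assumption.

```python
from statistics import mean, stdev

# Hypothetical toy lexicon and modifier weights; NOT Pattern's actual values.
LEXICON = {"dark": -0.4, "good": 0.7}
AMPLIFIERS = {"very": 1.5}
DOWNTONERS = {"nearly": 0.5}
NEGATIONS = {"not"}
NEUTRAL_BAND = 0.03  # scores in [-0.03, 0.03] count as neutral (from the text)


def rescale(score, lo=1.0, hi=7.0):
    """Linearly map a rating on [lo, hi] onto [-1, 1].

    The 1-7 default is an assumption about the source scale.
    """
    return 2.0 * (score - lo) / (hi - lo) - 1.0


def score_utterance(text):
    """Mean valence of recognised words, or None if no word is in the lexicon."""
    scores, factor = [], 1.0
    for token in text.lower().split():
        if token in AMPLIFIERS:
            factor *= AMPLIFIERS[token]
        elif token in DOWNTONERS:
            factor *= DOWNTONERS[token]
        elif token in NEGATIONS:
            factor *= -1.0  # simplification: negation flips the sign
        elif token in LEXICON:
            # Apply pending modifiers, clamp to [-1, 1], then reset.
            scores.append(max(-1.0, min(1.0, LEXICON[token] * factor)))
            factor = 1.0
        else:
            factor = 1.0  # unknown word: pending modifiers are dropped
    return mean(scores) if scores else None


def label(valence):
    """Classify a valence score using the neutral band."""
    if valence < -NEUTRAL_BAND:
        return "negative"
    if valence > NEUTRAL_BAND:
        return "positive"
    return "neutral"


def summarise(valences):
    """Per-participant summary over all scored utterances."""
    return {
        "mean": mean(valences),
        "sd": stdev(valences),
        "min": min(valences),
        "max": max(valences),
    }


# Hypothetical utterances for one participant; the last one is unrecognised.
utterances = ["very dark", "not dark", "nearly dark", "hello there"]
scored = [v for v in map(score_utterance, utterances) if v is not None]
labels = [label(v) for v in scored]
summary = summarise(scored)
```

Utterances without any lexicon word return None and are left out of the summary, which mirrors the fact that only part of the utterances received a valence score.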

Statistics

All statistical analyses were run in IBM SPSS Statistics version 25.0.0.2. Participant characteristics were compared between groups using an analysis of variance (ANOVA) for continuous values, and χ² tests for categorical values. ANOVAs were used to compare the linguistic emotional valence characteristics between groups. The grouping variable was the presence/absence of a psychotic disorder. Mann–Whitney U tests were used to assess differences in the phenomenological characteristics of AVHs between the two groups. The phenomenological outcome measures were derived from the PSYRATS. A χ² test was used to test differences in valence distributions across groups. Bivariate Pearson’s correlations were used to assess the association between linguistic valence and the characteristics of AVHs. Alpha was set at .05 for all analyses.
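For readers who want to see what two of these tests compute, the following is a minimal sketch in plain Python. The analyses themselves were run in SPSS; the function names and the example numbers below are hypothetical and serve only to show the arithmetic behind a Pearson correlation and a χ² test of independence.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def chi_square(table):
    """Chi-square statistic and degrees of freedom for a contingency table.

    `table` is a list of rows of observed counts (e.g. groups x categories).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical data: mean valence per participant and a distress rating.
valence = [-0.4, -0.1, 0.2, 0.5]
distress = [5, 4, 2, 1]
r = pearson_r(valence, distress)

# Hypothetical 2 x 2 table of counts (rows: groups, columns: categories).
stat, df = chi_square([[10, 20], [20, 10]])
```

In this toy data a more negative valence goes with higher distress, so `r` is negative, the same direction as the correlations reported in the Results.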

Results

Demographic characteristics of the clinical and the non-clinical voice-hearers are presented in Table 1. Clinical and non-clinical voice-hearers did not differ in age or sex. The non-clinical voice-hearers had a younger age of onset of AVH than the clinical voice-hearers. None of the non-clinical voice-hearers had a history of depression or anxiety disorders or used psychotropic medication. One of the clinical voice-hearers had a comorbid borderline personality disorder. The proportion of voice-utterances that were scored using Pattern differed between groups (F (1, 38) = 8.46, partial η² = .181, p = 0.006); Pattern recognised a greater proportion of the non-clinical voice-utterances than of the clinical voice-utterances. The mean valence of the voice-utterances significantly differed between the groups (see Table 1).

Table 1. Demographics and AVH valence characteristics.

Phenomenological characteristics, including perceived negativity of the voices, are presented in Table 2. The voice-utterances (clinical n = 338, non-clinical n = 310) showed significantly different objective (i.e., linguistic sentiment) valence distributions in the clinical versus the non-clinical voice-hearers (χ² (2, n = 648) = 23.76, φ = .192, p < 0.0001), see Figure 1. In the clinical individuals, 34.7% of the voice-utterances were objectively negative, 6.8% were neutral and 58.5% were positive. In the non-clinical voice-hearers, 18.4% of the voice-utterances were objectively negative, 4.9% were neutral and 76.7% were positive. Post-hoc analyses revealed that the distribution of objectively positive versus negative voice-utterances differed between groups (χ² (1, n = 606) = 23.26, φ = .196, p < 0.0001), whereas the distribution of objectively positive versus neutral or negative versus neutral voice-utterances was not significantly different between groups (p > 0.05).
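The reported effect sizes and p-value bounds can be checked from the test statistics alone. The sketch below assumes the usual definition φ = √(χ²/n) and uses the closed-form upper tail of the χ² distribution for 1 and 2 degrees of freedom; the function names are illustrative and this is not the authors' code.

```python
from math import erfc, exp, sqrt


def phi(chi2, n):
    """Effect size phi = sqrt(chi2 / n)."""
    return sqrt(chi2 / n)


def chi2_p_value(chi2, df):
    """Upper-tail p-value of the chi-square distribution for df = 1 or 2."""
    if df == 1:
        return erfc(sqrt(chi2 / 2.0))
    if df == 2:
        return exp(-chi2 / 2.0)
    raise ValueError("closed form implemented for df = 1 or 2 only")


# Values as reported in the text.
phi_overall = phi(23.76, 648)      # three-category comparison
phi_posthoc = phi(23.26, 606)      # positive versus negative only
p_overall = chi2_p_value(23.76, 2)
p_posthoc = chi2_p_value(23.26, 1)
```

Both recomputed φ values agree with the reported .192 and .196 to within about 0.001, a difference attributable to rounding of the reported χ² values, and both p-values fall below 0.0001.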

Figure 1. Distribution of sentiment of the AVH in clinical and non-clinical voice-hearers. Mean emotional valence of the AVH ranges from −1 to 1, where 1 indicates highly positive valence and −1 highly negative valence. Absolute frequencies are displayed. Valence scores of 0 are considered neutral.


Table 2. Comparison of AVHs in clinical and non-clinical voice-hearers.

When looking at the linguistic sentiment distribution of voice-utterances over the participants, our results indicate that 65% of the clinical voice-hearers predominantly heard positive voices, 25% predominantly heard negative voices and 10% heard an equal amount of positive and negative voices. Of the non-clinical voice-hearers, 90% heard predominantly positive voices, 5% heard negative voices and 5% heard an equal amount of positive and negative voices. An example of one of the clinical voice-utterances classified as positive was “once your turn will come” (translated from the original Dutch “eens kom je aan de beurt”), whereas a negative utterance was “that bitch must die” (translated from the original Dutch “dat wijf moet dood”).
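The participant-level classification used here can be sketched as follows. This is an illustrative reading of the description, not the authors' procedure: it assumes that each utterance already carries a label, that only positive and negative utterances are counted, and that neutral utterances are ignored. The function name and the example labels are hypothetical.

```python
from collections import Counter


def predominant_valence(labels):
    """Return 'positive', 'negative' or 'equal' for one participant.

    Only positive and negative utterances are compared; neutral ones are
    ignored (an assumption, since the text does not say how they are treated).
    """
    counts = Counter(labels)
    positive, negative = counts["positive"], counts["negative"]
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "equal"


# Hypothetical utterance labels for three participants.
participant_a = ["positive", "positive", "negative", "neutral"]
participant_b = ["negative", "negative", "positive"]
participant_c = ["positive", "negative", "neutral"]
result = [predominant_valence(p) for p in (participant_a, participant_b, participant_c)]
```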

The mean objective valence was strongly associated with the amount and intensity of perceived (subjective) negativity (r = −.619, p = 0.001, r = −.474, p = 0.005 respectively), amount of distress (r = −.579, p = 0.004), and disruption of life (r = −.409, p = 0.016), whereas no significant association was found between objective valence and the intensity of distress (r = −.295, p = 0.090).

Discussion

In line with our expectations, we found that the AVH utterances of patients with a psychotic disorder had a more negative linguistic emotional valence than those of non-clinical voice-hearers. Our findings are in line with previous research that shows a preponderance of negative self-rated voice-content in patients (Daalman et al., 2011; Larøi, 2012; Larøi et al., 2019). We extend these findings by showing that this tendency remains when objectively quantified, in the absence of information on linguistic context. Moreover, in contrast to expectations, most clinical and non-clinical voice-hearers predominantly heard objectively positive voices, yet the proportion of positive versus negative and neutral voices was smaller in the patients. The perceived negativity, the amount of distress from the voices and the disruption of life by the voices were strongly associated with the mean linguistic emotional valence of the voices, whereas the intensity of distress from the voices was not.

Our study has both scientific and therapeutic implications. First, we have shown that even in the absence of linguistic context, patients’ AVH contain more objectively negative content than AVH of non-clinical voice-hearers. This suggests that AVH language in patients is more often negative (objectively), independent of potential alterations in emotional processing, personal memories or negative associations that may additionally affect the perceived negativity. A prominent pathophysiological model for explaining negative voice-content suggests that AVH result from activation of the right hemisphere Broca’s area homologue, which is associated with the production of swear words (Sommer et al., 2008; Sommer & Diederen, 2009). However, since we did not test swear words in this project, we were unable to assess whether our results are in line with this framework.

Second, CBT for AVH is currently focused on changing voice-hearers’ beliefs about their voices, based on the cognitive model of hallucinations (Chadwick & Birchwood, 1994), which suggests that “distress and behavioural repertoire in voice-hearers is most closely tied to beliefs about voices, irrespective of content” (Larøi et al., 2019; Peters et al., 2012, p. 1507). Whereas we do not deny the importance of a person’s beliefs in the generation of distress, our findings show that distress is closely tied to content, even when beliefs are left out of the equation. Solely focusing on beliefs about the voices might therefore not be sufficient to alleviate the distress. Indeed, whereas CBT has proven effective for AVH, effect sizes are relatively small and there is no evidence that CBT changes the perceived malevolence of voices (Sommer et al., 2012; van der Gaag et al., 2014). Our findings may contribute to developing additional angles for CBT. For example, our results show that although patients hear more objectively negative voices than non-clinical voice-hearers, both groups also hear objectively positive voices. Therefore, an additional aim in CBT could be to shift some of the attentional weight from the negative towards the positive voice-utterances, in an attempt to relieve some of the distress. This could be achieved by training a person’s metalinguistic skills (Bialystok & Ryan, 1985; Tunmer et al., 1988), which can enhance their ability to focus on the positive or negative valence of the voice-utterances, rather than on the content itself. Other metalinguistic approaches to hallucinations include focusing on the grammatical structure of the voices (Corona Hernández et al., in press), or reducing negative associations by replacing them with positive word associations (Moritz et al., 2007; Moritz & Jelinek, 2011; Moritz & Russu, 2013).

This study has some limitations. First, the sample size is small. Second, although we extended the emotional valence tool with an additional set of words, only ∼60% of all voice-utterances were recognised by Pattern. A word category that is not included in Pattern is swear words. Previous work by our group (De Boer et al., 2016) indicates that the voices of patients contain more swear words, yet these were not rated by Pattern. This likely affected our results and may explain in part why the clinical voice-hearers also predominantly heard positive voices and why a smaller proportion of the utterances of the clinical voice-hearers were recognised by Pattern. Third, all patients were recruited through the “voices clinic” of the UMC Utrecht, which is an outpatient clinic for patients with chronic AVH. This may have led to a selection bias, since only patients who regard their voices as distressing come to the clinic for treatment. Fourth, although we did not exclude lifetime diagnoses of depression or anxiety disorders in the non-clinical voice-hearers, none of the non-clinical participants had a history of mental illness. This may have influenced the perceived distress or characteristics of the voices. Finally, the AVH were obtained using the shadowing procedure, which has several limitations. Shadow recordings started with the onset of AVH and stopped after one minute. Some participants experienced AVH for the full duration of the recording, whereas others did not. This may have influenced the amount of AVH captured on record. Further, the participant is trusted to repeat the contents of their hallucinations correctly, which makes the recordings subjective. Also, as a result of the use of this method, the participant focuses on the AVH, which could change the cognitive processing of the hallucination. This could result in a recording that is not representative of the AVH that are generally experienced by the participant. However, it has to be acknowledged that there is no more direct way to gain access to the content of AVH, as this is a strictly private experience. In future studies, representativeness may be checked by asking participants to rate, on a Likert scale, the resemblance between the recorded AVH and the AVH they generally hear.

It is important to note that sentiment analysis, in general, has its limitations; for example, it is less capable of dealing with highly complex sentences and performs less accurately in new domains (Astya, 2017). Replication studies are required to establish whether sentiment analyses are indeed accurate at capturing negative voice-content. Future research should also focus on more in-depth linguistic analyses of the differences between clinical and non-clinical voices since, for example, the use of power or politeness by the voices can shed light on the relationship individuals have with their voices (Demjen et al., 2020), which could provide additional angles for CBT.

In conclusion, we have shown that both clinical and non-clinical voice-hearers predominantly hear positive voices, yet the proportion of objectively negative versus positive voices is larger in patients. The linguistic emotional valence of voices is strongly associated with the perceived distress and disruption of life, irrespective of context or personal memories. This has important implications for additions to current CBT regimes, since current models are based on the idea that distress in voice-hearers is caused by their beliefs about the voices, irrespective of their content. Our findings suggest that the opposite also holds: distress is closely tied to the content of the voices, irrespective of personal beliefs.

Acknowledgments

We would like to thank those who participated in the study and lab members for their help in processing and collecting data.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Notes on contributors

J. N. de Boer

Janna de Boer: Conceptualisation, Methodology, Formal analysis, Writing – Original draft. Hugo Corona Hernández: Writing – Review & Editing. Frank Gerritse: Methodology, Software. Sanne Brederoo: Writing – Review & Editing. Frank Wijnen: Writing – Review & Editing, Supervision. Iris Sommer: Writing – Review & Editing, Supervision, Funding acquisition.

References