Original Articles

Financial reward has differential effects on behavioural and self-report measures of listening effort

Pages 900-910 | Received 10 Jun 2020, Accepted 29 Jan 2021, Published online: 25 Feb 2021

Abstract

Objectives

To investigate the effects of listening demands and motivation on listening effort (LE) in a novel speech recognition task.

Design

We manipulated listening demands and motivation using vocoded speech and financial reward, respectively, and measured task performance (correct response rate) and indices of LE (response times (RTs), subjective ratings of LE and likelihood of giving up). Effects of inter-individual differences in cognitive skills and personality on task performance and LE were also assessed within the context of the Cognitive Energetics Theory (CET).

Study sample

Twenty-four participants with normal hearing (age range: 19–33 years, 6 male).

Results

High listening demands decreased the correct response rate and increased RTs, self-rated LE and self-rated likelihood of giving up. High financial reward increased subjective LE ratings only. Mixed-effects modelling showed small fixed effects for competitiveness on LE measured using RTs. Small fixed effects were found for cognitive skills (lexical decision RTs and backwards digit span) on LE measured using RTs and correct response rate, respectively.

Conclusions

The effects of listening demands on LE in the speech recognition task aligned with CET, whereas predictions regarding the influence of motivation, cognitive skills and personality were only partially supported.

Introduction

Listening effort (LE) has been defined as ‘the mental exertion required to attend to, and understand, an auditory message’ (McGarrigle et al. 2014, p. 434). A number of subjective (e.g. the NASA task load index (NASA-TLX); Hart and Staveland 1988), behavioural (e.g. RTs) and physiological (e.g. cardiac reactivity) measures of LE have been proposed (see McGarrigle et al. 2014 for a review). The Framework for Understanding Effortful Listening (FUEL) defines LE as the ‘deliberate allocation of mental resources to overcome obstacles in goal pursuit when carrying out a [listening] task’ (Pichora-Fuller et al. 2016, p. 10S). According to this definition, the allocation of cognitive resources is not an automatic response to increased listening demands, but occurs only when a listener is motivated to achieve a particular goal.

The conceptual understanding of motivation outlined by FUEL builds upon Motivational Intensity Theory (MIT; Brehm and Self 1989), which posits that effort expenditure is influenced by both motivation and task demands. Most importantly, MIT predicts an interaction between listening demands and motivation. For relatively easy tasks, resource conservation limits the influence of motivation, such that the amount of effort expended never exceeds that required, regardless of motivation level. In contrast, for relatively hard tasks, effort is mobilised in line with motivation; the greater the importance of success, the more effort is exerted. Thus, MIT predicts an interaction between listening demands and motivation driven by a greater influence of motivation at higher demands, as seen in some previous LE studies (e.g. Kahneman and Beatty 1966; Richter 2016; Mirkovic et al. 2019; Zhang et al. 2019).

Nevertheless, there are inconsistencies both within and between studies that do not fit with MIT predictions based solely on task demands and motivation. When multiple LE measures are used within the same study, these do not always show consistent effects of listening demands and motivation; for example, a tone discrimination task resulted in changes in cardiac reactivity but no effects on performance accuracy or response times (RTs; Richter 2016). These differential effects may be due to the multi-dimensionality of LE, with different outcome measures tapping into interrelated aspects of the LE construct (McMahon et al. 2016; Strauss and Francis 2017; Hughes et al. 2018; Strand et al. 2018; Alhanbali et al. 2019; Herrmann and Johnsrude 2020). Moreover, additional factors may moderate the relationship between task difficulty and motivation, such as different ways of operationalising motivation (Picou and Ricketts 2014; cf. Koelewijn et al. 2018; Richter 2016). For instance, no significant effect on RTs was found when motivating participants with financial reward (Richter 2016), but ‘threat of evaluation’ decreased RTs in an auditory oddball task (Carrillo de la Peña and Cadeveira 2000). Evaluative threat may increase arousal, a factor which has been demonstrated to result in faster RTs (Hackley and Valle-Inclán 1998); in other task designs increased LE may be reflected in slowed responses due to greater cognitive processing (Pisoni and Tash 1974; Marslen-Wilson and Tyler 1980). The slowing of RTs with increased listening demands has been interpreted as reflecting increased LE (Houben, van Doorn-Bierman, and Dreschler 2013). However, RTs are not a ‘process pure measure’ (Pichora-Fuller et al. 2016, 19S); therefore, changes in RT may also reflect other aspects of cognition, such as memory.

One aspect that is not considered by FUEL/MIT but which may well influence motivation and effort expenditure is personality traits, such as the need for closure. Need for closure refers to the strength of an individual’s preference for clear, ordered and stable knowledge compared to confusion, uncertainty and ambiguity (Kruglanski and Webster 1996; Roets and van Hiel 2008; Viola et al. 2015). Participants with high need for closure, measured using the Need for Closure Scale (Kruglanski and Webster 1996), tend to choose less effortful means to achieve closure (see Kruglanski (2004) for an overview), but may exert greater effort when only effortful means to achieve closure are available (Kruglanski, Peri, and Zakai 1991; Kruglanski, Webster, and Klem 1993; Klein and Webster 2000; Richter, Baeriswyl, and Roets 2012; Sankaran, Szumowska, and Kossowska 2017). Another personality trait that may be important is achievement motivation, which refers to an individual’s desire to achieve success and accomplish challenging goals (Capa, Audiffren, and Ragot 2008). The strength of this motive relative to an individual’s desire to avoid failure determines resultant achievement motivation (McClelland et al. 1953). Individuals who are high in resultant achievement motivation may exert more effort than those who are low in achievement motivation (Beh 1990; Capa, Audiffren, and Ragot 2008; Hinsz and Jundt 2005; Humphreys and Revelle 1984).

A model from the wider field of effort research, the Cognitive Energetics Theory (CET) (Kruglanski et al. 2012), accommodates many of the aspects of motivation discussed above, and may thus be particularly well suited to elucidate current inconsistencies in LE research. Although also building upon MIT, CET incorporates personality factors and individual differences in resource capacity into a theory of effort and performance. CET therefore offers a more comprehensive model of motivation and effort compared to FUEL (Pichora-Fuller et al. 2016). In addition, CET is intended to apply to ‘all instances of goal-directed thinking’ (Kruglanski et al. 2012, p. 3), whereas FUEL posits that LE involves the ‘deliberate’ allocation of resources. Thus, CET is applicable to subjective effort associated with goal pursuit (which may be measured using self-rated outcomes), objective effort (which may be indexed using behavioural and physiological outcomes) and performance accuracy.

CET describes the underlying decision-making process behind effort investment in terms of opposing forces: a ‘driving force’ towards exerting effort (which depends upon goal importance, i.e. how motivated a person is to succeed in the task, and resource availability) and a ‘restraining force’ towards restricting effort (which depends upon task demands but also an individual’s tendency to conserve resources). The balance between these forces is assumed to govern how much effort is exerted. CET makes a distinction between the maximum energy an individual is willing to mobilise to achieve a specific goal and the actual energy used, which depends upon several factors, including individual differences in the tendency towards resource conservation.

Application of CET to a speech recognition task

In CET, the individual’s assessment of the importance of goal achievement, and the size of their resource pool, determines the magnitude of the ‘potential’ driving force (Kruglanski et al. 2012). Within a speech recognition task, we propose that the resource pool relates to two types of cognitive skills that have received particular scrutiny in the context of speech perception: working memory resources (Rönnberg 2003; Rönnberg et al. 2013; Rönnberg, Holmer, and Rudner 2019) and linguistic skills operationalised as lexical decision-making ability (Kaandorp et al. 2016).

In CET, actual effort expenditure is limited by a restraining force consisting of three additive components: (a) task demands, (b) alternative goals, which compete with the target activity for resources, and (c) resource conservation (Kruglanski et al. 2012). Resource conservation is high in individuals who have a high need for closure (Kruglanski and Webster 1996; Roets and van Hiel 2008; Viola et al. 2015), and based on previous studies (e.g. Beh 1990; Hinsz and Jundt 2005; Capa, Audiffren, and Ragot 2008), may be low in individuals who have a motivational style more focussed on achieving success (i.e. achievement motivation). Hence, considering these personality traits may help to account for differences in LE studies.

Another advantage of using CET over FUEL is that it makes quantifiable predictions about the likely effects of manipulating task demands and motivation on effort and performance in the context of driving and restraining forces. A strong driving force, for example, is expected to permit the use of more effective means to goal attainment and lead to better performance, though at a cost in terms of effort. A strong restraining force, on the other hand, restricts the use of these resource-heavy strategies and hence results in poorer performance. In the present study, our main aim was to investigate how listening demands and motivation (operationalised as financial reward) regulate listening effort in a speech recognition task, in the context of CET. Our secondary aims were to investigate the possible moderating influence of (i) resource conservation (operationalised as need for closure and individual differences in achievement motivation) and (ii) resource pool capacity (operationalised as working memory span and lexical-decision-making ability) on the relationship between motivation and listening demands. We chose different types of outcome measures to reflect the multidimensionality of LE and to test whether CET predictions apply to the subjective (self-reported ratings of LE and likelihood of ‘giving up’, or avoidance, as described in Picou and Ricketts (2014)) and objective (correct response rate, RT) outcome measures used in the present study. We made the following predictions:

We predicted that listening demands would show a main effect, with higher demands resulting in lower correct response rates, longer RTs and higher subjective ratings of LE and likelihood of ‘giving up’. We also predicted a main effect of reward, with high value reward expected to motivate participants more than low, and result in higher correct response rates, longer RTs, higher self-rated LE and lower self-rated likelihood of giving up. Interactions between listening demands and reward were predicted for the correct response rate, RTs and subjective ratings of LE and likelihood of giving up, driven by a greater motivational influence of financial reward under higher listening demands. In addition, we hypothesised that differences in the resource pool capacity (measured by working memory span and lexical-decision making ability) and in the resource conservation aspect of the restraining force (need for closure and individual differences in achievement motivation) would predict the correct response rate and subjective and behavioural measures of LE. We expected measures of resource conservation and measures of resource pool capacity to interact with listening demands and reward for behavioural and self-report LE outcomes.

Methods

Participants

To be eligible, participants needed to be between 18 and 35 years old, with normal hearing, normal or corrected-to-normal vision and no previous neurological issues or speech problems. Twenty-four (18 female) normal-hearing (NH), native-English-speaking adults participated in the study, ranging from 19 to 33 years of age (median = 23). This sample size is sufficient to achieve 80% power (1 − β = .80, α = .05) for a medium effect size (f = .25) for a 2 × 2 repeated-measures factorial design (Faul et al. 2009) for each of the four main outcome measures (see Procedures and Data Analysis). Prior to taking part, participants were informed that the purpose of the study was to understand whether a person’s motivation to complete a listening task changes the amount of effort they use. Participants were compensated for their time with a £15 honorarium and were informed that they would have the chance to earn additional performance-based rewards by answering the questions correctly, in order to incentivise maximal effort exertion throughout the task (see Speech Recognition Task section below). The study was reviewed and approved by the University of Manchester Research Ethics Committee (approval number: 2019-6493-10583) and pre-registered with the Open Science Framework (https://osf.io/6x7pd?view_only=d91bf9c111124fc2ab1fc6c52893182f).

Hearing screening

Each participant was screened using otoscopy, tympanometry and pure-tone audiometry to ensure they met the eligibility criteria for participation. All participants had bilateral NH (≤20 dB HL for test frequencies of 250, 500, 1000, 2000, 4000 and 8000 Hz) (British Society of Audiology (BSA) 2017) and reported no recent ear infections or surgery, previous neurological issues or speech problems.

Materials

Speech recognition task: Stimuli

A speech recognition task using degraded sentences was chosen as these types of tasks are effective in eliciting LE that can be measured using RTs (e.g. Gatehouse and Gordon 1990; Pals et al. 2015). Ninety Harvard IEEE sentences (Rothauser et al. 1969), spoken by a male speaker, were used as the speech materials. Speech intelligibility was modified using vocoding, an effective way to manipulate the intelligibility of speech in a controlled manner (Drullman, Festen, and Plomp 1994; Shannon et al. 1995). Vocoding has been shown to affect subjective and objective measures of LE (McMahon et al. 2016; Winn 2016).

Vocoded stimuli were created using a custom algorithm in Matlab (The Mathworks R2018a). Speech stimuli were processed using a 2-band (high listening demands) or a 3-band (moderate listening demands) tone vocoder, with the frequency of each vocoder band logarithmically spaced between 80 and 8000 Hz. Two and three bands were chosen based on pilot testing that resulted in mean correct response rates of around 80% for the moderate listening demands condition and around 50% for the high listening demands condition, using the speech recognition task described below. The carrier frequencies were 225, 1047 and 4861 Hz for the 3-band vocoder and 440 Hz and 4440 Hz for the 2-band vocoder. The temporal envelope of the output of each channel was extracted using half-wave rectification and smoothing (using a low-pass filter with a cut-off frequency of 300 Hz) and used to modulate a sinusoidal carrier with a frequency equal to the centre frequency of the vocoder band. The signals within each band were then summed to produce tone-vocoded sentences.
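The band layout described above can be checked with a short sketch. It computes logarithmically spaced band edges between 80 and 8000 Hz and takes the arithmetic midpoint of each band as its carrier frequency. The midpoint rule is an assumption (the text says only ‘centre frequency’), the function names are illustrative, and this is not the authors' Matlab code.

```python
def band_edges(n_bands, f_lo=80.0, f_hi=8000.0):
    """Return n_bands + 1 edge frequencies equally spaced on a log axis."""
    ratio = (f_hi / f_lo) ** (1.0 / n_bands)
    return [f_lo * ratio ** i for i in range(n_bands + 1)]


def carrier_frequencies(n_bands, f_lo=80.0, f_hi=8000.0):
    """Arithmetic midpoint of each band (an assumed definition of 'centre')."""
    edges = band_edges(n_bands, f_lo, f_hi)
    return [(lo + hi) / 2.0 for lo, hi in zip(edges[:-1], edges[1:])]


if __name__ == "__main__":
    for n in (3, 2):
        print(n, [round(f, 1) for f in carrier_frequencies(n)])
```

Under these assumptions the 3-band case gives approximately 225.7, 1047.4 and 4861.8 Hz, which match the quoted 225, 1047 and 4861 Hz after truncation. The 2-band case gives 440 and 4400 Hz, so the quoted 4440 Hz carrier does not follow from this particular midpoint rule.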

Speech recognition task: Procedure

Figure 1 shows that in each trial of the speech recognition task, the sentence was presented twice. In the moderate listening demands condition, the first presentation (‘cue’) of the sentence was vocoded to produce a moderate degree of intelligibility (3-band vocoder) followed by a second presentation (‘target’) vocoded for low speech intelligibility (2-band vocoder). In the high listening demands condition, the sentence was always presented (both ‘cue’ and ‘target’) at a low intelligibility level (2-band vocoder). Thus, the second presentation (‘target’) of the speech sentence was always generated with a 2-band vocoder and was therefore identical in terms of its physical properties in both the high and moderate listening demands conditions, with only the ‘cue’ sentence changing in terms of its physical properties and perceived intelligibility. Our approach dissociates the perceptual effects of changes in speech intelligibility and LE from acoustical differences that can be used to vary listening demands. This is achieved by manipulating the perceived intelligibility of identical speech stimuli through prior exposure, that is, vocoded speech that is initially relatively unintelligible can become more intelligible after participants are exposed to an intelligible version of the same speech stimulus (e.g. Davis et al. 2005; Millman, Johnson, and Prendergast 2015). The use of identical speech stimuli that manipulate listening demands and LE could be particularly advantageous in interpreting changes in objective (physiological) measures of LE.

Figure 1. Depiction of a typical ‘high’ reward trial. In ‘low’ reward trials the pre-trial screen informed participants that the reward was £0.25 rather than £2.50.

To future-proof the design for potential physiological testing, an assessment method for speech intelligibility was chosen that minimised movement-related noise caused by overt verbal responses. A test word from the sentence was selected randomly from either the beginning, middle or end of the sentence to ensure participants had to listen to the entire sentence. Of the 80 test words, 27 were selected from the beginning, 27 from the middle and 26 from the end of the sentence. Participants were asked to select, using a mouse, which word they had heard within the preceding sentence from amongst five foils presented as a 6-word visual grid (see Figure 1). The mouse cursor returned to the middle of the screen when the visual grid was displayed. The location of the test word varied randomly within the 6-word grid, with an equal chance of the test word appearing in any of the 6 positions. All five foils were either phonologically or semantically related to the test word or other foils. For instance, the sentence ‘The loss of a second ship was hard to take’ and the test word ‘take’ had the following foils: phonological foils related to ‘take’ (‘talk’, ‘tale’); a semantic foil for ‘take’ (‘accept’); phonological or semantic foils for other foils (‘except’, ‘tell’). An online rhyming dictionary, rhymezone.com, was used to select phonologically and semantically related foils (>90 similarity rating). The six options were presented immediately after the speech presentation to minimise memory requirements.

Main outcome measures

Correct response rate and RTs

Participants were asked to respond as accurately and quickly as possible. The percentage of correct responses (correct response rate) and the average speed of responses (RTs) were measured. Mean RTs were computed inclusive of incorrect trials to avoid data loss (Houben, van Doorn-Bierman, and Dreschler 2013). Mean RTs for incorrect trials were 3.1 s longer than RTs for correct trials (t(748) = 16.48, p < .001), but excluding incorrect trials from the analysis did not change the overall pattern of results.

Subjective ratings of LE and likelihood of giving up

After each trial in the speech recognition task (see Figure 1), the monitor displayed two consecutive questions to gauge subjective LE and the likelihood of giving up: ‘How hard did you work to understand what was said?’ and ‘How likely would you be to give up or just stop trying?’ We will refer to these measures as self-rated ‘work’ and ‘giving up’, respectively. The wording used to elicit these self-report ratings was almost identical to the wording used by Picou and Ricketts (2014, 2018) and is based upon questions from the Speech, Spatial and Qualities Hearing Scale (Gatehouse and Noble 2004). Participants provided subjective ratings, using a mouse, on a visual scale from 0 (‘not at all’) to 100 (‘very’).

Other outcome measures

NASA task load index

In the NASA Task Load Index (NASA-TLX; Hart and Staveland 1988), participants are asked to rate how mentally, physically and temporally demanding they found a recently completed task. Additionally, participants are asked to give ratings on their perceived performance level, how much effort they used and how frustrating they found the task. Rating scales run between 1 (‘very low’) and 20 (‘very high’), except for self-rated performance for which the scale runs from 1 (‘failure’) to 20 (‘perfect’). Participants completed the NASA-TLX immediately after completing all trials. These ratings were collected to gain an overall picture of effort levels and perceptions of the task to aid interpretations of other analyses.

Visual search task

A target word was displayed visually and participants were instructed to select the target word in the 6-word grid as quickly as possible. The mean visual search RT was calculated based on 20 trials. As items within the 6-word response grid used for the speech recognition task were not equally spaced (i.e. selection of the outer items required a slightly greater mouse movement), the mean visual search RT for each participant was used to account for physical differences in the spacing of the items in the 6-word grid.

Covariate measures

Motivational personality traits

Achievement motivation was measured using the Personal Mastery and Competitive Excellence subscales of the Motivational Trait Questionnaire (Heggestad and Kanfer 2000). Both of these subscales index achievement-orientated traits: individuals scoring high in personal mastery strive to maximise their performance even for challenging tasks, whilst individuals high in competitive excellence strive to achieve a level of success above their peers. The personal mastery section has 16 items, and statements include ‘I set goals as a way to improve my performance’. The competitive excellence section has 13 items, and statements include ‘Even in non-competitive situations, I find ways to compete with others’. Statements were rated between 1 (very untrue of me) and 6 (very true of me).

Need for closure was measured using the Need for Closure Scale (Roets and van Hiel 2011, updated from the original version written by Webster and Kruglanski 1994), which indexes a person’s closed-mindedness, dislike of uncertainty and preference for order and predictability (Roets et al. 2015). The scale has 15 items; example items include ‘I don’t like situations that are uncertain’. Participants rated these statements between 1 (strongly disagree) and 6 (strongly agree).

Cognitive tests

All cognitive tests were carried out using Inquisit 5 (Millisecond Software 2015). Working memory was assessed using an auditory version of the backwards digit span test. Participants were presented with a series of digits and asked to recall them in reverse order. Responses were recorded using a computer keyboard. Participants received two practice trials prior to the main assessment. Participants were initially presented with a 2-digit sequence. Subsequently, the sequence length was adjusted based on performance. Correct recall increased the length of the sequence by 1; failing to recall the sequence correctly after two attempts reduced the sequence length by 1. The backwards digit span was defined as the maximal sequence length of correctly recalled digits after 14 trials.
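The adaptive rule described above can be sketched as a small scoring function. This is an illustrative reading of the procedure, not the Inquisit script. The function name, the handling of ‘two attempts’ as two consecutive failures at the same length, and the lower bound of one digit are all assumptions.

```python
def backwards_digit_span(outcomes, start_length=2, max_trials=14):
    """Score an adaptive backwards digit span run.

    outcomes: iterable of booleans, one per trial
              (True = sequence recalled correctly in reverse order).
    Returns the longest sequence length recalled correctly (0 if none).
    """
    length = start_length
    failures_at_length = 0
    span = 0
    for trial, correct in enumerate(outcomes):
        if trial >= max_trials:
            break
        if correct:
            span = max(span, length)
            length += 1  # correct recall lengthens the next sequence by 1
            failures_at_length = 0
        else:
            failures_at_length += 1
            if failures_at_length == 2:
                # Two failed attempts at this length: shorten by 1.
                # The floor of one digit is an assumption.
                length = max(1, length - 1)
                failures_at_length = 0
    return span
```

For example, three correct trials followed by two failures at length 5 leave the span at 4, however the remaining trials alternate around that length.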

Linguistic ability was assessed using a lexical decision-making task. Participants were presented with 4 or 5 letter strings and had to indicate whether the strings made up words or non-words as quickly and accurately as possible. Participants recorded their responses via a computer keyboard to yield the lexical decision RT. The task consisted of a practice block containing 6 trials (3 non-words and 3 words in random order), followed by 52 test trials (consisting of 26 words and 26 non-words presented randomly). In each trial, a fixation cross was presented for 700 ms, followed by the stimulus for 250 ms and then a blank screen. The mean RT was calculated for correct trials only.

Procedures

All tasks were completed in a single testing session, which lasted around 1 hour. For the speech recognition task, participants were seated in a sound-attenuated booth facing a computer monitor and given task instructions. During each trial, vocoded sentences were presented diotically at a fixed level of 65 dB(A) via loudspeakers at ±45° azimuth.

After a practice block consisting of 10 trials, 80 test trials were presented in 8 blocks of 10 trials each. There were four high-reward (£2.50) and four low-reward (£0.25) blocks presented in random order. No explanation as to why some trials were worth more than others was provided. Prior to each block, participants were informed that they would receive a financial bonus for answering 6 or more items correctly over the next block of 10 trials. Each block consisted of five trials with moderate listening demands and five trials with high listening demands, presented in random order. Feedback on the performance and the associated award was not given until the end of the experiment, to disassociate the effects of financial reward from mood-related changes in effort, which may occur when participants are given trial-by-trial feedback (Carver 2006; Koelewijn et al. 2018).
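The bonus rule above can be written as a small function. The sketch assumes that the stated reward (£2.50 or £0.25) is paid once per block when the threshold of 6 correct answers out of 10 is met; the text does not say whether the amount applies per block or per trial, so this reading and all names are hypothetical. Amounts are held in pence to avoid floating-point rounding.

```python
REWARD_PENCE = {"high": 250, "low": 25}  # £2.50 and £0.25, as stated in the text
THRESHOLD = 6  # correct answers needed out of the 10 trials in a block


def block_bonus(reward_level, n_correct):
    """Bonus in pence for one block (assumes a single payout per block)."""
    return REWARD_PENCE[reward_level] if n_correct >= THRESHOLD else 0


def total_bonus(blocks):
    """blocks: list of (reward_level, n_correct) pairs, one per block."""
    return sum(block_bonus(level, n) for level, n in blocks)
```

Under this reading, a participant who met the threshold in all four high-reward and all four low-reward blocks would earn 1100 pence on top of the honorarium.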

After the speech-recognition task, participants completed the visual search task. Following this, participants were asked to complete the NASA-TLX (Hart and Staveland 1988) and the personality questionnaires. Finally, participants were asked to complete the backwards digit span and the lexical decision-making tasks.

Data analyses

Prior to statistical analysis, correct response rates, self-rated ‘work’ and self-rated ‘giving up’ were converted to rationalised arcsine units (RAU) (Studebaker 1985). To remove outliers from the RT data, RTs further than three standard deviations from the mean for each participant were removed (Picou, Charles, and Ricketts 2017). A log10 transformation was then applied to the RTs to meet the assumption of normality for parametric statistics. For each dependent variable (correct response rate, RT, self-rated ‘work’, self-rated ‘giving up’), a repeated-measures analysis of variance (ANOVA) with two within-subject factors, listening demands (moderate/high) and financial reward (low/high), was conducted.
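The two preprocessing steps above can be sketched as follows. The RAU function uses the transform as it is commonly stated for Studebaker (1985), for a score of x out of n items. How the percentage scores and 0–100 ratings were mapped to x and n in this study is not stated, so that mapping is left to the caller. The use of the sample standard deviation in the trimming step is also an assumption.

```python
import math
import statistics


def rau(x, n):
    """Rationalised arcsine units for a score of x out of n items."""
    theta = (math.asin(math.sqrt(x / (n + 1)))
             + math.asin(math.sqrt((x + 1) / (n + 1))))
    return (146.0 / math.pi) * theta - 23.0


def trim_and_log(rts, n_sd=3.0):
    """Drop RTs more than n_sd SDs from one participant's mean, then take log10.

    rts: response times in seconds for a single participant (all > 0).
    """
    mean = statistics.mean(rts)
    sd = statistics.stdev(rts)  # sample SD; an assumption
    kept = [rt for rt in rts if abs(rt - mean) <= n_sd * sd]
    return [math.log10(rt) for rt in kept]
```

A score at the midpoint of the scale (for example 10 out of 20) maps to 50 RAU, while scores near the floor or ceiling are stretched relative to raw percentages.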

Linear mixed modelling was carried out to investigate whether cognitive skills and personality traits predicted outcomes from the speech recognition task. Statistical analyses were run in R version 3.5.1 (R Core Team 2018), using RStudio 1.1.453 and the nlme package (Pinheiro et al. 2020). For each outcome measure from the speech recognition task (correct response rate, RTs, self-rated ‘work’ and self-rated ‘giving up’) exploratory mixed models were fitted. Eight fixed effect predictors were included: listening demands, financial reward, an interaction term for listening demands and financial reward, backwards digit span, mean lexical decision RT, mean need for closure score, and total scores on the Personal Mastery and Competitive Excellence subscales of the Motivational Trait Questionnaire. Participants were included as a random effect in all mixed models. We used a backwards stepwise procedure (Pinheiro and Bates 2000) to prune the initial model in such a way that higher-level interaction terms only remained if they improved the model fit.

For each significant cognitive and personality main effect in the exploratory models, we conducted further analyses to investigate whether these predictors interacted with financial reward or listening demands. The full model included main effects for listening demands, reward and the cognitive or personality effect under investigation, plus all first- and second-level interaction effects and was subsequently pruned in the manner described above.

Results

Speech recognition and LE measures

Figure 2 shows the results of the speech recognition task and the associated measures of LE. The correct response rates (% correct) for the speech recognition task are shown in Figure 2(a). A repeated-measures ANOVA with two factors (moderate/high listening demands and low/high financial reward) showed a significant effect of listening demands (F(1,23) = 53.76, MSE = 146.49, p < .001, ηp2 = .70) on the correct response rate, with a higher mean correct response rate in the moderate compared with the high listening demands condition collapsed across reward condition (moderate: mean = 69.6%, SEM = .020; high: mean = 50.2%, SEM = .031). There was no significant effect of financial reward on the correct response rate (F(1,23) = .296, MSE = 115.49, p = .592, ηp2 = .013) and no significant interaction between listening demands and financial reward (F(1,23) = .015, MSE = 85.55, p = .902, ηp2 = .001).
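The reported partial eta squared values can be cross-checked against the F statistics using the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error). The helper below is a reader's check rather than part of the original analysis, and its name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


if __name__ == "__main__":
    # F values for correct response rate, as reported above (df = 1, 23).
    for label, f in [("demands", 53.76), ("reward", 0.296), ("interaction", 0.015)]:
        print(label, round(partial_eta_squared(f, 1, 23), 3))
```

With df = (1, 23), the three F values reported for the correct response rate reproduce the stated effect sizes of .70, .013 and .001 after rounding.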

Figure 2. (a) Correct response rates (RAU) (b) Mean RTs (log10(s)) (c) Mean self-rated work (RAU) and (d) Mean self-rated likelihood of giving up (RAU) as a function of financial reward for the speech recognition task (**p <.001; *p < .05). Circles represent the moderate listening demands condition, squares represent the high listening demands condition. Error bars represent ±1 standard error of the mean. Results within reward conditions are offset to aid visualisation.

Figure 2(b) shows mean RTs (log10(s)) for the speech recognition task. A repeated-measures ANOVA conducted on the RTs showed a significant effect of listening demands (F(1,23) = 18.02, MSE = .01, p < .001, ηp2 = .44) with a slower mean RT in the high listening demands condition, compared with the moderate listening demands condition collapsed across reward condition (high: mean = .788 log10(s), SEM = .023; moderate: mean = .714 log10(s), SEM = .024). There was no significant effect of financial reward on RTs (F(1,23) = 1.83, MSE = .01, p = .190, ηp2 = .074) and the interaction between listening demands and financial reward was non-significant (F(1,23) = .250, MSE = .01, p = .622, ηp2 = .011). To control for physical differences in the spacing of items in the 6-word grid, we calculated a mean visual search RT for each participant (see Visual Search Task in Methods) and subtracted this value from the mean RTs gathered in the listening task in each condition. Running the ANOVA on the adjusted listening task RTs revealed the same pattern of results (i.e. a significant effect of listening demands (F(1,23) = 19.17, MSE = 22.99, p < .001, ηp2 = .455), a non-significant effect of motivation (F(1,23) = 1.77, MSE = 2.64, p = .197, ηp2 = .071) and a non-significant interaction (F(1,23) = .29, MSE = .27, p = .594, ηp2 = .013)).

Figure 2(c) and (d) show ‘work’ and ‘giving up’ self-report ratings (0–100%) for the speech recognition task, respectively. For self-rated ‘work’, a repeated-measures ANOVA showed a significant effect of listening demands (F(1,23) = 21.74, MSE = 57.67, p < .001, ηp2 = .49) with a higher mean ‘work’ rating when listening demands were high compared with moderate, collapsed across reward condition (high: mean = 65.59%, SEM = 3.858; moderate: mean = 58.50%, SEM = 4.076). There was a significant main effect of financial reward (F(1,23) = 6.94, MSE = 23.85, p = .015, ηp2 = .23) with a higher mean ‘work’ rating for high reward compared to low reward collapsed across listening demand conditions (high: mean = 63.43%, SEM = 3.829; low: mean = 60.67%, SEM = 3.829). No significant interaction between reward and listening demands was measured for ‘work’ ratings (F(1,23) = .258, MSE = 13.32, p = .616, ηp2 = .011). A significant main effect of listening demands was measured for self-rated ‘giving up’ (F(1,23) = 24.51, MSE = 25.87, p < .001, ηp2 = .52), driven by a higher mean ‘giving up’ rating for high compared to moderate listening demands collapsed across reward conditions (high: mean = 31.94%, SEM = 5.001; moderate: mean = 27.73%, SEM = 4.644). Financial reward did not have a significant effect on self-rated ‘giving up’ (F(1,23) = .942, MSE = 77.65, p = .342, ηp2 = .039) and the interaction between financial reward and listening demands was not significant (F(1,23) = .710, MSE = 10.44, p = .408, ηp2 = .030).

NASA-TLX

Figure 3 shows mean ratings for each subscale of the NASA-TLX (Hart and Staveland Citation1988). Participants reported high levels of effort, mental demand and frustration after completing the speech recognition task, suggesting that participants engaged with the task and expended LE.

Figure 3. Mean ratings post speech-recognition task using the NASA-TLX questionnaire (higher values indicate greater demands). Error bars represent ±1 standard error of the mean.

Cognitive and personality measures

Group means, standard deviations and ranges for motivational traits and cognitive abilities are shown in Table 1. Based on means/ranges in previous studies (e.g. Viola et al. Citation2015), all participants would be classified as low in need for closure. Means and standard deviations for the Motivational Trait Questionnaire subscales were similar to Hinsz and Jundt (Citation2005). Means and standard deviations for backwards digit span were very similar to those recorded for young NH participants in Woods et al. (Citation2011). Means and standard deviations for lexical decision RTs were similar to the NH young participants in Strand et al. (Citation2018). Pearson’s correlation coefficients between motivational traits (need for closure, competitive excellence and personal mastery) and cognitive skills (backwards digit span and lexical decision RT) were non-significant and small (all r < .2 except competitive excellence and need for closure where r = .44, data not shown).

Table 1. Group means, standard deviations and ranges for personality traits/cognitive abilities.

Multi-level modelling of cognitive and personality factors

Table 2 shows exploratory mixed models for each outcome measure, which included listening demands, reward, demand*reward interaction, backwards digit span, lexical decision RT and totals for the Competitive Excellence and Personal Mastery subsections of the Motivational Trait Questionnaire and the Need for Closure Scale as fixed effects. For correct response rate, alongside a significant effect of listening demands (F(1, 1893) = 81.23, p < .001, ηp2 = .04) we found a significant fixed effect of backwards digit span (F(1, 18) = 7.87, p = 0.01, ηp2 = .01). Our analysis showed that the best fitting model for the correct response rate consisted of listening demands, reward and backwards digit span, of which listening demands (F(1, 1894) = 81.27, p < .001, ηp2 = .02) and backwards digit span (F(1,22) = 6.62, p = 0.02, ηp2 = .01) were individually significant fixed effects. Hence, backwards digit span did not interact with either listening demand or financial reward to affect correct response rate.

Table 2. Summary of mixed models.

For RTs, in addition to a significant fixed effect of listening demands (F(1, 1870) = 41.29, p < .001, ηp2 = .02), we found significant fixed effects for competitive excellence (F(1, 18) = 5.74, p = 0.03, ηp2 = .02) and lexical decision RT (F(1,18) = 6.53, p = 0.02, ηp2 = .02). When exploring potential interaction effects of competitive excellence on RTs, the best-fitting model showed individually significant fixed effects for listening demands (F(1, 1871) = 41.28, p <.001, ηp2 = .02) and competitive excellence (F(1,22) = 4.27, p = 0.05, ηp2 = .02) only. When exploring potential interaction effects of lexical decision RTs on RTs, the best-fitting model showed that only listening demands (F(1, 1871) = 41.29, p <.001, ηp2 = .02) and lexical decision RT (F(1,22) = 5.23, p = 0.03, ηp2 = .02) were individually significant fixed effects.

Only fixed effects of listening demands and reward were found for self-rated work (demands: F(1, 1893) = 65.61, p <.001, ηp2 = .02; reward: F(1, 1893) = 11.30, p < .001, ηp2 = .003) and giving up (demands: F(1, 1893) = 56.51, p <.001, ηp2 = .01; reward: F(1,1893) = 7.40, p = .01, ηp2 = .001). Therefore no further interactions were explored.

Discussion

This study evaluated the relationship between listening demands and motivation in a speech recognition task in the context of a multi-factorial model from the wider field of effort research, i.e. CET (Kruglanski et al. Citation2012). We manipulated motivation by varying financial reward and listening demands by varying the degree of degradation of the vocoded speech presented to listeners. We measured the effects of these manipulations on four main outcomes (correct response rate, RTs, self-rated work, self-rated giving up). We also considered the modulating effects of personality factors and cognitive skills. The manipulations and co-varying factors, as well as the resulting hypotheses, reflect predictions made by CET.

The prediction of a main effect of listening demands was supported. Varying prior knowledge of tone-vocoded speech was found to be an effective way of manipulating listening demands: high listening demands led to significantly decreased correct response rates, increased RTs and increased self-rated ‘work’ and ‘giving up’. These findings are consistent with CET (Kruglanski et al. Citation2012) and are also in line with FUEL (Pichora-Fuller et al. Citation2016).

We also found the predicted main effect of reward, consistent with CET, which stipulates that financial reward increases motivation, resulting in a stronger driving force and greater mobilisation of effort to counteract the restraining force. It is important to note, however, that according to the mixed model (Table 2), the effects of financial reward were limited to self-rated work and giving up and did not extend to increased correct response rates or changes in RTs. It is possible these results did not reflect LE but instead were due to demand characteristics of the experiment, that is, participants realising that greater effort was expected in high reward trials.

Although we cannot rule out demand effects, we propose two alternative interpretations for why financial reward affected only self-rated but not behavioural outcomes. First, LE may be a multi-dimensional concept (McMahon et al. Citation2016; Strauss and Francis Citation2017; Hughes et al. Citation2018; Strand et al. Citation2018; Alhanbali et al. Citation2019) where some measures, for example, self-report, show an effect of LE and others, for example, behavioural outcomes, do not. In a similar vein, other studies have shown physiological effects in response to increased LE under conditions of high financial reward but no significant behavioural effects (e.g. Richter Citation2016; Koelewijn et al. Citation2018). Self-rated LE measures may also be more sensitive to the effects of motivation than behavioural measures (Pichora-Fuller et al. Citation2016; Herrmann and Johnsrude 2020), which might explain why RTs did not appear to be sensitive to the financial reward manipulation used in this study. Second, RTs are sensitive to how motivation is operationalised: Weis et al. (Citation2013) found differential effects on RTs in an auditory discrimination task depending upon whether the financial motivator was presented to participants as a reward (starting from zero and gaining money for correct answers) or a punishment (starting from maximum and losing money for incorrect responses).

The interaction between listening demands and financial reward was non-significant, contrary to the predictions of CET, FUEL and MIT. Richter, Gendolla, and Wright (Citation2016) suggest a number of extensions to MIT (Brehm and Self Citation1989) that may limit the greater influence of motivation at higher demands, some of which may explain the lack of interactive effects in this study. These extensions include fatigue level, mood, and participant perceptions of their ability to succeed at a task. In the present study, despite scoring well above chance (∼50% correct) in the high listening demands condition, some participants may have perceived the task as too difficult and hence offering greater reward would have little impact upon effort investment. This may also account for the very high levels of frustration recorded on the NASA-TLX (Figure 3). Coupled with differences in task design, this may also explain why our results conflict with Zhang et al. (Citation2019) who found an interaction between demands and reward on performance in NH listeners at high correct response rates (∼70-85%).

We predicted that differences in the resource pool capacity (measured by backwards digit span and lexical decision RTs) and in the resource conservation aspect of the restraining force (i.e. need for closure and inter-individual differences in achievement motivation) would impact upon the correct response rate and subjective and behavioural measures of LE in our speech recognition task. A significant main effect of backwards digit span was found for the correct response rate. This result suggests that participants with greater resources performed better, consistent with the driving force component of CET. This finding also supports the ELU model (Rönnberg Citation2003; Rönnberg et al. Citation2013; Rönnberg, Holmer, and Rudner Citation2019), that is, working memory resources are recruited during effortful listening to resolve mismatches between input and representations stored in long-term memory. However, note that individual differences in backwards digit span did not predict behavioural (RT) or self-rated LE, a result that we discuss further in the limitations section (see below).

In contrast, lexical decision RT predicted only RTs from the speech recognition task and not correct response rate. Moreover, the direction of this association did not follow CET predictions. Specifically, CET predicts greater resources (here we assumed lexical decision–making ability) would strengthen the driving force. Yet in the present study, greater LE appeared to be exerted by participants who were slower at lexical decision-making. Our results are also inconsistent with previous suggestions (Larsby, Hällgren, and Lyxell Citation2008; Rönnberg et al. Citation2008; Kaandorp et al. Citation2016; Lyxell and Rönnberg Citation1992) that lexical decision RT is related to the correct response rate in a speech task. The significant predictive effect we found for RTs in the speech recognition task and the lexical-decision-making task may simply be due to both tasks measuring information processing speed.

Personality traits (need for closure, competitive excellence and personal mastery) were included in the multi-level models as these traits may influence the tendency to conserve resources as part of the restraining aspect of CET. CET specifically identifies need for closure as a trait that affects resource conservation. We also expected that individuals with higher Motivational Trait Questionnaire scores (indicating greater achievement motivation) would show a stronger interaction between listening demands and motivation as these individuals may be less conservative in the allocation of their resources (Beh Citation1990; Capa, Audiffren, and Ragot Citation2008; Hinsz and Jundt Citation2005). The only outcome measure for which the exploratory modelling showed a significant effect for competitive excellence was RT. There, more competitive individuals tended to have longer RTs, suggestive of greater LE exertion. However, this main effect did not interact with motivation or listening demands and did not moderate the interaction between these two factors.

Based on CET, we would have expected a significant inverse relationship between participants’ tendency to need closure and effort exertion. The null effect for need for closure goes against CET predictions regarding resource conservation. Moreover, need for closure did not interact with motivation or listening demands or moderate the interaction between these two factors, as was expected based on CET. However, this null effect may be due to a lack of variability in need for closure within our sample, as, based on previous research (e.g. Viola et al. Citation2015), all our participants would be classified as having low need for closure. Previous research finding a significant effect of need for closure on effort investment (e.g. Richter, Baeriswyl, and Roets Citation2012) screened a large number of participants and conducted an experiment only with the participants who scored in the upper and lower quartiles, i.e. an ‘extreme group’ approach which increases statistical power (Cohen 1998). We, on the other hand, measured need for closure as a continuous covariate. Since the effect of personality on LE outcomes in a speech recognition task appears to be small, employing an extreme group approach and/or enlarging the sample size of the present study may have revealed an effect of personality in line with CET.
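
To illustrate the ‘extreme group’ selection described above, the following minimal Python sketch keeps only participants whose trait score falls in the lower or upper quartile of a screened sample. The participant identifiers and scores are hypothetical, the function name is an assumption, and the quartile method is simply the default of Python's `statistics.quantiles`; none of these come from the cited studies.

```python
from statistics import quantiles
from typing import Dict, List, Tuple


def extreme_groups(scores: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Return (low group, high group): participants at or beyond the quartile cut points."""
    q1, _median, q3 = quantiles(scores.values(), n=4)
    low = sorted(pid for pid, score in scores.items() if score <= q1)
    high = sorted(pid for pid, score in scores.items() if score >= q3)
    return low, high


# Hypothetical screening sample: 12 participants with illustrative trait scores.
screening = {f"P{i:02d}": float(i) for i in range(1, 13)}
low_group, high_group = extreme_groups(screening)
print("low:", low_group)
print("high:", high_group)
```

Participants between the two cut points are discarded, which is why this design requires screening a larger sample than is eventually tested.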

Limitations of the present study

We did not find a consistent effect of financial reward on the main outcome measures in the present study, whereas research which informed CET predictions includes experiments where motivation in a listening task was manipulated using financial reward (e.g. Bijleveld, Custers, and Aarts Citation2009). However, the effectiveness of extrinsic rewards when manipulating motivation has been questioned. Previous meta-analyses have concluded that offering tangible rewards undermines a person’s intrinsic motivation, that is, their desire to engage with interesting tasks to the best of their ability (Rummel and Feinberg Citation1988; Wiersma Citation1992; Tang and Hall Citation1995; Deci, Koestner, and Ryan Citation1999). A performance-contingent reward may erode a person’s perceived autonomy and competence at a task if they attribute their performance to the reward rather than their own interest (Lepper, Greene, and Nisbett Citation1973; Deci and Ryan Citation1985). It is, therefore, possible that financial reward may have demotivated some participants in the present study.

More granular aspects of study design may also impact upon the effectiveness of the motivating variable. We presented a high (£2.50) versus low (£0.25) reward for achieving a correct response rate of ≥60% for every 10 trials. Other studies used a much higher reward threshold, e.g. 90% (Richter Citation2016), which may have encouraged greater effort, although setting the threshold to gain a reward too high may discourage effort investment if the goal is perceived as impossible (Brehm and Self Citation1989). In addition, participants could feasibly exert high effort in every trial, regardless of reward condition, to maximise the overall amount of reward they received. Introducing the need to strategically allocate resources (e.g. Zhang et al. Citation2019) may result in a more effective manipulation of motivation.
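
The reward rule described above can be written out as a short Python sketch. The reward amounts, the 60% threshold and the 10-trial window are taken from the text; the function name, the constant names and the assumption that nothing is paid when the threshold is missed are illustrative and are not confirmed by the source.

```python
HIGH_REWARD_GBP = 2.50   # high reward condition (from the text)
LOW_REWARD_GBP = 0.25    # low reward condition (from the text)
THRESHOLD_PERCENT = 60   # minimum correct response rate (from the text)
TRIALS_PER_BLOCK = 10    # reward evaluated for every 10 trials (from the text)


def block_payout(n_correct: int, high_reward: bool) -> float:
    """Payout in GBP for one set of 10 trials.

    Assumption: a set that misses the threshold earns nothing.
    """
    if not 0 <= n_correct <= TRIALS_PER_BLOCK:
        raise ValueError("n_correct must lie between 0 and TRIALS_PER_BLOCK")
    # Integer comparison avoids floating-point issues at exactly 60%.
    if n_correct * 100 < THRESHOLD_PERCENT * TRIALS_PER_BLOCK:
        return 0.0
    return HIGH_REWARD_GBP if high_reward else LOW_REWARD_GBP


print(block_payout(6, high_reward=True))    # threshold met, high reward
print(block_payout(5, high_reward=True))    # threshold missed
print(block_payout(10, high_reward=False))  # threshold met, low reward
```

Because each set of trials pays out independently under this rule, a participant gains by exerting high effort in both reward conditions, which is the limitation noted above.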

The lack of significant interactions between listening demands and financial reward, and the null effects for the resource pool and resource conservation covariates on our main outcome measures, may be explained by a lack of statistical power and/or a lack of variability in our resource pool measures. Our sample size reflects the number of participants needed to identify main effects on the main outcome measures. The current results will inform appropriate sample sizes in future studies to investigate the interactions predicted by CET/MIT. Such future research can then clarify whether the current null effects were due to methodological limitations of the present study (e.g. lack of extreme groups for personality traits, the possibility that participants perceived the listening task to be too hard), or whether CET is not appropriate for predictions in this particular context.

Conclusions

The present study shows that manipulating prior knowledge by using vocoded speech is a feasible way of varying speech intelligibility and associated measures of LE (RTs, self-ratings) in young, NH listeners. The effects of offering financial reward on LE were more complex: changes in subjective ratings of ‘work’ and ‘giving up’ with increased financial reward were not mirrored by increased correct response rates or greater LE investment, as measured by RTs. We found only partial support for CET predictions which are intended to apply to ‘all instances of goal-directed thinking’ (p. 3, Kruglanski et al. Citation2012). It is unclear whether this is due to the limitations of financial reward as a manipulator of motivation or the multi-dimensional nature of LE (McMahon et al. Citation2016; Pichora-Fuller et al. Citation2016; Strauss and Francis Citation2017; Hughes et al. Citation2018; Strand et al. Citation2018; Alhanbali et al. Citation2019). The results of the exploratory analyses presented here suggest that the influence of personality and cognitive skills on effortful listening, and their interaction with listening demands and motivation, are small.

Abbreviations

CET = Cognitive Energetics Theory
ELU = Ease of Language Understanding
FUEL = Framework for Understanding Effortful Listening
LE = Listening effort
MIT = Motivational Intensity Theory
MSE = Mean square error
NH = Normal hearing
RAU = Rationalised arcsine unit
RT = Response time
SNR = Signal-to-noise ratio

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The data that support the findings of this study are available from the corresponding author, PJC, upon request.

Additional information

Funding

This work was funded (PJC) and supported (AH, KJM, REM) by the NIHR Manchester Biomedical Research Centre [BRC-1215-20007].

Notes

1 The Akaike Information Criterion (AIC) was used to estimate the fit of each model. Comparisons were made between the AIC of a model containing a particular interaction effect and a model excluding this interaction term while keeping all other terms identical. The pruned model with the lowest AIC value was then compared with the unpruned model for this level. If the AIC of the pruned model was lower, the pruned model was carried forward to the next stage of analysis and set as the new base model for pruning. If the AIC of the pruned model was higher than the unpruned model, indicating a worse fit, an ANOVA was conducted to compare both model fits. If the pruned model was not significantly worse, it was carried forward as the new base model for pruning. Following this procedure, interaction terms were progressively eliminated, until only one remained. The final model was established by using ANOVA to compare this model to a model consisting of only the three main effects. ML (maximum likelihood) estimation was used for the stepwise comparisons and, upon establishing the final model, fixed effects were calculated using REML (restricted maximum likelihood) estimation, i.e. a modelling approach similar to Heinrich, Ferguson, and Mattys (Citation2019) and Knight and Heinrich (Citation2017, Citation2018).
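
The pruning procedure in Note 1 can be sketched in Python under several stated assumptions. The `fit` callable is hypothetical: it stands in for fitting a mixed model by ML with a given set of interaction terms and returning its log-likelihood and parameter count. The ‘ANOVA’ model comparison is assumed to be a likelihood-ratio test, each interaction term is assumed to add one parameter (so the test has one degree of freedom), and pruning is assumed to stop when a removal makes the fit significantly worse, a case the note does not spell out. The toy log-likelihoods are invented for illustration only.

```python
import math
from typing import Callable, FrozenSet, Iterable, Tuple

# Hypothetical interface: interaction terms -> (ML log-likelihood, number of parameters).
Fit = Callable[[FrozenSet[str]], Tuple[float, int]]


def aic(loglik: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * loglik


def lrt_p_value(loglik_full: float, loglik_reduced: float) -> float:
    """Likelihood-ratio test p-value for ONE dropped parameter (chi-square, df = 1)."""
    stat = max(0.0, 2 * (loglik_full - loglik_reduced))
    return math.erfc(math.sqrt(stat / 2))


def prune_interactions(terms: Iterable[str], fit: Fit, alpha: float = 0.05) -> FrozenSet[str]:
    current = frozenset(terms)
    while len(current) > 1:
        ll_full, k_full = fit(current)
        # Among models that drop one interaction, take the one with the lowest AIC.
        best = min((current - {t} for t in sorted(current)), key=lambda c: aic(*fit(c)))
        ll_red, k_red = fit(best)
        if aic(ll_red, k_red) < aic(ll_full, k_full):
            current = best   # pruned model fits better by AIC
        elif lrt_p_value(ll_full, ll_red) >= alpha:
            current = best   # higher AIC, but not significantly worse
        else:
            return current   # assumption: stop when pruning hurts the fit
    # Final step: one interaction left, compared with the main-effects-only model.
    if len(current) == 1:
        ll_full, _ = fit(current)
        ll_main, _ = fit(frozenset())
        if lrt_p_value(ll_full, ll_main) >= alpha:
            return frozenset()
    return current


# Toy stand-in for model fitting: invented log-likelihood gains per interaction term.
TOY_GAINS = {"demand:span": 6.0, "reward:span": 0.1, "demand:reward:span": 0.3}


def toy_fit(terms: FrozenSet[str]) -> Tuple[float, int]:
    return -100.0 + sum(TOY_GAINS[t] for t in terms), 5 + len(terms)


selected = prune_interactions(TOY_GAINS, toy_fit)
print(sorted(selected))
```

In a real analysis `toy_fit` would be replaced by a routine that refits the mixed model with ML for each candidate set of terms; the final fixed effects would then be re-estimated with REML, as the note describes.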

References