Research Article

Cochlear Implant Users Experience the Sound-To-Music Effect

Received 28 Mar 2023, Accepted 23 Jan 2024, Published online: 12 Feb 2024

ABSTRACT

The speech-to-song illusion is a robust effect where repeated speech induces the perception of singing; this effect has been extended to repeated excerpts of environmental sounds (sound-to-music effect). Here, we asked whether repetition could elicit musical percepts in cochlear implant (CI) users, who experience challenges with perceiving music due to both physiological and device limitations. Thirty adult CI users and thirty age-matched controls with normal hearing (NH) completed two repetition experiments for speech and nonspeech sounds (water droplets). We hypothesized that CI users would experience the sound-to-music effect from temporal/rhythmic cues alone, but to a lesser magnitude compared to NH controls, given the limited access to spectral information CI users receive from their implants. We found that CI users did experience the sound-to-music effect but to a lesser degree compared to NH participants. Musicality ratings were not associated with musical training or frequency resolution, and among CI users, clinical variables like duration of hearing loss also did not influence ratings. Cochlear implants provide a strong clinical model for disentangling the effects of spectral and temporal information in an acoustic signal; our results suggest that temporal cues are sufficient to perceive the sound-to-music effect when spectral resolution is limited. Additionally, incorporating short repetitions into music specially designed for CI users may provide a promising way for them to experience music.

Introduction

From a young age, speech and song are easy to distinguish from one another. However, the speech-to-song illusion illustrates that this seemingly categorical distinction can also be manipulated (Vanden Bosch der Nederlanden et al., Citation2022). This illusion is a unique perceptual transformation where a spoken phrase is first experienced as it was originally intended (as speech), and with repetition, listeners experience the same recording instead as if it had been sung (Deutsch et al., Citation2011). The speech-to-song effect was originally reported with the phrase “sometimes behave so strangely” (Figure 1; Deutsch, Citation1995, Citation2003), and it has since been replicated with other English phrases (Rowland et al., Citation2019; Tierney et al., Citation2013, Citation2018a, Citation2018b), in several other languages (Falk & Rathcke, Citation2010; Falk et al., Citation2014; Groenveld et al., Citation2020; Rathcke et al., Citation2021), and even in languages unfamiliar to the listener (Jaisin et al., Citation2016; Margulis et al., Citation2015). Because the speech-to-song illusion is widely experienced across many cultures and languages, it can provide a unique window into understanding how auditory and cognitive systems generate music from non-musical inputs.

Figure 1. An example of how cochlear implants process speech. We processed an original stimulus file from Deutsch (Citation2003) (A) through a cochlear implant simulation (Litvak et al., Citation2007) and plotted the respective waveforms (B) and spectrograms of component frequencies over time (C) within the range of CIs. An enhanced view of the word “so” (green) illustrates the similarity between acoustic (D) and CI processed (E) samples for high frequency consonants like /s/, while low frequency vowels like /o/ lack the regularly spaced bands corresponding to formants and have a coarse representation of pitch. Without clear spectral resolution of harmonics or timbre, music perception through a cochlear implant is impeded and overall enjoyment tends to be low. Audio files used in this figure may be freely downloaded from our OSF repository (https://osf.io/dujt3/). Original file – clip_Figure1.wav; CI processed file – clipVocoded_Figure1.wav.

The speech-to-song illusion is not dependent on musical training (Vanden Bosch Der Nederlanden et al., Citation2015), and perceivers have highly stable musicality responses (Tierney et al., Citation2018a); once the illusion is perceived, it is difficult to unhear (Groenveld et al., Citation2020). Prior work has also found that acoustic properties of speech such as pitch contour can be manipulated to make a speech stimulus sound more or less musical with repetition (Falk et al., Citation2014; Tierney et al., Citation2018a). We refer to musicality ratings as the perceived song-like or music-like quality of repeating sounds (e.g., as in Simchy-Gross & Margulis, Citation2018; Tierney et al., Citation2021). In the present study, we questioned whether repetition could elicit musical percepts in a group of individuals known to experience difficulties with music perception and appreciation in their daily lives – cochlear implant users (Drennan et al., Citation2015; Gfeller et al., Citation2012; Kang et al., Citation2009).

A cochlear implant (CI) is a neuroprosthetic device that enables a sense of hearing for individuals with significant sensorineural hearing loss. These devices transduce sounds into a series of electrical pulses that are sent by radio frequency to an internal receiver and are ultimately transmitted to an electrode array that is surgically implanted in the cochlea. There, electrical current is applied according to the tonotopic mapping of the cochlea (i.e., from high to low frequencies, spiraling inward) to trigger action potentials in otherwise latent auditory nerve fibers. The place of stimulation along the cochlea conveys a rudimentary sense of pitch, one that is both sparse and frequency-shifted in comparison to normal hearing.

Several factors limit the spectral resolution of sounds heard with a CI, the effects of which are detrimental for music perception. One factor is the interaction between neighboring electrodes that results from the spread of electrical excitation within the fluid-filled cochlea – commonly referred to as channel interaction. In typical acoustic hearing, thousands of auditory hair cells mechanically transduce sound waves in a precise tonotopic mapping. CIs must bypass these damaged cells to directly stimulate primary auditory neurons. As a result, current spreads through the surrounding fluid. This can effectively reduce a 22-electrode array to only a fraction of channels that are actually functional (Berg et al., Citation2019, Citation2021; Friesen et al., Citation2001; Gifford et al., Citation2022). Thus, the nature of the electrode-neural interface remains a limiting factor for the number of useable channels. For the lowest-frequency electrodes, this means that a range of over a hundred Hertz is mapped to a single electrode, and up to 1,000 Hertz for the highest-frequency electrodes (Ali et al., Citation2015). Not surprisingly, this affects frequency discrimination and music perception for CI users, who require, on average, three semitones (Drennan et al., Citation2015; Fujita & Ito, Citation1999; Kang et al., Citation2009) or sometimes as much as a full octave to discriminate different musical pitches (Gfeller et al., Citation2002). In contrast, listeners with typical acoustic hearing can distinguish frequency differences well below one semitone. Because musical melodies typically contain smaller intervals than many CI listeners can differentiate, even simple, monophonic tunes can be difficult for CI users to identify (Drennan et al., Citation2015).

Another factor that affects music perception with a CI is how sounds are encoded by implant processors. All commercially available implants prioritize temporal information in low modulation frequencies (i.e., below 300 Hz) that correspond to the slow changes in the temporal envelope, or the overall outline of a sound wave. That is, current CI systems extract the frequency-specific temporal envelope corresponding to each electrode’s bandpass filter. This approach works well for conveying speech, though it discards faster modulations corresponding to temporal fine structure. Important musical qualities like pitch and timbre rely on temporal fine structure, which is poorly conveyed through an implant. Even experimental CI technologies that attempt to preserve temporal fine structure (Riss et al., Citation2014) remain limited by physiological constraints (Oxenham et al., Citation2004). In sum, the sound envelope is most important for speech perception and fine structure for pitch perception (Smith et al., Citation2002), and CIs more successfully preserve the former.
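
To make the envelope-extraction idea above concrete, the following MATLAB sketch implements a generic noise-band vocoder of the kind commonly used to simulate CI processing. It is a minimal illustration, not the Litvak et al. (Citation2007) simulation or any manufacturer's coding strategy; the channel count, band edges, and filter orders are illustrative assumptions.

% Minimal noise-band vocoder sketch illustrating envelope-based CI-style
% processing. Channel count, band edges, and filter orders are illustrative.
[x, fs] = audioread('clip_Figure1.wav');   % original stimulus from the OSF repository
x = mean(x, 2);                            % collapse to mono
nCh = 8;                                   % assumed number of analysis channels
edges = logspace(log10(200), log10(7000), nCh + 1);   % log-spaced band edges (Hz)
[bEnv, aEnv] = butter(2, 300/(fs/2));      % lowpass for envelopes below ~300 Hz
y = zeros(size(x));
for ch = 1:nCh
    [b, a] = butter(4, edges(ch:ch+1)/(fs/2), 'bandpass');
    band    = filter(b, a, x);                  % analysis band for this "electrode"
    env     = filter(bEnv, aEnv, abs(band));    % temporal envelope; fine structure discarded
    carrier = filter(b, a, randn(size(x)));     % band-limited noise carrier
    y = y + env .* carrier;                     % reimpose the envelope on the carrier
end
soundsc(y, fs)                             % listen to the envelope-only simulation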

A CI can afford remarkable speech perception abilities in quiet based primarily on temporal encoding of speech envelopes; however, the addition of even moderate background noise like neighboring talkers can drastically reduce understanding (Dunn et al., Citation2020; Gifford et al., Citation2018). For the same reason, two musical sounds with the same envelope (i.e., matching attack, duration, and decay) can also be rendered indistinguishable through CI processing despite having quite different temporal fine structures that correspond to, say, a snare drum and a flute. Unsurprisingly, many CI users report reduced satisfaction with music listening following CI activation (Riley et al., Citation2018). In contrast, rhythm – encoded by the temporal envelope of an acoustic signal rather than temporal fine structure – is more easily perceived by CI users. For this reason, CI users generally perform similarly to normal hearing (NH) controls on rhythm tasks (Brockmeier et al., Citation2011; Kim et al., Citation2010; Limb & Roy, Citation2014) and prefer music with salient rhythms (Buyens et al., Citation2014, Citation2018).

To illustrate the differences between acoustic hearing and cochlear implant processing in the context of the speech-to-song effect, we plotted the waveforms of a normal, unaltered spoken phrase from Deutsch et al. (Citation2011) and the same stimulus filtered through a cochlear implant processing algorithm (Litvak et al., Citation2007) (Figure 1); both sound files can be downloaded from our Open Science Framework (OSF) repository (https://osf.io/dujt3/). The waveforms of the unfiltered stimulus versus the filtered stimulus are quite similar (Figure 1(B)), illustrating the high temporal fidelity of cochlear implants. In contrast, the low spectral resolution of CIs becomes more apparent when considering the frequency composition of the spectrogram (Figure 1(C)). In particular, the visualization of the word “so” illustrates the initial broad-spectrum frequencies of a consonant followed then by horizontal bands of formants corresponding to a resonant vowel (Figure 1(D), acoustic). While the fundamental frequency of the vowel is evident in a dark band on the CI-processed spectrogram (Figure 1(E), electric), the formants are not. In effect, CI processing makes vowels appear more like consonants in a spectrogram and similarly obscures musical elements like harmonics and timbre.

The underlying acoustics that differ between speech and song are subtle (Vanden Bosch der Nederlanden et al., Citation2022) and may not be perceptible after CI implantation. Though bottom-up, peripheral encoding with CIs is limited, successful music and song perception may still be possible for CI users via top-down perceptual mechanisms such as repetition. This is similar to how top-down mechanisms such as semantic content can facilitate speech recognition (e.g., Moberly & Reed, Citation2019). Assessing the sound-to-music effect in CI users is critical for understanding what types of cues (e.g., temporal, spectral, repetition-based) contribute to music perception, and whether the auditory system in individuals with sensorineural hearing loss can generate musicality from the repetition of different types of acoustic input (i.e., a sound-to-music effect).

A consideration for whether CI users could experience a perceptual transformation to song comes in part from prior studies identifying a broader “sound-to-music effect” for non-speech sounds. A wide range of sounds seem more musical upon repetition, including randomly generated and complex tones (Margulis & Simchy-Gross, Citation2016), a range of environmental sounds (Simchy-Gross & Margulis, Citation2018; Tierney et al., Citation2018b) including water droplets (Rowland et al., Citation2019), and percussive cross-sticks aligned with speech rhythms (Rowland et al., Citation2019). Repetition-induced musicality thus appears to be a broader property of the auditory system: once a stimulus is looped over and over, without being altered or harmonically transposed during the intervening repetitions (Deutsch et al., Citation2011), it tends to be perceived as more musical. And, though both melody and rhythm play a role in the effect (Falk et al., Citation2014), the studies mentioned above suggest that rhythm alone may be sufficient to elicit repetition-induced musicality.

We hypothesized that CI users would experience the sound-to-music effect based primarily upon the well-preserved temporal cues of cochlear implants that convey rhythm. However, we also hypothesized they would experience the effect at a lower magnitude compared to NH controls, given controls’ greater access to melodic cues also contributing to the effect.

Methods

Participants

Thirty CI users and thirty age-matched listeners with normal hearing (NH) were included in this study (see Table 1 for participant demographics). This study was approved by the Vanderbilt University Institutional Review Board (IRB). CI users were recruited from the Vanderbilt Cochlear Implant Lab database and the Vanderbilt Bill Wilkerson Center, while NH control participants were recruited from the community. Seven CI users and eight NH controls reported formal musical training (more details in the Music Experience Survey section).

Table 1. Participant demographic data.

Participants from both groups were included in the study if they self-reported 1) no major neurological or psychiatric conditions and 2) English as a first language. Because two CI users self-reported neurological histories (n = 1 stroke 8 years prior, n = 1 medicated epilepsy), we conducted our main analyses both with and without these participants in the models. Additional inclusion criteria included a hearing screening for NH controls requiring pure tone thresholds <20 dB HL in both ears for octave frequencies 250–8000 Hz. CI users had to have postlingual onset of moderate-to-profound hearing loss. An additional two CI users and three NH controls completed the study but were excluded from all analyses due to technical difficulties (n = 3), prelingual deafness (n = 1 CI user), and an abnormal hearing screening (n = 1 control).

For all CI users, moderate-to-profound hearing loss had occurred after 2 years of age, and, for the 27 CI users for whom these data were available, the mean duration of hearing loss was 14.7 years (SD = 13.54, range = 1–47 years). Here, we used the duration of significant hearing loss (i.e., when participants reported difficulty communicating over the phone, or sudden/profound loss), which we obtained through a mixture of self-report and clinical records. Twenty-two individuals had unilateral CIs and eight had bilateral CIs. For 13 individuals who had residual hearing in the non-implanted ear (contralateral to the CI), we removed any hearing aids and occluded this ear with a foam earplug during testing so that the experiments were completed with CI-only listening. Six CI users, however, had residual acoustic hearing in the implanted ear(s), yielding a form of “hybrid hearing” that involves combined electric-acoustic stimulation, or EAS. These six CI users did have access to some – typically low frequency – acoustic cues during experimental testing.

Frequency Discrimination Task

To account for the role of frequency perception in the speech-to-song illusion, we designed a 3-interval, 2-alternative forced-choice (2AFC) frequency discrimination task. In each trial, participants were presented with three intervals: one reference interval containing a 440 Hz pure tone (300-ms duration, 0.5-s interstimulus interval) and two comparison intervals, one of which contained a tone of a different frequency. Participants indicated which of the two comparison intervals sounded different from the others by selecting the corresponding interval number (“2” or “3”) on a touchscreen. The experiment began with a large target increment of 200 Hz (e.g., 640 Hz on the first trial) and narrowed via an adaptive staircase procedure (2-down, 1-up). There were 12 total reversals. Steps were 50% for the first four reversals, 20% for the next four reversals, and 5% for the last four reversals. Threshold was calculated as the geometric mean of the last eight reversals. The task was programmed in MATLAB 2018a and took approximately 4 minutes to complete.
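
The 2-down, 1-up logic and step schedule described above can be sketched in MATLAB as follows; this is a schematic reconstruction rather than the exact task code, and runOneTrial is a hypothetical placeholder for presenting the three intervals and collecting the touchscreen response.

% Sketch of the 2-down, 1-up adaptive staircase described above.
ref = 440;                          % reference frequency (Hz)
delta = 200;                        % starting frequency increment (Hz)
stepSizes = [0.5 0.2 0.05];         % 50%, 20%, 5% steps across reversal blocks
reversals = [];                     % increments at which the track reversed direction
nCorrect = 0; lastDir = 0;
while numel(reversals) < 12
    correct = runOneTrial(ref, ref + delta);   % placeholder: one 3-interval 2AFC trial
    if correct
        nCorrect = nCorrect + 1;
        dir = 0;
        if nCorrect == 2                       % two correct in a row: make it harder
            nCorrect = 0; dir = -1;
        end
    else
        nCorrect = 0; dir = +1;                % one incorrect: make it easier
    end
    if dir ~= 0
        if lastDir ~= 0 && dir ~= lastDir      % direction change marks a reversal
            reversals(end+1) = delta;          %#ok<AGROW>
        end
        block = min(ceil(max(numel(reversals), 1) / 4), 3);
        delta = delta * (1 + dir * stepSizes(block));   % shrink or grow the increment
        lastDir = dir;
    end
end
threshold = geomean(reversals(end-7:end));     % geometric mean of the last 8 reversals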

To avoid the undue influence of outliers in subsequent analyses, we winsorized frequency discrimination thresholds separately for each group, replacing values below the 5th or above the 95th percentile cutoffs, using the winsor function in MATLAB.
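
The winsor function referenced above is not part of core MATLAB (it is typically obtained from the File Exchange); a minimal equivalent using prctile is sketched below, where thresholds stands for one group's vector of discrimination thresholds.

% Clamp one group's thresholds to the 5th/95th percentile cutoffs
% (equivalent in spirit to winsor; variable names are illustrative).
lo = prctile(thresholds, 5);
hi = prctile(thresholds, 95);
thresholdsWinsorized = min(max(thresholds, lo), hi);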

Standardized Frequency Discrimination Task for CI Users: UWCAMP

For a clinical measure of frequency discrimination, CI users also completed a standardized task: the pitch discrimination subtest of the University of Washington’s Clinical Assessment of Music Perception (UWCAMP) (Kang et al., Citation2009). In this 2AFC task, participants identify the complex tone that is higher in pitch. The task is administered in three interleaved adaptive tracks (1-up, 1-down tracking procedure) for three base frequencies (262, 330, and 391 Hz), corresponding to the notes C4, E4, and G4. This task returns a threshold corresponding to 75% accuracy on a psychometric function. For consistency, we also winsorized these thresholds.

Main Repetition Experiments: Stimuli and Task

We tested 14 speech stimuli (English phrases and sentences) and 14 environmental (water) stimuli developed by Rowland et al. (Citation2019). Speech stimuli ranged from 0.739 to 2.592 seconds long and water stimuli ranged from 0.700 to 2.870 seconds long. Sound files and deidentified data are available on Open Science Framework (https://osf.io/dujt3/).

Using experimental methods similar to those of Rowland et al. (Citation2019) (Figure 2), we asked participants to listen to a single presentation of an audio clip, after which they made a “pre-repetition” perceptual rating in response to the question “How much does this sound like speech or singing?” (speech condition) or “How much does this sound like music?” (water condition). Participants responded by clicking along a horizontal line with the endpoints “exactly like speech/exactly like singing” (speech condition) or “not at all like music/exactly like music” (water condition); this scale represented a 0–1 continuum. The stimulus was then looped 16 times while participants continuously responded to the same question on the same scale; these continuous responses were collected following the protocol of Rowland et al. (Citation2019) but were not analyzed in the present study. After the 16 loops, participants provided a second “post-repetition” rating. Four practice trials for each condition confirmed participant understanding prior to the start of each experiment. We administered the water condition followed by the speech condition for all participants, with the exception of one CI user who, due to technical difficulties, only completed the speech condition. As in Rowland et al. (Citation2019), our analyses focus on the discrete pre- and post-repetition responses.

Figure 2. Schematic of the trial structure for the speech condition. Figure reproduced with permission from Rowland et al. (Citation2019).

Qualitative Feedback Survey

Following each experiment (speech, water), participants answered the brief question, “In this section of the experiment, would you say that you typically perceived: a) a rhythm, b) a melody, c) both rhythm and melody, or d) none of the above?” Responses were coded on a binary scale with the variables “Rhythm” and “Melody.” For example, if a participant responded “both” in a given condition, they were assigned a “1” for Rhythm and a “1” for Melody. Conversely, if a participant responded “none of the above,” they were assigned a “0” for both variables.

Music Experience Survey

Last, all participants completed an online music experience survey with select questions from the Munich Music Questionnaire (Brockmeier et al., Citation2000) and the Ollen Musical Sophistication Index (Ollen, Citation2006). The survey consisted of three sections: 1) general questions related to music listening and importance of music in daily life, 2) experience with music with a cochlear implant (CI group only), and 3) musical training. For analyses, we used a binary scale (−1 = non-musician, 1 = musician) which was based on the yes/no survey question: “Do you currently consider yourself a musician?” All NH controls (n = 30) and all but one CI participant (n = 29) completed the survey. Questions are in supplemental Table S1.

Procedure and Test Environment

This study was conducted in accordance with the guidelines and regulations of Vanderbilt University’s IRB. All participants provided written informed consent and were compensated for 1–1.5 h of their time. Stimuli were presented in stereo from two Yamaha Model HS8 powered loudspeakers, positioned at approximately 45° and 315° azimuth from participants who were seated in a sound-attenuated booth throughout testing.

Analysis

First, we analyzed ratings using linear mixed effects models with the fitlme function in MATLAB 2018b. Model comparisons were made using likelihood ratio tests and AIC/BIC comparisons (the compare function in MATLAB). Categorical variables were effect coded as follows:

Group: NH controls = −1, CI users = 1

Condition: water = −1, speech = 1

Time: pre-repetition = −1, post-repetition = 1

We investigated effects from the main model using post-hoc tests, applying a Bonferroni correction for multiple comparisons and assessing normality of variables using qqplot and the Anderson-Darling test (adtest function in MATLAB). In cases where assumptions of normality were not met, we conducted non-parametric post-hoc tests (Wilcoxon signed rank or Mann–Whitney U-test); otherwise parametric t-tests were employed.
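
As an illustration of the coding scheme and test selection described above, the sketch below effect-codes the categorical variables and chooses between a paired t-test and a Wilcoxon signed-rank test based on the Anderson-Darling result. Variable names (GroupLabel, CondLabel, TimeLabel, preRatings, postRatings, nComparisons) are illustrative assumptions, not the actual analysis script.

% Effect coding of categorical variables (labels assumed to be cellstr columns)
data.Group     = 2*double(strcmp(data.GroupLabel, 'CI'))     - 1;  % NH = -1, CI = 1
data.Condition = 2*double(strcmp(data.CondLabel,  'speech')) - 1;  % water = -1, speech = 1
data.Time      = 2*double(strcmp(data.TimeLabel,  'post'))   - 1;  % pre = -1, post = 1

% One post-hoc contrast: pre- vs. post-repetition ratings for a given cell
d = postRatings - preRatings;          % paired differences
if adtest(d)                           % Anderson-Darling: 1 = normality rejected
    p = signrank(postRatings, preRatings);    % Wilcoxon signed-rank test
else
    [~, p] = ttest(postRatings, preRatings);  % paired t-test
end
pBonferroni = min(p * nComparisons, 1);       % Bonferroni-adjusted p value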

For both groups, we then fit separate linear mixed effects models with frequency discrimination threshold and musical experience as potential predictors of musicality ratings (i.e., post-repetition – pre-repetition = ΔRating). For the CI group, we explored whether the following variables were associated with ΔRating: duration of hearing loss, presence/absence of acoustic amplification during testing – as was the case for the 6 CI recipients with acoustic hearing preservation in the implanted ear(s) – and daily hearing condition (i.e., whether individuals had acoustic hearing available during daily life).

Lastly, we used Fisher’s exact tests to assess group differences in the proportion of qualitative feedback responses in each category (rhythm, melody).

Results

Replication of Rowland et al., Citation2019

To establish the validity of the current experiment, we compared the results from our NH controls (mean age = 50.8 years) to the younger participants in Rowland et al. (Citation2019) (mean age = 19–20 years old). In Rowland et al. (Citation2019), mean musicality ratings (i.e., ΔRating) were 0.26 in the speech condition and 0.47 in the water condition, compared to 0.25 and 0.36, respectively, in the present study. Independent samples t-tests confirmed that, despite a 30+ year age difference, these two cohorts were not significantly different from one another (speech: t(58) = 0.14, p = 0.89; water: t(58) = 1.71, p = 0.092), suggesting that age does not affect the magnitude of the sound-to-music effect (Figure 3).

Figure 3. Comparison of results from Rowland et al. (Citation2019) to the older cohort of normal hearing participants from this study. We replicated results for both speech and water conditions with no significant differences between the two cohorts. Left figure panel reproduced with permission from Rowland et al. (Citation2019).

The Sound-To-Music Effect in Cochlear Implant Users

For speech stimuli, the mean musicality ratings (i.e., ΔRating) were 0.18 (SE = 0.025) for CI users and 0.25 (SE = 0.043) for NH controls in the present study (as noted above). For water stimuli, mean musicality ratings were 0.19 (SE = 0.032) for CI users and 0.36 (SE = 0.048) for NH controls (Figure 4; Table 2).

Figure 4. Comparison of pre-repetition and post-repetition ratings for CI users (red) and NH controls (purple) in the speech condition (left) and water condition (right). Perceptual ratings ranged from 0 (exactly like speech/not at all like music) to 1 (exactly like singing/exactly like music). Both groups experienced the sound-to-music effect (i.e., a positive slope). Error bars are standard error of the mean.

Table 2. Descriptive statistics for the repetition experiments. Mean (standard error) and 95% confidence intervals.

The full main model (fit by maximum likelihood) included perceptual ratings (averaged over items) as the dependent variable, fixed effects for group (CI users, NH controls), condition (speech, water), and time (pre-repetition, post-repetition) as well as their interactions, and random intercepts for participant and for participant nested within group:

model1 = fitlme(data, 'Rating ~ 1 + Group*Condition*Time + (1|participant:Group) + (1|participant)');

The main effect of repetition (i.e., Time) was significant (β = 0.12, p < 0.001) and there was no main effect of group (β = −0.019, p = 0.32). Post-hoc tests comparing pre- and post-repetition ratings, separately for each group and condition, confirmed that in all cases post-repetition ratings were higher, suggesting that both CI users and NH controls are able to experience the effect (CI users’ speech: Z = 4.39, p < 0.001; NH controls’ speech: Z = 4.34, p < 0.001; CI users’ water: Z = 4.31, p < 0.001; NH controls’ water: Z = 4.68, p < 0.001; tests applied a Bonferroni correction for multiple tests). Ratings for both groups were greater for water compared to speech (main effect of condition, β = −0.057, p < 0.001). There was a significant group × time interaction, and post-hoc tests showed this was due to higher post-repetition ratings (across speech and water conditions together) in the NH control group (t(117) = −2.12, p = 0.036). No other interactions were significant. See Table 3 for full results of Model1.

Table 3. Fixed effects results for main sound-to-music effect model (Model1).

Model1 provided a better fit to the data than a) a model without interactions amongst fixed effects (p = 0.0093), and b) a null model without any fixed effects, but with the same random effects structure as Model1 (p < 0.001). See Table 4 for model comparison statistics. We also ran a version of Model1 with covariates for age and gender, which were not significant predictors of Rating. Last, we ran a version of Model1 excluding the two CI users with neurological confounds and found that the results were nearly identical to the full Model1. Therefore, all subsequent models were run with the full cohort of CI users (n = 30).
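
A sketch of the likelihood ratio comparison against the null model is shown below (model names are illustrative; fitlme fits by maximum likelihood by default, which is appropriate for comparing models that differ in their fixed effects).

% Null model: intercept only, same random effects structure as Model1
nullModel = fitlme(data, 'Rating ~ 1 + (1|participant:Group) + (1|participant)');
% Likelihood ratio test plus AIC/BIC for the nested comparison
results = compare(nullModel, model1)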

Table 4. Model fit statistics for the full Model1 and reduced versions of the model.

As a complementary analysis to further understand musicality ratings between groups, we ran a model with the magnitude of the repetition effect, ΔRating, as the dependent variable. Whereas Model1 used absolute ratings as the dependent variable, this analysis used difference values. In both cases, ratings were averaged across all items in a given condition. In this analysis of ΔRating, NH controls had higher slopes than CI users (main effect of group, β = −0.060, p = 0.0082). Again, we found a main effect of condition (β = −0.031, p = 0.035) and no significant group × condition interaction (β = 0.024, p = 0.10). Overall, the results from this model parallel those of Model1. See Table S2 in the Supplement for full model results.

To test whether there was a disproportionate subset of “non-perceiving” participants between the two groups, we analyzed individual data (Figure 5). Non-perceivers were participants with a mean ΔRating of 0.05 or less on the 0–1 continuous rating scale. In the speech condition, five CI users and nine NH controls met this criterion, as did eight CI users and six NH controls in the water condition. The number of non-perceivers was not significantly different between groups in either the speech condition (χ2 = 1.49, p = 0.22) or water condition (χ2 = 0.47, p = 0.49), suggesting a similar proportion of individuals who did and did not perceive the illusion.
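
A minimal sketch of the non-perceiver classification and group comparison for one condition is shown below; deltaRating and isCIUser are illustrative variable names for the per-participant mean ΔRating and group membership.

% Flag non-perceivers (mean change of 0.05 or less on the 0-1 scale)
nonPerceiver = deltaRating <= 0.05;
% 2 x 2 chi-squared test of independence between group and non-perceiver status
[~, chi2stat, pValue] = crosstab(isCIUser, nonPerceiver);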

Figure 5. Individual participant data for the repetition experiments (speech left, water right). Each participant’s post- and pre-repetition ratings (averaged over all trials) are plotted in gray, with the group means overlaid in color (CI users in red, NH controls in purple). There is significant individual variability in the magnitude of the effect.

Frequency Discrimination

After winsorizing, frequency discrimination threshold summary statistics for the two groups were as follows: CI users: mean = 25.62 Hz (0.98 semitones), SE = 5.03; NH controls: mean = 5.68 Hz (0.22 semitones), SE = 2.11 (Figure 6). Not surprisingly, a Welch’s two-sample t-test indicated that CI users have significantly poorer frequency discrimination (i.e., higher thresholds) than NH controls (t(37.5) = 3.66, p < 0.001, confidence interval of the mean difference: 8.90–30.98 Hz). Descriptive statistics for UWCAMP were as follows: 262 Hz condition (mean ± SD = 49.33 ± 37.64 Hz; mean = 2.99 semitones); 330 Hz condition (mean ± SD = 57.55 ± 74.53 Hz; mean = 2.78 semitones); 391 Hz condition (mean ± SD = 79.86 ± 94.06 Hz; mean = 3.22 semitones). These thresholds are largely consistent with previous UWCAMP data (Drennan et al., Citation2015; Kang et al., Citation2009).
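
The semitone values reported alongside the Hz thresholds follow from the standard conversion of a frequency increment above the reference tone into semitones, shown here as a short MATLAB check.

% Convert a discrimination threshold (Hz above the reference) to semitones
hzToSemitones = @(deltaF, fRef) 12 * log2((fRef + deltaF) ./ fRef);
hzToSemitones(25.62, 440)   % CI users vs. 440 Hz reference   -> ~0.98 semitones
hzToSemitones(5.68,  440)   % NH controls vs. 440 Hz          -> ~0.22 semitones
hzToSemitones(49.33, 262)   % UWCAMP 262 Hz condition         -> ~2.99 semitones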

Figure 6. Comparison of frequency discrimination thresholds, after winsorizing, for CI users (n = 29) and NH controls (n = 29). Each bar represents one participant’s threshold for distinguishing pitches above/below a 440 Hz reference tone (black line). CI users (red) had significantly higher thresholds than NH controls (purple; p < 0.001).

We then fit separate models for the CI (model2) and NH (model3) groups, with condition, frequency discrimination threshold, and their interaction as fixed effects and a random intercept for participant. The dependent variable in these models was ΔRating.

model2 = fitlme(CI_data, 'DeltaRating ~ 1 + Condition*Freq_Discrim + (1|participant)');

model3 = fitlme(NH_data, 'DeltaRating ~ 1 + Condition*Freq_Discrim + (1|participant)');

In the CI group, there was no effect of frequency discrimination threshold (β = −0.0014, p = 0.11) or condition (β = −0.011, p = 0.65) on ΔRating. In the NH control group, there was no effect of frequency discrimination on ΔRating, but there was a significant main effect of condition (β = −0.067, p = 0.014), indicating slopes were higher for water compared to speech, consistent with our previous models. There were no significant interactions for either group (model results in Table S3). These results indicate that frequency resolution did not affect individuals’ perception of the sound-to-music effect.

Relationship Between CI users’ Clinical Factors and the Sound-To-Music Effect

Next, we explored whether clinical variables within the CI group were associated with the magnitude of the sound-to-music effect (i.e., ΔRating). This model was based on the twenty-six individuals who had all clinical data available.

model4 = fitlme(CI_data, 'DeltaRating ~ 1 + Condition + DurationHearingLoss + HearingCondition + ConditionTested + (1|participant)');

None of these clinical factors – duration of hearing loss, presence/absence of acoustic hearing during experimental testing, presence/absence of acoustic hearing in daily life – were significantly associated with ΔRating, and condition (i.e., speech, water) was also not significantly associated with ΔRating. This suggests that clinical variability in acoustic hearing and duration of hearing loss did not impact musicality ratings (model results in Table S4).

Musical Experience (ME)

Seven participants in the CI group and eight participants in the NH group reported having musical experience (ME). ME was entered as a fixed effect in order to understand how musical experience influenced perceptual musicality ratings:

model5 = fitlme(data, 'DeltaRating ~ 1 + Group*Condition + ME + (1|participant:Group) + (1|participant)');

There was not a main effect of musical experience (β = 0.0066, p = 0.80), indicating that musicians and non-musicians alike perceived the sound-to-music effect. The main effect of condition was not significant (β = −0.027, p = 0.062). Additionally, there was a main effect of group (β = −0.060, p = 0.0089), indicating NH controls had higher slopes than CI users, and a marginally significant group × condition interaction (β = 0.028, p = 0.051). For full model results, see Table S5. In addition to our binary scale, we created a cumulative score which included six questions from the musical experience survey, each normalized to a 0–1 scale. Results were almost identical when we used this more nuanced measure of musical experience. Overall, these results suggest that musicianship did not influence musicality ratings.

Qualitative Feedback Responses

In the speech condition, there were no group differences in the number of rhythm and melody qualitative ratings (OR = 0.72, p = 0.50). Similarly, in the water condition there were no group differences in the number of rhythm and melody responses (OR = 0.96, p = 1.00). These results suggest that participants in both groups were qualitatively experiencing similar percepts (Figure 7), which is consistent with their quantitative ratings (Figure 4). Because there were seemingly more “rhythm” responses for both groups in the water condition compared to speech, we ran an exploratory z-test for proportions, which indicated that participants did perceive rhythmic cues to be more apparent than melody cues in the water condition (z = −6.12, p < 0.001). As a complementary approach, we then compared the proportion of total rhythm responses to a proportion of 0.5 (i.e., an assumed equal number of rhythm and melody responses) and again found rhythm cues to be more apparent (z = 4.33, p < 0.01). The number of rhythm and melody responses did not differ from one another in the speech condition (z = −1.58, p = 0.11) and did not differ when comparing to an assumed equal number of responses between the two (z = 1.12, p = 0.26).
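
For reference, a two-proportion z-test of the kind used in this exploratory comparison can be sketched as below; nRhythm, nMelody, and n are placeholder counts rather than the actual tallies, and the sketch treats the two proportions as independent samples.

% Two-proportion z-test comparing rhythm vs. melody response rates
% (placeholder counts; equal sample sizes n assumed for both proportions)
p1 = nRhythm / n;
p2 = nMelody / n;
pPooled = (nRhythm + nMelody) / (2 * n);
z = (p1 - p2) / sqrt(pPooled * (1 - pPooled) * (2 / n));
pValue = 2 * (1 - normcdf(abs(z)));    % two-sided p value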

Figure 7. Qualitative feedback ratings for speech condition (left) and water condition (right) for both CI users and NH controls. Following each experiment, participants answered the question, “would you say that you typically perceived: a) a rhythm, b) a melody, c) both rhythm and melody, or d) none of the above?” Results indicate similar responses in both CI users and NH controls.

Experience with Music in CI Users

Because music satisfaction is typically decreased in CI users, we wanted to characterize this tendency in our cohort. When surveyed about music listening satisfaction, CI users reported a range of ratings from 1 to 10 on a scale of “very unsatisfied” to “very satisfied.” The median response was 4 (SD = 2.65). After separating participants by whether they did (n = 17) or did not (n = 11) have residual acoustic hearing, we found that the median satisfaction ratings were 4 and 3, respectively (Figure 8(A)).

Figure 8. Histograms of responses to questions in the music experience survey, provided by CI users. Participant numbers are included on each figure panel. Vertical dotted lines represent the median, and both questions were on a 10-point Likert scale. Panel A: “How satisfied are you with how music sounds through your cochlear implant?” Panel B: “Currently, how often do you listen to music?” (red) and “Before hearing loss, how often did you listen to music?” (purple).

Ratings for current time spent listening to music ranged from 1 (“never”) to 10 (“often”), and the median response was 7 (SD = 2.85). Participants reported more time spent listening to music before their hearing loss began, with a median response of 9 (SD = 2.70) on the same scale (Figure 8(B)). Overall, CI users reported more time spent listening to music prior to their hearing loss and low satisfaction with music after CI activation.

Discussion

Given the difficulty that CI users face with music perception, we questioned whether they could perceive musicality through repetition in the speech-to-song illusion, thereby circumventing device limitations for encoding spectral elements like pitch. In line with our hypothesis, we found that CI users experienced the musical transformation for both speech and non-speech (i.e., dripping water) stimuli, but at a lower magnitude compared to NH controls. We also found that frequency discrimination, musical training, and clinical variables within the CI group were not associated with musicality ratings. CIs provide a strong clinical model for disentangling the effects of spectral and temporal information in an acoustic signal, and our results suggest that intact rhythm perception is sufficient for experiencing this auditory illusion.

In our prior work, a deconstructed speech signal with either rhythmic or spectral content alone could elicit the sound-to-music effect, suggesting that repetition-induced musicality may be a broader principle of the auditory system (Rowland et al., Citation2019). Others have suggested that a possible mechanism for this illusion could be disinhibition of neural circuitry underlying pitch/melody perception (Deutsch et al., Citation2011). Based on the present findings, pitch information does seem to contribute to the magnitude of the sound-to-music effect, as the addition of spectral cues (i.e., available to NH controls but not to CI users) led to an increased magnitude, or slope, of musicality ratings. However, since CI users also experienced the sound-to-music effect, temporal/rhythmic cues do seem sufficient to perceive the effect. The stimulus manipulations in Rowland et al. (Citation2019) (i.e., percussive cross-stick samples at speech event onsets) highlight the importance of rhythmic cues in the sound-to-music effect – results that we replicate here in a clinical population. Our findings suggest that the auditory system may rely more on temporal envelope cues than on fine structure for the perception of this illusion.

For both groups, musicality ratings were higher for water than for speech, which parallels the condition effect sizes also reported in Rowland et al. (Citation2019). We replicated prior results (Rowland et al., Citation2019) in our older NH control group, which suggests, along with other findings (Mullin et al., Citation2021), that age does not affect the magnitude of the sound-to-music effect. Furthermore, our replication highlights that musicality ratings are highly stable across participants for a given stimulus (Tierney et al., Citation2018a).

In the qualitative post-repetition survey, CI users and NH controls reported experiencing similar percepts (Figure 7), which aligned well with their quantitative ratings. Both groups reported perceiving rhythm and melody in the speech condition, while in the water condition, rhythmic cues were significantly more apparent. Environmental sounds can often be identified from temporal/rhythmic cues, conveyed by the sound envelope, even in the absence of detailed spectral information (Gygi et al., Citation2004). The sound envelope of speech contains relatively slow amplitude fluctuations over time while the sound envelope of environmental sounds is typically much faster. As a result, these two categories can be successfully distinguished by this characteristic alone (Reed & Delhorne, Citation2005), despite the varied sounds (e.g., water dripping, a dog barking, an ambulance siren, etc.) and associated acoustics that constitute “environmental sounds.” CI users have been shown to have relatively good environmental sound perception for sounds with distinct temporal envelopes (Inverso & Limb, Citation2010; Reed & Delhorne, Citation2005; Shafiro et al., Citation2022). Our qualitative findings provide evidence that rhythmic cues are both important and apparent to listeners perceiving the sound-to-music effect.

We also found that about one-quarter of the participants in both groups were “non-perceivers,” indicating notable individual differences in the sound-to-music effect. A few papers have reported on individual differences for this effect (Mullin et al., Citation2021; Tierney et al., Citation2021), though relatively more have used linear mixed effects models, as we have done, to account for individual participant behavior (Groenveld et al., Citation2020; Jaisin et al., Citation2016; Margulis & Simchy-Gross, Citation2016; Simchy-Gross & Margulis, Citation2018; Tierney et al., Citation2018a). While the speech-to-song illusion is indeed a robust phenomenon reported in the literature, there are substantial individual differences in who is susceptible to experiencing this musical transformation.

Next, we examined whether other factors, including frequency discrimination, clinical characteristics, and musical experience would influence musicality ratings. We hypothesized that CI users with better frequency discrimination (i.e., lower thresholds) would have greater musicality ratings. However, frequency discrimination was not associated with musicality ratings in either group. This was surprising for the CI group in particular given the considerable variability in frequency discrimination thresholds (Figure 6); however, other work in normal hearing participants also did not find a relationship between musicality ratings and frequency discrimination (Tierney et al., Citation2021).

Within the CI group, none of the clinical variables we measured was significantly associated with musicality ratings. These variables included duration of significant hearing loss, presence/absence of acoustic hearing during experimental testing, and presence/absence of acoustic hearing in daily life (i.e., residual hearing). This was at first surprising given other work showing that duration of deafness and acoustic hearing are associated with clinical outcome measures for speech recognition (Blamey et al., Citation2012; Holden et al., Citation2013; Lazard et al., Citation2012; Roditi et al., Citation2009). However, more recent work suggests that duration of deafness may not play a large role in functional outcomes when accounting for other factors such as daily CI processor use (DeFreese et al., Citation2023). The associations between musicality ratings in our experiment and both duration of hearing loss and residual hearing were, however, trending in the anticipated directions (though not statistically significant, see Table S4). It is possible that our sample size (n = 30) was too small to detect such relationships.

As hypothesized, musical experience was not associated with musicality ratings. This aligns with prior work showing that musical training does not correlate with the magnitude of the sound-to-music effect (Tierney et al., Citation2021) and that musicians and non-musicians alike perceive the effect (Mullin et al., Citation2021; Vanden Bosch Der Nederlanden et al., Citation2015). Recent work does show, however, that beat and tone perception skills (i.e., musical aptitude), rather than formal musical training, are associated with a greater magnitude of the sound-to-music effect (Tierney et al., Citation2021) and enhanced neural coding for speech (Mankel & Bidelman, Citation2018). Conversely, one study showed that musical sophistication (as measured by the continuous and multidimensional Gold-MSI scale (Müllensiefen et al., Citation2014)) was associated with reduced speech-to-song musicality ratings (Groenveld et al., Citation2020). Future work could assess whether musical aptitude and musical sophistication, rather than formal musicianship, are associated with the sound-to-music effect in CI users.

Last, we assessed music satisfaction and experience in CI users and found low satisfaction with music listening post implantation and less time spent listening to music following hearing loss (). This is consistent with widely reported outcomes of reduced satisfaction/appreciation of music and less time spent listening to music after CI activation (Drennan et al., Citation2015; Fuller et al., Citation2022; Gfeller et al., Citation2000; Leal et al., Citation2003; Riley et al., Citation2018). Thus, the cohort of CI users in this study appear representative of the broader CI population in regard to low music satisfaction.

One methodological limitation of our work is that participants always completed the water condition first, followed by the speech condition. Testing order bias could be contributing to our finding that, overall, water ratings were higher than speech ratings. However, we do not think this is likely because in Rowland et al. (Citation2019) water ratings were also greater than speech ratings, and that study was a between-subjects, rather than a within-subjects, design. Also, prior work suggests that “it is hard to ‘unhear’ the illusion” (Groenveld et al., Citation2020), such that when participants heard a more “musical” condition first, they tended to give higher musicality ratings for subsequent stimuli. If this were the case in our data, we would have observed higher musicality ratings for speech (the second condition participants completed) compared to water (the first and, indeed, more musical of the two conditions). Instead, we did not find that musical expectations generalized between stimulus types. Nonetheless, it would be important for future studies employing within-subjects designs in larger participant samples to counterbalance conditions.

A second limitation of this work relates to the on-screen scale where participants made perceptual ratings. These scales, with the endpoints “exactly like speech/exactly like singing” and “not at all like music/exactly like music,” could imply that perception should change over a trial, which might bias participants toward responding in a positive direction. While this is a possibility, we think it was unlikely in the majority of cases because a) there were several “non-perceiving” participants who did not move their mouse over the course of the experiment and b) the initial mouse position defaulted to the middle of the scale, precluding any biases about participant starting points/changes in perception over time.

Future Directions

A crucial next step for this work is to better understand the neural correlates and neurophysiological mechanisms underlying the sound-to-music effect. To our knowledge, there are only two studies that have explored the sound-to-music effect with neuroimaging (i.e., fMRI) in normal hearing individuals (Hymers et al., Citation2015; Tierney et al., Citation2013). These studies found that speech perceived as song, compared to normal speech, activated bilateral regions in the superior temporal and middle temporal gyri along with some frontal and parietal regions – findings that align with brain regions involved in song perception more generally (Schön et al., Citation2010; Whitehead & Armony, Citation2018). The sound-to-music effect provides a unique opportunity to further understand the relative overlap (Peretz et al., Citation2015) and dissociation (Chen et al., Citation2023; Norman-Haignere et al., Citation2015) of music elements (rhythm, melody) and speech/language processing in the brain. Neuroimaging studies could also help determine what factors may be at play for perceivers and non-perceivers of the sound-to-music effect.

Future work could use time-resolved methods such as magnetoencephalography (MEG) or electroencephalography (EEG) to understand the neural time course corresponding to the change in perception from speech/water to song/music. When acoustic features are carefully matched, cortical tracking of song is more precise than cortical tracking of speech in difficult listening conditions, but is similar in easy listening conditions (Vanden Bosch der Nederlanden et al., Citation2020). Other work has found that cortical tracking of music and speech rhythm is modulated by musical training (Harding et al., Citation2019). During the transformation from speech to song, in which no acoustic changes occur, it would be interesting to investigate the degree of neural phase-locking over the course of repetitions and understand how phase-locking patterns map onto behavioral perceptions of speech and song. In CI users, neuroimaging techniques such as EEG/MEG and fMRI are more limited due to the electrical and ferromagnetic components of the implant. Techniques such as functional near infrared spectroscopy (fNIRS) could allow for a more nuanced understanding of the cortical regions that may come online during stimulus repetition in the sound-to-music effect in CI users.

In addition to future directions for neuroimaging, our work could inform new clinical directions for making music accessible to CI users. Composers could modify music to sound more pleasant for CI users; already, some work has designed deep neural networks that, in real time, remix pop songs and other music to sound more enjoyable (Gajęcki & Nogueira, Citation2018; Tahmasebi et al., Citation2020). Here, we show that CI users perceive more musicality after multiple, unaltered repetitions of a single stimulus, suggesting that music producers and composers could incorporate this common tool into music tailored specially for this population. These newer approaches in music composition can be pursued in parallel with continued improvements to cochlear implant technology. Such work will hopefully spark collaborations between musicians, composers, audiologists, scientists, and patient stakeholders, bringing together interdisciplinary perspectives to advance quality of life and music listening for CI users.

Conclusion

Cochlear implant users experience challenges with music perception due to both physiological limitations (e.g., channel interaction) as well as engineering limitations (e.g., envelope-based rather than fine structure coding). As a result, many CI recipients report low satisfaction with music listening (Riley et al., Citation2018). In this study, we asked whether repetition of short, looped spoken phrases and environmental sound clips could be a useful tool for eliciting music perception in this population. We found that CI users did perceive the musical transformation for both speech and water droplets. While CI technology continues to improve features for music perception, repetition incorporated into music specially composed for CI users may provide a promising way for them to experience music.

Author Contributions

AVK, IMB, JR, and RHG designed the study. AVK, IMB, AJD acquired data. AVK, IMB, and AJD conducted data analysis. ALH provided input on statistical analyses and data interpretation. AVK and IMB drafted the manuscript and RHG, JR, MTW, and RLG provided guidance and feedback during manuscript preparation. All authors approved the manuscript for publication.

Supplemental material

Supplemental Material

Download MS Word (28.5 KB)

Acknowledgments

We thank Michael Burchesky, AuD and Rayah Kirby for assistance with data collection, Dr. Chris Stecker for creating the frequency discrimination task, and Dr. Dan Gustavson for data analysis suggestions. We also thank all of our participants. The color palette used in the figures is from nationalparkcolors by Katie Jolly: https://github.com/katiejolly/nationalparkcolors.

Disclosure statement

Author RHG was a consultant for Skylark Bio as well as a member of the Audiology Advisory Board for Advanced Bionics and Cochlear Americas at the time of publication. No competing interests are declared for any other authors.

Data availability statement

Stimulus files and deidentified participant data are available at the following Open Science Framework repository: https://osf.io/dujt3/.

Supplementary material

Supplemental data for this article can be accessed online at https://doi.org/10.1080/25742442.2024.2313430

Additional information

Funding

This work was supported by a National Science Foundation Graduate Research Fellowship Program award and NIDCD F31DC020112 to AVK, NIDCD F31DC015956 to IMB, VICTR Award VR52433 to AD, NIDCD R01 DC016977 to RLG, and NIDCD R01 DC009404 to RHG. The content is solely the responsibility of the authors and does not necessarily represent the official views of the funders.

References

  • Ali, H., Noble, J. H., Gifford, R. H., Labadie, R. F., Dawant, B. M., Hansen, J. H. L., & Tobey, E. (2015). Image-guided customization of frequency-place mapping in cochlear implants. 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5843–5847. https://doi.org/10.1109/ICASSP.2015.7179092
  • Berg, K. A., Noble, J. H., Dawant, B. M., Dwyer, R. T., Labadie, R. F., & Gifford, R. H. (2019). Speech recognition as a function of the number of channels in perimodiolar electrode recipients. The Journal of the Acoustical Society of America, 145(3), 1556–1564. https://doi.org/10.1121/1.5092350
  • Berg, K. A., Noble, J. H., Dawant, B. M., Dwyer, R. T., Labadie, R. F., & Gifford, R. H. (2021). Speech recognition as a function of the number of channels for an array with large inter-electrode distances. The Journal of the Acoustical Society of America, 149(4), 2752–2763. https://doi.org/10.1121/10.0004244
  • Blamey, P., Artieres, F., Başkent, D., Bergeron, F., Beynon, A., Burke, E., Dillier, N., Dowell, R., Fraysse, B., Gallégo, S., Govaerts, P. J., Green, K., Huber, A. M., Kleine-Punte, A., Maat, B., Marx, M., Mawman, D., Mosnier, I., O’Connor, A. F. … Lazard, D. S. (2012). Factors affecting auditory performance of postlinguistically deaf adults using cochlear implants: An update with 2251 patients. Audiology and Neurotology, 18(1), 36–47. https://doi.org/10.1159/000343189
  • Brockmeier, S. J., Baumgartner, W. D., Steinhoff, J., Nopp, P., Weisser, P., Thurner, E., Ebenhoch, H., Gedlicka, W., Arnold, W., & Gstöttner, W. (2000). Munich music questionnaire to record music listening habits of people with post-lingual deafness after cochlear implantation. Advances in Oto-Rhino-Laryngology, 57, 405–407. https://doi.org/10.1159/000059192
  • Brockmeier, S. J., Fitzgerald, D., Searle, O., Fitzgerald, H., Grasmeder, M., Hilbig, S., Vermiere, K., Peterreins, M., Heydner, S., & Arnold, W. (2011). The music perception test: A novel battery for testing music perception of cochlear implant users. Cochlear Implants International, 12(1), 10–20. https://doi.org/10.1179/146701010X12677899497236
  • Buyens, W., Van Dijk, B., Moonen, M., & Wouters, J. (2014). Music mixing preferences of cochlear implant recipients: A pilot study. International Journal of Audiology, 53(5), 294–301. https://doi.org/10.3109/14992027.2013.873955
  • Buyens, W., Van Dijk, B., Moonen, M., & Wouters, J. (2018). Evaluation of a stereo music preprocessing scheme for cochlear implant users. Journal of the American Academy of Audiology, 29(1), 35–43. https://doi.org/10.3766/jaaa.16103
  • Chen, X., Affourtit, J., Ryskin, R., Regev, T. I., Norman-Haignere, S., Jouravlev, O., Malik-Moraleda, S., Kean, H., Varley, R., & Fedorenko, E. (2023). The human language system, including its inferior frontal component in “Broca’s area,” does not support music perception. Cerebral Cortex, 33(12), 7904–7929. https://doi.org/10.1093/cercor/bhad087
  • DeFreese, A. J., Lindquist, N. R., Shi, L., Holder, J. T., Berg, K. A., Haynes, D. S., & Gifford, R. H. (2023). The impact of daily processor use on adult cochlear implant outcomes: Reexamining the roles of duration of deafness and age at implantation. Otology & Neurotology, 44(7), 672–678. https://doi.org/10.1097/MAO.0000000000003920
  • Deutsch, D. (1995). Musical illusions and paradoxes. Philomel Records.
  • Deutsch, D. (2003). Phantom words and other curiosities. Philomel Records.
  • Deutsch, D., Henthorn, T., & Lapidis, R. (2011). Illusory transformation from speech to song. The Journal of the Acoustical Society of America, 129(4), 2245–2252. https://doi.org/10.1121/1.3562174
  • Drennan, W. R., Oleson, J. J., Gfeller, K., Crosson, J., Driscoll, V. D., Won, J. H., Anderson, E. S., & Rubinstein, J. T. (2015). Clinical evaluation of music perception, appraisal and experience in cochlear implant users. International Journal of Audiology, 54(2), 114–123. https://doi.org/10.3109/14992027.2014.948219
  • Dunn, C., Miller, S. E., Schafer, E. C., Silva, C., Gifford, R. H., & Grisel, J. J. (2020). Benefits of a hearing registry: Cochlear implant candidacy in quiet versus noise in 1,611 patients. American Journal of Audiology, 29(4), 851–861. https://doi.org/10.1044/2020_AJA-20-00055
  • Falk, S., & Rathcke, T. (2010). On the speech-to-song illusion: Evidence from German. Proceedings of the International Conference on Speech Prosody, Chicago, IL (pp. 2–5).
  • Falk, S., Rathcke, T., & Bella, S. D. (2014). When speech sounds like music. Journal of Experimental Psychology: Human Perception and Performance, 40(4), 1491–1506. https://doi.org/10.1037/a0036858
  • Friesen, L. M., Shannon, R. V., Baskent, D., & Wang, X. (2001). Speech recognition in noise as a function of the number of spectral channels: Comparison of acoustic hearing and cochlear implants. The Journal of the Acoustical Society of America, 110(2), 1150–1163. https://doi.org/10.1121/1.1381538
  • Fujita, S., & Ito, J. (1999). Ability of nucleus cochlear implantees to recognize music. Annals of Otology, Rhinology and Laryngology, 108(7), 634–640. https://doi.org/10.1177/000348949910800702
  • Fuller, C., Free, R., Maat, B., & Başkent, D. (2022). Self-reported music perception is related to quality of life and self-reported hearing abilities in cochlear implant users. Cochlear Implants International, 23(1), 1–10. https://doi.org/10.1080/14670100.2021.1948716
  • Gajęcki, T., & Nogueira, W. (2018). Deep learning models to remix music for cochlear implant users. The Journal of the Acoustical Society of America, 143(6), 3602–3615. https://doi.org/10.1121/1.5042056
  • Gfeller, K., Christ, A., Knutson, J. F., Witt, S., Murray, K. T., & Tyler, R. S. (2000). Musical backgrounds, listening habits, and aesthetic enjoyment of adult cochlear implant recipients. Journal of the American Academy of Audiology, 11(7), 390–406. https://doi.org/10.1055/s-0042-1748126
  • Gfeller, K., Jiang, D., Oleson, J. J., Driscoll, V., Olszewski, C., Knutson, J. F., Turner, C., & Gantz, B. (2012). The effects of musical and linguistic components in recognition of real-world musical excerpts by cochlear implant recipients and normal-hearing adults. Journal of Music Therapy, 49(1), 68–101. https://doi.org/10.1093/jmt/49.1.68
  • Gfeller, K., Witt, S., Adamek, M., Mehr, M., Rogers, J., Stordahl, J., & Ringgenberg, S. (2002). Effects of training on timbre recognition and appraisal by postlingually deafened cochlear implant recipients. Journal of the American Academy of Audiology, 13(3), 132–145. https://doi.org/10.1055/s-0040-1715955
  • Gifford, R. H., Noble, J. H., Camarata, S. M., Sunderhaus, L. W., Dwyer, R. T., Dawant, B. M., Dietrich, M. S., & Labadie, R. F. (2018). The relationship between spectral modulation detection and speech recognition: Adult versus pediatric cochlear implant recipients. Trends in Hearing, 22, 1–14. https://doi.org/10.1177/2331216518771176
  • Gifford, R. H., Sunderhaus, L. W., Holder, J. T., Berg, K. A., Dawant, B. M., Noble, J. H., Perkins, E., & Camarata, S. (2022). Speech recognition as a function of the number of channels for pediatric cochlear implant recipients. JASA Express Letters, 2(9), 094403. https://doi.org/10.1121/10.0013428
  • Groenveld, G., Burgoyne, J. A., & Sadakata, M. (2020). I still hear a melody: Investigating temporal dynamics of the speech-to-song illusion. Psychological Research, 84(5), 1451–1459. https://doi.org/10.1007/s00426-018-1135-z
  • Gygi, B., Kidd, G. R., & Watson, C. S. (2004). Spectral-temporal factors in the identification of environmental sounds. The Journal of the Acoustical Society of America, 115(3), 1252–1265. https://doi.org/10.1121/1.1635840
  • Harding, E. E., Sammler, D., Henry, M. J., Large, E. W., & Kotz, S. A. (2019). Cortical tracking of rhythm in music and speech. NeuroImage, 185, 96–101. https://doi.org/10.1016/j.neuroimage.2018.10.037
  • Holden, L. K., Finley, C. C., Firszt, J. B., Holden, T. A., Brenner, C., Potts, L. G., Gotter, B. D., Vanderhoof, S. S., Mispagel, K., Heydebrand, G., & Skinner, M. W. (2013). Factors affecting open-set word recognition in adults with cochlear implants. Ear and Hearing, 34(3), 342–360. https://doi.org/10.1097/AUD.0b013e3182741aa7
  • Hymers, M., Prendergast, G., Liu, C., Schulze, A., Young, M. L., Wastling, S. J., Barker, G. J., & Millman, R. E. (2015). Neural mechanisms underlying song and speech perception can be differentiated using an illusory percept. NeuroImage, 108, 225–233. https://doi.org/10.1016/j.neuroimage.2014.12.010
  • Inverso, Y., & Limb, C. J. (2010). Cochlear implant-mediated perception of nonlinguistic sounds. Ear and Hearing, 31(4), 505–514. https://doi.org/10.1097/AUD.0b013e3181d99a52
  • Jaisin, K., Suphanchaimat, R., Figueroa Candia, M. A., & Warren, J. D. (2016). The speech-to-song illusion is reduced in speakers of tonal (vs. non-tonal) languages. Frontiers in Psychology, 7, 662. https://doi.org/10.3389/fpsyg.2016.00662
  • Kang, R., Nimmons, G. L., Drennan, W., Longnion, J., Ruffin, C., Nie, K., Won, J. H., Worman, T., Yueh, B., & Rubinstein, J. (2009). Development and validation of the University of Washington clinical assessment of music perception test. Ear and Hearing, 30(4), 411–418. https://doi.org/10.1097/AUD.0b013e3181a61bc0
  • Kim, I., Yang, E., Donnelly, P. J., & Limb, C. J. (2010). Preservation of rhythmic clocking in cochlear implant users: A study of isochronous versus anisochronous beat detection. Trends in Amplification, 14(3), 164–169. https://doi.org/10.1177/1084713810387937
  • Lazard, D. S., Vincent, C., Venail, F., van de Heyning, P., Truy, E., Sterkers, O., Skarzynski, P. H., Skarzynski, H., Schauwers, K., O’Leary, S., Mawman, D., Maat, B., Kleine-Punte, A., Huber, A. M., Green, K., Govaerts, P. J., Fraysse, B., Dowell, R., Dillier, N. & Blamey, P. J. (2012). Pre-, per- and postoperative factors affecting performance of postlinguistically deaf adults using cochlear implants: A new conceptual model over time. PLoS One, 7(11), 1–11. https://doi.org/10.1371/journal.pone.0048739
  • Leal, M. C., Shin, Y. J., Laborde, M. L., Calmels, M. N., Verges, S., Lugardon, S., Andrieu, S., Deguine, O., & Fraysse, B. (2003). Music perception in adult cochlear implant recipients. Acta Oto-Laryngologica, 123(7), 826–835. https://doi.org/10.1080/00016480310000386
  • Limb, C. J., & Roy, A. T. (2014). Technological, biological, and acoustical constraints to music perception in cochlear implant users. Hearing Research, 308, 13–26. https://doi.org/10.1016/j.heares.2013.04.009
  • Litvak, L. M., Spahr, A. J., Saoji, A. A., & Fridman, G. Y. (2007). Relationship between perception of spectral ripple and speech recognition in cochlear implant and vocoder listeners. The Journal of the Acoustical Society of America, 122(2), 982–991. https://doi.org/10.1121/1.2749413
  • Mankel, K., & Bidelman, G. M. (2018). Inherent auditory skills rather than formal music training shape the neural encoding of speech. Proceedings of the National Academy of Sciences of the United States of America, 115(51), 13129–13134. https://doi.org/10.1073/pnas.1811793115
  • Margulis, E. H., & Simchy-Gross, R. (2016). Repetition enhances the musicality of randomly generated tone sequences. Music Perception: An Interdisciplinary Journal, 33(4), 509–514. https://doi.org/10.1525/mp.2016.33.4.509
  • Margulis, E. H., Simchy-Gross, R., & Black, J. L. (2015). Pronunciation difficulty, temporal regularity, and the speech-to-song illusion. Frontiers in Psychology, 6, 48. https://doi.org/10.3389/fpsyg.2015.00048
  • Moberly, A. C., & Reed, J. (2019). Making sense of sentences: Top-down processing of speech by adult cochlear implant users. Journal of Speech, Language, & Hearing Research, 62(8), 2895–2905. https://doi.org/10.1044/2019_JSLHR-H-18-0472
  • Müllensiefen, D., Gingras, B., Musil, J., Stewart, L., & Snyder, J. (2014). The musicality of non-musicians: An index for assessing musical sophistication in the general population. PLoS One, 9(2), e89642. https://doi.org/10.1371/journal.pone.0089642
  • Mullin, H. A. C., Norkey, E. A., Kodwani, A., Vitevitch, M. S., Castro, N., & Dick, F. (2021). Does age affect perception of the speech-to-song illusion? PLoS One, 16(4), 1–16. https://doi.org/10.1371/journal.pone.0250042
  • Norman-Haignere, S., Kanwisher, N. G., & McDermott, J. H. (2015). Distinct cortical pathways for music and speech revealed by hypothesis-free voxel decomposition. Neuron, 88(6), 1281–1296. https://doi.org/10.1016/j.neuron.2015.11.035
  • Ollen, J. E. (2006). A criterion-related validity test of selected indicators of musical sophistication using expert ratings [Doctoral thesis]. Ohio State University.
  • Oxenham, A. J., Bernstein, J. G. W., & Penagos, H. (2004). Correct tonotopic representation is necessary for complex pitch perception. Proceedings of the National Academy of Sciences of the United States of America, 101(5), 1421–1425. https://doi.org/10.1073/pnas.0306958101
  • Peretz, I., Vuvan, D., Lagrois, M. E., & Armony, J. L. (2015). Neural overlap in processing music and speech. Philosophical Transactions of the Royal Society B: Biological Sciences, 370(1664), 20140090. https://doi.org/10.1098/rstb.2014.0090
  • Rathcke, T., Falk, S., & Dalla Bella, S. (2021). Music to your ears: Sentence sonority and listener background modulate the “speech-to-song illusion.” Music Perception, 38(5), 499–508. https://doi.org/10.1525/mp.2021.38.5.499
  • Reed, C. M., & Delhorne, L. A. (2005). Reception of environmental sounds through cochlear implants. Ear and Hearing, 26(1), 48–61. https://doi.org/10.1097/00003446-200502000-00005
  • Riley, P. E., Ruhl, D. S., Camacho, M., & Tolisano, A. M. (2018). Music appreciation after cochlear implantation in adult patients: A systematic review. Otolaryngology - Head and Neck Surgery, 158(6), 1002–1010. https://doi.org/10.1177/0194599818760559
  • Riss, D., Hamzavi, J. S., Blineder, M., Honeder, C., Ehrenreich, I., Kaider, A., Baumgartner, W. D., Gstoettner, W., & Arnoldner, C. (2014). FS4, FS4-p, and FSP: A 4-month crossover study of 3 fine structure sound-coding strategies. Ear and Hearing, 35(6), e272–e281. https://doi.org/10.1097/AUD.0000000000000063
  • Roditi, R. E., Poissant, S. F., Bero, E. M., & Lee, D. J. (2009). A predictive model of cochlear implant performance in postlingually deafened adults. Otology & Neurotology, 30(4), 449–454. https://doi.org/10.1097/MAO.0b013e31819d3480
  • Rowland, J., Kasdan, A., & Poeppel, D. (2019). There is music in repetition: Looped segments of speech and nonspeech induce the perception of music in a time-dependent manner. Psychonomic Bulletin and Review, 26(2), 583–590. https://doi.org/10.3758/s13423-018-1527-5
  • Schön, D., Gordon, R., Campagne, A., Magne, C., Astésano, C., Anton, J. L., & Besson, M. (2010). Similar cerebral networks in language, music and song perception. NeuroImage, 51(1), 450–461. https://doi.org/10.1016/j.neuroimage.2010.02.023
  • Shafiro, V., Luzum, N., Moberly, A. C., & Harris, M. S. (2022). Perception of environmental sounds in cochlear implant users: A systematic review. Frontiers in Neuroscience, 15, 1–13. https://doi.org/10.3389/fnins.2021.788899
  • Simchy-Gross, R., & Margulis, E. H. (2018). The sound-to-music illusion: Repetition can musicalize nonspeech sounds. Music & Science, 1, 2059204317731992. https://doi.org/10.1177/2059204317731992
  • Smith, Z. M., Delgutte, B., & Oxenham, A. J. (2002). Chimaeric sounds reveal dichotomies in auditory perception. Nature, 416(6876), 87–90. https://doi.org/10.1038/416087a
  • Tahmasebi, S., Gajęcki, T., & Nogueira, W. (2020). Design and evaluation of a real-time audio source separation algorithm to remix music for cochlear implant users. Frontiers in Neuroscience, 14, 434. https://doi.org/10.3389/fnins.2020.00434
  • Tierney, A., Dick, F., Deutsch, D., & Sereno, M. (2013). Speech versus song: Multiple pitch-sensitive areas revealed by a naturally occurring musical illusion. Cerebral Cortex, 23(2), 249–254. https://doi.org/10.1093/cercor/bhs003
  • Tierney, A., Patel, A. D., & Breen, M. (2018a). Acoustic foundations of the speech-to-song illusion. Journal of Experimental Psychology: General, 147(6), 888–904. https://doi.org/10.1037/xge0000455
  • Tierney, A., Patel, A. D., & Breen, M. (2018b). Repetition enhances the musicality of speech and tone stimuli to similar degrees. Music Perception: An Interdisciplinary Journal, 35(5), 573–578. https://doi.org/10.1525/mp.2018.35.5.573
  • Tierney, A., Patel, A. D., Jasmin, K., & Breen, M. (2021). Individual differences in perception of the speech-to-song illusion are linked to musical aptitude but not musical training. Journal of Experimental Psychology: Human Perception and Performance, 47(12), 1681–1697. https://doi.org/10.1037/xhp0000968
  • Vanden Bosch der Nederlanden, C. M., Hannon, E. E., & Snyder, J. S. (2015). Everyday musical experience is sufficient to perceive the speech-to-song illusion. Journal of Experimental Psychology: General, 144(2), e43–e49. https://doi.org/10.1037/xge0000056
  • Vanden Bosch der Nederlanden, C. M., Joanisse, M., & Grahn, J. A. (2020). Music as a scaffold for listening to speech: Better neural phase-locking to song than speech. NeuroImage, 214, 116767. https://doi.org/10.1016/j.neuroimage.2020.116767
  • Vanden Bosch der Nederlanden, C. M., Qi, X., Seth, P., Grahn, J. A., Joanisse, M. F., Sequeira, S., & Hannon, E. E. (2022). Developmental changes in the categorization of speech and song. Developmental Science, 26(5), 1–21. https://doi.org/10.1111/desc.13346
  • Whitehead, J. C., & Armony, J. L. (2018). Singing in the brain: Neural representation of music and voice as revealed by fMRI. Human Brain Mapping, 39(12), 4913–4924. https://doi.org/10.1002/hbm.24333