Research Article

Phonetics as a means of nationalising art songs: a comparative music-phonetics study based on Zhao Yuanren’s New Poetry Collection

Pages 202-226 | Received 26 May 2022, Accepted 24 Jan 2024, Published online: 21 Feb 2024

Abstract

This study investigates the dynamic equilibrium between music and language within art songs, as exemplified in Zhao Yuanren's New Poetry Collection. Zhao Yuanren, a modern musician with a strong background in both phonetics and music theory, serves as the focal point of our research. Our primary objective is to explore how the fusion of new poetry and melodic compositions in Zhao Yuanren's works imbues art songs with a vivid sense of national identity. Through the provision of quantitative data, this study contributes to the evolution of Chinese art songs in the contemporary era and offers insights into the cultural implications of the genre. To analyze the structural characteristics and phonetic elements presented in the melodies of the New Poetry Collection, we employ visual image analysis. Additionally, we conduct acoustic data analysis, focusing on parameters such as frequency (Hz), intensity (dB), and pitch (st), which enable us to visually represent the speech sound aspects embedded in musical compositions and vocal performances. Our findings suggest the following: (1) Quantitative linguistic methodologies, as applied through the lens of linguistic musicology, offer a valid theoretical framework for the examination of Chinese art songs. (2) The subset of art songs under scrutiny in this study demonstrates adherence to specific linguistic constraints in various ways. (3) The phonetic elements identified in this research hold the potential to nationalise art songs and may have broader applications in other artistic domains. This study injects a rational, empirically grounded approach into the traditional realm of musicological cantorial relationships, thus providing valuable empirical data for the nationalisation of art songs. These findings underscore the practical significance of Chinese culture as a national treasure and lend support to the flourishing of Chinese art songs while preserving their cultural identity.

1. Introduction

Both communication and vocality represent distinct forms of vocal expression, utilising the resonant capacities of the vocal tract. Voice, in its various forms, serves as a carrier of intention, emotion, persona, and the mental state of those engaged in spoken or sung expression (Malawey, 2020, p. 14). Despite their close relationship, the considerable differences in the semantic information conveyed by voice distinguish communication from vocality, as noted by LaBelle (2010). Song performances inherently entail a complex interplay of multimodal communication involving performers and audiences (Watts & Morrissey, 2019). Within the context of vocal music, the term ‘Voice’ embodies a dual nature, encompassing both linguistic and non-linguistic content, as expounded upon by LaBelle (2010, p. 149). Voice, as a concept, carries metaphorical implications associated with identity, power, authority, agency, representation, and empowerment (Malawey, 2020, p. 14). Our comprehension of voice enriches our understanding of its multifaceted meanings, its role as an indicator of emotional states and identity, and its connections to social constructs. This study concentrates on examining the cultural metaphors inherent in the sonic materiality of voice within Chinese art songs, elucidating their existence, scope, and significance.

1.1. Music-language relations

Music and language exhibit numerous parallels, both in terms of their representational aspects (Watts & Morrissey, 2019, p. 204) and the application of preference rules (Johanneke & Schreuder, 2022, p. 28). However, the intricate and multifaceted nature of the relationship between music and language often eludes easy categorisation (Weidman, 2014). To comprehensively grasp the nuances and meanings associated with voice, it is imperative to adopt a multidisciplinary perspective, encompassing fields such as cognitive science (Patel, 2007), acoustics (Johanneke & Schreuder, 2022), musicology (Ladd et al., 2020), linguistics (Chan, 1987), sociolinguistics (Pichler & Williams, 2016; Watts & Morrissey, 2019), ethnography (Faudree, 2012), anthropology (Weidman, 2014), and even more abstract domains like semiotics (Faudree, 2012) and philosophy (Malawey, 2020, p. 14). This necessitates a growing emphasis on interdisciplinary research within our field.

1.2. Speech-song relations

It is important to note that investigations into the connection between music and language have traditionally fallen within the purview of musicologists, with a particular emphasis on vocal music – a form of art that intricately weaves musical notation with textual elements (Qian, 2014). A well-known Chinese proverb underscores the importance of aligning the melody of a song with the nuances of language (Editorial Board of Chinese Music Dictionary, Institute of Music, China Academy of Art, 1984, p. 458). This adage reflects the accumulated wisdom of Chinese composers and singers, who, as native speakers of a tonal language, have honed their craft over millennia of performance practice (Pu & Xia, 2012). Throughout history, music scholars have explored the complex interplay between music and language, seeking to understand how music composition, performance, and perception are influenced by linguistic contexts (Temperley, 2022).

In contemporary times, musicologists have embraced multidisciplinary research methods to delve deeper into the coordination of singing and vocalisation within songs. This interdisciplinary approach has enriched various facets of music analysis (Malawey, 2020), music performance (Ladd et al., 2020), the preservation of music cultural heritage (Janssen et al., 2017; Zhang & Cross, 2021), the expression of music semantics (Han & Sundberg, 2017), and more. For instance, Chan (1987) employed quantitative data to unveil potential systematic connections between melodies and sung lyrics in Chinese Cantonese pop songs. His findings indicated that melodies retained distinctive features mirroring lexical tones in terms of their relative levels and contours. While this early study provided a compelling demonstration of music-linguistic correlations, its limited dataset (six songs) and sole focus on fundamental frequency (F0) paved the way for further exploration, serving as a foundational backdrop for the research presented in this paper.

1.3. Related research background

A substantial body of scholarly work has established a robust connection between linguistic and musical components (Chan, 1987; Ladd et al., 2020). The subjects explored in these studies primarily encompass the function of this relationship, its significance, its manifestation across diverse artistic mediums, its variations across different cultural contexts, and the approaches and perspectives employed to investigate this correlation.

1.3.1. Function

The central focus of the music-language connection lies in its shared cultural metaphor, a fact that holds particular intrigue for both fields. This emphasis naturally extends to its sub-branch: the song-speech connection. The voice in the domains of music and language is perceived as a culturally constructed phenomenon (Eidsheim, 2014, p. 339) carrying profound ideological implications (Weidman, 2014, p. 45). Faudree (2012) contends that the functions and values of both music and language are socially constructed, fundamentally shaping communication in distinct cultural ways. This form of communication serves as a means of signifying and participating in narratives, conveying social messages, adhering to musical conventions, and engaging in instrumental dialogues (Burns & Woods, 2004).

At a microscopic level, music enriched with phonetic elements, such as songs, possesses the unique capacity to both create and disrupt meaning and individual identity (Neumark, 2010, p. 96). It also serves as a medium for mapping larger cultural constructs. For instance, American vocalists consciously cultivate their musical style, identity, social class, and genre by incorporating ‘Americanisms’ (Trudgill, 1983) or various regional dialects (Pichler & Williams, 2016) into their songs (Gibson, 2023). Similarly, the Chinese composer Zhao Yuanren is believed to have strategically employed phonetic elements, including phonology and syntax, to imbue his art songs with a distinct ‘Chinese’ essence (Zou & Wang, 2021). Additionally, Chaloupková (2021) suggests that the Chinese cultural influence in Zhao's music is evident in the choice of themes and the incorporation of existing pentatonic melodies.

However, prior studies have lacked sufficient quantitative evidence to substantiate these notions. Therefore, the investigation into the presence or absence of analogous ‘Chinese characteristics’ in Chinese art songs, as well as the manner in which they manifest, has captured our attention and emerged as one of the primary research hypotheses in this study.

1.3.2. Weighting relationship

As we have come to understand, songs encompass both music and language, yet the question of which side, or which combination, exerts a predominant influence on the sociological outcomes and cultural identities they engender remains open for debate. Middleton (2006, pp. 29–30) presents a model that sheds light on this intricate relationship. He argues that it should be viewed as a partial, asymmetrical, and adaptable interplay, involving a cognitive loop ranging from musical dominance (i.e. where ‘voice tends toward speech’) to linguistic dominance (i.e. where ‘words govern the musical flow, functioning as narrative’). Across various genres and cultures, metrical and tonal constraints may interact in highly specific ways (Ladd et al., 2020). This perspective aligns with Zhao Yuanren's approach to composing Chinese art songs, characterised by extensive reference to speech tones while retaining a basis in free composition (Zhao, 1994, p. 16).

However, it is crucial to acknowledge that the intelligibility of the lyrics and the inherent musicality of a song's melody are equally vital components of an art song, collectively conferring meaning upon it (Malawey, 2020, p. 20). An excessive prioritisation of either aspect can significantly impact the aesthetic sensibility of the listener (Fine & Ginsborg, 2014; Kennedy & Trofimovich, 2008; Qu et al., 2020). Hence, another research question addressed in this paper pertains to the actual distribution of musical and linguistic elements within Chinese art songs, as well as the conditions governing the onset and mode of operation of this bidirectional interaction.

1.3.3. Genre and cultural idiosyncrasies

Meanings associated with vocal music exhibit significant variations across diverse musical genres and linguistic contexts (Malawey, 2020, p. 27), often corresponding to distinct metrical and tonal constraints (Ladd et al., 2020). Consequently, the application of targeted research methods and specific paradigms for analyzing particular music collections yields valuable insights that warrant further investigation.

1.3.3.1. Cultural idiosyncrasies

Vocal melodies that incorporate tone languages as their foundation are likely shaped not only by musical considerations but also by the intricate interplay between the melody and the lexical tones of the lyrics (Zhang & Cross, 2021). The question of whether and to what extent tonal languages influence the construction of song melodies has sparked significant research enthusiasm globally. However, a notable and widespread limitation in these studies is the lack of attention to sung performances in languages other than English (Gibson, 2023; McGowan & Levitt, 2011).

There are approximately 7,000 languages [Footnote 1] spoken worldwide, with more than half belonging to the tonal language category. Chinese, specifically represented by Mandarin, encompasses all the typical characteristics of a tonal language. It stands as the most widely spoken language, with over one billion speakers, and is one of the most extensively studied languages globally (Haspelmath & Bibiko, 2005; Savage et al., 2022). Despite its natural research value, the scope of research in this domain is significantly less than what it merits. This paper aims to bridge this research gap.

1.3.3.2. Genre idiosyncrasies

The functions and values attributed to language in various vocal performance art forms exhibit variations across genres. Music genres that place a pronounced emphasis on language while attenuating rhythm have garnered the most attention in research. Examples include rock music (Middleton, 2006) and rap music (Pichler & Williams, 2016), as well as pop songs (Gibson, 2023; Malawey, 2020) and folk songs (Watts & Morrissey, 2019). These genres offer varying degrees of insight into the intricate relationship between music and language.

Within the Chinese cultural landscape, research in this domain encompasses multidimensional vocal art forms, including Cantonese Pop Songs (Chan, 1987), Chinese anhemitonic pentatonic folk songs (Liu et al., 2022), Chinese opera (Han & Sundberg, 2017; Wu, 1985), Shanxi folk songs (Yang et al., 2015), and Midu folk songs (Yukun, 2020). However, quantitative evidence for Chinese art songs, either in a cross-cultural or cross-genre context, has been notably absent.

Art songs, as multimodal expressions of music and language, possess inherent convergence qualities (Kimball, 2013, p. 9), rendering them amenable to quantitative analysis (McLeish, 2019). Poetry chanting and singing operate within specific structural frameworks (Kimball, 2013, p. 12), marked by the qǐ-chéng-zhuǎn-hé (‘起承转合’) structural transitions occurring within defined time scales (Zou & Wang, 2021). Semantic content within art song melodies is both supported and constrained by literary elements, intricately linking the narrative arcs of music and speech (Kimball, 2013).

Following the introduction of the art song genre to China from Europe in the latter half of the eighteenth century, local composers successfully adapted it within the context of Chinese culture. Chinese art songs serve as valuable materials for investigating the phonological-melodic correlations reflecting Chinese cultural diversity (Feng, 2020). Despite their significance and potential, research into the music-language correspondence of Chinese art songs has been limited in scale. Qu et al. (2020) explored the intelligibility of art songs sung in Mandarin Chinese and concluded that the pitch of the art songs inversely correlated with listeners’ comprehension of the lyrics. While this study offered insights into the music-linguistic correlation in Chinese art songs, no systematic investigation of this correlation has been conducted. Therefore, this empirical study applies linguistic research methods to examine how tonal considerations factor into the singing and composition of Chinese art songs.

It was imperative to address the issue of sample selection for the study of Chinese art songs. Zhao Yuanren, a pioneering figure in early Chinese music, was among the first to employ phonetics as a means to nationalise art songs (Zhao & Xue, 1987). His collection of art song compositions, known as the New Poetry Collection, was chosen as the primary source for this research. Zhao Yuanren, who also belonged to the first generation of Chinese linguists (Chen, 2017), possessed a profound understanding of Chinese language phonetics and an extensive knowledge of local Chinese folk tonalities. He stands as a trailblazing figure in the realm of Chinese language musicology (Jiang, 2008; Qian, 2014). As the first modern Chinese composer to merge art song composition with linguistic research, he provided invaluable empirical material for studies in linguistic musicology. Zhao Yuanren's art song compositions vividly reflect his aspirations in linguistic musicology. He even introduced the Five-Level Tone Mark (abbreviated as FLTM) to facilitate the examination of musical melodies in conjunction with Mandarin Chinese tones (Qian, 2014). Despite his significant contributions, the existing body of research primarily focuses on his linguistic achievements, largely overlooking the unique linguistic musicological value embedded within his compositions. Some scholars have argued that Zhao Yuanren's lifetime achievements should be further recognised as foundational to the field of language musicology (Chen, 2017; Qian, 2018, p. 115; Zhao, 1992).

Current literature predominantly centres on text-based descriptive analyses, acknowledging the multidimensional value of his work in language musicology (Chaloupková, 2021; Tokita & Cheung, 2023). Zou and Wang (2021) conducted an empirical analysis of ‘How can I not think of her’ from the New Poetry Collection, substantiating that Zhao Yuanren imparted a distinctly Chinese auditory character to his music. This finding served as a foundational basis for the present study, which aimed to expand the sample size to encompass all 14 art songs found within the entirety of the New Poetry Collection, building upon prior research.

1.3.3.3. Historical context

Within vocal performance, all meanings are contextually ascribed, fluid, and contingent on specific cultural and historical contexts (Malawey, 2020, p. 29). During the era of the collection's inception, China existed at the crossroads of semi-colonialism and semi-feudalism, marked by the collision of new ideologies and cultures. The New Culture Movement and the May Fourth Movement represented prominent intellectual and cultural liberation movements that were in full swing, with language and music emerging as two pivotal components. In the realm of language, visionaries such as Hu Shi and Zhao Yuanren spearheaded the national language unification debate and the vernacular new poetry movement. These movements brought revolutionary changes to spoken and written languages, ultimately establishing a new national language as the lingua franca while supplanting the written language with the vernacular, an influence still palpable in contemporary Chinese culture.

The poems within the New Poetry Collection emerged against the backdrop of a ‘literary revolution’ in China. Vocal art was considered a potent means to disseminate progressive ideologies during this period. ‘New youth’ who had returned from Western studies composed a plethora of art songs that embraced the ‘new Mandarin’ and ‘vernacular poetry’ movements within the theoretical framework of Western tonal music theory. This legacy persists to the present day and significantly impacted the subsequent generations of Chinese art song composition (Chaloupková, 2021). Zhao Yuanren's New Poetry Collection stands as one of these masterpieces, serving as a conduit not only for promulgating new vernacular poetry through musical melodies rooted in the unique essence of Chinese culture, but also for introducing traditional Chinese music theory into the Western genre of art songs (ibid., p. 35). His compositions have left an indelible mark, serving as a valuable source of learning and guidance for subsequent generations of art song creators (Li’an & Yodwised, 2022). Even today, on the centennial anniversary of the establishment of the Chinese art song genre, the collection continues to be sung and studied (Chaloupková, 2021).

1.3.4. Related research approaches

In the early stages of the field's development, research predominantly leaned towards employing purely textual approaches (Weidman, 2014, p. 43) to offer insights into vocal composition, performance, and perception, often lacking empirical foundations (Temperley, 2022). Only within the past decade have linguistic research methods focused on syntax and rhyme started making inroads into the realm of music studies. Linguistic researchers recognised that musical melodies could serve as a valuable means to explore speech melodies (Chow & Brown, 2018). Han and Sundberg (2017), in an empirical investigation of Chinese kunqu, revealed that objectively measured acoustic data could elucidate vocal style. Additionally, Savage et al. (2022) substantiated that the melodic sequences of folk songs contained universal regularities that could contribute to our understanding of language and culture evolution.

Consequently, a series of comparative studies focusing on local dialects and regional folk songs in China have emerged. Zhang and Cross (2021) conducted a corpus analysis on the tone-melody correspondence within a Chaozhou folk song, unearthing evidence of dialect influence on melody. A similar phenomenon was observed in sung Cantonese (Schellenberg & Gick, 2020). The aforementioned literature provides a robust and verifiable empirical framework for the study of language musicology within the Chinese context, shedding valuable light on the present study.

1.4. Research participants and experimental methods for speech and music

Since the experimental paradigm of this study has not been validated iteratively, this subsection presents the conceptual model used in this study and its corresponding units (see Figure 1).

Figure 1. Conceptual model of the present study.

Conceptual Model of the Study. The conceptual model presented in this paper encompasses four primary components: music, language, pitch embellishment, and units. Within this model, the principal focus lies on exploring the interplay between music and language, specifically examining the connections between note-tone relationships and melody-intonation. Additionally, the study places a significant emphasis on pitch embellishment, considering it a pivotal aspect of the research. The primary acoustic parameters under investigation in this study are pitch, intensity, and duration.

1.4.1. Mandarin tone and music

In a tonal language, the rise and fall of pitch play a lexically distinctive role: the tone, like the vowels and consonants, is part of the word itself, so changing the pitch can completely change a word’s meaning. Chinese characters also have a unique melody and rhythm within their tones and can be represented by four symbolic identification systems (tone, pinyin phonetic symbols [Footnote 2], Five-Level Tone Mark [Footnote 3], and traditional tone [Footnote 4]). The four tones in modern Chinese reflect four kinds of melodic movement: the ‘level tone’ (also known as 阴平, ˉ, pitch value 55 [Footnote 5]) is like a monotone hold; the ‘rising tone’ (阳平, ˊ, 35) is an upward movement from low to high; the ‘falling-rising tone’ (上声, ˇ, 214) is a downward and then upward folding movement; and the ‘falling tone’ (去声, ˋ, 51) is a downward movement from high to low. This combination of variations in Chinese tones has strong melodic potential (Sun, 2022).
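The correspondence between the four tones, their five-level pitch values, and their direction of melodic movement can be illustrated with a short Python sketch. It is only an illustration: the names `TONE_VALUES`, `contour`, and `describe` are hypothetical, and the only data used are the pitch values quoted in the text (55, 35, 214, 51).

```python
# Five-level pitch values for the four Mandarin tones, as quoted in the text.
# Each value is stored as a tuple of levels on the 1 (low) to 5 (high) scale.
TONE_VALUES = {
    1: ("level", "阴平", (5, 5)),
    2: ("rising", "阳平", (3, 5)),
    3: ("falling-rising", "上声", (2, 1, 4)),
    4: ("falling", "去声", (5, 1)),
}


def contour(levels):
    """Return the direction of each step between successive pitch levels."""
    steps = []
    for prev, nxt in zip(levels, levels[1:]):
        if nxt > prev:
            steps.append("up")
        elif nxt < prev:
            steps.append("down")
        else:
            steps.append("hold")
    return steps


def describe(tone_number):
    """Return the English tone name and its step-by-step melodic movement."""
    name, _chinese_name, levels = TONE_VALUES[tone_number]
    return name, contour(levels)


if __name__ == "__main__":
    for number in sorted(TONE_VALUES):
        print(number, describe(number))
```

Storing each pitch value as a sequence of levels, rather than as a single number such as 214, keeps the falling-then-rising shape of the third tone explicit.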

The pitch systems of language and music are compared in terms of F0 (to be detailed in 1.5.3). Modern linguistics offers many pitch analysis systems, used in phonetic and musical studies, that match F0 curves to discrete note sequences, for example the ToBI system [Footnote 6] based on AM theory [Footnote 7] (Backman & Hirchberg, 2004) and the Prosogram model. In the study of Mandarin Chinese phonology, the Five-Level Tone Mark (also known as the Five Degree System), created by Zhao Yuanren, is the most effective pitch analysis system. It is also important to note that individual variability in voice (Hudson & Holbrook, 1982) leads to variations in fundamental speech frequencies that are quite different from the discrete pitch sequences performed by many musical instruments. However, pitch comparison systems such as the FLTM exclude variables related to individual variability by using relative pitch as the standard. Thus, researchers can use simple directional descriptions to illustrate pitch contours without using exact numbers to quantify interval direction and size (Kirby & Ladd, 2016; Ladd & Kirby, 2020).
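To make the idea of relative pitch concrete, the following Python sketch converts F0 values in Hz to semitones and then places them on a five-level scale within one speaker's own range. The text does not specify a normalisation procedure, so everything here is an assumption made for illustration: the function names `hz_to_semitones` and `five_level` are hypothetical, and dividing the speaker's range into five equal bands on a logarithmic scale is just one plausible choice.

```python
import math


def hz_to_semitones(f0_hz, ref_hz):
    """Distance of f0_hz from ref_hz in semitones (12 semitones per octave)."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def five_level(f0_hz, f0_min, f0_max):
    """Map an F0 value to a level from 1 (low) to 5 (high).

    Assumption: the speaker's range [f0_min, f0_max] is split into five
    equal-width bands on a logarithmic scale. This is an illustrative
    choice, not a procedure taken from the study.
    """
    if not 0 < f0_min < f0_max:
        raise ValueError("expected 0 < f0_min < f0_max")
    position = math.log2(f0_hz / f0_min) / math.log2(f0_max / f0_min)
    position = min(max(position, 0.0), 1.0)  # clamp values outside the range
    # Values on a band boundary fall into the upper band; the top is capped at 5.
    return min(int(position * 5) + 1, 5)


if __name__ == "__main__":
    # Two hypothetical speakers with different ranges but the same relative pitch.
    print(five_level(150.0, 100.0, 200.0))
    print(five_level(300.0, 200.0, 400.0))
```

Both calls in the usage example return level 3: the absolute frequencies differ, but each value sits at the same relative position in its speaker's range, which is the property that lets a relative system set individual voice differences aside.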

FLTM serves as a mediator between pitch and tone and can be directly linked to Zhao Yuanren’s musical composition concept to form a musical-linguistic correspondence structure. According to the principle of the FLTM, this structure was reconstructed into musical scores to add time-domain information (which is not present in the FLTM). This contributes to the accurate characterisation of Mandarin tones and assists the learning and singing of Mandarin tones (Zhao & Fei, 2017).

Zhao Yuanren’s daughter, Zhao Rulan, stated that using linguistic features as musical material was a primary concern in her father’s musical compositions (Zhao & Xue, 1987 [Footnote 8]). Zhao Yuanren believed that lyrics and melodies should be arranged within a framework of an even-oblique pattern of words and music. His approach was to select a wide range of pitches based on the characters’ tones and then to select specific pitches based on the characters’ ‘even or oblique’ properties, the rhythm of Mandarin vocalisation, and the way they are structured. Specifically, the level and rising tones of the Mandarin tonal system are collectively called the even tone (平声, hereafter abbreviated as E), and the falling-rising and falling tones are collectively called the oblique tone (仄声, O). Zhao Yuanren chose notes according to the even and oblique tones of the Chinese characters. He believed that even-tone characters should preferably be set to a flat tone (i.e. one single note for one Chinese character, with C, E, and G preferred), while a variable tone (i.e. multiple notes for one Chinese character), if used, should move from high to low. For oblique-tone characters, the priority should be a variable tone, and when a flat tone is used instead, D, F, A, and B should be preferred. However, when an even-tone character and an oblique-tone character are adjacent, the even tone should be lower and the oblique tone higher (see Figure 2). His compositional principles are present in many ancient Chinese vocal forms, such as the function of daobai (道白) in historical dramas (Zhao, 1994). It is worth noting that this compositional principle is not dogmatic and needs to be used in conjunction with other compositional techniques to achieve artistic expression based on the integrity of textual and vocal expression.
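The even-oblique preferences described above can be summarised as a small lookup, sketched here in Python. This is a loose illustration rather than Zhao Yuanren's actual procedure, which the text stresses is not dogmatic. All names (`tone_class`, `suggest`, `respects_adjacent_rule`) are hypothetical, pitches are represented as MIDI note numbers purely for convenience, and the adjacency rule is read as "even lower, oblique higher", which is an interpretive assumption.

```python
EVEN_TONES = {1, 2}      # level and rising tones (平声, E)
OBLIQUE_TONES = {3, 4}   # falling-rising and falling tones (仄声, O)

# Preferred note names when a character is set to one single note.
PREFERRED_SINGLE_NOTES = {
    "E": ("C", "E", "G"),
    "O": ("D", "F", "A", "B"),
}

# Preferred type of setting for each class.
PREFERRED_SETTING = {
    "E": "single note",
    "O": "multiple notes",
}


def tone_class(tone_number):
    """Classify a Mandarin tone number (1 to 4) as even ('E') or oblique ('O')."""
    if tone_number in EVEN_TONES:
        return "E"
    if tone_number in OBLIQUE_TONES:
        return "O"
    raise ValueError("tone_number must be 1, 2, 3 or 4")


def suggest(tone_number):
    """Return the preferred setting and single-note candidates for a tone."""
    cls = tone_class(tone_number)
    return {
        "class": cls,
        "setting": PREFERRED_SETTING[cls],
        "single_note_candidates": PREFERRED_SINGLE_NOTES[cls],
    }


def respects_adjacent_rule(tone_a, pitch_a, tone_b, pitch_b):
    """Check the 'even lower, oblique higher' preference for two adjacent characters.

    Pitches are MIDI note numbers (an assumption made for this sketch).
    Pairs of the same class are not constrained by this rule.
    """
    class_a, class_b = tone_class(tone_a), tone_class(tone_b)
    if class_a == class_b:
        return True
    even_pitch = pitch_a if class_a == "E" else pitch_b
    oblique_pitch = pitch_b if class_a == "E" else pitch_a
    return even_pitch < oblique_pitch


if __name__ == "__main__":
    print(suggest(2))
    print(respects_adjacent_rule(1, 60, 4, 62))
```

A check of this kind could only flag where a melody departs from the stated preferences; whether a departure is an artistic choice is a question the sketch cannot answer.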

Figure 2. Correlation between Mandarin tones and Zhao Yuanren’s musical composition concept.

A correspondence diagram containing the four expressions of Mandarin Chinese tones and Zhao Yuanren's principles of pitch selection for creating Chinese art songs.

1.4.2. Mandarin intonation and music

The Modern Chinese Dictionary defines intonation as: ‘the tone of speech’; for example, the configuration of the height and weight of speech in a sentence (Dictionary Department, Institute of Linguistics, Chinese Academy of Social Sciences, 1984; Yang, 1983). In the 1930s, Mr. Zhao Yuanren, a pioneer in Chinese intonation research, proposed a series of ideas related to intonation, such as the melodic direction of the four intonation and tone combination pitches (high rising tone, falling depressed tone, flat straight tone, and zigzag tone) and the algebraic sum relationship between intonation and word tone, among others. The intonation-related doctrines he proposed are still influential and recognised by the academic community.

Through structured application in speech, linguistic intonation conveys syntactic, pragmatic, and emphatic information and suggests rhythmic organisation patterns (Cutler, 1998). Chinese intonation emphasis is reflected in two independent variables: pitch accent and boundary tone. Boundary tones convey the tone of the discourse, such as questioning and declarative tones; a syllable that carries this intonation information is called a boundary tone. Acoustically, a boundary tone is represented by the start and/or end position of its fundamental frequency (F0) contour, or alternatively by the contour’s slope. Pitch accent is a focus for conveying linguistic content in discourse (Jusczyk & Krumhansl, 1993; Lin, 2004). Empirical studies show that interrogative and declarative sentences have different pitch orientations, with most intonational information being carried by the last one or two non-alphabetic syllables [Footnote 9] of the sentence’s last rhyme (Lin, 2004).
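The acoustic description of a boundary tone by its F0 start position, end position, and slope can be sketched as follows in Python. The sketch assumes that F0 samples for the final syllable are already available as two parallel lists of times and frequencies. The function names, the use of a least-squares slope in semitones per second, and the threshold of 1.0 semitone per second are all hypothetical choices made for illustration, not values taken from the study.

```python
import math


def boundary_tone_summary(times_s, f0_hz):
    """Summarise an F0 contour by its start, end, and overall slope.

    times_s: sample times in seconds; f0_hz: F0 values in Hz (same length).
    The slope is a least-squares fit over semitones relative to the first sample.
    """
    if len(times_s) != len(f0_hz) or len(times_s) < 2:
        raise ValueError("need at least two paired samples")
    ref = f0_hz[0]
    semitones = [12.0 * math.log2(f / ref) for f in f0_hz]
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_s = sum(semitones) / n
    denominator = sum((t - mean_t) ** 2 for t in times_s)
    if denominator == 0:
        raise ValueError("sample times must not all be identical")
    numerator = sum(
        (t - mean_t) * (s - mean_s) for t, s in zip(times_s, semitones)
    )
    return {
        "start_hz": f0_hz[0],
        "end_hz": f0_hz[-1],
        "slope_st_per_s": numerator / denominator,
    }


def direction(slope_st_per_s, threshold=1.0):
    """Label a slope as rising, falling, or level (threshold is illustrative)."""
    if slope_st_per_s > threshold:
        return "rising"
    if slope_st_per_s < -threshold:
        return "falling"
    return "level"


if __name__ == "__main__":
    summary = boundary_tone_summary([0.0, 0.1, 0.2], [200.0, 220.0, 240.0])
    print(summary, direction(summary["slope_st_per_s"]))
```

Working in semitones rather than Hz makes slopes comparable across speakers with different registers, which suits a comparison between spoken and sung contours.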

Pitch decreases are a significant marker of phrase boundaries in speech and music (Jusczyk & Krumhansl, 1993). Many of the New Poetry Collection pieces use descending pitch contours as phrase boundaries, for example Listen to the Rain, Crossing the Indian Ocean, Bottle Flower, and He. Different patterns of melodic contours emerge from the different syntactic structures of Chinese intonation. For example, when accentual phrases are not at the end of a sentence, the utterance often ends with a rising pitch movement (Jun & Fougeron, 2000, 2002). In Chinese intonation, the tone of yes-no interrogative sentences is generally raised and the range of tones at the end of the sentence is greatly expanded (Yang et al., 2015). In addition, the overall tonal domain of specific interrogative sentences is raised, and the tonal domain is broadened (also called focal tone) before and after question words (Steele, 1969; Sun, 2002; Yan et al., 2016). This study focuses on the presentation of boundary tones in speech syntax in the context of music and explores the commonality of musical and linguistic syntax in the treatment of boundary structures.

Intonation characteristics are also expressed in musical works. Music, Language, and the Brain defines melody in music and language as ‘an organized sequence of pitches that conveys abundant information to the listener’ (Patel, 2007, p. 136). This definition emphasises the ideational character of music and language while affirming the homology of their melodic construction and perceptual patterns. The musical elements are organised in logical ways to form melodies; this way of unfolding is highly analogous to the way phrases in language are combined into sentences by grammatical rules. For example, when listening to He (a piece in New Poetry Collection), one can feel that the notes are divided into parts through silent sections, like musical rests. Although we can easily divide phrases in our auditory sense, these phrases are only one level in the musical hierarchy; the different levels are nested from low to high to form the entirety of a piece’s melody. Compared with other tonal languages, Chinese places special emphasis on the language tone variations and the organised pattern of the intonation pitch. This pattern is also found in Chinese music melodies. The co-existence of ‘sonority’ in language and music (Qian, 2018) makes the combination of words and music a natural area to investigate the relationship between Chinese verbal melody and Chinese musical melody.

1.4.3. Physically relevant quantity

1.4.3.1. Pitch & Intensity

Similar to music, speech exhibits characteristic properties manifested through four fundamental physical quantities: pitch, intensity, timbre, and length. Within these parameters, pitch, intensity, and length hold particular significance for encapsulating vocal features, both in the context of song and speech (Huang et al., Citation2022; Wang & Shi, Citation2020). These parameters also lend themselves effectively to the translation of acoustic effects into visual representations through computer-based transcription, analysis, and visualisation of acoustic data (Feng & Xu, Citation2022; Malawey, Citation2020; Wang, Citation2019). For this study, we draw on the work of Han and Sundberg (Citation2017) and Feng and Xu (Citation2022) in the realm of Chinese folk traditional opera and focus on three primary factors: fundamental frequency pitch, articulation duration, and intensity changes in speech and singing as our primary units of analysis. These units often exhibit non-synchronized performances, making quantitative analysis of their interactions a valuable avenue for uncovering the dynamics of influence within utterances.

  1. Pitch, a fundamental property that enables sounds to be arranged along a low-to-high dimensionFootnote10 (Liu, Citation2011), is primarily described by the frequency, measured in cycles per second (Hz), with its physical foundation being the fundamental frequency (F0). In speech, F0 serves as a reflection of pitch information, encompassing intonation nuances characterised by rapid rises and falls within specific time dimensions (Steele, Citation1969). Each musical note, being a periodic sound comprising a fundamental frequency and a series of higher harmonics, embodies a range of pitches corresponding to the fundamental frequency.

  2. Pitch is widely recognised as the predominant dimension employed to construct ordering systems for musical elements (Patel, Citation2007, p. 7). Across the trajectory of human civilisation, music has employed a single octave as a reference point for generating a spectrum of pitches and intervals. In Western tonal music, the division of each octave into 12 semitones (st), with each semitone comprising 100 centsFootnote11, serves as a critical framework. Notably, Ladd and Kirby (Citation2020, Citation2020) have conducted several pertinent studies in tonal languages, underscoring the significance of individual musical notes and the relative pitch of adjacent syllables in exploring the principles governing the alignment of tone and musical melody, thereby offering substantial insights for this paper.

  3. Intensity pertains to the average energy density of sound waves, quantified in decibels (dB SPL). Tone intensity plays a pivotal role in acoustically underpinning the perception of emphasis in both music and speech (Huang et al., Citation2022). The auditory experience associated with sound intensity and pitch collectively contributes to the perception of loudness.
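The relationship between frequency, semitones, and cents described in the list above can be illustrated with a short sketch. It assumes twelve-tone equal temperament (12 semitones per octave, 100 cents per semitone); the function names are illustrative and are not taken from the study's tooling.

```python
import math


def interval_in_semitones(f1_hz, f2_hz):
    """Signed interval from f1 to f2 in semitones (12 per octave)."""
    return 12 * math.log2(f2_hz / f1_hz)


def interval_in_cents(f1_hz, f2_hz):
    """Signed interval from f1 to f2 in cents (100 per semitone)."""
    return 100 * interval_in_semitones(f1_hz, f2_hz)


# Doubling the frequency spans exactly one octave.
octave_st = interval_in_semitones(220.0, 440.0)
octave_cents = interval_in_cents(220.0, 440.0)
```

Because the scale is logarithmic, equal frequency ratios give equal intervals regardless of the absolute register, which is what makes semitone values comparable between a speaker and a singer.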

1.5. Issues in the current study

Previous research has significantly influenced the methodology of this study, which is founded on the following hypotheses:

H1: The research framework presented in this paper (Figure model, i.e. a quantitative linguistic approach) serves as an effective method for examining the linguistic characteristics within Chinese art songs.

H2: In keeping with findings in other cultural and genre contexts, there exists a degree of similarity between Chinese Mandarin and Chinese art songs in terms of their internal structure and external expressions, both of which possess culturally implicit functions.

H3: The genre conventions governing Chinese art songs are linked to the proportional distribution of musical and linguistic elements within the songs.

H4: The incorporation of linguistic elements into music provides composers and singers with a degree of self-identification and imparts a culturally diverse auditory characterization to listeners of Chinese art songs.

In response to these research questions, we have chosen to analyze the music-language features of pitch and intensity in the Chinese art song collection, ‘New Poetry Collection,’ due to their relatively clear correspondence with linguistic meanings. This empirical corpus study, situated at the intersection of music and language, involves a comprehensive comparison of pitch and pitch sequences (melodies) in the tones of the words and intonations that correspond to the songs and lyrics. This analysis is facilitated through acoustic data analysis and the application of the Five-Level Tone Mark (FLTM). The research employs a multidimensional approach, encompassing music ontology analysis to examine the musical context, FLTM to explore the linguistic context, and linguistic musicology tools to transcribe the audio corpus of vocals into quantifiable acoustic data (Hz, dB, st) for rigorous quantitative analysis. Visual analysis is employed to effectively communicate the study's findings. Through this approach, the present study unveils the mechanistic aspects of music context, including tone and intonation features such as slope, pitch accents, and boundary effects. In doing so, it provides both quantitative insights and a theoretical foundation for evaluating the role of phonological elements in the composition and performance of Chinese art songs.

2. Materials

2.1. Music database

In this paper, recorded songs serve as the primary research material for the music section. Previous research has outlined two distinct functions of recorded songs: (1) serving as a prominent form of cultural production, projecting music into broader sociocultural contexts and conveying specific emotional or expressive meanings ascribed by listeners (Marshall, Citation2019); (2) serving as objects of study for the examination of acoustic parameters, offering valuable insights into voice within various scholarly domains, and providing effective material for investigating sound materiality (Malawey, Citation2020, p. 4). These multifaceted attributes make song recordings ideal for exploring both aspects of the model examined in this paper.

As a result, we curated a database of recorded music for the New Poetry Collection, encompassing a comprehensive selection of Zhao Yuanren's art songs (see the Appendix for detailed information). The voice samples chosen for this study encompassed various forms, including male and female solos, choruses, oratorios, instrumental solos, and instrumental ensembles, intended for future targeted research (refer to the Appendix for a complete catalog of musical pieces included in this study; these will be referenced later by their serial numbers in Table , e.g. Audio #1).

Table 1. Basic Information about the songs and poems included in the New Poetry Collection.

In additionFootnote15,Footnote16 to the audio recordings, we also consulted the original scores and other textual materials from the New Poetry Collection, annotated by Zhao Yuanren himself. These materials included the title phrase, preface, table of contents, main score, short score, lyrics pronunciation, lyrics word list, and song notes. The textual content provided descriptive insights into Zhao Yuanren’s musical concepts, the distinctive musical characteristics of each piece, the underlying musical theoretical frameworks, employed phonetic elements, and guidance for interpreting the lyrics (Zhao, Citation2005).

2.2. Corpus

Correspondingly, speech recording with poems from the New Poetry Collection was chosen as the main research material (Table for details). To solve the problem of ‘new poems cannot be sung’, Zhao Yuanren incorporated new poetry material in music and singing when he composed the New Poetry Collection. The entire collection’s lyrics use 480 words (one word with two uses was counted as two words). The lyrics were in poetic form, including new-style poems, five-line poems, seven-line poems, and prose poems. To reconcile the musical and literary aspects of the art songs, and to compose while maintaining the aesthetic values of both sides, Mr. Zhao did not choose literary works based on ‘good poetry’ but added and modified the words used in the lyrics. Table lists the basic information on the songs and poems used in the song collection.

3. Methodology

3.1. Music and phonetic visualisation

See Table for a list of software used in the musical and linguistic elements’ visual analysis.

Table 2. Application details.

3.2. Procedure

Figure shows that the experiments were divided into two parts: music processing and speech processing.

Figure 3. Schematic diagram of the research process.

A schematic diagram of the research process, which includes Material Collection, Data Extraction, Data collection, Data Analysis, and Conclusion.

3.2.1. Speech processing

Initially, we selected New Poetry Collection poems as recitation materials (see Appendix for details), and then we selected participants’ speech recordings as Mandarin speech samples by applying qualifying conditions, including physical condition, type of native language, gender, age, vernacular, and other conditions. We selected one participant (referred to as A) as the sample for the Chinese Mandarin corpus recording. A is a native Chinese speaker (female, 25 years old, graduate student of Chinese language and literature, and current secondary school teacher) from an official dialect area in northern China and has passed the official Chinese Mandarin grade test (Level 1). Before the recording, three professors specialising in Chinese linguistics judged the participant as natural and representative of Mandarin Chinese and cultural tradition. The study was approved by Yanshan University, and the participant provided written informed consent.

Before recording, the participant performed three read-aloud exercises to reduce experimental bias in speech caused by differences in familiarity. The participant was audio-recorded in a recitation style, seated in a quiet, closed room, with the voice captured through a microphone (Intel® Corporation Smart Voice Technology for 12S) and Cool Edit Pro (2.1). The recording used a sample rate of 16,000 Hz, a single (mono) channel, and a sampling accuracy of 16 bits. The recorded audio was exported as a .wav file using Praat, which allowed for acoustic data visual collation and extraction of acoustic data of speech audio (Praat: Doing Phonetics by Computer, Citation2022).
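As a minimal sketch of how the stated recording parameters (16,000 Hz, mono, 16-bit) could be checked on an exported .wav file, the following uses only Python's standard `wave` module. The synthetic test tone and the helper names are assumptions made for illustration; they are not part of the study's workflow, which relied on Cool Edit Pro and Praat.

```python
import io
import math
import struct
import wave


def write_test_tone(buffer, sample_rate=16000, n_frames=1600, freq_hz=220.0):
    """Write a short mono, 16-bit sine tone into a file-like object."""
    samples = (
        int(12000 * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n_frames)
    )
    frames = b"".join(struct.pack("<h", s) for s in samples)
    with wave.open(buffer, "wb") as wf:
        wf.setnchannels(1)       # mono
        wf.setsampwidth(2)       # 2 bytes = 16 bits per sample
        wf.setframerate(sample_rate)
        wf.writeframes(frames)


def describe_wav(buffer):
    """Read the header fields that correspond to the recording settings."""
    with wave.open(buffer, "rb") as wf:
        return {
            "sample_rate_hz": wf.getframerate(),
            "channels": wf.getnchannels(),
            "bits_per_sample": wf.getsampwidth() * 8,
            "n_frames": wf.getnframes(),
        }


buf = io.BytesIO()
write_test_tone(buf)
buf.seek(0)
info = describe_wav(buf)
```

For a real recording, `describe_wav` could be given an opened file instead of the in-memory buffer; a mismatch in any field would indicate that the file needs resampling or channel conversion before analysis.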

3.2.2. Music processing

We initiated our research by compiling an extensive collection of audio from various online platforms and available music databases. Our selection criteria included (1) clear sound quality, (2) comprehensive information such as chronology, artist details, and performance context, and (3) adherence to the manuscript of the score composed by Zhao Yuanren. Additionally, regarding the types of audio, following the approach of Malawey (Citation2020), we initially prioritised audio recordings that offered consistency, ease of access, and a fixed reference point. Occasionally, we supplemented these with live performance recordings from YouTube to provide a more comprehensive view of the performances.

The initial selection of relevant audio was carried out by three professional music experts using online resources. During the subsequent screening stage, we transcribed Zhao Yuanren's original scores from the New Poetry Collection using Sibelius (Citation2014), a commonly used electronic music notation tool among musicians. This process generated Musical Instrument Digital Interface (MIDI) format files and .wav audio files (Gilbers et al., Citation2020). We then meticulously time-aligned the transcribed audio with the online-recorded music, allowing for a permissible error of ±5 s; audio exceeding this error threshold was excluded. Finally, a panel of three professional musicians verified the accuracy, vocal clarity, and overall quality of the selected musical material (for details of the final selection and reasons, please refer to the Appendix).

To prepare the selected audio and video files for analysis, we used Cool Edit Pro and Bee Cut, applying noise reduction techniques as needed. All music recordings were resampled to a frequency of 16 kHz, and audio editing was performed to meet the study's requirements. We utilised Tony software to extract music audio acoustic data, enabling the extraction of precise music waveforms, spectrograms, audio time data, pitch track data, and note data in text format (.csv) for subsequent data analysis (Mauch et al., Citation2015).

3.2.3. Data analysis

Data organisation, calculation, and descriptive analysis were performed using Microsoft Excel and IBM SPSS Statistics (see Table for details).

The main data calculation types and their formulas are outlined below.

The process of converting fundamental frequency (Hz) to semitones (St) is essential for this study. The formula, as presented in equation (1), employs the variable X to denote the Hz value. Given that pitch-frequency conversion encompasses both intonation and tone, and recognising the absence of a fixed pitch system (such as the 12-equal temperament) for speech, this research adopts the actual frequency of the audio material as the foundation for computing the floating St value, with up to three decimal places omitted. In this equation, X represents the frequency in Hz, and the resulting St value provides a measure of pitch variation in relation to a reference frequency of 440 Hz.

$$\mathrm{St} = 12\log_{2}\left(\frac{X}{440}\right) \tag{1}$$

Melody contour trends: ordinary least squares linear fitting was used for the slope calculation. The slope (k) is the degree of slope of the two related indicators (x, y). The formula is shown in (2).

$$k = \frac{y_{1} - y_{2}}{x_{1} - x_{2}} \tag{2}$$

Drawing of melody contour trends: we plotted the F0 data and corresponding speech audio and music audio slopes in a scatter plot for visual analysis. The slope was plotted as in (3), where y is the F0 data, x is the time data, and b is the intercept.

$$y = kx + b \tag{3}$$

Combining the fundamental frequency and the calculated St data, we further used IBM SPSS Statistics (24.0) for statistical analysis and scientific mapping.
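The three formulas can be sketched in a few lines of Python. This is an illustrative reading rather than the authors' actual Excel/SPSS procedure: `hz_to_st` implements the semitone conversion relative to 440 Hz, and `ols_line` is a hand-written ordinary least squares fit that returns the slope k and intercept b of y = kx + b. With exactly two points, the fitted slope reduces to the two-point expression for k. The sample F0 values are invented for demonstration.

```python
import math


def hz_to_st(x_hz, ref_hz=440.0):
    """Convert a frequency in Hz to semitones relative to ref_hz."""
    return 12 * math.log2(x_hz / ref_hz)


def ols_line(xs, ys):
    """Ordinary least squares fit of y = k*x + b; returns (k, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx
    b = mean_y - k * mean_x
    return k, b


# Hypothetical F0 samples (Hz) at one-second steps, falling by 2 Hz each step.
times_s = [0.0, 1.0, 2.0, 3.0]
f0_hz = [200.0, 198.0, 196.0, 194.0]
k, b = ols_line(times_s, f0_hz)
```

A negative k indicates a falling contour over the analysed span; the same function can be applied to semitone values produced by `hz_to_st` if the trend is to be expressed in st rather than Hz.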

4. Result and discussion

4.1. Intonation

4.1.1. Melodic contour

A familiar rule of musical melody is that successive pitches are dominated by small intervals (Huron, Citation2006). Most melodic movement is achieved through smaller steps. This study investigated the melodic movements in the New Poetry Collection. First, we counted Audio #10 (as listed in Appendix) melodic contour intervals, as shown in Figure , and found that smaller intervals dominate the musical melody (37.7% of diatonic intervals and 28% of third intervals).

Figure 4. Number of adjacent interval steps in Audio #10. A bar chart of the count analysis of adjacent intervals in Audio #10 (see the Appendix). Smaller intervals dominate the musical melody.

The investigation into the presence of small pitch movements in both speech melodies and music melodies is crucial for confirming a potential correlation between the two. To establish this connection, it is essential to demonstrate that both exhibit a significant prevalence of small pitch variations. Since pitch in both music and Mandarin is quantified in terms of frequency, the relative semitone difference between two adjacent frequencies serves as a means to gauge the relationship between these two domains. This investigative approach, using pitch histograms based on actual pitch data, is adapted from the methodology employed by Liu et al. (Citation2022) in their study of traditional Chinese anhemitonic pentatonic folk songs. However, this paper introduces an innovative twist by utilising actual pitch-based pitch histograms instead of notation-based ones, with the aim of providing a more precise depiction of pitch characteristics. For this comparative study of melodic contours, we have selected the speech-recited version of Music #10 (recorded by participant A) and the sung version (Audio #10). Table presents basic information regarding the semitones of the selected audio samples.

Table 3. Voice and music information for Audio #10.

As shown in Table , the number of valid cases was 9675 for the music audio and 2377 for the speech audio. The vocal expression of the same text differs: the vocal range of music is slightly higher than that of speech (vocal range of speech: −4.99 st to 31.01 st; vocal range of music: −3.59 st to 36.3 st). Results are shown in Figure . The vertical coordinate is the frequency of occurrence (pcs), and the horizontal coordinate is the size of the st value.
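The adjacent-interval counts plotted in such a histogram can be sketched as follows. The sketch assumes the input is a list of voiced F0 values in Hz (unvoiced frames already removed) and bins intervals to the nearest whole semitone; both the binning choice and the sample values are assumptions for illustration, not the study's settings.

```python
import math
from collections import Counter


def adjacent_intervals_st(f0_hz):
    """Signed semitone difference between each pair of adjacent F0 values."""
    return [12 * math.log2(b / a) for a, b in zip(f0_hz, f0_hz[1:])]


def interval_histogram(intervals_st):
    """Count intervals after rounding each one to the nearest semitone."""
    return Counter(round(v) for v in intervals_st)


# Hypothetical pitch track: a held note, an octave leap up, an octave leap down.
track_hz = [220.0, 220.0, 440.0, 220.0]
intervals = adjacent_intervals_st(track_hz)
histogram = interval_histogram(intervals)
```

Because the intervals come from the measured pitch track rather than from the notated score, a sustained note contributes many near-zero intervals, which is one reason such histograms concentrate around 0 st.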

Figure 5. Histogram of frequency statistics of adjacent st in the music (A) and speech (B) audio.

A bar chart counting Audio #10 melodic contour intervals. The figure consists of two plots, with music audio on the left (A) and speech audio on the right (B). Both plots show a clear centrality.

A particularly noteworthy observation is that both verbal and musical melodies exhibit a prevalence of smaller pitch intervals. In both cases, the data reveals a significant concentration of intervals characterised by their small size (A: Mean ± SD = 0.000 ± 0.005; B: Mean ± SD = 0.002 ± 0.004). This finding substantiates the hypothesis positing a potential correlation between the melodic contours of music and speech. It suggests that there is indeed a degree of similarity between the pitch patterns conveying melodic contours in both domains – music and speech. The interpretation of this observed similarity can be approached from two dominant perspectives:

  1. Influence of Song on Speech: Some argue that the prevalence of song in the phylogenetic development of humanity predates that of language (Mithen, Citation2007). From this perspective, songs could have influenced speech patterns by impacting pitch, potentially serving to strengthen social bonds within groups (Gamble, Citation2012).

  2. Influence of Language on Song: Conversely, it is suggested that language has an ephemeral influence on song. In other words, language experience might contribute to a preference for smaller pitch intervals in music (Unyk et al., Citation1992; Vos & Troost, Citation1989). In the cultural context examined in this paper, deviating from the pitch patterns of speech in sung melodies could lead to misunderstandings of the song's meaning and result in errors in its performance. Traditional vocal genres, such as Chinese traditional opera and local dialect folk songs, often exhibit melodies that align closely with speech patterns (Qian, Citation2014).

From the data presented in this study, it appears that Chinese art songs are not an exception to this pattern. However, it is important to note that while this finding provides valuable insights, the dominant mode within the Chinese music-language connection is not definitively established. Further targeted research and data collection are needed to fully understand this complex relationship.

There is a wider range of pitch movement in music compared to language (Rangea = 17.63, Rangeb = 7.77), which shows the differentiation. Paired-samples t-tests were conducted on the st data of music and speech; the results revealed a significant correlation between the speech and music chromatic intervals (Mean ± SD =2.91 ± 2.00, P(two-tailed) < .001Footnote12). This phenomenon may be based on an important difference: the linguistic category is primarily based on tone, while the musical category is primarily based on pitch. Music uses pitch as a means of expression, whereas, in language, pitch is merely an accessory to grammar and word rhyme. Zhao Yuanren was also aware of possible differences between individual vocal range and average intonation range at the time of composition and, therefore, tended to choose a moderate tone (Yu, Citation2008, p. 1). However, even when pitch range is consciously lowered, the artistic expression of the music still presents a wider range of sounds.

4.1.2. Melodic direction

We further extracted the fundamental frequency features of the speech recitation audio and song singing audio for melodic contour direction analysis. The descriptive analysis of the audio’s basic features is shown in Table .

Table 4. Voice and music information for Audio #2.

As shown in Table , although the music audio range is wider than that of speech audio (speech range: 93.15–304.88 Hz; music range: 82.21–293.06 Hz), there is a relatively stable fluctuation within the range (speech SD = 33.81 Hz; music SD = 41.72 Hz).

The physiological basis of the vocal folds and vocal structure gives Mandarin a distinct ‘downward slant’ in vocalisation. This is manifested in a gradual decrease in pitch baseline and a gradual contraction in pitch range as the language unfolds (Heng, Citation2017). The poem ‘Little Poem’ is a new five-line poem consisting of four declarative stanzas. The poetic metre is ‘O-O-O-E-E, O-O-E-E-O. O-O-O-E-E, E-O-E-E-O’.Footnote13 Figure shows the scatter as well as the moving average linear analysis plot based on the F0 data from the speech recitation audio (the prediction period is 25, i.e. every 25 data points is a period). The poetic metre is marked in the figure. As can be seen in the figure, although there are different degrees of ups and downs in the middle of each sentence, they all show a significant drop at the end of the sentence. The results of linear regression analysis indicate that the poem generally shows a clear downward trend (slope k = −1.679, intercept b = 191.5).
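A moving average with a period of 25 data points, as used for the smoothed curve, can be sketched as below. This assumes a simple trailing (unweighted) window, which is how spreadsheet moving-average trendlines are commonly computed; the source does not state the exact variant, so the windowing choice is an assumption.

```python
def moving_average(values, period=25):
    """Trailing moving average; yields len(values) - period + 1 points."""
    if period < 1:
        raise ValueError("period must be at least 1")
    averages = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= period:
            # Drop the value that has just left the window.
            running_sum -= values[i - period]
        if i >= period - 1:
            averages.append(running_sum / period)
    return averages


# Small demonstration with a period of 2 instead of 25.
demo = moving_average([1.0, 2.0, 3.0, 4.0, 5.0], period=2)
```

Smoothing of this kind suppresses syllable-level rises and falls, so that the sentence-final drops and the overall downward trend are easier to see before a straight line is fitted.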

Figure 6. Scatter diagram of the voice recitation’s melodic contour in Audio #2.

A scatter plot showing the F0 data from the speech recitation audio together with the moving average linear analysis; the poetic metre is marked in the centre of the figure. The voice recitation’s melodic contour proceeds from high to low.

To quantify the extent to which verbal intonation corresponds to musical melody in the song, we further compared the verbal intonation to the musical contours in the sung version. Extensive comparison of the songs showed that the downward slant of the melodic contour is manifested in several tracks in the New Poetry Collection, the most representative of which, #2 was chosen for visual analysis (see Figure ). Track #2 is a short, delicate art song with 25 bars (song structure shown in Table ). The A section is the introduction and the A’ section is the body. The main part is also divided into four verses according to the same poetic metre.

Figure 7. Scatter diagram of singing pitch’s melodic contour in Audio #2.

A scatter plot which shows the scatter as well as the linear analysis plot based on the F0 data from the singing pitch audio. The figure consists of three parts, the upper figure (A`) shows the data situation for the main section, and the lower figures (a`1, a`2) show the data situation for each of the two phrases of the main section. The singing pitch’s melodic contour proceeds from high to low.

Table 5. Song structure of Track #2.

As shown in Figure (A` section), linear analysis of the singing pitch fundamental frequency shows a high consistency with the downward dip presented in Figure (k = −1.730, b = 234.94). Not coincidentally, this downward trend is shown to varying degrees across the music database (such as Audio #13: k = −0.0815, b = 201.09; Audio #6: k = −0.0057, b = 222.09). Given music’s natural characteristics, the number of F0 data points is several times higher than that of the speech audio, which shows a more significant downward inclination. In the A` section (body part), the two phrases show different decreasing trends (see Figure a`1, a`2), with the lower phrase showing a significantly higher degree of decrease than the upper phrase (a`1: k = −2.3607, b = 312.23; a`2: k = −7.1197, b = 654.04). Through musicological harmonic and tonal analysis of the songs, we found that the downward-leaning principle was most likely intentional on the part of the composer. For example, the off-key progression of D-chords and Neapolitan chords is used at bars 20–22 at the end of the song, creating a d-centered tonal expansion. Such harmonic progressions combine with the melodic line to create a downward-leaning tendency and embed phonetic elements.

This finding may not be generalisable to all songs, although the shared downward tendency of music and language is apparent in Track #2. We do not observe a significant downward trend in the overall melodic contour of some songs in the music database (for example, Audio #4: k = 2.3271, b = 273.38; Audio #10: k = 0.0381, b = 407.5; Audio #11: k = 0.0161, b = 226.37). A study that artificially synthesised nonsense utterances, modified intonation contours and stress in speech, and subjected them to listening comprehension experiments found that participants used different judgment strategies for changes in pitch and stress, suggesting that the perceptual mechanisms of verbal and musical melodies differ (Perrachione et al., Citation2013; Terken & Collier, Citation1989). Research on the mechanism behind the downward slant is ongoing; future empirical studies that investigate the cognitive mechanism involved are needed.

4.1.3. Phonetic boundary tones and phrase boundary tones

Next, to explore the effects of different verbal sentence patterns on melodic direction, we separately investigated different sentence types of boundary tone in intonation.

4.1.3.1. Declarative sentences

The horizontal coordinate of Figure is time (s), the vertical coordinate is F0 (Hz), and the audio was recited by participant A.

Figure 8. Final sentence voice F0 line of #1 (declarative intonation).

Pitch curve of a declarative intonation in Song #1. The direction of the music conforms to the laws of Chinese intonation.

The lyrics in Song #1 consist of six five-word verses. The piece is divided into six verses using a pattern of long, voiceless stanzas, and pitch drops. Figure shows the frequency curve of reading Song #1, ‘Tang You Ren Ai Ta, Geng Ru He Dai Ta’ (What would happen to him if someone harmed him?), in declarative order. The sentence F0 range was 118.42–384.80 Hz, with a variance of 52.42 Hz and a mean value of 213.46 Hz. The sentence has two boundary structures (indicated in brackets in the chart); the first ends at 19.5 s, with a right boundary indicated by a pitch drop. The second combined structure terminates at the end of the sentence, again showing a clear trend of pitch descent.

4.1.3.2. Interrogative sentences (yes-no interrogative intonation)

Figure shows the speech fundamental frequency analysis of the final sentence in Song #1 (yes-no interrogative intonation). The speech’s melodic contour is maintained at 161.07–332.29 Hz (M = 201.68 Hz, SD = 171.23 Hz). The melodic line is relatively smooth and there is no obvious descending contour. This contrasts with the pitch contours of the declarative statement in the preceding figure: here, both phrase boundaries are marked by a rising pitch movement, which is a distinctive feature of Mandarin yes-no interrogative sentences.

Figure 9. Final sentence voice F0 line of #1 (yes-no interrogative intonation).

Pitch curve of a yes-no interrogative intonation in Song #1. The direction of the music conforms to the laws of Chinese intonation.

The horizontal coordinate of the Figure is time (s), the vertical coordinate is F0 (Hz), and the audio is recited by participant A.

Further reference is made to the musical melodic character of the phrase (Figure ). The score example shows that the composer separates the upper and lower phrases through musical ornamentation, which constitutes the contour boundary of the upper phrase, concretely reflected in the ‘a tempo’ increasing the timing of the sinking notes. This contrasts with the previous phrase through the sudden intensity and speed recovery, shaping the sense of boundary between phrases. Although the following vocal melody ends in a smooth cascade descent, the accompanying harmonic texture forms the boundaries of a rising pitch organisation with successive third cascades. This corresponds to the boundary contour of the greatly expanded tonal range at the end of the phrase, which is characteristic of interrogative intonation. Similar treatment is found elsewhere in the New Poetry Collection, for example in the question ‘How can I not think of him?’, which recurs in Song #1 and Song #11, and in ‘Who bought the cloth?’ in Song #4.

Figure 10. The score excerpt of Song #1 final phrase.

A score excerpt of the final phrase of Song #1 with three kinds of annotation: lyrics in Chinese, lyrics in English, and pinyin with the values of the citation tones. A clear correlation appears at the red box in the figure.

4.1.3.3. Interrogative sentences (specific interrogation)

The horizontal coordinate in Figure is time (s) and the vertical coordinate is fundamental frequency (Hz). The A graph shows the F0 curve (green curve) and the version’s spectrogram (orange-green bottom graph); the B graph shows the F0 (black curve) of the voice recorded by A, labelled using pinyin and pentatonic markers.

Figure 11. Comparison of the F0 of Song #4 selection’s singing (A) and voice(B) melody.

An F0 comparison graph of Song #4 selection's singing and voice melody, including the top and bottom parts. The two figures show a clear reverse contrast at the red box.

In this segment, the individual fundamental frequency curves of the words ‘shi51, shei35, bu51’ exhibit a remarkable level of consistency, as illustrated in Figure . When examining the overall pitch contour, it becomes evident that both the trajectory of pitch changes in speech and the melodic direction of the musical piece follow a similar pattern, transitioning from an ascending to descending trend and culminating in a peak in the middle of the phrase. Figure -B, however, presents a slight difference. While the phonetic progression appears to be tonally expanded, particularly on the interrogative pronoun ‘shei35,’ in the musical rendition, the frequency peak aligns with the word ‘mai214.’ This suggests that the peak contour lines of the selected segment and speech are essentially identical, indicating that this segment still adheres to the phonetic pitch characteristics within the context of artistic treatment.

4.2. Tone

The aforementioned findings provide confirmation that Zhao Yuanren based his selection of musical notes on the distinction between ‘even’ and ‘oblique’ tones. It's worth noting that this principle of composition is interwoven within each tune of the New Poetry Collection, which Zhao Yuanren referred to as ‘rhyming words’ (韵字). Rhyming words is a relatively abstract concept, with Zhao Yuanren only indicating that it is primarily used for ‘important words.’ However, neither his own work nor subsequent research has provided a standardised definition or usage scenarios for Rhyming words.

In the New Poetry Collection, we encounter examples that may shed light on this question. For instance, in Song #13, composed in the style of a scherzo, the choice of melody aligns closely with the Zhong Zhou school's usage of word tones (see Note 14). However, in its specific application, the melody seems to deviate from the direction of the tones (refer to Figure 12). The two characters ‘风(feng55) 吹(chui55)’ in the score both bear even tones, but their direction is upward and then flat, which does not adhere to the principle of ‘ping downward’. This indicates that both musicality and linguistic elements play significant roles in the musical composition of the New Poetry Collection. While the melodies within the ‘rhyming words’ category may predominantly represent linguistic elements, those outside it may emphasise musical elements to a greater extent.
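A minimal sketch of such a direction check follows, written in Python. It assumes a strict reading of the convention summarised in Note 14 (even tone downward, oblique tone upward) and the usual modern mapping of Mandarin tones 55 and 35 to ‘even’ and 214 and 51 to ‘oblique’. The function names and the interval values are hypothetical; the text only states that the melody on ‘风 吹’ moves upward and then stays flat.

```python
EVEN_TONES = {"55", "35"}      # assumed: Mandarin tones 1 and 2 count as even (ping)
OBLIQUE_TONES = {"214", "51"}  # assumed: Mandarin tones 3 and 4 count as oblique (ze)


def tone_class(tone_value):
    """Map a citation tone value to 'even', 'oblique', or 'neutral'."""
    if tone_value in EVEN_TONES:
        return "even"
    if tone_value in OBLIQUE_TONES:
        return "oblique"
    return "neutral"


def follows_zhongzhou(tone_value, interval):
    """Check one syllable against the strict rule.

    interval is the signed melodic movement into the note, in semitones:
    negative means downward, positive upward, zero flat.
    """
    kind = tone_class(tone_value)
    if kind == "even":
        return interval < 0
    if kind == "oblique":
        return interval > 0
    return True  # the neutral tone is left unconstrained in this sketch


# Hypothetical intervals for 'feng55' (upward) and 'chui55' (flat).
for syllable, tone, interval in (("feng", "55", 2), ("chui", "55", 0)):
    print(syllable, tone_class(tone), follows_zhongzhou(tone, interval))
```

Under this strict reading, both syllables fail the check, consistent with the observation that the passage does not follow ‘ping downward’. The ‘relatively lower’ variant of the rule, which compares pitch height rather than direction, is not modelled here.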

Figure 12. The score excerpt of Song #13.

A score excerpt of the final phrase of Song #13, used to illustrate the pitch-tone correlation, with three kinds of annotation: lyrics in Chinese, lyrics in English, and pinyin with citation tone values.

Because tone distinguishes word meaning in Mandarin, singing against the rise and fall of a word's tone can lead to semantic errors, which the music field often calls ‘singing backward (唱倒字)’. For example, in Chorus #14, the lyrics ‘Xing55 Hui55’ can easily be misheard as ‘Xing51 Hui51’. Zhao Yuanren also paid attention to the intelligibility of the voice in his composition. Three lyric changes in Chorus #14 are notable examples: ‘Mu51 Ai214 (twilight)’ to ‘Mu51 Se51 (twilight)’; ‘Ji35 Xuan35 (swirling)’ to ‘Xuan35 Zhuan214 (spinning)’; and ‘Zai51 Hai214 Mo51 Li214 (in the seafoam)’ to ‘Zai51 Lang51 Hua55 De51 Bai35 Mo51 Li214 (in the white foam of the waves)’.

In addition, the melody is written so that it expands laterally in line with the direction of the tune. For example, the melody of Song #13, where the word ‘身(Shen55) 影(Ying214)’ appears at the end of two different stanzas, shows a clear consistency of direction (see Figure 13). Figure 13A appears in the third part of the song (bar 94, quarter note to a beat), and Figure 13B appears in the fourth part of the song (bars 164-165, quarter note to a beat). The character ‘身 (shen55)’ (Figure 13, A and B), which is an even tone, was composed with a unified strategy of upbeats, continuous tones, and short time values in a second-degree progression. The setting of the word ‘影 (ying214)’ raises the vocal position through an upward interval of a third, the strong beat emphasises the nasal ending [-ng], and the long time value emphasises the word tone [214] and the rhythmic boundary. Zhao Yuanren stated that such melodic writing could effectively prevent ‘people who do not distinguish between -n and -ng [confusing it with] “声(sheng55) 音 (yin55)”’ (Zhao, 1994, p. 112).

Figure 13. Melodic F0 curve for the word ‘身影’ appearing twice in Song #14.

Comparison of the F0 curves for the two occurrences of the word ‘身影’ in Song #14, with four kinds of annotation: score excerpt, lyrics in Chinese, lyrics in English, and pinyin with citation tone values. The two clips show a clear contrast in the word ‘影’.

Having identified the basic principles of Zhao Yuanren’s composition above, we now turn our attention to the internal melody of individual character sounds. Song #14, the longest poem in the New Poetry Collection (350 characters in total), was selected for analysis. The tone proportions are level tone 24.2%, rising tone 18.5%, falling-rising tone 25.1%, falling tone 23.7%, and light tone 8.2%. In line with Zhao Yuanren’s principle of composing around rhyming words, this study also uses representative rhyming words for visual analysis.
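A tally of this kind can be sketched as follows in Python. The sketch assumes that each syllable is written in the notation used in this article (pinyin followed by a citation tone value, with no digits for the light tone); the function names and the eight-syllable sample are illustrative only and do not reproduce the 350-character lyric of Song #14.

```python
import re
from collections import Counter

# Citation tone values of the four Mandarin full tones.
TONE_NAMES = {
    "55": "level",
    "35": "rising",
    "214": "falling-rising",
    "51": "falling",
}


def tone_category(syllable):
    """Return the tone category of a syllable such as 'ying214' or 'de'."""
    match = re.search(r"(\d+)$", syllable)
    if match is None:
        return "light"  # no tone digits: treated as the light (neutral) tone
    return TONE_NAMES.get(match.group(1), "other")


def tone_proportions(syllables):
    """Return the percentage of each tone category, rounded to one decimal."""
    counts = Counter(tone_category(s) for s in syllables)
    total = sum(counts.values())
    return {name: round(100.0 * n / total, 1) for name, n in counts.items()}


# Illustrative sample built from syllables quoted in this article.
sample = ["feng55", "chui55", "shen55", "ying214", "shei35", "kan51", "shu51", "de"]
print(tone_proportions(sample))
```

The sketch counts citation tones only; tone sandhi in connected speech would need a separate treatment.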

The parts of Figure 14 circled in red are examples of the correspondence between Mandarin tones and melodies.

Figure 14. Song #14: A selection of the five tones in Mandarin.

A pitch-tone correlation schematic covering five kinds of tone (level, rising, falling-rising, falling, and light tone). The parts circled in red are examples of the correspondence between Mandarin tones and melodies.

Through comparison, we found that the melodic expression of the individual ‘rhyming words’ in the New Poetry Collection is consistent with the direction of the linguistic word-phoneme mapping in the score (as listed in Table 6). This phenomenon is reflected to varying degrees in each piece of the New Poetry Collection. Because of limited space, only some fragmentary score examples are discussed here.
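One way to make this direction comparison concrete is sketched below in Python. It is a simplified illustration under stated assumptions rather than the procedure used in this study: the citation tone value is read digit by digit as a sequence of relative heights, the melody is given as MIDI note numbers, and a match means that the two sequences move in the same directions. All function names and pitch values are hypothetical.

```python
def directions(values):
    """Collapse a sequence of heights into its successive movement directions."""
    if len(values) < 2:
        return ["level"]  # a single height is treated as level
    steps = []
    for previous, current in zip(values, values[1:]):
        if current > previous:
            step = "up"
        elif current < previous:
            step = "down"
        else:
            step = "level"
        # Keep only changes of direction, so 2-3-5 counts as a single rise.
        if not steps or steps[-1] != step:
            steps.append(step)
    return steps


def tone_contour(tone_value):
    """Directions implied by a five-level tone value, e.g. '214' gives down, up."""
    return directions([int(digit) for digit in tone_value])


def melody_contour(midi_pitches):
    """Directions of a melodic fragment given as MIDI note numbers."""
    return directions(list(midi_pitches))


def tone_matches_melody(tone_value, midi_pitches):
    """True when the melody moves in the same directions as the citation tone."""
    return tone_contour(tone_value) == melody_contour(midi_pitches)


# Hypothetical fragments: a dip followed by a rise, a falling third, a rising step.
print(tone_matches_melody("214", [62, 60, 64]))
print(tone_matches_melody("51", [67, 64]))
print(tone_matches_melody("55", [60, 62]))
```

The comparison is deliberately coarse: it ignores interval size, duration, and ornamentation, all of which the surrounding discussion treats as relevant.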

Table 6. Melodic characteristics of tone and melody in Song #14.

For the falling tone, we found only weak similarity between the melody and the speech contour. This may be because the phenomenon of ‘falling-rising and falling mixing (上去混用)’ inherent in Mandarin tones caused Zhao Yuanren to overlook it in his composition. In Chinese phonetics, the pronunciation of the falling tone has been debated since ancient times; there are countless polyphonic characters in common speech in which one character serves two purposes. Several theories, such as the ‘ancient two-voice theory (古无去两声说)’ and the ‘ancient no-voice theory (古无去声说)’, have been used to illustrate the historical homology and acoustic similarity of the falling-rising (上) and falling (去) tones. One study pointed out that the acoustic indicators of the falling tone are very complex and that their trends differ. Differences in opening patterns, vowels, rhymes, and other linguistic factors can cause the melodic line of the falling tone, which should be downward, to move upward in the opposite direction (Zhao & Fei, 2017). In addition, in some Chinese dialects (e.g. Suzhou, Changsha, Guangzhou), the pronunciation of vowel-rhyme vowels varies, resulting in the phenomenon of ‘falling-rising and falling mixing (上去混用)’ (Wang & Shi, 2020). These linguistic arguments may explain the unique melodic characteristics of the falling tone in music.

4.3. Ornamentation

Musical ornamentation serves as a conditioning mechanism through which speech melodies are subordinated and harmonised within the realm of musical expression. To prevent excessive bias towards phonetic elements in music, ornamentation should be guided by principles that respect artistic expressive features and tonal construction (Sun, 2022). Art songs, which incorporate numerous phonetic features while maintaining musical expression, often use various types of musical ornamentation. This is particularly evident in the New Poetry Collection, which employs a diverse range of ornaments, including dense strong and weak ornamentation, extended ornamentation, sustaining ornamentation, glissando ornamentation, appoggiatura, and staccato ornamentation. Of these, the appoggiatura and the huayin ornamentation show the most pronounced correlation between music and speech. They are discussed separately in the following sections.

4.3.1. Phonological substrates in huayin ornamentation

Huayin (滑音) is a form of pitch bending: an ornamentation technique in which the voice glides from one note to another within the melodic progression. Typically, this ornamentation is placed between two adjacent notes, and the direction of the line represents the direction of the glissando. In traditional Chinese folk songs, huayin has become an important and common way to express the melodic style of vernacular music.

Huayin is frequently encountered in the New Poetry Collection, with approximately 57% of the compositions incorporating the technique. Its presence often leads to adjustments in the tonal qualities of words, aligning them more closely with the melodic progression of the music. This enhances the intelligibility of the music. Figure 15 illustrates the sound intensity curve and the corresponding musical excerpt from Audio #13 (bars 19-22). The two words within the red box carry falling tones, whose inherent melodic progression is from high to low. With the inclusion of huayin, both musical examples likewise exhibit a descending melodic progression. Notably, the character ‘看 (kan51)’ sits at the boundary of intonation, and with the incorporation of alliteration and huayin, it descends gradually, as signified by the ritardando marking. The syllable ‘an’ of this character descends to the lower register without any delay, resulting in a clear and typical Chinese gentle temperament.

Figure 15. Song #14: Score excerpt and intensity curve at huayin.

A huayin-Chinese tone correlation schematic, including an upper and a lower part, with the red lines framing the segments that illustrate the correspondence. Figure 15 long description: The upper part of Figure 15 shows the musical sound intensity curve, and the lower part shows the corresponding score excerpt. The horizontal axis is time (s) and the vertical axis is sound intensity (dB) (Audio #13).

In Figure 16 (right), the main melody at the word ‘似 (si51)’ is originally a parallel progression of F-F, but the added appoggiatura completes the typical phonetic downward fifth in the same direction.

Many of the musical melodies with added huayin present distinct phonetic elements, as exemplified by the melody in Figure . Zhao Yuanren observed that vocal singing naturally has less sliding pitch and more definite pitch; he regarded this as the major difference from natural tone and the major obstacle when ‘the composer tries to match the work with the level’ (Zhao, 1994, p. 12). Accordingly, he held that the frequent use of huayin in the New Poetry Collection serves not only musical expression but also adapts the micro-melodic direction (at a scale smaller than the notated score) so as to bury the national gene deep in these linguistically significant art songs.

4.3.2. Phonological substrates in appoggiatura ornamentation

Zhao Yuanren attached great importance to the phonetic expression of the leaning tone (appoggiatura). Up to 92% of the songs in the New Poetry Collection use the appoggiatura ornamentation technique (for details, see Songs 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14). Figure 16 (left) shows the score of Song #6 (bar 18) as an example. In the score, three musical ornaments are used for the word ‘笋’ (sun214): the sustaining ornamentation, the legato ornamentation, and the appoggiatura ornamentation. The sustaining ornamentation locates the focal point of the phrase on the word ‘笋’ through free prolongation, while the legato ornamentation and the appoggiatura ornamentation together shape the musical character of the word ‘笋’ into a coherent and suppressed tone.

Figure 16. Score excerpt from Song #6 at the appoggiatura (left) and Song #9 with huayin and appoggiatura (right).

A huayin/appoggiatura-Chinese tone correlation schematic, including an upper and a lower part, with the red lines framing the segments that illustrate the correspondence. Figure 16 long description: The upper part of Figure 16 shows the intensity curve, and the lower part shows the corresponding score excerpt. The horizontal axis is time (s) and the vertical axis is intensity (dB) (audio material Song #6 and Song #9).

The melodic fragments in Figure 16 (right) are from bars 75–76 of Song #9. The melodies in the red boxes show clear phonetic homogeneity. First, the word ‘老 (lao214)’ has two kinds of ornamental tones, the glissando and the appoggiatura. The word carries a falling-rising tone, which is characterised by a short descent followed by an immediate ascent to the second-highest pitch value. In the melody, the light tone ‘的(de)’ in front of the word allows a short descent, and the appoggiatura and glissando then raise the pitch by a third, which corresponds closely to the descending and then ascending quality of the tone in speech. The diagram also shows that the character ‘树 (shu51)’ is connected to the note a third below by a legato ornamentation, echoing the downward direction of the falling tone.

This finding further validates the hypothesis of this study. Appoggiatura serves as a means of embellishment in various traditional Chinese operas, ancient poems, and local folk songs, enhancing the artistic expression of vocal performances (Guo, 2019; Ma, 2020). For instance, in the rendition of Shanxi folk songs, singers often embellish and adjust words by introducing improvised appoggiatura to align with the word's tonal characteristics. Similarly, in the realm of phonetics, empirical studies have emerged, exploring the use of huayin and appoggiatura to convey Chinese words with tonal nuances. Chen contends that huayin permits continuous alterations within a character, whereas appoggiatura allows for brief descents in the upper voice and other rapid acoustic shifts (Chen, 2012).

5. Conclusion

In conclusion, through a comprehensive corpus analysis of the New Poetry Collection, a significant collection of Chinese art songs, this study has illuminated how phonetic elements are intricately woven into the fabric of these compositions. The examination has unveiled their influence on melodic progression strategies, phrase combinations, musical element arrangements, and the incorporation of ornamentation.

Within the research framework presented in this paper (Figure model, employing a quantitative linguistic approach), the relationship between music and language in Chinese art songs is effectively expounded, both in terms of representational dimensions and intrinsic mechanisms. This underscores the efficacy of our research framework in addressing these pertinent issues. Within this framework, it becomes evident that phonetics is deliberately employed to shape the distinct Chinese auditory aesthetic characteristics inherent in Chinese art songs.

A significant contribution of this study lies in the identification of specific linguistic constraints governing the subset of art songs under examination. We have discerned shared pitch features between music and language in Chinese art songs, such as a preference for small pitch intervals, congruent melodic directions, and alignment of character tones with musical keys. Furthermore, our multidisciplinary analytical approach has unveiled traces of the composer's intentional creation of internal connections within the compositions. Chinese art songs adeptly regulate the balance of speechiness in the musical melody through the use of rhyme words and ornamentation, thus molding the auditory qualities of these art songs with a distinctive Chinese flavour. Rhyme words primarily match phonetics, while non-rhyme words are allowed greater creative freedom guided by music theory. Ornamentation, exemplified by huayin and appoggiatura, modulates melodic direction at the levels of intensity and pitch.

This perspective beckons us to contemplate the potential of Chinese art songs in shaping national and self-identity. In this new phase of contemporary musicology, maintaining a dynamic equilibrium between music and phonology in vocal arts becomes imperative (Meng, 2022; Wang, 2019). Phonetics, as the cornerstone of vocal music, holds a pivotal role in Chinese musical elements. The phonetic elements unearthed in this study offer a pathway to nationalise art songs and can potentially extend to other art forms. By accounting for Chinese character pitch, intonation, melodic direction, phrase arrangement, pitch selection, and ornamentation, music can embody a distinct cultural identity within the realm of Chinese art songs, authentically ‘sounding Chinese’.

Notably, this study answers the call made by Yang Yinliu, a revered Chinese musicologist, to acknowledge Zhao Yuanren's significance in the domain of linguistic musicology (Yang, 1983). Through empirical verification of the language-musicianship synergy evident in Zhao Yuanren's vocal compositions, this research not only reaffirms his position as the pioneer of language-musicology in Chinese culture but also underscores the academic value of the New Poetry Collection.

6. Limitation

We must acknowledge that the pattern of tonal unfolding in music is an integrated and complex system. Fluctuation in any element of the system can cause acoustic differences, and the interrelationships among the elements have been studied as ‘meta-relationships’ (Patel, 2007, pp. 153–157). In this extensive system, the influence of the sung words on the cantata is primarily driven by their vocal tones (Yu, 2008, p. 15), but the cantata is also influenced by other phonetic components of the sung words. Changes in any element of speech structure, such as tone (intonation), sharpness, tone length, and stress, can result in very different vocalizations and auditory perceptions. That is, mismatches between word tone and melody do not imply that the phonetic elements embedded in the sung words have failed; the reasons for such changes, and the cognitive differences behind them, should be analyzed from a more macroscopic perspective.

Acknowledgments

The authors would like to thank Janette for English language editing.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Correction Statement

This article has been corrected with minor changes. These changes do not impact the academic content of the article.

Notes

1 This estimate is generally accepted by linguists and is based on Ethnologue: Languages of the World. (http://www.ethnologue.org)

2 Pinyin phonetic symbols: Symbolic markers of the Chinese tonal system.

3 Five Level Tone Mark: This is a method invented by Zhao Yuanren in 1920 to record the tonal values of Chinese language tones (Chao, 1930). It is the most effective quantitative marking scheme in the study of Chinese phonology. This notation method is based on a relatively average pitch range and trend (Zhao, 1994), and is built on twelve-tone equal temperament in music, with horizontal coordinates indicating time and vertical coordinates indicating relative pitch, forming a five-point notation corresponding to the pentatonic scale. This is expressed as the relative pitch tonal value numbers 1, 2, 3, 4, and 5, corresponding to low, semi-low, medium, semi-high, and high (Zhao, 1994).

4 Traditional tone: A system for classifying early Chinese tones.

5 That is, if the syllable is marked with the pitch value ‘55’, it implies a high-high pitch case with a pitch equivalent to the interval of the note G-G, similar to the following.

6 The ToBI (Tones and Break Indices) system is also called Pitch and Pause marks.

7 AM theory refers to autosegmental-metrical theory.

8 This is a Chinese translation of an article written in English by Dr. Rulan Chao, daughter of the late renowned linguist and composer Mr. Zhao Yuanren, and a professor at Harvard University, when she was working in the United States. The original article ("My Father's (YR Chao) Musical Life.") was published in the U.S. edition of The Stone Lion Review, No. 13:66-74, 1985.

9 Non-alphabetic syllables: i.e., the sustained pronunciation of the syllable after the alphabetic pronunciation.

10 The concept of pitch here is the Acoustical Society of America’s standard definition.

11 The algorithms and concepts will be developed in detail in the Procedure section.

12 Significance criterion: correlations with p ≤ 0.05 were treated as significant.

13 O: Oblique Tone, E: Even Tone.

14 Jinhuzhongzhou school: In ‘The Problem of Norms in Chinese Phonetics’, Zhao Yuanren summarized the traditional ways of matching vocal tones with musical tones into three types and used them in combination in his creative practice. The first and most commonly used type is the Jinhuzhongzhou school, with the even tone downward or relatively lower and the oblique tone upward or relatively higher. The second type is the Guoyin school, which follows the tones of Mandarin for raising and lowering; its usage rate is slightly lower than that of the first type. The third type is the school that ignores the word tune and focuses entirely on the expression of music (Zhao, 1994).

15 The way the themes are classified here is based on two new poetry anthologies, the New Poetry Collection and the Classified Vernacular Poetry Selection (Shanghai New Poetry Society Editorial Board, 1920).

16 The song is an art song based on the Chinese translation of Dumas’ ‘La Dame aux camélias’, and its tune is not directly related to the original play.

References

  • Beckman, M, Hirschberg, J, & Shattuck-Hufnagel, S. (2004). The Original ToBi System and the Evolution of the ToBi Framework. Prosodic Typology: The Phonology of Intonation and Phrasing, 9–54. https://doi.org/10.1093/acprof:oso/9780199249633.003.0002
  • Burns, L., & Woods, A. (2004). Authenticity, appropriation, signification: Tori Amos on gender, race, and violence in covers of Billie Holiday and Eminem. Music Theory Online, 10(2). https://www.mtosmt.org/issues/mto.04.10.2/mto.04.10.2.burns_woods.html
  • Chaloupková, L. (2021). The Chinese Art song, Yishu Gequ: Between tradition and modernity. Acta Universitatis Carolinae Philologica, 3, 29–46.
  • Chan, M. K. M. (1987). Tone and melody in Cantonese. Annual Meeting of the Berkeley Linguistics Society, 13(0), Article 0. https://doi.org/10.3765/bls.v13i0.1828
  • Chen, W. (2017). The development of linguistics in China: A study of the contributions of Yuen Ren Chao and Wang Li. Historiographia Linguistica, 44(1), 1–46. https://doi.org/10.1075/hl.44.1.01che
  • Chen, X. (2012). 试论音乐在对外汉语教学中的应用[An Experiment on the Application of Music in Teaching Chinese as a Foreign Language](硕士学位论文,上海师范大学). https://kns.cnki.net/KCMS/detail/detail.aspx?dbname=CMFD201301&filename=1012454946.nh.
  • Chow, I., & Brown, S. (2018). A musical approach to speech melody. Frontiers in Psychology, 9. https://www.frontiersin.org/article/10.3389/fpsyg.2018.00247
  • Cutler, A. (1998). Prosodic structure and word recognition. In A. D. Friederici (Ed.), Language comprehension: A biological perspective (pp. 41–70). Springer. https://doi.org/10.1007/978-3-642-97734-3_2.
  • Dictionary Department, Institute of Linguistics, Chinese Academy of Social Sciences. (1984). 中国音乐词典 [The Chinese Music Dictionary]. 人民音乐出版社. https://book.douban.com/subject/1092531/?i=0.
  • Eidsheim, N. (2014). Race and the aesthetics of vocal timbre (pp. 338–365). https://doi.org/10.1017/CBO9781139208451.012.
  • Faudree, P. (2012). Music, language, and texts: Sound and semiotic ethnography. Annual Review of Anthropology, 41(1), 519–536. https://doi.org/10.1146/annurev-anthro-092611-145851
  • Feng, C. C. (2020). 中国艺术歌曲创作百年巡礼[A century of Chinese Art song composition]. 音乐研究, 4, 105–123.
  • Feng, L. Y., & Xu, C. Z. (2022). 地方戏中清、浊音唱词对唱腔音高、音强的潜在影响 以长沙、岳阳花鼓戏同主题唱段对比分析为例[The potential influence of clear and turbid singing words on the pitch and intensity of the singing voice in local operas–a comparative analysis of the same theme singing in Changsha and Yueyang Hua Gu operas as an example]. 中国戏剧, 02, 63–65.
  • Fine, P. A., & Ginsborg, J. (2014). Making myself understood: Perceived factors affecting the intelligibility of sung text. Frontiers in Psychology, 5. https://www.frontiersin.org/articles/10.3389/fpsyg.2014.00809
  • Gamble, C. (2012). When the words Dry Up: Music and material metaphors half a million years ago. In N. Bannan (Ed.), Music, language, and human evolution (pp. 81). Oxford University Press. https://doi.org/10.1093/acprof:osobl/9780199227341.003.0004.
  • Gibson, A. (2023). Pop Song English as a supralocal norm. Language in Society, 1–28. https://doi.org/10.1017/S0047404523000131
  • Gilbers, S., Hoeksema, N., de Bot, K., & Lowie, W. (2020). Regional variation in west and east coast African-American English prosody and Rap flows. Language and Speech, 63(4), 713–745. https://doi.org/10.1177/0023830919881479
  • Guo, C. (2019). Chinese television and national identity construction: The cultural politics of music-entertainment programmes. Chinese Journal of Communication, 12(2), 247–248. https://doi.org/10.1080/17544750.2019.1584470
  • Han, Q., & Sundberg, J. (2017). Duration, pitch, and loudness in Kunqu opera stage speech. Journal of Voice, 31(2), 255.e1–255.e7. https://doi.org/10.1016/j.jvoice.2016.06.014
  • Haspelmath, M., & Bibiko, H.-J. (2005). ‘The world atlas of language structures’. Oxford linguistics. Oxford University Press.
  • Heng, D. (2017). 汉语自然话语陈述句音高下倾研究[A study on the downward pitch tilt of declarative sentences in Chinese natural speech](硕士学位论文,暨南大学). https://kns.cnki.net/KCMS/detail/detail.aspx?dbname=CMFD201801&filename=1017868312.nh.
  • Huang, W., Yu, S., Shi, Q. L., & Ran, Q. B. (2022). 普通话语句音强变化模式——基于SCS和DBS语料的分析[Patterns of phonetic intensity variation in Mandarin utterances–an analysis based on SCS and DBS corpus]. 智能计算机与应用, 12(02), 24–31.
  • Hudson, A I, & Holbrook, A. (1982). Fundamental Frequency Characteristics of Young Black Adults: Spontaneous Speaking and Oral Reading. Journal of Speech, Language, and Hearing Research, 25(1), 25–28. https://doi.org/10.1044/jshr.2501.25
  • Huron, D. (2006). Sweet anticipation: Music and the psychology of expectation (pp. xii, 462). The MIT Press.
  • Janssen, B., Burgoyne, J. A., & Honing, H. (2017). Predicting variation of folk songs: A corpus analysis study on the memorability of melodies. Frontiers in Psychology, 8. https://www.frontiersin.org/articles/10.3389/fpsyg.2017.00621
  • Jiang, Y. D. (2008). 从赵元任艺术歌曲中音乐与语言的共通性看其对现代音乐史的影响 [The influence of Zhao Yuanren’s art songs on modern music history from the commonality of music and language in his art songs], [master’s thesis, Jilin University]. https://kns.cnki.net/KCMS/detail/detail.aspx?dbname=CMFD2008&filename=2008064764.nh.
  • Johanneke, M., & Schreuder, M. (2022). Prosodic processes in language and music.
  • Jun, S.-A., & Fougeron, C. (2000). A phonological model of French intonation. Intonation, 209–242. https://doi.org/10.1007/978-94-011-4317-2_10
  • Jun, S.-A., & Fougeron, C. (2002). Realizations of accentual phrase in French intonation. Probus, 14, https://doi.org/10.1515/prbs.2002.002
  • Jusczyk, P., & Krumhansl, C. (1993). Pitch and rhythmic patterns affecting infants’ sensitivity to musical phrase structure. Journal of Experimental Psychology: Human Perception and Performance, 19(3), 627–640. https://doi.org/10.1037/0096-1523.19.3.627
  • Kennedy, S., & Trofimovich, P. (2008). Intelligibility, comprehensibility, and accentedness of L2 speech: The role of listener experience and semantic context. The Canadian Modern Language Review, 64(3), 459–489. https://doi.org/10.3138/cmlr.64.3.459
  • Kimball, C. (2013). Art song: Linking poetry and music. Hal Leonard Corporation.
  • Kirby, J., & Ladd, D. R. (2016). Tone-melody correspondence in Vietnamese popular song. 5th international symposium on tonal aspects of languages (TAL 2016), 48–51. https://doi.org/10.21437/TAL.2016-10.
  • LaBelle, B. (2010). Raw orality: Sound poetry and live bodies. Voice, 146.
  • Ladd, D. R., & Kirby, J. (2020). Tone–melody matching in tone-language singing. In C. Gussenhoven, & A. Chen (Eds.), The Oxford handbook of language prosody (pp. 675–688). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780198832232.013.47.
  • Li’an, C., & Yodwised, C. (2022). Chinese Art songs in 1920-1949: Vocal pedagogy and singing technique. Asia Pacific Journal of Religions and Cultures, 6(1), Article 1.
  • Lin, M. C. (2004). 汉语语调与声调[Chinese intonation and tones]. 语言文字应用, 3, 11.
  • Liu, C. (2011). 中国古代音乐语言与文学语言之关系研究[A study of the relationship between ancient Chinese musical language and literary language](博士学位论文,华中科技大学). https://kns.cnki.net/KCMS/detail/detail.aspx?dbname=CDFD0911&filename=1011110108.nh.
  • Liu, H., Jiang, K., Gamboa, H., Xue, T., & Schultz, T. (2022). Bell shape embodying Zhongyong: The pitch histogram of traditional Chinese anhemitonic pentatonic folk songs. Applied Sciences, 12(16), Article 16. https://doi.org/10.3390/app12168343
  • Ma, Y. F. (2020). 探究古诗词演唱中的倚音演唱——以《武陵春》《长相知》为例[Exploring the leaning tone singing in the singing of ancient poems–Taking “Wuling Spring” and “Chang Xiang Zhi” as examples]. 北方音乐, 11, 87–88.
  • Malawey, V. (2020). A blaze of light in every Word: Analyzing the popular singing voice. https://doi.org/10.1093/oso/9780190052201.001.0001.
  • Marshall, L. (2019). Do people value recorded music? Cultural Sociology, 13(2), 141–158. https://doi.org/10.1177/1749975519839524
  • Mauch, M., Cannam, C., Bittner, R., Fazekas, G., Salamon, J., Dai, J., Bello, J., & Dixon, S. (2015). Computer-aided melody note transcription using the tony software: Accuracy and efficiency. Proceedings-First International Conference on Technologies for Music Notation and Representation (TENOR).
  • McGowan, R. W., & Levitt, A. G. (2011). A comparison of rhythm in English dialects and music. Music Perception, 28(3), 307–314. https://doi.org/10.1525/mp.2011.28.3.307
  • McLeish, T. (2019). The poetry and music of science: Comparing creativity in science and Art. Oxford University Press.
  • Meng, F. Y. (2022). 风起于青萍之末——从音乐与语言的关系看中国音乐的旋律学建设意义[The Wind rises at the End of the green weeds - The Significance of the melodic construction of Chinese music from the relationship between music and language]. 中国音乐(01), 39–42. https://doi.org/10.13812/j.cnki.cn11-1379/j.2022.01.005.
  • Middleton, R. (2006). Rock singing. In P. John (Ed.), The Cambridge companion to singing (Transferred to digital printing, pp. 28–41). Cambridge University Press.
  • Mithen, S. (2007). The singing neanderthals: The origins of music, language, mind, and body. Harvard University Press.
  • Music Notation Software – Sibelius – Avid. (2014). Retrieved 14 August 2023, from https://www.avid.com/sibelius.
  • Neumark, N. (2010). Doing things with voices: Performativity and voice. In N. Neumark, R. Gibson, & T. van Leeuwen (Eds.), Voice: Vocal aesthetics in digital arts and media (pp. 95). The MIT Press. https://doi.org/10.7551/mitpress/9780262013901.003.0006.
  • Patel, A. D. (2007). Music, language, and the brain. Oxford University Press. https://doi.org/10.1093/acprof:oso/9780195123753.001.0001.
  • Perrachione, T. K., Fedorenko, E. G., Vinke, L., Gibson, E., & Dilley, L. C. (2013). Evidence for shared cognitive processing of pitch in music and language. PloS ONE, 8(8), e73372. https://doi.org/10.1371/journal.pone.0073372
  • Pichler, P., & Williams, N. (2016). Hipsters in the hood: Authenticating indexicalities in young men’s hip-hop talk. Language in Society, 45(4), 557. https://doi.org/10.1017/S0047404516000427
  • Praat: Doing Phonetics by Computer. (2022). Retrieved 14 August 2023, from https://www.fon.hum.uva.nl/praat/.
  • Pu, H. J., & Xia, X. (2012). “依字行腔”表义功能质疑——兼及我国声乐创作中“词义”表达的一个理论研究盲区 [questioning the functionality of the “word-based accent” - A theoretical research blind spot in the expression of “word meaning” in China's Vocal Music Composition]. 华南师范大学学报(社会科学版), 4, 138–143.
  • Qian, R. (2014). 唱词音声的“音乐性”再认识 [Re-conceptualizing the “musicality” of vocal prosody]. 星海音乐学院学报, 1, 82–85.
  • Qian, R. (2018). 语言音乐学基础[Fundamentals of Linguistic Musicology].中央音乐学院出版社.
  • Qu, G., Sun, Y., Han, B., Yu, P., Liu, J., & Yang, S. (2020). Preliminary study on lyrics intelligibility at different pitches in Chinese vocal music. Acta Oto-Laryngologica, 140(7), 558–563. https://doi.org/10.1080/00016489.2019.1646926
  • Savage, P. E., Passmore, S., Chiba, G., Currie, T. E., Suzuki, H., & Atkinson, Q. D. (2022). Sequence alignment of folk song melodies reveals cross-cultural regularities of musical evolution. Current Biology, 32(6), 1395–1402.e8. https://doi.org/10.1016/j.cub.2022.01.039
  • Schellenberg, M., & Gick, B. (2020). Microtonal variation in sung Cantonese. Phonetica, 77(2), 83–106. https://doi.org/10.1159/000493755
  • Steele, J. (1969). An essay towards establishing the melody and measure of speech, 1775. Scolar Press.
  • Sun, D. J. (2002). 汉语语法教程 [Chinese grammar course]. 北京语言文化大学出版社.
  • Sun, H. J. (2022). 论旋律中的族性基因和语言基质——兼论汉语声调的音乐潜能与新韵旧体诗的入乐吟唱方法 [The ethnic genes and linguistic substrates in melody: A discussion of the musical potential of Chinese tones and the method of chanting new rhymes and old style poems]. 中国音乐 (01), 10–23 + 29. https://doi.org/10.13812/j.cnki.cn11-1379/j.2022.01.002.
  • Temperley, D. (2022). Music and language. Annual Review of Linguistics, 8(1), 153–170. https://doi.org/10.1146/annurev-linguistics-031220-121126
  • Terken, J., & Collier, R. (1989). Fundamental frequency and perceived prominence of accented syllables. The Journal of the Acoustical Society of America, 86(S1), S35–S36. https://doi.org/10.1121/1.2027475
  • Tokita, A. M., & Cheung, J. H. Y. (2023). The art song in East Asia and Australia, 1900 to 1950. Taylor & Francis.
  • Trudgill, P. (1983). On dialect: Social and geographical perspectives. Basil Blackwell.
  • Unyk, A. M., Trehub, S. E., Trainor, L. J., & Schellenberg, E. G. (1992). Lullabies and simplicity: A cross-cultural perspective. Psychology of Music, 20(1), 15–28. https://doi.org/10.1177/0305735692201002
  • Vos, P. G., & Troost, J. M. (1989). Ascending and descending melodic intervals: Statistical findings and their perceptual relevance. Music Perception, 6(4), 383–396. https://doi.org/10.2307/40285439
  • Wang, P., & Shi, F. (2020). 汉语普通话不同语句类型的音强分布模式 [Phonetic intensity distribution patterns of different utterance types in Mandarin Chinese]. 南开语言学刊 (02), 19–28.
  • Wang, Y. X. L. (2019). 钢琴乐句重音“语言”研究 [A study of the “language” of accentuation in piano phrases] [Master's thesis, Southwest University]. https://kns.cnki.net/kcms/detail/detail.aspx?dbcode=CMFD&dbname=CMFD202001&filename=1019913177.nh&uniplatform=NZKPT&v=f1ueu9L1__bJ6vkRnMkNC-0ksblNp0aolKtu13i7ea02YXxUkTncMnpPRWPgYVEy.
  • Watts, R. J., & Morrissey, F. A. (2019). Language, the singer and the song: The sociolinguistics of folk performance. Cambridge University Press.
  • Weidman, A. (2014). Anthropology and voice. Annual Review of Anthropology, 43(1), 37–51. https://doi.org/10.1146/annurev-anthro-102313-030050
  • Wu, J. D. (1985). 戏曲唱腔中的音乐因素和语言因素 [Musical and linguistic elements in opera singing]. 中国音乐, 04, 39–42.
  • Yan, J. P., Wang, P., & Shi, F. (2016). 普通话特指问句语调的声学实验分析 [Acoustic experimental analysis of the intonation of Mandarin special interrogatives]. 汉藏语学报 (00), 121–129.
  • Yang, Y., Welch, G., Sundberg, J., & Himonides, E. (2015). Tuning features of Chinese folk song singing: A case study of Hua’er music. Journal of Voice, 29(4), 426–432. https://doi.org/10.1016/j.jvoice.2014.08.013
  • Yang, Y. L. (1983). 语言与音乐 [Language and Music]. 人民音乐出版社.
  • Yu, H. Y. (2008). 腔词关系研究 [A study of cadential word relations]. 中央音乐学院出版社.
  • Yukun, C. (2020). The language character of Midu folk song. The Study of Culture & Art, 15, 15–46. https://doi.org/10.35413/culart.2020.15..001
  • Zhang, X., & Cross, I. (2021). Analysing the relationship between tone and melody in Chaozhou songs. Journal of New Music Research, 50(4), 299–311. https://doi.org/10.1080/09298215.2021.1974490
  • Zhao, K., & Fei, L. H. (2017). 基于五度标记法 精确描述汉语声调 [Accurate description of Chinese tones based on the five-degree marking method]. 广州广播电视大学学报, 17(01), 49–54 + 110.
  • Zhao, Q. (1992). 访赵元任兼谈词曲的配合 [An interview with Zhao Yuanren and the matching of lyrics and music]. 中国音乐, 1, 54–55.
  • Zhao, R. L., & Xue, L. (1987). 我父亲的音乐生活 [My father’s musical life]. Chinese Music, 03, 1968.
  • Zhao, Y. R. (1994). 赵元任音乐论文集 [Zhao Yuanren music essay collection]. 中国文联出版公司出版社.
  • Zhao, Y. R. (2005). 赵元任全集 [The collected works of Zhao Yuanren] (Vol. 11).
  • Zou, I. Y., & Wang, W. S.-Y. (2021). The musical language of Yuen Ren Chao: A case study of the modernization of Chinese music. Journal of Chinese Linguistics. https://doi.org/10.1353/jcl.2017.0098