Predicting phonetic transcription agreement: Insights from research in infant vocalizations. Clinical Linguistics & Phonetics, Vol 21, No 10

Abstract

The purpose of this study is to provide new perspectives on correlates of phonetic transcription agreement. Our research focuses on phonetic transcription and coding of infant vocalizations. The findings are presumed to be broadly applicable to other difficult cases of transcription, such as found in severe disorders of speech, which similarly result in low reliability for a variety of reasons. We evaluated the predictiveness of two factors not previously documented in the literature as influencing transcription agreement: canonicity and coder confidence. Transcribers coded samples of infant vocalizations, judging both canonicity and confidence. Correlation results showed that canonicity and confidence were strongly related to agreement levels, and regression results showed that canonicity and confidence both contributed significantly to explanation of variance. Specifically, the results suggest that canonicity plays a major role in transcription agreement when utterances involve supraglottal articulation, with coder confidence offering additional power in predicting transcription agreement.

Notes

1. Throughout this paper the terms “transcription reliability” and “transcription agreement” are used interchangeably, although “agreement” is, in principle, a type of “reliability”. Cucchiarini (1996) clarifies the distinction.

2. The criterion for judgement of canonicity in infant vocalizations has always been (and to the present remains) primarily auditory. The methods section provides a description of the auditory judgement procedure that has been used in the second author's laboratories for years. A primary reason that we continue to focus on auditory rather than instrumental acoustic judgements in this research is that the definition of canonical syllables in acoustic terms is not yet fully established. We here offer a brief summary of the acoustic status of the definition.

A criterion duration for formant transitions has been nominally established based upon acoustic examination of relatively measurable formant transitions in auditorily judged canonical and non‐canonical syllables from infants, as summarized in Oller (2000a). Measurement of formant transitions in infant vocalizations can be extremely difficult, especially because high pitch of many infant syllables produces harmonics that are very widely spread. The nominal criterion based on relatively measurable transitions is 120 ms (usually focusing on F2) as a maximum for canonical syllables. This value is primarily based on infant syllables where both F1 and F2 have been reliably visible in spectrographic displays with at least 600 Hz analysis bandwidth, and where F1 and F2 vary from a consonantal locus to a nuclear (vowel) locus and then reverse slope. The end of the formant transitions can thus be referenced to a steady state or a reversal of slope.

However, beyond the nominal durational criterion, it is clear that to attain a generally applicable acoustic definition of canonical syllables, additional specification is needed to account for differing types of syllables and differing utterance‐level patterns. For example, syllables with nasal or aspirated consonants often show extremely short formant transitions in acoustic displays, and we are investigating the utility of amplitude rise time as a possible substitute for or supplement to formant transition duration as a criterion for canonicity in such cases. Also, at slow speaking rates the maximum transition duration may need to be higher than at rapid rates.

A transition slope criterion is also obviously required (because if slope is too low, change in formant frequency would be heard as no change). The slope data on dysarthric patients from Kent et al. (1989) focused on circumstances where F2 appeared to be a useful focus for determining a criterion for intelligible syllables: If average F2 transition slopes were lower than a ratio of 2.5 (Hz ms⁻¹), speakers proved to be highly unintelligible. However, there are of course canonical syllable types where F2 slope is inherently low, i.e. if F2 locus for a consonant is near the F2 target for its adjacent vowel. So slopes of other formants (F1 and/or F3) may need to be referenced to determine canonicity in such cases. Further, the slope criteria for canonical transitions suggested by the adult F2 data would presumably need to be normalized for infant formant values which are known to vary widely from those of adults. As research proceeds towards the development of a more elaborate and finely tuned acoustic definition of the notion canonical syllable, it will have to make reference to a wide variety of acoustic facts, but the success of the approach will always need to be referenced to auditory judgements of real listeners about well‐formedness. In the meantime, auditory judgements remain at centre stage in the judgement of canonicity.
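For illustration only, the two nominal acoustic quantities discussed in this note (the 120 ms maximum transition duration and the 2.5 Hz ms⁻¹ average F2 slope from the adult dysarthria data) can be sketched in Python as below. This is not the procedure used in the study, which relied on auditory judgement; the names Transition, f2_slope, within_nominal_duration and mean_f2_slope are hypothetical, and the thresholds are simply the nominal values quoted above, without the infant normalization that the note says would be needed.

```python
from dataclasses import dataclass

# Nominal values quoted in the note; neither is an established acoustic
# definition of canonicity, and the slope value comes from adult data.
MAX_TRANSITION_MS = 120.0
MIN_MEAN_F2_SLOPE_HZ_PER_MS = 2.5


@dataclass
class Transition:
    """One measured F2 transition (hypothetical container for this sketch)."""
    f2_onset_hz: float   # F2 at the consonantal locus
    f2_target_hz: float  # F2 at the steady state or slope reversal
    duration_ms: float   # time between the two measurement points


def f2_slope(t: Transition) -> float:
    """Absolute F2 change per millisecond (Hz/ms)."""
    return abs(t.f2_target_hz - t.f2_onset_hz) / t.duration_ms


def within_nominal_duration(t: Transition) -> bool:
    """True if the transition is no longer than the nominal 120 ms maximum."""
    return t.duration_ms <= MAX_TRANSITION_MS


def mean_f2_slope(transitions: list[Transition]) -> float:
    """Average F2 slope over a set of transitions, as in the slope criterion."""
    return sum(f2_slope(t) for t in transitions) / len(transitions)


def meets_nominal_slope(transitions: list[Transition]) -> bool:
    """True if the average F2 slope is at least the quoted 2.5 Hz/ms."""
    return mean_f2_slope(transitions) >= MIN_MEAN_F2_SLOPE_HZ_PER_MS
```

A check of this kind would misclassify the cases the note singles out: syllables with nasal or aspirated consonants, syllables whose F2 locus lies near the vowel target, and utterances at slow speaking rates. It is therefore at most a starting point for the more elaborate definition described above.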

3. There was, in fact, wide variation in transcriber criteria for the assessment of canonicity. The mean utterance canonicity value for the 8 transcribers ranged from .90 to .25. This range could presumably have been limited by training to specific criteria of canonicity judgement based on work with many exemplars of infant utterances. However, it was our goal to assess intuitive responses both in terms of canonicity and confidence judgements. Hence, coders made their own decisions about how best to interpret the canonicity definition after it was provided along with a few example utterances during the training period. In the future we hope to conduct research to compare reliability for canonicity judgements in three circumstances: (a) as in the present work, with minimal criterion setting through training, (b) with much more rigorous training to limit the variation in criteria among coders, and (c) with purely instrumental acoustic canonicity judgements. Approach (c) will only be possible to implement after further specification of acoustic criteria for canonicity (see note 2).

4. The utterance canonicity judgements for this t‐test analysis were not identical to the ones utilized in the correlational analyses. For the t‐test analysis we sought to indicate lack of canonicity in terms of articulatory transitions that auditorily seemed particularly disruptive to the rhythmic structure of whole utterances. In contrast the judgements for the correlational analyses were made at the level of the segment with no particular attention to the utterance as a whole.

5. In a separate descriptive analysis utilizing the segment‐by‐segment judgements from the correlational analyses, we split the distribution by utterances with canonicity values above the mean and those with values below the mean. Results were similar to those reported in the main text for utterances categorized as canonical or non‐canonical—in the split plot analysis, utterances with canonicity values above the mean had a similar and slightly larger advantage in transcription agreement over utterances with canonicity values below the mean (.65 vs .45, respectively).
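The mean split described in this note can be sketched as follows. This is a minimal illustration, assuming each utterance is represented by a pair of values (its canonicity value and its transcription agreement score); the function name mean_split_agreement and the example values are hypothetical and are not data from the study.

```python
from statistics import mean


def mean_split_agreement(utterances):
    """Split utterances at the mean canonicity value and average agreement.

    `utterances` is a list of (canonicity, agreement) pairs. Utterances whose
    canonicity equals the mean exactly fall in neither group, following the
    note's wording ("above the mean" and "below the mean"). Both groups must
    be non-empty, otherwise statistics.mean raises StatisticsError.
    """
    cutoff = mean(c for c, _ in utterances)
    above = [a for c, a in utterances if c > cutoff]
    below = [a for c, a in utterances if c < cutoff]
    return cutoff, mean(above), mean(below)


# Made-up values for illustration only.
example = [(0.9, 0.8), (0.7, 0.6), (0.4, 0.5), (0.2, 0.3)]
cutoff, agreement_above, agreement_below = mean_split_agreement(example)
```

Comparing agreement_above with agreement_below gives the descriptive contrast reported in the note; no significance test is implied by this sketch.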

6. The 7 comparator coders utilized a variety of phonetic symbols corresponding to sounds not occurring in American English in addition to those listed for the standard coder. Among the standard coder's non‐native symbols were many that occurred multiple times across the comparator coders. The additional non‐native symbols not utilized by the standard coder were used very infrequently across the other coders, the great bulk of them exactly once. To simplify the LIPP™ analysis program (which would have had to be elaborated greatly to incorporate all the infrequently occurring symbols as non‐natives), we resolved to key the analysis on the standard coder's utilization of non‐native symbols.
