
Adding a fourth rater to three had little impact in pre-linguistic outcome classification

Pages 138-153 | Received 08 Jan 2020, Accepted 17 Apr 2020, Published online: 06 May 2020

ABSTRACT

The consequences of differing levels of agreement across raters are rarely studied; consequently, knowledge of how the number of raters affects outcomes is limited. The present study aimed to examine the impact on pre-linguistic outcome classifications of 12-month-old infants when using four raters compared to three. Thirty experienced Speech and Language Therapists (SLTs) from five countries assessed 20-minute video recordings of four 12-month-old infants during a play session with a parent. One recording was assessed twice. A naturalistic listening method in real time was used. This involved: (1) assessing each syllable as canonical or non-canonical, and (2) following the recording, assessing whether the infant was babbling canonically and listing the syllables the infant produced with command. The impact of four raters on outcome, compared to three, was explored by classifying the outcome based on all possible combinations of three raters and determining how often the outcome assessment changed when a fourth rater was added. Results revealed that adding a fourth rater had a minimal impact on canonical babbling ratio assessment. Presence/absence of canonical babbling and size of syllable inventory showed a negligible impact in three out of four recordings and were minimally affected in the remaining recording. In conclusion, adding a fourth rater in the assessment of pre-linguistic utterances in 12-month-old infants with naturalistic assessment in real time does not considerably affect outcome classifications. Thus, using three raters, as opposed to four, is recommended.

Background

Observational data based on perception or judgement are subject to potential variation caused by (1) the instrumentation or coding procedure, (2) the conditions of the observation, (3) the subjects under observation and (4) the raters themselves (Cordes, Citation1994). For this reason, observations based on a single rater are regarded as poor practice (Wyatt et al., Citation1996) and observations based on multiple raters are preferred (Cordes, Citation1994). Any observation involving multiple raters requires careful consideration of the number of raters required, to ensure that observations are as stable and as independent of inter-rater variation as possible. Researchers have attempted to limit the variation caused by multiple raters through training (Laski et al., Citation1988; Sell et al., Citation2009; Willadsen et al., Citation2017). Calibration has focused on inter- and intra-rater agreement and on the assessment procedure, for example, the description of variables (Cordes, Citation1994). For example, severe hypernasality is described as ‘evident on vowel productions and voiced consonants’ in the Cleft Audit Protocol for Speech (CAPS-A) (John et al., Citation2006). Despite calibration and detailed definitions of variables, sources of variation such as observers’ internal standards, attention lapses, fatigue and human error can still exist (Keuning et al., Citation1999; Kreiman et al., Citation1993).

Phonetic transcription is the most commonly used method to assess pre-linguistic utterances in infants. However, phonetic transcription was developed to measure mature speech, and concerns have been raised about its suitability for a younger cohort (see Ramsdell et al., Citation2007, for a discussion). This detailed type of listening, where every sound is transcribed after the rater has listened to it several times, appears to overestimate infants’ sound inventories compared to parental report (Ramsdell et al., Citation2012). Another area of concern is the decrease in agreement between raters with more detailed transcription, for example, when several diacritics are added (Cucchiarini, Citation1996; Shriberg & Lof, Citation1991; Stoel-Gammon, Citation2001). Phonetic transcription is also time- and resource-consuming, leading to variations in practice (see Table 1). There is a need to develop less time-consuming but valid and reliable assessment methods.

Table 1. Examples of different approaches to assessment of inter-rater reliability of phonetic transcription of pre-linguistic speech

Naturalistic listening has more recently been developed as a method to evaluate pre-linguistic utterances in a manner resembling the way parents listen to their infants in real time (Ramsdell et al., Citation2012). This method requires raters to listen to an entire speech sample, in real time, without taking notes during the listening. It is thought to be appropriate for infants and is more time efficient than phonetic transcription. Ramsdell et al. (Citation2012) utilized four raters, independent of the child, and reported the outcome as the mean value of the four assessments. They concluded that a combination of caregiver report and multiple raters can provide a new method of assessing pre-linguistic utterances. Naturalistic listening has since been used, with good inter-rater agreement, to assess canonical babbling ratio (CBR), canonical babbling status and syllable volubility in different clinical populations, differentiating these from typically developing infants (Belardi et al., Citation2017; Patten et al., Citation2014).

Willadsen et al. (Citation2019) explored the difference between raters’ assessments of syllable inventory size and unique syllable types in a study comparing naturalistic listening in real time with phonetic transcription. Syllable inventory size was around twice as large when assessed by phonetic transcription (counting any occurrence of a syllable) as by naturalistic listening. Syllables occurring frequently (>5 times) in phonetic transcription were generally also listed by means of naturalistic listening in real time. In a next step, Willadsen et al. (Citation2019), utilizing three raters, explored assessment of presence of canonical babbling, CBR, syllable inventory size and unique syllable types by means of naturalistic listening in real time. They found complete agreement across raters for presence of canonical babbling, high agreement on ranking of CBR (ICC = .95) and syllable inventory size (Pearson’s r = .83), as well as complete agreement among all raters for 78.3% of unique syllable types.

Further attempts to simplify the assessment of pre-linguistic utterances have used observation forms. Outcomes such as babbling and consonant production were assessed by Speech and Language Therapists (SLTs) following child interaction in real time during a video recording session (Lieberman & Lohmander, Citation2014). They found complete agreement for presence of canonical babbling but lower agreement on the total number of consonant types, i.e. consonant inventory size. Thus, a body of studies supports the use of observational methods in real time for the assessment of pre-linguistic utterances.

Studies of infants during their first year of life have shown similar consonant inventories across languages (Locke, Citation1983), and infants’ consonant inventories are more similar to those of infants from other language environments than to those of adults in their own language environment (Kern & Davis, Citation2009). Cross-linguistic studies using phonetic transcription have included transcribers with and without the same language background as the infants. These studies indicate that transcribers’ language background does not influence the transcriptions (Lee et al., Citation2010), nor has listeners’ ambient language had any significant effect on identification of the infants’ ambient language (Engstrand et al., Citation2003; Lee et al., Citation2017). It is therefore reasonable to believe that language background will not influence assessment of syllable inventory by naturalistic listening in real time either, as this method is less fine-grained than phonetic transcription.

As naturalistic listening in real time is a new method developed for the assessment of pre-linguistic speech in infants, existing standards for the number of raters and agreement levels developed for phonetic transcription cannot be applied. So far, a small-scale study has shown good agreement between three raters in the assessment of infants with cleft palate by means of naturalistic listening in real time (Willadsen et al., Citation2019), whereas Ramsdell et al. (Citation2012) utilized four raters in their first study of the method. Although Willadsen et al. (Citation2019) showed good agreement between three raters, agreement levels varied between pairs of raters, supporting the use of more than two raters. Further, utilizing an even number of raters introduces the need for an arbitrator in the presence of disagreement.

According to the protocol (Shaw et al., Citation2019), naturalistic listening in real time will be used to assess pre-linguistic utterances in 558 participants in an international randomized clinical trial investigating the influence of Timing Of Primary Surgery for Cleft Palate (TOPS), funded by the National Institute of Dental and Craniofacial Research (NIDCR). Due to the additional resources required to support four raters, the impact on speech outcome assessment of a fourth rater, as opposed to three, is explored here.

This study compared the impact on the assessment of the following pre-linguistic outcomes observed by three raters, as opposed to four, through naturalistic listening in real time: 1) presence or absence of canonical babbling; 2) CBR; 3) size of syllable inventory; and 4) unique syllable types. Further, the study explored the unique syllable types produced in relation to the raters’ language background.

Method

Participants

Thirty-one experienced SLTs, all specializing in cleft palate speech, participated in the study. They were recruited as raters of 12-month-old infants in the TOPS study. The raters were native speakers of one of the five languages included in the TOPS trial: Danish, English, Norwegian, Brazilian Portuguese and Swedish. All raters had previously participated in calibration meetings including theoretical and practical lessons on pre-linguistic development in infants with and without cleft palate. This was followed by online training with interactive and/or direct feedback, and thereafter a further three-day calibration meeting including listening exercises individually, in small groups and in plenum. At the end of the calibration meeting, all raters were tested to ensure suitability for participation in this study. First, all raters’ data were analysed in relation to their language background. This test of inter- and intra-rater reliability identified one outlier who required further training before the final speech assessment in the TOPS project. This rater was therefore removed from the data set in the present study to ensure that the conclusions of this analysis reflect the final assessments the raters will make, leaving assessment data from 30 raters for the analysis.

Speech samples

The speech samples consisted of four unique video recordings of infants during a play session with a parent. The speech samples were chosen to represent a child from each of the five participating countries in the TOPS trial. To be included, the child had to produce at least 80 utterances in the recording. One of the authors reviewed the video recordings before the test took place. Unfortunately, no Brazilian practice recording fulfilled the requirement of at least 80 utterances. Accordingly, four recordings, one each from Denmark, Norway, Sweden and the United Kingdom, were used in the study. The English recording was duplicated and renamed to evaluate intra-rater reliability.

Each 45-minute video recording was edited to approximately 20 minutes in length. The edited recording started at the point in the recording when the infant was most talkative. All recordings were practice video recordings, meaning that no TOPS study data were used. To enhance the quality of the recordings, the SLTs who collected the video recordings had participated in a data collection calibration meeting, had received written instructions on the procedure and used the same video recording equipment (JVC-GY-HM100E Solid state) and microphone (Rode NT4).

Procedure

All four video speech samples were reviewed by all 30 raters, with one video being assessed twice. Four raters had each recorded one infant; otherwise the raters were unfamiliar with the infants assessed. Independent assessments were undertaken in silent rooms, at the same place and time, using the same type of laptop (EliteBook 840) and headphones (AKG K271 Mark II). Raters were encouraged to take comfort breaks between video recordings to reduce rater fatigue. The raters listened to an entire speech sample (20 minutes) in real time and were not allowed to take notes. The video was assessed using TimeStamper, software developed for the TOPS project (Willadsen et al., Citation2018). While watching the video, the rater assessed the infant’s sound production in real time by striking a designated key for each canonical and non-canonical syllable, respectively. The program automatically stamps a time marker for each key strike and calculates the CBR (aim 2). The rater could only see the last annotation, to avoid any influence on the final decision of whether she/he thought the child was babbling canonically. At the end of the recording, a window pops up asking the rater to select canonical yes/no (aim 1), and then to list the syllables she/he judged the child produced with command (aims 3 and 4). ‘With command’ was defined as syllables the rater thought the infant produced systematically and that were notable to raters, not simply arbitrary motoric variations (Willadsen et al., Citation2019).

Speech outcomes

Variation in outcomes assessed by three versus four raters was explored for each of the following outcomes:

  1. Presence/absence of canonical babbling; an assessment performed at the end of a recording based on the rater’s overall impression of the infant’s ability to produce speech-like syllables (Willadsen et al., Citation2019).

  2. CBR; generated from the TimeStamper output file as the proportion of canonical syllables out of the total number of syllables identified (Willadsen et al., Citation2018).

  3. Size of syllable inventory; total number of different syllable types the rater thought the child produced with command.

  4. Unique syllable types; unique consonants in syllables listed by the rater (e.g. [ba ha]).

  5. Unique syllable types listed across all raters and across raters by language.
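Outcome 2 above, the CBR, is a simple proportion of the two keystroke counts. A minimal sketch (the function name is illustrative, not the TimeStamper implementation):

```python
def canonical_babbling_ratio(canonical: int, non_canonical: int) -> float:
    """Proportion of canonical syllables out of all syllables marked by a rater."""
    total = canonical + non_canonical
    if total == 0:
        return 0.0  # no syllables marked; treat the ratio as zero
    return canonical / total

# e.g. 50 canonical and 150 non-canonical key strikes give a CBR of 0.25
```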

Presence/absence of canonical babbling

Each rater determined the presence or absence of canonical babbling and the majority assessment determined the outcome. When there were three raters, there were four possible scenarios, all of which allowed for a majority assessment ():

Table 2. Scenarios with 3 raters

There were five scenarios when a fourth rater was added (). In one of these scenarios, when two raters classified the infant as canonical babbling present and two classified as canonical babbling absent, the majority assessment could not be determined resulting in the need for an arbitrator.

Table 3. Scenarios with 4 raters

When there are 30 speech assessors and three raters are required, there are 4060 different rater combinations. To determine the impact of adding a fourth rater, we considered the outcome of adding each of the 27 remaining raters to each combination of three, giving a total of 109,620 possible combinations (27*4060).

The difference in proportions of presence/absence of canonical babbling when the fourth rater was added, and the number of combinations that would require an arbitrator, were recorded.
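The counting scheme above can be sketched in a few lines (an illustrative Python version; the study's analyses were done in SAS). Each judgement is a boolean for 'canonical babbling present':

```python
from itertools import combinations
from math import comb

# with 30 raters there are comb(30, 3) three-rater combinations,
# each extendable by any of the 27 remaining raters
THREE_RATER_COMBINATIONS = comb(30, 3)                  # 4060
FOUR_RATER_EXTENSIONS = 27 * THREE_RATER_COMBINATIONS   # 109620

def split_producing_additions(judgements):
    """Count, over every combination of three raters, how many fourth-rater
    additions would produce a 2-2 split and hence require an arbitrator."""
    n = len(judgements)
    trios = list(combinations(range(n), 3))
    splits = 0
    for trio in trios:
        present = sum(judgements[i] for i in trio)
        for fourth in range(n):
            if fourth in trio:
                continue
            if present + judgements[fourth] == 2:  # 2 present vs 2 absent
                splits += 1
    return len(trios), splits
```

Only 2-1 majorities can be turned into a split by one extra vote; unanimous trios cannot.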

Canonical babbling ratio

The mean CBR was calculated for each of the 4060 combinations of three raters and calculated again once a fourth rater, from the remaining 27, was added.

The absolute difference in means between each of the 4060 combinations of three raters and each of the corresponding 27 combinations of four was calculated.

Size of syllable inventory

The mean size of syllable inventory was calculated for each of the 4060 combinations of three raters and then calculated again once a fourth rater, from the remaining 27, was added.

The absolute difference in means between each of the 4060 combinations of three raters and each of the corresponding 27 combinations of four was calculated, and the number of combinations for which this difference was greater than or equal to 1 was recorded. An absolute difference in means of at least 1 indicates a change in the raters’ outcome when four raters are used.
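The three-versus-four comparison of mean outcomes can be sketched as follows (illustrative code with made-up inventory sizes, not the study's SAS analysis):

```python
from itertools import combinations
from statistics import mean

def changed_combinations(scores, threshold=1.0):
    """For every combination of three raters, add each remaining rater in turn
    and count how often the mean shifts by at least `threshold`."""
    n = len(scores)
    changed = total = 0
    for trio in combinations(range(n), 3):
        three = [scores[i] for i in trio]
        for fourth in range(n):
            if fourth in trio:
                continue
            total += 1
            if abs(mean(three + [scores[fourth]]) - mean(three)) >= threshold:
                changed += 1
    return changed, total

# identical scores never shift the mean; a single outlier shifts every combination
```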

Syllable types

To determine the impact of adding a fourth rater, we considered the outcome of adding each of the 27 remaining raters’ classification to each combination of three raters.

The number of times that the introduction of a fourth rater resulted in a new sound being identified was recorded. A new sound indicates a change in the raters’ outcome if four raters are used.
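Whether a fourth rater contributes a new sound reduces to a set-difference check (a sketch; the syllable strings are invented examples):

```python
def fourth_rater_adds_new_sound(trio_inventories, fourth_inventory):
    """True if the fourth rater lists any syllable type that none of the
    three raters in the original combination listed."""
    pooled = set().union(*trio_inventories)  # all syllables listed by the trio
    return bool(set(fourth_inventory) - pooled)

# e.g. three raters list {'ba', 'da'}, {'ba'} and {'da', 'ga'};
# a fourth rater listing {'ba', 'ma'} introduces the new sound 'ma'
```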

Comparisons

The impact on outcome assessment of including a fourth rater, and the resulting need for an arbitrator, as described above, were explored for each of the four unique recordings separately. As recording 2 was assessed twice, the results for each outcome are also compared between this recording and its reassessment.

All analyses were done with SAS software version 9.3 by a suitably qualified statistician.

Cross-linguistic descriptives

Unique syllable types identified by at least 10% of the raters were descriptively analysed for the total group and by raters’ language background.

Results

Presence/absence of canonical babbling

In only one of the four recordings (#4) did a fourth rater have an impact on the conclusion of presence/absence of canonical babbling. For this recording, 25 of the 30 raters classified the infant as ‘Canonical babbling present’ and 5 classified the infant as ‘Canonical babbling absent’. In the 4060 combinations of choosing 3 raters from these 30, there were 250 (6%) occurrences where an extra ‘Canonical babbling present’ rating would have resulted in the need for an arbitrator and 1500 (37%) occurrences where an extra ‘Canonical babbling absent’ rating would have resulted in a split, i.e. 4 of the remaining 27 raters, if added, would have necessitated an arbitrator. In the remaining 57% of combinations, no arbitrator was required.
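The counts reported for recording #4 follow directly from the combinatorics of 25 'present' and 5 'absent' judgements; a quick closed-form check (an illustrative verification, not part of the study's analysis):

```python
from math import comb

present, absent = 25, 5         # rater judgements for recording 4
total = comb(30, 3)             # 4060 combinations of three raters

# a 1-present/2-absent trio becomes a 2-2 split if a 'present' fourth joins
arb_if_extra_present = comb(present, 1) * comb(absent, 2)  # 250 (~6%)
# a 2-present/1-absent trio becomes a 2-2 split if an 'absent' fourth joins
arb_if_extra_absent = comb(present, 2) * comb(absent, 1)   # 1500 (~37%)
# unanimous trios can never be split by a single extra vote
never_split = comb(present, 3) + comb(absent, 3)           # 2310 (~57%)

assert arb_if_extra_present + arb_if_extra_absent + never_split == total
```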

Canonical babbling ratio

Figure 1 shows the distribution of the mean absolute difference in CBR between 3 and 4 raters, and Table 4 shows the summary statistics of the absolute difference for each unique recording. The impact of the fourth rater was minimal; the maximum absolute difference of the mean CBR across recordings varied from 0.06 to 0.13 (Table 4).

Table 4. Summary statistics of the mean absolute difference in canonical babbling ratio between 3 and 4 raters

Figure 1. Distribution of the mean absolute difference in canonical babbling ratio (CBR) between 3 and 4 raters by recording 1 to 4 (top to bottom)


Size of syllable inventory

No key differences were observed between three and four raters, as shown by the count of absolute differences greater than or equal to 1 (column 8 in Table 5). Recording 4 showed some impact of the fourth rater on the results; however, the impact was noticeable less than 3% of the time. It follows that the mean, median and standard deviation vary more than for the other recordings.

Table 5. Summary statistics of distribution of absolute differences in mean consonant inventory between each combination of 3 raters and on introducing a fourth rater

Unique syllable types

Figure 2 shows the distribution of the number of fourth raters that, when added to each combination of three, identified at least one new sound, for each of the four recordings.

Figure 2. Distribution of the number of times, when adding a fourth rater to each combination of 3 raters, new sounds were identified for recording 1 to 4 (top to bottom)


The median number of fourth raters identifying at least one new sound on recording 1 was 23 (minimum = 15, maximum = 27), whereas the median on recording 2 was 15, with a minimum of 3 and a maximum of 27; both extremes occurred less than 1% of the time. When adding a fourth rater to each combination of 3 on recording 3, the distribution of the number of raters suggests wide disagreement between raters (Figure 2). The most common number of fourth raters identifying at least one new sound compared to the original group of three was 7, with a minimum of 0 and a maximum of 27. For recording 4, the figure also suggests wide disagreement among raters. Most commonly, 10 of the 27 fourth raters identified at least one new sound compared to the original group of 3; the minimum was 0, the maximum 27 and the median 17. Table 6 provides further details.

Table 6. Summary statistics for the number of times the introduction of a fourth rater (27 possibilities for each combination) identified new sounds

Unique syllable types listed across raters and across raters by language

Unique syllable types identified by at least 10% of the raters are presented in Tables 7–10 for recordings #1–4, both overall and by the raters’ language background. The more commonly a unique syllable was identified, the more raters agreed across language backgrounds. For less common unique syllables there were some differences across raters, but no systematic differences related to language background were observed.

Table 7. Unique syllable types for recording 1 listed across all raters and across language by rater. Syllables identified by at least 10% of the 30 raters are included. Syllables are listed in order of highest to lowest frequency

Table 8. Unique syllable types for recording 2 listed across all raters and across language by rater. Syllables identified by at least 10% of the 30 raters are included. Syllables are listed in order of highest to lowest frequency

Table 9. Unique syllable types for recording 3 across all raters and across language by rater. Syllables identified by at least 10% of the 30 raters are included. Syllables are listed in order of highest to lowest frequency

Table 10. Unique syllable types for recording 4 listed across all raters and across language by rater. Syllables identified by at least 10% of the 30 raters are included. Syllables are listed in order of most identified to least

Duplicated assessment

Adding a fourth rater did not change the result of presence/absence of canonical babbling in any of the 4060 combinations.

The impact of the fourth rater on CBR was minimal (Table 11). The maximum absolute difference of 0.06 differed by 0.01 from the repeated recording (0.07), again indicating consistency in the repeated assessments. No key differences between three or four raters were found for size of syllable inventory, as shown by the count of absolute differences greater than or equal to one (Table 11). The duplicated assessment of consonant types was also consistent with the assessment of recording 2 (Table 11).

Table 11. The duplicated result on canonical babbling ratio (CBR), size of syllable inventory (Cons invent) and type of consonants (Cons type) in recording 2

Discussion

The purpose of the present study was to examine the impact on pre-linguistic outcomes of 12-month-old infants assessed by four raters as opposed to three. The focus of the present study was not to examine how well the raters agreed but rather to determine to what degree the outcome assessment would be affected if a fourth rater were added.

SLT students are able to learn naturalistic assessment in real time and reliably assess infants with unilateral cleft lip and palate who share their native language (Willadsen et al., Citation2019). The present study expanded the number of raters from three to 30 and added the complexity of raters speaking five different languages. For each of the pre-linguistic outcomes, adding a fourth rater did not affect outcome classification. Specifically, adding a fourth rater had a minimal impact on CBR. Presence/absence of canonical babbling and size of syllable inventory showed a negligible impact in three out of four recordings. In one recording (#4), size of syllable inventory as well as presence/absence of canonical babbling were minimally affected by adding a fourth rater. For the assessment of presence/absence of canonical babbling, 24 fourth raters changed the result from a majority agreement to a split decision in 6% of the combinations, and four of the fourth raters did so in 37% of the combinations. In both scenarios, this would have introduced the need for an arbitrator. Thus, the majority of these discrepancies came from a minority of the raters.

Perceptual assessment is more challenging in some cases, which might explain some of the discrepancies. Child #4 had a relatively low CBR (mean 0.22), which is close to the often-cited cut-off for canonical babbling of 0.15 (Oller et al., Citation1994). Ramsdell et al. (Citation2007) examined the influence of canonicity and rater confidence on inter-rater agreement and found a strong relationship between both factors and transcription agreement. Specifically, canonicity was a significant factor in predicting inter-rater reliability. Accordingly, the high frequency of non-canonical syllables in recording #4 may have contributed to the discrepancies in assessments of this infant. Furthermore, many raters listed this speech sample (#4) as including glottal stops and glottal fricatives, speech sounds reported to decrease inter-rater reliability (Vihman & Greenlee, Citation1987; Willadsen & Albrechtsen, Citation2006). Although glottal syllables were listed as syllable types by the raters, they are not canonical syllables, as there is no formant transition between the consonant and the vowel (Oller, Citation2000). Accordingly, they were not counted as canonical syllables, which may have had an impact on the ratings. Although all raters had taken part in extensive training, it is evident that disagreements may arise, especially when an infant has less advanced babbling, including a relatively low CBR, and uses immature articulations such as glottal stops.

The variable unique syllable types differs from the others in that the raters, after listening to the video recording, had to list the syllables they thought the child produced with command. This allows a high number of possible sounds, so it is not surprising to find higher variation on this variable. Further, in naturalistic listening in real time, memory constraints (what stays memorable) may differ between raters; consequently, perfect agreement will never occur. The cross-linguistic comparison of raters’ unique syllable types gave no indication of systematic differences between raters due to language background. This is in line with earlier reports of good inter-language transcriber agreement on consonants in babbling, suggesting no native-language transcription bias (Lee et al., Citation2010).

Earlier studies have shown that the number of syllable types identified by phonetic transcription is much higher than by naturalistic listening (Ramsdell et al., Citation2012; Willadsen et al., Citation2019). However, syllable types occurring more than five times in phonetic transcription were generally also listed by raters using naturalistic listening in real time (Willadsen et al., Citation2019). Thus, the majority of the most common syllable types can be expected to be captured by naturalistic listening in real time. As adding a fourth rater only minimally affected the number of consonant types, there is no need for four raters in future studies.

Intra-rater reliability

One repeated assessment was done. Consistency between the repeated assessments was good, but a limitation is that the re-assessment was performed on the same day. This may have influenced size of syllable inventory, unique syllable types and presence/absence of canonical babbling, but not CBR, as it is impossible to remember this type of information (a mean of 200 syllables). Furthermore, three other ratings were performed between the first and second listening of the duplicated recording to reduce the effect of remembering the syllable inventory.

Strengths and limitations

In summary, to study whether three or four raters should be included in future ratings of pre-linguistic utterances in 12-month-old infants, several steps were taken to rule out biases relating to the observational process. The raters were well calibrated and had to pass a set level of reliability to be included in the study. The coding procedure was well structured, with written rules on how to perform the assessments. Software specifically developed for this study was used, together with the same type of laptops and headphones for all participants. The raters did the ratings at the same time in the same venue, and the video recordings were made with the same recording equipment after calibration of the recording personnel.

Although agreement was high for infants deemed to be canonical, we cannot assume that the outcome generalizes to non-canonical infants. Thus, it is a limitation that no speech sample from a clearly non-canonical infant was included. Further, as five languages will be assessed in the TOPS trial, it would have been an advantage to include all five languages in the speech samples.

This study adds to the knowledge base on how pre-linguistic outcomes are affected by the number of raters in cross-linguistic assessment of 12-month-old infants’ babbling. Future studies should continue to explore whether the results are applicable to pre-linguistic vocalizations of non-canonical infants and to younger infants. Further development of how naturalistic listening in real time can be used validly and reliably in the clinic is warranted.

Conclusions

In conclusion, the changes in outcome classification by adding a fourth rater did not support the additional time and resources that would be associated with a fourth assessment.

Disclosure of interest

The authors report no conflict of interest. This publication was made possible by Grants Number R21DE15128, U01DE018664 and U01DE018837 from the National Institute of Dental and Craniofacial Research (NIDCR). Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIDCR.

Acknowledgments

The authors acknowledge the SLTs in our partner centres in Brazil, Denmark, Norway, Sweden and UK for participating in calibration of data recording and assessment. Specifically we would like to thank Prof Kevin Munro for assistance with the manuscript and the SLTs who performed the assessments: Liz Albery, Silvia Helena Alvarez Piazentin-Penna, Malin Appelqvist, Ragnhild Aukner, Pia Bodling, Joan Bogh Nielsen, Melanie Bowden, Karin Brunnegård, Haline Coracine Miguel, Line Dahl Jorgensen, Josefine Enfält, Ana Paula Fukushiro, Cristina Guedes de Azevedo Bento Goncaleves, Jorunn Lemvik, Louise Leturgie, Eva Liljerehn, Natalie Lodge, Siobhan Mcmahon, Kathryn Patrick, Nina-Helen Pedersen, Ginette Phippen, Liisi Raud Westberg, Lucy Rigby, Anne Roberts, Lucy Smith, Helene Soegaard, Maria Sporre, Ann-Sofie Taleman, Jorid Tangstad, Stephanie Van Eeden, Renata Yamashita

Additional information

Funding

This work was supported by the National Institute of Dental and Craniofacial Research [R21DE15128, U01DE018664 and U01DE018837].

References