How we measure language skills of children at scale: A call to move beyond domain-specific tests as a proxy for language

Abstract

Purpose

The aim of this research note is to encourage child language researchers and clinicians to give careful consideration to the use of domain-specific tests as a proxy for language, particularly in the context of large-scale studies and for the identification of language disorder in clinical practice.

Method

We report on data leveraged through the prospective Raine Study cohort. Participants included 1626 children aged 10 years (n = 104 with developmental language disorder [DLD] and n = 1522 without DLD). We assessed the predictive utility of common language measures including subtests of a standardised omnibus language assessment, non-verbal intelligence, and a domain-specific receptive vocabulary test.

Result

Children with DLD performed within the average range on a measure of non-verbal intelligence (z = −0.86) and receptive vocabulary (z = −0.38), as well as two out of the six subtests on the omnibus language assessment (zs > −1.50). The magnitude of the predictive relationship between language assessments and the likelihood of a child meeting criteria for DLD at 10 years was assessed using a logistic regression model, which was significant: χ2(8) = 16.91, p = 0.031. Semantic Relationships (OR = 1.13, CI = 1.04 − 1.23, p = .004), Formulated Sentences (OR = 1.07, CI = 1.01 − 1.13, p = .028), Recalling Sentences (OR = 1.20, CI = 1.15 − 1.26, p < .001), and Sentence Assembly (OR = 1.17, CI = 1.07 − 1.30, p = .001) were significant predictors of DLD.

Conclusion

Domain-specific language assessments, particularly those testing receptive vocabulary, may overestimate the language ability of children with DLD. Clinicians and researchers are urged to use such tests with caution, especially when measuring language skills of children at scale. Future directions for measuring the functional impact of DLD are presented.

Introduction

Developmental language disorder (DLD) is a common childhood condition that affects children’s ability to understand and use language compared to typically developing peers (Bishop et al., 2016; Bishop et al., 2017). Difficulties learning language can have a significant impact on overall development, and a range of difficulties persist well into adolescence and adulthood (Law et al., 2009). Recent international consensus efforts (Bishop et al., 2016, 2017) and updates to international classifications for disability (i.e. Diagnostic and Statistical Manual [DSM]-5; American Psychiatric Association [APA], 2013; International Classification of Diseases [ICD]-11; World Health Organisation [WHO], 2019) have outlined criteria for establishing a diagnosis of DLD. Briefly, DLD is diagnosed when language difficulties are persistent across language modalities (i.e. written and oral language) and into later development, cause a significant functional impact on communication and/or learning in everyday life, are not explained by another biomedical condition (such as autism, intellectual disability, or acquired brain injury), and have their onset in early childhood (APA, 2013; Bishop et al., 2017; WHO, 2019).

The recency of the widespread application of DLD as a label for such childhood language disorders through international consensus (Bishop et al., 2017) should be noted, particularly in relation to previous labels, such as specific language impairment (SLI). The distinction between SLI and DLD is that the label of SLI was applied when children demonstrated low language in the presence of average non-verbal intelligence (i.e. standard score ≥85), whereas DLD is also applicable to children who demonstrate low average non-verbal intelligence (i.e. standard score ≥70). As such, every child who meets criteria for SLI would meet criteria for DLD, but not every child who meets criteria for DLD would meet criteria for SLI. A preference for DLD has been adopted, as empirical evidence does not support the use of cognitive referencing to distinguish children with low language who have average non-verbal intelligence (i.e. SLI and DLD) from those who have low average non-verbal intelligence (i.e. DLD only). Specifically, evidence has indicated there is no difference in the functional impact, educational attainment, or responsiveness to intervention of children with DLD who have average or low average non-verbal intelligence (see Bishop et al. [2016] for a summary). As such, when discussing previous research on childhood language disorders such as SLI and DLD, we will refer to DLD and highlight the use of SLI where relevant.

Persistent difficulties across language modalities are typically assessed using a battery of standardised tests to determine DLD diagnosis. For example, DLD may be identified when children do not meet criterion on two out of five language composite scores (typically expressive and receptive vocabulary, expressive and receptive grammar, and a measure of discourse abilities) in consideration of a measure of non-verbal intelligence (APA, 2013; Norbury et al., 2016; Tomblin et al., 1997; WHO, 2019). However, studies have varied in the cut-off scores used to indicate criterion on standardised tests. Earlier studies using SLI terminology (Tomblin et al., 1997) applied a −1.25 standard deviation (i.e. standard score ≤81) cut-off on standardised language tests, and non-verbal intelligence within 1.00 standard deviation (i.e. standard score ≥85) of the norm-referenced sample. More recently, researchers (e.g. Norbury et al., 2016) applied a more stringent cut-off of −1.50 standard deviations (i.e. standard score ≤78) on standardised language tests, alongside non-verbal intelligence within 2.00 standard deviations (i.e. standard score ≥70) of the norm-referenced sample. Norbury et al. (2016) illustrated that applying a discrepancy in language and cognitive abilities (i.e. Tomblin et al., 1997) is not clinically meaningful by comparing the sample of children meeting criteria for DLD with those who would meet criteria for SLI (4.8% in the UK sample) on indicators of social, emotional, and behavioural problems, and academic attainment, finding no differences when the respective criteria were applied. Beyond preliminary diagnosis, evidence of the functional impact of DLD was also collected through a range of sources, including standardised tests alongside self-, parent-, and teacher-report measures of perceived difficulty or capacity.
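As a worked illustration of how the standard deviation cut-offs above map onto standard scores, the following minimal Python sketch assumes the conventional standard-score scale (mean 100, standard deviation 15). The function name is hypothetical, and the exact values it produces are unrounded, whereas published cut-offs are reported as whole scores.

```python
# Minimal sketch: convert a z-score cut-off to a standard score.
# Assumption: standard scores have mean 100 and SD 15 (the conventional scale).
def z_to_standard_score(z, mean=100.0, sd=15.0):
    """Return the standard score that corresponds to a given z-score."""
    return mean + sd * z


# Cut-offs discussed in the text (language and non-verbal intelligence).
for z in (-1.25, -1.50, -1.00, -2.00):
    print(f"z = {z:+.2f} -> standard score {z_to_standard_score(z):.2f}")
```

Under this assumption, a −1.25 standard deviation cut-off falls at 81.25 and a −1.50 cut-off at 77.5, while the non-verbal thresholds of −1.00 and −2.00 fall at 85 and 70.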

Researchers and clinicians may face challenges when selecting assessment tools in the diagnostic processes for DLD. A recent systematic review has indicated a dearth of evidence supporting a single standardised test for diagnostic purposes for children aged 4–12 years, with limitations in all assessments regarding psychometric quality (Denman et al., 2017). In particular, few studies reported evidence of reliability and content validity, and no studies reported evidence of structural validity, internal consistency, or error measurement of assessments commonly used to diagnose child language disorder. Denman et al. (2017) concluded that out of 15 assessments identified in the review, four tools, the Assessment of Literacy and Language (ALL; Lombardino et al., 2005), the Clinical Evaluation of Language Fundamentals-5th Edition (CELF-5; Wiig et al., 2013), the Clinical Evaluation of Language Fundamentals-Preschool 2nd Edition (CELF-P2; Wiig et al., 2004), and the Preschool Language Scales-5th Edition (PLS-5; Zimmerman et al., 2011), demonstrated the most robust psychometrics and were recommended for use.

Despite the evidence of long-term impacts associated with DLD (e.g. Law et al., 2009), there remains a scarcity of epidemiological studies investigating the prevalence of the childhood disorder. One US-based study investigated the prevalence of DLD (aka SLI) with monolingual 5-year-old children (n = 2009; Tomblin et al., 1997). The standardised language battery used for this study included selected subtests from the Test of Language Development-2: Primary (TOLD-2:P; Newcomer & Hammill, 1998), as well as a narrative task that included measures of both narrative comprehension and production. More recently, Norbury et al. (2016) determined the prevalence of DLD in the UK with 4- to 5-year-olds (n = 529). The prevalence was estimated at 7.58% using contemporary DLD diagnostic criteria (i.e. −1.50 standard deviation cut-off on language tests and non-verbal intelligence within 2.00 standard deviations of the population mean). The standardised language battery for this study included measures of expressive and receptive vocabulary (Brownell, 2010), receptive (Bishop, 2003) and expressive grammar (Marinis et al., 2011), and narrative comprehension and production (Adams et al., 2001).

The prevalence of DLD in an Australian population was recently estimated at 6.4% in middle childhood (10 years; n = 1626) using the Raine Study longitudinal data (Calder et al., 2022). This study also applied contemporary DLD diagnostic criteria. The proportion of children meeting criteria for DLD was determined by using scores from the Clinical Evaluation of Language Fundamentals Third Edition (CELF-3; Semel et al., 1995) to measure language functioning and the Raven’s Coloured Progressive Matrices (RCPM; Raven, 1977) to measure non-verbal intelligence. The CELF-3 provides an aggregated language index (Total Language Score), a Receptive Language Index, and an Expressive Language Index. The Receptive Language Index is derived from three subtests: Concepts and Following Directions, Word Classes, and Semantic Relationships. The Expressive Language Index is derived from three subtests: Formulated Sentences, Recalling Sentences, and Sentence Assembly. Since the CELF-3 was not normed on Australian children, raw scores were converted to z-scores to identify children who were 1.50 standard deviations or more below the sample population mean on the Total Language Score. The mean z-score for children identified with DLD was −2.03, indicating most children performed well below the population mean on the omnibus language measure.

Estimates for low language ability have also been demonstrated in the Early Language in Victoria Study (ELVS; Eadie et al., 2021; McKean et al., 2017; Reilly et al., 2010). Low language was determined using the CELF-P2 (Wiig et al., 2004) and Clinical Evaluation of Language Fundamentals Fourth Edition (CELF-4; Semel et al., 2003) with a −1.25 standard deviation cut-off (i.e. standard score ≤81) at all 4-, 7-, and 11-year follow-ups. At 4 years (n = 1596), 20.6% of children were deemed to have low language, and 17.2% were identified to meet diagnostic criteria for SLI (Reilly et al., 2010). At 7 years (n = 1204), 19% of children were found to have low language, with no mention of DLD or SLI (McKean et al., 2017). At 11 years (n = 839), 16.9% of children were identified to present with low language (again, with no specific reference to DLD or SLI; Eadie et al., 2021).

Of interest, other Australian longitudinal studies that have evaluated developmental language outcomes have used a vocabulary comprehension measure as a proxy for overall language ability. For example, the language abilities of children have been evaluated using data from the Longitudinal Study of Australian Children (LSAC; Harrison & McLeod, 2010; McLeod & Harrison, 2009; Zubrick et al., 2015). McLeod and Harrison (2009) assessed the prevalence of speech and language impairment in children aged 4–5 years (n = 4983). The study drew upon various metrics to determine the proportion of children with speech and language impairment, including the Adapted Peabody Picture Vocabulary Test-3 (PPVT-3; Rothman, 2003). Findings indicated that 13.0% of children were 1–2 standard deviations below the mean and 1.7% were 2.0 standard deviations below the mean on the PPVT-3. In comparison, 9.5% had parents who were concerned about how their child understood language, and 16.9% were considered less competent than their peers in receptive language ability as judged by teachers. The implication is that multiple indicators of language impairment across a range of contexts indicated high prevalence in early childhood.

The LSAC has also been leveraged to report risk and protective factors for 4–5-year-old children with speech and language impairment (Harrison & McLeod, 2010). Risk factors for a low score on the PPVT-3 (i.e. ≥1.00 standard deviation below the mean) as a proxy for language included being male, reactive temperament, and parents who spoke a language other than English, whereas protective factors included social and persistent temperament, maternal psychological well-being, and support for children’s learning at home. Findings suggest that these risk and protective factors can be used to identify children for receipt of early intervention programs. Of note, Harrison and McLeod's (2010) inclusion of bilingual children in their analysis enriched the interpretation of potential risk and protective factors relative to previous research; however, the caveat of deriving scores from norm-referenced assessment, which may underestimate bilingual language ability, was not apparent.

While some studies from the LSAC have specifically investigated receptive vocabulary skills of children (e.g. Taylor et al., 2013), Zubrick et al. (2015) investigated the patterns of stability between language (as measured by the PPVT-3) and literacy in children aged 4–10 years (n = 2792). Results indicated that of the total sample, 69% demonstrated that middle-high vocabulary at 4–10 years progressed to middle-high literacy ability at 10 years. Conversely, only 26 children (less than 1%) demonstrated persistent low vocabulary, which progressed to low literacy in later childhood. This highlights that children’s progress from oral to literate language is not stable and predictable.

Even though the Peabody Picture Vocabulary Test (PPVT) has featured heavily in the context of an Australian longitudinal study, its use to identify child language difficulties has been discouraged (Dunn & Dunn, 1981). The reduced sensitivity of the PPVT to the presence of language functioning that is below average range is evident across multiple studies of preschool-aged children with and without diagnosed DLD (e.g. Jackson et al., 2019; Yarian et al., 2021). Indeed, researchers have suggested that the PPVT, including more recent versions, should not be used to determine eligibility for the criteria for language impairment (Spaulding et al., 2006). As to why the PPVT is associated with limited sensitivity to DLD, it is possible that its focus on single word recognition, in the absence of other critical aspects of language function, is of relevance (Frizelle et al., 2019). For instance, the measure does not assess known areas of deficit, such as phonological processing (Pennington & Bishop, 2009) and morphosyntax (Rice et al., 1998). Consequently, the PPVT provides very little insight into a child’s language capacity beyond their ability to match picture stimuli to spoken lexical items with three distractors. This may explain the unstable trajectory of children with low vocabulary reported by Zubrick et al. (2015), especially when the experiential dependence of vocabulary acquisition in schooling years is considered. That is, children with low vocabulary may demonstrate marked growth once they have broadened their experiences at school.

So far, we have highlighted research indicating that domain-specific tests are used to measure language functioning, and that children with DLD may perform within the average range on such tests, including the PPVT. Surveys indicate that the use of such tests is widespread in clinical practice (e.g. Caesar & Kohler, 2009), even for the purpose of diagnosing DLD (aka SLI; Betz et al., 2013). Spaulding et al. (2013) investigated the viability of the PPVT for diagnosing DLD and found that although children with DLD scored lower than typically developing peers, results from discriminant analyses indicated that a standard score cut-off of 103 reliably distinguished between groups of children. Therefore, children with DLD are likely to perform within the average range on the test, which may potentially overestimate their general language functioning. In fact, the use of standardised assessments alone to determine diagnosis is likely to overestimate the functional abilities of children with DLD, highlighting the importance of considering the impact to everyday functioning resulting from language disorder (Bishop et al., 2016, 2017).

The current study

The aim of this research note is to encourage researchers and clinicians to reflect on their use of domain-specific tests as a proxy for overall language skills, particularly researchers who assess language in large-scale studies and clinicians who assess language to identify DLD. We report on additional analyses conducted as part of a prevalence study using the Raine Study data (Calder et al., 2022). Specifically, our primary objective was to investigate the performance of a cohort of children with DLD (as identified using an omnibus language assessment) on a measure of receptive vocabulary compared to children without DLD.

Method and result

Participants

Participants included second generation (Gen2) Raine Study participants, which comprised n = 2868 live births at King Edward Memorial Hospital in Perth, Western Australia between 1989 and 1991. Inclusion criteria for the Raine Study were expecting mothers with a gestational age of 16–20 weeks, English proficiency to communicate with investigators, and residency in Western Australia. Raine Study participants have been shown to be representative of the general population at follow-ups (White et al., 2017). Data in the current study were analysed from the 10-year (1999–2002) follow-up. Each follow-up was approved by the institutional ethics committee and written informed consent from the participants was obtained for each follow-up. At the 10-year follow-up, complete language data were available for 56.69% (n = 1626) of Gen2 participants.

Frequencies of children with (n = 104) and without (n = 1522) DLD, and demographic information have been re-reported from Calder et al. (2022, Table 2, p. 2047) and are presented in Table I. Recruitment and follow-up for the Raine Study were approved by the Human Ethics Committee at King Edward Memorial Hospital. Analysis of existing data was approved by the Raine Study and Curtin University Human Research Ethics Committee (HREC approval number: HRE2021-0117).

Table I. Frequencies of children with and without developmental language disorder and demographic information.

Table II. Participant z-scores on primary variables.

Variables

Meeting criteria for DLD diagnosis was determined using scores from the Clinical Evaluation of Language Fundamentals Third Edition (CELF-3; Semel et al., 1995) to measure language functioning and the Raven’s Coloured Progressive Matrices (RCPM; Raven, 1977) to measure non-verbal intelligence. The Clinical Evaluation of Language Fundamentals (CELF, now in its fifth edition; Wiig et al., 2013) is a standardised omnibus assessment that is widely used for clinical and research purposes due to its sound psychometric properties (Denman et al., 2017). The CELF-3 provides an aggregated language index (Total Language Score), a Receptive Language Index, and an Expressive Language Index. The Receptive Language Index is derived from three subtests: Concepts and Following Directions requires children to point to pictures of shapes and items in response to verbal instructions; Word Classes requires children to choose two words out of three or four that are associated; and Semantic Relationships requires children to complete a sentence based on two words or phrases out of four choices that are semantically related. The Expressive Language Index is also derived from three subtests: Formulated Sentences requires children to generate a sentence when provided a word and visual stimulus; Recalling Sentences requires children to repeat verbally presented sentences of increasing complexity; and Sentence Assembly requires children to compose intact sentences based on visually and verbally presented words. Since the CELF-3 was not normed on Australian children, raw scores were converted to z-scores to identify children that were more than 1.50 standard deviations below the population sample mean on the Total Language Score. Scores on Receptive and Expressive Language Indices were converted to z-scores to identify cases of DLD which are characterised by receptive, expressive, or receptive-expressive language difficulties. Likewise, raw scores were converted to z-scores to identify children who were within 2.00 standard deviations of the population on the RCPM Total Score.
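The conversion and classification steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure used in the Raine Study analysis: the function names are hypothetical, the sample standard deviation is assumed for standardisation, and the handling of scores falling exactly on a cut-off is an assumption.

```python
from statistics import mean, stdev


def to_z_scores(raw_scores):
    """Standardise raw scores against the sample mean and sample SD (assumed)."""
    m = mean(raw_scores)
    s = stdev(raw_scores)
    return [(x - m) / s for x in raw_scores]


def meets_dld_criteria(language_z, nonverbal_z,
                       language_cutoff=-1.50, nonverbal_floor=-2.00):
    """Flag a child whose language z-score is at or below the language cut-off
    while non-verbal intelligence is within 2.00 SD of the mean.
    Boundary handling (<= and >=) is an assumption of this sketch."""
    return language_z <= language_cutoff and nonverbal_z >= nonverbal_floor


# Hypothetical raw scores, for illustration only.
print(to_z_scores([10, 20, 30]))
# Mean z-scores reported in the text for the DLD group (language, non-verbal).
print(meets_dld_criteria(-2.03, -0.86))
```

A child with low language but non-verbal intelligence more than 2.00 standard deviations below the mean would not be flagged by this sketch, in line with the criterion that deficits are not explained by intellectual disability.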

The Peabody Picture Vocabulary Test-Revised (PPVT-R; Dunn & Dunn, 1981) was used to assess the receptive vocabulary of participants. Since this tool is widely used as a proxy for language in longitudinal studies (e.g. Harrison & McLeod, 2010; McLeod & Harrison, 2009; Zubrick et al., 2015), it was of interest to investigate the DLD cohort’s performance on such a measure compared to children without DLD. In keeping with scores on the CELF-3 and RCPM, raw scores on the test were converted to z-scores.

Table II presents z-scores for indices and subtests on the CELF-3, the RCPM, and the PPVT-R. On average, the children meeting criteria for DLD scored more than 2.00 standard deviations below the population mean on the CELF-3 Total Language Score. Notably, the DLD cohort scored within 1.50 standard deviations of the mean on the Word Classes and Sentence Assembly subtests. Further, the DLD cohort scored within 1.00 standard deviation (−0.38) on the PPVT-R, suggesting that this cohort scored well within the average range on this test of receptive vocabulary.

Statistical analysis and results

A logistic regression model was run to determine the magnitude of the predictive relationship between CELF-3 subtests and the PPVT-R, and the likelihood a child met criteria for DLD at 10 years (Table III). Data were analysed using SPSS version 27.

Table III. Binomial logistic regression of language measures as predictors of developmental language disorder at 10 years.

The model was significant, χ2(8) = 15.49, p = 0.05, explained 71.8% of the variance, and correctly identified 93.6% of cases who met criteria for DLD (see Table III). Semantic Relationships (OR = 0.58, CI = 0.41 − 0.84, p = .004), Formulated Sentences (OR = 0.65, CI = 0.45 − 0.96, p = .028), Recalling Sentences (OR = 0.09, CI = 0.05 − 0.17, p < .001), and Sentence Assembly (OR = 0.48, CI = 0.31 − 0.75, p = .001) were significant predictors of DLD. All other variables were non-significant as predictors for DLD at 10 years.
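For readers less familiar with how odds ratios and their confidence intervals arise from a logistic regression, the following Python sketch shows the standard conversion from a model coefficient and its standard error. It assumes a Wald-type 95% interval; the function name and the input values are hypothetical and are not taken from Table III or from the SPSS output.

```python
import math


def odds_ratio_with_ci(coefficient, standard_error, z_crit=1.96):
    """Convert a logit coefficient and its SE to an odds ratio with a Wald CI."""
    odds_ratio = math.exp(coefficient)
    lower = math.exp(coefficient - z_crit * standard_error)
    upper = math.exp(coefficient + z_crit * standard_error)
    return odds_ratio, lower, upper


# Hypothetical coefficient and standard error, for illustration only.
or_value, ci_lower, ci_upper = odds_ratio_with_ci(-0.50, 0.20)
print(round(or_value, 2), round(ci_lower, 2), round(ci_upper, 2))
```

A negative coefficient yields an odds ratio below 1, meaning that higher scores on the predictor are associated with lower odds of the outcome as coded; an interval that excludes 1 corresponds to a significant predictor at the chosen level.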

Discussion and future directions

We identified children who met criteria for DLD at age 10 if they scored 1.50 standard deviations or more below the mean on the CELF-3 Total Language Score. This score is an aggregate measure of language functioning derived from six subtests: three in the receptive modality and three in the expressive modality. We presented mean z-scores on the CELF-3 subtests (Table II), which indicated that the receptive Word Classes (−1.41) and expressive Sentence Assembly (−1.43) subtests fell within 1.5 standard deviations of the population mean for participants meeting criteria for DLD. Word Classes draws upon a child’s vocabulary knowledge and their metalinguistic awareness to identify logical associations between words, whereas Sentence Assembly requires the child to formulate grammatically and semantically correct sentences. Both subtests assess the child’s metalinguistic awareness, suggesting that this may be a relative area of strength in the Raine Study cohort of children meeting criteria for DLD. The mean z-score for the receptive Semantic Relationships subtest (−1.89), which draws upon the child’s ability to correctly interpret semantic meaning relationships and conceptual knowledge of location and temporal ordering, was the lowest of all subtests. Similarly, children with DLD in the Raine Study cohort presented with marked difficulty on the receptive Concepts and Following Directions (−1.81) subtest. These findings suggest that conceptual knowledge and the ability to follow instructions were likely areas of deficit for these children, which may have translated into profound functional impacts on participating within a classroom environment. In terms of identifying children who are likely to meet criteria for DLD, Redmond et al. (2019) recently found a measure of sentence recall (analogous to the Recalling Sentences expressive subtest) to be a valid marker for the screening of childhood language impairment. Indeed, the Raine Study cohort performed well below the population mean on the Recalling Sentences subtest (−1.76), providing further evidence for the utility of a sentence recall task in screening for DLD.

Conversely, although the PPVT as a measure of receptive vocabulary is a widely used proxy for language in longitudinal studies (e.g. McLeod & Harrison, 2009; Zubrick et al., 2015) and in clinical practice (Caesar & Kohler, 2009; Betz et al., 2013), children with DLD in the Raine Study cohort did not demonstrate pronounced deficits on this measure, scoring on average within 1.00 standard deviation of the population mean (−0.38). This finding is consistent with performance on the Word Classes subtest also being within the average range, generally, and has important implications for research and clinical practice. Spaulding et al.'s (2006) review of the diagnostic utility of 43 standardised, norm-referenced tests designed to assess child language found children meeting criteria for DLD (aka SLI) scored within 1 standard deviation of the normative sample of the PPVT. Recently, Yarian et al. (2021) and Jackson et al. (2019) also demonstrated that although there were significant differences between children with and without DLD on the PPVT, the DLD groups scored well within the average range on the measure. Therefore, relying on the PPVT as a measure of overall language may in fact overestimate children’s language function relative to other available tests. In Denman et al.'s (2017) comprehensive review of psychometric properties of childhood language assessments, the PPVT was excluded since it was not deemed a comprehensive language assessment. Finally, Spaulding et al. (2013) systematically evaluated the diagnostic accuracy of the PPVT, and discriminant analyses indicated a cut-score of 103 (well within the average range) is optimal for identifying DLD (aka SLI). In combination, these findings highlight the need for discernment before treating the PPVT as a valid measure of childhood language function, and underline its limitations as a sole index of language function in existing longitudinal cohort studies and in clinical practice.

This research note seeks to urge clinicians and researchers to go beyond domain-specific language assessments when determining the presence of DLD and its functional impact. It is true that we report on the prevalence of DLD in the Raine Study using standardised tests (CELF-3, RCPM), which has encouraged us to call for the selection of more diverse measures for large population studies. This call to action is particularly relevant for measuring the language skills of older children and adolescents, when children with DLD are likely to reach near equivalent scores on some standardised tests (Spaulding et al., 2006), making it challenging to identify the proportion of affected children likely to have ongoing difficulties. Tasks that evaluate grammaticality judgement, for instance, show promise for differentiating children with DLD from those with typical language development, as grammatical judgements reached adult-like asymptote in typically developing 15-year-olds (Rice et al., 2009). Further, the time taken to administer tasks such as grammaticality judgement and sentence recall is relatively brief compared to omnibus language assessments, which provides an additional advantage for use when assessing large cohorts.

The measurement of functional impacts in epidemiological studies is also scarce. We acknowledge that measures of the functional impact of language difficulties, an important component of DLD diagnostic criteria (Bishop et al., 2017), were not available in the Raine Study dataset. Highlighting the absence of such measures in epidemiological studies creates an impetus for including functional outcomes as part of regular follow-ups, such as evaluating curriculum-based assessments. For example, Norbury et al. (2016) reported evidence of functional impact in educational, social, emotional, and behavioural domains using the Early Years Foundation Stage Profile (Department for Education, 2013) to determine a “good level of development” as well as through teacher report using the Strengths and Difficulties Questionnaire (Goodman, 1997) to determine if children demonstrated abnormal behaviours. Results indicated 11.80% of children with DLD demonstrated a good level of development compared to 69.59% of children without DLD, and 9.68% of children with DLD demonstrated abnormal behaviour compared to 5.24% of children without DLD. These findings highlight the urgent need to measure functional impacts associated with DLD.

The urgency for measuring functional impact is furthered by recent findings from Duff et al. (2022), who found that despite showing lower academic performance compared to children without DLD, children with DLD received support services at low rates, especially compared to children with dyslexia. Wolf et al. (2022) have also recently argued for a multidimensional approach to identifying and treating language disorder in the school-age years, which considers sound/word and sentence/discourse levels alongside verbal memory to reliably identify children who experience functional impacts associated with language difficulties. Other tools, such as the Focus on the Outcomes of Communication Under Six (FOCUS; Thomas-Stonell et al., 2010), may be used to determine how language difficulties affect participation in daily life for preschool-age children. Another example of determining functional impact would be to consider limited academic success within a responsiveness to intervention (RTI) model of service delivery. Within RTI, other curriculum-based assessments could be considered to measure functional impact, as children with DLD will likely experience challenges across curriculum areas (e.g. Duff et al., 2022).

Washington (2007) described the use of the WHO's International Classification of Functioning, Disability and Health (ICF; 2001) as a multidimensional approach to describe functional impacts associated with DLD (aka SLI) and how the framework can be used to evaluate outcomes for affected children. Cunningham et al. (2017) conducted a systematic review using the WHO ICF-Children and Youth version and found few studies that addressed participation-based outcomes. The authors summarise valid and reliable measures that can be used to evaluate changes in communicative participation, play, and social communication (including the FOCUS). Such tools can be used alongside measures of body structure and function, and activity to provide a greater understanding of the impact of communication difficulties in everyday lives, as well as facilitating goal setting and intervention outcomes.

Finally, consideration of the limitations of standardised language assessment for language learners from culturally and linguistically diverse backgrounds is a priority. Although the majority of participants in the Raine Study identified as Caucasian and as speaking mostly English at home, the results from standardised assessments of culturally and linguistically diverse individuals should be interpreted with caution. In particular, receptive vocabulary measures, such as the PPVT, have been shown to inaccurately represent the language abilities of Aboriginal children (Pearce & Flanagan, 2019; Zupan et al., 2021). Dynamic assessment, which assesses a child’s ability to learn without the need for the child to draw upon previous experience, should also be considered alongside existing assessment tools. Hunt et al. (2022) recently conducted a systematic review, which found that dynamic assessment may be a suitable and relatively time-efficient method of diagnosing DLD in multilingual children, so this approach could be considered in future longitudinal studies when working with children from culturally and linguistically diverse backgrounds.

Limitations

This study reports on data from the Raine Study, a representative prospective pregnancy cohort study (White et al., 2017). Limitations of the current study include the proportion of children for whom language data were not available, which may increase the risk of selection bias (although see Calder et al., 2022, for tests for systematic differences in missing data). Additionally, measures of functional impact were not available in the Raine Study dataset, which limits the application of contemporary DLD diagnostic criteria to the cohort. Consequently, we highlight potential measures of functional impact that may be included in subsequent follow-ups for Gen3 and in future prospective cohort studies. Although the number of plausibly bilingual children with DLD in the Raine Study was small (n = 3), these cases were tested only on their English skills, highlighting the need for culturally appropriate methods of language testing, such as dynamic assessment. Of particular relevance, the children with DLD who were identified as Aboriginal (n = 2) did not identify as speaking a language other than English, which highlights a potential cultural and linguistic mismatch between ethnicity and languages spoken at home and the need to select culturally appropriate language assessments (Zupan et al., 2021), including language sampling (Pearce & Flanagan, 2019). Lastly, although the Raine Study has been shown to be representative of the Australian population (White et al., 2017), participants were recruited from one hospital in Western Australia, which increases the risk of selection bias.

Conclusion

This research note presents evidence that children meeting criteria for DLD at 10 years, as determined by performance on a standardised omnibus language assessment, performed within the average range on a domain-specific receptive vocabulary assessment. There is ample evidence to suggest that children with DLD perform comparably to their typically developing peers on domain-specific assessments of receptive vocabulary. Consequently, we are at risk of overestimating a child’s language abilities if we draw conclusions about capacity based on these assessments alone. As such, both clinicians and researchers should be cautious about using such tests to identify children with, or at risk of, DLD. We have presented evidence from recent studies as considerations for future directions, especially when measuring functional impact and working with multilingual children. Our hope is that we can move beyond using domain-specific language tests as a proxy for language and focus attention on investigating tools that capture the complex and multifaceted nature of DLD.

Acknowledgements

We would like to acknowledge the Raine Study participants and their families for their ongoing participation, and the Raine Study team for study co-ordination and data collection. We thank the NHMRC for their long-term contribution to funding the study over the last 30 years. The core management of the Raine Study is funded by The University of Western Australia, Curtin University, Telethon Kids Institute, Women and Infants Research Foundation, Edith Cowan University, Murdoch University, The University of Notre Dame Australia, and the Raine Medical Research Foundation.

Declaration of interest

No potential conflict of interest was reported by the author(s).

References

  • Adams, C., Cooke, R., Crutchley, A., Hesketh, A., & Reeves, D. (2001). Assessment of comprehension and expression 6–11. GL Assessment. Retrieved from http://www.gl-assessment.co.uk/products/assessment-comprehension-and-expression-6-11
  • American Psychiatric Association (APA). (2013). Diagnostic and statistical manual of mental disorders (5th ed.). Washington, D.C.: American Psychiatric Association. doi:10.1176/appi.books.9780890425596
  • Betz, S. K., Eickhoff, J. R., & Sullivan, S. F. (2013). Factors influencing the selection of standardized tests for the diagnosis of specific language impairment. Language, Speech, and Hearing Services in Schools, 44(2), 133–146. doi:10.1044/0161-1461(2012/12-0093)
  • Bishop, D. V., Snowling, M. J., Thompson, P. A., Greenhalgh, T., & CATALISE Consortium. (2016). CATALISE: A multinational and multidisciplinary Delphi consensus study. Identifying language impairments in children. PLoS One, 11(7), e0158753. doi:10.1371/journal.pone.0158753
  • Bishop, D. V., Snowling, M. J., Thompson, P. A., Greenhalgh, T., Catalise‐2 Consortium, Adams, C., … House, A. (2017). Phase 2 of CATALISE: A multinational and multidisciplinary Delphi consensus study of problems with language development: Terminology. Journal of Child Psychology and Psychiatry, and Allied Disciplines, 58(10), 1068–1080. doi:10.1111/jcpp.12721
  • Bishop, D. (2003). Test for reception of Grammar-2. London: The Psychological Corporation.
  • Brownell, R. (2010). Expressive/receptive one word picture vocabulary tests (4th ed.). Cambridge, UK: Pearson.
  • Caesar, L. G., & Kohler, P. D. (2009). Tools clinicians use: A survey of language assessment procedures used by school-based speech-language pathologists. Communication Disorders Quarterly, 30(4), 226–236. doi:10.1177/1525740108326
  • Calder, S. D., Brennan-Jones, C. G., Robinson, M., Whitehouse, A., & Hill, E. (2022). The prevalence of and potential risk factors for Developmental Language Disorder at 10 years in the Raine Study. Journal of Paediatrics and Child Health, 2022, 1–7. doi:10.1111/JPC.16149
  • Cunningham, B. J., Washington, K. N., Binns, A., Rolfe, K., Robertson, B., & Rosenbaum, P. (2017). Current methods of evaluating speech-language outcomes for preschoolers with communication disorders: A scoping review using the ICF-CY. Journal of Speech, Language, and Hearing Research, 60(2), 447–464. doi:10.1044/2016_JSLHR-L-15-0329
  • Denman, D., Speyer, R., Munro, N., Pearce, W. M., Chen, Y. W., & Cordier, R. (2017). Psychometric properties of language assessments for children aged 4–12 years: A systematic review. Frontiers in Psychology, 8, 1515. doi:10.3389/fpsyg.2017.01515
  • Department for Education. (2013). The early years foundation stage profile handbook. London: Department for Education.
  • Duff, D. M., Hendricks, A. E., Fitton, L., & Adlof, S. M. (2022). Reading and math achievement in children with dyslexia, developmental language disorder, or typical development: Achievement gaps persist from second through fourth grades. Journal of Learning Disabilities, 2022, 002221942211055. doi:10.1177/00222194221105515
  • Dunn, L. M., & Dunn, L. M. (1981). Peabody picture vocabulary test–revised. Circle Pines, MN: American Guidance Service.
  • Eadie, P., Bavin, E. L., Bretherton, L., Cook, F., Gold, L., Mensah, F., … Reilly, S. (2021). Predictors in infancy for language and academic outcomes at 11 years. Pediatrics, 147(2), 1712. doi:10.1542/peds.2020-1712
  • Frizelle, P., Thompson, P., Duta, M., & Bishop, D. V. (2019). Assessing children’s understanding of complex syntax: A comparison of two methods. Language Learning, 69(2), 255–291. doi:10.1111/lang.12332
  • Goodman, R. (1997). The strengths and difficulties questionnaire: A research note. Journal of Child Psychology and Psychiatry, and Allied Disciplines, 38(5), 581–586. doi:10.1111/j.1469-7610.1997.tb01545.x
  • Harrison, L. J., & McLeod, S. (2010). Risk and protective factors associated with speech and language impairment in a nationally representative sample of 4-to 5-year-old children. Journal of Speech, Language, and Hearing Research, 53(2), 508–529. doi:10.1044/1092-4388(2009/08-0086)
  • Hunt, E., Nang, C., Meldrum, S., & Armstrong, E. (2022). Can dynamic assessment identify language disorder in multilingual children? Clinical applications from a systematic review. Language, Speech, and Hearing Services in Schools, 53(2), 598–625. doi:10.1044/2021_LSHSS-21-00094
  • Jackson, E., Leitao, S., Claessen, M., & Boyes, M. (2019). Fast mapping short and long words: Examining the influence of phonological short-term memory and receptive vocabulary in children with developmental language disorder. Journal of Communication Disorders, 79, 11–23. doi:10.1016/j.jcomdis.2019.02.001
  • Law, J., Rush, R., Schoon, I., & Parsons, S. (2009). Modeling developmental language difficulties from school entry into adulthood: Literacy, mental health, and employment outcomes. Journal of Speech, Language, and Hearing Research, 52(6), 1401–1416. doi:10.1044/1092-4388(2009/08-0142)
  • Lombardino, L. J., Leiberman, R., & Brown, J. C. (2005). Assessment of Literacy and Language. San Antonio, TX: Pearson Psychcorp.
  • Marinis, T., Chiat, S., Armon-Lotem, S., Piper, J., & Roy, P. (2011). School-age sentence imitation test-E32. Retrieved from http://www.city.ac.uk/health/research/centre-forlanguage-communication-sciences-research/veps-very-earlyprocessing-skills/veps-assessments
  • McKean, C., Reilly, S., Bavin, E. L., Bretherton, L., Cini, E., Conway, L., … Mensah, F. (2017). Language outcomes at 7 years: Early predictors and co-occurring difficulties. Pediatrics, 139(3), 1684. doi:10.1542/peds.2016-1684
  • McLeod, S., & Harrison, L. J. (2009). Epidemiology of speech and language impairment in a nationally representative sample of 4-to 5-year-old children. Journal of Speech, Language, and Hearing Research, 52(5), 1213–1229. doi:10.1044/1092-4388(2009/08-0085)
  • Newcomer, P. L., & Hammill, D. D. (2008). Test of Language Development—Intermediate, 4th Edn. Austin, TX: Pro-Ed.
  • Norbury, C. F., Gooch, D., Wray, C., Baird, G., Charman, T., Simonoff, E., … Pickles, A. (2016). The impact of nonverbal ability on prevalence and clinical presentation of language disorder: Evidence from a population study. Journal of Child Psychology and Psychiatry, and Allied Disciplines, 57(11), 1247–1257. doi:10.1111/jcpp.12573
  • Pearce, W. M., & Flanagan, K. (2019). Language abilities of Indigenous and non-Indigenous Australian children from low socioeconomic backgrounds in their first year of school. International Journal of Speech-Language Pathology, 21(2), 212–223. doi:10.1080/17549507.2018.1444091
  • Pennington, B. F., & Bishop, D. V. (2009). Relations among speech, language, and reading disorders. Annual Review of Psychology, 60, 283–306. doi:10.1146/annurev.psych.60.110707.163548
  • Raven, J. (1977). Raven’s coloured progressive matrices. London, England: H. K. Lewis.
  • Redmond, S. M., Ash, A. C., Christopulos, T. T., & Pfaff, T. (2019). Diagnostic accuracy of sentence recall and past tense measures for identifying children’s language impairments. Journal of Speech, Language, and Hearing Research, 62(7), 2438–2454. doi:10.1044/2019_jslhr-l-18-0388
  • Reilly, S., Wake, M., Ukoumunne, O. C., Bavin, E., Prior, M., Cini, E., … Bretherton, L. (2010). Predicting language outcomes at 4 years of age: Findings from Early Language in Victoria Study. Pediatrics, 126(6), e1530–e1537. doi:10.1542/peds.2010-0254
  • Rice, M. L., Hoffman, L., & Wexler, K. (2009). Judgments of omitted BE and DO in questions as extended finiteness clinical markers of specific language impairment (SLI) to 15 years: A study of growth and asymptote. Journal of Speech, Language, and Hearing Research, 52(6), 1417–1433. doi:10.1044/1092-4388(2009/08-0171)
  • Rice, M. L., Wexler, K., & Hershberger, S. (1998). Tense over time: The longitudinal course of tense acquisition in children with specific language impairment. Journal of Speech, Language, and Hearing Research, 41(6), 1412–1431. doi:10.1044/jslhr.4106.1412
  • Rothman, S. (2003). An Australian version of the Adapted PPVT-III for use in research. Melbourne: Australian Council for Educational Research. Growing up in Australia website. Retrieved from: http://www.growingupinaustralia.gov.au/pubs/issues/ip2.pdf
  • Semel, E. M., Wiig, E. H., & Secord, W. (1995). CELF-3: Clinical evaluation of language fundamentals. San Antonio: The Psychological Corporation.
  • Semel, E., Wiig, E. H., & Secord, W. A. (2003). Clinical evaluation of language fundamentals (CELF–4th Edition).
  • Spaulding, T. J., Hosmer, S., & Schechtman, C. (2013). Investigating the interchangeability and diagnostic utility of the PPVT-III and PPVT-IV for children with and without SLI. International Journal of Speech-Language Pathology, 15(5), 453–462. doi:10.3109/17549507.2012.762042
  • Spaulding, T. J., Plante, E., & Farinella, K. A. (2006). Eligibility criteria for language impairment: Is the low end of normal always appropriate? Language, Speech, and Hearing Services in Schools, 37(1), 61–72. doi:10.1044/0161-1461(2006/007)
  • Taylor, C. L., Christensen, D., Lawrence, D., Mitrou, F., & Zubrick, S. R. (2013). Risk factors for children’s receptive vocabulary development from four to eight years in the Longitudinal Study of Australian Children. PLoS One, 8(9), e73046. doi:10.1371/journal.pone.0073046
  • Thomas-Stonell, N. L., Oddson, B., Robertson, B., & Rosenbaum, P. L. (2010). Development of the FOCUS (Focus on the Outcomes of Communication Under Six), a communication outcome measure for preschool children. Developmental Medicine & Child Neurology, 52(1), 47–53. doi:10.1111/j.1469-8749.2009.03410.x
  • Tomblin, J. B., Records, N. L., Buckwalter, P., Zhang, X., Smith, E., & O’Brien, M. (1997). Prevalence of specific language impairment in kindergarten children. Journal of Speech, Language, and Hearing Research, 40(6), 1245–1260. doi:10.1044/jslhr.4006.1245
  • Washington, K. N. (2007). Using the ICF within speech-language pathology: Application to developmental language impairment. Advances in Speech Language Pathology, 9(3), 242–255. doi:10.1080/14417040701261525
  • White, S. W., Eastwood, P. R., Straker, L. M., Adams, L. A., Newnham, J. P., Lye, S. J., & Pennell, C. E. (2017). The Raine study had no evidence of significant perinatal selection bias after two decades of follow up: A longitudinal pregnancy cohort study. BMC Pregnancy and Childbirth, 17(1), 1–10. doi:10.1186/s12884-017-1391-8
  • Wiig, E. H., Secord, W. A., & Semel, E. (2004). Clinical evaluation of language fundamentals-preschool, 2nd ed. San Antonio, TX: Pearson Psychcorp.
  • Wiig, E. H., Semel, E., & Secord, W. A. (2013). Clinical evaluation of language fundamentals, 5th ed. Bloomington, MN: Pearson Psychcorp.
  • Wolf Nelson, N. W., Plante, E., Anderson, M., & Applegate, E. B. (2022). The Dimensionality of Language and Literacy in the School-Age Years. Journal of Speech, Language, and Hearing Research, 2022, 1–19. doi:10.1044/2022_JSLHR-21-00534
  • World Health Organization. (2001). International Classification of Functioning, Disability, and Health. World Health Organization, Geneva.
  • World Health Organization. (2019). International statistical classification of diseases and related health problems (11th ed). Geneva: World Health Organization. https://icd.who.int/
  • Yarian, M., Washington, K. N., Spencer, C. E., Vannest, J., & Crowe, K. (2021). Exploring predictors of expressive grammar across different assessment tasks in preschoolers with or without DLD. Communication Disorders Quarterly, 42(2), 111–121. doi:10.1177/1525740119868238
  • Zimmerman, I. L., Steiner, V. G., & Pond, R. E. (2011). Preschool language scales, 5th ed. Minneapolis, MN: Pearson Psychcorp.
  • Zubrick, S. R., Taylor, C. L., & Christensen, D. (2015). Patterns and predictors of language and literacy abilities 4–10 years in the longitudinal study of Australian children. PLoS One, 10(9), e0135612. doi:10.1371/journal.pone.0135612
  • Zupan, B., Campbell‐Woods, N., & Thompson, H. (2021). Scoping review: Language assessment practices for Aboriginal and Torres Strait Islander children in Australia and guidelines for clinical practice. The Australian Journal of Rural Health, 29(6), 879–895. doi:10.1111/ajr.12766