
Exploring the Factor Structure of a K–12 English Language Proficiency Assessment

Pages 130-149 | Published online: 01 Mar 2018

ABSTRACT

In this study we investigated the internal factor structure of a large-scale K–12 assessment of English language proficiency (ELP) using samples of fourth- and eighth-grade English learners (ELs) in one state. While U.S. schools are mandated to measure students’ ELP in four language domains (listening, reading, speaking, and writing), some recently released ELP standards define ELP in terms of integrated modalities, such as receptive language or collaborative communication. To explore whether current assessments can empirically support such new conceptualizations, we compared seven models based on different hypothesized structures for language proficiency. For the Grade 8 sample, we found support for a hierarchical factor model, with general language underlying the four domains. A model with the four domains offered the best fit for the Grade 4 sample but fell just short of the criteria for acceptable fit. Models that incorporated more specific higher-order modalities, such as literacy or productive language, functioned less well for both the Grade 4 and Grade 8 samples, suggesting that the current shift in how ELP is defined may require corresponding shifts in how ELP assessments are built and scored.
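
To make the competing structures concrete, the sketch below shows how two of the seven hypothesized models (four correlated domain factors vs. a hierarchical model with a general language factor) might be specified. This is a minimal illustration, not the authors' code: it assumes the Python semopy package with its lavaan-style model syntax, uses hypothetical item names (l1–l3, r1–r3, s1–s3, w1–w3), and relies on semopy's default estimator rather than the WLSMV estimation described in the notes.

import pandas as pd
import semopy

# Four correlated domain factors (listening, reading, speaking, writing).
FOUR_DOMAIN = """
listening =~ l1 + l2 + l3
reading   =~ r1 + r2 + r3
speaking  =~ s1 + s2 + s3
writing   =~ w1 + w2 + w3
"""

# Hierarchical model: a general language factor underlying the four domains.
HIERARCHICAL = FOUR_DOMAIN + """
general =~ listening + reading + speaking + writing
"""

def fit_and_report(description: str, items: pd.DataFrame) -> pd.DataFrame:
    """Fit one hypothesized structure to item-level data and return global fit statistics."""
    model = semopy.Model(description)
    model.fit(items)
    return semopy.calc_stats(model)  # chi-square, CFI, TLI, RMSEA, etc.

# Usage, given an item-level DataFrame `items`:
# print(fit_and_report(FOUR_DOMAIN, items))
# print(fit_and_report(HIERARCHICAL, items))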

Notes

1 Please note that the data for this study were collected, and the analyses conducted, while NCLB was the federal law.

2 We note that blueprints do not provide exhaustive information about test content and that some developers may have plans for ensuring that domain-built tests can still address integrated skills.

3 One exception is a study by Römhild, Kenyon, and MacGregor (2011), although it is important to note that their dimensionality analyses were conducted within domain subtests rather than across domains.

6 Work has also been done to make explicit connections between updated ELP standards and the language demands present in CCR standards (Chi, Garcia, Surber, & Trautman, 2011; Council of Chief State School Officers, 2012).

7 We are unable to reveal the name of the assessment because doing so would violate our data-sharing agreement with the state whose data we analyzed.

8 Our motivation for this sampling was to remove students for whom factors other than language proficiency may have affected their assessment performance. We removed students with missing content achievement data because the current study is part of a larger inquiry in which content achievement scores were also used. We acknowledge that these decisions narrow the scope of our inferences to students with profiles similar to those in our sample.

9 In terms of missing data, all Grade 4 students had complete response data, whereas roughly 10% of Grade 8 students were missing responses to at least one item. These 90 students exhibited 48 distinct missing-data patterns; only two patterns were observed for more than 5 students (6 and 16 students, respectively). These two most frequently repeated patterns involved two and three missing items, respectively, and most of the remaining 46 patterns were observed for only 1 or 2 students. On the basis of this analysis of missing data, we felt justified in using WLSMV as the estimation procedure, particularly because all Mplus models are by default estimated using missing data theory and all available data (Muthén & Muthén, 1998).
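
For readers who want to reproduce this kind of missing-data audit on their own data, a minimal sketch is given below; it assumes a pandas DataFrame of item responses with NaN marking omitted items and is not the procedure the authors used.

import pandas as pd

def missing_data_patterns(responses: pd.DataFrame) -> pd.Series:
    """Count how many examinees share each distinct pattern of missing item responses."""
    flags = responses.isna()                # True where a response is missing
    patterns = flags.apply(tuple, axis=1)   # one missingness pattern per examinee
    return patterns.value_counts()          # patterns ordered by frequency

# Usage, given an item-level DataFrame `grade8_items`:
# counts = missing_data_patterns(grade8_items)
# print(counts[counts.index.map(any)])      # patterns with at least one missing item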

10 We also consulted the weighted root mean-square residual (WRMR), a residual-based fit index where a value <1 indicates acceptable fit, but because of the relatively experimental nature of this index, we do not report those results here. The estimated values can be made available on request; in general, however, we note that none of the models differed from one another in their WRMR performance relative to the criterion, meaning the use of this criterion would not have affected our model choice or interpretation.
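
For readers unfamiliar with the index, WRMR is conventionally computed as the square root of the average squared residual between sample and model-implied statistics, with each squared residual weighted by the estimated variance of the corresponding sample statistic. The sketch below illustrates that conventional definition with hypothetical inputs; it is not the computation performed by Mplus on these data.

import numpy as np

def wrmr(sample_stats: np.ndarray, implied_stats: np.ndarray, variances: np.ndarray) -> float:
    """WRMR = sqrt( (1/e) * sum_r (s_r - sigma_r)**2 / v_r ) over the e sample statistics."""
    return float(np.sqrt(np.mean((sample_stats - implied_stats) ** 2 / variances)))

# Toy example with made-up numbers (values below 1.0 are read as acceptable fit):
# wrmr(np.array([0.42, 0.31, 0.18]), np.array([0.40, 0.35, 0.20]), np.array([0.02, 0.03, 0.02]))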

11 Because these analyses also used categorical item-level data, we used the WLSMV estimation method here as well.

12 All models also exceeded the proposed cutoff of 1.0 for the WRMR criterion, suggesting that some items might be particularly ill fitting, as evidenced by problematically large residuals.

13 As in the Grade 4 sample, all of the Grade 8 models exceeded the proposed WRMR cutoff of 1.0, again suggesting large residuals for some or all items.

14 All models in both grade levels also fell short on the experimental WRMR index, suggesting enough poorly fitting items (i.e., items with large residuals) to create overall fit problems.
