References
- Amato, J. M., & Watkins, M. W. (2011). The predictive validity of CBM writing indices for eighth-grade students. The Journal of Special Education, 44(4), 195–204. https://doi.org/10.1177/0022466909333516
- American Educational Research Association, National Council on Measurement in Education, & American Psychological Association. (2014). Standards for educational and psychological testing.
- Amorim, E., Cançado, M., & Veloso, A. (2018). Automated essay scoring in the presence of biased ratings. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) (pp. 229–237). Association for Computational Linguistics. https://doi.org/10.18653/v1/N18-1021
- Betts, J., Reschly, A., Pickart, M., Heistad, D., Sheran, C., & Marston, D. (2008). An examination of predictive bias for second grade reading outcomes from measures of early literacy skills in kindergarten with respect to English-language learners and ethnic subgroups. School Psychology Quarterly, 23(4), 553–570. https://doi.org/10.1037/1045-3830.23.4.553
- Camilli, G. (2006). Test fairness. In R. L. Brennan (Ed.), Educational measurement (pp. 221–256). Praeger.
- Dascălu, M. (2014). Analyzing discourse and text complexity for learning and collaborating (Vol. 534). Springer International Publishing. https://doi.org/10.1007/978-3-319-03419-5
- Diedenhofen, B., Musch, J., & Olivier, J. (2015). cocor: A comprehensive solution for the statistical comparison of correlations. PLoS ONE, 10(4), e0121945. https://doi.org/10.1371/journal.pone.0121945
- Espin, C. A., Scierka, B. J., Skare, S., & Halverson, N. (1999). Criterion-related validity of curriculum-based measures in writing for secondary school students. Reading & Writing Quarterly: Overcoming Learning Difficulties, 15(1), 5–27. https://doi.org/10.1080/105735699278279
- Evans-Hampton, T. N., Skinner, C. H., Henington, C., Sims, S., & McDaniel, C. E. (2002). An investigation of situational bias: Conspicuous and covert timing during curriculum-based measurement of mathematics across African American and Caucasian students. School Psychology Review, 31(4), 529–539. https://doi.org/10.1080/02796015.2002.12086172
- Furey, W. M., Marcotte, A. M., Hintze, J. M., & Shackett, C. M. (2016). Concurrent validity and classification accuracy of curriculum-based measurement for written expression. School Psychology Quarterly, 31(3), 369–382. https://doi.org/10.1037/spq0000138
- Graham, S., Hebert, M., Sandbank, M. P., & Harris, K. R. (2016). Assessing the writing achievement of young struggling writers: Application of generalizability theory. Learning Disability Quarterly, 39(2), 72–82. https://doi.org/10.1177/0731948714555019
- Han, K., Colarelli, S. M., & Weed, N. C. (2019). Methodological and statistical advances in the consideration of cultural diversity in assessment: A critical review of group classification and measurement invariance testing. Psychological Assessment, 31(12), 1481–1496. https://doi.org/10.1037/pas0000731
- Hanushek, E. A., Kain, J. F., & Rivkin, S. G. (2002). Inferring program effects for special populations: Does special education raise achievement for students with disabilities? Review of Economics and Statistics, 84(4), 584–599. https://doi.org/10.1162/003465302760556431
- Heymans, M., & Eekhout, I. (2021). psfmi: Prediction model selection and performance evaluation in multiple imputed datasets [Computer software]. https://mwheymans.github.io/psfmi/
- Hosp, J. L., Hosp, M. A., & Dol, J. K. (2011). Potential bias in predictive validity of universal screening measures across disaggregation subgroups. School Psychology Review, 40(1), 108–131. https://doi.org/10.1080/02796015.2011.12087731
- Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1–73. https://doi.org/10.1111/jedm.12000
- Keegan, P. J., Brown, G. T. L., & Hattie, J. A. C. (2013). A psychometric view of sociocultural factors in test validity: The development of standardised test materials for Māori medium schools in New Zealand/Aotearoa. In S. Phillipson, K. Ku, & S. N. Phillipson (Eds.), Constructing educational achievement: A sociocultural perspective (pp. 42–54). Routledge.
- Keller-Margulis, M. A., Mercer, S. H., & Matta, M. (2021). Validity of automated text evaluation tools for written-expression curriculum-based measurement: A comparison study. Reading and Writing, 34(10), 2461–2480. https://doi.org/10.1007/s11145-021-10153-6
- Keller-Margulis, M. A., Mercer, S. H., & Thomas, E. L. (2016). Generalizability theory reliability of written expression curriculum-based measurement in universal screening. School Psychology Quarterly, 31(3), 383–392. https://doi.org/10.1037/spq0000126
- Kim, Y.-S. G., Schatschneider, C., Wanzek, J., Gatlin, B., & Al Otaiba, S. (2017). Writing evaluation: Rater and task effects on the reliability of writing scores for children in grades 3 and 4. Reading and Writing: An Interdisciplinary Journal, 30(6), 1287–1310. https://doi.org/10.1007/s11145-017-9724-6
- Kline, R. B. (2013). Assessing statistical aspects of test fairness with structural equation modelling. Educational Research and Evaluation, 19(2–3), 204–222. https://doi.org/10.1080/13803611.2013.767624
- Kunnan, A. J. (2000). Fairness and justice for all. In A. J. Kunnan (Ed.), Fairness and validation in language assessment (pp. 1–14). Cambridge University Press.
- Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174. https://doi.org/10.2307/2529310
- Licht, C. (2010). New methods for generating significance levels from multiply-imputed data [Doctoral dissertation, University of Bamberg]. Deutsche Nationalbibliothek.
- Marston, D., & Deno, S. (1981). The reliability of simple, direct measures of written expression (Research Report No. IRLDRR-50). University of Minnesota, Institute for Research on Learning Disabilities.
- McMaster, K. L., & Campbell, H. (2008). New and existing curriculum-based writing measures: Technical features within and across grades. School Psychology Review, 37(4), 550–556. https://doi.org/10.1080/02796015.2008.12087867
- McMaster, K. L., & Espin, C. (2007). Technical features of curriculum-based measurement in writing: A literature review. The Journal of Special Education, 41(2), 68–84. https://doi.org/10.1177/00224669070410020301
- McMaster, K. L., Shin, J., Espin, C. A., Jung, P.-G., Wayman, M. M., & Deno, S. L. (2017). Monitoring elementary students’ writing progress using curriculum-based measures: Grade and gender differences. Reading and Writing, 30(9), 2069–2091. https://doi.org/10.1007/s11145-017-9766-9
- McNamara, D. S., Graesser, A. C., McCarthy, P. M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge University Press.
- Mercer, S. H. (2020). writeAlizer: Generate predicted writing quality and written expression CBM scores (Version 1.2.0) [Computer software]. https://github.com/shmercer/writeAlizer/
- Mercer, S. H., Keller-Margulis, M. A., Faith, E. L., Reid, E. K., & Ochs, S. (2019). The potential for automated text evaluation to improve the technical adequacy of written expression curriculum-based measurement. Learning Disability Quarterly, 42(2), 117–128. https://doi.org/10.1177/0731948718803296
- Muthén, L. K., & Muthén, B. O. (1998). Mplus user’s guide (8th ed.). Muthén & Muthén.
- Page, E. B. (2003). Project essay grade: PEG. In M. D. Shermis & J. C. Burstein (Eds.), Automated essay scoring: A cross-disciplinary perspective (pp. 43–54). Erlbaum.
- Persky, H. R., Daane, M. C., & Jin, Y. (2003). The nation’s report card: Writing 2002 (NCES 2003–529). U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics.
- Ritchey, K. D., & Coker, D. L. (2013). An investigation of the validity and utility of two curriculum-based measurement writing tasks. Reading & Writing Quarterly, 29(1), 89–119. https://doi.org/10.1080/10573569.2013.741957
- Robin, X., Turck, N., Hainard, A., Tiberti, N., Lisacek, F., Sanchez, J. C., & Müller, M. (2011). pROC: An open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics, 12(1), 1–8. https://doi.org/10.1186/1471-2105-12-77
- Robitzsch, A., & Grund, S. (2021). miceadds: Some additional multiple imputation functions, especially for “mice” (R package version 3.11-6) [Computer software]. https://CRAN.R-project.org/package=miceadds
- Romig, J. E., Therrien, W. J., & Lloyd, J. W. (2017). Meta-analysis of criterion validity for curriculum-based measurement in written language. The Journal of Special Education, 51(2), 72–82. https://doi.org/10.1177/0022466916670637
- RStudio Team. (2020). RStudio: Integrated development for R [Computer software]. RStudio, PBC. http://www.rstudio.com/
- Shavelson, R. J., & Webb, N. M. (1991). Generalizability theory: A primer. SAGE Publications.
- Skiba, R. J., Artiles, A. J., Kozleski, E. B., Losen, D. J., & Harry, E. G. (2016). Risks and consequences of oversimplifying educational inequities: A response to Morgan et al. (2015). Educational Researcher, 45(3), 221–225. https://doi.org/10.3102/0013189X16644606
- Skiba, R. J., Simmons, A. B., Ritter, S., Gibb, A. C., Rausch, M. K., Cuadrado, J., & Chung, C.-G. (2008). Achieving equity in special education: History, status, and current challenges. Exceptional Children, 74(3), 264–288. https://doi.org/10.1177/001440290807400301
- Smolkowski, K., Cummings, K. D., & Strycker, L. (2016). An introduction to the statistical evaluation of fluency measures with signal detection theory. In K. D. Cummings & Y. Petscher (Eds.), The fluency construct: Curriculum-based measurement concepts and applications (pp. 187–221). Springer. https://doi.org/10.1007/978-1-4939-2803-3_8
- Texas Education Agency. (2013). Standard setting technical report.
- van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45(3), 1–67. https://doi.org/10.18637/jss.v045.i03
- Warne, R. T., Yoon, M., & Price, C. J. (2014). Exploring the various interpretations of “test bias”. Cultural Diversity & Ethnic Minority Psychology, 20(4), 570–582. https://doi.org/10.1037/a0036503
- Weissenburger, J. W., & Espin, C. A. (2005). Curriculum-based measures of writing across grade levels. Journal of School Psychology, 43(2), 153–169. https://doi.org/10.1016/j.jsp.2005.03.002
- Wilson, J., Olinghouse, N. G., McCoach, D. B., Santangelo, T., & Andrada, G. N. (2016). Comparing the accuracy of different scoring methods for identifying sixth graders at risk of failing a state writing assessment. Assessing Writing, 27, 11–23. https://doi.org/10.1016/j.asw.2015.06.003
- Wilson, J., Roscoe, R., & Ahmed, Y. (2017). Automated formative writing assessment using a levels of language framework. Assessing Writing, 34, 16–36. https://doi.org/10.1016/j.asw.2017.08.002
- Xu, Y., & Drame, E. R. (2008). Examining sociocultural factors in response to intervention models. Childhood Education, 85(1), 26–32. https://doi.org/10.1080/00094056.2008.10523053