References
- Amato, J. M., & Watkins, M. W. (2011). The predictive validity of CBM writing indices for eighth-grade students. The Journal of Special Education, 44(4), 195–204. https://doi.org/10.1177/0022466909333516
- American Educational Research Association, National Council on Measurement in Education, & American Psychological Association. (2014). Standards for educational and psychological testing.
- Amorim, E., Cançado, M., & Veloso, A. (2018). Automated essay scoring in the presence of biased ratings. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) (pp. 229–237). Association for Computational Linguistics. https://doi.org/10.18653/v1/N18-1021
- Betts, J., Reschly, A., Pickart, M., Heistad, D., Sheran, C., & Marston, D. (2008). An examination of predictive bias for second grade reading outcomes from measures of early literacy skills in kindergarten with respect to English-language learners and ethnic subgroups. School Psychology Quarterly, 23(4), 553–570. https://doi.org/10.1037/1045-3830.23.4.553
- Camilli, G. (2006). Test fairness. In R. L. Brennan (Ed.), Educational measurement (pp. 221–256). Praeger.
- Dascălu, M. (2014). Analyzing discourse and text complexity for learning and collaborating (Vol. 534). Springer International Publishing. https://doi.org/10.1007/978-3-319-03419-5
- Diedenhofen, B., Musch, J., & Olivier, J. (2015). cocor: A comprehensive solution for the statistical comparison of correlations. PLoS ONE, 10(4), e0121945. https://doi.org/10.1371/journal.pone.0121945
- Espin, C. A., Scierka, B. J., Skare, S., & Halverson, N. (1999). Criterion-related validity of curriculum-based measures in writing for secondary school students. Reading & Writing Quarterly: Overcoming Learning Difficulties, 15(1), 5–27. https://doi.org/10.1080/105735699278279
- Evans-Hampton, T. N., Skinner, C. H., Henington, C., Sims, S., & McDaniel, C. E. (2002). An investigation of situational bias: Conspicuous and covert timing during curriculum-based measurement of mathematics across African American and Caucasian students. School Psychology Review, 31(4), 529–539. https://doi.org/10.1080/02796015.2002.12086172
- Furey, W. M., Marcotte, A. M., Hintze, J. M., & Shackett, C. M. (2016). Concurrent validity and classification accuracy of curriculum-based measurement for written expression. School Psychology Quarterly, 31(3), 369–382. https://doi.org/10.1037/spq0000138
- Graham, S., Hebert, M., Sandbank, M. P., & Harris, K. R. (2016). Assessing the writing achievement of young struggling writers: Application of generalizability theory. Learning Disability Quarterly, 39(2), 72–82. https://doi.org/10.1177/0731948714555019
- Han, K., Colarelli, S. M., & Weed, N. C. (2019). Methodological and statistical advances in the consideration of cultural diversity in assessment: A critical review of group classification and measurement invariance testing. Psychological Assessment, 31(12), 1481–1496. https://doi.org/10.1037/pas0000731
- Hanushek, E. A., Kain, J. F., & Rivkin, S. G. (2002). Inferring program effects for special populations: Does special education raise achievement for students with disabilities? Review of Economics and Statistics, 84(4), 584–599. https://doi.org/10.1162/003465302760556431
- Heymans, M., & Eekhout, I. (2021). psfmi: Prediction model selection and performance evaluation in multiple imputed datasets [Computer software]. https://mwheymans.github.io/psfmi/
- Hosp, J. L., Hosp, M. A., & Dol, J. K. (2011). Potential bias in predictive validity of universal screening measures across disaggregation subgroups. School Psychology Review, 40(1), 108–131. https://doi.org/10.1080/02796015.2011.12087731
- Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1–73. https://doi.org/10.1111/jedm.12000
- Keegan, P. J., Brown, G. T. L., & Hattie, J. A. C. (2013). A psychometric view of sociocultural factors in test validity: The development of standardised test materials for Māori medium schools in New Zealand/Aotearoa. In S. Phillipson, K. Ku, & S. N. Phillipson (Eds.), Constructing educational achievement: A sociocultural perspective (pp. 42–54). Routledge.
- Keller-Margulis, M. A., Mercer, S. H., & Matta, M. (2021). Validity of automated text evaluation tools for written-expression curriculum-based measurement: A comparison study. Reading and Writing, 34(10), 2461–2480. https://doi.org/10.1007/s11145-021-10153-6
- Keller-Margulis, M. A., Mercer, S. H., & Thomas, E. L. (2016). Generalizability theory reliability of written expression curriculum-based measurement in universal screening. School Psychology Quarterly, 31(3), 383–392. https://doi.org/10.1037/spq0000126
- Kim, Y.-S. G., Schatschneider, C., Wanzek, J., Gatlin, B., & Al Otaiba, S. (2017). Writing evaluation: Rater and task effects on the reliability of writing scores for children in grades 3 and 4. Reading and Writing: An Interdisciplinary Journal, 30(6), 1287–1310. https://doi.org/10.1007/s11145-017-9724-6
- Kline, R. B. (2013). Assessing statistical aspects of test fairness with structural equation modelling. Educational Research and Evaluation, 19(2–3), 204–222. https://doi.org/10.1080/13803611.2013.767624
- Kunnan, A. J. (2000). Fairness and justice for all. In A. J. Kunnan (Ed.), Fairness and validation in language assessment (pp. 1–14). Cambridge University Press.
- Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174. https://doi.org/10.2307/2529310
- Licht, C. (2010). New methods for generating significance levels from multiply-imputed data [Doctoral dissertation, University of Bamberg]. Deutsche Nationalbibliothek.
- Marston, D., & Deno, S. (1981). The reliability of simple, direct measures of written expression (Research Report No. IRLDRR-50). University of Minnesota, Institute for Research on Learning Disabilities.
- McMaster, K. L., & Campbell, H. (2008). New and existing curriculum-based writing measures: Technical features within and across grades. School Psychology Review, 37(4), 550–556. https://doi.org/10.1080/02796015.2008.12087867
- McMaster, K. L., & Espin, C. (2007). Technical features of curriculum-based measurement in writing: A literature review. The Journal of Special Education, 41(2), 68–84. https://doi.org/10.1177/00224669070410020301
- McMaster, K. L., Shin, J., Espin, C. A., Jung, P.-G., Wayman, M. M., & Deno, S. L. (2017). Monitoring elementary students’ writing progress using curriculum-based measures: Grade and gender differences. Reading and Writing, 30(9), 2069–2091. https://doi.org/10.1007/s11145-017-9766-9
- McNamara, D. S., Graesser, A. C., McCarthy, P. M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge University Press.
- Mercer, S. H. (2020). writeAlizer: Generate predicted writing quality and written expression CBM scores (Version 1.2.0) [Computer software]. https://github.com/shmercer/writeAlizer/
- Mercer, S. H., Keller-Margulis, M. A., Faith, E. L., Reid, E. K., & Ochs, S. (2019). The potential for automated text evaluation to improve the technical adequacy of written expression curriculum-based measurement. Learning Disability Quarterly, 42(2), 117–128. https://doi.org/10.1177/0731948718803296
- Muthén, L. K., & Muthén, B. O. (1998). Mplus user’s guide (8th ed.). Muthén & Muthén.
- Page, E. B. (2003). Project essay grade: PEG. In M. D. Shermis & J. C. Burstein (Eds.), Automated essay scoring: A cross-disciplinary perspective (pp. 43–54). Erlbaum.
- Persky, H. R., Daane, M. C., & Jin, Y. (2003). The nation’s report card: Writing 2002 (NCES 2003–529). U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics.
- Ritchey, K. D., & Coker, D. L. (2013). An investigation of the validity and utility of two curriculum-based measurement writing tasks. Reading & Writing Quarterly, 29(1), 89–119. https://doi.org/10.1080/10573569.2013.741957
- Robin, X., Turck, N., Hainard, A., Tiberti, N., Lisacek, F., Sanchez, J. C., & Müller, M. (2011). pROC: An open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics, 12(1), 1–8. https://doi.org/10.1186/1471-2105-12-77
- Robitzsch, A., & Grund, S. (2021). miceadds: Some additional multiple imputation functions, especially for “mice” (R package version 3.11-6) [Computer software]. https://CRAN.R-project.org/package=miceadds
- Romig, J. E., Therrien, W. J., & Lloyd, J. W. (2017). Meta-analysis of criterion validity for curriculum-based measurement in written language. The Journal of Special Education, 51(2), 72–82. https://doi.org/10.1177/0022466916670637
- RStudio Team. (2020). RStudio: Integrated development for R [Computer software]. RStudio, PBC. http://www.rstudio.com/
- Shavelson, R. J., & Webb, N. M. (1991). Generalizability theory: A primer. SAGE Publications.
- Skiba, R. J., Artiles, A. J., Kozleski, E. B., Losen, D. J., & Harry, E. G. (2016). Risks and consequences of oversimplifying educational inequities: A response to Morgan et al. (2015). Educational Researcher, 45(3), 221–225. https://doi.org/10.3102/0013189X16644606
- Skiba, R. J., Simmons, A. B., Ritter, S., Gibb, A. C., Rausch, M. K., Cuadrado, J., & Chung, C.-G. (2008). Achieving equity in special education: History, status, and current challenges. Exceptional Children, 74(3), 264–288. https://doi.org/10.1177/001440290807400301
- Smolkowski, K., Cummings, K. D., & Strycker, L. (2016). An introduction to the statistical evaluation of fluency measures with signal detection theory. In K. D. Cummings & Y. Petscher (Eds.), The fluency construct: Curriculum-based measurement concepts and applications (pp. 187–221). Springer. https://doi.org/10.1007/978-1-4939-2803-3_8
- Texas Education Agency. (2013). Standard setting technical report.
- van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45(3), 1–67. https://doi.org/10.18637/jss.v045.i03
- Warne, R. T., Yoon, M., & Price, C. J. (2014). Exploring the various interpretations of “test bias”. Cultural Diversity & Ethnic Minority Psychology, 20(4), 570–582. https://doi.org/10.1037/a0036503
- Weissenburger, J. W., & Espin, C. A. (2005). Curriculum-based measures of writing across grade levels. Journal of School Psychology, 43(2), 153–169. https://doi.org/10.1016/j.jsp.2005.03.002
- Wilson, J., Olinghouse, N. G., McCoach, D. B., Santangelo, T., & Andrada, G. N. (2016). Comparing the accuracy of different scoring methods for identifying sixth graders at risk of failing a state writing assessment. Assessing Writing, 27, 11–23. https://doi.org/10.1016/j.asw.2015.06.003
- Wilson, J., Roscoe, R., & Ahmed, Y. (2017). Automated formative writing assessment using a levels of language framework. Assessing Writing, 34, 16–36. https://doi.org/10.1016/j.asw.2017.08.002
- Xu, Y., & Drame, E. R. (2008). Examining sociocultural factors in response to intervention models. Childhood Education, 85(1), 26–32. https://doi.org/10.1080/00094056.2008.10523053