ABSTRACT
Written expression curriculum-based measurement (WE-CBM) is a formative assessment approach for screening and progress monitoring. To extend evaluation of WE-CBM, we compared hand-calculated and automated scoring approaches in relation to the number of screening samples needed per student for valid scores, the long-term predictive validity and diagnostic accuracy of scores, and predictive and diagnostic bias for underrepresented student groups. Second- to fifth-grade students (n = 609) completed five WE-CBM tasks during one academic year and a standardised writing test in fourth and seventh grade. Averaging WE-CBM scores across multiple samples improved validity. Complex hand-calculated metrics and automated tools outperformed simpler metrics for the long-term prediction of writing performance. No evidence of bias was observed between African American and Hispanic students. The study illustrates the absence of test bias as a necessary condition for fair and equitable screening procedures and the importance of including comparisons with majority groups in future research.
Acknowledgments
This research was supported by the Society for the Study of School Psychology (SSSP) Early Career Research Award. This research was also supported by the Institute of Education Sciences, U.S. Department of Education, through Grant R305A190100 awarded to the University of Houston (PI – Milena Keller-Margulis). The opinions expressed are those of the authors and do not represent views of the Institute or the U.S. Department of Education.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Supplementary material
Supplemental data for this article can be accessed here.
Notes
1. Separate multilevel regression models were estimated with each metric as the outcome for the first two time points to examine the effects of a nested structure (i.e. classroom as the level-2 unit) on the findings of the study. The intraclass correlation coefficients (ICCs) were substantial for the random intercepts across models (ICC > .10) but negligible for the random slopes (ICC < .10). In other words, although the intercepts varied considerably across classrooms, the slopes did not. Thus, the results presented in this section and the inferences drawn in the Discussion would not change after accounting for classroom effects.
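As an illustration of the intercept ICC referred to in this note, the following minimal sketch assumes a two-level random-intercept model with students nested in classrooms, in which the ICC is the classroom-level intercept variance divided by the total variance. The function name and the variance values are hypothetical and are not taken from the study; the slope-related coefficients reported in the note would require a different calculation that is not shown here.

```python
def intraclass_correlation(between_var: float, within_var: float) -> float:
    """Return the share of total variance attributable to the grouping level.

    between_var: variance of the random intercepts (e.g. across classrooms).
    within_var:  residual variance (e.g. across students within classrooms).
    """
    if between_var < 0 or within_var < 0:
        raise ValueError("variance components must be non-negative")
    total = between_var + within_var
    if total == 0:
        raise ValueError("total variance is zero, so the ICC is undefined")
    return between_var / total


# Hypothetical variance components for illustration only (not study results).
icc_example = intraclass_correlation(between_var=2.0, within_var=8.0)
print(round(icc_example, 2))
```

Under the threshold used in the note, a value above .10 from such a calculation would be read as substantial classroom-level variability.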
Additional information
Notes on contributors
Michael Matta
Michael Matta, Ph.D., is a Research Scientist in School Psychology at the University of Houston. His research focuses on the validation of brief, computer-based assessments to extend the interpretation of student test scores to real-world outcomes and help develop fair and equitable ways to use such scores in applied settings.
Sterett H. Mercer
Sterett H. Mercer, Ph.D., is a Professor in Special Education in the Department of Educational and Counselling Psychology & Special Education, at the University of British Columbia. His research focuses on the measurement of student academic skills in response to instruction and intervention. https://ecps.educ.ubc.ca/person/sterett-mercer/
Milena A. Keller-Margulis
Milena A. Keller-Margulis, Ph.D., is an Associate Professor of School Psychology in the Psychological, Health, and Learning Sciences Department at the University of Houston. Her research focuses on the use of curriculum-based measures to assess academic skills in the context of multi-tiered systems of support. https://www.uh.edu/education/about/directory/employee-profile/index.php?id=504