REFERENCES
- An, X., Curby, T. W., & Brock, L. L. (2019). Is the child really what’s being rated? Sources of variance in teacher ratings of socioemotional skills. Journal of Psychoeducational Assessment, 37(7), 899–910. https://doi.org/10.1177/0734282918808618
- Anthony, C. J., & DiPerna, J. C. (2017). Identifying sets of maximally efficient items from the Academic Competence Evaluation Scales-Teacher Form. School Psychology Quarterly, 32(4), 552–559. https://doi.org/10.1037/spq0000205
- Anthony, C. J., & DiPerna, J. C. (2018). Piloting a short form of the Academic Competence Evaluation Scales. School Mental Health, 10(3), 314–321. https://doi.org/10.1007/s12310-018-9254-7
- Benson, N. F., Floyd, R. G., Kranzler, J. H., Eckert, T. L., Fefer, S. A., & Morgan, G. B. (2019). Test use and assessment practices of school psychologists in the United States: Findings from the 2017 national survey. Journal of School Psychology, 72, 29–48. https://doi.org/10.1016/j.jsp.2018.12.004
- Bergeron, R., Floyd, R. G., McCormack, A. C., & Farmer, W. L. (2008). The generalizability of externalizing behavior composites and subscale scores across time, rater, and instrument. School Psychology Review, 37(1), 91–108. https://doi.org/10.1080/02796015.2008.12087911
- Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum Associates.
- Congdon, P. J., & McQueen, J. (2000). The stability of rater severity in large-scale assessment programs. Journal of Educational Measurement, 37(2), 163–178. https://doi.org/10.1111/j.1745-3984.2000.tb01081.x
- Cronbach, L. J. (1990). Essentials of psychological testing (5th ed.). Harper & Row.
- Croskerry, P., Singhal, G., & Mamede, S. (2013). Cognitive debiasing 1: Origins of bias and theory of debiasing. BMJ Quality & Safety, 22(Suppl 2), ii58–ii64. https://doi.org/10.1136/bmjqs-2012-001712
- DiPerna, J. C., & Elliott, S. N. (2000). Academic Competence Evaluation Scales. The Psychological Corporation.
- DuPaul, G. J., Rapport, M. D., & Perriello, L. M. (1991). Teacher ratings of academic skills: The development of the Academic Performance Rating Scale. School Psychology Review, 20(2), 284–300. https://doi.org/10.1080/02796015.1991.12085552
- DuPaul, G. J., Reid, R., Anastopoulos, A. D., Lambert, M. C., Watkins, M. W., & Power, T. J. (2016). Parent and teacher ratings of attention-deficit/hyperactivity disorder symptoms: Factor structure and normative data. Psychological Assessment, 28(2), 214–225. https://doi.org/10.1037/pas0000166
- Eckes, T. (2005). Examining rater effects in TestDaF writing and speaking performance assessments: A many-facet Rasch analysis. Language Assessment Quarterly, 2(3), 197–221. https://doi.org/10.1207/s15434311laq0203_2
- Eckes, T. (2009). Many-facet Rasch measurement. In S. Takala (Ed.), Reference supplement to the manual for relating language examinations to the Common European Framework of Reference for Languages: Learning, teaching, assessment (Section H). Council of Europe: Language Policy Division. https://www.coe.int/t/dg4/Linguistic/CEF-refSupp-SectionH.pdf
- Engelhard, G., & Wind, S. A. (2017). Invariant measurement with raters and rating scales: Rasch models for rater-mediated assessments. Routledge. https://doi.org/10.4324/9781315766829
- Florida Department of Education. (2019). Evidence of reliability and validity (Florida Standards Assessment Technical Report 2017-2018, Volume 4). http://www.fldoe.org/core/fileparse.php/5663/urlt/V4-FSA-1718-TechRpt.pdf
- Gresham, F. M., & Elliott, S. N. (2008). Social skills improvement system: Rating scales manual. NCS Pearson.
- Hoyt, W. T. (2000). Rater bias in psychological research: When is it a problem and what can we do about it? Psychological Methods, 5(1), 64–86. https://doi.org/10.1037/1082-989x.5.1.64
- Kahneman, D., & Tversky, A. (1996). On the reality of cognitive illusions. Psychological Review, 103(3), 582–591. https://doi.org/10.1037/0033-295x.103.3.582
- Kettler, R. J., Elliott, S. N., DiPerna, J. C., Bolt, D. M., Reiser, D., & Resurreccion, L. (2014). Student and teacher ratings of academic competence: An examination of cross-informant agreement. Journal of Applied School Psychology, 30(4), 338–354. https://doi.org/10.1080/15377903.2014.950442
- Kilgus, S. P., & von der Embse, N. P. (2014). Unpublished technical manual of the Social, Academic, and Emotional Behavior Risk Screener.
- Linacre, J. M. (1989). Many-facet Rasch measurement (2nd ed.). MESA Press.
- Linacre, J. M. (1996). Generalizability Theory and Many-Facet Rasch measurement. Objective Measurement: Theory Into Practice, 3, 85–98.
- Linacre, J. M. (2002). What do infit and outfit, mean-square and standardized mean? Rasch Measurement Transactions, 16, 878.
- Linacre, J. M. (2018). Facets Rasch model computer program [software manual]. Winsteps.com.
- Linacre, J. M., & Wright, B. D. (2002). Construction of measures from many-facet data. Journal of Applied Measurement, 3(4), 486–512. https://psycnet.apa.org/record/2002-06916-006
- Lunz, M. E., Wright, B. D., & Linacre, J. M. (1990). Measuring the impact of judge severity on examination scores. Applied Measurement in Education, 3(4), 331–345. https://doi.org/10.1207/s15324818ame0304_3
- Mashburn, A. J., Hamre, B. K., Downer, J. T., & Pianta, R. C. (2006). Teacher and classroom characteristics associated with teachers' ratings of prekindergartners' relationships and behaviors. Journal of Psychoeducational Assessment, 24(4), 367–380. https://doi.org/10.1177/0734282906290594
- McNamara, T. F. (1996). Measuring second language performance. Longman.
- Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. American Psychologist, 50(9), 741–749. https://doi.org/10.1037/0003-066X.50.9.741
- Myford, C. M., & Wolfe, E. W. (2003). Detecting and measuring rater effects using many-facet Rasch measurement: Part I. Journal of Applied Measurement, 4(4), 386–422. https://www.researchgate.net/profile/Carol_Myford/publication/9069043_Detecting_and_Measuring_Rater_Effects_Using_Many-Facet_Rasch_Measurement_Part_I/links/54cba70e0cf298d6565848ee.pdf
- Myford, C. M., & Wolfe, E. W. (2004). Detecting and measuring rater effects using many-facet Rasch measurement: Part II. Journal of Applied Measurement, 5, 189–227. https://www.researchgate.net/profile/Carol_Myford/publication/8636147_Detecting_and_Measuring_Rater_Effects_Using_Many-Facet_Rasch_Measurement_Part_II/links/54eb80630cf2ff89649dd777/Detecting-and-Measurinng-Rater-Effects-Using-Many-Facet-Rasch-measurement-Part-II.pdf
- Owens, J. S., Allan, D. M., Kassab, H., & Mikami, A. Y. (2020). Evaluating a short form of the Academic Competence Evaluation Scales: Expanded examination of psychometric properties. School Mental Health, 12(1), 38–52. https://doi.org/10.1007/s12310-019-09347-9
- Pendergast, L. L., Youngstrom, E. A., Ruan-Iu, L., & Beysolow, D. (2018). The nomogram: A decision-making tool for practitioners using multitiered systems of support. School Psychology Review, 47(4), 345–359. https://doi.org/10.17105/SPR-2017-0097.V47-4
- Raines, T. C., Dever, B. V., Kamphaus, R. W., & Roach, A. T. (2012). Universal screening for behavioral and emotional risk: A promising method for reducing disproportionate placement in special education. The Journal of Negro Education, 81(3), 283–296. https://doi.org/10.7709/jnegroeducation.81.3.0283
- Renaissance Learning. (2012). FSA Math technical manual. Renaissance Learning.
- Renaissance Learning. (2015). FSA ELA technical manual. Renaissance Learning.
- Romer, N., von der Embse, N., Eklund, K., Kilgus, S., Perales, K., Splett, J. W., Suldo, S., & Wheeler, D. (2020). Best Practices in Social, Emotional, and Behavioral Screening: An Implementation Guide. Version 2.0. https://smhcollaborative.org/universalscreening
- Splett, J. W., Raborn, A., Brann, K., Smith-Millman, M. K., Halliday, C., & Weist, M. D. (2020). Between-teacher variance of students' teacher-rated risk for emotional, behavioral, and adaptive functioning. Journal of School Psychology, 80, 37–53. https://doi.org/10.1016/j.jsp.2020.04.001
- Splett, J. W., Smith-Millman, M., Raborn, A., Brann, K. L., Flaspohler, P. D., & Maras, M. A. (2018). Student, teacher, and classroom predictors of between-teacher variance of students' teacher-rated behavior. School Psychology Quarterly, 33(3), 460–468. https://doi.org/10.1037/spq0000241
- Stahl, J. A. (1994). What does generalizability theory (G-Theory) offer that Many-Facet Rasch Measurement cannot duplicate? Rasch Measurement Transactions, 8, 342–343.
- Sudweeks, R. R., Reeve, S., & Bradshaw, W. S. (2004). A comparison of generalizability theory and many-facet Rasch measurement in an analysis of college sophomore writing. Assessing Writing, 9(3), 239–261. https://doi.org/10.1016/j.asw.2004.11.001
- Styck, K., Anthony, C. J., Flavin, A., Riddle, D., & LaBelle, B. (in press). Are ratings in the eye of the beholder? A tutorial on Many Facet Rasch Measurement to evaluate rater effects in school psychology. Journal of School Psychology.
- Styck, K., Anthony, C. J., Sandilos, L. E., & DiPerna, J. (2020). Examining rater effects on the Classroom Assessment Scoring System. Child Development. Advance online publication. https://doi.org/10.1111/cdev.13460
- Tanner, N., Eklund, K., Kilgus, S. P., Johnson, A. H., & Bowman-Perrott, L. (2018). Generalizability of universal screening measures for behavioral and emotional risk. School Psychology Review, 47(1), 3–17. https://doi.org/10.17105/SPR-2017-0044.V47-1
- von der Embse, N. P., Kilgus, S. P., Eklund, K., Ake, E., Levi-Neilsen, S., & Eckert, T. (2018). Training teachers to facilitate early identification of mental and behavioral health risks. School Psychology Review, 47(4), 372–384. https://doi.org/10.17105/SPR-2017-0094.V47-4
- Wang, B. (2010). On rater agreement and rater training. English Language Teaching, 3(1), 108–112. https://doi.org/10.5539/elt.v3n1p108
- Warmbold-Brann, K. (2017). The effect of an intensive teacher training on the accuracy of social, emotional, and behavioral screening results [Unpublished doctoral dissertation]. University of Missouri-Columbia.
- Whitcomb, S. A., & Merrell, K. W. (2013). Behavioral, social, and emotional assessment of children and adolescents (4th ed.). Routledge.
- Wind, S. A., & Engelhard, G., Jr. (2013). How invariant and accurate are domain ratings in writing assessment? Assessing Writing, 18(4), 278–299. https://doi.org/10.1016/j.asw.2013.09.002
- Wolfe, E. W., & McVay, A. (2010). Rater effects as a function of rater training context (White paper). Pearson Assessments.