School Psychology Review Volume 51, 2022 - Issue 1: Special Topic Section Social, Emotional, and Behavioral Assessment within Tiered Decision-Making Frameworks: Advancing Research through Reflections on the Past Decade

Special Topic Section Social, Emotional, and Behavioral Assessment within Tiered Decision-Making Frameworks: Advancing Research through Reflections on the Past Decade

Evaluating the Impact of Rater Effects on Behavior Rating Scale Score Validity and Utility

Christopher J. Anthony (University of Florida; corresponding author)

Kara M. Styck (Northern Illinois University; https://orcid.org/0000-0002-3642-8530)

Erin Cooke (P.K. Yonge Developmental Research School)

Justin R. Martel (University of Florida)

Katherine E. Frye (University of Florida)

Pages 25-39 | Received 29 Feb 2020, Accepted 15 Sep 2020, Published online: 04 Jan 2021

https://doi.org/10.1080/2372966X.2020.1827681
REFERENCES

An, X., Curby, T. W., & Brock, L. L. (2019). Is the child really what’s being rated? Sources of variance in teacher ratings of socioemotional skills. Journal of Psychoeducational Assessment, 37(7), 899–910. https://doi.org/10.1177/0734282918808618
Anthony, C. J., & DiPerna, J. C. (2017). Identifying sets of maximally efficient items from the Academic Competence Evaluation Scales-Teacher Form. School Psychology Quarterly, 32(4), 552–559. https://doi.org/10.1037/spq0000205
Anthony, C. J., & DiPerna, J. C. (2018). Piloting a short form of the Academic Competence Evaluation Scales. School Mental Health, 10(3), 314–321. https://doi.org/10.1007/s12310-018-9254-7
Benson, N. F., Floyd, R. G., Kranzler, J. H., Eckert, T. L., Fefer, S. A., & Morgan, G. B. (2019). Test use and assessment practices of school psychologists in the United States: Findings from the 2017 national survey. Journal of School Psychology, 72, 29–48. https://doi.org/10.1016/j.jsp.2018.12.004
Bergeron, R., Floyd, R. G., McCormack, A. C., & Farmer, W. L. (2008). The generalizability of externalizing behavior composites and subscale scores across time, rater, and instrument. School Psychology Review, 37(1), 91–108. https://doi.org/10.1080/02796015.2008.12087911
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum Associates.
Congdon, P. J., & McQueen, J. (2000). The stability of rater severity in large-scale assessment programs. Journal of Educational Measurement, 37(2), 163–178. https://doi.org/10.1111/j.1745-3984.2000.tb01081.x
Cronbach, L. J. (1990). Essentials of psychological testing (5th ed.). Harper & Row.
Croskerry, P., Singhal, G., & Mamede, S. (2013). Cognitive debiasing 1: Origins of bias and theory of debiasing. BMJ Quality & Safety, 22(Suppl 2), ii58–ii64. https://doi.org/10.1136/bmjqs-2012-001712
DiPerna, J. C., & Elliott, S. N. (2000). Academic Competence Evaluation Scales. The Psychological Corporation.
DuPaul, G. J., Rapport, M. D., & Perriello, L. M. (1991). Teacher ratings of academic skills: The development of the Academic Performance Rating Scale. School Psychology Review, 20(2), 284–300. https://doi.org/10.1080/02796015.1991.12085552
DuPaul, G. J., Reid, R., Anastopoulos, A. D., Lambert, M. C., Watkins, M. W., & Power, T. J. (2016). Parent and teacher ratings of attention-deficit/hyperactivity disorder symptoms: Factor structure and normative data. Psychological Assessment, 28(2), 214–225. https://doi.org/10.1037/pas0000166
Eckes, T. (2005). Examining rater effects in TestDaF writing and speaking performance assessments: A many-facet Rasch analysis. Language Assessment Quarterly, 2(3), 197–221. https://doi.org/10.1207/s15434311laq0203_2
Eckes, T. (2009). Many-facet Rasch measurement. In S. Takala (Ed.), Reference supplement to the manual for relating language examinations to the Common European Framework of Reference for Languages: Learning, teaching, assessment (Section H). Council of Europe: Language Policy Division. https://www.coe.int/t/dg4/Linguistic/CEF-refSupp-SectionH.pdf
Engelhard, G., & Wind, S. A. (2017). Invariant measurement with raters and rating scales: Rasch models for rater-mediated assessments. Routledge. https://doi.org/10.4324/9781315766829
Florida Department of Education. (2019). Evidence of reliability and validity (Florida Standards Assessment Technical Report 2017-2018, Volume 4). http://www.fldoe.org/core/fileparse.php/5663/urlt/V4-FSA-1718-TechRpt.pdf
Gresham, F. M., & Elliott, S. N. (2008). Social skills improvement system: Rating scales manual. NCS Pearson.
Hoyt, W. T. (2000). Rater bias in psychological research: When is it a problem and what can we do about it? Psychological Methods, 5(1), 64–86. https://doi.org/10.1037/1082-989x.5.1.64
Kahneman, D., & Tversky, A. (1996). On the reality of cognitive illusions. Psychological Review, 103(3), 582–591. https://doi.org/10.1037/0033-295x.103.3.582
Kettler, R. J., Elliott, S. N., DiPerna, J. C., Bolt, D. M., Reiser, D., & Resurreccion, L. (2014). Student and teacher ratings of academic competence: An examination of cross-informant agreement. Journal of Applied School Psychology, 30(4), 338–354. https://doi.org/10.1080/15377903.2014.950442
Kilgus, S. P., & von der Embse, N. P. (2014). Unpublished technical manual of the Social, Academic, and Emotional Behavior Risk Screener.
Linacre, J. M. (1989). Many-facet Rasch measurement (2nd ed.). MESA Press.
Linacre, J. M. (1996). Generalizability Theory and Many-Facet Rasch measurement. Objective Measurement: Theory Into Practice, 3, 85–98.
Linacre, J. M. (2002). What do infit and outfit, mean-square and standardized mean? Rasch Measurement Transactions, 16, 878.
Linacre, J. M. (2018). Facets Rasch model computer program [software manual]. Winsteps.com.
Linacre, J. M., & Wright, B. D. (2002). Construction of measures from many-facet data. Journal of Applied Measurement, 3(4), 486–512. https://psycnet.apa.org/record/2002-06916-006
Lunz, M. E., Wright, B. D., & Linacre, J. M. (1990). Measuring the impact of judge severity on examination scores. Applied Measurement in Education, 3(4), 331–345. https://doi.org/10.1207/s15324818ame0304_3
Mashburn, A. J., Hamre, B. K., Downer, J. T., & Pianta, R. C. (2006). Teacher and classroom characteristics associated with teachers' ratings of prekindergartners' relationships and behaviors. Journal of Psychoeducational Assessment, 24(4), 367–380. https://doi.org/10.1177/0734282906290594
McNamara, T. F. (1996). Measuring second language performance. Longman.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. American Psychologist, 50(9), 741–749. https://doi.org/10.1037/0003-066X.50.9.741
Myford, C. M., & Wolfe, E. W. (2003). Detecting and measuring rater effects using many-facet Rasch measurement: Part I. Journal of Applied Measurement, 4(4), 386–422. https://www.researchgate.net/profile/Carol_Myford/publication/9069043_Detecting_and_Measuring_Rater_Effects_Using_Many-Facet_Rasch_Measurement_Part_I/links/54cba70e0cf298d6565848ee.pdf
Myford, C. M., & Wolfe, E. W. (2004). Detecting and measuring rater effects using many-facet Rasch measurement: Part II. Journal of Applied Measurement, 5, 189–227. https://www.researchgate.net/profile/Carol_Myford/publication/8636147_Detecting_and_Measuring_Rater_Effects_Using_Many-Facet_Rasch_Measurement_Part_II/links/54eb80630cf2ff89649dd777/Detecting-and-Measurinng-Rater-Effects-Using-Many-Facet-Rasch-measurement-Part-II.pdf
Owens, J. S., Allan, D. M., Kassab, H., & Mikami, A. Y. (2020). Evaluating a short form of the Academic Competence Evaluation Scales: Expanded examination of psychometric properties. School Mental Health, 12(1), 38–52. https://doi.org/10.1007/s12310-019-09347-9
Pendergast, L. L., Youngstrom, E. A., Ruan-Iu, L., & Beysolow, D. (2018). The nomogram: A decision-making tool for practitioners using multitiered systems of support. School Psychology Review, 47(4), 345–359. https://doi.org/10.17105/SPR-2017-0097.V47-4
Raines, T. C., Dever, B. V., Kamphaus, R. W., & Roach, A. T. (2012). Universal screening for behavioral and emotional risk: A promising method for reducing disproportionate placement in special education. The Journal of Negro Education, 81(3), 283–296. https://doi.org/10.7709/jnegroeducation.81.3.0283
Renaissance Learning. (2012). FSA Math technical manual. Renaissance Learning.
Renaissance Learning. (2015). FSA ELA technical manual. Renaissance Learning.
Romer, N., von der Embse, N., Eklund, K., Kilgus, S., Perales, K., Splett, J. W., Suldo, S., & Wheeler, D. (2020). Best Practices in Social, Emotional, and Behavioral Screening: An Implementation Guide. Version 2.0. https://smhcollaborative.org/universalscreening
Splett, J. W., Raborn, A., Brann, K., Smith-Millman, M. K., Halliday, C., & Weist, M. D. (2020). Between-teacher variance of students' teacher-rated risk for emotional, behavioral, and adaptive functioning. Journal of School Psychology, 80, 37–53. https://doi.org/10.1016/j.jsp.2020.04.001
Splett, J. W., Smith-Millman, M., Raborn, A., Brann, K. L., Flaspohler, P. D., & Maras, M. A. (2018). Student, teacher, and classroom predictors of between-teacher variance of students' teacher-rated behavior. School Psychology Quarterly, 33(3), 460–468. https://doi.org/10.1037/spq0000241
Stahl, J. A. (1994). What does generalizability theory (G-Theory) offer that Many-Facet Rasch Measurement cannot duplicate? Rasch Measurement Transactions, 8, 342–343.
Sudweeks, R. R., Reeve, S., & Bradshaw, W. S. (2004). A comparison of generalizability theory and many-facet Rasch measurement in an analysis of college sophomore writing. Assessing Writing, 9(3), 239–261. https://doi.org/10.1016/j.asw.2004.11.001
Styck, K., Anthony, C. J., Flavin, A., Riddle, D., & LaBelle, B. (in press). Are ratings in the eye of the beholder? A tutorial on Many Facet Rasch Measurement to evaluate rater effects in school psychology. Journal of School Psychology.
Styck, K., Anthony, C. J., Sandilos, L. E., & DiPerna, J. (2020). Examining rater effects on the Classroom Assessment Scoring System. Child Development. Advance online publication. https://doi.org/10.1111/cdev.13460
Tanner, N., Eklund, K., Kilgus, S. P., Johnson, A. H., & Bowman-Perrott, L. (2018). Generalizability of universal screening measures for behavioral and emotional risk. School Psychology Review, 47(1), 3–17. https://doi.org/10.17105/SPR-2017-0044.V47-1
von der Embse, N. P., Kilgus, S. P., Eklund, K., Ake, E., Levi-Neilsen, S., & Eckert, T. (2018). Training teachers to facilitate early identification of mental and behavioral health risks. School Psychology Review, 47(4), 372–384. https://doi.org/10.17105/SPR-2017-0094.V47-4
Wang, B. (2010). On rater agreement and rater training. English Language Teaching, 3(1), 108–112. https://doi.org/10.5539/elt.v3n1p108
Warmbold-Brann, K. (2017). The effect of an intensive teacher training on the accuracy of social, emotional, and behavioral screening results [Unpublished doctoral dissertation]. University of Missouri-Columbia.
Whitcomb, S. A., & Merrell, K. W. (2013). Behavioral, social, and emotional assessment of children and adolescents (4th ed.). Routledge.
Wind, S. A., & Engelhard, G., Jr. (2013). How invariant and accurate are domain ratings in writing assessment? Assessing Writing, 18(4), 278–299. https://doi.org/10.1016/j.asw.2013.09.002
Wolfe, E. W., & McVay, A. (2010). Rater effects as a function of rater training context (White paper). Pearson Assessments.
