Research Article

Detecting Rater Bias in Mixed-Format Assessments

References

  • Andrich, D. A. (1978). A rating formulation for ordered response categories. Psychometrika, 43(4), 561–573. https://doi.org/10.1007/BF02293814
  • Andrich, D. A., & Hagquist, C. (2012). Real and artificial differential item functioning. Journal of Educational and Behavioral Statistics, 37(3), 387–416. https://doi.org/10.3102/1076998611411913
  • Andrich, D. A., & Hagquist, C. (2015). Real and artificial differential item functioning in polytomous items. Educational and Psychological Measurement, 75(2), 185–207. https://doi.org/10.1177/0013164414534258
  • Dewberry, C., Davies-Muir, A., & Newell, S. (2013). Impact and causes of rater severity/leniency in appraisals without postevaluation communication between raters and ratees. International Journal of Selection and Assessment, 21(3), 286–293. https://doi.org/10.1111/ijsa.12038
  • Draba, R. E. (1977). The identification and interpretation of item bias (Research Memorandum No. 25). Statistical Laboratory, Department of Education, University of Chicago.
  • Eckes, T. (2015). Introduction to many-facet Rasch measurement: Analyzing and evaluating rater-mediated assessments (2nd ed.). Peter Lang.
  • Engelhard, G., & Wind, S. A. (2013). Rating quality studies using Rasch measurement theory (Research Report No. 2013–3). The College Board.
  • Engelhard, G., & Wind, S. A. (2018). Invariant measurement with raters and rating scales: Rasch models for rater-mediated assessments. Taylor & Francis.
  • Ercikan, K., Julian, M. W., Burket, G. R., Weber, M. M., & Link, V. (1998). Calibration and scoring of tests with multiple‐choice and constructed‐response item types. Journal of Educational Measurement, 35(2), 137–154. https://doi.org/10.1111/j.1745-3984.1998.tb00531.x
  • Farrokhi, F., Esfandiari, R., & Schaefer, E. (2012). A many-facet Rasch measurement of differential rater severity/leniency in three types of assessment. JALT Journal, 34(1), 79. https://doi.org/10.37546/JALTJJ34.1-3
  • Guo, W., & Wind, S. A. (2021). Examining the impacts of ignoring rater effects in mixed-format tests. Journal of Educational Measurement, 58(3), 364–387. https://doi.org/10.1111/jedm.12292
  • Han, C. (2015). Investigating rater severity/leniency in interpreter performance testing: A multifaceted Rasch measurement approach. Interpreting, 17(2), 255–283. https://doi.org/10.1075/intp.17.2.05han
  • Jin, K.-Y., & Eckes, T. (2021). Detecting differential rater functioning in severity and centrality: The dual DRF facets model. Educational and Psychological Measurement, 82(4). https://doi.org/10.1177/00131644211043207
  • Kim, S., Walker, M. E., & McHale, F. (2008). Equating of mixed‐format tests in large‐scale assessments. ETS Research Report Series, 2008(1), i–26. https://doi.org/10.1002/j.2333-8504.2008.tb02112.x
  • Linacre, J. M. (1989). Many-facet Rasch measurement. MESA Press.
  • Linacre, J. M. (2015). Facets Rasch measurement (Version 3.71.4).
  • Mao, X., Zhang, J., & Xin, T. (2022). The optimal design of bifactor multidimensional computerized adaptive testing with mixed-format items. Applied Psychological Measurement, 46(7), 605–621. https://doi.org/10.1177/01466216221108382
  • Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47(2), 149–174. https://doi.org/10.1007/BF02296272
  • Myers, N. D., Wolfe, E. W., Feltz, D. L., & Penfield, R. D. (2006). Identifying differential item functioning of rating scale items with the Rasch model: An introduction and an application. Measurement in Physical Education and Exercise Science, 10(4), 215–240. https://doi.org/10.1207/s15327841mpee1004_1
  • Myford, C. M., & Wolfe, E. W. (2000). Strengthening the ties that bind: Improving the linking network in sparsely connected rating designs. ETS Research Report Series, 2000(1), i–34. https://doi.org/10.1002/j.2333-8504.2000.tb01832.x
  • Myford, C. M., & Wolfe, E. W. (2003). Detecting and measuring rater effects using many-facet Rasch measurement: Part I. Journal of Applied Measurement, 4(4), 386–422.
  • NAEP - Scoring. (n.d.). National Center for Education Statistics.
  • NAEP Scoring—Backreading. (n.d.). National Center for Education Statistics. Retrieved June 7, 2021, from https://nces.ed.gov/nationsreportcard/tdw/scoring/scoring_backreading.aspx
  • NAEP Scoring—Within-Year Interrater Agreement. (n.d.). National Center for Education Statistics. Retrieved June 7, 2021, from https://nces.ed.gov/nationsreportcard/tdw/scoring/scoring_within.aspx
  • Peabody, M. R., & Wind, S. A. (2019). Exploring the stability of differential item functioning across administrations and critical values using the Rasch separate calibration t-test method. Measurement: Interdisciplinary Research and Perspectives, 17(2), 78–92. https://doi.org/10.1080/15366367.2018.1533782
  • Raju, N. S. (1988). The area between two item characteristic curves. Psychometrika, 53(4), 495–502. https://doi.org/10.1007/BF02294403
  • Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests (Expanded ed., 1980). University of Chicago Press.
  • Schaefer, E. (2008). Rater bias patterns in an EFL writing assessment. Language Testing, 25(4), 465–493. https://doi.org/10.1177/0265532208094273
  • Sinharay, S. (2015). Assessment of person fit for mixed-format tests. Journal of Educational and Behavioral Statistics, 40(4), 343–365. https://doi.org/10.3102/1076998615589128
  • Wind, S. A. (2019). Examining the impacts of rater effects in performance assessments. Applied Psychological Measurement, 43(2), 159–171. https://doi.org/10.1177/0146621618789391
  • Wind, S. A., & Ge, Y. (2021). Detecting rater biases in sparse rater-mediated assessment networks. Educational and Psychological Measurement, 81(5), 996–1022. https://doi.org/10.1177/0013164420988108
  • Wind, S. A., & Guo, W. (2021). Beyond agreement: Exploring rater effects in large-scale mixed format assessments. Educational Assessment, 26(4), 264–283. https://doi.org/10.1080/10627197.2021.1962277
  • Winke, P., Gass, S., & Myford, C. (2012). Raters’ L2 background as a potential source of bias in rating oral performance. Language Testing, 30(2), 231–252. https://doi.org/10.1177/0265532212456968
  • Wolfe, E. W. (2004). Identifying rater effects using latent trait models. Psychology Science, 46(1), 35–51.
  • Wolfe, E. W., & McVay, A. (2012). Application of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(3), 31–37. https://doi.org/10.1111/j.1745-3992.2012.00241.x
  • Yao, L., & Schwarz, R. D. (2006). A multidimensional partial credit model with associated item and test statistics: An application to mixed-format tests. Applied Psychological Measurement, 30(6), 469–492. https://doi.org/10.1177/0146621605284537
