References
- American Educational Research Association (AERA), American Psychological Association (APA), & National Council on Measurement in Education (NCME). (2014). Standards for educational and psychological testing. Washington, DC: AERA.
- Brennan, R. L. (1992). An NCME instructional module on generalizability theory. Educational Measurement: Issues and Practice, 11(4), 27–34. doi:10.1111/j.1745-3992.1992.tb00260.x
- Brennan, R. L. (2001). Generalizability theory. New York: Springer-Verlag.
- Chiu, C. W. T., & Wolfe, E. W. (2002). A method for analyzing sparse data matrices in the generalizability theory framework. Applied Psychological Measurement, 26(3), 321–338. doi:10.1177/0146621602026003006
- Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements. New York: Wiley.
- Eckes, T. (2017). Guest editorial: Rater effects: Advances in item response modeling of human ratings – Part I. Psychological Test and Assessment Modeling, 59(4), 443–452.
- Engelhard, G., Jr. (1994). Examining rater errors in the assessment of written composition with a many-faceted Rasch model. Journal of Educational Measurement, 31(2), 93–112. doi:10.1111/j.1745-3984.1994.tb00436.x
- Greener, J. M., & Osburn, H. G. (1980). Accuracy of corrections for restriction in range due to explicit selection in heteroscedastic and non-linear distributions. Educational and Psychological Measurement, 40(2), 337–346. doi:10.1177/001316448004000208
- Gross, A. L. (1982). Relaxing assumptions underlying corrections for range restriction. Educational and Psychological Measurement, 42(3), 795–801. doi:10.1177/001316448204200311
- Gulliksen, H. (1950). Theory of mental tests. New York: Wiley.
- Henderson, C. R. (1953). Estimation of variance and covariance components. Biometrics, 9(2), 226–252. doi:10.2307/3001853
- Houston, W. M., Raymond, M. R., & Svec, J. C. (1991). Adjustments for rater effects in performance assessment. Applied Psychological Measurement, 15(4), 409–421. doi:10.1177/014662169101500411
- Johnson, S., & Johnson, R. (2009). Conceptualising and interpreting reliability (Ofqual Report No. 10/4706). Coventry, England: Ofqual.
- Koretz, D., Stecher, B., Klein, S., & McCaffrey, D. (1994). The Vermont portfolio assessment program: Findings and implications. Educational Measurement: Issues and Practice, 13(3), 5–16. doi:10.1111/j.1745-3992.1994.tb00443.x
- Meadows, M., & Billington, L. (2005). A review of the literature on marking reliability (Report for the National Assessment Agency by AQA Centre for Education Research and Policy, England).
- Searle, S. R. (1987). Linear models for unbalanced data. New York: Wiley.
- Searle, S. R., Casella, G., & McCulloch, C. E. (1992). Variance components. New York: Wiley.
- Shavelson, R. J., Gao, X., & Baxter, G. P. (1996). On the content validity of performance assessments: Centrality of domain specification. In M. Birenbaum & F. Dochy (Eds.), Alternatives in assessment of achievements, learning processes and prior knowledge (pp. 131–141). Boston: Kluwer Academic Publishers.
- Shavelson, R. J., & Webb, N. M. (1991). Generalizability theory: A primer. Newbury Park, CA: Sage Publications.
- Tisi, J., Whitehouse, G., Maughan, S., & Burdett, N. (2013). A review of literature on marking reliability research (Report for Ofqual). Slough, England: NFER.
- Wind, S. A., & Peterson, M. E. (2018). A systematic review of methods for evaluating rating quality in language assessment. Language Testing, 35(2), 161–192. doi:10.1177/0265532216686999
- Zhang, M. (2013, March). Contrasting automated and human scoring of essays (R&D Connections, No. 21). Princeton, NJ: Educational Testing Service.