Research Article

Applying Cognitive Theory to the Human Essay Rating Process


