Research Article

Applying Cognitive Theory to the Human Essay Rating Process


