Research Article

Predictive Modeling of Rater Behavior: Implications for Quality Assurance in Essay Scoring


References

  • Attali, Y., & Burstein, J. (2006). Automated essay scoring with e-rater® V.2. Journal of Technology, Learning, and Assessment, 4(3). Retrieved from http://www.jtla.org
  • Baker, B. A. (2012). Individual differences in rater decision-making style: An exploratory mixed-methods study. Language Assessment Quarterly, 9(3), 225–248. doi:10.1080/15434303.2011.637262
  • Bejar, I. I. (2017). A historical survey of research regarding constructed-response formats. In R. Bennett & M. von Davier (Eds.), Advancing human assessment: Methodological, psychological, and policy contributions. New York, NY: Springer. Retrieved from https://link.springer.com/chapter/10.1007/978-3-319-58689-2_18
  • Bejar, I. I., Williamson, D. M., & Mislevy, R. J. (2006). Human scoring. In D. M. Williamson, R. J. Mislevy, & I. I. Bejar (Eds.), Automated scoring of complex tasks in computer-based testing (pp. 49–82). Mahwah, NJ: Lawrence Erlbaum.
  • Bennett, R. E., & Ben-Simon, A. (2005). Toward theoretically meaningful automated essay scoring. Journal of Technology, Learning, and Assessment.
  • Braun, H. I. (1986). Calibration of essay readers: Final report (RR-86-09). Princeton, NJ: Educational Testing Service. doi:10.1002/j.2330-8516.1986.tb00164.x
  • Braun, H. I. (1988). Understanding scoring reliability: Experiments in calibrating essay readers. Journal of Educational Statistics, 13(1), 1–18. doi:10.3102/10769986013001001
  • Bröder, A., Gräf, M., & Kieslich, P. J. (2017). Measuring the relative contributions of rule-based and exemplar-based processes in judgment: Validation of a simple model. Judgment and Decision Making, 12(5), 491–506.
  • Cohen, Y. (2017). Estimating the intra-rater reliability of essay raters. Frontiers in Education, 2, Article 49. doi:10.3389/feduc.2017.00049
  • Cumming, A., Kantor, R., & Powers, D. E. (2002). Decision making while rating ESL/EFL writing tasks: A descriptive framework. The Modern Language Journal, 86(1), 67–96. doi:10.1111/1540-4781.00137
  • Dawes, R. M., & Corrigan, B. (1974). Linear models in decision making. Psychological Bulletin, 81(2), 95–106. doi:10.1037/h0037613
  • Diederich, P. B., French, J. W., & Carlton, S. T. (1961). Factors in judgments of writing ability (RB-61-15). Princeton, NJ: Educational Testing Service. doi:10.1002/j.2333-8504.1961.tb00286.x
  • Eckes, T. (2008). Rater types in writing performance assessments: A classification approach to rater variability. Language Testing, 25(2), 155–185. doi:10.1177/0265532207086780
  • Eckes, T. (2012). Operational rater types in writing assessment: Linking rater cognition to rater behavior. Language Assessment Quarterly, 9(3), 270–292. doi:10.1080/15434303.2011.649381
  • Egberink, I. J. L., Meijer, R. R., Veldkamp, B. P., Schakel, L., & Smid, N. G. (2010). Detection of aberrant item score patterns in computerized adaptive testing: An empirical example using the CUSUM. Personality and Individual Differences, 48(8), 921–925. doi:10.1016/j.paid.2010.02.023
  • Farag, Y., Yannakoudakis, H., & Briscoe, T. (2018). Neural automated essay scoring and coherence modeling for adversarially crafted input. In Proceedings of NAACL-HLT 2018 (pp. 263–271). New Orleans, LA: Association for Computational Linguistics.
  • Freedman, S. W., & Calfee, R. C. (1983). Holistic assessment of writing: Experimental design and cognitive theory. In P. Mosenthal, L. Tamor, & S. A. Walmsley (Eds.), Research on writing: Principles and methods (pp. 75–98). New York, NY: Longman.
  • Houston, W. M., Raymond, M. R., & Svec, J. C. (1991). Adjustments for rater effects in performance assessment. Applied Psychological Measurement, 15(4), 409–421. doi:10.1177/014662169101500411
  • Karren, R. J., & Barringer, M. W. (2002). A review and analysis of the policy-capturing methodology in organizational research: Guidelines for research and practice. Organizational Research Methods, 5(4), 337–361. doi:10.1177/109442802237115
  • Kuhn, M., & Johnson, K. (2013). Applied predictive modeling. New York, NY: Springer.
  • McClellan, C. A. (2010). Constructed-response scoring–doing it right (RDC-13). Princeton, NJ: Educational Testing Service. Retrieved from http://www.ets.org/research/policy_research_reports/rdc-13
  • Monaghan, W., & Bridgeman, B. (2005). e-rater as a quality control of human scores. Princeton, NJ: Educational Testing Service. Retrieved from http://www.ets.org/Media/Research/pdf/RD_Connections7.pdf
  • Myford, C. M., & Wolfe, E. W. (2003). Detecting and measuring rater effects using many-facet Rasch measurement: Part I. Journal of Applied Measurement, 4(4), 386–422.
  • Naylor, J. C., & Wherry, R. J., Sr. (1965). The use of simulated stimuli and the “JAN” technique to capture and cluster the policies of raters. Educational and Psychological Measurement, 25(4), 969–986. doi:10.1177/001316446502500403
  • Nguyen, H., & Dery, L. (2018). Neural networks for automated essay grading. Retrieved from https://cs224d.stanford.edu/reports/huyenn.pdf
  • Page, E. S. (1954). Continuous inspection schemes. Biometrika, 41(1/2), 100–115. doi:10.2307/2333009
  • Patz, R. J., Junker, B. W., Johnson, M. S., & Mariano, L. T. (2002). The hierarchical rater model for rated test items and its application to large-scale educational assessment data. Journal of Educational and Behavioral Statistics, 27(4), 341–384. doi:10.3102/10769986027004341
  • Paul, S. R. (1981). Bayesian methods for calibration of examiners. British Journal of Mathematical and Statistical Psychology, 34(2), 213–223. doi:10.1111/j.2044-8317.1981.tb00630.x
  • Powers, D. E. (2005). “Wordiness”: A selective review of its influence, and suggestions for investigating its relevance in tests requiring extended written responses (RM-04-08). Princeton, NJ: Educational Testing Service. Retrieved from https://www.ets.org/Media/Research/pdf/RM-04-08.pdf
  • Powers, D. E., & Fowles, M. E. (2000). Likely impact of the GRE® writing assessment on graduate admissions decisions (GRE 97-06R, ETS RR −16). Princeton, NJ: Educational Testing Service.
  • Raymond, M. R., Harik, P., & Clauser, B. E. (2011). The impact of statistically adjusting for rater effects on conditional standard errors of performance ratings. Applied Psychological Measurement, 35(3), 235–246. doi:10.1177/0146621610390675
  • Ridgeway, G. (1999). The state of boosting. Computing Science and Statistics, 31, 172–181.
  • Sakyi, A. A. (2000). Validation of holistic scoring for ESL writing assessment: How raters evaluate compositions. In A. J. Kunnan (Ed.), Fairness and validation in language assessment (pp. 129–152). Cambridge: Cambridge University Press.
  • Song, Y., Heilman, M., Beigman Klebanov, B., & Deane, P. (2014). Applying argumentation schemes for essay scoring. In Proceedings of the First Workshop on Argumentation Mining (pp. 69–78). Baltimore, MD: Association for Computational Linguistics.
  • Suto, I. (2012). A critical review of some qualitative research methods used to explore rater cognition. Educational Measurement: Issues and Practice, 31(3), 21–30. doi:10.1111/j.1745-3992.2012.00240.x
  • Suto, I., & Greatorex, J. (2008). What goes through an examiner’s mind? Using verbal protocols to gain insights into the GCSE marking process. British Educational Research Journal, 34(2), 213–233. doi:10.1080/01411920701492050
  • Wang, C., Song, T., Wang, Z., & Wolfe, E. (2017). Essay selection methods for adaptive rater monitoring. Applied Psychological Measurement, 41(1), 60–79. doi:10.1177/0146621616672855
  • Wolfe, E. W. (2014). Methods for monitoring rating quality: Current practices and suggested changes.
  • Zhang, J. (2016). Same text different processing? Exploring how raters’ cognitive and meta-cognitive strategies influence rating accuracy in essay scoring. Assessing Writing, 27, 37–53. doi:10.1016/j.asw.2015.11.001
