Weight-Based Classification of Raters and Rater Cognition in an EFL Speaking Test
