Original Articles

Examining Rater Effects in TestDaF Writing and Speaking Performance Assessments: A Many-Facet Rasch Analysis

Pages 197-221 | Published online: 16 Nov 2009

Read on this site (11)

Kuan-Yu Jin & Thomas Eckes. (2022) Detecting Rater Centrality Effects in Performance Assessments: A Model-Based Comparison of Centrality Indices. Measurement: Interdisciplinary Research and Perspectives 20:4, pages 228-247.
Christopher J. Anthony, Kara M. Styck, Erin Cooke, Justin R. Martel & Katherine E. Frye. (2022) Evaluating the Impact of Rater Effects on Behavior Rating Scale Score Validity and Utility. School Psychology Review 51:1, pages 25-39.
Jordan M. Wheeler, George Engelhard & Jue Wang. (2022) Exploring Rater Accuracy Using Unfolding Models Combined with Topic Models: Incorporating Supervised Latent Dirichlet Allocation. Measurement: Interdisciplinary Research and Perspectives 20:1, pages 34-46.
Kuan-Yu Jin & Wen-Chung Wang. (2017) Assessment of Differential Rater Functioning in Latent Classes with New Mixture Facets Models. Multivariate Behavioral Research 52:3, pages 391-402.
Hyun Jung Kim. (2015) A Qualitative Analysis of Rater Behavior on an L2 Speaking Assessment. Language Assessment Quarterly 12:3, pages 239-261.
Claudia Harsch & Guido Martin. (2013) Comparing holistic and analytic scoring methods: issues of validity and reliability. Assessment in Education: Principles, Policy & Practice 20:3, pages 281-307.
Thomas Eckes. (2012) Operational Rater Types in Writing Assessment: Linking Rater Cognition to Rater Behavior. Language Assessment Quarterly 9:3, pages 270-292.
Rachel Kachchaf & Guillermo Solano-Flores. (2012) Rater Language Background as a Source of Measurement Error in the Testing of English Language Learners. Applied Measurement in Education 25:2, pages 162-177.
Claudia Harsch & André Alexander Rupp. (2011) Designing and Scaling Level-Specific Writing Tasks in Alignment With the CEFR: A Test-Centered Approach. Language Assessment Quarterly 8:1, pages 1-33.

Articles from other publishers (84)

Huiying Cai & Xun Yan. (2023) Triangulating NLP-based analysis of rater comments and MFRM: An innovative approach to investigating raters’ application of rating scales in writing assessment. Language Testing.
Mari Honko, Reeta Neittaanmäki, Scott Jarvis & Ari Huhta. (2023) Beyond literacy and competency – The effects of raters’ perceived uncertainty on assessment of writing. Assessing Writing 57, pages 100768.
Stefanie A. Wind. (2022) A sequential approach to detecting differential rater functioning in sparse rater-mediated assessment networks. Language Testing 40:2, pages 209-226.
Iasonas Lamprianou, Dina Tsagari & Nansia Kyriakou. (2023) Experienced but detached from reality: Theorizing and operationalizing the relationship between experience and rater effects. Assessing Writing 56, pages 100713.
Soroush Sabbaghan & Ismaeil Fazel. (2023) Fake Degrees and Fraudulent Credentials in Higher Education, pages 169-185.
Kuan-Yu Jin & Thomas Eckes. (2021) Detecting Differential Rater Functioning in Severity and Centrality: The Dual DRF Facets Model. Educational and Psychological Measurement 82:4, pages 757-781.
Zhiqiang Yang, Yongqiang Zeng, Zhifang Li & Zhiqing Lin. (2022) Interrogating the Construct of PRETCO-Oral: Longitudinal Evidence From Raters and Test-Takers. Frontiers in Psychology 13.
Gabriele Kecker & Thomas Eckes. (2022) Der digitale TestDaF: Aufbruch in neue Dimensionen des Sprachtestens. Informationen Deutsch als Fremdsprache 49:4, pages 289-324.
Gudrun Erickson, Linda Borger & Eva Olsson. (2022) National assessment of foreign languages in Sweden: A multifaceted and collaborative venture. Language Testing 39:3, pages 474-493.
Çiğdem AKIN ARIKAN, Pınar KANIK UYSAL, Huzeyfe BİLGE & Kasım YILDIRIM. (2022) Reliability of Ratings of Multidimensional Fluency Scale with Many-Facet Rasch Model. International Journal of Assessment Tools in Education 9:2, pages 470-491.
Pratik S. Sachdeva, Renata Barreto, Claudia von Vacano & Chris J. Kennedy. (2022) Assessing Annotator Identity Sensitivity via Item Response Theory: A Case Study in a Hate Speech Corpus.
Süleyman Demir, Betül Düşünceli, Levent Ertuna & Tuğba Seda Çolak. (2022) Determining the levels of Professional competence of counsellor candidates in Turkey. International Journal for the Advancement of Counselling 44:2, pages 356-372.
De Van Phung & Michael Michell. (2022) Inside Teacher Assessment Decision-Making: From Judgement Gestalts to Assessment Pathways. Frontiers in Education 7.
Sara Gesuato & Victoriya Trubnikova. (2022) Handbook of Research on Policies and Practices for Assessing Inclusive Teaching and Learning, pages 211-242.
Gülden Kaya Uyanik & Levent Ertuna. (2022) Examination of Testlet Effect in Open-Ended Items. SAGE Open 12:1, pages 215824402210798.
Li Liu & Guodong Jia. (2022) Assessing the English Language Writing of Chinese Learners of English, pages 155-173.
Aslıhan ERMAN ASLANOĞLU & Mehmet ŞATA. (2021) Examining the Differential Rater Functioning in the Process of Assessing Writing Skills of Middle School 7th Grade Students. Participatory Educational Research 8:4, pages 239-252.
Masaki Uto & Masashi Okano. (2021) Learning Automated Essay Scoring Models Using Item-Response-Theory-Based Scores to Decrease Effects of Rater Biases. IEEE Transactions on Learning Technologies 14:6, pages 763-776.
Souba Rethinasamy. (2021) The Effects of Different Rater Training Procedures on ESL Essay Raters’ Rating Accuracy. Pertanika Journal of Social Sciences and Humanities 29:S3.
Mardiana Idris. (2021) Determining English Language Lecturers’ Quality of Marking in Continuous Assessment through Rasch Analysis. Pertanika Journal of Social Sciences and Humanities 29:S3.
Wenjing Guo & Stefanie A. Wind. (2021) Examining the Impacts of Ignoring Rater Effects in Mixed‐Format Tests. Journal of Educational Measurement 58:3, pages 364-387.
Masaki Uto. (2020) Accuracy of performance-test linking based on a many-facet Rasch model. Behavior Research Methods 53:4, pages 1440-1454.
Kara M. Styck, Christopher J. Anthony, Angela Flavin, David Riddle & Brittany LaBelle. (2021) Are ratings in the eye of the beholder? A non-technical primer on many facet Rasch measurement to evaluate rater effects on teacher behavior rating scales. Journal of School Psychology 86, pages 198-221.
Chao Han. (2021) Testing and Assessment of Interpreting, pages 85-113.
Armin Berger. (2021) Developing Advanced English Language Competence, pages 297-321.
Nejdet KARADAG, Belgin BOZ YUKSEKDAG, Murat AKYILDIZ & Ali Ihsan IBILEME. (2020) ASSESSMENT AND EVALUATION IN OPEN EDUCATION SYSTEM: STUDENTS’ OPINIONS ABOUT OPEN-ENDED QUESTION (OEQ) PRACTICE. Turkish Online Journal of Distance Education 22:1, pages 179-193.
Inan Deniz Erguvan & Beyza Aksu Dunya. (2020) Analyzing rater severity in a freshman composition course using many facet Rasch measurement. Language Testing in Asia 10:1.
Enayat A. Shabani & Jaleh Panahi. (2020) Examining consistency among different rubrics for assessing writing. Language Testing in Asia 10:1.
Stefanie A. Wind. (2020) Exploring the Impact of Rater Effects on Person Fit in Rater‐Mediated Assessments. Educational Measurement: Issues and Practice 39:4, pages 76-94.
Masaki Uto & Maomi Ueno. (2020) A generalized many-facet Rasch model and its Bayesian estimation using Hamiltonian Monte Carlo. Behaviormetrika 47:2, pages 469-496.
Jason Fan & Xun Yan. (2020) Assessing Speaking Proficiency: A Narrative Review of Speaking Assessment Research Within the Argument-Based Validation Framework. Frontiers in Psychology 11.
Masaki Uto, Duc-Thien Nguyen & Maomi Ueno. (2020) Group Optimization to Maximize Peer Assessment Accuracy Using Item Response Theory and Integer Programming. IEEE Transactions on Learning Technologies 13:1, pages 91-106.
Rosa Aghekyan. (2020) Validation of the SIEVEA instrument using the Rasch analysis. International Journal of Educational Research 103, pages 101619.
Masaki Uto & Masashi Okano. (2020) Artificial Intelligence in Education, pages 549-561.
Charles Nagle. (2019) Developing and validating a methodology for crowdsourcing L2 speech ratings in Amazon Mechanical Turk. Journal of Second Language Pronunciation 5:2, pages 294-323.
Mustafa İLHAN. (2019) An Empirical Study for the Statistical Adjustment of Rater Bias. International Journal of Assessment Tools in Education 6:2, pages 193-201.
Ahmet Volkan Yüzüak, Sinan Erten & Yılmaz Kara. (2019) Analysis of Laboratory Videos of Science Teacher Candidates with Many-Facet Rasch Measurement Model. Journal of Education in Science, Environment and Health.
Peter Ho. (2019) A new approach to measuring Overall Liking with the Many-Facet Rasch Model. Food Quality and Preference 74, pages 100-111.
Stefanie A. Wind & Eli Jones. (2019) The Effects of Incomplete Rating Designs in Combination With Rater Effects. Journal of Educational Measurement 56:1, pages 76-100.
Masoumeh Ahmadi Shirazi. (2019) For a Greater Good: Bias Analysis in Writing Assessment. SAGE Open 9:1, pages 215824401882237.
Bart Deygers. (2018) Second Handbook of Information Technology in Primary and Secondary Education, pages 1-29.
Masaki Uto. (2019) Artificial Intelligence in Education, pages 494-506.
Bart Deygers. (2019) Second Handbook of English Language Teaching, pages 541-569.
Kuan-Yu Jin & Wen-Chung Wang. (2018) A New Facets Model for Rater's Centrality/Extremity Response Style. Journal of Educational Measurement 55:4, pages 543-563.
Ute Knoch & Carol A. Chapelle. (2017) Validation of rating processes within an argument-based framework. Language Testing 35:4, pages 477-499.
Franz Holzknecht, Ari Huhta & Iasonas Lamprianou. (2018) Comparing the outcomes of two different approaches to CEFR-based rating of students’ writing performances across two European countries. Assessing Writing 37, pages 57-67.
Iasonas Lamprianou. (2017) Investigation of Rater Effects Using Social Network Analysis and Exponential Random Graph Models. Educational and Psychological Measurement 78:3, pages 430-459.
Masaki Uto & Maomi Ueno. (2018) Empirical comparison of item response theory models with rater's parameters. Heliyon 4:5, pages e00622.
Kaja Zupanc & Erik Štrumbelj. (2018) A Bayesian hierarchical latent trait model for estimating rater bias and reliability in large-scale performance assessment. PLOS ONE 13:4, pages e0195297.
Park, Lili. (2018) Inter-rater Reliability of English Essay Test and Korean-English Translation Test Across Multiple Topics and Rating Criteria. The Journal of Foreign Studies 43, pages 111-136.
John Norris & Anastasia Drackert. (2017) Test review: TestDaF. Language Testing 35:1, pages 149-157.
김현정. (2017) Improving the Validity of L2 Performance Assessments: Use of Many-Facet Rasch Measurement. Studies in Foreign Language Education 31:3, pages 277-297.
Daniel R. Isbell. (2017) Assessing C2 writing ability on the Certificate of English Language Proficiency: Rater and examinee age effects. Assessing Writing 34, pages 37-49.
Bengü Börkan. (2017) Exploring Variability Sources in Student Evaluation of Teaching via Many-Facet Rasch Model. Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi, pages 15-15.
김현정. (2017) Defining the construct of second language speaking ability. Studies in Foreign Language Education 31:1, pages 113-140.
Heejeong Jeong. (2017) Narrative and expository genre effects on students, raters, and performance criteria. Assessing Writing 31, pages 113-125.
Jonathan Trace, Valerie Meier & Gerriet Janssen. (2016) “I can see that”: Developing shared rubric category interpretations through score negotiation. Assessing Writing 30, pages 32-43.
Moonsoo Lee & Dongchun Cha. (2016) A Comparison of Generalizability Theory and Many Facet Rasch Measurement in an Analysis of Mathematics Creative Problem Solving Test. Journal of Curriculum and Evaluation 19:2, pages 251-279.
Dominique Casanova & Marc Demeuse. (2017) Évaluateurs évalués : évaluation diagnostique des compétences en évaluation des correcteurs d’une épreuve d’expression écrite à forts enjeux. Mesure et évaluation en éducation 39:3, pages 59-94.
Fahimeh Marefat & Mojtaba Heydari. (2016) Native and Iranian teachers’ perceptions and evaluation of Iranian students’ English essays. Assessing Writing 27, pages 24-36.
Jinsong Fan & Trevor Bond. (2016) Pacific Rim Objective Measurement Symposium (PROMS) 2015 Conference Proceedings, pages 29-50.
David Coniam & Peter Falvey. (2016) Validating Technological Innovation, pages 123-132.
David Coniam & Peter Falvey. (2016) Validating Technological Innovation, pages 57-77.
Chao Han. (2015) Investigating rater severity/leniency in interpreter performance testing. Interpreting. International Journal of Research and Practice in Interpreting 17:2, pages 255-283.
Shujing Wu & Tongpei Dou. (2015) Validation of an Oral English Test Based on Many-faceted Rasch Model. Journal of Language Teaching and Research 6:4, pages 866.
이은하. (2015) Examining the Rater Reliability of a Writing Performance Assessment in Korean as a Second Language (KSL) for Academic Purposes - A Many-facet Rasch Model Analysis -. KOREAN EDUCATION 103, pages 311-354.
Sun-Young Shin & Doreen Ewert. (2014) What accounts for integrated reading-to-write task scores? Language Testing 32:2, pages 259-281.
이향. (2014) An investigation of the differences between novice teachers, experienced teachers and non-teacher native speakers in evaluation of Korean language learners’ speech. Journal of Korean Language Education 25:4, pages 163-188.
Hui Li & Nuria Lorenzo-Dus. (2014) Investigating how vocabulary is assessed in a narrative task through raters' verbal protocols. System 46, pages 1-13.
Zia Tajeddin & Minoo Alemi. (2013) Criteria and Bias in Native English Teachers’ Assessment of L2 Pragmatic Appropriacy: Content and FACETS Analyses. The Asia-Pacific Education Researcher 23:3, pages 425-434.
Sebastiaan de Klerk, Theo J.H.M. Eggen & Bernard P. Veldkamp. (2014) A blending of computer-based assessment and performance-based assessment: Multimedia-Based Performance Assessment (MBPA). The introduction of a new method of assessment in Dutch Vocational Education and Training (VET). CADMO:1, pages 39-56.
Christian Spoden, Jens Fleischer & Detlev Leutner. (2013) Niedrige Testmodellpassung als Resultat mangelnder Auswertungsobjektivität bei der Kodierung landesweiter Vergleichsarbeiten durch Lehrkräfte [Low Test Model Fit and Teacher Rater Bias—Results from a State-Wide Administered Large-Scale Assessment of Competencies]. Journal für Mathematik-Didaktik 35:1, pages 79-99.
Khaled Barkaoui. (2013) The Companion to Language Assessment, pages 1301-1322.
Chia-Wei Fan, Renée R. Taylor, Elin Ekbladh, Helena Hemmingsson & Jan Sandqvist. (2013) Evaluating the Psychometric Properties of a Clinical Vocational Rehabilitation Outcome Measurement: The Assessment of Work Performance (AWP). OTJR: Occupation, Participation and Health 33:3, pages 125-133.
Soo Jung Youn & James Dean Brown. (2013) Assessing Second Language Pragmatics, pages 98-123.
유경아. (2012) Does Raters’ Rating Experience Influence English-speaking Test Ratings? Studies in English Language & Literature 38:3, pages 243-263.
George Leckie & Jo-Anne Baird. (2011) Rater Effects on Essay Scoring: A Multilevel Analysis of Severity Drift, Central Tendency, and Rater Experience. Journal of Educational Measurement 48:4, pages 399-418.
김보람. (2011) Resolving Discrepant Ratings in Writing Assessments: The Choice of Resolution Method and Its Application. English Teaching 66:2, pages 211-231.
Dominique Casanova & Marc Demeuse. (2014) Analyse des différentes facettes influant sur la fidélité de l’épreuve d’expression écrite d’un test de français langue étrangère. Mesure et évaluation en éducation 34:1, pages 25-53.
David Coniam. (2010) Validating onscreen marking in Hong Kong. Asia Pacific Education Review 11:3, pages 423-431.
Johannes Eckerth & Erwin Tschirner. (2009) Review of recent research (2002–2009) on applied linguistics and language teaching with specific reference to L2 German (part 2). Language Teaching 43:1, pages 38-65.
Xiaoming Xi & Pam Mollaun. (2014) HOW DO RATERS FROM INDIA PERFORM IN SCORING THE TOEFL IBT™ SPEAKING SECTION AND WHAT KIND OF TRAINING HELPS? ETS Research Report Series 2009:2.
Thomas Eckes. (2008) Rater types in writing performance assessments: A classification approach to rater variability. Language Testing 25:2, pages 155-185.
Thomas Eckes. (2006) Multifacetten-Rasch-Analyse von Personenbeurteilungen. Zeitschrift für Sozialpsychologie 37:3, pages 185-195.
