ABSTRACT
The factors that influence rater scoring have been a subject of great interest to researchers in second language assessment. However, research on the impact of test-takers’ speech profiles (e.g., a jagged or a flat profile reflecting analytic subscores) on raters’ scoring behaviors remains limited. To investigate the role of speech profiles in scoring, we collected analytic and holistic rating scores from 28 trained raters as they marked the performances of three groups of speakers with distinct profiles, determined by prior ratings. We tracked the eye movements of eleven of the raters to record how often and how long they looked at the various categories on the rating scales. We found that the raters perceived speakers with better pronunciation as more competent speakers overall. Meanwhile, speakers’ score profiles influenced raters’ attention: raters fixated longer and more often on, and made more eye-visits to, the lexical grammar category while assessing speakers with a jagged profile. Raters spent less time assessing the pronunciation of the speakers who were pre-identified as having better pronunciation. The findings shed light on the impact of speech characteristics on raters’ cognition and score assignments and therefore have important implications for rater training in L2 speaking assessments.
摘要
影响评分者评分的因素一直是二语语言测试学者非常感兴趣的一个主题。然而, 关于应试者的语音特征 (例如, 平衡或非平衡的分解评分子分数) 对评分者评分行为影响的研究仍有待于深入。为了研究语音分数概况在评分中的作用, 我们从 28 位受过培训的评分者那里收集了三组具有不同分数概况的说话者的分解和整体评分分数, 这些分数概况是根据先前的评分确定的。我们追踪了 11 名评分者的眼球运动, 以记录他们查看评分量表上各个评分标准的频率和时长。我们发现, 评分者认为发音更好的说话者是综合口语能力更强的说话者。同时, 说话者的分解分数概况影响了评估者的注意力:评估者在评估具有不平衡分解分数概况的说话者时, 在词汇语法类别上停留的时间更长, 更频繁, 眼看的次数也更多。评估者花费更少的时间来评估被预先确定为具有更好发音的说话者的发音。研究结果揭示了语音特征对评分者打分过程和结果的影响, 对二语口语测试中的评分者培训具有重要意义。
Disclosure statement
No potential conflict of interest was reported by the author(s).
Notes
1 In an exploratory study that aimed to map how fluency scoring, potentially mediated by task design features, impacted final Aptis speaking-test scores, Tavakoli et al. (2017) recorded various fluency metrics (such as those related to speech rate and repairs or repetition) within 32 Aptis test takers’ speech samples. However, the 32 test takers were selected by filtering out of the larger sample anyone who had a jagged score profile. The researchers did not mention how many test takers were filtered out for having a jagged score profile, so readers cannot ascertain how frequent such profiles are within a sample of Aptis test takers. Such information would be useful for understanding the study's findings more broadly, and for better contextualizing the field's more established claims about frequently studied individual speech components, such as fluency.
2 We originally designed the study to focus on the eye-movement behaviors of L1 speakers of English. We did not intend to include L2 English speakers in the eye-tracking data collection phase of the study because previous eye-tracking research has shown that language processing differs between L1 and L2 (e.g., Cop et al., 2017) and that raters from different educational or experience backgrounds may rate differently (Chalhoub-Deville, 1995). However, we did collect eye-movement data from L2 English speakers (international graduate students) because we recruited raters from a university participant pool that required that everyone in the pool be able to participate fully if they wanted to. We did not analyze the eye-tracking data from the L2 English speakers, as they were not part of the original study design, but we include their data in the published data file for this study (if 90% of their eye-movements were recorded) for future analyses and use.
3 According to the manual for Tobii TX-300, the percentage is calculated by dividing the number of eye-tracking samples with usable gaze data that were correctly identified by the number of attempts.
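The calculation described in Note 3, together with the 90% inclusion criterion mentioned in Note 2, can be illustrated with a minimal sketch. The function names, the sample counts, and the reading of the criterion as "at least 90%" are assumptions made for illustration only; they are not taken from the Tobii software or from the study's analysis scripts.

```python
def gaze_sample_percent(valid_samples: int, attempted_samples: int) -> float:
    """Percentage of attempted eye-tracking samples that yielded usable gaze data.

    Hypothetical helper: mirrors the ratio described in Note 3
    (usable samples divided by number of attempts), expressed as a percentage.
    """
    if attempted_samples <= 0:
        raise ValueError("attempted_samples must be positive")
    if not 0 <= valid_samples <= attempted_samples:
        raise ValueError("valid_samples must lie between 0 and attempted_samples")
    return 100.0 * valid_samples / attempted_samples


def meets_inclusion_threshold(percent: float, threshold: float = 90.0) -> bool:
    """Assumed reading of Note 2: include a rater's data at or above 90%."""
    return percent >= threshold


# Illustrative (made-up) counts: 2,700 usable samples out of 3,000 attempts.
percent = gaze_sample_percent(2700, 3000)
print(percent, meets_inclusion_threshold(percent))
```

Under this reading, a recording with 2,700 usable samples out of 3,000 attempts sits exactly at the threshold and would be included, while one with 2,699 usable samples would not.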