Can automated machine translation evaluation metrics be used to assess students’ interpretation in the language learning classroom?

Chao Han & Xiaolei Lu

Abstract

The use of translation and interpreting (T&I) in the language learning classroom is commonplace, serving various pedagogical and assessment purposes. Previous utilization of T&I exercises has been driven largely by their potential to enhance language learning, whereas the latest trend has begun to underscore T&I as a crucial skill to be acquired as part of transcultural competence for language learners and future language users. Despite their growing popularity and utility in the language learning classroom, assessing T&I is time-consuming, labor-intensive and cognitively taxing for human raters (e.g., language teachers), primarily because T&I assessment entails meticulous evaluation of informational equivalence between the source-language message and target-language renditions. One possible solution is to rely on automated quality metrics that were originally developed to evaluate machine translation (MT). In the current study, we investigated the viability of using four automated MT evaluation metrics, BLEU, NIST, METEOR and TER, to assess human interpretation. Essentially, we correlated the automated metric scores with the human-assigned scores (i.e., the criterion measure) from multiple assessment scenarios to examine the degree of machine-human parity. Overall, we observed fairly strong metric-human correlations for BLEU (Pearson’s r = 0.670), NIST (r = 0.673) and METEOR (r = 0.882), especially when the metric computation was conducted on the sentence level rather than the text level. We discussed these emerging findings and others in relation to the feasibility of operationalizing MT metrics to evaluate students’ interpretation in the language learning classroom.

Supplemental data for this article is available online at https://doi.org/10.1080/09588221.2021.1968915.
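
To make the metric-human comparison described in the abstract concrete, the following minimal sketch (not the authors’ actual pipeline) shows how sentence-level BLEU scores might be computed for student renditions and correlated with human-assigned scores. The renditions, references, human scores and library choices (NLTK, SciPy) are illustrative assumptions.

```python
# Minimal sketch: sentence-level BLEU scores correlated with human scores.
# All data below are hypothetical placeholders, not the study's materials.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from scipy.stats import pearsonr

# Hypothetical tokenized data: one reference and one student rendition per
# sentence, plus a human-assigned score (e.g., on a 0-100 scale) for each.
references = [
    ["the", "economy", "grew", "rapidly", "last", "year"],
    ["officials", "announced", "the", "new", "policy", "today"],
    ["exports", "fell", "by", "five", "percent", "in", "march"],
]
candidates = [
    ["the", "economy", "expanded", "quickly", "last", "year"],
    ["officials", "announced", "a", "new", "policy", "today"],
    ["exports", "dropped", "five", "percent", "in", "march"],
]
human_scores = [72.0, 85.0, 80.0]

smooth = SmoothingFunction().method1  # avoids zero scores for short segments

# One metric score per sentence (segment-level scoring).
metric_scores = [
    sentence_bleu([ref], cand, smoothing_function=smooth)
    for ref, cand in zip(references, candidates)
]

# Metric-human correlation, analogous to the Pearson's r reported in the study.
r, p = pearsonr(metric_scores, human_scores)
print(f"Pearson's r = {r:.3f} (p = {p:.3f})")
```

The same correlational logic would apply to NIST, METEOR or TER scores, provided each metric is computed at the same level (sentence or text) as the human scores it is compared against.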

Disclosure statement

No potential conflict of interest was reported by the authors.

Funding

This study was supported by the National Education Examinations Authority and British Council English Assessment Research Grant and the Fundamental Research Funds for the Central Universities (no. 2072021116).

Notes on contributors

Dr. Chao Han’s research interests include testing and assessment issues in translation and interpreting (T&I), pedagogically-oriented T&I studies and methodological aspects of T&I research.

Dr. Xiaolei Lu’s research interests include corpus processing, translation technology and automated translation/interpreting assessment.

Notes

1 In MT, the quality estimation (QE) methodology can be used to estimate MT quality without recourse to target-language references. However, applying QE to evaluate T&I in the language learning classroom is, at present, only remotely possible, due to its sophisticated computation.

2 Please also refer to Han et al. (2021), from which our data were derived and in which the rater-generated scores based on the source- and target-language references were compared to examine human raters’ scoring patterns.

3 The undergraduate English majors selected interpreting as their optional course, while the postgraduate students majored in English language and literature with a special focus on English-Chinese interpreting.

4 TEM-4 and TEM-8 are two primary English proficiency tests developed specifically for English majors in China’s mainland.

5 In hindsight, we also calculated a new document-level metric, Corpus BLEU, which is based on micro-averaged n-gram precision computed at the corpus level (an illustrative sketch contrasting the two aggregation strategies follows these notes). The Corpus BLEU metric computed on the basis of one reference text had relatively strong correlations with the human rater-generated scores in each of the four conditions: r = 0.737 for the target-text, sentence-level condition; r = 0.723 for the target-text, text-level condition; r = 0.716 for the source-text, sentence-level condition; and r = 0.728 for the source-text, text-level condition. These coefficients were slightly larger (by approximately 0.05) than the correlation coefficients associated with the BLEU scores previously computed and reported in Table 5. This means that the previous text-level BLEU scores are largely comparable with the new document-level Corpus BLEU scores.

6 https://github.com/luxiaolei930/CALL-Metrics-Sentence-level-scoring
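
As an illustration of the distinction drawn in note 5, the following sketch (again using hypothetical data, not the study’s materials) contrasts document-level Corpus BLEU, which pools n-gram statistics over all segments (a micro-average), with a simple macro-average of sentence-level BLEU scores computed with NLTK.

```python
# Minimal sketch: document-level Corpus BLEU (micro-average) versus the mean
# of sentence-level BLEU scores (macro-average). Data are hypothetical.
from nltk.translate.bleu_score import corpus_bleu, sentence_bleu, SmoothingFunction

references = [
    [["the", "economy", "grew", "rapidly", "last", "year"]],
    [["officials", "announced", "the", "new", "policy", "today"]],
]
candidates = [
    ["the", "economy", "expanded", "quickly", "last", "year"],
    ["officials", "announced", "a", "new", "policy", "today"],
]

smooth = SmoothingFunction().method1

# Micro-average: n-gram counts are pooled across all segments before the
# precision and brevity penalty are computed.
doc_bleu = corpus_bleu(references, candidates, smoothing_function=smooth)

# Macro-average: each segment is scored separately, then the scores are averaged.
segment_bleus = [
    sentence_bleu(refs, cand, smoothing_function=smooth)
    for refs, cand in zip(references, candidates)
]
macro_bleu = sum(segment_bleus) / len(segment_bleus)

print(f"Corpus BLEU (micro-average): {doc_bleu:.3f}")
print(f"Mean sentence BLEU (macro-average): {macro_bleu:.3f}")
```

Because the two aggregation strategies weight segments differently, their scores can diverge when segment lengths or quality vary, which is why the comparability reported in note 5 is worth checking empirically.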
