The design and evaluation of a Statistical Machine Translation syllabus for translation students

Pages 295-315 | Received 11 Sep 2013, Accepted 10 Apr 2014, Published online: 27 Aug 2014
 

Abstract

Despite the acknowledged importance of translation technology in translation studies programmes and the current ascendancy of Statistical Machine Translation (SMT), there has been little reflection to date on how SMT can or should be integrated into the translation studies curriculum. In a companion paper we set out a rationale for including a holistic SMT syllabus in the translation curriculum. In this paper, we show how the priorities and aspirations articulated in that source can be operationalised in the translation technology classroom and lab. We draw on our experience of designing and evaluating an SMT syllabus for a cohort of postgraduate student translators at Dublin City University in 2012. In particular, we report on data derived from a mixed-methods approach that aims to capture the students’ view of the syllabus and their self-assessment of their own learning. Using the construct of self-efficacy, we show significant increases in students’ knowledge of and confidence in using machine translation in general and SMT in particular, after completion of teaching units in SMT. We report on additional insights gleaned from student assignments, and conclude with ideas for future refinements of the syllabus.

Acknowledgements

The authors would like to thank all the students who participated, as well as Prof. Andy Way and Dr Jie Jiang, who supported us in our use of Cloud-based SMT throughout this project, and the two anonymous reviewers who provided valuable feedback on this paper.

Notes

1. Interested readers may contact the authors for more information on course content.

2. The MA and the MSc overlap to a large extent but differ in two respects. MSc students are required to study a programming language (Java), whereas MA students are not. Conversely, MA students must complete specialised, language-pair-specific translation practice modules, whereas MSc students are not obliged to do so.

3. See Doherty and Moorkens (2013) for a photograph of this lab set-up.

4. The programmes in question are each worth 90 ECTS credits, with 60 credits allocated to the taught course and a further 30 to a research dissertation. For details of the ECTS system, see http://ec.europa.eu/education/tools/ects_en.htm. (All web links in this paper were last accessed in March 2014.)

5. The DGT-TM is accessible from http://ipsc.jrc.ec.europa.eu/index.php?id=197. It contains the entire body of European legislation, comprising all the treaties, regulations and directives adopted by the European Union.

6. Since 2012 a small amount of material for a twenty-third language, Irish or Gaeilge, has been added to the DGT-TM.

7. In practice, many SMT engines are trained on corpora of 200,000 high-quality aligned translation units in specialised domains while others, e.g. Google Translate, may employ millions of translation units and have wider coverage.

8. This ‘burden’ should not be exaggerated, however. While increasing the quantity of data used to train an engine inevitably slows down the training process, many students dealt effectively with such restrictions by leaving engines to train overnight. Moreover, as the service is based in the Cloud, students can access it off-campus and even via smartphones.

9. When the syllabus was re-delivered in 2013 (a process not reported on in detail here), the platform used by the vast majority of students was KantanMT.com.

10. TMX files contain source-language units aligned with their target-language translations, along with any relevant metadata (e.g. the date the ‘translation unit’ was created, the name of the client, etc.).

11. Given the dynamic nature of such websites, readers are advised to type ‘edit distance calculator’, ‘General Text Matcher’ or ‘Levenshtein distance calculator’ into their search engines to find instances of such services.

12. Asiya-Online is available from http://asiya.lsi.upc.edu/demo/asiya_online.php.

13. This outcome underlines how difficult it is to move from linguistic diagnoses of problems in SMT output to linguistic interventions in source texts that can help solve those problems (see Kenny and Doherty, this issue).

14. When we re-implemented our SMT syllabus in 2013, the interventions were distributed as follows: bilingual training data added: 27; glossary uploaded: 18; target-language training data added: 12; source text edited: 11; post-edits made: 5. The average number of interventions made per student rose to 2.5, with 29 students making 73 interventions between them.

15. The questionnaire item here was: ‘How much time, on average, do you spend using a computer per week?’ The scale presented to respondents was: 1 = 0 hrs, 2 = 1 to 10 hrs, 3 = 11 to 19 hrs, 4 = 20 to 39 hrs, 5 = 30+ hrs. Results at t1 were as follows: median = 4, mean = 4.20, SD = .71. At t2 the mean had decreased very slightly (median = 4, mean = 4.15, SD = .88). Unsurprisingly, a repeated measures t-test found no significant change between the two time points (t = .438, df = 19, p = .666).

16. Item: ‘How would you rate your knowledge of translation memories (TMs)?’ Scale: 1 = poor, 2 = below average, 3 = average, 4 = above average, 5 = excellent. Results at t1: median = 3, mean = 3.20, SD = .81. Results at t2: median = 4, mean = 3.50, SD = .69. A repeated measures t-test found no significant change between the two time points (t = -1.831, df = 19, p = .0828).

17. Item: ‘How would you rate your knowledge of machine translation (MT)?’ Scale: 1 = poor, 2 = below average, 3 = average, 4 = above average, 5 = excellent. Results at t1: median = 3, mean = 2.80, SD = .616. Results at t2: median = 4, mean = 3.45, SD = .686. A repeated measures t-test found a significant increase for this item (t = -3.322, df = 19, p = .004).

18. This is one reason why Google Translate could not be used to deliver the syllabus described here.

19. For example, DGT-TM files are ‘zipped’ (that is, individual .tmx files are compressed and then grouped into archives), but students do not need to ‘unzip’ them as the bilingual translation memory extraction tool provided by the Joint Research Centre (see above; and Steinberger et al. 2012) can extract bilingual .tmx files directly from zipped files.

20. This comment relates to output from a Japanese–English SMT engine that was trained on inadequate data.

21. We are grateful, in particular, to colleagues on the OPTIMALE project (http://www.translator-training.eu/) who have shared similar experiences with us.

22. See the COSTA MT Evaluation Tool: An Open Toolkit for Human Machine Translation Evaluation, https://code.google.com/p/costa-mt-evaluation-tool/.

23. In order to protect the anonymity of questionnaires, we elicited information on students’ language pairs independently.
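
As an illustration of the kind of measure computed by the ‘Levenshtein distance calculator’ services mentioned in note 11, the following is a minimal sketch of character-level Levenshtein (edit) distance. It is not the implementation used by any particular online service or by the syllabus itself; the function name `levenshtein` and the example strings are hypothetical and chosen purely for illustration.

```python
def levenshtein(source: str, target: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions and substitutions needed to turn source into target."""
    # previous[j] is the distance between the source prefix processed
    # so far and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # turning i source characters into "" takes i deletions
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


# Illustrative strings only (not data from the study).
print(levenshtein("kitten", "sitting"))  # 3
```

In a post-editing context, `source` might hold raw SMT output and `target` its post-edited version, so that a lower distance suggests less editing effort. Word-level variants apply the same recurrence to lists of tokens rather than to characters.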
