Parallel subtitle corpora and their applications in machine translation and translatology

Pages 595-610 | Received 31 Oct 2012, Accepted 29 May 2013, Published online: 19 Sep 2013
 

Abstract

SUMAT is a project funded through the EU ICT Policy Support Programme (2011–2014). It involves four subtitling companies (InVision, DDS, Titelbild, VSI) and five technical partners (ALS, ATC, TextShuttle, University of Maribor, Vicomtech).

For the SUMAT project, translated subtitles for seven language pairs have been collected. Four subtitling companies have contributed to this effort, which has so far resulted in collections numbering between 200,000 and 2 million subtitles per language pair. This paper describes the process of converting, classifying and aligning the subtitles. Conversion to a common text format and cross-language alignment were automatically done, using specially built converters, whilst classifying the subtitles according to text genre was a manual process, performed by the teams harvesting the subtitles.

The resulting subtitle corpora are perfectly suited for various applications. The focus of the SUMAT project is to use them as training material for statistical machine translation systems, and this paper will report on the initial experiences with some of the language pairs. In addition, the parallel corpora may serve as input data for parallel concordancing systems. As part of the project, a small prototype has been built which shows how word-aligned parallel subtitles offer new insights for translation science.

Acknowledgements

The work leading to these results has received funding from the European Community under grant agreement no. 270919.

Notes

11. For more information on the technique of using ‘template’ files for subtitling see Georgakopoulou (Citation2003, pp. 210–221).

15. The corpus is called WIT3 and can be downloaded from wit3.fbk.eu/mt.php?release=2012-01

18. This sentence is found at the bottom of every search results page on www.glosbe.com
