Parallel subtitle corpora and their applications in machine translation and translatology

Bywood, L., Volk, M., Fishel, M. and Georgakopoulou, P. 2013. Parallel subtitle corpora and their applications in machine translation and translatology. Perspectives: Studies in Translatology. 21 (4), pp. 595-610. https://doi.org/10.1080/0907676X.2013.831920

Title: Parallel subtitle corpora and their applications in machine translation and translatology
Authors: Bywood, L., Volk, M., Fishel, M. and Georgakopoulou, P.
Abstract

SUMAT is a project funded through the EU ICT Policy Support Programme (2011–2014). It involves four subtitling companies (InVision, DDS, Titelbild, VSI) and five technical partners (ALS, ATC, TextShuttle, University of Maribor, Vicomtech).

For the SUMAT project, translated subtitles for seven language pairs have been collected. Four subtitling companies have contributed to this effort, which has so far resulted in collections numbering between 200,000 and 2 million subtitles per language pair. This paper describes the process of converting, classifying and aligning the subtitles. Conversion to a common text format and cross-language alignment were automatically done, using specially built converters, whilst classifying the subtitles according to text genre was a manual process, performed by the teams harvesting the subtitles.

The resulting subtitle corpora are perfectly suited for various applications. The focus of the SUMAT project is to use them as training material for statistical machine translation systems, and this paper will report on the initial experiences with some of the language pairs. In addition, the parallel corpora may serve as input data for parallel concordancing systems. As part of the project, a small prototype has been built which shows how word-aligned parallel subtitles offer new insights for translation science.

Journal: Perspectives: Studies in Translatology
Journal citation: 21 (4), pp. 595-610
ISSN: 0907-676X
Year: 2013
Publisher: Routledge
Digital Object Identifier (DOI): https://doi.org/10.1080/0907676X.2013.831920
Publication dates
Published: 19 Sep 2013

Related outputs

Audiovisual Translation: The Road Ahead
Nikolic, K. and Bywood, L. 2021. Audiovisual Translation: The Road Ahead. Journal of Audiovisual Translation. 4 (1), pp. 50-70. https://doi.org/10.47476/jat.v4i1.2021.156

Lindsay Bywood interviews Carol Robertson on her experience of the early days of subtitling at the BBC
Bywood, L. 2020. Lindsay Bywood interviews Carol Robertson on her experience of the early days of subtitling at the BBC. JoSTrans - The Journal of Specialised Translation. 34.

Book review: Sanderson, John D. and Carla Botella-Tejera (eds) (2018) Focusing on Audiovisual Translation Research
Bywood, L. 2020. Book review: Sanderson, John D. and Carla Botella-Tejera (eds) (2018) Focusing on Audiovisual Translation Research. JoSTrans - The Journal of Specialised Translation. 34.

Technology and Audiovisual Translation
Bywood, L. 2020. Technology and Audiovisual Translation. in: Bogucki, Łukasz and Deckert, Mikołaj (eds.) The Palgrave Handbook of Audiovisual Translation and Media Accessibility. Palgrave Macmillan. pp. 503-517.

Post-Editing in Practice: Process, Product and Networks
Nunes Vieira, L., Alonso, E. and Bywood, L. 2019. Post-Editing in Practice: Process, Product and Networks. JoSTrans - The Journal of Specialised Translation. 31.

Testing the retranslation hypothesis for audiovisual translation: the films of Volker Schlöndorff subtitled into English
Bywood, L. 2019. Testing the retranslation hypothesis for audiovisual translation: the films of Volker Schlöndorff subtitled into English. Perspectives. 27 (6), pp. 815-832. https://doi.org/10.1080/0907676x.2019.1593467

Embracing the threat: machine translation as a solution for subtitling
Bywood, L., Etchegoyhen, T. and Georgakopoulou, P. 2017. Embracing the threat: machine translation as a solution for subtitling. Perspectives: Studies in Translatology. 25 (3), pp. 492-508. https://doi.org/10.1080/0907676X.2017.1291695

Machine translation quality in an audiovisual context
Burchardt, A., Lommel, A., Bywood, L., Harris, K. and Popovic, M. 2016. Machine translation quality in an audiovisual context. Target. 28 (2), pp. 206-221. https://doi.org/10.1075/target.28.2.03bur

Book review: 'In translation: translators on their work and what it means' edited by Esther Allen and Susan Bernofsky (2013)
Bywood, L. 2014. Book review: 'In translation: translators on their work and what it means' edited by Esther Allen and Susan Bernofsky (2013). JoSTrans - The Journal of Specialised Translation. 21, pp. 206-208.

Machine translation for subtitling: a large-scale evaluation
Bywood, L., Etchegoyhen, T., Georgakopoulou, P., Fishel, M., Jiang, J., Loenhout, G., Pozo, A., Turner, A., Volk, M. and Maucec, M. 2014. Machine translation for subtitling: a large-scale evaluation. LREC 2014, Ninth International Conference on Language Resources and Evaluation. Harpa Concert Hall and Conference Center, Reykjavik, Iceland 26 May 2014

MT in subtitling and the rising profile of the post-editor
Georgakopoulou, P. and Bywood, L. 2014. MT in subtitling and the rising profile of the post-editor. Multilingual. 25 (1), pp. 24-28.

SUMAT: an online service for subtitling by machine translation
Bywood, L., Georgakopoulou, P., Etchegoyhen, T., Fishel, M., Jiang, J., Loenhout, G., Pozo, A., Spiliotopoulos, D., Turner, A. and Maucec, M. 2013. SUMAT: an online service for subtitling by machine translation. Machine Translation (MT) Summit XIV. ACROPOLIS Conference Centre, Nice, France 02 Sep 2013

Embracing the threat: machine translation as a solution
Bywood, L., Georgakopoulou, P. and Etchegoyhen, T. 2013. Embracing the threat: machine translation as a solution. Subtitling: a collective approach. University of Nottingham, Nottingham, UK 12 Jul 2013

Permalink - https://westminsterresearch.westminster.ac.uk/item/99828/parallel-subtitle-corpora-and-their-applications-in-machine-translation-and-translatology

