Figures & data
Figure 1. Percentage of citations of Europarl according to field of study. Total number of citations = 1,000.
![Figure 1. Percentage of citations of Europarl according to field of study. Total number of citations = 1,000.](/cms/asset/9b17d84d-50f5-463a-b6e2-603512e563a8/rmps_a_1485716_f0001_oc.jpg)
Table 1. Samples of the German, English and Spanish monolingual Europarl files ep-11-05-09-018.txt (= proceedings of chapter 18 from May 9, 2011). Information about speaker and original language in metadata tags.
Table 2. Examples of inconsistent and incorrectly encoded source language identifiers in Europarl source files.
Table 3. Examples of inconsistencies in speaker names that require normalisation prior to statement matching.
Table 4. Number of Europarl statements yielded by source language identification and speaker matching procedure.
Table 5. Total sizes of extracted subcorpora in tokens. For parallel corpora, only target language tokens are counted.
Supplemental Material
Comma-Separated Values File (239.6 KB)
Comma-Separated Values File (1.8 KB)
Data availability statement
The data extracted with the tool presented in this article are available in Zenodo at https://zenodo.org/record/1066474#.WnnEM3wiHcs (DOI: 10.5281/zenodo.1066473) and https://zenodo.org/record/1066472#.WnnEYXwiHcs (DOI: 10.5281/zenodo.106647). These data were derived from the European Parliament Proceedings Parallel Corpus (www.statmt.org/europarl). The data supporting the bibliometric analyses presented in this article are available within the supplementary materials.