Original Articles

On the Relative Influence of Corpus and Dictionary Size in a Study Using Non-Parallel Corpora

Pages 137-148 | Published online: 09 Aug 2010
 

Abstract

We conducted an experiment on Japanese-to-German translation of two-part compound nouns via their components, using a small dictionary and a large target-language (TL) corpus. As TL translation variants, we considered expressions containing adjectives or genitive adjuncts, as well as diverse forms of the first component of a German compound. Verification in a TL corpus is a good means of deciding, at least, among these forms. To obtain significant statistics from corpora, large quantities of data are important. As parallel data are still quite scarce, using monolingual corpora instead is an option, but it requires a dictionary. In our study, insufficient dictionary size was a much bigger obstacle than corpus size. We tried to quantify the relative influence of the two resources in order to assess system balance. We predict that a medium-sized dictionary of about 100,000 entries would give good coverage of compound noun components.
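The component-wise approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' system: the dictionary entries, the list of linking forms, and the function names `candidates` and `rank_by_corpus` are all assumptions made for illustration, and the sketch covers only variation in the form of the first component, not the adjective or genitive variants.

```python
from collections import Counter
from itertools import product

# Toy bilingual dictionary (assumed data): Japanese component -> German nouns.
DICTIONARY = {
    "経済": ["Wirtschaft"],
    "政策": ["Politik"],
}

# Assumed linking forms appended to the first component of a German compound.
LINKERS = ["", "s", "n", "en"]


def candidates(first, second):
    """Build German compound candidates from the component translations."""
    out = []
    firsts = DICTIONARY.get(first, [])
    seconds = DICTIONARY.get(second, [])
    for a, b in product(firsts, seconds):
        for link in LINKERS:
            out.append(a + link + b.lower())
    return out


def rank_by_corpus(cands, corpus_tokens):
    """Keep candidates attested in the TL corpus, most frequent first."""
    counts = Counter(corpus_tokens)
    attested = [(c, counts[c]) for c in cands if counts[c] > 0]
    return sorted(attested, key=lambda pair: -pair[1])


if __name__ == "__main__":
    # Tiny stand-in for a large monolingual German corpus (assumed data).
    corpus = ["Die", "Wirtschaftspolitik", "der", "Regierung",
              "Wirtschaftspolitik"]
    print(rank_by_corpus(candidates("経済", "政策"), corpus))
```

If either component is missing from the dictionary, no candidates are generated regardless of how large the corpus is, which is one way to picture why dictionary coverage can become the limiting resource.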
