Supporting collocation learning with a digital library

Pages 87-110 | Published online: 05 Feb 2010
 

Abstract

Extensive knowledge of collocations is a key factor that distinguishes learners from fluent native speakers. Such knowledge is difficult to acquire simply because there is so much of it. This paper describes a system that exploits the facilities offered by digital libraries to provide a rich collocation-learning environment. The design is based on three processes that have been identified as leading to lexical acquisition: noticing, retrieval and generation. Collocations are automatically identified in input documents using natural language processing techniques. They are then used both to enhance the presentation of the documents and as the basis of exercises, produced under teacher control, that amplify students' collocation knowledge. The system uses a corpus of 1.3 billion short phrases drawn from the web, from which 29 million collocations have been automatically identified. It also connects to examples garnered from the live web and the British National Corpus.

Acknowledgements

We gratefully acknowledge the stimulating environment provided by the digital library laboratory at the University of Waikato. This research is funded by the Royal Society of New Zealand Marsden fund.

Notes

3. These articles are from the University of Waikato Pathway College's IELTS course.

4. We use the OpenNLP tagger, http://opennlp.sourceforge.net

5. This step can be disabled when creating the collection, which might be desirable if collocations are expected to contain neologisms (such as the word google) that do not appear in the British National Corpus and have therefore been omitted from web phrases.

6. The Google n-gram collection is available on six DVDs from http://www.ldc.upenn.edu

7. Of course, the limited context makes this a less reliable, although still useful, procedure.

8. We are implementing further hint options, such as giving the first letter, last letter, or dictionary definition of the target word.

9. Language level metadata can be specified explicitly for each document when the collection is built; if it is not, the Flesch-Kincaid grade level (http://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_test) is used.
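
The fallback in note 9 can be illustrated with a minimal sketch of the standard Flesch-Kincaid grade level formula, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The function names, the regular-expression tokenisation and the vowel-group syllable counter below are assumptions made for illustration; the paper does not say how the system counts sentences, words or syllables, so real scores may differ.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not the paper's method):
    count runs of consecutive vowels, with a minimum of one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Return the Flesch-Kincaid grade level of `text`."""
    # Naive segmentation: sentences end at '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters and apostrophes.
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )
```

Under these assumptions a document without explicit language level metadata would be assigned `flesch_kincaid_grade(document_text)` when the collection is built. The syllable heuristic miscounts words with silent vowels, so the result is best treated as an approximation.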
