Supporting collocation learning with a digital library

Pages 87-110 | Published online: 05 Feb 2010
 

Abstract

Extensive knowledge of collocations is a key factor that distinguishes learners from fluent native speakers. Such knowledge is difficult to acquire simply because there is so much of it. This paper describes a system that exploits the facilities offered by digital libraries to provide a rich collocation-learning environment. The design is based on three processes that have been identified as leading to lexical acquisition: noticing, retrieval and generation. Collocations are automatically identified in input documents using natural language processing techniques. They are used both to enhance the presentation of the documents and as the basis of exercises, produced under teacher control, that amplify students' collocation knowledge. The system uses a corpus of 1.3 billion short phrases drawn from the web, from which 29 million collocations have been automatically identified. It also connects to examples garnered from the live web and the British National Corpus.

Acknowledgements

We gratefully acknowledge the stimulating environment provided by the digital library laboratory at the University of Waikato. This research is funded by the Royal Society of New Zealand Marsden Fund.

Notes

3. These articles are from the University of Waikato Pathway College's IELTS course.

4. We use the OpenNLP tagger, http://opennlp.sourceforge.net

5. This step can be disabled when creating the collection, which might be desirable if collocations are expected to contain neologisms (such as the word google) that do not appear in the British National Corpus and have therefore been omitted from web phrases.

6. The Google n-gram collection is available on six DVDs from http://www.ldc.upenn.edu

7. Of course, the limited context makes this a less reliable, although still useful, procedure.

8. We are implementing further hint options, such as giving the first letter, last letter, or dictionary definition of the target word.

9. Language level metadata can be specified explicitly for each document when the collection is built; if it is not, the Flesch-Kincaid grade level (http://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_test) is used.
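
The Flesch-Kincaid fallback mentioned in note 9 can be illustrated with a minimal Python sketch. The grade-level formula is the standard published one; the sentence splitter, the word pattern and the vowel-group syllable counter are rough assumptions made for illustration, and the function names are hypothetical. Nothing here is taken from the system described in the paper.

```python
import re


def count_syllables(word: str) -> int:
    """Estimate syllables by counting vowel groups (a crude heuristic, assumed here)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Discount a trailing silent 'e' ("make"), but keep the '-le' ending ("apple").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Return the Flesch-Kincaid grade level of an English text."""
    # Assumption: sentences end at '.', '!' or '?'; abbreviations are not handled.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Assumption: a word is a run of letters, optionally with one internal apostrophe.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    # Standard formula: 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


if __name__ == "__main__":
    sample = "Collocations are hard to learn. There are simply too many of them."
    print(round(flesch_kincaid_grade(sample), 2))
```

Because the syllable count is only estimated, scores from this sketch may differ slightly from those of tools that use a pronunciation dictionary; the ordering of documents by difficulty is usually what matters when assigning a language level.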
