The volume of electronic text in different languages, particularly on the World Wide Web, is growing significantly, and users who can read only a limited number of languages increasingly face the problem of obtaining information from this text. This article investigates some of the issues involved in achieving multilingual information extraction (IE), describes the approach adopted in the M-LaSIE-II IE system to address these issues, and presents the results of evaluating that approach against a small parallel corpus of English/French newswire texts. The approach rests on the assumption that a language-independent representation of the concepts relevant to a domain can be constructed, at least for the small, well-defined domains typical of IE tasks, so that multilingual IE can be carried out successfully without requiring full machine translation.
Using a language independent domain model for multilingual information extraction