Original Articles

An automatic filtering method for field association words by deleting unnecessary words

Pages 247-261 | Received 22 Jun 2005, Accepted 15 Jun 2006, Published online: 17 Feb 2007
 

Abstract

Document classification and summarization are very important for document text retrieval. Humans can generally recognize fields such as ⟨Sports⟩ or ⟨Politics⟩ from specific words in the documents of those fields, called Field Association (FA) words. The traditional method registers misleading, redundant words (unnecessary words) because the quality of the resulting FA words depends on learning data pre-classified by hand. Recall and precision of document classification are therefore degraded if the hand-classified fields are ambiguous. We propose two criteria: deleting unnecessary words with low frequencies, and deleting unnecessary words using category information. Using the proposed criteria, unnecessary words can also be deleted from an FA word dictionary created by the traditional method. Experimental results showed that 25% of 38 372 FA word candidates were identified as unnecessary and deleted automatically when the presented method was used. Furthermore, precision and F-measure improved by 26% and 15%, respectively, compared with the traditional method.
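
The abstract names the two deletion criteria but does not define them, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes a low-frequency cutoff (`min_freq`) and, as a stand-in for "category information", a requirement that most of a word's occurrences fall in a single field (`min_concentration`). The function names, both thresholds, and the toy counts are hypothetical. The F-measure helper uses the standard harmonic mean of precision and recall.

```python
from collections import Counter


def filter_fa_candidates(counts, min_freq=3, min_concentration=0.8):
    """Split FA word candidates into kept and deleted words.

    counts maps each candidate word to a Counter of {field: frequency}.
    Both thresholds are hypothetical; the abstract does not give values.
    """
    kept, deleted = {}, []
    for word, per_field in counts.items():
        total = sum(per_field.values())
        # Assumed criterion 1: delete words seen too rarely overall.
        if total < min_freq:
            deleted.append(word)
            continue
        # Assumed criterion 2: delete words whose occurrences are spread
        # across fields instead of concentrated in one field.
        field, top = per_field.most_common(1)[0]
        if top / total < min_concentration:
            deleted.append(word)
            continue
        kept[word] = field
    return kept, deleted


def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy, made-up counts for illustration only.
toy_counts = {
    "home run": Counter({"Sports": 12}),
    "election": Counter({"Politics": 9, "Sports": 1}),
    "result": Counter({"Sports": 5, "Politics": 5}),
    "zugzwang": Counter({"Sports": 1}),
}
kept, deleted = filter_fa_candidates(toy_counts)
print(kept)
print(deleted)
```

With these toy counts, "zugzwang" falls below the assumed frequency cutoff and "result" is split evenly between two fields, so both would be treated as unnecessary words, while "home run" and "election" remain associated with a single field.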
