Abstract
Clustering search results helps users review the information they have gathered. In this article, we treat the clustering task as indexing the search results, where an index is a structured list of labels that makes both the labels and the search results easier for the user to comprehend. To realize this goal, we make three proposals. The first is to use named entity extraction for term extraction. The second is a new label-selection criterion based on each term's importance in the search results and its relation to the search query. The third is to categorize labels using the category information (person, organization, and so on) that named entity extraction assigns to them. We implemented a prototype system based on these proposals and found that it offers much higher performance than existing methods. Although this article focuses on news articles, the system is not topic-specific.
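The abstract does not spell out the label-selection criterion, so the following is only a rough sketch of how the three proposals could fit together, under loudly stated assumptions. It assumes each search result has already been passed through some named entity extraction tool that returns (surface form, category) pairs, with hypothetical tag names such as `PERSON` and `ORGANIZATION`. The scoring function (fraction of results containing a label, plus how often the label co-occurs with the query) is a hypothetical stand-in, not the criterion proposed in this article. Labels are then grouped by their entity category to form the structured index.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field

# Assumed tag names; only proper-noun categories become labels,
# so numeric expressions (e.g. dates) are discarded.
PROPER_NOUN_TYPES = {"PERSON", "ORGANIZATION", "LOCATION", "ARTIFACT"}


@dataclass
class Result:
    text: str
    # (surface form, category) pairs from a hypothetical NE extraction step
    entities: list = field(default_factory=list)


def build_index(results, query, top_k=3):
    """Return {category: [label, ...]} ranked by a hypothetical score."""
    n = len(results)
    doc_freq = Counter()    # number of results containing the label
    with_query = Counter()  # results containing both the label and the query
    category_of = {}
    q = query.lower()
    for r in results:
        # One category per label per result; keep proper nouns only.
        labels = {t: c for t, c in r.entities if c in PROPER_NOUN_TYPES}
        has_query = q in r.text.lower()
        for text, cat in labels.items():
            if text.lower() == q:
                continue  # the query itself is not an informative label
            doc_freq[text] += 1
            if has_query:
                with_query[text] += 1
            category_of.setdefault(text, cat)

    def score(t):
        # Hypothetical: importance in the results + relation to the query.
        return doc_freq[t] / n + with_query[t] / doc_freq[t]

    ranked = sorted(doc_freq, key=lambda t: (-score(t), t))
    index = defaultdict(list)
    for t in ranked:
        cat = category_of[t]
        if len(index[cat]) < top_k:
            index[cat].append(t)
    return dict(index)


# Toy input, invented purely for illustration.
results = [
    Result("Toyota and Honda report sales in Tokyo",
           [("Toyota", "ORGANIZATION"), ("Honda", "ORGANIZATION"), ("Tokyo", "LOCATION")]),
    Result("Toyota recalls cars", [("Toyota", "ORGANIZATION"), ("2009", "DATE")]),
    Result("Honda CEO speaks in Tokyo about sales",
           [("Honda", "ORGANIZATION"), ("Tokyo", "LOCATION")]),
]
print(build_index(results, "sales"))
# {'ORGANIZATION': ['Honda', 'Toyota'], 'LOCATION': ['Tokyo']}
```

Grouping by entity category is what turns a flat ranked list into a structured index: a reader can scan the organizations, locations, and people related to the query separately. Any real criterion would replace the `score` function.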
Notes
3 Although the named entity extraction tool extracts numeric expressions as well as proper nouns, this article uses only proper nouns (person, organization, location, and artifact names).
4 Our system searches Japanese news articles, whereas Clusty searches English news articles. Therefore, queries run on our system are first translated into Japanese.