
A novel approach to text clustering using genetic algorithm based on the nearest neighbour heuristic

D. Mustafi, A. Mustafi & G. Sahoo
Pages 291-303 | Received 24 Sep 2019, Accepted 13 Feb 2020, Published online: 10 Mar 2020
 

Abstract

In this paper, we propose a novel clustering algorithm that uses a weighted combination of several criteria as its fitness function, and we demonstrate its suitability for clustering text documents. The proposed algorithm leverages the concept of nearest neighbour separation (NNS) to enhance the separation of the clusters, and we outline a heuristic to compute the NNS. The new parameterized fitness function can be tuned to give more weight either to the traditional metrics based on inter- and intra-cluster distances or to the NNS. A genetic algorithm performs the actual clustering, and the results obtained have been compared with those of the traditional K-Means algorithm. The performance of the algorithm has been tested on different standard datasets, and the results are presented.
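The abstract does not give the fitness function itself, so the following is only a minimal sketch of what a tunable, weighted fitness of this kind could look like. Everything in it is an assumption made for illustration: the function name `fitness`, the weight `alpha`, the ratio used for the distance-based term, and in particular the NNS stand-in (the fraction of points whose nearest neighbour lies in the same cluster), which is not the heuristic defined in the paper.

```python
import numpy as np


def fitness(points, labels, alpha=0.5):
    """Hypothetical weighted fitness for a candidate clustering.

    alpha weights a distance-based term; (1 - alpha) weights a
    nearest-neighbour term. Both terms are illustrative assumptions,
    not the definitions used in the paper.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)  # sorted unique cluster ids
    centroids = np.array([points[labels == c].mean(axis=0) for c in clusters])

    # Intra-cluster term: mean distance from each point to its own centroid.
    own = np.searchsorted(clusters, labels)
    intra = np.linalg.norm(points - centroids[own], axis=1).mean()

    # Inter-cluster term: mean distance between distinct centroid pairs.
    k = len(clusters)
    gaps = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    inter = gaps.sum() / (k * (k - 1)) if k > 1 else 0.0

    # Assumed NNS stand-in: share of points whose nearest neighbour
    # carries the same cluster label.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    nearest = dist.argmin(axis=1)
    nns = float((labels[nearest] == labels).mean())

    total = inter + intra
    distance_term = inter / total if total > 0 else 0.0
    return alpha * distance_term + (1 - alpha) * nns


# Two visibly separated groups, labelled sensibly and then badly.
points = [[0, 0], [0, 1], [10, 0], [10, 1]]
good = [0, 0, 1, 1]
bad = [0, 1, 0, 1]
print(fitness(points, good), fitness(points, bad))
```

In a genetic algorithm, each chromosome would encode one candidate assignment such as `good` or `bad`, and selection would favour assignments with higher fitness. Moving `alpha` towards 1 emphasises the inter- and intra-cluster distances, while moving it towards 0 emphasises the nearest-neighbour term.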

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Notes on contributors

D. Mustafi

Debjani Mustafi is affiliated with Birla Institute of Technology, Mesra, India, where she is currently a faculty member in the Department of Computer Science and Engineering. She has authored and co-authored multiple peer-reviewed journal and conference papers. Her research interests include text mining, data analysis and visualization, and evolutionary computing.

A. Mustafi

Abhijit Mustafi is affiliated with Computer Science and Engineering, Birla Institute of Technology, Mesra, India, where he currently serves as Associate Professor. He has authored and co-authored multiple peer-reviewed scientific papers and has presented work at many national and international conferences. He has received several awards and funding over his academic career. His research interests include information retrieval from web corpora, dynamic data visualization, and blind source separation of images.

G. Sahoo

Gadadhar Sahoo received his M.Sc. degree in Mathematics from Utkal University in 1980 and his Ph.D. degree in Computational Mathematics from the Indian Institute of Technology, Kharagpur, in 1987. He is currently working as a Professor in the Department of Computer Science and Engineering. He has approximately 300 publications in national and international journals. His areas of interest include Soft and Evolutionary Computing, Grid Computing, ML, Image Processing, Wireless Sensor Networks, Bio-Informatics, and Cloud Computing.
