A novel approach to text clustering using genetic algorithm based on the nearest neighbour heuristic

Pages 291-303 | Received 24 Sep 2019, Accepted 13 Feb 2020, Published online: 10 Mar 2020

