International Journal of Computers and Applications Volume 44, 2022 - Issue 3

A novel approach to text clustering using genetic algorithm based on the nearest neighbour heuristic

D. Mustafi, Department of CSE, Birla Institute of Technology, Ranchi, India (corresponding author)
ORCID: https://orcid.org/0000-0002-7055-5031

A. Mustafi, Department of CSE, Birla Institute of Technology, Ranchi, India
ORCID: https://orcid.org/0000-0003-3454-0470

G. Sahoo, Department of CSE, Birla Institute of Technology, Ranchi, India

Pages 291-303 | Received 24 Sep 2019, Accepted 13 Feb 2020, Published online: 10 Mar 2020

https://doi.org/10.1080/1206212X.2020.1735035

References

Abualigah LM, Khader AT, Al-Betar MA, et al. Text feature selection with a robust weight scheme and dynamic dimension reduction to text document clustering. Expert Syst Appl. 2017;84:24–36.
Bharti KK, Singh PK. Chaotic gradient artificial bee colony for text clustering. Soft Comput. 2016;20(3):1113–1126.
Karaa WBA, Ashour AS, Sassi DB. Medline text mining: an enhancement genetic algorithm based approach for document clustering. In: Hassanien AE, Grosan C, Tolba MF, editors. Applications of intelligent optimization in biology and medicine. Springer; 2016. p. 267–287.
Abualigah LM, Khader AT. Unsupervised text feature selection technique based on hybrid particle swarm optimization algorithm with genetic operators for the text clustering. J Supercomput. 2017;73:4773–4795.
Karol S, Mangat V. Evaluation of text document clustering approach based on particle swarm optimization. Open Comput Sci. 2013;3(2):69–90.
Kanungo T, Mount DM, Netanyahu NS, et al. An efficient k-means clustering algorithm: analysis and implementation. IEEE Trans Pattern Anal Mach Intell. 2002;24(7):881–892.
Cutting DR, Karger DR, Pedersen JO. Scatter/gather: a cluster-based approach to browsing large document collections. ACM SIGIR forum. New York, USA, Vol. 51. ACM; 2017. p. 148–159.
Aggarwal CC, Zhai CX. Mining text data. New York, USA: Springer; 2012.
Kim HK, Kim H, Cho S. Bag-of-concepts: comprehending document representation through clustering words in distributed representation. Neurocomputing. 2017;266:336–352.
Vijayarani S, Ilamathi J, Nithya S. Preprocessing techniques for text mining-an overview. Int J Comput Sci Commun Netw. 2015;5(1):7–16.
Chen C-L, Tseng FSC, Liang T. An integration of WordNet and fuzzy association rule mining for multi-label document clustering. Data Knowl Eng. 2010;69(11):1208–1226.
Beil F, Ester M, Xu X. Frequent term-based text clustering. Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Edmonton Alberta Canada, ACM; July, 2002. p. 436–442.
Mugunthadevi K, Punitha SC, Punithavalli M, et al. Survey on feature selection in document clustering. Int J Comput Sci Eng. 2011;3(3):1240–1241.
Li M, Zhang L. Multinomial mixture model with feature selection for text clustering. Knowl Based Syst. 2008;21(7):704–708.
Park S, An DU, Cheon CI. Document clustering method using weighted semantic features and cluster similarity. 2010 Third IEEE International Conference on Digital Game and Intelligent Toy Enhanced Learning. IEEE; Kaohsiung, Taiwan, 2010. p. 185–187.
Grossman DA, Frieder O. Information retrieval: algorithms and heuristics. Vol. 15. New York: Springer Science & Business Media; 2012.
Singh KN, Devi HM, Mahanta AK. Document representation techniques and their effect on the document clustering and classification: a review. Int J Adv Res Comput Sci. 2017;8(5):1780–1784.
Abraham A, Das S, Konar A. Document clustering using differential evolution. IEEE Congress on Evolutionary Computation, Vancouver, Canada, CEC 2006. IEEE; 2006. p. 1784–1791.
Bisht S, Paul A. Document clustering: a review. Int J Comput Appl. 2013;73(11):26–33.
Abualigah LM, Khader AT, Hanandeh ES. A combination of objective functions and hybrid krill herd algorithm for text document clustering analysis. Eng Appl Artif Intell. 2018;73:111–125.
Alshamiri AK, Singh A, Surampudi BR. Artificial bee colony algorithm for clustering: an extreme learning approach. Soft Comput. 2016;20(8):3163–3176.
Forsati R, Mahdavi M, Shamsfard M, et al. Efficient stochastic algorithms for document clustering. Inf Sci. 2013;220:269–291.
Ranjan R, Sahoo G. A new clustering approach for anomaly intrusion detection. arXiv preprint arXiv:1404.2772; 2014.
Li M, Deng S, Wang L, et al. Hierarchical clustering algorithm for categorical data using a probabilistic rough set model. Knowl Based Syst. 2014;65:60–71.
Celebi ME, Kingravi HA, Vela PA. A comparative study of efficient initialization methods for the k-means clustering algorithm. Expert Syst Appl. 2013;40(1):200–210.
Kim Y, Shim K, Kim M-S, et al. DBCURE-MR: an efficient density-based clustering algorithm for large data using MapReduce. Inf Syst. 2014;42:15–35.
Popat SK, Emmanuel M. Review and comparative study of clustering techniques. Int J Comput Sci Inf Technol. 2014;5(1):805–812.
Duwairi R, Abu-Rahmeh M. A novel approach for initializing the spherical k-means clustering algorithm. Simul Model Pract Theory. 2015;54:49–63.
Han J, Kamber M, Pei J. Data mining: concepts and techniques. Waltham, USA: Morgan Kaufmann; 2006.
Cimiano P, Hotho A, Staab S. Comparing conceptual, divisive and agglomerative clustering for learning taxonomies from text. ECAI, Valencia, Spain. Vol. 16; 2004. p. 435.
Xu R, Xu J, Wunsch DC. Clustering with differential evolution particle swarm optimization. 2010 IEEE Congress on Evolutionary Computation (CEC), Barcelona, Spain. IEEE; 2010. p. 1–8.
Nanda SJ, Panda G. A survey on nature inspired metaheuristic algorithms for partitional clustering. Swarm Evol Comput. 2014;16:1–18.
Cui X, Potok TE. Document clustering analysis based on hybrid PSO+K-means algorithm. J Comput Sci. 2005;27:33.
Guha S, Mishra N. Clustering data streams. In: Data stream management. Berlin, Heidelberg: Springer; 2016. p. 169–187.
Wang J, Su X. An improved k-means clustering algorithm. IEEE 3rd International Conference on Communication Software and Networks (ICCSN). IEEE; Xi'an, China 2011. p. 44–46.
Velmurugan T. Performance based analysis between k-means and fuzzy c-means clustering algorithms for connection oriented telecommunication data. Appl Soft Comput. 2014;19:134–146.
Pujari AK. Data mining techniques. Hyderabad, India: Universities Press; 2001.
Firdaus S, Uddin A. A survey on clustering algorithms and complexity analysis. Int J Comput Sci Issues. 2015;12(2):62.
Banerjee A, Merugu S, Dhillon IS, et al. Clustering with Bregman divergences. J Mach Learn Res. 2005;6:1705–1749.
Chawla S, Gionis A. k-means--: a unified approach to clustering and outlier detection. Proceedings of the 2013 SIAM International Conference on Data Mining. SIAM; Austin, Texas, 2013. p. 189–197.
Song W, Li CH, Park SC. Genetic algorithm for text clustering using ontology and evaluating the validity of various semantic similarity measures. Expert Syst Appl. 2009;36(5):9095–9104.
Song W, Qiao Y, Park SC, et al. A hybrid evolutionary computation approach with its application for optimizing text document clustering. Expert Syst Appl. 2015;42(5):2517–2524.
Sridhar S, Dunham MH. Data mining, introduction and advanced topics. New Delhi, India: Prentice Hall Publication; 2013.
Saravanan D, Srinivasan S. Video data mining information retrieval using birch clustering technique. Artificial Intelligence and Evolutionary Algorithms in Engineering Systems. Springer; Kumaracoil, India, 2015. p. 583–594.
Mansoori EG. Gach: a grid-based algorithm for hierarchical clustering of high-dimensional data. Soft Comput. 2014;18(5):905–922.
Ferrari DG, De Castro LN. Clustering algorithm selection by meta-learning systems: a new distance-based problem characterization and ranking combination methods. Inf Sci. 2015;301:181–194.
Xu D, Tian Y. A comprehensive survey of clustering algorithms. Ann Data Sci. 2015;2(2):165–193.
Agnihotri D, Verma K, Tripathi P. Pattern and cluster mining on text data. 2014 Fourth International Conference on Communication Systems and Network Technologies (CSNT). IEEE; Bhopal, India 2014. p. 428–432.
Sneath P, Sokal R. Unweighted pair group method with arithmetic mean. In: Numerical taxonomy. 1973. Springer, Berlin, p. 230–234.
Wang X, Qian B, Davidson I. On constrained spectral clustering and its applications. Data Min Knowl Discov. 2014;28:1–30.
Peng B, Zhang L, Zhang D. A survey of graph theoretical approaches to image segmentation. Pattern Recognit. 2013;46(3):1020–1038.
Karaboga D, Ozturk C. A novel clustering approach: artificial bee colony (ABC) algorithm. Appl Soft Comput. 2011;11(1):652–657.
Forsati R, Keikha A, Shamsfard M. An improved bee colony optimization algorithm with an application to document clustering. Neurocomputing. 2015;159:9–26.
Gomaa WH, Fahmy AA. A survey of text similarity approaches. Int J Comput Appl. 2013;68(13):13–18.
Maroosi A, Amiri B. A new clustering algorithm based on hybrid global optimization based on a dynamical systems approach algorithm. Expert Syst Appl. 2010;37(8):5645–5652.
Xiang S, Nie F, Zhang C. Learning a Mahalanobis distance metric for data clustering and classification. Pattern Recognit. 2008;41(12):3600–3612.
Bharti KK, Singh PK. Hybrid dimension reduction by integrating feature selection with feature extraction method for text clustering. Expert Syst Appl. 2015;42(6):3105–3114.
Das S, Abraham A, Konar A. Automatic clustering using an improved differential evolution algorithm. IEEE Trans Syst Man Cybernet A. 2008;38(1):218–237.
Williams RJ, Volberg RA. The classification accuracy of four problem gambling assessment instruments in population research. Int Gambl Stud. 2014;14(1):15–28.
Ma B, Yuan H, Wu Y. Exploring performance of clustering methods on document sentiment analysis. J Inf Sci. 2017;43(1):54–74.
Hai Z, Chang K, Kim J-J, et al. Identifying features in opinion mining via intrinsic and extrinsic domain relevance. IEEE Trans Knowl Data Eng. 2014;26(3):623–634.
Yau C-K, Porter A, Newman N, et al. Clustering scientific documents with topic modeling. Scientometrics. 2014;100(3):767–786.
Jin C, Jin S. Automatic image annotation using feature selection based on improving quantum particle swarm optimization. Signal Processing. 2015;109:172–181.
Manning CD, Raghavan P, Schütze H. Introduction to information retrieval. Vol. 1. Cambridge: Cambridge University Press; 2008.
Mustafi D, Sahoo G, Mustafi A. A multi criteria document clustering approach using genetic algorithm. In: Computational intelligence in data mining. Vol. 1. Springer; New Delhi, 2016. p. 237–247.
Munková D, Munk M, Vozár M. Data pre-processing evaluation for text mining: transaction/sequence model. Procedia Comput Sci. 2013;18:1198–1207.
Wei T, Lu Y, Chang H, et al. A semantic approach for text clustering using WordNet and lexical chains. Expert Syst Appl. 2015;42(4):2264–2275.
Witten IH, Frank E, Hall MA. Data mining: practical machine learning tools and techniques. San Francisco, US: Morgan Kaufmann; 2016.
Dagher GG, Fung BCM. Subject-based semantic document clustering for digital forensic investigations. Data Knowl Eng. 2013;86:224–241.
Feinerer I, Buchta C, Geiger W, et al. The textcat package for n-gram based text categorization in R. J Stat Softw. 2013;52(6):1–17.
Graovac J. A variant of n-gram based language-independent text categorization. Intell Data Anal. 2014;18(4):677–695.
Salton G, Wong A, Yang C-S. A vector space model for automatic indexing. Commun ACM. 1975;18(11):613–620.
Nasir JA, Varlamis I, Karim A, et al. Semantic smoothing for text clustering. Knowl Based Syst. 2013;54:216–229.
Grossman DA. Information retrieval: algorithms and heuristics. Vol. 15. New York: Springer; 2004.
Haddi E, Liu X, Shi Y. The role of text pre-processing in sentiment analysis. Procedia Comput Sci. 2013;17:26–32.
Shlens J. A tutorial on principal component analysis. Vol. 82. San Diego: Systems Neurobiology Laboratory, University of California at San Diego; 2005.
Chua FCT. Dimensionality reduction and clustering of text documents. Singapore: Singapore Management University; 2009.
Deerwester S, Dumais ST, Furnas GW, et al. Indexing by latent semantic analysis. J Am Soc Inf Sci. 1990;41(6):391.
Rana C, Jain SK. An evolutionary clustering algorithm based on temporal features for dynamic recommender systems. Swarm Evol Comput. 2014;14:21–30.
Aliguliyev RM. Clustering techniques and discrete particle swarm optimization algorithm for multi-document summarization. Comput Intell. 2010;26(4):420–448.
Maulik U, Bandyopadhyay S. Genetic algorithm-based clustering technique. Pattern Recognit. 2000;33(9):1455–1465.
Premalatha K, Natarajan AM. Genetic algorithm for document clustering with simultaneous and ranked mutation. Mod Appl Sci. 2009;3(2):75–82.
Denoeux T, Kanjanatarakul O, Sriboonchitta S. Ek-nnclus: a clustering procedure based on the evidential k-nearest neighbor rule. Knowl Based Syst. 2015;88:57–69.
Chou C-H, Hsieh Y-Z, Su M-C, et al. A new measure of cluster validity using line symmetry. J Inf Sci Eng. 2014;30(2):443–461.
Voorhees EM. The cluster hypothesis revisited. ACM SIGIR forum. Japan, Vol. 51. ACM; 2017. p. 35–43.
Caballero R, Laguna M, Martí R, et al. Multiobjective clustering with metaheuristic optimization technology. Valencia, España: Departamento de Estadística e Investigación Operativa, Universidad de Valencia; 2006. (Reporte Técnico).
Arbelaitz O, Gurrutxaga I, Muguerza J, et al. An extensive comparative study of cluster validity indices. Pattern Recognit. 2013;46(1):243–256.
