Original Articles

Parallel and distributed clustering framework for big spatial data mining

Pages 671–689 | Received 10 Oct 2017, Accepted 23 Feb 2018, Published online: 16 Mar 2018

References

  • Big data and analytics builds the foundation for cognitive; 2017. Available from: http://www.idc.com/prodserv/4Pillars/bigdata
  • Han J, Pei J, Kamber M. Introduction. In: Data mining: concepts and techniques. Elsevier; 2011. p. 1–39.
  • Bacarella D. Distributed clustering algorithm for large scale clustering problems. 2013. Available from: urn:nbn:se:uu:diva-212089
  • Tsoumakas G, Vlahavas I. Distributed data mining. In: Encyclopedia of Data Warehousing and Mining; 2009.
  • Fu Y. Distributed data mining: an overview. In: Newsletter of the IEEE Technical Committee on Distributed Processing. Rolla, MO; 2001. p. 5.
  • Park BH, Kargupta H. Distributed data mining: algorithms, systems, and applications. Baltimore (MD): Citeseer; 2002.
  • Karine Zeitouni LY. Le data mining spatial et les bases de données spatiales. Revue internationale de géomatique. 1999;9:389–423.
  • Ghosh S. Distributed systems: an algorithmic approach. Boca Raton (FL): Chapman & Hall; 2014.
  • Aouad L, Le-Khac NA, Kechadi T. Image analysis platform for data management in the meteorological domain. In: 7th Industrial Conference in Data Mining Proceedings. Vol. 4597; Berlin Heidelberg: Springer; 2007. p. 120–134.
  • Wu X, Zhu X, Wu GQ, et al. Data mining with Big Data. IEEE Trans Knowl Data Eng. 2014;26:97–107.
  • Rokach L, Schclar A, Itach E. Ensemble methods for multi-label classification. Expert Syst Appl. 2014;41:7507–7523.
  • Bauer E, Kohavi R. An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Mach Learn. 1999;36:105–139.
  • Huang JW, Lin SC, Chen MS. DPSP: distributed progressive sequential pattern mining on the cloud. Adv Knowl Discovery Data Min. 2010:27–34.
  • Yang XY, Liu Z, Fu Y. MapReduce as a programming model for association rules algorithm on Hadoop. In: 2010 3rd International Conference on Information Sciences and Interaction Sciences (ICIS). Chengdu, China; 2010. p. 99–102.
  • Lin X. MR-Apriori: association rules algorithm based on MapReduce. In: 2014 5th IEEE International Conference on Software Engineering and Service Science (ICSESS). Beijing, China; 2014. p. 141–144.
  • Hsieh LC, Wu GL, Hsu YM, et al. Online image search result grouping with MapReduce-based image clustering and graph construction for large-scale photos. J Visual Commun Image Representation. 2014;25:384–395.
  • He Y, Tan H, Luo W, et al. MR-DBSCAN: a scalable MapReduce-based DBSCAN algorithm for heavily skewed data. Front Comput Sci. 2014;8:83–99.
  • Sun T, Shu C, Li F, et al. An efficient hierarchical clustering method for large datasets with Map-Reduce. In: 2009 International Conference on Parallel and Distributed Computing, Applications and Technologies. Hiroshima, Japan; 2009. p. 494–499.
  • Kim Y, Shim K, Kim MS, et al. DBCURE-MR: an efficient density-based clustering algorithm for large data using MapReduce. Inf Syst. 2014 Jun;42:15–35.
  • Bendechache M, Kechadi MT. Distributed clustering algorithm for spatial data mining. In: 2nd International Conference on Spatial Data Mining and Geographical Knowledge Services (ICSDM). Fuzhou, China: IEEE; 2015. p. 60–65.
  • Bendechache M, Le-Khac NA, Kechadi MT. Efficient large scale clustering based on data partitioning. In: International Conference on Data Science and Advanced Analytics (DSAA); IEEE; 2016. p. 612–621.
  • Bendechache M, Le-Khac NA, Kechadi MT. Performance evaluation of a distributed clustering approach for spatial datasets. In: 15th International Conference on Australasian Data Mining Conference (AusDM), CRPIT. Melbourne, Australia; 2017.
  • Lloyd S. Least squares quantization in PCM. IEEE Trans Inf Theory. 1982;28:129–137.
  • Kaufman L, Rousseeuw PJ. Partitioning around medoids (program PAM). In: Finding groups in data: an introduction to cluster analysis. Wiley Online Library; 1990. p. 68–125.
  • Guha S, Rastogi R, Shim K. CURE: an efficient clustering algorithm for large databases. In: Tiwary A, Franklin M, editors. ACM SIGMOD Record. Vol. 27. New York (NY): ACM; 1998. p. 73–84.
  • Zhang T, Ramakrishnan R, Livny M. BIRCH: an efficient data clustering method for very large databases. In: Widom J, editor. ACM SIGMOD Record. Vol. 25. New York (NY): ACM; 1996. p. 103–114.
  • Ester M, Kriegel HP, Sander J, et al. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, KDD’96; Portland, Oregon: AAAI Press; 1996. p. 226–231.
  • Aouad L, Le-Khac NA, Kechadi T. Lightweight clustering technique for distributed data mining applications. In: Advances in data mining. Theoretical aspects and applications; Springer; 2007. p. 120–134.
  • Dhillon I, Modha D. A data-clustering algorithm on distributed memory multiprocessors. In: Large-scale parallel data mining, workshop on large-scale parallel KDD systems, SIGKDD. London, UK: Springer-Verlag; 1999. p. 245–260.
  • Garg A, Mangla A, Bhatnagar V, et al. PBIRCH: a scalable parallel clustering algorithm for incremental data. In: 10th International Symposium on Database Engineering and Applications (IDEAS-06). Delhi, India; 2006. p. 315–316.
  • Geng H, Deng X, Ali H. A new clustering algorithm using message passing and its applications in analyzing microarray data. In: Proceedings. Fourth International Conference on Machine Learning and Applications. Los Angeles (CA): IEEE; 2005. 6 p.
  • Dhillon ID, Modha DS. A data-clustering algorithm on distributed memory multiprocessors. In: Zaki MJ, Ho C-T, editors. Large-scale parallel data mining. Berlin Heidelberg: Springer; 2000. p. 245–260.
  • Xu X, Jaeger J, Kriegel HP. A fast parallel clustering algorithm for large spatial databases. Data Min Knowl Discovery Arch. 1999;3:263–290.
  • Laloux JF, Le-Khac NA, Kechadi MT. Efficient distributed approach for density-based clustering. In: Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE), 20th IEEE International Workshops. Paris, France; 2011. p. 145–150.
  • Le Khac NA, Aouad LM, Kechadi MT. Knowledge map layer for distributed data mining. J ISAST Trans Intell Syst. 2008;1:98–107.
  • Roddick JF, Hornsby K, Spiliopoulou M. An updated bibliography of temporal, spatial, and spatio-temporal data mining research. In: Roddick JF, Hornsby K, editors. Temporal, spatial, and spatio-temporal data mining. Berlin, Heidelberg: Springer; 2001. p. 147–163.
  • Kivinen J, Mannila H. The power of sampling in knowledge discovery. In: Proceedings of the thirteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems. Minneapolis (MN): ACM; 1994. p. 77–85.
  • Compieta P, Di Martino S, Bertolotto M, et al. Exploratory spatio-temporal data mining and visualization. J Visual Lang Comput. 2007;18:255–279.
  • He Y, Tan H, Luo W, et al. MR-DBSCAN: an efficient parallel density-based clustering algorithm using MapReduce. In: 17th International Conference on Parallel and Distributed Systems. Tainan, Taiwan: IEEE; 2011. p. 473–480.
  • Bendechache M, Le-Khac NA, Kechadi MT. Hierarchical aggregation approach for distributed clustering of spatial datasets. In: 16th International Conference on Data Mining Workshops (ICDMW). Barcelona, Spain: IEEE; 2016. p. 1098–1103.
  • Chaudhuri A, Chaudhuri B, Parui S. A novel approach to computation of the shape of a dot pattern and extraction of its perceptual border. Comput Vision Image Understanding. 1997;68:257–275.
  • Melkemi M, Djebali M. Computing the shape of a planar points set. Pattern Recognit. 2000;33:1423–1436.
  • Duckham M, Kulik L, Worboys M, et al. Efficient generation of simple polygons for characterizing the shape of a set of points in the plane. Vol. 41. New York (NY): Elsevier Science Inc.; 2008. p. 3224–3236.
  • Zhang T, Ramakrishnan R, Livny M. BIRCH: an efficient data clustering method for very large databases. In: SIGMOD-96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data. Vol. 25; New York, NY, USA: ACM; 1996. p. 103–114.
  • Guha S, Rastogi R, Shim K. CURE: an efficient clustering algorithm for large databases. In: Guha S, Rastogi R, Shim K, editors. Information systems. Vol. 26. Oxford: Elsevier Science Ltd.; 2001. p. 35–58.
