TreeKDE: clustering multivariate data based on decision tree and using one-dimensional kernel density estimation

Pages 740-758 | Received 09 Nov 2021, Accepted 12 Dec 2022, Published online: 22 Dec 2022
