Clustering, Matching, and Prediction

Poisson Kernel-Based Clustering on the Sphere: Convergence Properties, Identifiability, and a Method of Sampling

Pages 758-770 | Received 09 Mar 2018, Accepted 04 Mar 2020, Published online: 20 Apr 2020

References

  • Axler, S., Bourdon, P., and Ramey, W. (2001), Harmonic Function Theory (2nd ed.), New York: Springer-Verlag.
  • Banerjee, A., Dhillon, I. S., Ghosh, J., and Sra, S. (2003), “Generative Model-Based Clustering of Directional Data,” in ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 19–29. DOI: 10.1145/956750.956757.
  • Banerjee, A., Dhillon, I. S., Ghosh, J., and Sra, S. (2005), “Clustering on the Unit Hypersphere Using von Mises-Fisher Distributions,” Journal of Machine Learning Research, 6, 1345–1382.
  • Banerjee, A., and Ghosh, J. (2002), “Frequency Sensitive Competitive Learning for Clustering on High-Dimensional Hyperspheres,” in Proceedings International Joint Conference on Neural Networks, pp. 1590–1595.
  • Banfield, J. D., and Raftery, A. E. (1993), “Model-Based Gaussian and Non-Gaussian Clustering,” Biometrics, 49, 803–821. DOI: 10.2307/2532201.
  • Baudry, J. P. (2015), “Estimation and Model Selection for Model-Based Clustering With the Conditional Classification Likelihood,” Electronic Journal of Statistics, 9, 1041–1077. DOI: 10.1214/15-EJS1026.
  • Biernacki, C., Celeux, G., and Govaert, G. (2000), “Assessing a Mixture Model for Clustering With the Integrated Completed Likelihood,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 719–725. DOI: 10.1109/34.865189.
  • Bijral, A. S., Breitenbach, M., and Grudic, G. (2007), “Mixture of Watson Distributions: A Generative Model for Hyperspherical Embeddings,” in Artificial Intelligence and Statistics (AISTATS), pp. 35–42.
  • Bilmes, J. A. (1997), “A Gentle Tutorial on the EM Algorithm and Its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models,” Technical Report ICSI-TR-97-021, University of California, Berkeley.
  • Campbell, N. A., and Mahon, R. J. (1974), “A Multivariate Study of Variation in Two Species of Rock Crab of the Genus Leptograpsus,” Australian Journal of Zoology, 22, 417–425. DOI: 10.1071/ZO9740417.
  • Dai, F., and Xu, Y. (2013), Approximation Theory and Harmonic Analysis on Spheres and Balls, New York: Springer-Verlag.
  • Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977), “Maximum Likelihood From Incomplete Data via the EM Algorithm,” Journal of the Royal Statistical Society, Series B, 39, 1–38. DOI: 10.1111/j.2517-6161.1977.tb01600.x.
  • Dhillon, I. S., and Modha, D. S. (2001), “Concept Decompositions for Large Sparse Text Data Using Clustering,” Machine Learning, 42, 143–175.
  • Dhillon, I. S., and Sra, S. (2003), “Modeling Data Using Directional Distributions,” Technical Report # TR-03-06, Department of Computer Sciences, The University of Texas at Austin, Austin, TX.
  • Dortet-Bernadet, J., and Wicker, N. (2008), “Model-Based Clustering on the Unit Sphere With an Illustration Using Gene Expression Profiles,” Biostatistics, 9, 66–80. DOI: 10.1093/biostatistics/kxm012.
  • Downs, T. D. (2009), “Cauchy Families of Directional Distributions Closed Under Location and Scale Transformations,” The Open Statistics and Probability Journal, 1, 76–92. DOI: 10.2174/1876527000901010076.
  • Dryden, I. L. (2005), “Statistical Analysis on High-Dimensional Spheres and Shape Spaces,” The Annals of Statistics, 33, 1643–1665. DOI: 10.1214/009053605000000264.
  • Duwairi, R., and Abu-Rahmeh, M. (2015), “A Novel Approach for Initializing the Spherical K-Means Clustering Algorithm,” Simulation Modelling Practice and Theory, 54, 49–63. DOI: 10.1016/j.simpat.2015.03.007.
  • Fisher, N. I. (1996), Statistical Analysis of Circular Data, Cambridge, UK: Cambridge University Press.
  • Fraley, C., and Raftery, A. E. (2002), “Model-Based Clustering, Discriminant Analysis and Density Estimation,” Journal of the American Statistical Association, 97, 611–631. DOI: 10.1198/016214502760047131.
  • Fraley, C., and Raftery, A. E. (2007), “Model-Based Methods of Classification: Using the mclust Software in Chemometrics,” Journal of Statistical Software, 18, 1–13.
  • Fujita, A., Takahashi, D. Y., and Patriota, A. G. (2014), “A Nonparametric Method to Estimate the Number of Clusters,” Computational Statistics and Data Analysis, 73, 27–39. DOI: 10.1016/j.csda.2013.11.012.
  • Garcia-Portugués, E. (2013), “Exact Risk Improvement of Bandwidth Selectors for Kernel Density Estimation With Directional Data,” Electronic Journal of Statistics, 7, 1655–1685.
  • Golzy, M., Markatou, M., and Shivram, A. (2016), “Algorithms for Clustering on the Sphere: Advances and Applications,” Proceedings of the World Congress on Engineering and Computer Science (Vol. I), pp. 420–425.
  • Hall, P., Watson, G. S., and Cabrera, J. (1987), “Kernel Density Estimation With Spherical Data,” Biometrika, 74, 751–762. DOI: 10.1093/biomet/74.4.751.
  • Holzmann, H., and Munk, A. (2006), “Identifiability of Finite Mixtures of Elliptical Distributions,” Scandinavian Journal of Statistics, 33, 753–763. DOI: 10.1111/j.1467-9469.2006.00505.x.
  • Hornik, K., Feinerer, I., Kober, K., and Buchta, C. (2012), “Spherical K-Means Clustering,” Journal of Statistical Software, 50, 1–22. DOI: 10.18637/jss.v050.i10.
  • Hornik, K., and Grün, B. (2013), “On Conjugate Families and Jeffreys Priors for von Mises-Fisher Distributions,” Journal of Statistical Planning and Inference, 143, 992–999. DOI: 10.1016/j.jspi.2012.11.003.
  • Hornik, K., and Grün, B. (2014a), “On Maximum Likelihood Estimation of the Concentration Parameter of von Mises-Fisher Distributions,” Computational Statistics, 29, 945–957.
  • Hornik, K., and Grün, B. (2014b), “movMF: An R Package for Fitting Mixtures of von Mises-Fisher Distributions,” Journal of Statistical Software, 58, 1–31.
  • Hubert, L., and Arabie, P. (1985), “Comparing Partitions,” Journal of Classification, 2, 193–218. DOI: 10.1007/BF01908075.
  • Jammalamadaka, S. R., Bhadra, N., Chaturvedi, D., Kutty, T. K., Majumder, P. P., and Poduval, G. (1986), “Functional Assessment of Knee and Ankle During Level Walking,” in Data Analysis in Life Science, Calcutta, India: Indian Statistical Institute, pp. 21–54.
  • Kato, S., and Jones, M. C. (2013), “An Extended Family of Circular Distributions Related to Wrapped Cauchy Distributions via Brownian Motion,” Bernoulli, 19, 154–171. DOI: 10.3150/11-BEJ397.
  • Kato, S., and McCullagh, P. (2018), “Möbius Transformation and a Cauchy Family on the Sphere,” arXiv no. 1510.07679v2.
  • Kent, J. T., and Tyler, D. E. (1988), “Maximum Likelihood Estimation for the Wrapped Cauchy Distribution,” Journal of Applied Statistics, 15, 247–254. DOI: 10.1080/02664768800000029.
  • Lee, A. (2010), “Circular Data,” Wiley Interdisciplinary Reviews: Computational Statistics, 2, 477–486. DOI: 10.1002/wics.98.
  • Lévy, P. (1939), “L’addition des variables aléatoires définies sur une circonférence,” Bulletin de la Société Mathématique de France, 67, 1–41.
  • Li, J., Ray, S., and Lindsay, B. G. (2007), “A Nonparametric Statistical Approach to Clustering via Mode Identification,” Journal of Machine Learning Research, 8, 1687–1723.
  • Lin, L., and Li, J. (2017), “Clustering With Hidden Markov Model on Variable Blocks,” Journal of Machine Learning Research, 18, 1–49.
  • Lindsay, B. G. (1995), Mixture Models: Theory, Geometry and Applications, in NSF-CBMS Regional Conference Series in Probability and Statistics (Vol. 5), IMS.
  • Lindsay, B. G., and Markatou, M. (2002), Statistical Distances: A Global Framework to Inference (Book Manuscript Under Contract), New York: Springer-Verlag.
  • Maitra, R., and Ramler, I. P. (2010), “A K-Mean-Directions Algorithm for Fast Clustering of Data on the Sphere,” Journal of Computational and Graphical Statistics, 19, 377–396. DOI: 10.1198/jcgs.2009.08155.
  • Mardia, K. V., and Jupp, P. E. (2000), Directional Statistics, Wiley Series in Probability and Statistics, New York: Wiley.
  • McLachlan, G., and Peel, D. (2000), Finite Mixture Models, Wiley Series in Probability and Statistics, New York: Wiley.
  • McNicholas, P. D. (2016), “Model-Based Clustering,” Journal of Classification, 33, 331–373. DOI: 10.1007/s00357-016-9211-9.
  • Modha, D. S., and Spangler, W. S. (2003), “Feature Weighting in K-Means Clustering,” Machine Learning, 52, 217–237.
  • Paine, P. J., Preston, S. P., Tsagris, M., and Wood, T. A. (2017), “An Elliptically Symmetric Angular Gaussian Distribution,” Statistics and Computing, 28, 689–697. DOI: 10.1007/s11222-017-9756-4.
  • Peel, D., Whiten, W. J., and McLachlan, G. J. (2001), “Fitting Mixtures of Kent Distributions to Aid in Joint Set Identification,” Journal of the American Statistical Association, 96, 56–63. DOI: 10.1198/016214501750332974.
  • R Development Core Team (2008), R: A Language and Environment for Statistical Computing, Vienna, Austria: R Foundation for Statistical Computing, available at http://www.R-project.org/.
  • Ramler, I. P. (2008), “Improved Statistical Methods for k-Means Clustering of Noisy and Directional Data,” Graduate Theses and Dissertations, Iowa State University, Paper 10949.
  • Rousseeuw, P. J. (1987), “Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis,” Journal of Computational and Applied Mathematics, 20, 53–65. DOI: 10.1016/0377-0427(87)90125-7.
  • Sra, S., and Karp, D. (2013), “The Multivariate Watson Distribution: Maximum-Likelihood Estimation and Other Aspects,” Journal of Multivariate Analysis, 114, 256–269. DOI: 10.1016/j.jmva.2012.08.010.
  • Tibshirani, R., and Walther, G. (2005), “Cluster Validation by Prediction Strength,” Journal of Computational and Graphical Statistics, 14, 511–528. DOI: 10.1198/106186005X59243.
  • Tibshirani, R., Walther, G., and Hastie, T. (2001), “Estimating the Number of Clusters in a Data Set via the Gap Statistic,” Journal of the Royal Statistical Society, Series B, 63, 411–423. DOI: 10.1111/1467-9868.00293.
  • Titterington, D. M., Smith, A. F. M., and Makov, U. E. (1985), Statistical Analysis of Finite Mixture Distributions, New York: Wiley.
  • Watson, G. S. (1983), Statistics on Spheres, New York: Wiley.
  • Wintner, A. (1947), “On the Shape of the Angular Case of Cauchy’s Distribution Curves,” The Annals of Mathematical Statistics, 18, 589–593. DOI: 10.1214/aoms/1177730351.
  • Xu, L., and Jordan, M. I. (1996), “On Convergence Properties of the EM Algorithm for Gaussian Mixtures,” Neural Computation, 8, 129–151. DOI: 10.1162/neco.1996.8.1.129.
  • Yakowitz, S. J., and Spragins, J. D. (1968), “On the Identifiability of Finite Mixtures,” The Annals of Mathematical Statistics, 39, 209–214. DOI: 10.1214/aoms/1177698520.
  • Zhong, S. (2005), “Efficient Online Spherical K-Means Clustering,” in Proceedings of International Joint Conference on Neural Networks, Montreal, Canada, pp. 3180–3185.
  • Zhong, S., and Ghosh, J. (2003), “A Unified Framework for Model-Based Clustering,” Journal of Machine Learning Research, 4, 1001–1037.
  • Zhong, S., and Ghosh, J. (2005), “Generative Model-Based Document Clustering: A Comparative Study,” Knowledge and Information Systems, 8, 374–384.
