Theory and Methods

Cohesion and Repulsion in Bayesian Distance Clustering

Pages 1374-1384 | Received 25 May 2021, Accepted 09 Mar 2023, Published online: 18 Apr 2023

References

  • American Numismatic Society (n.d.), “Silver Denarius of Nero, Rome, AD 64 - AD 65 (BMC 74, RIC I Second Edition Nero 53),” ANS ID: 1944.100.39423, available at http://numismatics.org/collection/1944.100.39423.
  • Argiento, R., and De Iorio, M. (2022), “Is Infinity That Far? A Bayesian Nonparametric Perspective of Finite Mixture Models,” Annals of Statistics, 50, 2641–2663. DOI: 10.1214/22-AOS2201.
  • Barcella, W., De Iorio, M., and Baio, G. (2017), “A Comparative Review of Variable Selection Techniques for Covariate Dependent Dirichlet Process Mixture Models,” Canadian Journal of Statistics, 45, 254–273. DOI: 10.1002/cjs.11323.
  • Barry, D., and Hartigan, J. A. (1992), “Product Partition Models for Change Point Problems,” The Annals of Statistics, 20, 260–279. DOI: 10.1214/aos/1176348521.
  • Beaumont, M. A., Zhang, W., and Balding, D. J. (2002), “Approximate Bayesian Computation in Population Genetics,” Genetics, 162, 2025–2035. DOI: 10.1093/genetics/162.4.2025.
  • Besag, J. (1975), “Statistical Analysis of Non-Lattice Data,” Journal of the Royal Statistical Society, Series D, 24, 179–195. DOI: 10.2307/2987782.
  • Betancourt, B., Zanella, G., and Steorts, R. C. (2022), “Random Partition Models for Microclustering Tasks,” Journal of the American Statistical Association, 117, 1215–1227. DOI: 10.1080/01621459.2020.1841647.
  • Beyer, K., Goldstein, J., Ramakrishnan, R., and Shaft, U. (1999), “When Is ‘Nearest Neighbor’ Meaningful?” in Database Theory—ICDT’99, eds. C. Beeri and P. Buneman, pp. 217–235. Berlin, Heidelberg: Springer. DOI: 10.1007/3-540-49257-7_15.
  • Chandler, R. E., and Bate, S. (2007), “Inference for Clustered Data Using the Independence Loglikelihood,” Biometrika, 94, 167–183. DOI: 10.1093/biomet/asm015.
  • Classical Numismatic Group (n.d.), “Triton XX, Lot: 673,” RIC I 53; WCN 57; RSC 119; BMCRE 74-6; BN 220-1, available at https://www.cngcoins.com/Coin.aspx?CoinID=324866.
  • Corander, J., Sirén, J., and Arjas, E. (2008), “Bayesian Spatial Modeling of Genetic Population Structure,” Computational Statistics, 23, 111–129. DOI: 10.1007/s00180-007-0072-x.
  • Dahl, D. B. (2008), “Distance-Based Probability Distribution for Set Partitions with Applications to Bayesian Nonparametrics,” in “2008 JSM Proceedings: Papers Presented at the Joint Statistical Meetings, Denver, Colorado, August 3-7, 2008, and other ASA-sponsored Conferences; Communicating Statistics: Speaking Out and Reaching Out,” Alexandria, VA: American Statistical Association.
  • Dahl, D. B., Day, R., and Tsai, J. W. (2017), “Random Partition Distribution Indexed by Pairwise Information,” Journal of the American Statistical Association, 112, 721–732. DOI: 10.1080/01621459.2016.1165103.
  • Dahl, D. B., Johnson, D. J., and Müller, P. (2022), “Search Algorithms and Loss Functions for Bayesian Clustering,” Journal of Computational and Graphical Statistics, 31, 1189–1201. DOI: 10.1080/10618600.2022.2069779.
  • Dasgupta, A., and Raftery, A. E. (1998), “Detecting Features in Spatial Point Processes with Clutter via Model-Based Clustering,” Journal of the American Statistical Association, 93, 294–302. DOI: 10.2307/2669625.
  • Denison, D. G. T., and Holmes, C. C. (2001), “Bayesian Partitioning for Estimating Disease Risk,” Biometrics, 57, 143–149. DOI: 10.1111/j.0006-341X.2001.00143.x.
  • Dryden, I. L., and Mardia, K. V. (2016), Statistical Shape Analysis, with Applications in R, New York, NY: Wiley. DOI: 10.1002/9781119072492.
  • Duan, L. L., and Dunson, D. B. (2021), “Bayesian Distance Clustering,” Journal of Machine Learning Research, 22, 1–27. http://jmlr.org/papers/v22/20-688.html.
  • Fearnhead, P., and Prangle, D. (2012), “Constructing Summary Statistics for Approximate Bayesian Computation: Semi-Automatic Approximate Bayesian Computation,” Journal of the Royal Statistical Society, Series B, 74, 419–474. DOI: 10.1111/j.1467-9868.2011.01010.x.
  • Fúquene, J., Steel, M., and Rossell, D. (2019), “On Choosing Mixture Components via Non-Local Priors,” Journal of the Royal Statistical Society, Series B, 81, 809–837. DOI: 10.1111/rssb.12333.
  • Fritsch, A., and Ickstadt, K. (2009), “Improved Criteria for Clustering Based on the Posterior Similarity Matrix,” Bayesian Analysis, 4, 367–391. DOI: 10.1214/09-BA414.
  • Gao, T., Kovalsky, S. Z., Boyer, D. M., and Daubechies, I. (2019a), “Gaussian Process Landmarking for Three-Dimensional Geometric Morphometrics,” SIAM Journal on Mathematics of Data Science, 1, 237–267. DOI: 10.1137/18M1203481.
  • Gao, T., Kovalsky, S. Z., and Daubechies, I. (2019b), “Gaussian Process Landmarking on Manifolds,” SIAM Journal on Mathematics of Data Science, 1, 208–236. DOI: 10.1137/18M1184035.
  • Gerhard Hirsch Nachfolger (2013), “Auktion 293, Lot 2656,” image available at https://www.numisbids.com/n.php?p=lot&sid=514&lot=2656.
  • Gnedin, A. V., and Pitman, J. (2006), “Exchangeable Gibbs Partitions and Stirling Triangles,” Journal of Mathematical Sciences, 138, 5674–5685. DOI: 10.1007/s10958-006-0335-z.
  • Hartigan, J. (1990), “Partition Models,” Communications in Statistics - Theory and Methods, 19, 2745–2756. DOI: 10.1080/03610929008830345.
  • Hennig, C. (2015), “What Are the True Clusters?” Pattern Recognition Letters, 64, 53–62. DOI: 10.1016/j.patrec.2015.04.009.
  • Ishwaran, H., and James, L. F. (2003), “Generalized Weighted Chinese Restaurant Processes for Species Sampling Mixture Models,” Statistica Sinica, 13, 1211–1235. https://www.jstor.org/stable/24307169.
  • Jain, S., and Neal, R. M. (2004), “A Split-Merge Markov Chain Monte Carlo Procedure for the Dirichlet Process Mixture Model,” Journal of Computational and Graphical Statistics, 13, 158–182. DOI: 10.1198/1061860043001.
  • Johnstone, I. M., and Titterington, D. M. (2009), “Statistical Challenges of High-Dimensional Data,” Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 367, 4237–4253. DOI: 10.1098/rsta.2009.0159.
  • Klaus, J. (1995), Topology (2nd ed.), New York: Springer-Verlag.
  • Kleinberg, J. (2002), “An Impossibility Theorem for Clustering,” in “Proceedings of the 15th International Conference on Neural Information Processing Systems,” NIPS’02, pp. 463–470, Cambridge, MA: MIT Press. DOI: 10.5555/2968618.2968676.
  • Kruskal, J. B. (1964), “Multidimensional Scaling by Optimizing Goodness of Fit to a Nonmetric Hypothesis,” Psychometrika, 29, 1–27. DOI: 10.1007/BF02289565.
  • Larribe, F., and Fearnhead, P. (2011), “On Composite Likelihoods in Statistical Genetics,” Statistica Sinica, 21, 43–69. https://www.jstor.org/stable/24309262.
  • Lau, J. W., and Green, P. J. (2007), “Bayesian Model-Based Clustering Procedures,” Journal of Computational and Graphical Statistics, 16, 526–558. DOI: 10.1198/106186007X238855.
  • Lele, S., and Taper, M. L. (2002), “A Composite Likelihood Approach to (co)variance Components Estimation,” Journal of Statistical Planning and Inference, 103, 117–135. DOI: 10.1016/S0378-3758(01)00215-4.
  • Li, N., and Stephens, M. (2003), “Modeling Linkage Disequilibrium and Identifying Recombination Hotspots Using Single-Nucleotide Polymorphism Data,” Genetics, 165, 2213–2233. DOI: 10.1093/genetics/165.4.2213.
  • Lindsay, B. G. (1988), “Composite Likelihood Methods,” in “Statistical Inference from Stochastic Processes (Ithaca, NY, 1987),” volume 80 of Contemporary Mathematics, pp. 221–239. Providence, RI: American Mathematical Society. DOI: 10.1090/conm/080/999014.
  • Lipman, Y., Yagev, S., Poranne, R., Jacobs, D. W., and Basri, R. (2014), “Feature Matching with Bounded Distortion,” ACM Transactions on Graphics, 33, 1–14. DOI: 10.1145/2602142.
  • Lowe, D. (2004), “Distinctive Image Features from Scale-Invariant Keypoints,” International Journal of Computer Vision, 60, 91–110. DOI: 10.1023/B:VISI.0000029664.99615.94.
  • Marjoram, P., Molitor, J., Plagnol, V., and Tavaré, S. (2003), “Markov Chain Monte Carlo Without Likelihoods,” Proceedings of the National Academy of Sciences, 100, 15324–15328. DOI: 10.1073/pnas.0306899100.
  • McLachlan, G., and Basford, K. (1988), “Mixture Models: Inference and Applications to Clustering,” Journal of the Royal Statistical Society, Series C, 38, 384–384. DOI: 10.2307/2348072.
  • Meilă, M. (2007), “Comparing Clusterings—An Information Based Distance,” Journal of Multivariate Analysis, 98, 873–895. DOI: 10.1016/j.jmva.2006.11.013.
  • Miller, J. (2020), “BayesianMixtures.” Version 0.1.1, available at https://github.com/jwmi/BayesianMixtures.jl.
  • Miller, J., Betancourt, B., Zaidi, A., Wallach, H., and Steorts, R. C. (2015), “Microclustering: When the Cluster Sizes Grow Sublinearly with the Size of the Data Set,” online preprint. DOI: 10.48550/arXiv.1512.00792.
  • Miller, J. W., and Harrison, M. T. (2018), “Mixture Models With a Prior on the Number of Components,” Journal of the American Statistical Association, 113, 340–356. DOI: 10.1080/01621459.2016.1255636.
  • Møller, J., and Skare, Ø. (2001), “Coloured Voronoi Tessellations for Bayesian Image Analysis and Reservoir Modelling,” Statistical Modelling, 1, 213–232. DOI: 10.1177/1471082X0100100304.
  • Müller, P., Quintana, F., and Rosner, G. L. (2011), “A Product Partition Model With Regression on Covariates,” Journal of Computational and Graphical Statistics, 20, 260–278. DOI: 10.1198/jcgs.2011.09066.
  • Nowicki, K., and Snijders, T. A. B. (2001), “Estimation and Prediction for Stochastic Blockstructures,” Journal of the American Statistical Association, 96, 1077–1087. DOI: 10.1198/016214501753208735.
  • Paganin, S., Herring, A. H., Olshan, A. F., and Dunson, D. B. (2021), “Centered Partition Processes: Informative Priors for Clustering (with Discussion),” Bayesian Analysis, 16, 301–370. DOI: 10.1214/20-BA1197.
  • Petralia, F., Rao, V., and Dunson, D. (2012), “Repulsive Mixtures,” in “Advances in Neural Information Processing Systems” (Vol. 25), eds. F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger. Curran Associates, Inc. Available at https://proceedings.neurips.cc/paper/2012/file/8d6dc35e506fc23349dd10ee68dabb64-Paper.pdf.
  • Pitman, J. (1996), “Some Developments of the Blackwell-MacQueen Urn Scheme,” Lecture Notes-Monograph Series, 30, 245–267. DOI: 10.1214/lnms/1215453576.
  • Pitman, J., and Yor, M. (1997), “The Two-Parameter Poisson-Dirichlet Distribution Derived from a Stable Subordinator,” The Annals of Probability, 25, 855–900. http://www.jstor.org/stable/2959614. DOI: 10.1214/aop/1024404422.
  • Quinlan, J. J., Quintana, F. A., and Page, G. L. (2017), “Parsimonious Hierarchical Modeling Using Repulsive Distributions,” online preprint. DOI: 10.48550/arXiv.1701.04457.
  • Quintana, F. A. (2006), “A Predictive View of Bayesian Clustering,” Journal of Statistical Planning and Inference, 136, 2407–2429. DOI: 10.1016/j.jspi.2004.09.015.
  • Quintana, F. A., and Iglesias, P. L. (2003), “Bayesian Clustering and Product Partition Models,” Journal of the Royal Statistical Society, Series B, 65, 557–574. DOI: 10.1111/1467-9868.00402.
  • Rigon, T., Herring, A. H., and Dunson, D. B. (2023), “A Generalized Bayes Framework for Probabilistic Clustering,” Biometrika. DOI: 10.1093/biomet/asad004.
  • Schmidt, M. N., and Mørup, M. (2013), “Nonparametric Bayesian Modeling of Complex Networks: An Introduction,” IEEE Signal Processing Magazine, 30, 110–128. DOI: 10.1109/MSP.2012.2235191.
  • Snijders, T. A. B., and Nowicki, K. (1997), “Estimation and Prediction for Stochastic Blockmodels for Graphs with Latent Block Structure,” Journal of Classification, 14, 75–100. DOI: 10.1007/s003579900004.
  • Stephens, M. (2000), “Dealing with Label Switching in Mixture Models,” Journal of the Royal Statistical Society, Series B, 62, 795–809. DOI: 10.1111/1467-9868.00265.
  • Szeliski, R. (2010), Computer Vision: Algorithms and Applications, London: Springer. DOI: 10.1007/978-3-030-34372-9.
  • Taylor, Z. M. (2020), “The Computer-Aided Die Study (CADS): A Tool for Conducting Numismatic Die Studies with Computer Vision and Hierarchical Clustering,” Bachelor’s thesis, Trinity University. Available at https://digitalcommons.trinity.edu/compsci_honors/54/.
  • Varin, C., Reid, N., and Firth, D. (2011), “An Overview of Composite Likelihood Methods,” Statistica Sinica, 21, 5–42. https://www.jstor.org/stable/24309261.
  • Wade, S., and Ghahramani, Z. (2018), “Bayesian Cluster Analysis: Point Estimation and Credible Balls (with Discussion),” Bayesian Analysis, 13, 559–626. DOI: 10.1214/17-BA1073.
  • Xu, Y., Müller, P., and Telesca, D. (2016), “Bayesian Inference for Latent Biologic Structure with Determinantal Point Processes (DPP),” Biometrics, 72, 955–964. DOI: 10.1111/biom.12482.
  • Zanella, G., Betancourt, B., Wallach, H., Miller, J., Zaidi, A., and Steorts, R. C. (2016), “Flexible Models for Microclustering with Application to Entity Resolution,” in “Proceedings of the 30th International Conference on Neural Information Processing Systems,” NIPS’16, pp. 1425–1433. Red Hook, NY: Curran Associates Inc.