Randomized Spectral Clustering in Large-Scale Stochastic Block Models

Pages 887-906 | Received 18 May 2020, Accepted 22 Jan 2022, Published online: 28 Mar 2022

References

  • Abbe, E. (2018), “Community Detection and Stochastic Block Models: Recent Developments,” The Journal of Machine Learning Research, 18, 6446–6531.
  • Abbe, E., Fan, J., Wang, K., and Zhong, Y. (2020), “Entrywise Eigenvector Analysis of Random Matrices with Low Expected Rank,” Annals of Statistics, 48, 1452–1474.
  • Achlioptas, D., and McSherry, F. (2007), “Fast Computation of Low-Rank Matrix Approximations,” Journal of the ACM (JACM), 54, 1–19. DOI: 10.1145/1219092.1219097.
  • Adamic, L. A., and Glance, N. (2005), “The Political Blogosphere and the 2004 US Election: Divided They Blog,” in Proceedings of the 3rd International Workshop on Link Discovery, pp. 36–43. ACM.
  • Ahn, K., Lee, K., and Suh, C. (2018), “Hypergraph Spectral Clustering in the Weighted Stochastic Block Model,” IEEE Journal of Selected Topics in Signal Processing, 12, 959–974. DOI: 10.1109/JSTSP.2018.2837638.
  • Allen-Zhu, Z., and Li, Y. (2016), “LazySVD: Even Faster SVD Decomposition Yet Without Agonizing Pain,” in Advances in Neural Information Processing Systems, pp. 974–982.
  • Arora, S., Hazan, E., and Kale, S. (2006), “A Fast Random Sampling Algorithm for Sparsifying Matrices,” in Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, eds. M. Goemans, K. Jansen, J. D. P. Rolim, and L. Trevisan, pp. 272–279, Berlin: Springer.
  • Arroyo, J., and Levina, E. (2021), “Overlapping Community Detection in Networks via Sparse Spectral Decomposition,” Sankhya A, 83, 1–35. DOI: 10.1007/s13171-021-00245-4.
  • Baglama, J., and Reichel, L. (2005), “Augmented Implicitly Restarted Lanczos Bidiagonalization Methods,” SIAM Journal on Scientific Computing, 27, 19–42. DOI: 10.1137/04060593X.
  • Baglama, J., Reichel, L., and Lewis, B. W. (2019), irlba: Fast Truncated Singular Value Decomposition and Principal Components Analysis for Large Dense and Sparse Matrices. Available at https://CRAN.R-project.org/package=irlba. R package version 2.3.3.
  • Calvetti, D., Reichel, L., and Sorensen, D. C. (1994), “An Implicitly Restarted Lanczos Method for Large Symmetric Eigenvalue Problems,” Electronic Transactions on Numerical Analysis, 2, 1–21.
  • Cape, J., Tang, M., and Priebe, C. E. (2019), “Signal-Plus-Noise Matrix Models: Eigenvector Deviations and Fluctuations,” Biometrika, 106, 243–250. DOI: 10.1093/biomet/asy070.
  • Chin, P., Rao, A., and Vu, V. (2015), “Stochastic Block Model and Community Detection in Sparse Graphs: A Spectral Algorithm with Optimal Rate of Recovery,” in Conference on Learning Theory, pp. 391–423.
  • Choi, D. S., Wolfe, P. J., and Airoldi, E. M. (2012), “Stochastic Blockmodels with a Growing Number of Classes,” Biometrika, 99, 273–284. DOI: 10.1093/biomet/asr053.
  • Clarkson, K. L., and Woodruff, D. P. (2017), “Low-Rank Approximation and Regression in Input Sparsity Time,” Journal of the ACM (JACM), 63, 1–45. DOI: 10.1145/3019134.
  • Deng, S., Ling, S., and Strohmer, T. (2021), “Strong Consistency, Graph Laplacians, and the Stochastic Block Model,” Journal of Machine Learning Research, 22, 1–44.
  • Drineas, P., and Mahoney, M. W. (2016), “RandNLA: Randomized Numerical Linear Algebra,” Communications of the ACM, 59, 80–90. DOI: 10.1145/2842602.
  • Drineas, P., Mahoney, M. W., and Muthukrishnan, S. (2006), “Sampling Algorithms for ℓ2 Regression and Applications,” in Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1127–1136. Society for Industrial and Applied Mathematics.
  • Drineas, P., Mahoney, M. W., Muthukrishnan, S., and Sarlós, T. (2011), “Faster Least Squares Approximation,” Numerische Mathematik, 117, 219–249. DOI: 10.1007/s00211-010-0331-6.
  • Drineas, P., Magdon-Ismail, M., Mahoney, M. W., and Woodruff, D. P. (2012), “Fast Approximation of Matrix Coherence and Statistical Leverage,” Journal of Machine Learning Research, 13, 3475–3506.
  • Erichson, N. B., Voronin, S., Brunton, S. L., and Kutz, J. N. (2019), “Randomized Matrix Decompositions Using R,” Journal of Statistical Software, 89, 1–48. DOI: 10.18637/jss.v089.i11.
  • Fishkind, D. E., Sussman, D. L., Tang, M., Vogelstein, J. T., and Priebe, C. E. (2013), “Consistent Adjacency-Spectral Partitioning for the Stochastic Block Model When the Model Parameters are Unknown,” SIAM Journal on Matrix Analysis and Applications, 34, 23–39. DOI: 10.1137/120875600.
  • Gao, C., and Ma, Z. (2020), “Discussion of ‘Network Cross-Validation by Edge Sampling’,” Biometrika, 107, 281–284. DOI: 10.1093/biomet/asaa022.
  • Gao, C., Lu, Y., and Zhou, H. H. (2015), “Rate-Optimal Graphon Estimation,” Annals of Statistics, 43, 2624–2652.
  • Gao, C., Ma, Z., Zhang, A. Y., and Zhou, H. H. (2017), “Achieving Optimal Misclassification Proportion in Stochastic Block Models,” The Journal of Machine Learning Research, 18, 1980–2024.
  • Gittens, A., and Tropp, J. A. (2009), “Error Bounds for Random Matrix Approximation Schemes,” arXiv preprint arXiv:0911.4108.
  • Goldenberg, A., Zheng, A. X., Fienberg, S. E., and Airoldi, E. M. (2010), “A Survey of Statistical Network Models,” Foundations and Trends® in Machine Learning, 2, 129–233. DOI: 10.1561/2200000005.
  • Halko, N., Martinsson, P.-G., and Tropp, J. A. (2011), “Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions,” SIAM Review, 53, 217–288. DOI: 10.1137/090771806.
  • Holland, P. W., Laskey, K. B., and Leinhardt, S. (1983), “Stochastic Blockmodels: First Steps,” Social Networks, 5, 109–137. DOI: 10.1016/0378-8733(83)90021-7.
  • Hu, D., Ubaru, S., Gittens, A., Clarkson, K. L., Horesh, L., and Kalantzis, V. (2021), “Sparse Graph Based Sketching for Fast Numerical Linear Algebra,” in ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3255–3259. IEEE.
  • Hubert, L., and Arabie, P. (1985), “Comparing Partitions,” Journal of Classification, 2, 193–218. DOI: 10.1007/BF01908075.
  • Ji, P., and Jin, J. (2016), “Coauthorship and Citation Networks for Statisticians,” The Annals of Applied Statistics, 10, 1779–1812. DOI: 10.1214/15-AOAS896.
  • Joseph, A., and Yu, B. (2016), “Impact of Regularization on Spectral Clustering,” The Annals of Statistics, 44, 1765–1791. DOI: 10.1214/16-AOS1447.
  • Karrer, B., and Newman, M. E. J. (2011), “Stochastic Blockmodels and Community Structure in Networks,” Physical Review E, 83, 016107. DOI: 10.1103/PhysRevE.83.016107.
  • Kolaczyk, E. D. (2009), Statistical Analysis of Network Data: Methods and Models, New York: Springer.
  • Kumar, A., Sabharwal, Y., and Sen, S. (2004), “A Simple Linear Time (1 + ε)-Approximation Algorithm for k-means Clustering in Any Dimensions,” in Annual Symposium on Foundations of Computer Science (Vol. 45), pp. 454–462. IEEE Computer Society Press.
  • Le, C. M., Levina, E., and Vershynin, R. (2015), “Sparse Random Graphs: Regularization and Concentration of the Laplacian,” arXiv preprint arXiv:1502.03049.
  • Lehoucq, R. B. (1995), “Analysis and Implementation of an Implicitly Restarted Arnoldi Iteration,” Technical Report, Department of Computational and Applied Mathematics, Rice University, Houston, TX.
  • Lei, J., and Rinaldo, A. (2015), “Consistency of Spectral Clustering in Stochastic Block Models,” The Annals of Statistics, 43, 215–237. DOI: 10.1214/14-AOS1274.
  • Leskovec, J., Kleinberg, J., and Faloutsos, C. (2005), “Graphs Over Time: Densification Laws, Shrinking Diameters and Possible Explanations,” in Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pp. 177–187.
  • —— (2007), “Graph Evolution: Densification and Shrinking Diameters,” ACM Transactions on Knowledge Discovery from Data (TKDD), 1, 1–40.
  • Levin, K. D., Roosta, F., Tang, M., Mahoney, M. W., and Priebe, C. E. (2021), “Limit Theorems for Out-of-Sample Extensions of the Adjacency and Laplacian Spectral Embeddings,” Journal of Machine Learning Research, 22, 1–59.
  • Li, M., and Kang, E. L. (2019), “Randomized Algorithms of Maximum Likelihood Estimation with Spatial Autoregressive Models for Large-Scale Networks,” Statistics and Computing, 29, 1165–1179. DOI: 10.1007/s11222-019-09862-4.
  • Li, T., Levina, E., and Zhu, J. (2020a), “Community Models for Partially Observed Networks from Surveys,” arXiv preprint arXiv:2008.03652.
  • Li, T., Levina, E., and Zhu, J. (2020b), “Network Cross-Validation by Edge Sampling,” Biometrika, 107, 257–276. DOI: 10.1093/biomet/asaa006.
  • Liao, Z., Couillet, R., and Mahoney, M. W. (2020), “Sparse Quantized Spectral Clustering,” arXiv preprint arXiv:2010.01376.
  • Liu, G., Lin, Z., Yan, S., Sun, J., Yu, Y., and Ma, Y. (2012), “Robust Recovery of Subspace Structures by Low-Rank Representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 35, 171–184. DOI: 10.1109/TPAMI.2012.88.
  • Lyzinski, V., Sussman, D. L., Tang, M., Athreya, A., and Priebe, C. E. (2014), “Perfect Clustering for Stochastic Blockmodel Graphs via Adjacency Spectral Embedding,” Electronic Journal of Statistics, 8, 2905–2922. DOI: 10.1214/14-EJS978.
  • Ma, P., Mahoney, M. W., and Yu, B. (2015), “A Statistical Perspective on Algorithmic Leveraging,” The Journal of Machine Learning Research, 16, 861–911.
  • Ma, S., Su, L., and Zhang, Y. (2021), “Determining the Number of Communities in Degree-Corrected Stochastic Block Models,” Journal of Machine Learning Research, 22, 1–63.
  • Mahoney, M. W. (2011), “Randomized Algorithms for Matrices and Data,” Foundations and Trends® in Machine Learning, 3, 123–224.
  • Mahoney, M. W., and Drineas, P. (2009), “CUR Matrix Decompositions for Improved Data Analysis,” Proceedings of the National Academy of Sciences, 106, 697–702. DOI: 10.1073/pnas.0803205106.
  • Manning, C. D., Raghavan, P., and Schütze, H. (2008), Introduction to Information Retrieval, Cambridge: Cambridge University Press.
  • Martinsson, P. G. (2016), “Randomized Methods for Matrix Computations,” arXiv preprint arXiv:1607.01649.
  • Martinsson, P. G., and Tropp, J. (2020), “Randomized Numerical Linear Algebra: Foundations & Algorithms,” arXiv preprint arXiv:2002.01387.
  • Matoušek, J. (2000), “On Approximate Geometric k-Clustering,” Discrete & Computational Geometry, 24, 61–84.
  • Newman, M. (2018), Networks, Oxford: Oxford University Press.
  • Paul, S., and Chen, Y. (2020), “Spectral and Matrix Factorization Methods for Consistent Community Detection in Multi-Layer Networks,” The Annals of Statistics, 48, 230–250. DOI: 10.1214/18-AOS1800.
  • Pilanci, M., and Wainwright, M. J. (2016), “Iterative Hessian Sketch: Fast and Accurate Solution Approximation for Constrained Least-Squares,” The Journal of Machine Learning Research, 17, 1842–1879.
  • Pilanci, M., and Wainwright, M. J. (2017), “Newton Sketch: A Near Linear-Time Optimization Algorithm with Linear-Quadratic Convergence,” SIAM Journal on Optimization, 27, 205–245.
  • Qin, T., and Rohe, K. (2013), “Regularized Spectral Clustering Under the Degree-Corrected Stochastic Blockmodel,” in Advances in Neural Information Processing Systems, pp. 3120–3128.
  • Qiu, Y., and Mei, J. (2019), RSpectra: Solvers for Large-Scale Eigenvalue and SVD Problems. Available at https://CRAN.R-project.org/package=RSpectra.
  • Raskutti, G., and Mahoney, M. W. (2016), “A Statistical Perspective on Randomized Sketching for Ordinary Least-Squares,” The Journal of Machine Learning Research, 17, 7508–7538.
  • Rohe, K., Chatterjee, S., and Yu, B. (2011), “Spectral Clustering and the High-Dimensional Stochastic Block Model,” The Annals of Statistics, 39, 1878–1915.
  • Sakai, T., and Imiya, A. (2009), “Fast Spectral Clustering with Random Projection and Sampling,” in International Workshop on Machine Learning and Data Mining in Pattern Recognition, pp. 372–384. Springer.
  • Sarkar, P., and Bickel, P. J. (2015), “Role of Normalization in Spectral Clustering for Stochastic Blockmodels,” The Annals of Statistics, 43, 962–990. DOI: 10.1214/14-AOS1285.
  • Sinha, K. (2018), “K-means Clustering Using Random Matrix Sparsification,” in International Conference on Machine Learning, pp. 4684–4692. PMLR.
  • Su, L., Wang, W., and Zhang, Y. (2019), “Strong Consistency of Spectral Clustering for Stochastic Block Models,” IEEE Transactions on Information Theory, 66, 324–338. DOI: 10.1109/TIT.2019.2934157.
  • Tang, M., Cape, J., and Priebe, C. E. (2021), “Asymptotically Efficient Estimators for Stochastic Blockmodels: The Naive MLE, the Rank-Constrained MLE, and the Spectral Estimator,” Bernoulli, Preprint. DOI: 10.3150/21-BEJ1376.
  • Terada, Y. (2014), “Strong Consistency of Reduced k-means Clustering,” Scandinavian Journal of Statistics, 41, 913–931. DOI: 10.1111/sjos.12074.
  • Tremblay, N., and Loukas, A. (2020), “Approximating Spectral Clustering via Sampling: A Review,” Sampling Techniques for Supervised or Unsupervised Tasks, pp. 129–183.
  • Tremblay, N., Puy, G., Gribonval, R., and Vandergheynst, P. (2016), “Compressive Spectral Clustering,” in International Conference on Machine Learning, pp. 1002–1011. PMLR.
  • Tropp, J. A., Yurtsever, A., Udell, M., and Cevher, V. (2016), “Randomized Single-View Algorithms for Low-Rank Matrix Approximation,” arXiv preprint, arXiv:1609.00048.
  • Vidal, R., Ma, Y., and Sastry, S. (2005), “Generalized Principal Component Analysis (GPCA),” IEEE Transactions on Pattern Analysis and Machine Intelligence, 27, 1945–1959. DOI: 10.1109/TPAMI.2005.244.
  • Von Luxburg, U. (2007), “A Tutorial on Spectral Clustering,” Statistics and Computing, 17, 395–416. DOI: 10.1007/s11222-007-9033-z.
  • Wang, H. (2019), “More Efficient Estimation for Logistic Regression with Optimal Subsamples,” Journal of Machine Learning Research, 20, 1–59.
  • Wang, H., Zhu, R., and Ma, P. (2018), “Optimal Subsampling for Large Sample Logistic Regression,” Journal of the American Statistical Association, 113, 829–844. DOI: 10.1080/01621459.2017.1292914.
  • Wang, H., Yang, M., and Stufken, J. (2019a), “Information-Based Optimal Subdata Selection for Big Data Linear Regression,” Journal of the American Statistical Association, 114, 393–405. DOI: 10.1080/01621459.2017.1408468.
  • Wang, S., Gittens, A., and Mahoney, M. W. (2017), “Sketched Ridge Regression: Optimization Perspective, Statistical Perspective, and Model Averaging,” The Journal of Machine Learning Research, 18, 8039–8088.
  • Wang, S., Gittens, A., and Mahoney, M. W. (2019b), “Scalable Kernel k-means Clustering with Nyström Approximation: Relative-Error Bounds,” Journal of Machine Learning Research, 20, 1–49.
  • Witten, R., and Candès, E. (2015), “Randomized Algorithms for Low-Rank Matrix Factorizations: Sharp Performance Bounds,” Algorithmica, 72, 264–281. DOI: 10.1007/s00453-014-9891-7.
  • Yan, D., Huang, L., and Jordan, M. I. (2009), “Fast Approximate Spectral Clustering,” in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 907–916. DOI: 10.1145/1557019.1557118.
  • Yang, C., Priebe, C. E., Park, Y., and Marchette, D. J. (2020), “Simultaneous Dimensionality and Complexity Model Selection for Spectral Graph Clustering,” Journal of Computational and Graphical Statistics, 30, 422–441. DOI: 10.1080/10618600.2020.1824870.
  • Yang, J., and Leskovec, J. (2015), “Defining and Evaluating Network Communities Based on Ground-Truth,” Knowledge and Information Systems, 42, 181–213. DOI: 10.1007/s10115-013-0693-z.
  • Yin, H., Benson, A. R., Leskovec, J., and Gleich, D. F. (2017), “Local Higher-Order Graph Clustering,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 555–564. ACM.
  • Yun, S.-Y., and Proutiere, A. (2014), “Accurate Community Detection in the Stochastic Block Model via Spectral Algorithms,” arXiv preprint arXiv:1412.7335.
  • Yun, S.-Y., and Proutiere, A. (2016), “Optimal Cluster Recovery in the Labeled Stochastic Block Model,” in Advances in Neural Information Processing Systems, pp. 965–973.
  • Zhou, J., Tu, Y., Chen, Y., and Wang, H. (2017), “Estimating Spatial Autocorrelation with Sampled Network Data,” Journal of Business & Economic Statistics, 35, 130–138, 2017. DOI: 10.1080/07350015.2015.1061437.
