Randomized Spectral Clustering in Large-Scale Stochastic Block Models

Pages 887-906 | Received 18 May 2020, Accepted 22 Jan 2022, Published online: 28 Mar 2022

References

  • Abbe, E. (2018), “Community Detection and Stochastic Block Models: Recent Developments,” The Journal of Machine Learning Research, 18, 6446–6531.
  • Abbe, E., Fan, J., Wang, K., and Zhong, Y. (2020), “Entrywise Eigenvector Analysis of Random Matrices with Low Expected Rank,” Annals of Statistics, 48, 1452–1474.
  • Achlioptas, D., and McSherry, F. (2007), “Fast Computation of Low-Rank Matrix Approximations,” Journal of the ACM (JACM), 54, 1–19. DOI: 10.1145/1219092.1219097.
  • Adamic, L. A., and Glance, N. (2005), “The Political Blogosphere and the 2004 US Election: Divided They Blog,” in Proceedings of the 3rd International Workshop on Link Discovery, pp. 36–43. ACM.
  • Ahn, K., Lee, K., and Suh, C. (2018), “Hypergraph Spectral Clustering in the Weighted Stochastic Block Model,” IEEE Journal of Selected Topics in Signal Processing, 12, 959–974. DOI: 10.1109/JSTSP.2018.2837638.
  • Allen-Zhu, Z., and Li, Y. (2016), “LazySVD: Even Faster SVD Decomposition Yet Without Agonizing Pain,” in Advances in Neural Information Processing Systems, pp. 974–982.
  • Arora, S., Hazan, E., and Kale, S. (2006), “A Fast Random Sampling Algorithm for Sparsifying Matrices,” in Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, eds. M. Goemans, K. Jansen, J. D. P. Rolim, and L. Trevisan, pp. 272–279, Berlin: Springer.
  • Arroyo, J., and Levina, E. (2021), “Overlapping Community Detection in Networks via Sparse Spectral Decomposition,” Sankhya A, 83, 1–35. DOI: 10.1007/s13171-021-00245-4.
  • Baglama, J., and Reichel, L. (2005), “Augmented Implicitly Restarted Lanczos Bidiagonalization Methods,” SIAM Journal on Scientific Computing, 27, 19–42. DOI: 10.1137/04060593X.
  • Baglama, J., Reichel, L., and Lewis, B. W. (2019), irlba: Fast Truncated Singular Value Decomposition and Principal Components Analysis for Large Dense and Sparse Matrices. Available at https://CRAN.R-project.org/package=irlba. R package version 2.3.3.
  • Calvetti, D., Reichel, L., and Sorensen, D. C. (1994), “An Implicitly Restarted Lanczos Method for Large Symmetric Eigenvalue Problems,” Electronic Transactions on Numerical Analysis, 2, 1–21.
  • Cape, J., Tang, M., and Priebe, C. E. (2019), “Signal-Plus-Noise Matrix Models: Eigenvector Deviations and Fluctuations,” Biometrika, 106, 243–250. DOI: 10.1093/biomet/asy070.
  • Chin, P., Rao, A., and Vu, V. (2015), “Stochastic Block Model and Community Detection in Sparse Graphs: A Spectral Algorithm with Optimal Rate of Recovery,” in Conference on Learning Theory, pp. 391–423.
  • Choi, D. S., Wolfe, P. J., and Airoldi, E. M. (2012), “Stochastic Blockmodels with a Growing Number of Classes,” Biometrika, 99, 273–284. DOI: 10.1093/biomet/asr053.
  • Clarkson, K. L., and Woodruff, D. P. (2017), “Low-Rank Approximation and Regression in Input Sparsity Time,” Journal of the ACM (JACM), 63, 1–45. DOI: 10.1145/3019134.
  • Deng, S., Ling, S., and Strohmer, T. (2021), “Strong Consistency, Graph Laplacians, and the Stochastic Block Model,” Journal of Machine Learning Research, 22, 1–44.
  • Drineas, P., and Mahoney, M. W. (2016), “RandNLA: Randomized Numerical Linear Algebra,” Communications of the ACM, 59, 80–90. DOI: 10.1145/2842602.
  • Drineas, P., Mahoney, M. W., and Muthukrishnan, S. (2006), “Sampling Algorithms for ℓ2 Regression and Applications,” in Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1127–1136. Society for Industrial and Applied Mathematics.
  • Drineas, P., Mahoney, M. W., Muthukrishnan, S., and Sarlós, T. (2011), “Faster Least Squares Approximation,” Numerische Mathematik, 117, 219–249. DOI: 10.1007/s00211-010-0331-6.
  • Drineas, P., Magdon-Ismail, M., Mahoney, M. W., and Woodruff, D. P. (2012), “Fast Approximation of Matrix Coherence and Statistical Leverage,” Journal of Machine Learning Research, 13, 3475–3506.
  • Erichson, N. B., Voronin, S., Brunton, S. L., and Kutz, J. N. (2019), “Randomized Matrix Decompositions Using R,” Journal of Statistical Software, 89, 1–48. DOI: 10.18637/jss.v089.i11.
  • Fishkind, D. E., Sussman, D. L., Tang, M., Vogelstein, J. T., and Priebe, C. E. (2013), “Consistent Adjacency-Spectral Partitioning for the Stochastic Block Model When the Model Parameters are Unknown,” SIAM Journal on Matrix Analysis and Applications, 34, 23–39. DOI: 10.1137/120875600.
  • Gao, C., and Ma, Z. (2020), “Discussion of ‘Network Cross-Validation by Edge Sampling’,” Biometrika, 107, 281–284. DOI: 10.1093/biomet/asaa022.
  • Gao, C., Lu, Y., and Zhou, H. H. (2015), “Rate-Optimal Graphon Estimation,” Annals of Statistics, 43, 2624–2652.
  • Gao, C., Ma, Z., Zhang, A. Y., and Zhou, H. H. (2017), “Achieving Optimal Misclassification Proportion in Stochastic Block Models,” The Journal of Machine Learning Research, 18, 1980–2024.
  • Gittens, A., and Tropp, J. A. (2009), “Error Bounds for Random Matrix Approximation Schemes,” arXiv preprint arXiv:0911.4108.
  • Goldenberg, A., Zheng, A. X., Fienberg, S. E., and Airoldi, E. M. (2010), “A Survey of Statistical Network Models,” Foundations and Trends® in Machine Learning, 2, 129–233. DOI: 10.1561/2200000005.
  • Halko, N., Martinsson, P.-G., and Tropp, J. A. (2011), “Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions,” SIAM Review, 53, 217–288. DOI: 10.1137/090771806.
  • Holland, P. W., Laskey, K. B., and Leinhardt, S. (1983), “Stochastic Blockmodels: First Steps,” Social Networks, 5, 109–137. DOI: 10.1016/0378-8733(83)90021-7.
  • Hu, D., Ubaru, S., Gittens, A., Clarkson, K. L., Horesh, L., and Kalantzis, V. (2021), “Sparse Graph Based Sketching for Fast Numerical Linear Algebra,” in ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3255–3259. IEEE.
  • Hubert, L., and Arabie, P. (1985), “Comparing Partitions,” Journal of Classification, 2, 193–218. DOI: 10.1007/BF01908075.
  • Ji, P., and Jin, J. (2016), “Coauthorship and Citation Networks for Statisticians,” The Annals of Applied Statistics, 10, 1779–1812. DOI: 10.1214/15-AOAS896.
  • Joseph, A., and Yu, B. (2016), “Impact of Regularization on Spectral Clustering,” The Annals of Statistics, 44, 1765–1791. DOI: 10.1214/16-AOS1447.
  • Karrer, B., and Newman, M. E. J. (2011), “Stochastic Blockmodels and Community Structure in Networks,” Physical Review E, 83, 016107. DOI: 10.1103/PhysRevE.83.016107.
  • Kolaczyk, E. D. (2009), Statistical Analysis of Network Data: Methods and Models, New York: Springer.
  • Kumar, A., Sabharwal, Y., and Sen, S. (2004), “A Simple Linear Time (1 + ε)-Approximation Algorithm for k-means Clustering in Any Dimensions,” in Annual Symposium on Foundations of Computer Science (Vol. 45), pp. 454–462. IEEE Computer Society Press.
  • Le, C. M., Levina, E., and Vershynin, R. (2015), “Sparse Random Graphs: Regularization and Concentration of the Laplacian,” arXiv preprint arXiv:1502.03049.
  • Lehoucq, R. B. (1995), “Analysis and Implementation of an Implicitly Restarted Arnoldi Iteration,” Technical Report, Department of Computational and Applied Mathematics, Rice University, Houston, TX.
  • Lei, J., and Rinaldo, A. (2015), “Consistency of Spectral Clustering in Stochastic Block Models,” The Annals of Statistics, 43, 215–237. DOI: 10.1214/14-AOS1274.
  • Leskovec, J., Kleinberg, J., and Faloutsos, C. (2005), “Graphs Over Time: Densification Laws, Shrinking Diameters and Possible Explanations,” in Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pp. 177–187.
  • —— (2007), “Graph Evolution: Densification and Shrinking Diameters,” ACM Transactions on Knowledge Discovery from Data (TKDD), 1, 1–40.
  • Levin, K. D., Roosta, F., Tang, M., Mahoney, M. W., and Priebe, C. E. (2021), “Limit Theorems for Out-of-Sample Extensions of the Adjacency and Laplacian Spectral Embeddings,” Journal of Machine Learning Research, 22, 1–59.
  • Li, M., and Kang, E. L. (2019), “Randomized Algorithms of Maximum Likelihood Estimation with Spatial Autoregressive Models for Large-Scale Networks,” Statistics and Computing, 29, 1165–1179. DOI: 10.1007/s11222-019-09862-4.
  • Li, T., Levina, E., and Zhu, J. (2020a), “Community Models for Partially Observed Networks from Surveys,” arXiv preprint arXiv:2008.03652.
  • Li, T., Levina, E., and Zhu, J. (2020b), “Network Cross-Validation by Edge Sampling,” Biometrika, 107, 257–276. DOI: 10.1093/biomet/asaa006.
  • Liao, Z., Couillet, R., and Mahoney, M. W. (2020), “Sparse Quantized Spectral Clustering,” arXiv preprint arXiv:2010.01376.
  • Liu, G., Lin, Z., Yan, S., Sun, J., Yu, Y., and Ma, Y. (2012), “Robust Recovery of Subspace Structures by Low-Rank Representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 35, 171–184. DOI: 10.1109/TPAMI.2012.88.
  • Lyzinski, V., Sussman, D. L., Tang, M., Athreya, A., and Priebe, C. E. (2014), “Perfect Clustering for Stochastic Blockmodel Graphs via Adjacency Spectral Embedding,” Electronic Journal of Statistics, 8, 2905–2922. DOI: 10.1214/14-EJS978.
  • Ma, P., Mahoney, M. W., and Yu, B. (2015), “A Statistical Perspective on Algorithmic Leveraging,” The Journal of Machine Learning Research, 16, 861–911.
  • Ma, S., Su, L., and Zhang, Y. (2021), “Determining the Number of Communities in Degree-Corrected Stochastic Block Models,” Journal of Machine Learning Research, 22, 1–63.
  • Mahoney, M. W. (2011), “Randomized Algorithms for Matrices and Data,” Foundations and Trends® in Machine Learning, 3, 123–224.
  • Mahoney, M. W., and Drineas, P. (2009), “CUR Matrix Decompositions for Improved Data Analysis,” Proceedings of the National Academy of Sciences, 106, 697–702. DOI: 10.1073/pnas.0803205106.
  • Manning, C. D., Raghavan, P., and Schütze, H. (2008), Introduction to Information Retrieval, Cambridge: Cambridge University Press.
  • Martinsson, P. G. (2016), “Randomized Methods for Matrix Computations,” arXiv preprint arXiv:1607.01649.
  • Martinsson, P. G., and Tropp, J. (2020), “Randomized Numerical Linear Algebra: Foundations & Algorithms,” arXiv preprint arXiv:2002.01387.
  • Matoušek, J. (2000), “On Approximate Geometric k-Clustering,” Discrete & Computational Geometry, 24, 61–84.
  • Newman, M. (2018), Networks, Oxford: Oxford University Press.
  • Paul, S., and Chen, Y. (2020), “Spectral and Matrix Factorization Methods for Consistent Community Detection in Multi-Layer Networks,” The Annals of Statistics, 48, 230–250. DOI: 10.1214/18-AOS1800.
  • Pilanci, M., and Wainwright, M. J. (2016), “Iterative Hessian Sketch: Fast and Accurate Solution Approximation for Constrained Least-Squares,” The Journal of Machine Learning Research, 17, 1842–1879.
  • Pilanci, M., and Wainwright, M. J. (2017), “Newton Sketch: A Near Linear-Time Optimization Algorithm with Linear-Quadratic Convergence,” SIAM Journal on Optimization, 27, 205–245.
  • Qin, T., and Rohe, K. (2013), “Regularized Spectral Clustering Under the Degree-Corrected Stochastic Blockmodel,” in Advances in Neural Information Processing Systems, pp. 3120–3128.
  • Qiu, Y., and Mei, J. (2019), RSpectra: Solvers for Large-Scale Eigenvalue and SVD Problems. Available at https://CRAN.R-project.org/package=RSpectra.
  • Raskutti, G., and Mahoney, M. W. (2016), “A Statistical Perspective on Randomized Sketching for Ordinary Least-Squares,” The Journal of Machine Learning Research, 17, 7508–7538.
  • Rohe, K., Chatterjee, S., and Yu, B. (2011), “Spectral Clustering and the High-Dimensional Stochastic Block Model,” The Annals of Statistics, 39, 1878–1915.
  • Sakai, T., and Imiya, A. (2009), “Fast Spectral Clustering with Random Projection and Sampling,” in International Workshop on Machine Learning and Data Mining in Pattern Recognition, pp. 372–384. Springer.
  • Sarkar, P., and Bickel, P. J. (2015), “Role of Normalization in Spectral Clustering for Stochastic Blockmodels,” The Annals of Statistics, 43, 962–990. DOI: 10.1214/14-AOS1285.
  • Sinha, K. (2018), “K-means Clustering Using Random Matrix Sparsification,” in International Conference on Machine Learning, pp. 4684–4692. PMLR.
  • Su, L., Wang, W., and Zhang, Y. (2019), “Strong Consistency of Spectral Clustering for Stochastic Block Models,” IEEE Transactions on Information Theory, 66, 324–338. DOI: 10.1109/TIT.2019.2934157.
  • Tang, M., Cape, J., and Priebe, C. E. (2021), “Asymptotically Efficient Estimators for Stochastic Blockmodels: The Naive MLE, the Rank-Constrained MLE, and the Spectral Estimator,” Bernoulli, Preprint. DOI: 10.3150/21-BEJ1376.
  • Terada, Y. (2014), “Strong Consistency of Reduced k-means Clustering,” Scandinavian Journal of Statistics, 41, 913–931. DOI: 10.1111/sjos.12074.
  • Tremblay, N., and Loukas, A. (2020), “Approximating Spectral Clustering via Sampling: A Review,” Sampling Techniques for Supervised or Unsupervised Tasks, pp. 129–183.
  • Tremblay, N., Puy, G., Gribonval, R., and Vandergheynst, P. (2016), “Compressive Spectral Clustering,” in International Conference on Machine Learning, pp. 1002–1011. PMLR.
  • Tropp, J. A., Yurtsever, A., Udell, M., and Cevher, V. (2016), “Randomized Single-View Algorithms for Low-Rank Matrix Approximation,” arXiv preprint, arXiv:1609.00048.
  • Vidal, R., Ma, Y., and Sastry, S. (2005), “Generalized Principal Component Analysis (GPCA),” IEEE Transactions on Pattern Analysis and Machine Intelligence, 27, 1945–1959. DOI: 10.1109/TPAMI.2005.244.
  • Von Luxburg, U. (2007), “A Tutorial on Spectral Clustering,” Statistics and Computing, 17, 395–416. DOI: 10.1007/s11222-007-9033-z.
  • Wang, H. (2019), “More Efficient Estimation for Logistic Regression with Optimal Subsamples,” Journal of Machine Learning Research, 20, 1–59.
  • Wang, H., Zhu, R., and Ma, P. (2018), “Optimal Subsampling for Large Sample Logistic Regression,” Journal of the American Statistical Association, 113, 829–844. DOI: 10.1080/01621459.2017.1292914.
  • Wang, H., Yang, M., and Stufken, J. (2019a), “Information-Based Optimal Subdata Selection for Big Data Linear Regression,” Journal of the American Statistical Association, 114, 393–405. DOI: 10.1080/01621459.2017.1408468.
  • Wang, S., Gittens, A., and Mahoney, M. W. (2017), “Sketched Ridge Regression: Optimization Perspective, Statistical Perspective, and Model Averaging,” The Journal of Machine Learning Research, 18, 8039–8088.
  • Wang, S., Gittens, A., and Mahoney, M. W. (2019b), “Scalable Kernel k-means Clustering with Nyström Approximation: Relative-Error Bounds,” Journal of Machine Learning Research, 20, 1–49.
  • Witten, R., and Candès, E. (2015), “Randomized Algorithms for Low-Rank Matrix Factorizations: Sharp Performance Bounds,” Algorithmica, 72, 264–281. DOI: 10.1007/s00453-014-9891-7.
  • Yan, D., Huang, L., and Jordan, M. I. (2009), “Fast Approximate Spectral Clustering,” in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 907–916. DOI: 10.1145/1557019.1557118.
  • Yang, C., Priebe, C. E., Park, Y., and Marchette, D. J. (2020), “Simultaneous Dimensionality and Complexity Model Selection for Spectral Graph Clustering,” Journal of Computational and Graphical Statistics, 30, 422–441. DOI: 10.1080/10618600.2020.1824870.
  • Yang, J., and Leskovec, J. (2015), “Defining and Evaluating Network Communities Based on Ground-Truth,” Knowledge and Information Systems, 42, 181–213. DOI: 10.1007/s10115-013-0693-z.
  • Yin, H., Benson, A. R., Leskovec, J., and Gleich, D. F. (2017), “Local Higher-Order Graph Clustering,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 555–564. ACM.
  • Yun, S.-Y., and Proutiere, A. (2014), “Accurate Community Detection in the Stochastic Block Model via Spectral Algorithms,” arXiv preprint arXiv:1412.7335.
  • Yun, S.-Y., and Proutiere, A. (2016), “Optimal Cluster Recovery in the Labeled Stochastic Block Model,” in Advances in Neural Information Processing Systems, pp. 965–973.
  • Zhou, J., Tu, Y., Chen, Y., and Wang, H. (2017), “Estimating Spatial Autocorrelation with Sampled Network Data,” Journal of Business & Economic Statistics, 35, 130–138, 2017. DOI: 10.1080/07350015.2015.1061437.
