Scalable and Efficient Computation

A Fast Algorithm for Maximum Likelihood Estimation of Mixture Proportions Using Sequential Quadratic Programming

Pages 261-273 | Received 02 Jul 2018, Accepted 01 Nov 2019, Published online: 08 Jan 2020

References

  • Agarwal, N., Bullins, B., and Hazan, E. (2016), “Second-Order Stochastic Optimization for Machine Learning in Linear Time,” arXiv no. 1602.03943.
  • Aharon, M., Elad, M., and Bruckstein, A. (2006), “K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation,” IEEE Transactions on Signal Processing, 54, 4311–4322. DOI: 10.1109/TSP.2006.881199.
  • Andersen, E. D., and Andersen, K. D. (2000), “The MOSEK Interior Point Optimizer for Linear Programming: An Implementation of the Homogeneous Algorithm,” in High Performance Optimization, eds. H. Frenk, K. Roos, T. Terlaky, and S. Zhang, Boston, MA: Springer, pp. 197–232.
  • Atkinson, S. E. (1992), “The Performance of Standard and Hybrid EM Algorithms for ML Estimates of the Normal Mixture Model With Censoring,” Journal of Statistical Computation and Simulation, 44, 105–115. DOI: 10.1080/00949659208811452.
  • Auton, A., Abecasis, G. R., Altshuler, D. M., Durbin, R. M., Bentley, D. R., Chakravarti, A., Clark, A. G., Donnelly, P., Eichler, E. E., Flicek, P., and Wang, J. (2015), “A Global Reference for Human Genetic Variation,” Nature, 526, 68–74.
  • Bezanson, J., Karpinski, S., Shah, V. B., and Edelman, A. (2012), “Julia: A Fast Dynamic Language for Technical Computing,” arXiv no. 1209.5145.
  • Birgin, E. G., Martínez, J. M., and Raydan, M. (2000), “Nonmonotone Spectral Projected Gradient Methods on Convex Sets,” SIAM Journal on Optimization, 10, 1196–1211. DOI: 10.1137/S1052623497330963.
  • Bottou, L., and Le Cun, Y. (2004), “Large Scale Online Learning,” in Advances in Neural Information Processing Systems (Vol. 16), pp. 217–224.
  • Boyd, S. P., and Vandenberghe, L. (2004), Convex Optimization, New York: Cambridge University Press.
  • Brown, L. D., and Greenshtein, E. (2009), “Nonparametric Empirical Bayes and Compound Decision Approaches to Estimation of a High-Dimensional Vector of Normal Means,” The Annals of Statistics, 37, 1685–1704. DOI: 10.1214/08-AOS630.
  • Byrd, R. H., Hansen, S. L., Nocedal, J., and Singer, Y. (2016), “A Stochastic Quasi-Newton Method for Large-Scale Optimization,” SIAM Journal on Optimization, 26, 1008–1031. DOI: 10.1137/140954362.
  • Cordy, C. B., and Thomas, D. R. (1997), “Deconvolution of a Distribution Function,” Journal of the American Statistical Association, 92, 1459–1465. DOI: 10.1080/01621459.1997.10473667.
  • Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977), “Maximum Likelihood Estimation From Incomplete Data via the EM Algorithm,” Journal of the Royal Statistical Society, Series B, 39, 1–38. DOI: 10.1111/j.2517-6161.1977.tb01600.x.
  • Duchi, J., Shalev-Shwartz, S., Singer, Y., and Chandra, T. (2008), “Efficient Projections Onto the l1-Ball for Learning in High Dimensions,” in Proceedings of the 25th International Conference on Machine Learning, pp. 272–279. DOI: 10.1145/1390156.1390191.
  • Dunning, I., Huchette, J., and Lubin, M. (2017), “JuMP: A Modeling Language for Mathematical Optimization,” SIAM Review, 59, 295–320. DOI: 10.1137/15M1020575.
  • Golub, G. H., and Van Loan, C. F. (2012), Matrix Computations (Vol. 3), Baltimore, MD: JHU Press.
  • Halko, N., Martinsson, P.-G., and Tropp, J. A. (2011), “Finding Structure With Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions,” SIAM Review, 53, 217–288. DOI: 10.1137/090771806.
  • Ho, K. L., and Olver, S. (2018), “LowRankApprox.jl: Fast Low-Rank Matrix Approximation in Julia,” Zenodo. DOI: 10.5281/zenodo.1254148.
  • Jiang, W., and Zhang, C.-H. (2009), “General Maximum Likelihood Empirical Bayes Estimation of Normal Means,” The Annals of Statistics, 37, 1647–1684. DOI: 10.1214/08-AOS638.
  • Johnstone, I. M., and Silverman, B. W. (2004), “Needles and Straw in Haystacks: Empirical Bayes Estimates of Possibly Sparse Sequences,” The Annals of Statistics, 32, 1594–1649. DOI: 10.1214/009053604000000030.
  • Kiefer, J., and Wolfowitz, J. (1956), “Consistency of the Maximum Likelihood Estimator in the Presence of Infinitely Many Incidental Parameters,” The Annals of Mathematical Statistics, 27, 887–906. DOI: 10.1214/aoms/1177728066.
  • Kim, D., Sra, S., and Dhillon, I. S. (2010), “Tackling Box-Constrained Optimization via a New Projected Quasi-Newton Approach,” SIAM Journal on Scientific Computing, 32, 3548–3563. DOI: 10.1137/08073812X.
  • Koenker, R., and Gu, J. (2017), “REBayes: An R Package for Empirical Bayes Mixture Methods,” Journal of Statistical Software, 82, 1–26. DOI: 10.18637/jss.v082.i08.
  • Koenker, R., and Mizera, I. (2014), “Convex Optimization, Shape Constraints, Compound Decisions, and Empirical Bayes Rules,” Journal of the American Statistical Association, 109, 674–685. DOI: 10.1080/01621459.2013.869224.
  • Laird, N. (1978), “Nonparametric Maximum Likelihood Estimation of a Mixing Distribution,” Journal of the American Statistical Association, 73, 805–811. DOI: 10.1080/01621459.1978.10480103.
  • MacArthur, J., Bowler, E., Cerezo, M., Gil, L., Hall, P., Hastings, E., Junkins, H., McMahon, A., Milano, A., Morales, J., Pendlington, Z. M., Welter, D., Burdett, T., Hindorff, L., Flicek, P., Cunningham, F., and Parkinson, H. (2017), “The New NHGRI-EBI Catalog of Published Genome-Wide Association Studies (GWAS Catalog),” Nucleic Acids Research, 45, D896–D901. DOI: 10.1093/nar/gkw1133.
  • Nocedal, J., and Wright, S. J. (2006), Numerical Optimization, New York: Springer Science and Business Media.
  • Pearson, K. (1894), “Contributions to the Mathematical Theory of Evolution,” Philosophical Transactions of the Royal Society of London A, 185, 71–110. DOI: 10.1098/rsta.1894.0003.
  • Potra, F. A., and Wright, S. J. (2000), “Interior-Point Methods,” Journal of Computational and Applied Mathematics, 124, 281–302. DOI: 10.1016/S0377-0427(00)00433-7.
  • R Core Team (2017), R: A Language and Environment for Statistical Computing, Vienna, Austria: R Foundation for Statistical Computing, available at https://www.R-project.org.
  • Redner, R. A., and Walker, H. F. (1984), “Mixture Densities, Maximum Likelihood and the EM Algorithm,” SIAM Review, 26, 195–239. DOI: 10.1137/1026034.
  • Robbins, H., and Monro, S. (1951), “A Stochastic Approximation Method,” The Annals of Mathematical Statistics, 22, 400–407. DOI: 10.1214/aoms/1177729586.
  • Salakhutdinov, R., Roweis, S., and Ghahramani, Z. (2003), “Optimization With EM and Expectation-Conjugate-Gradient,” in Proceedings of the 20th International Conference on Machine Learning, pp. 672–679.
  • Schmidt, M., Berg, E., Friedlander, M., and Murphy, K. (2009), “Optimizing Costly Functions With Simple Constraints: A Limited-Memory Projected Quasi-Newton Algorithm,” in Proceedings of the 12th International Conference on Artificial Intelligence and Statistics, pp. 456–463.
  • Stephens, M. (2016), “False Discovery Rates: A New Deal,” Biostatistics, 18, 275–294. DOI: 10.1093/biostatistics/kxw041.
  • Stephens, M., Carbonetto, P., Dai, C., Gerard, D., Lu, M., Sun, L., Willwerscheid, J., Xiao, N., and Zeng, M. (2018), “ashr: Methods for Adaptive Shrinkage, Using Empirical Bayes,” available at http://CRAN.R-project.org/package=ashr.
  • Varadhan, R., and Roland, C. (2008), “Simple and Globally Convergent Methods for Accelerating the Convergence of Any EM Algorithm,” Scandinavian Journal of Statistics, 35, 335–353. DOI: 10.1111/j.1467-9469.2007.00585.x.
  • Wood, A. R., Esko, T., Yang, J., Vedantam, S., Pers, T. H., Gustafsson, S., Chu, A. Y., Estrada, K., Luan, J. A., Kutalik, Z., and Amin, N. (2014), “Defining the Role of Common Variation in the Genomic and Biological Architecture of Adult Human Height,” Nature Genetics, 46, 1173–1186. DOI: 10.1038/ng.3097.
  • Wright, S. J. (1998), “Superlinear Convergence of a Stabilized SQP Method to a Degenerate Solution,” Computational Optimization and Applications, 11, 253–275.
  • Xu, L., and Jordan, M. I. (1996), “On Convergence Properties of the EM Algorithm for Gaussian Mixtures,” Neural Computation, 8, 129–151. DOI: 10.1162/neco.1996.8.1.129.
