References
- Agarwal, N., Bullins, B., and Hazan, E. (2016), “Second-Order Stochastic Optimization for Machine Learning in Linear Time,” arXiv no. 1602.03943.
- Aharon, M., Elad, M., and Bruckstein, A. (2006), “K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation,” IEEE Transactions on Signal Processing, 54, 4311–4322. DOI: 10.1109/TSP.2006.881199.
- Andersen, E. D., and Andersen, K. D. (2000), “The MOSEK Interior Point Optimizer for Linear Programming: An Implementation of the Homogeneous Algorithm,” in High Performance Optimization, eds. H. Frenk, K. Roos, T. Terlaky, and S. Zhang, Boston, MA: Springer, pp. 197–232.
- Atkinson, S. E. (1992), “The Performance of Standard and Hybrid EM Algorithms for ML Estimates of the Normal Mixture Model With Censoring,” Journal of Statistical Computation and Simulation, 44, 105–115. DOI: 10.1080/00949659208811452.
- Auton, A., Abecasis, G. R., Altshuler, D. M., Durbin, R. M., Bentley, D. R., Chakravarti, A., Clark, A. G., Donnelly, P., Eichler, E. E., Flicek, P., and Wang, J. (2015), “A Global Reference for Human Genetic Variation,” Nature, 526, 68–74. DOI: 10.1038/nature15393.
- Bezanson, J., Karpinski, S., Shah, V. B., and Edelman, A. (2012), “Julia: A Fast Dynamic Language for Technical Computing,” arXiv no. 1209.5145.
- Birgin, E. G., Martínez, J. M., and Raydan, M. (2000), “Nonmonotone Spectral Projected Gradient Methods on Convex Sets,” SIAM Journal on Optimization, 10, 1196–1211. DOI: 10.1137/S1052623497330963.
- Bottou, L., and Le Cun, Y. (2004), “Large Scale Online Learning,” in Advances in Neural Information Processing Systems (Vol. 16), pp. 217–224.
- Boyd, S. P., and Vandenberghe, L. (2004), Convex Optimization, New York: Cambridge University Press.
- Brown, L. D., and Greenshtein, E. (2009), “Nonparametric Empirical Bayes and Compound Decision Approaches to Estimation of a High-Dimensional Vector of Normal Means,” The Annals of Statistics, 37, 1685–1704. DOI: 10.1214/08-AOS630.
- Byrd, R. H., Hansen, S. L., Nocedal, J., and Singer, Y. (2016), “A Stochastic Quasi-Newton Method for Large-Scale Optimization,” SIAM Journal on Optimization, 26, 1008–1031. DOI: 10.1137/140954362.
- Cordy, C. B., and Thomas, D. R. (1997), “Deconvolution of a Distribution Function,” Journal of the American Statistical Association, 92, 1459–1465. DOI: 10.1080/01621459.1997.10473667.
- Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977), “Maximum Likelihood From Incomplete Data via the EM Algorithm,” Journal of the Royal Statistical Society, Series B, 39, 1–38. DOI: 10.1111/j.2517-6161.1977.tb01600.x.
- Duchi, J., Shalev-Shwartz, S., Singer, Y., and Chandra, T. (2008), “Efficient Projections Onto the l1-Ball for Learning in High Dimensions,” in Proceedings of the 25th International Conference on Machine Learning, pp. 272–279. DOI: 10.1145/1390156.1390191.
- Dunning, I., Huchette, J., and Lubin, M. (2017), “JuMP: A Modeling Language for Mathematical Optimization,” SIAM Review, 59, 295–320. DOI: 10.1137/15M1020575.
- Golub, G. H., and Van Loan, C. F. (2012), Matrix Computations (4th ed.), Baltimore, MD: Johns Hopkins University Press.
- Halko, N., Martinsson, P.-G., and Tropp, J. A. (2011), “Finding Structure With Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions,” SIAM Review, 53, 217–288. DOI: 10.1137/090771806.
- Ho, K. L., and Olver, S. (2018), “LowRankApprox.jl: Fast Low-Rank Matrix Approximation in Julia,” Zenodo. DOI: 10.5281/zenodo.1254148.
- Jiang, W., and Zhang, C.-H. (2009), “General Maximum Likelihood Empirical Bayes Estimation of Normal Means,” The Annals of Statistics, 37, 1647–1684. DOI: 10.1214/08-AOS638.
- Johnstone, I. M., and Silverman, B. W. (2004), “Needles and Straw in Haystacks: Empirical Bayes Estimates of Possibly Sparse Sequences,” The Annals of Statistics, 32, 1594–1649. DOI: 10.1214/009053604000000030.
- Kiefer, J., and Wolfowitz, J. (1956), “Consistency of the Maximum Likelihood Estimator in the Presence of Infinitely Many Incidental Parameters,” The Annals of Mathematical Statistics, 27, 887–906. DOI: 10.1214/aoms/1177728066.
- Kim, D., Sra, S., and Dhillon, I. S. (2010), “Tackling Box-Constrained Optimization via a New Projected Quasi-Newton Approach,” SIAM Journal on Scientific Computing, 32, 3548–3563. DOI: 10.1137/08073812X.
- Koenker, R., and Gu, J. (2017), “REBayes: An R Package for Empirical Bayes Mixture Methods,” Journal of Statistical Software, 82, 1–26. DOI: 10.18637/jss.v082.i08.
- Koenker, R., and Mizera, I. (2014), “Convex Optimization, Shape Constraints, Compound Decisions, and Empirical Bayes Rules,” Journal of the American Statistical Association, 109, 674–685. DOI: 10.1080/01621459.2013.869224.
- Laird, N. (1978), “Nonparametric Maximum Likelihood Estimation of a Mixing Distribution,” Journal of the American Statistical Association, 73, 805–811. DOI: 10.1080/01621459.1978.10480103.
- MacArthur, J., Bowler, E., Cerezo, M., Gil, L., Hall, P., Hastings, E., Junkins, H., McMahon, A., Milano, A., Morales, J., Pendlington, Z. M., Welter, D., Burdett, T., Hindorff, L., Flicek, P., Cunningham, F., and Parkinson, H. (2017), “The New NHGRI-EBI Catalog of Published Genome-Wide Association Studies (GWAS Catalog),” Nucleic Acids Research, 45, D896–D901. DOI: 10.1093/nar/gkw1133.
- Nocedal, J., and Wright, S. J. (2006), Numerical Optimization (2nd ed.), New York: Springer Science and Business Media.
- Pearson, K. (1894), “Contributions to the Mathematical Theory of Evolution,” Philosophical Transactions of the Royal Society of London A, 185, 71–110. DOI: 10.1098/rsta.1894.0003.
- Potra, F. A., and Wright, S. J. (2000), “Interior-Point Methods,” Journal of Computational and Applied Mathematics, 124, 281–302. DOI: 10.1016/S0377-0427(00)00433-7.
- R Core Team (2017), R: A Language and Environment for Statistical Computing, Vienna, Austria: R Foundation for Statistical Computing, available at https://www.R-project.org.
- Redner, R. A., and Walker, H. F. (1984), “Mixture Densities, Maximum Likelihood and the EM Algorithm,” SIAM Review, 26, 195–239. DOI: 10.1137/1026034.
- Robbins, H., and Monro, S. (1951), “A Stochastic Approximation Method,” The Annals of Mathematical Statistics, 22, 400–407. DOI: 10.1214/aoms/1177729586.
- Salakhutdinov, R., Roweis, S., and Ghahramani, Z. (2003), “Optimization With EM and Expectation-Conjugate-Gradient,” in Proceedings of the 20th International Conference on Machine Learning, pp. 672–679.
- Schmidt, M., Berg, E., Friedlander, M., and Murphy, K. (2009), “Optimizing Costly Functions With Simple Constraints: A Limited-Memory Projected Quasi-Newton Algorithm,” in Proceedings of the 12th International Conference on Artificial Intelligence and Statistics, pp. 456–463.
- Stephens, M. (2016), “False Discovery Rates: A New Deal,” Biostatistics, 18, 275–294. DOI: 10.1093/biostatistics/kxw041.
- Stephens, M., Carbonetto, P., Dai, C., Gerard, D., Lu, M., Sun, L., Willwerscheid, J., Xiao, N., and Zeng, M. (2018), “ashr: Methods for Adaptive Shrinkage, Using Empirical Bayes,” available at http://CRAN.R-project.org/package=ashr.
- Varadhan, R., and Roland, C. (2008), “Simple and Globally Convergent Methods for Accelerating the Convergence of Any EM Algorithm,” Scandinavian Journal of Statistics, 35, 335–353. DOI: 10.1111/j.1467-9469.2007.00585.x.
- Wood, A. R., Esko, T., Yang, J., Vedantam, S., Pers, T. H., Gustafsson, S., Chu, A. Y., Estrada, K., Luan, J. A., Kutalik, Z., and Amin, N. (2014), “Defining the Role of Common Variation in the Genomic and Biological Architecture of Adult Human Height,” Nature Genetics, 46, 1173–1186. DOI: 10.1038/ng.3097.
- Wright, S. J. (1998), “Superlinear Convergence of a Stabilized SQP Method to a Degenerate Solution,” Computational Optimization and Applications, 11, 253–275.
- Xu, L., and Jordan, M. I. (1996), “On Convergence Properties of the EM Algorithm for Gaussian Mixtures,” Neural Computation, 8, 129–151. DOI: 10.1162/neco.1996.8.1.129.