Journal of Computational and Graphical Statistics Volume 29, 2020 - Issue 4

Bayesian and Latent Variable Models

Scalable Hyperparameter Selection for Latent Dirichlet Allocation

Wei Xia, Department of Statistics, University of Florida, Gainesville, FL

Hani Doss (corresponding author), Department of Statistics, University of Florida, Gainesville, FL

Pages 875-895 | Received 20 Dec 2018, Accepted 02 Dec 2019, Published online: 15 May 2020

https://doi.org/10.1080/10618600.2020.1741378

References

Asuncion, A., Welling, M., Smyth, P., and Teh, Y. W. (2009), “On Smoothing and Inference for Topic Models,” in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI ’09, AUAI Press, Arlington, VA.
Blei, D. M., and Jordan, M. I. (2003), “Modeling Annotated Data,” in Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’03, ACM, New York, NY.
Blei, D. M., Kucukelbir, A., and McAuliffe, J. D. (2017), “Variational Inference: A Review for Statisticians,” Journal of the American Statistical Association, 112, 859–877. DOI: 10.1080/01621459.2017.1285773.
Blei, D. M., and Lafferty, J. D. (2007), “A Correlated Topic Model of Science,” The Annals of Applied Statistics, 1, 17–35. DOI: 10.1214/07-AOAS114.
Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003), “Latent Dirichlet Allocation,” Journal of Machine Learning Research, 3, 993–1022.
Celeux, G., Hurn, M., and Robert, C. P. (2000), “Computational and Inferential Difficulties With Mixture Posterior Distributions,” Journal of the American Statistical Association, 95, 957–970. DOI: 10.1080/01621459.2000.10474285.
Chen, M.-H. (1994), “Importance-Weighted Marginal Bayesian Posterior Density Estimation,” Journal of the American Statistical Association, 89, 818–824. DOI: 10.1080/01621459.1994.10476815.
Chen, Z. (2015), “Inference for the Number of Topics in the Latent Dirichlet Allocation Model via Bayesian Mixture Modelling,” Ph.D. thesis, University of Florida.
Chib, S. (1995), “Marginal Likelihood From the Gibbs Output,” Journal of the American Statistical Association, 90, 1313–1321. DOI: 10.1080/01621459.1995.10476635.
Chib, S., and Jeliazkov, I. (2001), “Marginal Likelihood From the Metropolis-Hastings Output,” Journal of the American Statistical Association, 96, 270–281. DOI: 10.1198/016214501750332848.
Donoho, D. L., and Liu, R. C. (1991), “Geometrizing Rates of Convergence, II,” The Annals of Statistics, 19, 633–667. DOI: 10.1214/aos/1176348114.
Doss, H., and George, C. P. (2019), “Theoretical and Empirical Evaluation of a Grouped Gibbs Sampler for Parallel Computation in the LDA Model,” Tech. Rep., Department of Statistics, University of Florida.
Doss, H., and Linero, A. (2019), “A Fully-Bayes Approach to Empirical Bayes Inference and Bayesian Sensitivity Analysis,” Tech. Rep., Department of Statistics, University of Florida.
Escobar, M. D., and West, M. (1995), “Bayesian Density Estimation and Inference Using Mixtures,” Journal of the American Statistical Association, 90, 577–588. DOI: 10.1080/01621459.1995.10476550.
Flegal, J. M., Haran, M., and Jones, G. L. (2008), “Markov Chain Monte Carlo: Can We Trust the Third Significant Figure?,” Statistical Science, 23, 250–260. DOI: 10.1214/08-STS257.
Flegal, J. M., Hughes, J., and Vats, D. (2016), “mcmcse: Monte Carlo Standard Errors for MCMC,” Riverside, CA and Minneapolis, MN, R Package Version 1.2-1.
George, C. P. (2015), “Latent Dirichlet Allocation: Hyperparameter Selection and Applications to Electronic Discovery,” Ph.D. thesis, University of Florida.
George, C. P., and Doss, H. (2018), “Principled Selection of Hyperparameters in the Latent Dirichlet Allocation Model,” Journal of Machine Learning Research, 18, 1–38.
George, E. I., and Foster, D. P. (2000), “Calibration and Empirical Bayes Variable Selection,” Biometrika, 87, 731–747. DOI: 10.1093/biomet/87.4.731.
Geyer, C. J., and Thompson, E. A. (1995), “Annealing Markov Chain Monte Carlo With Applications to Ancestral Inference,” Journal of the American Statistical Association, 90, 909–920. DOI: 10.1080/01621459.1995.10476590.
Griffiths, T. L., and Steyvers, M. (2004), “Finding Scientific Topics,” Proceedings of the National Academy of Sciences of the United States of America, 101, 5228–5235. DOI: 10.1073/pnas.0307752101.
Hobert, J. P. (2011), “The Data Augmentation Algorithm: Theory and Methodology,” in Handbook of Markov Chain Monte Carlo, eds. S. P. Brooks, A. Gelman, G. L. Jones, and X.-L. Meng, Boca Raton, FL: CRC Press, pp. 253–293.
Jasra, A., Holmes, C. C., and Stephens, D. A. (2005), “Markov Chain Monte Carlo Methods and the Label Switching Problem in Bayesian Mixture Modeling,” Statistical Science, 20, 50–67. DOI: 10.1214/088342305000000016.
Jones, G. L., Haran, M., Caffo, B. S., and Neath, R. (2006), “Fixed-Width Output Analysis for Markov Chain Monte Carlo,” Journal of the American Statistical Association, 101, 1537–1547. DOI: 10.1198/016214506000000492.
Jordan, M. I., Ghahramani, Z., Jaakkola, T. S., and Saul, L. K. (1999), “An Introduction to Variational Methods for Graphical Models,” Machine Learning, 37, 183–233. DOI: 10.1023/A:1007665907178.
Marinari, E., and Parisi, G. (1992), “Simulated Tempering: A New Monte Carlo Scheme,” Europhysics Letters, 19, 451–458. DOI: 10.1209/0295-5075/19/6/002.
Meng, X.-L., and Van Dyk, D. (1997), “The EM Algorithm—An Old Folk-Song Sung to a Fast New Tune,” Journal of the Royal Statistical Society, Series B, 59, 511–567. DOI: 10.1111/1467-9868.00082.
Minka, T. P. (2003), “Estimating a Dirichlet Distribution,” available at http://research.microsoft.com/~minka/papers/dirichlet/.
Nash, J. C., and Varadhan, R. (2011), “Unifying Optimization Algorithms to Aid Software System Users: optimx for R,” Journal of Statistical Software, 43, 1–14. DOI: 10.18637/jss.v043.i09.
Neal, R. M. (2003), “Slice Sampling,” The Annals of Statistics, 31, 705–741. DOI: 10.1214/aos/1056562461.
Neal, R. M. (2011), “MCMC Using Hamiltonian Dynamics,” in Handbook of Markov Chain Monte Carlo, eds. S. P. Brooks, A. Gelman, G. L. Jones, and X.-L. Meng, Boca Raton, FL: CRC Press, pp. 113–162.
Newman, D., Asuncion, A., Smyth, P., and Welling, M. (2009), “Distributed Algorithms for Topic Models,” Journal of Machine Learning Research, 10, 1801–1828.
Newton, M., and Raftery, A. (1994), “Approximate Bayesian Inference With the Weighted Likelihood Bootstrap” (with discussion), Journal of the Royal Statistical Society, Series B, 56, 3–48. DOI: 10.1111/j.2517-6161.1994.tb01956.x.
Řehůřek, R., and Sojka, P. (2010), “Software Framework for Topic Modelling With Large Corpora,” in Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, ELRA, Valletta, Malta.
Robert, C. P. (2001), The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation, New York: Springer-Verlag.
Rosen-Zvi, M., Griffiths, T., Steyvers, M., and Smyth, P. (2004), “The Author-Topic Model for Authors and Documents,” in Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, UAI ’04, AUAI Press, Arlington, VA.
Teh, Y. W., Jordan, M. I., Beal, M. J., and Blei, D. M. (2006), “Hierarchical Dirichlet Processes,” Journal of the American Statistical Association, 101, 1566–1581. DOI: 10.1198/016214506000000302.
Tsybakov, A. B. (1990), “Recursive Estimation of the Mode of a Multivariate Distribution,” Problemy Peredachi Informatsii, 26, 38–45.
Wallach, H. M. (2006), “Topic Modeling: Beyond Bag-of-Words,” in Proceedings of the 23rd International Conference on Machine Learning, ICML ’06, ACM, New York, NY.
Wallach, H. M. (2008), “Structured Topic Models for Language,” Ph.D. thesis, University of Cambridge.
Wallach, H. M., Murray, I., Salakhutdinov, R., and Mimno, D. (2009), “Evaluation Methods for Topic Models,” in Proceedings of the 26th Annual International Conference on Machine Learning, ACM. DOI: 10.1145/1553374.1553515.
Wolpert, R. L., and Schmidler, S. C. (2012), “α-Stable Limit Laws for Harmonic Mean Estimators of Marginal Likelihoods,” Statistica Sinica, 22, 1233–1251. DOI: 10.5705/ss.2010.221.
Xia, W. (2018), “Scalable Hyperparameter Selection for Latent Dirichlet Allocation,” Ph.D. thesis, University of Florida.
Xia, W., and Doss, H. (2020), “Supplement to ‘Scalable Hyperparameter Selection for Latent Dirichlet Allocation’.”
Yang, Y. (2005), “Can the Strengths of AIC and BIC Be Shared? A Conflict Between Model Indentification and Regression Estimation,” Biometrika, 92, 937–950. DOI: 10.1093/biomet/92.4.937.