Applications and Case Studies

Improving and Evaluating Topic Models and Other Models of Text

Pages 1381-1403 | Received 01 Sep 2012, Published online: 04 Jan 2017

