References
- Arnold, T. B., and Tibshirani, R. J. (2014), “genlasso: Path Algorithm for Generalized Lasso Problems,” R Package Version 1.3.
- Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J. (2011), “Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers,” Foundations and Trends in Machine Learning, 3, 1–122. DOI: https://doi.org/10.1561/2200000016.
- Cao, Y., Zhang, A., and Li, H. (2017), “Microbial Composition Estimation From Sparse Count Data,” arXiv no. 1706.02380.
- Caporaso, J. G., Kuczynski, J., Stombaugh, J., Bittinger, K., Bushman, F. D., Costello, E. K., Fierer, N., Peña, A. G., Goodrich, J. K., Gordon, J. I., Huttley, G. A., Kelley, S. T., Knights, D., Koenig, J. E., Ley, R. E., Lozupone, C. A., McDonald, D., Muegge, B. D., Pirrung, M., Reeder, J., Sevinsky, J. R., Turnbaugh, P. J., Walters, W. A., Widmann, J., Yatsunenko, T., Zaneveld, J., and Knight, R. (2010), “QIIME Allows Analysis of High-Throughput Community Sequencing Data,” Nature Methods, 7, 335–336. DOI: https://doi.org/10.1038/nmeth.f.303.
- Chen, J., Bushman, F. D., Lewis, J. D., Wu, G. D., and Li, H. (2013), “Structure-Constrained Sparse Canonical Correlation Analysis With an Application to Microbiome Data Analysis,” Biostatistics, 14, 244–258. DOI: https://doi.org/10.1093/biostatistics/kxs038.
- Feinerer, I., and Hornik, K. (2016), “wordnet: WordNet Interface,” R Package Version 0.1-11.
- Feinerer, I., and Hornik, K. (2017), “tm: Text Mining Package,” R Package Version 0.7-1.
- Fellbaum, C. (1998), WordNet: An Electronic Lexical Database, Cambridge, MA: Bradford Books.
- Forman, G. (2003), “An Extensive Empirical Study of Feature Selection Metrics for Text Classification,” Journal of Machine Learning Research, 3, 1289–1305.
- Friedman, J., Hastie, T., and Tibshirani, R. (2010), “Regularization Paths for Generalized Linear Models via Coordinate Descent,” Journal of Statistical Software, 33, 1–22. DOI: https://doi.org/10.18637/jss.v033.i01.
- Guinot, F., Szafranski, M., Ambroise, C., and Samson, F. (2017), “Learning the Optimal Scale for GWAS Through Hierarchical SNP Aggregation,” BMC Bioinformatics, 19, 1–14. DOI: https://doi.org/10.1186/s12859-018-2475-9.
- Huang, A. (2008), “Similarity Measures for Text Document Clustering,” in Proceedings of the Sixth New Zealand Computer Science Research Student Conference (NZCSRSC2008), Christchurch, New Zealand, pp. 49–56.
- Ke, T., Fan, J., and Wu, Y. (2015), “Homogeneity Pursuit,” Journal of the American Statistical Association, 110, 175–194. DOI: https://doi.org/10.1080/01621459.2014.892882.
- Khabbazian, M., Kriebel, R., Rohe, K., and Ané, C. (2016), “Fast and Accurate Detection of Evolutionary Shifts in Ornstein-Uhlenbeck Models,” Methods in Ecology and Evolution, 7, 811–824. DOI: https://doi.org/10.1111/2041-210X.12534.
- Kim, S., and Xing, E. P. (2012), “Tree-Guided Group Lasso for Multi-Response Regression With Structured Sparsity, With an Application to eQTL Mapping,” The Annals of Applied Statistics, 6, 1095–1117. DOI: https://doi.org/10.1214/12-AOAS549.
- Li, C., and Li, H. (2010), “Variable Selection and Regression Analysis for Graph-Structured Covariates With an Application to Genomics,” The Annals of Applied Statistics, 4, 1498–1516. DOI: https://doi.org/10.1214/10-AOAS332.
- Li, Y., Raskutti, G., and Willett, R. (2018), “Graph-Based Regularization for Regression Problems With Highly-Correlated Designs,” arXiv no. 1803.07658.
- Lin, W., Shi, P., Feng, R., and Li, H. (2014), “Variable Selection in Regression With Compositional Covariates,” Biometrika, 101, 785–797. DOI: https://doi.org/10.1093/biomet/asu031.
- Liu, X., Yu, S., Janssens, F., Glänzel, W., Moreau, Y., and De Moor, B. (2010), “Weighted Hybrid Clustering by Combining Text Mining and Bibliometrics on a Large-Scale Journal Database,” Journal of the American Society for Information Science and Technology, 61, 1105–1119.
- Matsen, F. A., Kodner, R. B., and Armbrust, E. V. (2010), “pplacer: Linear Time Maximum-Likelihood and Bayesian Phylogenetic Placement of Sequences Onto a Fixed Reference Tree,” BMC Bioinformatics, 11, 538. DOI: https://doi.org/10.1186/1471-2105-11-538.
- McMurdie, P. J., and Holmes, S. (2013), “phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data,” PLOS ONE, 8, 1–11. DOI: https://doi.org/10.1371/journal.pone.0061217.
- Mohammad, S. M., and Turney, P. D. (2013), “Crowdsourcing a Word–Emotion Association Lexicon,” Computational Intelligence, 29, 436–465. DOI: https://doi.org/10.1111/j.1467-8640.2012.00460.x.
- Mukherjee, R., Pillai, N. S., and Lin, X. (2015), “Hypothesis Testing for High-Dimensional Sparse Binary Regression,” The Annals of Statistics, 43, 352–381. DOI: https://doi.org/10.1214/14-AOS1279.
- Pennington, J., Socher, R., and Manning, C. D. (2014), “GloVe: Global Vectors for Word Representation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543. DOI: https://doi.org/10.3115/v1/D14-1162.
- Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., and Zettlemoyer, L. (2018), “Deep Contextualized Word Representations,” in Proceedings of NAACL.
- R Core Team (2016), R: A Language and Environment for Statistical Computing, Vienna, Austria: R Foundation for Statistical Computing.
- Randolph, T. W., Zhao, S., Copeland, W., Hullar, M., and Shojaie, A. (2015), “Kernel-Penalized Regression for Analysis of Microbiome Data,” The Annals of Applied Statistics, 12, 540–566. DOI: https://doi.org/10.1214/17-AOAS1102.
- Ridenhour, B. J., Brooker, S. L., Williams, J. E., Van Leuven, J. T., Miller, A. W., Dearing, M. D., and Remien, C. H. (2017), “Modeling Time-Series Data From Microbial Communities,” The ISME Journal, 11, 2526–2537. DOI: https://doi.org/10.1038/ismej.2017.107.
- Schloss, P. D., Westcott, S. L., Ryabin, T., Hall, J. R., Hartmann, M., Hollister, E., Lesniewski, R., Oakley, B., Parks, D., Robinson, C., Sahl, J. W., Stres, B., Thallinger, G. G., Van Horn, D., and Weber, C. (2009), “Introducing Mothur: Open-Source, Platform-Independent, Community-Supported Software for Describing and Comparing Microbial Communities,” Applied and Environmental Microbiology, 75, 7537–7541. DOI: https://doi.org/10.1128/AEM.01541-09.
- Sculley, D. (2010), “Web-Scale k-Means Clustering,” in Proceedings of the 19th International Conference on World Wide Web, WWW’10, Association for Computing Machinery, New York, NY, USA, pp. 1177–1178.
- She, Y. (2010), “Sparse Regression With Exact Clustering,” Electronic Journal of Statistics, 4, 1055–1096. DOI: https://doi.org/10.1214/10-EJS578.
- Shi, P., Zhang, A., and Li, H. (2016), “Regression Analysis for Microbiome Compositional Data,” The Annals of Applied Statistics, 10, 1019–1040. DOI: https://doi.org/10.1214/16-AOAS928.
- Tang, Y., Li, M., and Nicolae, D. L. (2016), “Phylogenetic Dirichlet-Multinomial Model for Microbiome Data,” arXiv no. 1610.08974.
- Thelwall, M., Buckley, K., Paltoglou, G., Cai, D., and Kappas, A. (2010), “Sentiment Strength Detection in Short Informal Text,” Journal of the American Society for Information Science and Technology, 61, 2544–2558. DOI: https://doi.org/10.1002/asi.21416.
- Tibshirani, R. (1996), “Regression Shrinkage and Selection via the Lasso,” Journal of the Royal Statistical Society, Series B, 58, 267–288. DOI: https://doi.org/10.1111/j.2517-6161.1996.tb02080.x.
- Tibshirani, R. J., and Taylor, J. (2011), “The Solution Path of the Generalized Lasso,” The Annals of Statistics, 39, 1335–1371. DOI: https://doi.org/10.1214/11-AOS878.
- Wallace, M. (2007), “Jawbone Java WordNet API.”
- Wang, H., Lu, Y., and Zhai, C. (2010), “Latent Aspect Rating Analysis on Review Text Data: A Rating Regression Approach,” in Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’10, ACM, New York, NY, USA, pp. 783–792.
- Wang, J., Shen, X., Sun, Y., and Qu, A. (2016), “Classification With Unstructured Predictors and an Application to Sentiment Analysis,” Journal of the American Statistical Association, 111, 1242–1253. DOI: https://doi.org/10.1080/01621459.2015.1089771.
- Wang, T., and Zhao, H. (2017a), “A Dirichlet-Tree Multinomial Regression Model for Associating Dietary Nutrients With Gut Microorganisms,” Biometrics, 73, 792–801. DOI: https://doi.org/10.1111/biom.12654.
- Wang, T., and Zhao, H. (2017b), “Structured Subcomposition Selection in Regression and Its Application to Microbiome Data Analysis,” The Annals of Applied Statistics, 11, 771–791.
- Xia, F., Chen, J., Fung, W. K., and Li, H. (2013), “A Logistic Normal Multinomial Regression Model for Microbiome Compositional Data Analysis,” Biometrics, 69, 1053–1063. DOI: https://doi.org/10.1111/biom.12079.
- Yu, G., and Liu, Y. (2016), “Sparse Regression Incorporating Graphical Structure Among Predictors,” Journal of the American Statistical Association, 111, 707–720. DOI: https://doi.org/10.1080/01621459.2015.1034319.
- Zhai, J., Kim, J., Knox, K. S., Twigg, H. L., Zhou, H., and Zhou, J. J. (2018), “Variance Component Selection With Applications to Microbiome Taxonomic Data,” Frontiers in Microbiology, 9, 509. DOI: https://doi.org/10.3389/fmicb.2018.00509.
- Zhang, T., Shao, M.-F., and Ye, L. (2012), “454 Pyrosequencing Reveals Bacterial Diversity of Activated Sludge From 14 Sewage Treatment Plants,” The ISME Journal, 6, 1137–1147. DOI: https://doi.org/10.1038/ismej.2011.188.