Journal of the American Statistical Association Volume 116, 2021 - Issue 534

Theory and Methods

Rare Feature Selection in High Dimensions

Xiaohan Yan, Microsoft Azure, Redmond, WA

Jacob Bien, Data Sciences and Operations, USC Marshall, Los Angeles, CA

Pages 887-900 | Received 17 Mar 2018, Accepted 08 Jul 2020, Published online: 01 Sep 2020

https://doi.org/10.1080/01621459.2020.1796677

References

Arnold, T. B., and Tibshirani, R. J. (2014), “genlasso: Path Algorithm for Generalized Lasso Problems,” R Package Version 1.3.
Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J. (2011), “Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers,” Foundations and Trends in Machine Learning, 3, 1–122. DOI: https://doi.org/10.1561/2200000016.
Cao, Y., Zhang, A., and Li, H. (2017), “Microbial Composition Estimation From Sparse Count Data,” arXiv no. 1706.02380.
Caporaso, J. G., Kuczynski, J., Stombaugh, J., Bittinger, K., Bushman, F. D., Costello, E. K., Fierer, N., Pena, A. G., Goodrich, J. K., Gordon, J. I., Huttley, G. A., Kelley, S. T., Knights, D., Koenig, J. E., Ley, R. E., Lozupone, C. A., McDonald, D., Muegge, B. D., Pirrung, M., Reeder, J., Sevinsky, J. R., Turnbaugh, P. J., Walters, W. A., Widmann, J., Yatsunenko, T., Zaneveld, J., and Knight, R. (2010), “QIIME Allows Analysis of High-Throughput Community Sequencing Data,” Nature Methods, 7, 335–336. DOI: https://doi.org/10.1038/nmeth.f.303.
Chen, J., Bushman, F. D., Lewis, J. D., Wu, G. D., and Li, H. (2013), “Structure-Constrained Sparse Canonical Correlation Analysis With an Application to Microbiome Data Analysis,” Biostatistics, 14, 244–258. DOI: https://doi.org/10.1093/biostatistics/kxs038.
Feinerer, I., and Hornik, K. (2016), “wordnet: WordNet Interface,” R Package Version 0.1-11.
Feinerer, I., and Hornik, K. (2017), “tm: Text Mining Package,” R Package Version 0.7-1.
Fellbaum, C. (1998), WordNet: An Electronic Lexical Database, Cambridge, MA: Bradford Books.
Forman, G. (2003), “An Extensive Empirical Study of Feature Selection Metrics for Text Classification,” Journal of Machine Learning Research, 3, 1289–1305.
Friedman, J., Hastie, T., and Tibshirani, R. J. (2010), “Regularization Paths for Generalized Linear Models via Coordinate Descent,” Journal of Statistical Software, 33, 1–22. DOI: https://doi.org/10.18637/jss.v033.i01.
Guinot, F., Szafranski, M., Ambroise, C., and Samson, F. (2017), “Learning the Optimal Scale for GWAS Through Hierarchical SNP Aggregation,” BMC Bioinformatics, 19, 1–14. DOI: https://doi.org/10.1186/s12859-018-2475-9.
Huang, A. (2008), “Similarity Measures for Text Document Clustering,” in Proceedings of the Sixth New Zealand Computer Science Research Student Conference (NZCSRSC2008), Christchurch, New Zealand, pp. 49–56.
Ke, T., Fan, J., and Wu, Y. (2015), “Homogeneity Pursuit,” Journal of the American Statistical Association, 110, 175–194. DOI: https://doi.org/10.1080/01621459.2014.892882.
Khabbazian, M., Kriebel, R., Rohe, K., and Ané, C. (2016), “Fast and Accurate Detection of Evolutionary Shifts in Ornstein-Uhlenbeck Models,” Methods in Ecology and Evolution, 7, 811–824. DOI: https://doi.org/10.1111/2041-210X.12534.
Kim, S., and Xing, E. P. (2012), “Tree-Guided Group Lasso for Multi-Response Regression With Structured Sparsity, With an Application to eQTL Mapping,” The Annals of Applied Statistics, 6, 1095–1117. DOI: https://doi.org/10.1214/12-AOAS549.
Li, C., and Li, H. (2010), “Variable Selection and Regression Analysis for Graph-Structured Covariates With an Application to Genomics,” The Annals of Applied Statistics, 4, 1498–1516. DOI: https://doi.org/10.1214/10-AOAS332.
Li, Y., Raskutti, G., and Willett, R. (2018), “Graph-Based Regularization for Regression Problems With Highly-Correlated Designs,” arXiv no. 1803.07658.
Lin, W., Shi, P., Feng, R., and Li, H. (2014), “Variable Selection in Regression With Compositional Covariates,” Biometrika, 101, 785–797. DOI: https://doi.org/10.1093/biomet/asu031.
Liu, X., Yu, S., Janssens, F., Glänzel, W., Moreau, Y., and De Moor, B. (2010), “Weighted Hybrid Clustering by Combining Text Mining and Bibliometrics on a Large-Scale Journal Database,” Journal of the Association for Information Science and Technology, 61, 1105–1119.
Matsen, F. A., Kodner, R. B., and Armbrust, E. V. (2010), “pplacer: Linear Time Maximum-Likelihood and Bayesian Phylogenetic Placement of Sequences Onto a Fixed Reference Tree,” BMC Bioinformatics, 11, 538. DOI: https://doi.org/10.1186/1471-2105-11-538.
McMurdie, P. J., and Holmes, S. (2013), “phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data,” PLOS ONE, 8, 1–11. DOI: https://doi.org/10.1371/journal.pone.0061217.
Mohammad, S. M., and Turney, P. D. (2013), “Crowdsourcing a Word–Emotion Association Lexicon,” Computational Intelligence, 29, 436–465. DOI: https://doi.org/10.1111/j.1467-8640.2012.00460.x.
Mukherjee, R., Pillai, N. S., and Lin, X. (2015), “Hypothesis Testing for High-Dimensional Sparse Binary Regression,” The Annals of Statistics, 43, 352–381. DOI: https://doi.org/10.1214/14-AOS1279.
Pennington, J., Socher, R., and Manning, C. D. (2014), “GloVe: Global Vectors for Word Representation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543. DOI: https://doi.org/10.3115/v1/D14-1162.
Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., and Zettlemoyer, L. (2018), “Deep Contextualized Word Representations,” in Proceedings of NAACL.
R Core Team (2016), R: A Language and Environment for Statistical Computing, Vienna, Austria: R Foundation for Statistical Computing.
Randolph, T. W., Zhao, S., Copeland, W., Hullar, M., and Shojaie, A. (2015), “Kernel-Penalized Regression for Analysis of Microbiome Data,” The Annals of Applied Statistics, 12, 540. DOI: https://doi.org/10.1214/17-AOAS1102.
Ridenhour, B. J., Brooker, S. L., Williams, J. E., Van Leuven, J. T., Miller, A. W., Dearing, M. D., and Remien, C. H. (2017), “Modeling Time-Series Data From Microbial Communities,” The ISME Journal, 11, 2526–2537. DOI: https://doi.org/10.1038/ismej.2017.107.
Schloss, P. D., Westcott, S. L., Ryabin, T., Hall, J. R., Hartmann, M., Hollister, E., Lesniewski, R., Oakley, B., Parks, D., Robinson, C., Sahl, J. W., Stres, B., Thallinger, G. G., Van Horn, D., and Weber, C. (2009), “Introducing Mothur: Open-Source, Platform-Independent, Community-Supported Software for Describing and Comparing Microbial Communities,” Applied and Environmental Microbiology, 75, 7537–7541. DOI: https://doi.org/10.1128/AEM.01541-09.
Sculley, D. (2010), “Web-Scale k-Means Clustering,” in Proceedings of the 19th International Conference on World Wide Web, WWW’10, Association for Computing Machinery, New York, NY, USA, pp. 1177–1178.
She, Y. (2010), “Sparse Regression With Exact Clustering,” Electronic Journal of Statistics, 4, 1055–1096. DOI: https://doi.org/10.1214/10-EJS578.
Shi, P., Zhang, A., and Li, H. (2016), “Regression Analysis for Microbiome Compositional Data,” The Annals of Applied Statistics, 10, 1019–1040. DOI: https://doi.org/10.1214/16-AOAS928.
Tang, Y., Li, M., and Nicolae, D. L. (2016), “Phylogenetic Dirichlet-Multinomial Model for Microbiome Data,” arXiv no. 1610.08974.
Thelwall, M., Buckley, K., Paltoglou, G., Cai, D., and Kappas, A. (2010), “Sentiment Strength Detection in Short Informal Text,” Journal of the Association for Information Science and Technology, 61, 2544–2558. DOI: https://doi.org/10.1002/asi.21416.
Tibshirani, R. J. (1996), “Regression Shrinkage and Selection via the Lasso,” Journal of the Royal Statistical Society, Series B, 58, 267–288. DOI: https://doi.org/10.1111/j.2517-6161.1996.tb02080.x.
Tibshirani, R. J., and Taylor, J. (2011), “The Solution Path of the Generalized Lasso,” The Annals of Statistics, 39, 1335–1371. DOI: https://doi.org/10.1214/11-AOS878.
Wallace, M. (2007), “Jawbone Java WordNet API.”
Wang, H., Lu, Y., and Zhai, C. (2010), “Latent Aspect Rating Analysis on Review Text Data: A Rating Regression Approach,” in Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’10, ACM, New York, NY, USA, pp. 783–792.
Wang, J., Shen, X., Sun, Y., and Qu, A. (2016), “Classification With Unstructured Predictors and an Application to Sentiment Analysis,” Journal of the American Statistical Association, 111, 1242–1253. DOI: https://doi.org/10.1080/01621459.2015.1089771.
Wang, T., and Zhao, H. (2017a), “A Dirichlet-Tree Multinomial Regression Model for Associating Dietary Nutrients With Gut Microorganisms,” Biometrics, 73, 792–801. DOI: https://doi.org/10.1111/biom.12654.
Wang, T., and Zhao, H. (2017b), “Structured Subcomposition Selection in Regression and Its Application to Microbiome Data Analysis,” The Annals of Applied Statistics, 11, 771–791.
Xia, F., Chen, J., Fung, W. K., and Li, H. (2013), “A Logistic Normal Multinomial Regression Model for Microbiome Compositional Data Analysis,” Biometrics, 69, 1053–1063. DOI: https://doi.org/10.1111/biom.12079.
Yu, G., and Liu, Y. (2016), “Sparse Regression Incorporating Graphical Structure Among Predictors,” Journal of the American Statistical Association, 111, 707–720. DOI: https://doi.org/10.1080/01621459.2015.1034319.
Zhai, J., Kim, J., Knox, K. S., Twigg, H. L., Zhou, H., and Zhou, J. J. (2018), “Variance Component Selection With Applications to Microbiome Taxonomic Data,” Frontiers in Microbiology, 9, 509. DOI: https://doi.org/10.3389/fmicb.2018.00509.
Zhang, T., Shao, M.-F., and Ye, L. (2012), “454 Pyrosequencing Reveals Bacterial Diversity of Activated Sludge From 14 Sewage Treatment Plants,” The ISME Journal, 6, 1137–1147. DOI: https://doi.org/10.1038/ismej.2011.188.
