Sequential Text-Term Selection in Vector Space Models
