Sequential Text-Term Selection in Vector Space Models

Pages 82-97 | Published online: 23 Jul 2019
 

Abstract

Text mining has attracted a great deal of attention as text documents accumulate in all fields. In this article, we focus on using textual information to explain continuous variables within a linear regression framework. A common way to handle unstructured text is to give it structure through vector space models. However, whether words or phrases should serve as the basic analysis terms in vector space models remains heavily debated. In addition, vector space models often produce an extremely large term set and suffer from the curse of dimensionality, which makes term selection necessary. To this end, we propose a novel term screening method for vector space models under a linear regression setup. We first split the entire term space into subspaces according to term length and then conduct term screening sequentially. We prove the screening consistency of the method and assess its empirical performance with simulations based on a dataset of online consumer reviews for cellphones. We then analyze the associated real data. The results show that the sequential term selection technique can effectively detect the relevant terms in a few steps.
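
The idea of splitting the term space by term length and screening one subspace at a time can be illustrated with a minimal sketch. This is not the authors' procedure: the abstract does not state the screening statistic, so the sketch assumes a simple greedy rule (pick the term whose counts have the largest absolute correlation with the current regression residual, then regress it out). The function names, the parameter `k` (terms kept per length), and the toy reviews and ratings are all hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-word terms in a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def term_columns(docs, n):
    """Vector space model restricted to terms of length n: term -> per-document counts."""
    counts = [Counter(ngrams(doc.lower().split(), n)) for doc in docs]
    vocab = sorted(set().union(*counts))
    return {term: [c[term] for c in counts] for term in vocab}


def center(v):
    """Subtract the mean from every entry."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def abs_corr(x, r, tol=1e-12):
    """Absolute sample correlation; 0.0 if either vector is (numerically) constant."""
    xc, rc = center(x), center(r)
    sxx = sum(a * a for a in xc)
    srr = sum(b * b for b in rc)
    if sxx < tol or srr < tol:
        return 0.0
    return abs(sum(a * b for a, b in zip(xc, rc))) / math.sqrt(sxx * srr)


def residualize(r, x):
    """Residual of r after least-squares regression on x with an intercept."""
    xc, rc = center(x), center(r)
    sxx = sum(a * a for a in xc)
    beta = sum(a * b for a, b in zip(xc, rc)) / sxx if sxx > 0 else 0.0
    return [b - beta * a for a, b in zip(xc, rc)]


def sequential_screen(docs, y, lengths=(1, 2), k=1):
    """Assumed rule: for each term length in turn, greedily keep k terms,
    passing the residual of the response on to the next, longer subspace."""
    residual = center(y)
    selected = {}
    for n in lengths:
        cols = term_columns(docs, n)
        selected[n] = []
        for _ in range(min(k, len(cols))):
            best = max(cols, key=lambda t: abs_corr(cols[t], residual))
            selected[n].append(best)
            residual = residualize(residual, cols.pop(best))
    return selected


# Hypothetical toy data: six short reviews with numeric ratings.
reviews = [
    "great phone", "great battery", "great screen",
    "bad phone", "dull battery", "not great screen",
]
ratings = [5.0, 5.0, 5.0, 1.0, 1.0, 2.0]
print(sequential_screen(reviews, ratings))
```

Screening single words first and phrases second means a phrase is kept only if it explains variation that the selected words leave behind, which is one plausible reading of why the subspaces are processed in order of term length.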

Correction Statement

This article has been republished with minor changes. These changes do not impact the academic content of the article.

ACKNOWLEDGMENTS

Jingyuan Liu is the corresponding author. The authors also thank Mr. Xiang Li of the School of Economics, Xiamen University, for his technical support.

FUNDING

This work was supported by funds for building world-class universities (disciplines) of the Renmin University of China, the Fundamental Research Funds for the Central Universities and the Research Funds of Renmin University of China (No. 18XNLG02), Basic Scientific Center Project 71988101 of National Science Foundation of China, the National Natural Science Foundation of China (No. 11771361, 11871409, 11671334, 11831008, 11525101, 71532001), JAS14007, and China's National Key Research Special Program (No. 2016YFC0207704).
