S_I_LSTM: stock price prediction based on multiple data sources and sentiment analysis

Pages 44-62 | Received 30 Jan 2021, Accepted 03 Jun 2021, Published online: 14 Jun 2021
 

Abstract

Stock price prediction is an active research area with great promise and considerable challenges. Many stock price prediction methods have been proposed recently, but their prediction accuracy is still far from satisfactory. In this paper, we propose a stock price prediction method, called S_I_LSTM, that incorporates multiple data sources and investor sentiment. First, we crawl multiple data sources on the Internet and preprocess each of them separately. These data include historical stock data, technical indicators, and non-traditional data sources such as stock posts and financial news. Then, we apply a sentiment analysis method based on a convolutional neural network to the non-traditional data to calculate an investor sentiment index. Finally, we combine the sentiment index, the technical indicators and the historical stock transaction data into the feature set for stock price prediction, and adopt a long short-term memory network to make predictions for the China Shanghai A-share market. The experiments show that the predicted closing price is closer to the true closing price than predictions made from a single data source, and the mean absolute error reaches 2.386835, which is better than traditional methods. We verified the effectiveness of the method on real data sets of five listed companies.
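The feature combination and the error measure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the array shapes, and the five-day window length are assumptions made only for illustration, and the CNN sentiment model and the LSTM itself are left out. The sketch only shows how one sentiment index per day might be joined with technical indicators and historical transaction data into windows that a sequence model could consume, and how the mean absolute error is computed.

```python
import numpy as np


def build_feature_windows(history, indicators, sentiment, close, window=5):
    """Join the three feature groups per day and cut sliding windows.

    history:    (T, h) array of historical transaction features (assumed layout)
    indicators: (T, k) array of technical indicators (assumed layout)
    sentiment:  (T,)   array with one sentiment index per day
    close:      (T,)   array of closing prices, used as the target
    window:     number of past days per sample (hypothetical value)
    """
    # One row per trading day: history | indicators | sentiment index.
    features = np.hstack([history, indicators, sentiment.reshape(-1, 1)])
    X, y = [], []
    for t in range(window, len(features)):
        X.append(features[t - window:t])  # the `window` days before day t
        y.append(close[t])                # closing price on day t
    return np.array(X), np.array(y)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted prices."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


# Synthetic data, for shape checking only (not real market data).
rng = np.random.default_rng(0)
T = 10
X, y = build_feature_windows(
    history=rng.normal(size=(T, 4)),
    indicators=rng.normal(size=(T, 3)),
    sentiment=rng.uniform(-1.0, 1.0, size=T),
    close=rng.uniform(10.0, 20.0, size=T),
)
```

With these synthetic inputs, `X` has one sample per predictable day, each holding five days of eight features, which is the three-dimensional layout that LSTM layers in common deep learning libraries expect.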

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant 61872134, the Science and Technology Development Center of the Ministry of Education under Grant 2019J01020, the Science and Technology Program of Changsha City under Grants kh2005019 and kq2004021, and the 2011 Collaborative Innovative Center for Development and Utilisation of Finance and Economics Big Data Property, Universities of Hunan Province.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The datasets used and analysed during the current study are available from the corresponding author on reasonable request.

Additional information

Funding

This work was supported by Natural Science Foundation of China: [grant number 61872134]; Natural Science Foundation of Hunan Province: [grant number 2019JJ50082]; Science and Technology Program of Changsha City: [grant number kh2005019, kq2004021]; Science and Technology Development Center of the Ministry of Education: [grant number 2019J01020].