
Re-LSTM: A long short-term memory network text similarity algorithm based on weighted word embedding

Pages 2652-2670 | Received 18 May 2022, Accepted 20 Oct 2022, Published online: 02 Nov 2022

Abstract

Text similarity calculation is a crucial and difficult problem in natural language processing that underpins matching between texts and serves as the foundation of many applications. Current text similarity methods extract insufficient word-representation features and contextual relationships, and their large number of parameters increases computational complexity. Re-LSTM, a long short-term memory network with weighted word embeddings, is therefore proposed as a text similarity model. The two-gate mechanism of Re-LSTM neurons builds on the conventional LSTM model and is intended to reduce the number of parameters and the amount of computation to some extent. Each gate also takes the hidden features and state information of the previous layer into account, allowing more implicit features to be extracted. By fully exploiting each feature word together with its domain association, position, and word-frequency information, the TF-IDF method combined with the χ²-C algorithm effectively improves the weighted representation of words. Re-LSTM further employs an attention mechanism to combine dependencies with feature-word weights for deeper semantic mining of the text. Experimental results on the QQPC and ATEC datasets demonstrate that Re-LSTM outperforms baseline models in precision, recall, accuracy, and F1 score, all of which exceed 85%.
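The abstract does not give Re-LSTM's exact gate equations, so the sketch below is a hypothetical PyTorch illustration of the ideas it describes, not the authors' implementation: a reduced two-gate LSTM cell that couples the input and forget gates into a single "retain" gate (producing 3 rather than the standard LSTM's 4 weight blocks, hence fewer parameters), embeddings scaled by precomputed per-token weights standing in for the TF-IDF × χ²-C scores, and attention pooling over the hidden states of a siamese encoder. All class names, dimensions, and the weighting interface are illustrative assumptions.

```python
# Hypothetical sketch of a Re-LSTM-style similarity model.
# Assumptions (not from the paper): the coupled-gate form, the
# per-token weight interface, and all dimensions/names.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoGateLSTMCell(nn.Module):
    """Reduced LSTM cell with two gates instead of the usual three."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        # One linear map produces both gates and the candidate state.
        self.linear = nn.Linear(input_size + hidden_size, 3 * hidden_size)

    def forward(self, x, state):
        h, c = state
        z = self.linear(torch.cat([x, h], dim=-1))
        r, o, g = z.chunk(3, dim=-1)
        r = torch.sigmoid(r)          # coupled "retain" gate: forget = 1 - input
        o = torch.sigmoid(o)          # output gate
        g = torch.tanh(g)             # candidate cell state
        c = r * c + (1.0 - r) * g     # coupled update reduces parameters
        h = o * torch.tanh(c)
        return h, c

class ReLSTMSimilarity(nn.Module):
    """Siamese encoder: weighted embeddings -> two-gate LSTM -> attention."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.cell = TwoGateLSTMCell(embed_dim, hidden_dim)
        self.attn = nn.Linear(hidden_dim, 1)

    def encode(self, token_ids, weights):
        # weights: per-token scores, shape (B, T), e.g. TF-IDF x chi-square.
        emb = self.embed(token_ids) * weights.unsqueeze(-1)
        B, T, _ = emb.shape
        h = emb.new_zeros(B, self.cell.linear.out_features // 3)
        c = torch.zeros_like(h)
        outputs = []
        for t in range(T):
            h, c = self.cell(emb[:, t], (h, c))
            outputs.append(h)
        H = torch.stack(outputs, dim=1)                  # (B, T, D)
        alpha = F.softmax(self.attn(H).squeeze(-1), -1)  # attention weights
        return (alpha.unsqueeze(-1) * H).sum(dim=1)      # pooled sentence vector

    def forward(self, a_ids, a_w, b_ids, b_w):
        va, vb = self.encode(a_ids, a_w), self.encode(b_ids, b_w)
        return F.cosine_similarity(va, vb, dim=-1)       # similarity score
```

The coupled retain gate is one plausible way to realise the abstract's "two-gate mechanism": it ties cell forgetting and writing to a single sigmoid, cutting the recurrent parameter count by roughly a quarter relative to a standard LSTM while keeping an output gate for the hidden state.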

Disclosure statement

No potential conflict of interest was reported by the author(s).