
Re-LSTM: A long short-term memory network text similarity algorithm based on weighted word embedding

Pages 2652-2670 | Received 18 May 2022, Accepted 20 Oct 2022, Published online: 02 Nov 2022

Abstract

Text similarity calculation is a crucial and difficult problem in natural language processing that underpins matching between texts and serves as the foundation of many applications. Current text similarity methods extract insufficient word-representation features and contextual relationships, and their large number of parameters increases computational complexity. Re-LSTM, a long short-term memory network with weighted word embeddings, is therefore proposed as a text similarity model. The two-gate mechanism of Re-LSTM neurons builds on the conventional LSTM model and is intended to reduce the number of parameters and the amount of computation to some extent. Each gate also takes the hidden features and state information of the previous layer into account, allowing more implicit features to be extracted. By fully exploiting each feature word together with its domain association, position, and word-frequency information, the TF-IDF method combined with the χ²-C algorithm effectively improves the weighted representation of words. Re-LSTM further employs an attention mechanism to combine dependencies with feature-word weights for deeper semantic mining of the text. Experimental results on the QQPC and ATEC datasets demonstrate that Re-LSTM outperforms baseline models in precision, recall, accuracy, and F1 score, all of which exceed 85%.
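The abstract does not give Re-LSTM's exact gate equations, so the sketch below is a hypothetical PyTorch illustration of the ideas it describes, not the authors' implementation: a reduced two-gate LSTM cell that couples the input and forget gates into a single "retain" gate (producing 3 rather than the standard LSTM's 4 weight blocks, hence fewer parameters), embeddings scaled by precomputed per-token weights standing in for the TF-IDF × χ²-C scores, and attention pooling over the hidden states of a siamese encoder. All class names, dimensions, and the weighting interface are illustrative assumptions.

```python
# Hypothetical sketch of a Re-LSTM-style similarity model.
# Assumptions (not from the paper): the coupled-gate form, the
# per-token weight interface, and all dimensions/names.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoGateLSTMCell(nn.Module):
    """Reduced LSTM cell with two gates instead of the usual three."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        # One linear map produces both gates and the candidate state.
        self.linear = nn.Linear(input_size + hidden_size, 3 * hidden_size)

    def forward(self, x, state):
        h, c = state
        z = self.linear(torch.cat([x, h], dim=-1))
        r, o, g = z.chunk(3, dim=-1)
        r = torch.sigmoid(r)          # coupled "retain" gate: forget = 1 - input
        o = torch.sigmoid(o)          # output gate
        g = torch.tanh(g)             # candidate cell state
        c = r * c + (1.0 - r) * g     # coupled update reduces parameters
        h = o * torch.tanh(c)
        return h, c

class ReLSTMSimilarity(nn.Module):
    """Siamese encoder: weighted embeddings -> two-gate LSTM -> attention."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.cell = TwoGateLSTMCell(embed_dim, hidden_dim)
        self.attn = nn.Linear(hidden_dim, 1)

    def encode(self, token_ids, weights):
        # weights: per-token scores, shape (B, T), e.g. TF-IDF x chi-square.
        emb = self.embed(token_ids) * weights.unsqueeze(-1)
        B, T, _ = emb.shape
        h = emb.new_zeros(B, self.cell.linear.out_features // 3)
        c = torch.zeros_like(h)
        outputs = []
        for t in range(T):
            h, c = self.cell(emb[:, t], (h, c))
            outputs.append(h)
        H = torch.stack(outputs, dim=1)                  # (B, T, D)
        alpha = F.softmax(self.attn(H).squeeze(-1), -1)  # attention weights
        return (alpha.unsqueeze(-1) * H).sum(dim=1)      # pooled sentence vector

    def forward(self, a_ids, a_w, b_ids, b_w):
        va, vb = self.encode(a_ids, a_w), self.encode(b_ids, b_w)
        return F.cosine_similarity(va, vb, dim=-1)       # similarity score
```

The coupled retain gate is one plausible way to realise the abstract's "two-gate mechanism": it ties cell forgetting and writing to a single sigmoid, cutting the recurrent parameter count by roughly a quarter relative to a standard LSTM while keeping an output gate for the hidden state.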

Disclosure statement

No potential conflict of interest was reported by the author(s).