Research Article

IIM: an information interaction mechanism for aspect-based sentiment analysis

Article: 2283390 | Received 01 Nov 2022, Accepted 09 Nov 2023, Published online: 02 Dec 2023

ABSTRACT

Term polarity co-extraction is an aspect-based sentiment analysis task that has been widely used in the field of user opinion extraction. It consists of two subtasks: aspect term extraction and aspect sentiment classification. Most existing studies treat these subtasks as independent, or simply unify them without making full use of the relationship between tasks to mine the interaction of text information, which leads to low performance in practical applications. Meanwhile, the learning frameworks of these studies exhibit a label drift phenomenon (LDP) during predictive learning, increasing the learning error rate. To address the above problems, this study unifies the subtasks and proposes a Unified framework based on an information interaction mechanism, called IIM. Specifically, we design an Information Interaction Channel (IIC) to construct closer semantic features and extract preliminary term-polarity unified labels from the perspective of basic semantics. For label inconsistency within aspect terms, a Position-aware Module (SAM) is proposed to alleviate the LDP. Moreover, we introduce a syntax-attention graph neural network (Syn-AttGCN) to model the syntactic structure of text and strengthen the emotional connection between aspect terms. The experimental results show that IIM outperforms most baselines, and that the SAM module has a measurable mitigating effect on the LDP.

1. Introduction

User-generated content (UGC) is original content posted by users on social platforms. With the development of the Internet, UGC is widely available on social media, blogs, forums and other platforms. Taking UGC text as an example, the content conveys users' voices and opinions. In-depth analysis of these opinions plays an important role in enterprise operation, product marketing, public opinion control and other fields.

To implement the applications mentioned above, it is crucial to analyse the sentiment of UGC text. Traditional sentiment analysis classifies the sentiment of a single sentence or document (Zhang et al., Citation2019), which assumes that the whole text carries only one sentiment. However, a UGC text often describes a target, or one or more aspects of that target (Zhang et al., Citation2022), and in real scenarios expresses different sentiment categories (e.g. positive, negative and neutral) towards different aspects. The traditional method is therefore too rigid to be realistic. Consequently, some scholars proposed aspect-based sentiment analysis (ABSA), a fine-grained form of sentiment classification (Yang et al., Citation2020) that reflects the sentiment of the user towards one or more aspect terms (e.g. products, services, etc.). Given an example sentence such as ‘The price is reasonable although the service is poor.’, the aspect term ‘price’ carries a positive sentiment, but the evaluation of the term ‘service’ is negative. Note that we only discuss the aspect-based setting in the remainder of this paper.

To better complete the ABSA task, some subtasks have been proposed. Specifically, aspect term extraction (ATE) and aspect sentiment classification (ASC) are the two key tasks of ABSA (Zhang et al., Citation2022), as shown in Figure 1. Since named entity recognition (NER) can identify the consecutive subwords of aspect terms, the ATE task is often treated as an NER task. Further, regular NER is categorised as a sequence labelling (SL) task: taking Table 1 as an example, given a text sequence, the model predicts the position label (e.g. B-beginning, I-inside, O-outside, E-ending, S-singleton) of each subword in the sequence. The entity objects contained in the text, and their locations, can then be obtained by integrating and matching the labels (Chiu & Nichols, Citation2016; Ma & Hovy, Citation2016). In contrast, ASC is a single classification task that performs sentiment analysis of an identified aspect entity (e.g. ‘Vista’ corresponds to the negative label ‘NEG’).

Figure 1. Case demonstrations of subtasks.


Table 1. Label drift phenomenon.

Table 2. Different work routes.

To further refine the ABSA task, some scholars have proposed term polarity co-extraction (TPC). More precisely, TPC is an important task derived from the above ABSA tasks: the input is a UGC text and the output is the aspect terms contained in the text together with their sentiment categories. Many relevant, mature frameworks are already available, as illustrated in Table 2. Pipeline is a staged, single-task processing approach: the ATE task is performed first, and sentiment classification is then completed based on the extracted terms (Hu et al., Citation2019; Li et al., Citation2019; Liang et al., Citation2020; Luo et al., Citation2020). Its main advantage is that many proven training paradigms can be used directly for each independent subtask. However, one of its fatal weaknesses is error propagation: errors from the previous task propagate as input to the next task, which leads to poor performance in most Pipeline work. A composite model, which avoids error propagation, addresses this problem as follows. Joint combines ATE and ASC by sharing weights to train the two subtasks jointly (He et al., Citation2019; Mao et al., Citation2021; Peng et al., Citation2020; Xu et al., Citation2018), avoiding error transmission. Nevertheless, this type of model may require a huge number of parameters, since one training run is equivalent to running the frameworks of two prediction tasks. It is therefore particularly important to construct a lightweight model that extracts aspect terms and sentiment categories simultaneously. Consequently, Unified was proposed: an end-to-end modelling method that enables term polarity co-extraction in one pass through a collapsed labelling strategy (Li et al., Citation2019; Luo et al., Citation2019; Xu et al., Citation2019). In general, the correctness of ABSA predictions is directly influenced by two factors: the position boundary of aspect terms and the category of aspect sentiment. Previous studies have mostly dealt with the ATE and ASC tasks independently, thus separating the prediction of positional labels and sentiment labels for aspect terms. Because the boundary and emotional information of the text itself are not fully utilised, the interaction between them is ignored, which increases the error rate. Meanwhile, most models suffer from label inconsistency within the subwords of an aspect term when predicting labels. As shown in Table 1, we refer to this as the label drift phenomenon (LDP); it limits model performance because these models lack mechanisms for capturing valid information. In response to the above, we make the following explorations.

In this paper, a Unified framework called the information interaction mechanism framework (IIM) is proposed. Firstly, lexical analysis of the text extracts two types of candidate words, which act as entity cues and carry the sentiment colour of the text. Then, an information interaction channel captures the correlation between the candidate words and the text content; this correlation is used to extract tighter features that constrain the boundaries of aspect terms in the results and clarify the sentiment polarity. To alleviate LDP within aspect terms, a position-aware module (SAM) built at the sink of the information interaction channel preserves the memory of the continuous sequence. Meanwhile, for syntactic structure modelling, a graph neural network is introduced to better exploit the grammatical structure of the text and enhance the boundary features of aspect terms. Finally, we use a multi-feature splice-fusion strategy, with the assistance of fully connected layers, to complete the label prediction task and ensure the integrity of the information. The major contributions of this paper are summarised as follows:

  1. A novel Unified-based framework, IIM, is proposed for the TPC task; it builds on a classical model, has low complexity and is easy to implement.

  2. We design an information interaction channel (IIC) to help extract effective features from text. Within it, a position-aware module (SAM) is built, which alleviates the label drift phenomenon (LDP) for TPC.

  3. We design a syntax-attention graph neural network (Syn-AttGCN) to model the grammatical structure of text. The experimental results show that the proposed method outperforms most of the baselines.

We elaborate on the work of this paper in the following sections. Section 2 describes related work on the TPC task. Section 3 gives a detailed explanation of the proposed method and its theoretical justification, and Section 4 discusses the experimental outcomes. Section 5 summarises this study.

2. Related work

ABSA summarises users' opinions. According to the differences in UGC texts, existing methods for ABSA fall into aspect sentiment classification (ASC) and term polarity co-extraction (TPC). Based on these methods, many learning frameworks have been proposed for ABSA.

Among them, ASC pays more attention to a specific identified aspect term in the text. For example, to assist sentiment classification, the textual information of the aspect term (i.e. its semantic and location information) can participate directly in training during the feature extraction stage (Nguyen & Le Nguyen, Citation2018; Xing et al., Citation2019; Xue & Li, Citation2018; Yang et al., Citation2019). Furthermore, Wang et al. (Citation2016) argued that exploring the relationship between aspect terms and textual information is meaningful, and for this purpose proposed an attention-based LSTM network to select the information of interest to different aspects. Tay et al. (Citation2018) incorporated aspect information into a neural model by modelling word-aspect relations, which enables the model to adaptively focus on the correct words for a given aspect term. In addition, extracting text features from syntactic structure has also attracted the attention of many researchers. Sun et al. (Citation2019) and Zhang et al. (Citation2019) utilised graph convolutional networks (GCNs) (Zhou et al., Citation2020) to model syntactic dependency trees, exploiting syntactic information and the dependencies between words. Subsequently, various GNN-based approaches were proposed to solve such problems by explicitly exploiting grammatical information (Huang & Carley, Citation2019; Tang et al., Citation2020; Wang et al., Citation2020; Zhang & Qian, Citation2020; Zhu et al., Citation2021). Our study follows this line of research closely. With the development of pre-trained models, many scholars (Xiao & Luo, Citation2022; Yang & Yang, Citation2020) have combined them with the ASC task. Sun et al. (Citation2019) appended aspect terms to the end of the text as feature encoding and then fine-tuned the pre-trained model Bert (Devlin et al., Citation2018) as the baseline model, with good results. The above work does help models perform better by directly involving entity labels as prior knowledge in training. However, this type of method is relatively rigid: it requires aspect terms to be tagged in advance and processes only a limited amount of data at a time, which increases the workload and reduces practicality. In real scenarios, such as the SemEval datasets (Pontiki et al., Citation2015, Citation2016), a text sample often contains multiple aspect terms. Hence there is a need for a more flexible learning framework that extracts all terms at once. Two studies (Mao et al., Citation2021; Yan et al., Citation2021) proposed unified generative models that can realise multi-aspect sentiment classification and even more subtasks.

Relatively speaking, TPC extracts aspect terms and sentiment polarities simultaneously, making up for the shortcomings of single-step extraction in the ASC task. More specifically, TPC involves the two subtasks ATE and ASC. Wang et al. (Citation2017) designed a multi-task one-stop solution for ABSA with state-of-the-art (SOTA) results by merging the two tasks into a single SL task, avoiding the error propagation of the Pipeline approach. Yet SL is often better suited to the single ATE subtask than to a union of multiple subtasks. The Unified approach has therefore been considered by a large number of scholars. TPC is representative of the Unified approach and has more information available than plain SL, such as the sentiment words contained in the text itself. How to make a unified learning architecture fully attend to the information influence within the text has become the focus of subsequent research. Wang et al. (Citation2017) utilised syntactic relations to parse the relationship between aspect terms and opinion terms, achieving double propagation of information under multi-layer coupled attention to better capture text features. Li et al. (Citation2018) proposed a novel model applying a unified labelling scheme built on two stacked recurrent neural networks: the upper layer predicts the unified label to produce the final sentiment analysis output for the primary target, while the lower auxiliary layer predicts target boundaries to better explore inter-task dependencies. Chen et al. (Citation2020) built on the above with innovative learning ideas, extracting better features by considering the relationship between tasks. Inspired by this work, this paper uses a Unified modelling framework that considers how to make full use of the existing textual information and mine the correlation between ATE and ASC, so that they work synergistically towards extracting better correlation features and improving the effectiveness of the downstream task.

3. Proposed method

In this section, we first formally introduce some important notation used in this paper. Next, we describe the main structure of the proposed model in detail. Finally, the details of the training loss function are given.

3.1. Task description

For given texts $S=\{S_i \mid i=1,\ldots,n\}$, where $n$ is the total number of text samples, we propose a unified tagging model that aims to simultaneously extract the aspect terms and corresponding sentiments (i.e. the ATE task and the ASC task) in $S$, obtaining the combined unified labels $L=\{L_i \mid i=1,\ldots,n\}$. Each label $L_i$ may take one of two forms, as in Table 1: the first is $\{O\}$, representing a word outside any aspect term and carrying no sentiment; the other is the combined unified label described below.

Firstly, the leftmost character of the label belongs to $T_p$, where $T_p=\{B, I, E, S\}$, denoting that the word $S_{ij}$ is located at the beginning of, inside, or at the end of the aspect term, or forms a single-word term, respectively. Then, the remaining characters of the label belong to $T_e$, where $T_e=\{POS, NEU, NEG\}$, indicating the sentiment polarity categories positive, neutral and negative. Finally, the two parts are unified by the symbol ‘-’.

Taking the text ‘I love Linux which is a vast improvement over Windows Vista’ as an example, as illustrated in Figure 2, $\{S, B, E\}$ are the predicted location labels of the subwords ‘Linux’, ‘Windows’ and ‘Vista’. We combine the labels of these subwords according to the BIOES tagging scheme (see Table 1) to accomplish ATE, making it clear that the terms ‘Linux’ and ‘Windows Vista’ are labelled by $T_p$, where $T_p=\{S\}$ and $\{B, E\}$ respectively. Meanwhile, these terms are judged to be positive and negative in the ASC task, respectively. As stated, the prediction goal is to obtain the unified labels ‘S-POS’ and ‘B-NEG, E-NEG’, which correspond to the aspect terms.
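To make the tagging scheme concrete, the following is a minimal sketch (ours, not the authors' code) of how such unified labels can be decoded back into (aspect term, polarity) pairs; the `decode_unified_labels` helper and the example label sequence are illustrative.

```python
# A minimal sketch of decoding unified BIOES-polarity labels back into
# (aspect term, sentiment) pairs; not from the paper's implementation.
def decode_unified_labels(tokens, labels):
    """Collect aspect terms and polarities from unified tags like 'B-NEG'."""
    terms, current, polarity = [], [], None
    for token, label in zip(tokens, labels):
        if label == "O":
            current, polarity = [], None
            continue
        position, sentiment = label.split("-")
        if position == "S":                       # single-token aspect term
            terms.append((token, sentiment))
            current, polarity = [], None
        elif position == "B":                     # start of a multi-token term
            current, polarity = [token], sentiment
        elif position in ("I", "E") and current:  # continuation / end of a term
            current.append(token)
            if position == "E":
                terms.append((" ".join(current), polarity))
                current, polarity = [], None
    return terms

tokens = "I love Linux which is a vast improvement over Windows Vista".split()
labels = ["O", "O", "S-POS", "O", "O", "O", "O", "O", "O", "B-NEG", "E-NEG"]
print(decode_unified_labels(tokens, labels))
# [('Linux', 'POS'), ('Windows Vista', 'NEG')]
```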

Figure 2. The overall framework of IIM. It consists of two parts: Part 1 focuses on Information Interaction Channel (IIC); Part 2 mainly discusses Syn-AttGCN components.


3.2. Main structure

In this paper, we discuss the Unified framework IIM. As shown in Figure 2, it consists of two core parts, the IIC components and the Syn-AttGCN components, detailed below, which deeply explore the relevance between the subtasks from the semantic and syntactic features of the text.

3.2.1. Encoder

To benefit from the pre-trained model, the input text $S_i$ is encoded with the Bert encoder to obtain the semantic feature matrix $M_i$. Then $M_{ij}$ is fed into a long short-term memory (LSTM) network as the semantic vector of each token to obtain the long-range dependent features $H_{ij}$, where $i$ denotes the $i$th text and $j$ denotes the $j$th token in the text.
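As a hedged illustration of this encoder stage, the sketch below chains a pre-trained BERT model into an LSTM via the HuggingFace `transformers` library; the checkpoint name and the single-direction LSTM are our assumptions, not the paper's exact configuration.

```python
# A sketch of the encoder: BERT embeddings fed into an LSTM.
# Checkpoint name and LSTM direction are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
lstm = nn.LSTM(input_size=768, hidden_size=768, batch_first=True)

text = "I love Linux which is a vast improvement over Windows Vista"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    M = bert(**inputs).last_hidden_state  # semantic feature matrix M_i
    H, _ = lstm(M)                        # long-range dependent features H_i
print(H.shape)                            # (1, seq_len, 768)
```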

3.2.2. Information interaction channel (IIC)

From these studies (Li et al., Citation2019; Liang et al., Citation2020; Luo et al., Citation2019), it can be seen that the effective information of a text is important for classification results. As a result, this paper builds a channel, IIC, focussing on the raw information of aspect terms and their corresponding sentiment categories; see Figure 3. The intention is that the encoded semantic vector $H_{ij}$ acquires more effective features after passing through the IIC, helping the downstream task.

Figure 3. The specific structure of IIC. There are two novel concepts proposed, Information Interactive Attention (IIA) at the bottom of the figure and Position-aware Module (SAM) at the top of the figure. Note that the masked words are candidate boundary words and candidate emotion words, which are set to 1 and the rest is set to 0 in the feature representation. ⊙ represents function Tanh.


CBWs and CEWs Obviously, the UGC text $S_i$ contains words with boundary and emotional overtones, which conceal a large amount of available information. These words are therefore considered to carry information influence in this paper: they can generate guidance information that affects the completion of the subtasks (i.e. ATE and ASC). For instance, in the sentence ‘I love Linux which is a vast improvement over Windows Vista’, ‘Windows Vista’ is perceived as an aspect term by humans, while the model does not know this. Conversely, if we let the model learn this knowledge in advance, then, based on experience, when it encounters the word ‘Windows’ it will prioritise checking whether the immediately following word is ‘Vista’. If so, the two contiguous words are considered subwords of the same aspect term (i.e. ‘Windows Vista’), and for the ATE task ‘Vista’ serves as the ending boundary. Likewise, one piece of prior knowledge provided in advance is that ‘love’ is a word expressing positive emotion. Thus, when we discuss the sentiment category of the aspect term ‘Linux’, ‘love’ is selected as the closest emotion word and preferentially conveys positive sentiment to the ASC task.

Based on the above idea, we use the natural language processing tool spaCy to analyse the part of speech of the text $S_i$ at subword granularity. Specifically, nouns with substantive referents, such as common nouns and proper nouns, have a higher probability of being aspect terms; ‘Windows Vista’ and ‘Linux’ above are instances. Because their positional information can identify the aspect term with maximum probability and constrain its scope, these nouns are selected as candidate boundary words (CBWs). Moreover, the direct source of emotion towards aspect terms is the words with emotional overtones in $S_i$. These words broadly fall into several categories: verbs, adjectives, degree adverbs and subordinate conjunctions, which are selected as candidate emotion words (CEWs) carrying the textual emotional information that influences the ASC task.
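The snippet below illustrates one plausible way to select CBWs and CEWs with spaCy part-of-speech tags; the exact POS sets are our reading of the description above, not the authors' published lists.

```python
# A hedged illustration of candidate-word selection with spaCy.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("I love Linux which is a vast improvement over Windows Vista")

CBW_POS = {"NOUN", "PROPN"}                # nouns -> candidate boundary words
CEW_POS = {"VERB", "ADJ", "ADV", "SCONJ"}  # emotional overtones -> candidate emotion words

cbws = [t.text for t in doc if t.pos_ in CBW_POS]
cews = [t.text for t in doc if t.pos_ in CEW_POS]
print("CBWs:", cbws)  # e.g. ['Linux', 'improvement', 'Windows', 'Vista']
print("CEWs:", cews)  # e.g. ['love', 'vast', ...]
```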

Information Interactive Attention (IIA) To date, self-attention has been effectively applied to information capture (Wang et al., Citation2017), but it has rarely been considered for the interaction between contextual information. We build on this as follows. As shown in Figure 3, the two types of candidate words (i.e. the CBWs and CEWs) are masked to obtain the candidate-word feature vectors $F_i$. Then $F_i$ and the output $H_i$ of the LSTM network are fed to the IIA, so that the two feature vectors interact with $S_i$ with the help of the attention mechanism. In this way, prototype vectors of textual interaction information, containing the aspect terms and their corresponding sentiment categories, are obtained from the feature vectors.

To enforce the interaction of the text, the IIA is designed in two directions, as shown at the bottom of Figure 3: one based on the text of the CBWs, and the other towards the text of the CEWs. Specifically, the feature vector $F_i$ of the candidate words (i.e. the boundary vector $(F_i)_b$ and the emotional vector $(F_i)_e$) is mapped as the query vector, and the encoding $H_i$ of the $i$th text serves as the key vector. The similarity between $F_i$ and $H_i$ then yields the information attention $\alpha_i$ (i.e. the boundary information attention $(\alpha_i)_b$ and the emotional information attention $(\alpha_i)_e$): (1) $\alpha_i = \mathrm{Softmax}\left(\frac{W_f F_i \times W_h (H_i)^{T}}{\sqrt{d}}\right)$, where $W_f$ and $W_h$ are trainable weight matrices, $d$ denotes the dimension of the feature vector, and $(H_i)^{T}$ is the transpose of $H_i$. The attention-focussed state representation $V_i$ (i.e. the boundary information score $(V_i)_b$ and the sentiment information score $(V_i)_e$) is then given by the weighted summation of the original state representation $F_i$ and its similarity $\alpha_i$ to the hidden state representation $H_i$ of the current token: (2) $V_i = \alpha_i \times W_v F_{ij}$, where $W_v$ is a trainable weight. Essentially, $V_i$ limits the probability range of candidate aspect words and identifies emotional categories in favour of the TPC task.

Finally, to give more attention to the prototypes of aspect words and emotional words, the interactive information features (3) $Z_i = \mathrm{Tanh}\left((V_i)_b W_b + (V_i)_e W_e\right)$ are obtained through the activation function Tanh, where $W_b$ and $W_e$ are trainable weight matrices.
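Putting Equations (1)-(3) together, a minimal PyTorch sketch of the IIA might look as follows; the weight shapes, the $\sqrt{d}$ scaling and the module layout are our assumptions rather than the authors' implementation.

```python
# A sketch of the Information Interactive Attention (IIA), Eqs. (1)-(3).
import torch
import torch.nn as nn

class IIA(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.Wf = nn.Linear(d, d, bias=False)  # maps candidate features F
        self.Wh = nn.Linear(d, d, bias=False)  # maps LSTM states H
        self.Wv = nn.Linear(d, d, bias=False)
        self.Wb = nn.Linear(d, d, bias=False)  # boundary branch
        self.We = nn.Linear(d, d, bias=False)  # emotion branch
        self.d = d

    def attend(self, F, H):
        # Eq. (1): similarity between candidate words and the text
        alpha = torch.softmax(
            self.Wf(F) @ self.Wh(H).transpose(-1, -2) / self.d ** 0.5, dim=-1)
        # Eq. (2): attention-focussed state representation
        return alpha @ self.Wv(F)

    def forward(self, Fb, Fe, H):
        Vb, Ve = self.attend(Fb, H), self.attend(Fe, H)
        return torch.tanh(self.Wb(Vb) + self.We(Ve))  # Eq. (3)

iia = IIA(d=768)
Fb, Fe, H = (torch.randn(1, 11, 768) for _ in range(3))
print(iia(Fb, Fe, H).shape)  # (1, 11, 768)
```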

Position-aware Module (SAM) Table 1 describes the common LDP problem for TPC. Following Hochreiter and Schmidhuber (Citation1997), we conjecture that the model forgets the position information of preceding subwords over time, weakening the semantic relevance between neighbouring words. Consequently, a dedicated component, SAM, is proposed (top of Figure 3) to alleviate the LDP.

Guided by Chung et al. (Citation2014), we take the GRU as SAM's basic structure and introduce boundary hint signals between aspect subwords. Specifically, the output $Z_i$ from the IIA, denoted $E_i$, and $H_i$ from the LSTM network are used as the inputs of the SAM module, as illustrated in Figure 3; they preserve the interactive features and the original contextual features of the text, respectively. The gate units $C_t$ and $P_t$ then control and store the features containing the boundary ranges and sentiment of the aspect term: (4) $C_t = \sigma(E_i W_{ce} + h_i W_{hc})$, (5) $P_t = \sigma(E_i W_{pe} + h_i W_{hp})$, where $h_i$ is the hidden information at time $i$, and each $W_{*}$ is a trainable parameter matrix, with $*$ standing for the corresponding subscript. The candidate memory information (6) $\tilde{h}_i = \mathrm{Tanh}(H_i W_{sH} + C_t \cdot h_i W_{sh})$ and $h_i$ ensure that the basic information is updated over time, where ‘$\cdot$’ denotes element-wise multiplication. Then $h_i$ is merged with the original features $H_i$ to obtain continuous contextual semantic features (i.e. CBWs and CEWs) for the known sequence information. As described in the first two sections, this type of semantic feature is a priori experience. SAM, as a downstream module, draws on these experiences to generate boundary hint signals (7) $s_i = \mathrm{Tanh}(H_i W_{sH} + h_i W_{sh})$ for the consecutive subwords of an aspect term, helping the model predict its scope more accurately.

To update the hidden feature $h_{i+1}$, with the aid of $P_t$, we utilise the position information $\tilde{h}_i$ containing past sequences to pick important features: (8) $h_{i+1} = (1-P_t)\,\tilde{h}_i + P_t\,(h_i + s_i)$, so that the output labels $O_i$ (see Figure 3) of the subwords can be represented continuously.

In short, the boundary signals help refine the boundary representation between consecutive subwords by sensing the position information of aspect terms in the sequence and enhancing the memory of key feature information from past moments. The LDP problem is thus mitigated to some extent, and the error rate of the subtask decreases. Taking Table 1 as an example, when ‘AMD’ receives the location label ‘B’ for ATE, $s_i$ cues that the next token ‘Turin’ is a subword of the same aspect term and lies in its middle (i.e. ‘I’). Similarly, ‘Processor’ is tagged ‘E’ when it is prompted to be at the end of the term. On this basis, ASC uses the existing text boundaries to determine the synchronous sentiment polarity of the subwords, largely avoiding extreme LDP.
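For concreteness, a single SAM step following Equations (4)-(8) could be sketched as below; the class name, the inclusion of bias terms and the tensor shapes are illustrative assumptions, not the authors' implementation.

```python
# A sketch of one SAM step, Eqs. (4)-(8); naming and biases are assumptions.
import torch
import torch.nn as nn

class SAMCell(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.Wce, self.Whc = nn.Linear(d, d), nn.Linear(d, d)  # gate C_t
        self.Wpe, self.Whp = nn.Linear(d, d), nn.Linear(d, d)  # gate P_t
        self.WsH, self.Wsh = nn.Linear(d, d), nn.Linear(d, d)

    def forward(self, E, H, h):
        Ct = torch.sigmoid(self.Wce(E) + self.Whc(h))            # Eq. (4)
        Pt = torch.sigmoid(self.Wpe(E) + self.Whp(h))            # Eq. (5)
        h_tilde = torch.tanh(self.WsH(H) + Ct * self.Wsh(h))     # Eq. (6)
        s = torch.tanh(self.WsH(H) + self.Wsh(h))                # Eq. (7): boundary hint
        h_next = (1 - Pt) * h_tilde + Pt * (h + s)               # Eq. (8)
        return h_next, s

cell = SAMCell(d=768)
E, H, h = (torch.randn(1, 768) for _ in range(3))
h_next, s = cell(E, H, h)
print(h_next.shape)  # (1, 768)
```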

3.2.3. TFM layer

Based on the low-level features extracted above, a high-level expresser, the Transformer encoder (TFM), is introduced to further extract features and output them. More specifically, TFM consists of two sub-modules. The first, and worth highlighting, is multi-head attention, which allows the model to jointly attend to information from representations in different position spaces through multiple attention heads (Vaswani et al., Citation2017); this is the key to the powerful learning capability of TFM. The second is simply the fully connected layer. In a word, TFM acts as a high-layer encoder that can further exploit the effective features $O_i$ of the low-layer networks and obtain tighter features.
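A minimal sketch of such a TFM layer using PyTorch's built-in Transformer encoder is shown below; the head count, feed-forward size and depth are assumptions, as the paper does not state them.

```python
# A sketch of the TFM layer: a standard Transformer encoder used as a
# high-level feature expresser; hyperparameters are illustrative.
import torch
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=768, nhead=8, dim_feedforward=2048)
tfm = nn.TransformerEncoder(layer, num_layers=1)

O = torch.randn(11, 1, 768)  # low-level features O_i: (seq_len, batch, dim)
T = tfm(O)                   # high-level output features T_i
print(T.shape)               # (11, 1, 768)
```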

3.2.4. Syn-AttGCN

In addition to the above textual semantic features, syntactic structure is also a good entry point for the TPC problem, as shown in these studies (Huang & Carley, Citation2019; Sun et al., Citation2019; Tang et al., Citation2020; Wang et al., Citation2020; Zhang et al., Citation2019; Zhang & Qian, Citation2020; Zhu et al., Citation2021).

Traditional semantic features often rely on distance to judge the relevance between words (Mikolov et al., Citation2013), assuming that the closer two words are in a sentence, the more related they are. Taking the sentence ‘sometimes the service is great, sometimes not, is not stable’ as an example, ‘great’ is the sentiment word closest to the aspect term ‘service’, but the sentiment category of ‘great’ has little relevance to ‘service’. Consequently, the error rate increases greatly if the ASC task relies only on static position information. A tool is therefore needed to extract more comprehensive features, beyond static features such as location or semantics. If we could figure out the grammatical structure of the sentence in advance and know that the sentiment word ‘stable’ is more strongly associated with the aspect term, this error would be reduced. Based on this idea, we make the following exploration.

The graph convolutional network (GCN) is a good syntactic feature extractor that encodes representations by learning the local graph structure and node features of text (Kipf & Welling, Citation2016). This paper designs a Syn-AttGCN network, which incorporates syntactic information and an attention module into a GCN, detailed below.

The dependency parsing model LAL-Parser (Veličković et al., Citation2017) has been adopted by a large number of scholars to derive the structure of a sentence. Here, Syn-AttGCN utilises it to build the syntactic dependency tree in Figure 4 by simulating the representation of neighbouring nodes. Furthermore, the IIA is added to the network so that more attention is paid to the key nodes.

Figure 4. Syntactic Dependency Tree. A word in the text represents a node, and part-of-speech of the node is shown at the bottom. Arrows point to adjacent nodes that are syntactically strongly related, and the relationship between nodes on both sides of the arrow is described inside the arc.


To be specific, with the high-level feature representation $H_i$ as input, Syn-AttGCN takes the information of each token in the text as the node feature $H_{ij}$. The adjacency matrix $A$ is then established according to the syntactic dependency graph shown in Figure 4, prompting the model to consider the syntactic relevance of adjacent nodes when completing the ASC task. To update the global node features, the sigmoid function $\sigma$ is chosen as the update function, integrating the information provided by the adjacency matrix $A$ and the hidden features after a nonlinear transformation: (9) $H_i^{j+1} = \sigma\left(\tilde{D}^{-\frac{1}{2}}(A+I)\tilde{D}^{-\frac{1}{2}} H_i^{j} W\right)$, where $A$ is produced by the LAL-Parser, $\tilde{D}$ is the degree matrix of $A$, $I$ is the identity matrix matching $A$, and $W$ is a trainable parameter. Meanwhile, this section adds the IIA to exploit useful information: it receives the neighbourhood information and node features from the GCN and assigns different weights to different neighbouring nodes. This highlights the edges with strong node associations in the graph structure and filters out nodes with insignificant features. On this basis, more comprehensive text features, combining good syntactic features from Syn-AttGCN with key-node features from the IIA, help the model understand the correlation between words and improve the feature representation.
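The propagation step of Equation (9) can be sketched as follows; the toy adjacency matrix stands in for the LAL-Parser output, and the self-loop normalisation follows standard GCN practice, which is our assumption.

```python
# A sketch of one Syn-AttGCN propagation step, Eq. (9); toy graph only.
import torch

def gcn_layer(H, A, W):
    """H: (n, d) node features; A: (n, n) dependency adjacency; W: (d, d)."""
    A_hat = A + torch.eye(A.size(0))                   # add self-loops (A + I)
    D_inv_sqrt = torch.diag(A_hat.sum(dim=1).pow(-0.5))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt           # symmetric normalisation
    return torch.sigmoid(A_norm @ H @ W)               # sigma in Eq. (9)

n, d = 5, 768
H = torch.randn(n, d)
A = torch.zeros(n, n)
A[0, 1] = A[1, 0] = 1.0      # e.g. an edge from one dependency arc
W = torch.randn(d, d) * 0.01
print(gcn_layer(H, A, W).shape)  # (5, 768)
```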

3.3. Training loss function

To gain comprehensive multi-feature information, we fuse the above results, including the underlying implicit features $H_i$, the syntactic features $G_i$ from the Syn-AttGCN and the high-level output features $T_i$ from the TFM layer. A softmax classifier is then used to obtain the probability $p_{ic}$ of the output label. The optimisation objective of the TPC task is the cross-entropy loss between the predicted distribution and the gold distribution: (10) $\mathcal{L}_{\mathrm{CrossEntropy}} = -\frac{1}{N}\sum_{i}\sum_{c=1}^{M} y_{ic}\log p_{ic}$, where $M$ is the number of categories and $N$ is the number of samples.
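As a hedged sketch of this fusion-and-classification step, the snippet below concatenates the three feature streams before a linear classifier trained with the cross-entropy of Equation (10); the concatenation strategy and the label count are our assumptions.

```python
# A sketch of multi-feature fusion and the training loss, Eq. (10).
import torch
import torch.nn as nn

d, num_labels = 768, 13            # assumed: {O} plus 4 positions x 3 polarities
classifier = nn.Linear(3 * d, num_labels)
criterion = nn.CrossEntropyLoss()  # log-softmax + NLL, i.e. Eq. (10)

H, G, T = (torch.randn(11, d) for _ in range(3))  # implicit / syntactic / TFM
logits = classifier(torch.cat([H, G, T], dim=-1))
gold = torch.randint(0, num_labels, (11,))        # gold unified labels
loss = criterion(logits, gold)
print(loss.item())
```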

4. Experiment

Our model is evaluated on three datasets. Firstly, the datasets used in the experiments, the parameter configuration and the baseline models are introduced. Then, the proposed model is verified through specific experiments, including comparison experiments, ablation experiments and a case study. Finally, this part analyses the experimental results in detail and discusses the parameters and computational cost involved.

4.1. Dataset


This research uses the Aspect Term Sentiment Analysis (ATSA) datasets selected by most prior work (Chen & Qian, Citation2020; Li et al., Citation2019; Luo et al., Citation2020; Xu et al., Citation2018). All of them originate from the SemEval challenges (Pontiki et al., Citation2015, Citation2016), where only the aspect terms and their sentiment polarities are labelled.

Table 3 describes the data statistics in detail: the M1 dataset is annotated by Li et al. (Citation2019), the M2 dataset is taken from Wang et al. (Citation2017), and the M3 dataset is taken from Peng et al. (Citation2020). More precisely, Lap14 and Res14-16 come from the 2014–2016 International Workshop on Semantic Evaluation (SemEval). Lap14 comprises more than 3000 reviews in the laptop domain; it is smaller, and its text is difficult due to many implicit expressions. Res14-16 is a combination of Res14, Res15 and Res16 and contains about 4000 reviews in the restaurant domain, which are relatively less difficult. Twitter consists of varied user posts, and its data quality is worse than that of Lap14 and Res14-16. The above datasets were divided into training and testing sets in a 9:1 ratio, with each sample providing the corresponding aspect terms and sentiment polarities (i.e. POS, NEU and NEG).

Table 3. The details of datasets.

4.2. Parameter settings

The experiments run on an RTX8000 GPU, with PyTorch 1.5.0 as the deep learning framework. The batch size for model training is set to 32, the learning rate to 5e-5, the random seed to 42, the hidden size of the LSTM to 768, and the number of GCN layers to 2. Each experiment is repeated five times, with a total of 1500 training steps each time, and the model is saved every 100 steps. Note that we adopt the experimental protocol of the benchmark model BERT-ABSA: the model weights after 1000 steps are selected for testing, and the average F score on the test set is reported as our experimental result. In addition, ten-fold cross-validation is used for the Twitter dataset, since it has no standard division into training and test sets.
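Collected into one place, the reported settings might be expressed as the following illustrative configuration dictionary (the key names are ours):

```python
# Illustrative training configuration mirroring the settings above.
config = {
    "batch_size": 32,
    "learning_rate": 5e-5,
    "seed": 42,
    "lstm_hidden_size": 768,
    "gcn_layers": 2,
    "total_steps": 1500,
    "save_every_steps": 100,
    "eval_from_step": 1000,
    "runs": 5,
}
```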

4.3. Baseline

To better conduct the comparison experiments, this paper analyses baselines along the following working routes (i.e. Pipeline, Joint and Unified).

4.3.1. Pipeline

SPAN-base is an aspect-based sentiment analysis framework based on span tagging, proposed by Hu et al. (Citation2019). That paper conducts experiments for the three working routes, Pipeline, Joint and Unified; the Pipeline model is cited here, with the pre-trained model bert-base-cased as the backbone network.

4.3.2. Joint

IMN is an interactive multi-task learning network proposed by He et al. (Citation2019) that learns multiple subtasks jointly, improving model performance by introducing a message-passing architecture. RACL-Bert, proposed by Chen et al. (Citation2020), uses Bert to encode word vectors and enables collaboration between multiple subtasks through a relation-aware collaborative learning framework. DOER, proposed by Luo et al. (Citation2019), designs a model containing dual recurrent neural networks and a cross-shared unit to improve the performance of both the ATE and ASC subtasks. GRACE, proposed by Luo et al. (Citation2020), is a Joint training model with gradient-harmonised and cascaded labelling that enhances the exchange between aspect terms and improves the attention of sentiment polarity to sentiment tokens. IMKTN-Bert, proposed by Liang et al. (Citation2020), is an iterative multi-knowledge transfer network constructed to express the interaction relevance between subtasks, here using Bert as the pre-training weights.

4.3.3. Unified

LM-LSTM-CRF is a common sequence labelling framework based on convolutional neural networks, proposed by Xu et al. (Citation2018). E2E-TBSA is an end-to-end model applying a unified labelling scheme designed to better implement TPC tasks, proposed by Li et al. (Citation2019). BERT-ABSA is a series of simple yet classical benchmark neural network models demonstrating the strength of an end-to-end framework based on Bert encoding, summarised by Li et al. (Citation2019). CMLA is a multi-layer coupled attention model designed to capture important information, originally for aspect-opinion co-extraction, and is used here for the TPC task; the method was proposed by Wang et al. (Citation2017). Peng-two-stage, proposed by Peng et al. (Citation2020), implements aspect-sentiment triplet extraction in two stages in the original paper; only the first-stage model is cited here for unified label prediction. RINANTE (Dai & Song, Citation2018) is an aspect and opinion co-extraction method that mines extraction rules based on the dependency relations of words in a sentence. Dual-MRC (Mao et al., Citation2021) and UGF-BART (Yan et al., Citation2021) are generative paradigms for ABSA. PBERT-MTL (Ke et al., Citation2022) is a novel end-to-end model that detects all triplets more efficiently. UniASTE-BERT (Chen et al., Citation2022) is a novel multi-task learning framework that achieves end-to-end aspect sentiment triplet extraction.

4.4. Experimental results and analysis

In this section, the proposed model is thoroughly evaluated on the TPC task. First, numerous competitive models are compared with ours on three general datasets (i.e. M1, M2 and M3) to confirm whether the performance differences are significant. Next, an ablation study and an analysis of parameter effects further validate the proposed method. Moreover, a case study shows that our model is effective in specific scenarios.

4.4.1. Comparison experiments

Table 4 shows the results of all comparison experiments, where P, R and F denote Precision, Recall and micro-F1, respectively, and are used as the evaluation metrics. To assess the models more comprehensively, the following discussion focusses on the F score, while the other indicators are provided for reference.

On the M1 dataset, we compare our method with other TPC models in Table 4. In most cases, IIM obtains the optimal or near-optimal results. We first validate the proposed IIM model on the Lap14 dataset: the SPAN-BASE model, which follows the Pipeline working route, takes second place, while our model, following the Unified working route, ranks first. Other Unified models, such as BERT-ABSA, E2E-TBSA and LM-LSTM-CRF, follow closely behind. Something similar happens on the Twitter dataset; there, SPAN-BASE ranks first, and DOER, which follows the Joint working route, is no longer the worst. This suggests that the working route imposes little restriction on the model, and an appropriate working route can be chosen according to the actual situation. The same conclusion applies to the Res14-16 dataset. Furthermore, among all the models, only the GRACE model and our IIM-Ptr model use post-training, a pre-training strategy; GRACE, the state-of-the-art model of 2020, was proposed through this strategy. Compared with GRACE, our model does better, showing that the proposed method depends less on pre-training and performs better on its own for TPC tasks. It can also be observed that GRACE achieves competitive results against the other models, mainly because it adopts the idea of collaborative work to improve the information attention and information exchange between aspect terms. Similarly, our method uses the information interaction channel to focus on the associated information in the text and realises the synchronous extraction of aspect terms, obtaining better results; this fully verifies the effectiveness of the collaboration idea in IIM. It is also worth noting that we adopt BERT-ABSA, with its simple structure and easy scalability, as the infrastructure of this paper and make adjustments on this basis to build the new Unified model IIM: our method designs the IIC and Syn-AttGCN modules to focus on more effective features. Compared with BERT-ABSA, IIM makes clear progress, indicating that a carefully designed Unified model can take full advantage of a classical model and better adapt to a new task according to the bottlenecks of the existing task.

Table 4. Experiment results for TPC on the M1 dataset (Li et al., Citation2019).

Table 5. Comparison F scores for TPC on the M2 dataset (Wang et al., Citation2017).

Table 6. Comparison F scores for TPC on the M3 dataset (Peng et al., Citation2020).

Table 5 reports the experimental results on the M2 dataset. IIM approaches or even surpasses most of the baselines. Note that Dual-MRC and UGF-BART are recent generative models; unlike the proposed discriminative model, generative models have a wider learning range and stronger learning ability. Compared with them, IIM obtains close scores, suggesting that discriminative models such as IIM may still have considerable potential for TPC tasks. Moreover, RACL-Bert is a learning model based on the BERT-Large pre-trained model with cooperating subtasks, while our results are achieved with the BERT-Base model using almost half the parameters. Compared with it, IIM obtains nearly similar results, suggesting that IIM, with its Unified framework, can extend to more ABSA subtasks and support a more lightweight framework.

Table 6 reports the experimental results on the M3 dataset. IIM outperforms all other methods; in particular, it far exceeds the related 2022 study PBERT-MTL, indicating that IIM is up to date. Besides, Peng-two-stage achieves suboptimal performance by guiding label boundary prediction with auxiliary signals, showing the value of boundary information for the model. IIM adopts a similar idea (i.e. SAM) and obtains the optimal results, demonstrating the practicality of the proposed SAM structure.

Figure 5. The ratio of LDP on different models. (a) M1 dataset, (b) M2 dataset, (c) M3 dataset.


Overall, IIM is proven effective for the TPC task in the comparison experiments.

4.4.2. Ablation study

In this work, the proposed model contains three key strategies: IIA, SAM and Syn-AttGCN. The above experimental results show that IIM significantly enhances prediction performance, but the role each strategy plays in solving the TPC task remains unclear. To verify the effectiveness of these strategies, the ablation study sets up several variants by removing certain structures or strategies from the original model, as shown in the last two rows of Table 4. As expected, all simplified variants show a drop in F score. -IIA drops severely, by 2.13%, 2.62% and 1.7%, respectively. Training the model without SAM likewise reduces the evaluation scores to a certain degree. This demonstrates that introducing IIA or SAM effectively aids the model. An interesting finding is that while introducing IIA or SAM individually into IIM yields relative gains in F score, putting them together (i.e. the IIC, see Figure 3) leads to the new SOTA result, illustrating the necessity of combining the IIC components. From the above, we surmise that SAM, benefiting from IIA, plays an important role in reducing labelling errors and improving the correct prediction rate. In brief, each part of IIM is indispensable.

4.4.3. Effect of parameters

The proposed IIM model may be affected by some parameters; the key ones, the number of GCN layers and the number of training steps, are discussed here. First, the GCN combines node information with the syntactic structure graph to propagate features through a multi-layer network, so this part examines the effect of different numbers of GCN layers. As depicted in Figure 6(a), the 2-layer network achieves the best F score on all three datasets, and the effect drops significantly as the number of GCN layers increases, showing that more layers do not necessarily yield better results. Our analysis suggests that too many layers may introduce too many parameters and increase computational complexity. Next, Figure 6(b) shows the effect of different numbers of training steps on model performance: the F score levels off after 1000 steps, indicating that our model is robust to overfitting.

Figure 6. The performance of IIM under different parameters. (a) The impact of network layers, and (b) The impact of steps.


4.4.4. Case study

According to the model's functional blocks, this section analyses the results of several examples as a case study:

LDP This paper designs the SAM module to mitigate the LDP. To verify its effectiveness, several cases are described in Table 7. Taking the aspect term 'AMD Turin Processor' as an example, -SAM produces inconsistent sentiment labels (e.g. POS and NEG) and violations of the BIOES tagging scheme (i.e. ‘B, I, E’ as consecutive tags). No such phenomenon occurs with IIM. The reason may be that the full IIM model can correct the incorrect labels; the erroneous ‘I-NEG’ for the term ‘Windows Vista’ is similarly rectified. Meanwhile, the third aspect term, ‘installation disk (DVD)’, receives two incorrect subword labels (i.e. E-NEG and E-POS) under the prediction of -SAM, yet the boundary label ‘E’ is corrected to ‘I’ and the errors are somewhat mitigated once the example passes through IIM. Furthermore, to better understand the LDP mitigation, Figure 5 reports the ratio of LDP on the different datasets for the different methods. IIM exhibits fewer LDP problems on most datasets than the other models; in particular, our method suffers less LDP perturbation than -SAM, which removes the SAM module. In summary, the effectiveness of our study for mitigating the LDP problem is evident (Figure 5).

Table 7. The case analysis of LDP. The text represents input text, -SAM tries to remove the SAM strategy from the IIM, where ✗ means the wrong prediction for labels.

IIA and Candidate Words The candidate words consist of the CBWs and CEWs with information influence, selected using a priori knowledge. As illustrated in Figure 7(a), the words shown on the horizontal axis, including ‘Windows 7’, ‘Improvement’ and ‘Vista’, are selected as CBWs. With the IIA applied, the text clearly assigns more attention to these candidate words, meaning they act as features that exert influence and suggest the boundary range of the finally predicted aspect term. Similarly, the horizontal-axis words ‘love’, ‘past’ and ‘over’ in Figure 7(b) are selected as CEWs, and the text allocates more attention to the emotions involved, which can guide the model in classifying label polarity. According to the actual word labels (i.e. ‘Windows’: B-POS; ‘7’: E-POS; ‘Vista’: S-NEG), the above approach guides the text to allocate the attention weights reasonably. The final aspect terms and their sentiments are covered, serving as experience to better direct the model and illustrating the usefulness of the above strategy.

Figure 7. The attention matrix shows that candidate words have different levels of attention in the text. (a) The attention matrix of CBWs, and (b) The attention matrix of CEWs.


4.4.5. Computational cost

To demonstrate that the proposed model does not incur a large computational cost, IIM is compared with two strong baselines, UGF-BART and DOER, in terms of parameter count and running time. We ran the three models on the M1-Lap14 dataset with the same batch size of 32 on a single RTX8000 GPU; the results are presented in Table 8. Our runtime per epoch is close to, or even faster than, that of the previous state-of-the-art DOER. Compared with the latest UGF-BART model, IIM is somewhat slower, but the difference is small and the parameter counts are similar. Combined with the results of the comparison experiments, IIM is clearly a model with lower cost but better performance.

5. Conclusion

To realise TPC, this paper proposed a novel Unified-based framework, IIM. Specifically, IIM designs an information interaction channel that helps the model exploit the latent interactive features of text. Our method also conducts an in-depth analysis of the LDP problem within aspect terms, adding the SAM module to enhance the memory of location information and refine the word boundary representation, thereby alleviating the LDP. Besides, since good syntactic structure is crucial for enhancing the model's comprehension, the graph neural network Syn-AttGCN was introduced to understand the associations between words by generating syntactic dependency trees. Experiments on three general datasets obtained favourable results, demonstrating the effectiveness of IIM on the TPC task.

Table 8. Results of controlled experiments for term polarity co-extraction.

In future work, we are interested in further simplifying the SAM module mathematically. We hope the simplified SAM can improve the mitigation of LDP and help build a more lightweight learning framework.

Disclosure statement

No potential conflict of interest was reported by the authors.

Correction Statement

This article has been corrected with minor changes. These changes do not impact the academic content of the article.

Additional information

Funding

This work was supported by the National Natural Science Foundation of China under Grant 61862007.

References

  • Chen, F., Yang, Z., & Huang, Y. (2022). A multi-task learning framework for end-to-end aspect sentiment triplet extraction. Neurocomputing, 479, 12–21. https://doi.org/10.1016/j.neucom.2022.01.021
  • Chen, Z., & Qian, T. (2020). Relation-aware collaborative learning for unified aspect-based sentiment analysis. In Proceedings of the 58th annual meeting of the association for computational linguistics (ACL 2020) (pp. 3685–3694). https://doi.org/10.18653/v1/2020.acl-main.340
  • Chiu, J. P., & Nichols, E. (2016). Named entity recognition with bidirectional LSTM-CNNs. Transactions of the Association for Computational Linguistics, 4, 357–370. https://doi.org/10.1162/tacl_a_00104
  • Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. https://doi.org/10.48550/arXiv.1412.3555
  • Dai, H., & Song, Y. (2018). Neural aspect and opinion term extraction with mined rules as weak supervision. In Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: Human language technologies (pp. 5268–5277). https://doi.org/10.48550/arXiv.1907.03750
  • Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: Human language technologies (pp. 4171–4186). https://doi.org/10.48550/arXiv.1810.04805
  • He, R., Lee, W. S., Ng, H. T., & Dahlmeier, D. (2019). An interactive multi-task learning network for end-to-end aspect-based sentiment analysis. In Proceedings of the 57th annual meeting of the association for computational linguistics (pp. 504–515). https://doi.org/10.48550/arXiv.1906.06906
  • Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
  • Hu, M., Peng, Y., Huang, Z., Li, D., & Lv, Y. (2019). Open-domain targeted sentiment analysis via span-based extraction and classification. In Proceedings of the 57th annual meeting of the association for computational linguistics (pp. 537–546). https://doi.org/10.48550/arXiv.1906.03820
  • Huang, B., & Carley, K. M. (2019). Syntax-aware aspect level sentiment classification with graph attention networks. In Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP) (pp. 5469–5477). https://doi.org/10.48550/arXiv.1909.02606
  • Ke, C., Xiong, Q., Wu, C., Liao, Z., & Yi, H. (2022). Prior-bert and multi-task learning for target-aspect-sentiment joint detection. In ICASSP 2022-2022 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 7817–7821). Singapore. https://doi.org/10.1109/ICASSP43922.2022.9747904
  • Kipf, T. N., & Welling, M. (2016). Semi-supervised classification with graph convolutional networks. In 5th International conference on learning representations (ICLR 2017) (pp. 24–26). https://doi.org/10.48550/arXiv.1609.02907
  • Li, X., Bing, L., Li, P., & Lam, W. (2019). A unified model for opinion target extraction and target sentiment prediction. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01), 6714–6721. https://doi.org/10.1609/aaai.v33i01.33016714
  • Li, X., Bing, L., Zhang, W., & Lam, W. (2019). Exploiting BERT for end-to-end aspect-based sentiment analysis. In Proceedings of the 5th workshop on noisy usergenerated text (W-NUT 2019) (pp. 34–41). https://doi.org/10.48550/arXiv.1910.00883
  • Liang, Y., Meng, F., Zhang, J., Chen, Y., Xu, J., & Zhou, J. (2020). An iterative multi-knowledge transfer network for aspect-based sentiment analysis. In Findings of the association for computational linguistics (EMNLP 2021) (pp. 1768–1780). https://doi.org/10.48550/arXiv.2004.01935
  • Luo, H., Ji, L., Li, T., Duan, N., & Jiang, D. (2020). Grace: Gradient harmonized and cascaded labeling for aspect-based sentiment analysis. In Findings of the association for computational linguistics (EMNLP 2020) (pp. 54–64). https://doi.org/10.48550/arXiv.2009.10557
  • Luo, H., Li, T., Liu, B., & Zhang, J. (2019). DOER: Dual cross-shared RNN for aspect term-polarity co-extraction. In Proceedings of the 57th annual meeting of the association for computational linguistics (pp. 591–601). https://doi.org/10.48550/arXiv.1906.01794
  • Ma, X., & Hovy, E. (2016). End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 1, 1064–1074. https://doi.org/10.48550/arXiv.1603.01354
  • Mao, Y., Shen, Y., Yu, C., & Cai, L. (2021). A joint training dual-MRC framework for aspect based sentiment analysis. Proceedings of the AAAI Conference on Artificial Intelligence, 35(15), 13543–13551. https://doi.org/10.1609/aaai.v35i15.17597
  • Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. https://doi.org/10.48550/arXiv.1301.3781
  • Nguyen, H. T., & Le Nguyen, M. (2018). Effective attention networks for aspect-level sentiment classification. In 2018 10th International conference on knowledge and systems engineering (KSE) (pp. 25–30). https://doi.org/10.1109/KSE.2018.8573324
  • Peng, H., Xu, L., Bing, L., Huang, F., Lu, W., & Si, L. (2020). Knowing what, how and why: A near complete solution for aspect-based sentiment analysis. Proceedings of the AAAI Conference on Artificial Intelligence, 34(05), 8600–8607. https://doi.org/10.1609/aaai.v34i05.6383
  • Pontiki, M., Galanis, D., Papageorgiou, H., Androutsopoulos, I., Manandhar, S., & Al-Smadi, M. (2016). Semeval-2016 task 5: Aspect based sentiment analysis. In International workshop on semantic evaluation (pp. 19–30). https://doi.org/10.18653/v1/S16-1002
  • Pontiki, M., Galanis, D., Papageorgiou, H., Manandhar, S., & Androutsopoulos, I. (2015). Semeval-2015 task 12: Aspect based sentiment analysis. In Proceedings of the 9th international workshop on semantic evaluation (SemEval 2015) (pp. 486–495). https://aclanthology.org/S15-2082.pdf
  • Sun, C., Huang, L., & Qiu, X. (2019). Utilizing BERT for aspect-based sentiment analysis via constructing auxiliary sentence. In Proceedings of the 2019 conference of the North american chapter of the association for computational linguistics: Human language technologies, 1 (pp. 380–385). https://doi.org/10.48550/arXiv.1903.09588
  • Sun, K., Zhang, R., Mensah, S., Mao, Y., & Liu, X. (2019). Aspect-level sentiment analysis via convolution over dependency tree. In Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP) (pp. 5679–5688). https://aclanthology.org/D19-1569.pdf
  • Tang, H., Ji, D., Li, C., & Zhou, Q. (2020). Dependency graph enhanced dual-transformer structure for aspect-based sentiment classification. In Proceedings of the 58th annual meeting of the association for computational linguistics (pp. 6578–6588). https://doi.org/10.18653/v1/2020.acl-main.588
  • Tay, Y., Tuan, L. A., & Hui, S. C. (2018). Learning to attend via word-aspect associative fusion for aspect-based sentiment analysis. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1). https://doi.org/10.1609/aaai.v32i1.12049
  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems 30: Annual conference on neural information processing systems (NIPS 2017) (pp. 5998–6008). https://arxiv.org/pdf/1706.03762v5.pdf
  • Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., & Bengio, Y. (2017). Graph attention networks. In 6th International conference on learning representations (ICLR 2018) (p. 12). https://doi.org/10.48550/arXiv.1710.10903
  • Wang, K., Shen, W., Yang, Y., Quan, X., & Wang, R. (2020). Relational graph attention network for aspect-based sentiment analysis. In Proceedings of the 58th annual meeting of the association for computational linguistics (ACL 2020) (pp. 3229–3238). https://doi.org/10.48550/arXiv.2004.12362
  • Wang, W., Pan, S. J., Dahlmeier, D., & Xiao, X. (2017). Coupled multi-layer attentions for co-extraction of aspect and opinion terms. Proceedings of the AAAI Conference on Artificial Intelligence, 31(1). https://doi.org/10.1609/aaai.v31i1.10974
  • Wang, Y., Huang, M., Zhu, X., & Zhao, L. (2016). Attention-based LSTM for aspect-level sentiment classification. In Proceedings of the 2016 conference on empirical methods in natural language processing (EMNLP 2016) (pp. 606–615). https://aclanthology.org/D16-1058.pdf
  • Xiao, J., & Luo, X. (2022) Aspect-level sentiment analysis based on BERT fusion multi-attention. In 2022 14th International conference on intelligent human-machine systems and cybernetics (IHMSC) (pp. 32–35). https://doi.org/10.1109/IHMSC55436.2022.00016
  • Xing, B., Liao, L., Song, D., Wang, J., Zhang, F., Wang, Z., & Huang, H. (2019). Earlier attention? Aspect-aware LSTM for aspect-based sentiment analysis. In Proceedings of the twenty-eighth international joint conference on artificial intelligence (IJCAI 2019) (pp. 5313–5319). https://doi.org/10.48550/arXiv.1905.07719
  • Xu, H., Liu, B., Shu, L., & Yu, P. S. (2019). BERT post-training for review reading comprehension and aspect-based sentiment analysis. In Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: Human language technologies (NAACL 2019), 1 (pp. 2324–2335).
  • Xu, H., Liu, B., Shu, L., & Yu, P. S. (2018). Double embeddings and CNN-based sequence labeling for aspect extraction. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2, 592–598. https://doi.org/10.48550/arXiv.1805.04601
  • Xue, W., & Li, T. (2018). Aspect based sentiment analysis with gated convolutional networks. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 1, 2514–2523. https://doi.org/10.48550/arXiv.1805.07043
  • Yan, H., Dai, J., & Zhang, Z. (2021). A unified generative framework for aspect-based sentiment analysis. In Proceedings of the 59th annual meeting of the association for computational linguistics and the 11th international joint conference on natural language processing (pp. 2416–2429). https://doi.org/10.48550/arXiv.2106.04300
  • Yang, H., Zeng, B., Yang, J., Song, Y., & Xu, R. (2020). A multi-task learning model for Chinese-oriented aspect polarity classification and aspect term extraction. Neurocomputing, 419, 344–356. https://doi.org/10.1016/j.neucom.2020.08.001
  • Yang, J., & Yang, J. (2020). Aspect based sentiment analysis with self-attention and gated convolutional networks. In 2020 IEEE 11th international conference on software engineering and service science (ICSESS) (pp. 146–149). https://doi.org/10.1109/ICSESS49938.2020.9237640
  • Yang, J., Yang, R., Lu, H., et al. (2019). Multi-entity aspect-based sentiment analysis with context, entity, aspect memory and dependency information. Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 18(4), 1–22. https://doi.org/10.1145/3321125
  • Zhang, C., Li, Q., & Song, D. (2019). Aspect-based sentiment classification with aspect-specific graph convolutional networks. In Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP) (pp. 4568–4578). https://doi.org/10.48550/arXiv.1909.03477
  • Zhang, M., & Qian, T. (2020). Convolution over hierarchical syntactic and lexical graphs for aspect level sentiment analysis. In Proceedings of the 2020 conference on empirical methods in natural language processing (EMNLP 2020) (pp. 3540–3549). https://doi.org/10.18653/v1/2020.emnlp-main.286
  • Zhang, W., Li, X., Deng, Y., Bing, L., & Lam, W. (2022). A survey on aspect-based sentiment analysis: Tasks, methods, and challenges. https://doi.org/10.48550/arXiv.2203.01054
  • Zhou, J., Cui, G., Hu, S., Zhang, Z., Yang, C., Liu, Z., Wang, L., Li, C., & Sun, M. (2020). Graph neural networks: A review of methods and applications. AI Open, 1, 57–81. https://doi.org/10.1016/j.aiopen.2021.01.001
  • Zhu, Y., Xu, W., Zhang, J., Liu, Q., Wu, S., & Wang, L. (2021). Deep graph structure learning for robust representations: A survey. https://doi.org/10.48550/arXiv.2103.03036