Research Article

Optimal feature selection and invasive weed tunicate swarm algorithm-based hierarchical attention network for text classification

Article: 2231171 | Received 24 Jan 2023, Accepted 24 Jun 2023, Published online: 10 Jul 2023

Abstract

Through social media platforms and the internet, the world is becoming increasingly connected and is producing enormous amounts of data. Texts are collected from social media, newspapers, user reviews of products, company press releases, and other sources. The correctness of classification depends mainly on the kind of words used in the corpus and the features used for classification. With the rapid growth of text data on the Internet, the accurate organisation and management of text data has become a great challenge. Hence, in this research, an effective Invasive Weed Tunicate Swarm Optimization-based Hierarchical Attention Network (IWTSO-based HAN) is implemented for text categorisation. Here, features are mined from the text, and the optimal features are then selected to perform the classification. The incorporation of the parametric features of each optimisation algorithm enables the proposed method to improve convergence towards global solutions and thereby improve categorisation effectiveness. The proposed method obtained better performance for text classification in terms of accuracy, True Positive Rate (TPR), True Negative Rate (TNR), precision, and False Negative Rate (FNR), with values of 92.4%, 92.4%, 94.1%, 95.4%, and 0.0758, respectively.

1. Introduction

Text data is collected from various sources, such as emails, social media, chats, insurance claims, tickets, web data, customer-service questions and answers, and user reviews. Text is an especially rich source of data, but acquiring insights from it is a time-consuming and complex task because of its unstructured nature (Minaee et al., Citation2020; Fan et al., Citation2023). In general, text documents are high-dimensional and contain noisy and irrelevant terms. The presence of noise and irrelevant terms increases computational complexity, introduces noise into the learning procedure, and hence results in poor classifier performance. Text documents must therefore be processed before classification to reduce their dimensionality and to eliminate noisy and irrelevant terms (Wang & Hong, Citation2019). Electronic text processing has become ubiquitous in recent decades, and the growing number of files has made it substantially harder to satisfy users' information requirements. One approach is to classify textual data automatically, so that users can easily retrieve and manipulate the data, extract information, recognise patterns, and thereby create knowledge. Organising electronic files into various categories has become a growing interest for organisations and individuals (Koller & Sahami, Citation1997; Stein et al., Citation2019). Text classification produces a solution for these issues and draws on a combination of knowledge fields, such as data mining, natural language processing (NLP), information retrieval, and machine learning. It is commonly treated as a supervised machine-learning problem in which a model is trained on labelled examples to classify unseen pieces of text (Vidyadhari et al., Citation2019; Stein et al., Citation2019; Kou et al., Citation2020). The internet has made large-scale data sharing possible, and the text files it carries hold huge commercial value. Mining the useful data hidden in text benefits greatly from text classification, which assigns text files of unknown class to a fixed number of classes using trained classifiers (Wang & Hong, Citation2019). Hence, the text classification mechanism is utilised to organise text data suitably. Text classification is a supervised learning model that allocates text documents to specified categories (Thirumoorthy & Muneeswaran, Citation2020). Text classification, also termed text categorisation, is the procedure of assigning tags or labels to textual units such as documents, paragraphs, queries, and sentences. It is achieved either by automatic labelling or by manual annotation. Deep learning (Roy et al., Citation2023) and machine learning methods are commonly used for effective text classification. One machine learning approach, Structured Logistic Regression (SLR), has been used for text classification and for large-scale classification problems in bioinformatics (Pedersen et al., Citation2014). With the increasing growth of text data in industrial applications, automatic text classification becomes ever more important (Minaee et al., Citation2020).
Accordingly, text classification appears in different applications, such as complaint feedback for specific services in social networks, classification of audio music genres and TV programmes for entertainment data selection, identification of specific topics, detection of disease outbreaks and accidents in the oil industry, analysis of essay answers, classification of materials into various reading levels for students in the education system, e-mail filtering (Drucker et al., Citation1999; Guzella & Caminhas, Citation2009; Mohammadzadeh & Gharehchopogh, Citation2021), classification of pediatric asthma in biomedicine (Wang & Hong, Citation2019; Pedersen et al., Citation2014), e-mail classification (Günal et al., Citation2006; Yu & Zhu, Citation2009), topic and author gender identification, sentiment classification (ElAmine Chennafi et al., Citation2022), hate speech detection (Aldjanabi et al., Citation2021), and web page classification (Chen & Hsieh, Citation2006; Anagnostopoulos et al., Citation2004; Thirumoorthy & Muneeswaran, Citation2020; Han et al., Citation2022).

The indivisible unit of text is termed a word, feature, or term. In the text classification domain, unstructured content documents are specified as feature vectors (Feng et al., Citation2023). A noisy feature does not contain any information about the category, and it is very difficult to draw conclusions from such a feature. For example, when a feature or term is present in every text document, that feature or term is not useful for classification (Thirumoorthy & Muneeswaran, Citation2020). Accordingly, feature selection is a significant module in the text classification process: it aims to find a subset of features that contains a smaller number of relevant features and is sufficient for maintaining or increasing classification effectiveness (Belazzoug et al., Citation2020). In general, the feature selection mechanism is employed to minimise the data dimension and the error rate of the classifier and to decrease computational time (Thirumoorthy & Muneeswaran, Citation2020; Parlak & Uysal, Citation2023; Şahin & Kılıç, Citation2019; Saeed & Al Aghbari, Citation2022). It selects a subset from the full set of features through various analyses (Liu et al., Citation2020; Zhou et al., Citation2022; Jin et al., Citation2023; Coban, Citation2022). Feature selection approaches are found in different domains, such as document clustering, text categorisation, computer vision, bioinformatics, image processing, and industrial applications (Belazzoug et al., Citation2020).

1.1. Problem statement

Text classification is the method of categorising a set of text files into various classes from a predefined set. It is an essential method for processing and managing huge numbers of files in digital form. Commonly, text classification is useful for extracting and summarising information and for text retrieval. Many text classification methods have been implemented in previous studies. The problems identified in those studies are discussed below:

  • The computation time was increased owing to the high dimensionality of the data.

  • The high-dimensional feature space contains noisy, redundant, and unrelated features, which affect the correctness of the model.

  • Some feature selection methods did not effectively select the important features, so the error rate of the classifier increased.

Hence, in this research, IWTSO-based HAN is developed for the efficient classification of text documents.

In this research, a text classification framework is designed with the HAN to classify text documents. Here, the input text is passed to the pre-processing phase, and then the text data is subjected to the feature extraction module. The acquired features are passed to the feature selection phase, in which the best features are selected by employing the IWTSO. Then, in the classification module, the HAN is used to classify the text data, with the HAN trained by the IWTSO algorithm. The IWTSO is formed by integrating the improved Invasive Weed Optimization (IWO) (Misaghi & Yaghoobi, Citation2019) and the Tunicate Swarm Algorithm (TSA) (Kaur et al., Citation2020).

Major contributions of the research:

  • IWTSO-based HAN: An effective text classification framework is designed with the HAN to classify text documents. The IWTSO-based HAN effectively achieves better results using the fitness determination.

  • The IWTSO algorithm is formed by integrating the IWO and TSA, and it is used for the effective training of the HAN classifier.

The remainder of the manuscript is organised as follows: Section 2 presents a discussion of various text classification approaches, and Section 3 explains the IWTSO-based HAN. Section 4 discusses the results of the IWTSO-based HAN, and the research is concluded in Section 5.

2. Literature survey

For the effective selection of features, Belazzoug et al. (Citation2020) established an improved sine cosine algorithm (ISCA). This algorithm identified new regions of the search space and generated the best solution, considering both the location of the better solutions and the location in the search space. It improved efficiency and removed premature convergence, and it solved feature selection problems, but it did not apply to some complicated problems. Lim and Kim (Citation2020) developed a quadratic programming-based model to compute the optimal balance between two dependencies for selecting the features. It considered a similarity measure for eliminating redundant terms, computed using mutual information and a ranking method, and it is not limited to semantically related terms. It increased the categorisation accuracy, but it required a large processing time. Thirumoorthy and Muneeswaran (Citation2020) developed a normalised difference measure-binary Jaya optimization algorithm (NDM-BJO) for acquiring the subset of best features. Here, the error rate was considered for measuring fitness. It reduced the dimension of the feature space and increased the classification accuracy. Wang and Hong (Citation2019) developed a Hebb rule-based feature selection model (HRFS). It depends on the neural synapse scheme and finds the discriminative terms with the Hebb rules. It reduced the time and programming complexity, but the optimisation procedure consumed more time to converge.

Goudjil et al. (Citation2018) developed a support vector machine (SVM) model to solve high-dimensional classification problems. It used an active learning approach for text classification and minimised the labelling effort by selecting the samples to be labelled, using the posterior probability to select the informative samples. It increased the classification accuracy, but it needed more labelled samples for training the classifier. Ranjan and Prasad (Citation2018) introduced a lion fuzzy neural network (LFNN) for accomplishing text categorisation. It considered a dynamic database to achieve the classification process. Here, the features were mined from the words to reduce the dimension of the search space, but the error rate was not reduced. Liu et al. (Citation2020) developed a relative document term frequency difference (RDTFD) method for selecting the features. This method partitions the features of all text files into small feature sets based on the capability of the features to discriminate the negative and the positive samples. It reduced the redundancy of features and improved the categorisation performance, but it did not increase the running speed. Borhani (Citation2020) developed an artificial neural network for text classification. It used a fast text classifier with an updating formula to tune the neural network, and discriminative features were used for text mining. It discovered knowledge and achieved better performance in mining information, but it resulted in high computational complexity. Gasmi (Citation2022) implemented a Bidirectional Encoder Representations from Transformers (BERT) model based on an optimal deep learning scheme. Here, the parameter selection of the deep learning model was done by Particle Swarm Optimization (PSO), and the prediction of the matching response was done by the k-Nearest Neighbours algorithm (KNN). The accuracy of this model was high, but it had complexity issues. Maragheh et al. (Citation2022) implemented the Spotted Hyena Optimizer (SHO)-Long Short-Term Memory (SHO-LSTM) approach for text classification. Here, the Skip-gram approach was used for word embedding, and the weights of the LSTM were optimised by the SHO to improve the correctness of the approach. It had better convergence capability, but the parameter optimisation was not very effective.

2.1 Review on meta-heuristics algorithms

Recent meta-heuristic optimisation algorithms are reviewed in terms of their advantages and disadvantages, and the details are tabulated in Table 1.

Table 1. Review on recent meta-heuristics algorithms.

2.2 Research gaps

  • The quadratic programming model designed by Lim and Kim (Citation2020) minimises the number of redundant terms chosen and increases the classification accuracy. However, this approach considers a large number of dependencies between the terms, which increases the processing time and is a major limitation.

  • In the classification process, eliminating noise such as redundant and irrelevant features from a large volume of text documents is a complex task. Hence, dimension-reduction schemes such as feature selection and feature extraction are required to solve such issues (Liu et al., Citation2020).

  • Machine learning and deep learning methods obtain higher accuracy in sentiment analysis and topic classification, but this performance often depends on the quality and quantity of training samples, which are often difficult to collect (Wei & Zou, Citation2019).

  • Much traditional text classification research concentrates only on a certain type of sentence or phrase. These techniques rely on target words or target sentences for solving text classification issues without considering the relationships between words (Liu & Guo, Citation2019).

  • Unlike documents composed of paragraphs, short texts are ambiguous because they do not contain sufficient context information, which is a major issue for classification.

3. Proposed IWTSO-based HAN

The IWTSO-based HAN is devised in this research for classifying text. Here, the best features are acquired by employing the IWTSO. Finally, text classification is carried out with the HAN, which is trained by the IWTSO formed by merging the improved IWO (Misaghi & Yaghoobi, Citation2019) with the TSA (Kaur et al., Citation2020). Figure 1 portrays the schematic view of the IWTSO-based HAN.

Figure 1. Schematic view of IWTSO-based HAN.

3.1 Acquisition of text data

Assume the dataset \(D\) with \(n\) text documents, represented as,
(1) \(D = \{D_1, D_2, \ldots, D_i, \ldots, D_n\}; \quad 1 \le i \le n\)
where \(D\) denotes the dataset, \(D_i\) indicates the text document at the \(i\)th dataset index, and \(n\) signifies the total number of text files. For the Reuter dataset \(n = 19043\), for the 20-Newsgroup dataset \(n = 19997\), and for the real-time data \(n = 5000\). The input text document \(D_i\) is used to execute the classification process.

3.2 Pre-processing

\(D_i\) is subjected to pre-processing, which is done by employing stop word removal and stemming. Pre-processing is an important step in information retrieval and text mining because the data is unstructured; it aims to remove noise from the text data and to clean the text before the classification task. In this phase, the dimension of the data can be reduced by handling missing values, removing duplicates, or minimising the overall number of features. Here, the data in unstructured format is changed to a structured text representation. It is the procedure of cleaning and preparing the text data for classification.

Stop word removal: Stop words are parts of natural language that do not contribute any meaning in a text processing system. The most common such words in text documents are articles, pronouns, and prepositions. Words that do not provide meaning to the documents are treated as stop words; they are not considered keywords in text applications, so they must be eliminated from the documents. This step removes the frequent and common words that have no important influence on the sentence. Examples of the stop words considered are but, also, to, have, and can.

Stemming: Stemming is the process of acquiring the root or base of a word by eliminating prefixes and suffixes, thereby reducing word variants to their root form. The Porter Stemmer library is used for stemming. The outcome of this phase is signified as \(A\) with size \([U \times V]\).
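A minimal sketch of this pre-processing step is given below, assuming the NLTK library (with its stopwords list and punkt tokeniser resources downloaded) supplies the stop words and the Porter Stemmer; the function name is illustrative.

```python
# Minimal pre-processing sketch: stop word removal and Porter stemming with NLTK
# (assumes the "stopwords" and "punkt" NLTK resources have been downloaded).
import re

from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

STOP_WORDS = set(stopwords.words("english"))  # standard English stop word list
stemmer = PorterStemmer()

def preprocess(document: str) -> list:
    """Lower-case, tokenise, remove stop words, and stem a raw text document."""
    tokens = word_tokenize(document.lower())
    tokens = [t for t in tokens if re.fullmatch(r"[a-z]+", t)]   # keep alphabetic tokens only
    tokens = [t for t in tokens if t not in STOP_WORDS]          # stop word removal
    return [stemmer.stem(t) for t in tokens]                     # Porter stemming

print(preprocess("The classifiers are also trained to have better accuracy."))
```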

3.3 Extraction of features from text data

The extracted features are explained below:

Wordnet-based feature: WordNet is a commonly used lexical resource for NLP tasks. It is a network of concepts in the form of word nodes, organised by the semantic associations among words according to their senses. A semantic relation is a pointer between synsets. WordNet is specifically utilised here to find synsets (Liu et al., Citation2015). The synsets identified from the text data are specified as \(f_1\).
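As an illustration, synsets can be looked up through NLTK's WordNet interface. The sketch below is an assumption about how such lookups might be turned into a numeric feature (counting distinct synsets), not the authors' exact definition of \(f_1\).

```python
# Illustrative sketch of a WordNet-based feature, assuming NLTK with the
# "wordnet" corpus downloaded.
from nltk.corpus import wordnet as wn

def wordnet_feature(tokens):
    """Count the distinct WordNet synsets reachable from the document's tokens."""
    synsets = set()
    for token in tokens:
        for syn in wn.synsets(token):   # all senses of the word
            synsets.add(syn.name())
    return len(synsets)

print(wordnet_feature(["bank", "river", "money"]))
```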

Co-occurrence-based feature: The use of term sets or item sets in documents is termed co-occurrence of terms, i.e. the frequent joint occurrence of terms in the text corpus. It is specified as,
(2) \(f_2 = \dfrac{R_{it}}{Z_t}\)
Here, \(R_{it}\) denotes the co-occurrence frequency of the words \(i\) and \(t\), and \(Z_t\) indicates the frequency of the word \(t\). The co-occurrence-based features acquired from the text data are specified as \(f_2\).
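A small sketch of Eq. (2) follows; counting a shared document as one co-occurrence is an assumption made for illustration, since the exact co-occurrence window is not specified here.

```python
# Sketch of Eq. (2): f2 = R_it / Z_t, where R_it is the number of documents in
# which words i and t appear together and Z_t is the total frequency of word t.
def cooccurrence_feature(docs, word_i, word_t):
    pair_count = 0    # R_it: documents containing both words
    term_count = 0    # Z_t: total occurrences of word_t
    for tokens in docs:
        term_count += tokens.count(word_t)
        if word_i in tokens and word_t in tokens:
            pair_count += 1
    return pair_count / term_count if term_count else 0.0

docs = [["text", "classif", "model"], ["text", "mine"], ["classif", "model"]]
print(cooccurrence_feature(docs, "text", "classif"))   # 1 / 2 = 0.5
```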

TF-IDF: It contains two parts, namely TF and IDF. TF measures the frequency of individual words within a text, and IDF weights a word by the number of texts in which it appears (Wu et al., Citation2020).
(3) \(TF = \dfrac{n(a)}{n}\)
where \(n(a)\) indicates the number of occurrences of entry \(a\) in the class, and \(n\) denotes the total number of entries.

The inverse document frequency (IDF) is specified as,
(4) \(IDF(a) = \log\dfrac{N+1}{N(a)+1} + 1\)
where \(N\) denotes the total number of texts in the corpus, and \(N(a)\) indicates the number of texts in the corpus that contain the word \(a\). The TF-IDF is represented as,
(5) \(f_3 = TF\text{-}IDF(a) = TF \times IDF(a)\)
The extracted feature vector is specified as \(f = \{f_1, f_2, f_3\}\).
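The TF-IDF feature of Eqs. (3)–(5) can be computed directly; the sketch below follows those equations, with illustrative function names.

```python
# Sketch of the TF-IDF feature (f3) following Eqs. (3)-(5).
import math

def tf(term, doc):
    """Eq. (3): occurrences of the term divided by the total number of entries."""
    return doc.count(term) / len(doc)

def idf(term, corpus):
    """Eq. (4): log((N + 1)/(N(a) + 1)) + 1 over the N texts of the corpus."""
    n_docs_with_term = sum(1 for d in corpus if term in d)   # N(a)
    return math.log((len(corpus) + 1) / (n_docs_with_term + 1)) + 1

def tf_idf(term, doc, corpus):
    """Eq. (5): f3 = TF x IDF."""
    return tf(term, doc) * idf(term, corpus)

corpus = [["text", "classif", "model"], ["text", "mine"], ["classif", "model"]]
print(tf_idf("classif", corpus[0], corpus))
```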

3.4 Feature selection by IWTSO algorithm

After the extraction of features from the text data, the unique and important features are selected using the IWTSO. Selecting the important features from the text data increases the categorisation correctness, as redundant and duplicate data can be removed. IWO is a population-based algorithm in which each individual is represented as a weed that grows unintentionally in the environment. It is very effective in converging to optimum solutions through essential features such as competition, growth, and seeding in the weed colony. The basic characteristics of reproduction, spatial dispersal, and competitive exclusion are employed to simulate the colonising behaviour of the weeds. TSA is a bio-inspired algorithm. The swarm behaviour and jet propulsion of TSA are integrated with the weed behaviour of the improved IWO, which increases the convergence rate of the optimisation and enables the generation of the global best solution by escaping local optima. The size of the selected feature set \(X\) is \([U \times V]\). The Reuter dataset contains 19043 text documents in total, so the size of the selected features is \([19043 \times 5]\); the 20-Newsgroup dataset has 19997 documents, with selected feature dimensions \([19997 \times 5]\); similarly, the real-time data has 5000 files, with feature dimensions \([5000 \times 5]\).

Solution encoding: This is the representation of the solution vector, in which the selected optimal feature subset is indicated as \(X\), where \(X < f\). Figure 2 portrays the solution encoding.

Figure 2. Solution encoding.

Fitness measure: It is the process of computing the optimal features among a set of features by the consideration of the accuracy measure. The equation for this is shown in Eq. (37).
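As a hedged illustration of this accuracy-based fitness, the sketch below scores a binary feature-selection mask by the cross-validated accuracy of a simple classifier; the scoring classifier (scikit-learn logistic regression with 3-fold cross-validation) and the function name are assumptions made only to keep the example self-contained.

```python
# Hedged sketch of an accuracy-based feature-selection fitness: a candidate
# solution is a binary mask over the extracted feature columns.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def selection_fitness(mask, features, labels):
    """Accuracy of a classifier trained on the features chosen by the binary mask."""
    selected = features[:, np.asarray(mask).astype(bool)]
    if selected.shape[1] == 0:
        return 0.0                                   # empty subsets get the worst fitness
    scores = cross_val_score(LogisticRegression(max_iter=1000), selected, labels,
                             cv=3, scoring="accuracy")
    return float(scores.mean())
```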

Algorithmic procedure of IWTSO

  1. Initialisation: Let us define \(H\) as the population of weeds in the solution space, and \(H_{best}\) as the best position of the weeds.

  2. Fitness computation: The fitness is employed to discover the best solution in the feature selection process, and it is determined using the difference between the actual and the target values.

  3. Update solution: The location update is specified as,
(6) \(H_l^{s+1} = \eta(s)H_l^s + H_{best} - H_l^s\)
(7) \(H_l^{s+1} = H_l^s(\eta(s) - 1) + H_{best}\)

The standard position update equation of TSA is,
(8) \(H_l^{s+1} = J + \vec{B} \cdot G\)
(9) \(H_l^{s+1} = J + \vec{B}\,|J - rand \cdot H_l^s|\)
Let us assume \(J > H_l^s\), then
(10) \(H_l^{s+1} = J + \vec{B}(J - rand \cdot H_l^s)\)
(11) \(H_l^{s+1} = J(1 + \vec{B}) - \vec{B}\,rand\,H_l^s\)
(12) \(J = \dfrac{H_l^{s+1} + \vec{B}\,rand\,H_l^s}{1 + \vec{B}}\)

As \(J\) is the best search agent of TSA, it is substituted for \(H_{best}\) of the improved IWO:
(13) \(H_l^{s+1} = H_l^s(\eta(s)-1) + \dfrac{H_l^{s+1} + \vec{B}\,rand\,H_l^s}{1+\vec{B}}\)
(14) \(H_l^{s+1} = H_l^s(\eta(s)-1) + \dfrac{H_l^{s+1}}{1+\vec{B}} + \dfrac{\vec{B}\,rand\,H_l^s}{1+\vec{B}}\)
(15) \(H_l^{s+1} - \dfrac{H_l^{s+1}}{1+\vec{B}} = H_l^s(\eta(s)-1) + \dfrac{\vec{B}\,rand\,H_l^s}{1+\vec{B}}\)
(16) \(H_l^{s+1}\left(1 - \dfrac{1}{1+\vec{B}}\right) = H_l^s(\eta(s)-1) + \dfrac{\vec{B}\,rand\,H_l^s}{1+\vec{B}}\)
(17) \(H_l^{s+1}\left(\dfrac{1+\vec{B}-1}{1+\vec{B}}\right) = H_l^s(\eta(s)-1) + \dfrac{\vec{B}\,rand\,H_l^s}{1+\vec{B}}\)
(18) \(H_l^{s+1} = \dfrac{1+\vec{B}}{\vec{B}}\left[H_l^s(\eta(s)-1) + \dfrac{\vec{B}\,rand\,H_l^s}{1+\vec{B}}\right]\)
(19) \(H_l^{s+1} = \dfrac{1+\vec{B}}{\vec{B}}H_l^s(\eta(s)-1) + rand\,H_l^s\)
where,
(20) \(\eta(s) = \left(\dfrac{S-s}{S}\right)^m(\eta_{initial} - \eta_{final}) + \eta_{final}\,x(s)\)
(21) \(\vec{B} = \dfrac{\vec{C}}{\vec{I}}\)
(22) \(\vec{C} = b_2 + b_3 - \vec{K}\)
(23) \(\vec{K} = 2 \cdot b_1\)
(24) \(\vec{I} = \left\lfloor Q_{min} + b_1 \cdot (Q_{max} - Q_{min}) \right\rfloor\)
Here, \(\vec{B}\) denotes the vector, \(\vec{C}\) implies the gravity force, \(\vec{I}\) signifies the social force between the search agents, \(\vec{K}\) indicates the water flow advection, \(x(s)\) represents the chaotic mapping, and \(rand\), \(b_1\), \(b_2\), and \(b_3\) are random numbers in the interval [0,1]. Here, \(Q_{min}\) is set to 1 and \(Q_{max}\) is set to 4. An illustrative code sketch of this update step is given after Algorithm 1 below.

  4. Feasibility evaluation: The fitness is computed for every result, and the result with the best fitness value is declared as the best result.

  5. Termination: The aforementioned steps are repeated until the best result is attained. Algorithm 1 represents the pseudo-code of the IWTSO.

Algorithm 1. Pseudo code of IWTSO
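The position update of Eqs. (19)–(24) can be illustrated with a short NumPy sketch. This is a minimal reading of the procedure described above, not the authors' Algorithm 1 verbatim; the schedule parameters (eta_initial, eta_final, m) and the value supplied for the chaotic map x(s) are assumptions.

```python
# Minimal NumPy sketch of the IWTSO position update (Eqs. (19)-(24));
# parameter values and the chaotic-map input x_s are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
Q_MIN, Q_MAX = 1, 4                          # as stated in Section 3.4
ETA_INIT, ETA_FINAL, M_EXP = 1.0, 0.01, 3    # assumed schedule parameters

def eta(s, S, x_s):
    """Eq. (20): ((S - s)/S)^m * (eta_init - eta_final) + eta_final * x(s)."""
    return ((S - s) / S) ** M_EXP * (ETA_INIT - ETA_FINAL) + ETA_FINAL * x_s

def vector_B():
    """Eqs. (21)-(24): B = C/I, C = b2 + b3 - K, K = 2*b1, I = floor(Qmin + b1*(Qmax - Qmin))."""
    b1, b2, b3 = rng.random(3)
    K = 2.0 * b1
    C = b2 + b3 - K
    I = np.floor(Q_MIN + b1 * (Q_MAX - Q_MIN))
    return C / I                              # a guard against B == 0 is advisable in practice

def iwtso_update(H, s, S, x_s):
    """Eq. (19): H^(s+1) = ((1 + B)/B) * H^s * (eta(s) - 1) + rand * H^s."""
    B = vector_B()
    return ((1 + B) / B) * H * (eta(s, S, x_s) - 1) + rng.random() * H

positions = rng.random((5, 10))               # 5 candidate solutions in a 10-dimensional space
positions = iwtso_update(positions, s=1, S=100, x_s=0.7)
print(positions.shape)
```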

3.5 Text classification using proposed IWTSO-based HAN

After selecting the features, text categorisation is achieved by employing the HAN. Text classification is a fundamental task in NLP, and its key goal is to allocate labels to text. The advantage of using the HAN for text classification is that it captures two basic insights about document structure. First, as a document has a hierarchical structure, the document representation is constructed by modelling the sentence representations and then aggregating them into a document representation. Second, different words and sentences in a document are differently informative, and the importance of a sentence is generally context-dependent; the same sentence or word can carry different importance in different contexts. To enhance the categorisation performance, the HAN includes two different levels of attention, namely word level and sentence level.

  1. Structure of HAN: The structure of the HAN is composed of different parts (Yang et al., Citation2016). The input taken by the classifier has the dimension \([40 \times 50]\), and the result generated by the bidirectional layer has the size \([40 \times 100]\). The attention layer then processes the data of dimension \([40 \times 100]\) and forms result data of size \([1 \times 100]\).

Word encoder: Here, the input feature \(X\) is embedded into vectors using the embedding matrix \(E\). The bidirectional gated recurrent unit (GRU) consists of a forward GRU, which reads the sequence from the first word to the last, and a backward GRU, which reads it from the last word to the first.
(25) \(L = E \cdot X; \quad X \in [1, \ldots, M]\)
(26) \(\overrightarrow{g} = \overrightarrow{GRU}(L), \quad X \in [1, M]\)
(27) \(\overleftarrow{g} = \overleftarrow{GRU}(L), \quad X \in [M, 1]\)

Here, an annotation is obtained for the given feature \(X\) by concatenating the forward and backward hidden states, such that \(g = [\overrightarrow{g}, \overleftarrow{g}]\).

Word attention: Not all words contribute equally to the representation of the sentence meaning. The attention mechanism extracts the significant words in the sentence, and the representations of the informative words are merged to form the sentence vector.
(28) \(y_X = \tanh(E \cdot g + q)\)
(29) \(\beta = \dfrac{\exp(y_X^T y_v)}{\sum_L \exp(y_X^T y_v)}\)
(30) \(V = \sum \beta \cdot g\)

At first, the word annotation \(g\) is passed through a one-layer MLP to obtain \(y_X\) as the hidden representation of \(g\). The importance of the word is then measured as the similarity of \(y_X\) with the word-level context vector \(y_v\), and the normalised weight \(\beta\) is obtained using the Softmax function. Finally, the sentence vector \(V\) is determined as the weighted sum of the word annotations using these weights. Figure 3 portrays the structure of the HAN.
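A minimal Keras sketch of this word-level encoder and attention (Eqs. (25)–(30)) is given below, assuming TensorFlow 2.x. The 40-word input, 50-dimensional embeddings, and 100-dimensional bidirectional annotations follow the sizes stated for the HAN structure; the vocabulary size and layer/variable names are assumptions.

```python
# Sketch of the word-level encoder and attention of the HAN (Eqs. (25)-(30)).
import tensorflow as tf
from tensorflow.keras import layers

MAX_WORDS, EMBED_DIM, GRU_UNITS, VOCAB = 40, 50, 50, 20000   # VOCAB is an assumption

class SoftAttention(layers.Layer):
    """y = tanh(W.g + b); beta = softmax(y . context); output = sum(beta * g)."""
    def build(self, input_shape):
        dim = int(input_shape[-1])
        self.W = self.add_weight(name="W", shape=(dim, dim), initializer="glorot_uniform")
        self.b = self.add_weight(name="b", shape=(dim,), initializer="zeros")
        self.context = self.add_weight(name="context", shape=(dim,), initializer="glorot_uniform")

    def call(self, g):                                                      # g: (batch, steps, dim)
        y = tf.tanh(tf.tensordot(g, self.W, axes=1) + self.b)               # Eq. (28)/(33)
        beta = tf.nn.softmax(tf.tensordot(y, self.context, axes=1), axis=-1)  # Eq. (29)/(34)
        return tf.reduce_sum(g * tf.expand_dims(beta, -1), axis=1)          # Eq. (30)/(35)

word_input = layers.Input(shape=(MAX_WORDS,), dtype="int32")
embedded = layers.Embedding(VOCAB, EMBED_DIM)(word_input)                                   # Eq. (25)
annotations = layers.Bidirectional(layers.GRU(GRU_UNITS, return_sequences=True))(embedded)  # Eqs. (26)-(27)
sentence_vector = SoftAttention()(annotations)                                               # Eqs. (28)-(30)
word_encoder = tf.keras.Model(word_input, sentence_vector, name="word_encoder")
word_encoder.summary()
```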

Sentence encoder: \(V\) is the sentence vector, which is used to derive the document vector. The sentences are encoded with a bidirectional GRU.
(31) \(\overrightarrow{g_u} = \overrightarrow{GRU}(V)\)
(32) \(\overleftarrow{g_u} = \overleftarrow{GRU}(V)\)
The annotation of the sentence is obtained by concatenating \(\overrightarrow{g_u}\) and \(\overleftarrow{g_u}\), i.e. \(g_u = [\overrightarrow{g_u}, \overleftarrow{g_u}]\), which summarises the neighbouring sentences around sentence \(u\).

Figure 3. Structure of HAN.

Sentence attention: The significance of the sentences is determined using the sentence-level context vector \(W\).
(33) \(y_u = \tanh(E_z g_u + q_z)\)
(34) \(\beta_u = \dfrac{\exp(y_u^T W)}{\sum_u \exp(y_u^T W)}\)
(35) \(Y = \sum_u \beta_u \cdot g_u\)

Here, the document vector is denoted as \(Y\), which summarises the details of the text. The output has the dimension \([1 \times 22]\).
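Continuing the word-level sketch above (reusing word_encoder and SoftAttention), the sentence encoder and sentence attention of Eqs. (31)–(35) can be stacked on top to obtain the document classifier. The number of sentences per document is an assumption; the 22-class output follows the stated output size [1×22].

```python
# Sentence-level encoder, sentence attention, and output layer (Eqs. (31)-(35)).
MAX_SENTENCES, NUM_CLASSES = 15, 22           # MAX_SENTENCES is an assumption

doc_input = layers.Input(shape=(MAX_SENTENCES, MAX_WORDS), dtype="int32")
sentence_vectors = layers.TimeDistributed(word_encoder)(doc_input)             # V for each sentence
sentence_annotations = layers.Bidirectional(
    layers.GRU(GRU_UNITS, return_sequences=True))(sentence_vectors)            # Eqs. (31)-(32)
document_vector = SoftAttention(name="sentence_attention")(sentence_annotations)  # Eqs. (33)-(35)
output = layers.Dense(NUM_CLASSES, activation="softmax")(document_vector)
han = tf.keras.Model(doc_input, output, name="HAN")
han.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```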

  2. Training process of HAN: The HAN is trained with the IWTSO.

Solution encoding: It is useful in identifying the accurate optimal solution. Here, the solution is \(H = [1 \times \kappa]\), where \(\kappa\) indicates the number of weights. In every iteration, the weight factor is updated.

Fitness function: The error between the target and the actual output is computed to determine the fitness measure, which is defined as,
(36) \(F = \dfrac{1}{k}\sum_{\lambda=1}^{k}\left[O_\lambda - Y_\lambda\right]^2\)
where \(O_\lambda\) indicates the target output and \(Y_\lambda\) represents the classified outcome. The other steps of the IWTSO algorithm are discussed in Section 3.4.
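A minimal sketch of the fitness in Eq. (36), computed as the mean squared error between the target outputs and the classified outputs:

```python
# Sketch of Eq. (36): F = (1/k) * sum_lambda (O_lambda - Y_lambda)^2.
import numpy as np

def fitness_mse(targets, outputs):
    """Mean squared error between target outputs O and classified outputs Y."""
    return float(np.mean((np.asarray(targets) - np.asarray(outputs)) ** 2))

print(fitness_mse([1.0, 0.0, 1.0], [0.9, 0.2, 0.8]))   # approximately 0.03
```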

4. Results and discussion

The results and analysis of the IWTSO-based HAN regarding the performance measures are explained in this section.

4.1 Experimental setup

The IWTSO-based HAN is implemented in Python with the TensorFlow and Keras libraries, on a Windows 10 OS with an Intel processor and 2 GB RAM. Table 2 shows the experimental parameters.

Table 2. Experimental parameters.

4.2 Dataset description

The datasets used for the evaluation are the Reuter dataset (https://archive.ics.uci.edu/ml/datasets/reuters-21578+text+categorization+collection), the 20-Newsgroup dataset (https://www.kaggle.com/crawford/20-newsgroups), and real-time data. The dataset details are provided in Table 3.

Table 3. Dataset details.

4.3 Evaluation metrics

The evaluation is done using the accuracy, TPR, TNR, precision, and FNR metrics.

4.4 Performance analysis

This section explains the performance analysis made by the IWTSO-based HAN.

  1. Analysis with Reuter dataset

Figure 4 portrays the analysis of the IWTSO-based HAN with the Reuter dataset. Figure 4(a) shows the accuracy analysis. Considering 80% training data, the accuracy computed by the IWTSO-based HAN with feature size 100 is 0.813, with 200 is 0.854, with 300 is 0.897, with 400 is 0.935, and with 500 is 0.954. The performance analysis for the TPR is portrayed in Figure 4(b). For 90% training data, the TPR measured by the IWTSO-based HAN with feature sizes 100, 200, 300, 400, and 500 is 0.846, 0.874, 0.924, 0.931, and 0.954, respectively. Figure 4(c) depicts the performance analysis for the TNR measure. At 80% training data, the TNR achieved by the IWTSO-based HAN with feature size 100 is 0.799, with 200 is 0.815, with 300 is 0.894, with 400 is 0.914, and with 500 is 0.924.

Figure 4. Performance analysis with Reuter dataset, (a) accuracy, (b) TPR, (c) TNR, (d) FNR, (e) precision.

Figure 4(d) portrays the FNR analysis. When considering 80% training data, the developed method achieved an FNR with feature size 100 of 0.193, with 200 of 0.165, with 300 of 0.096, with 400 of 0.075, and with 500 of 0.059. The precision analysis is illustrated in Figure 4(e). At 80% training data, the precision computed by the IWTSO-based HAN with feature size 100 is 0.865, with 200 is 0.887, with 300 is 0.904, with 400 is 0.924, and with 500 is 0.948.

  2. Analysis with 20-Newsgroup dataset

Figure 5 depicts the performance analysis of the IWTSO-based HAN on the 20-Newsgroup dataset. Figure 5(a) depicts the accuracy analysis. At 60% training data, the accuracy measured by the developed method with feature size 100 is 0.760, with 200 is 0.784, with 300 is 0.824, with 400 is 0.874, and with 500 is 0.924. Figure 5(b) portrays the analysis with the TPR measure. At 60% training data, the TPR of the IWTSO-based HAN with feature size 100 is 0.741, with 200 is 0.774, with 300 is 0.814, with 400 is 0.864, and with 500 is 0.913. The TNR analysis is given in Figure 5(c). With 60% training data, the TNR measured by the IWTSO-based HAN with feature size 100 is 0.724, with 200 is 0.754, with 300 is 0.804, with 400 is 0.877, and with 500 is 0.905.

Figure 5. Performance analysis with 20-Newsgroup dataset, (a) accuracy, (b) TPR, (c) TNR, (d) FNR, (e) precision.

Figure 5(d) portrays the FNR analysis. At 70% training data, the FNR of the IWTSO-based HAN with feature size 100 is 0.226, with 200 is 0.215, with 300 is 0.175, with 400 is 0.126, and with 500 is 0.076. The precision analysis is portrayed in Figure 5(e). At 80% training data, the precision computed by the IWTSO-based HAN with feature size 100 is 0.874, with 200 is 0.897, with 300 is 0.914, with 400 is 0.924, and with 500 is 0.945.

4.5 Comparative methods

The performance improvement of the IWTSO-based HAN is analyzed by considering the conventional approaches, like Improved Sine Cosine Algorithm (ISCA) (Belazzoug et al., Citation2020), Lion Fuzzy neural network (LFNN) (Ranjan & Prasad, Citation2018), NDM-BJO (Thirumoorthy & Muneeswaran, Citation2020), Hebb rule-based feature selection (HRFS) (Wang & Hong, Citation2019), Improved IWO-HAN (Misaghi & Yaghoobi, Citation2019), TSA-HAN (Kaur et al., Citation2020), recurrent neural network (RNN), Long short-term memory (LSTM), and support vector machine (SVM).

4.6 Comparative analysis

This section explains the comparative analysis made by the IWTSO-based HAN with three kinds of datasets.

  1. Analysis with Reuter dataset

Figure 6 portrays the comparative analysis on the Reuter dataset. Figure 6(a) depicts the accuracy analysis. When the training data is 60%, the accuracy of the existing ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, and SVM is 0.701, 0.732, 0.745, 0.754, 0.764, 0.801, 0.788, 0.836, and 0.847, whereas the IWTSO-based HAN has an accuracy of 0.854; the percentage improvement over ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, and SVM is 18%, 14%, 13%, 12%, 11%, 6%, 8%, 2%, and 0.8%, respectively. The TPR analysis is depicted in Figure 6(b). Considering 90% training data, the TPR determined using ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, SVM, and the IWTSO-based HAN is 0.781, 0.795, 0.8012, 0.8098, 0.814, 0.865, 0.837, 0.888, 0.898, and 0.903, which corresponds to a percentage improvement over ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, and SVM of 14%, 12%, 11%, 10%, 10%, 4%, 7.3%, 1.7%, and 0.6%.

Figure 6. Analysis with Reuter dataset, (a) accuracy, (b) TPR, (c) TNR, (d) FNR, (e) precision.

Figure 6(c) depicts the analysis of the TNR metric. When increasing the training data to 80%, the TNR measured by ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, SVM, and the IWTSO-based HAN is 0.764, 0.784, 0.799, 0.804, 0.813, 0.831, 0.887, 0.916, 0.926, and 0.897, which corresponds to a performance improvement over ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, and SVM of 15%, 13%, 11%, 10%, 9%, 7%, 4.8%, 1.7%, and 0.6%. The FNR analysis is portrayed in Figure 6(d). At 60% training data, the FNR observed for the traditional ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, and SVM is 0.2998, 0.2679, 0.2531, 0.2419, 0.2258, 0.2005, 0.215, 0.198, and 0.188, whereas the IWTSO-based HAN achieved a lower FNR of 0.1587. Figure 6(e) portrays the precision analysis. With 70% training data, the precision determined by ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, and SVM is 0.735, 0.760, 0.798, 0.813, 0.866, 0.890, 0.875, 0.908, and 0.915, whereas the IWTSO-based HAN has a precision of 0.935, which corresponds to a performance improvement over ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, and SVM of 21%, 19%, 15%, 13%, 7%, 5%, 6.4%, 2.9%, and 2.1%.

  2. Analysis with 20-Newsgroup dataset

Figure 7 portrays the comparative analysis of the developed method on the 20-Newsgroup dataset. Figure 7(a) illustrates the accuracy analysis. At 60% training data, the accuracy measured by ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, and SVM is 0.712, 0.745, 0.7541, 0.7654, 0.780, 0.813, 0.798, 0.836, and 0.847, whereas the IWTSO-based HAN has an accuracy of 0.865. The TPR analysis is depicted in Figure 7(b). Considering 90% training data, the TPR using ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, SVM, and the IWTSO-based HAN is 0.821, 0.851, 0.860, 0.864, 0.875, 0.884, 0.879, 0.908, 0.916, and 0.924.

Figure 7. Analysis with 20-Newsgroup dataset, (a) accuracy, (b) TPR, (c) TNR, (d) FNR, (e) precision.

Figure 7(c) depicts the TNR analysis. For 60% training data, the TNR measured by the existing ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, and SVM is 0.732, 0.763, 0.775, 0.785, 0.799, 0.824, 0.809, 0.847, and 0.858, whereas the proposed IWTSO-based HAN achieved a TNR of 0.887, a performance improvement over ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, and SVM of 17%, 14%, 13%, 11%, 10%, 7%, 8.8%, 4.5%, and 3.3%. The FNR analysis is depicted in Figure 7(d). At 60% training data, the FNR observed for the traditional ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, and SVM is 0.2786, 0.2459, 0.2346, 0.2215, 0.2159, 0.1864, 0.198, 0.158, and 0.136, whereas the IWTSO-based HAN has a lower FNR of 0.1135. Figure 7(e) denotes the precision analysis. Considering 90% training data, the precision measured using ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, SVM, and the IWTSO-based HAN is 0.769, 0.785, 0.835, 0.898, 0.914, 0.935, 0.927, 0.945, 0.949, and 0.954.

  3. Analysis with real-time data

Figure 8 portrays the comparative analysis with the real-time data. Figure 8(a) depicts the accuracy analysis. When the training data is 60%, the accuracy measured by the conventional ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, and SVM is 0.722, 0.752, 0.765, 0.775, 0.784, 0.823, 0.798, 0.847, and 0.865, whereas the IWTSO-based HAN has an accuracy of 0.874, a percentage improvement over the existing ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, and SVM of 17%, 14%, 12%, 11%, 10%, 6%, 8.7%, 3.09%, and 1%. The TPR analysis is shown in Figure 8(b). Considering 60% training data, the TPR computed by the existing ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, and SVM is 0.732, 0.763, 0.775, 0.781, 0.791, 0.832, 0.816, 0.858, and 0.865, whereas the IWTSO-based HAN achieved a TPR of 0.887, a performance improvement over the traditional ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, and SVM of 17%, 14%, 13%, 12%, 11%, 6%, 8%, 3.3%, and 2.5%.

Figure 8. Analysis with real-time data, (a) accuracy, (b) TPR, (c) TNR, (d) FNR, (e) precision.

Figure 8(c) portrays the TNR analysis. Considering 60% training data, the TNR measured by the existing ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, SVM, and the IWTSO-based HAN is 0.741, 0.774, 0.784, 0.799, 0.801, 0.849, 0.816, 0.866, 0.877, and 0.894. The FNR analysis is depicted in Figure 8(d). At 60% training data, the FNR observed for the traditional ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, and SVM is 0.268, 0.237, 0.225, 0.219, 0.209, 0.168, 0.188, 0.147, and 0.136, whereas the IWTSO-based HAN achieved a lower FNR of 0.113. Figure 8(e) depicts the precision analysis. For 90% training data, the precision measured using ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, SVM, and the IWTSO-based HAN is 0.7765, 0.7941, 0.8413, 0.8941, 0.9248, 0.9471, 0.936, 0.953, 0.958, and 0.9641.

4.7 Comparative discussion

Table 4 portrays the comparative discussion of the IWTSO-based HAN. Considering the Reuter dataset, the accuracy measured by the IWTSO-based HAN is 0.913, whereas the TPR and TNR achieved by the IWTSO-based HAN are 90.3% and 93.2%. With the 20-Newsgroup dataset, the accuracy achieved by ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, SVM, and the IWTSO-based HAN is 81.3%, 84.1%, 85.6%, 86.5%, 87.1%, 91.3%, 88.7%, 91.7%, 91.9%, and 92.4%. The proposed method achieved a better accuracy, TPR, and TNR of 92.4%, 92.4%, and 94.1% for the 20-Newsgroup dataset. With the real-time data, the accuracy evaluated by the IWTSO-based HAN is 93.4%, whereas the TPR and TNR values are 92.1% and 93.2%, respectively.

Table 4. Comparative discussion.

The reasons for the better performance of the IWTSO-based HAN are discussed as follows:

In the IWTSO-based HAN, the pre-processing is done by the stop word removal and stemming processes, in which the unstructured format is changed to a structured text representation. Also, the swarm behaviour and jet propulsion of TSA are integrated with the weed behaviour of the improved IWO, which increases the convergence rate of the optimisation and enables the generation of the global best solution by escaping local optima. Moreover, to enhance the classification performance, two different attention levels of the HAN are used, namely the word level and the sentence level. Thus, the performance of the IWTSO-based HAN is improved compared to the other existing methods.

5. Conclusion

An effective classifier named the IWTSO-based HAN is developed for performing the text classification process. The IWTSO-based HAN involves different phases to classify the text documents. At first, the input text data is pre-processed; then, feature extraction acquires features associated with the text data. The feature selection process is employed to select the optimal features of the data to enhance the classification performance. The HAN is employed to classify the text documents, and it is trained with the IWTSO algorithm. The IWTSO-based HAN obtained higher performance in terms of accuracy, TPR, TNR, and precision, and a lower FNR, of 0.924, 0.924, 0.941, 0.954, and 0.0758, respectively. Text classification is used in various fields, such as data mining, artificial intelligence, information retrieval, and NLP. The major applications of text classification are spam detection in emails, language detection, sentiment analysis, speech recognition, topic labelling, and intent detection. However, the efficiency of the feature selection method is not evaluated, which may affect the accuracy of the model. A future direction of this research is the consideration of larger publicly available datasets. Also, the performance of the implemented feature selection method will be evaluated against other filter-based feature selection approaches.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • 20 Newsgroup dataset. Retrieved November, 2020, from https://www.kaggle.com/datasets/crawford/20-newsgroups.
  • Abdollahzadeh, B., Gharehchopogh, F. S., Khodadadi, N., & Mirjalili, S. (2022). Mountain gazelle optimizer: A new nature-inspired metaheuristic algorithm for global optimization problems. Advances in Engineering Software, 174, 103282. doi:10.1016/j.advengsoft.2022.103282
  • Aldjanabi, W., Dahou, A., Al-qaness, M. A. A., Elaziz, M. A., Helmi, A. M., & Damaševičius, R. (2021). Arabic offensive and hate speech detection using a cross-corpora multi-task learning model. Informatics, 8(4), 69. doi:10.3390/informatics8040069
  • Anagnostopoulos, I., Anagnostopoulos, C., Loumos, V., & Kayafas, E. (2004). Classifying web pages employing a probabilistic neural network. IEEE Proceedings-Software, 151(3), 139–150. doi:10.1049/ip-sen:20040121
  • Azizi, M., Talatahari, S., & Gandomi, A. H. (2023). Fire Hawk optimizer: A novel metaheuristic algorithm. Artificial Intelligence Review, 56(1), 287–363. doi:10.1007/s10462-022-10173-w
  • Belazzoug, M., Touahria, M., Nouioua, F., & Brahimi, M. (2020). An improved sine cosine algorithm to select features for text categorization. Journal of King Saud University-Computer and Information Sciences, 32(4), 454–464. doi:10.1016/j.jksuci.2019.07.003
  • Borhani, M. (2020). Multi-label log-loss function using L-BFGS for document categorization. Engineering Applications of Artificial Intelligence, 91, 103623. doi:10.1016/j.engappai.2020.103623
  • Che, Y., & He, D. (2022). An enhanced seagull optimization algorithm for solving engineering optimization problems. Applied Intelligence, 52(11), 13043–13081. doi:10.1007/s10489-021-03155-y
  • Chen, R. C., & Hsieh, C. H. (2006). Web page classification based on a support vector machine using a weighted vote schema. Expert Systems with Applications, 31(2), 427–435. doi:10.1016/j.eswa.2005.09.079
  • Coban, O. (2022). A new modification and application of item response theory-based feature selection for different machine learning tasks. Concurrency and Computation: Practice and Experience, 34(26), doi:10.1002/cpe.7282
  • Drucker, H., Wu, D., & Vapnik, V. N. (1999). Support vector machines for spam categorization. IEEE Transactions on Neural Networks, 10(5), 1048–1054. doi:10.1109/72.788645
  • ElAmine Chennafi, M., Bedlaoui, H., Dahou, A., & Al-qaness, M. A. A. (2022). Arabic Aspect-Based Sentiment Classification Using Seq2Seq Dialect Normalization and Transformers. Knowledge, 2(3), 388–401. doi:10.3390/knowledge2030022
  • Fan, Y., Zhang, W., Bai, J., Lei, X., & Li, K. (2023). Privacy-preserving deep learning on big data in cloud. China Communications.
  • Feng, F., Li, K. C., Yang, E., Zhou, Q., Han, L., Hussain, A., & Cai, M. (2023). A novel oversampling and feature selection hybrid algorithm for imbalanced data classification. Multimedia Tools and Applications, 82(3), 3231–3267. doi:10.1007/s11042-022-13240-0
  • Gasmi, K. (2022). Improving bert-based model for medical text classification with an optimization algorithm. In Proceedings of the International Conference on Computational Collective Intelligence, 1653, 101–111.
  • Gharehchopogh, F. S., Maleki, I., & Dizaji, Z. A. (2021). Chaotic vortex search algorithm: metaheuristic algorithm for feature selection. Evolutionary Intelligence.
  • Goudjil, M., Koudil, M., Bedda, M., & Ghoggali, N. (2018). A novel active learning method using SVM for text classification. International Journal of Automation and Computing, 15(3), 290–298. doi:10.1007/s11633-015-0912-z
  • Günal, S., Ergin, S., Gülmezoğlu, M. B., & Gerek, ÖN. (2006). On feature extraction for spam e-mail detection. In International Workshop on Multimedia Content Representation, Classification and Security, Springer, Berlin, Heidelberg, 635–642.
  • Guzella, T. S., & Caminhas, W. M. (2009). A review of machine learning approaches to spam filtering. Expert Systems with Applications, 36(7), 10206–10222. doi:10.1016/j.eswa.2009.02.037
  • Han, D., Pan, N., & Li, K. C. (2022). A traceable and revocable ciphertext-policy attribute-based encryption scheme based on privacy protection. IEEE Transactions on Dependable and Secure Computing, 19(1), 316–327. doi:10.1109/TDSC.2020.2977646
  • Jin, L., Zhang, L., & Zhao, L. (2023). Feature selection based on absolute deviation factor for text classification. Information Processing and Management, 60(3).
  • Kaur, S., Awasthi, L. K., Sangal, A. L., & Dhiman, G. (2020). Tunicate Swarm algorithm: A new bio-inspired based metaheuristic paradigm for global optimization. Engineering Applications of Artificial Intelligence, 90, 103541. doi:10.1016/j.engappai.2020.103541
  • Koller, D., & Sahami, M. (1997). Hierarchically classifying documents using very few words. StanfordInfoLab.
  • Kou, G., Yang, P., Peng, Y., Xiao, F., Chen, Y., & Alsaadi, F. E. (2020). Evaluation of feature selection methods for text classification with small datasets using multiple criteria decision-making methods. Applied Soft Computing, 86.
  • Lim, H., & Kim, D. W. (2020). Generalized term similarity for feature selection in text classification using quadratic programming. Entropy, 22(4), 395. doi:10.3390/e22040395
  • Liu, G., & Guo, J. (2019). Bidirectional LSTM with attention mechanism and convolutional layer for text classification. Neurocomputing, 337, 325–338. doi:10.1016/j.neucom.2019.01.078
  • Liu, Y., Ju, S., Wang, J., & Su, C. (2020). A new feature selection method for text classification based on independent feature space search. Mathematical Problems in Engineering.
  • Liu, Y., Sun, C. J., Lin, L., Wang, X., & Zhao, Y. (2015). Computing semantic text similarity using rich features. In Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation, 44–52.
  • Maragheh, H. K., Gharehchopogh, F. S., Majidzadeh, K., & Sangar, A. B. (2022). A new hybrid based on long short-term memory network with spotted hyena optimization algorithm for multi-label text classification. Mathematics, 10, 1–24.
  • Minaee, S., Kalchbrenner, N., Cambria, E., Nikzad, N., Chenaghlu, M., & Gao, J. (2020). Deep learning based text classification: A comprehensive review. ACM Computing, 1(1), 1–43.
  • Misaghi, M., & Yaghoobi, M. (2019). Improved invasive weed optimization algorithm (IWO) based on chaos theory for optimal design of PID controller. Journal of Computational Design and Engineering, 6(3), 284–295. doi:10.1016/j.jcde.2019.01.001
  • Mohammadzadeh, H., & Gharehchopogh, F. S. (2021). Feature selection with binary symbiotic organisms search algorithm for Email spam detection. International Journal of Information Technology and Decision Making, 20(1), 469–515. doi:10.1142/S0219622020500546
  • Naruei, I., & Keynia, F. (2022). Wild horse optimizer: A new meta-heuristic algorithm for solving engineering optimization problems. Engineering with Computers, 38(S4), 3025–3056. doi:10.1007/s00366-021-01438-z
  • Parlak, B., & Uysal, A. K. (2023). A novel filter feature selection method for text classification: Extensive feature selector. Journal of Information Science, 49(1), 59–78. doi:10.1177/0165551521991037
  • Pedersen, B. P., Ifrim, G., Liboriussen, P., Axelsen, K. B., Palmgren, M. G., Nissen, P., Wiuf, C., & Pedersen, C. N. S. (2014). Large scale identification and categorization of protein sequences using structured logistic regression. PLOS ONE, 9(1), e85139. doi:10.1371/journal.pone.0085139
  • Ranjan, N. M., & Prasad, R. S. (2018). LFNN: Lion fuzzy neural network-based evolutionary model for text classification using context and sense based features. Applied Soft Computing, 71, 994–1008. doi:10.1016/j.asoc.2018.07.016
  • Reuters-21578 Text Categorization Collection Data Set. Retrieved November, 2020, from https://archive.ics.uci.edu/ml/datasets/reuters-21578+text+categorization+collection.
  • Roy, P. K., Tripathy, A. K., Weng, T. H., & Li, K. C. (2023). Securing social platform from misinformation using deep learning. Computer Standards & Interfaces, 84, 103674. doi:10.1016/j.csi.2022.103674
  • Saeed, M. M., & Al Aghbari, Z. (2022). ARTC: Feature selection using association rules for text classification. Neural Computing and Applications, 34(24), 22519–22529. doi:10.1007/s00521-022-07669-5
  • Şahin, DÖ, & Kılıç, E. (2019). Two new feature selection metrics for text classification. Automatika, 60(2), 162–171. doi:10.1080/00051144.2019.1602293
  • Stein, R. A., Jaques, P. A., & Valiati, J. F. (2019). An analysis of hierarchical text classification using word embeddings. Information Sciences, 471, 216–232. doi:10.1016/j.ins.2018.09.001
  • Tanhaeean, M., Moghaddam, R. T., & Akbari, A. H. (2022). Boxing Match algorithm: a new meta-heuristic algorithm. Soft Computing, 26(24), 13277–13299. doi:10.1007/s00500-022-07518-6
  • Thirumoorthy, K., & Muneeswaran, K. (2020). Optimal feature subset selection using hybrid binary Jaya optimization algorithm for text classification. Sadhana, 45(1), 1–13. doi:10.1007/s12046-020-01443-w
  • Vidyadhari, C. H., Sandhya, N., & Premchand, P. (2019). A semantic word processing using enhanced cat swarm optimization algorithm for automatic text clustering. Multimedia Research, 2(4), 23–32.
  • Wang, H., & Hong, M. (2019). Supervised Hebb rule based feature selection for text classification. Information Processing & Management, 56(1), 167–191. doi:10.1016/j.ipm.2018.09.004
  • Wang, L., Cao, Q., Zhang, Z., Mirjalili, S., & Zhao, W. (2022). Artificial rabbits optimization: A new bio-inspired meta-heuristic algorithm for solving engineering optimization problems. Engineering Applications of Artificial Intelligence, 114.
  • Wei, J., & Zou, K. (2019). Eda: Easy data augmentation techniques for boosting performance on text classification tasks. In the Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, Association for Computational Linguistics, 6382–6388.
  • Wu, D., Yang, R., & Shen, C. (2020). Sentiment word co-occurrence and knowledge pair feature extraction based LDA short text clustering algorithm. Journal of Intelligent Information Systems, 1–23.
  • Xue, J., & Shen, B. (2023). Dung beetle optimizer: A new meta-heuristic algorithm for global optimization. The Journal of Supercomputing, 79(7), 7305–7336. doi:10.1007/s11227-022-04959-6
  • Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., & Hovy, E. (2016). Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 1480–1489.
  • Yu, B., & Zhu, D. H. (2009). Combining neural networks and semantic feature space for email classification. Knowledge-Based Systems, 22(5), 376–381. doi:10.1016/j.knosys.2009.02.009
  • Zhou, H., Li, X., Wang, C., & Ma, Y. (2022). A feature selection method based on term frequency difference and positive weighting factor. Data and Knowledge Engineering, 141, 102060. doi:10.1016/j.datak.2022.102060