Research Article

Optimal feature selection and invasive weed tunicate swarm algorithm-based hierarchical attention network for text classification

Article: 2231171 | Received 24 Jan 2023, Accepted 24 Jun 2023, Published online: 10 Jul 2023

Abstract

Through social media platforms and the internet, the world is becoming increasingly connected and is producing enormous amounts of data. Texts are collected from social media, newspapers, user reviews of products, company press releases, and other sources. The correctness of classification depends mainly on the kind of words used in the corpus and the features used for classification. With the rapid growth of text data on the Internet, the accurate organisation and management of text data has become a great challenge. Hence, in this research, an effective Invasive Weed Tunicate Swarm Optimization-based Hierarchical Attention Network (IWTSO-based HAN) is implemented for text categorisation. Here, features are mined from the text, and the optimal features are then selected to perform the classification. The incorporation of the parametric features of each optimisation algorithm enables the proposed method to improve convergence towards global solutions and thereby improve categorisation effectiveness. The proposed method obtained better performance for text classification in terms of accuracy, True Positive Rate (TPR), True Negative Rate (TNR), precision, and False Negative Rate (FNR), with values of 92.4%, 92.4%, 94.1%, 95.4%, and 0.0758, respectively.

1. Introduction

Text data is collected from various sources, such as emails, social media, chats, insurance claims, tickets, web data, customer-service questions and answers, and user reviews. Text is an especially rich source of data, but acquiring insights from it is a time-consuming and complex task because of its unstructured nature (Minaee et al., Citation2020; Fan et al., Citation2023). In general, text documents are high-dimensional and contain noisy and irrelevant terms. The presence of noise and irrelevant terms increases computational complexity, introduces noise into the learning procedure, and hence results in poor classifier performance. Text documents must therefore be processed before classification to reduce their dimensionality and to eliminate noisy and irrelevant terms (Wang & Hong, Citation2019). Electronic text processing has become ubiquitous in recent decades, and the growing number of files has made it substantially harder to satisfy users' information requirements. One approach is to classify textual data automatically, so that users can easily retrieve and manipulate the data, extract information, recognise patterns, and thereby create knowledge. Organising electronic files into various categories has become a growing interest for organisations and individuals (Koller & Sahami, Citation1997; Stein et al., Citation2019). Text classification produces a solution for these issues and draws on a combination of knowledge fields, such as data mining, natural language processing (NLP), information retrieval, and machine learning. It is commonly treated as a supervised machine-learning problem in which a model is trained on labelled examples to classify unseen pieces of text (Vidyadhari et al., Citation2019; Stein et al., Citation2019; Kou et al., Citation2020). The internet has made large-scale data sharing possible, and the text files it carries hold huge commercial value. Mining the useful data hidden in text benefits greatly from text classification, which assigns text files of unknown class to a fixed number of classes using trained classifiers (Wang & Hong, Citation2019). Hence, the text classification mechanism is utilised to organise text data suitably. Text classification is a supervised learning model that allocates text documents to specified categories (Thirumoorthy & Muneeswaran, Citation2020). Text classification, also termed text categorisation, is the procedure of assigning tags or labels to textual units such as documents, paragraphs, queries, and sentences. It is achieved either by automatic labelling or by manual annotation. Deep learning (Roy et al., Citation2023) and machine learning methods are commonly used for effective text classification. One machine learning approach, Structured Logistic Regression (SLR), has been used for text classification and for large-scale classification problems in bioinformatics (Pedersen et al., Citation2014). With the increasing growth of text data in industrial applications, automatic text classification becomes ever more important (Minaee et al., Citation2020).
Accordingly, text classification appears in different applications, such as complaint feedback for specific services in social networks, classification of audio music genres and TV programmes for entertainment data selection, identification of specific topics, detection of disease outbreaks and accidents in the oil industry, analysis of essay answers, classification of materials into various reading levels for students in the education system, e-mail filtering (Drucker et al., Citation1999; Guzella & Caminhas, Citation2009; Mohammadzadeh & Gharehchopogh, Citation2021), classification of pediatric asthma in biomedicine (Wang & Hong, Citation2019; Pedersen et al., Citation2014), e-mail classification (Günal et al., Citation2006; Yu & Zhu, Citation2009), topic and author gender identification, sentiment classification (ElAmine Chennafi et al., Citation2022), hate speech detection (Aldjanabi et al., Citation2021), and web page classification (Chen & Hsieh, Citation2006; Anagnostopoulos et al., Citation2004; Thirumoorthy & Muneeswaran, Citation2020; Han et al., Citation2022).

The indivisible unit of text is termed a word, feature, or term. In the text classification domain, unstructured content documents are specified as feature vectors (Feng et al., Citation2023). A noisy feature does not contain any information about the category, and it is very difficult to draw conclusions from such a feature. For example, when a feature or term is present in every text document, that feature or term is not useful for classification (Thirumoorthy & Muneeswaran, Citation2020). Accordingly, feature selection is a significant module in the text classification process: it aims to find a subset of features that contains a smaller number of relevant features and is sufficient for maintaining or increasing classification effectiveness (Belazzoug et al., Citation2020). In general, the feature selection mechanism is employed to minimise the data dimension and the error rate of the classifier and to decrease computational time (Thirumoorthy & Muneeswaran, Citation2020; Parlak & Uysal, Citation2023; Şahin & Kılıç, Citation2019; Saeed & Al Aghbari, Citation2022). It selects a subset from the full set of features through various analyses (Liu et al., Citation2020; Zhou et al., Citation2022; Jin et al., Citation2023; Coban, Citation2022). Feature selection approaches are found in different domains, such as document clustering, text categorisation, computer vision, bioinformatics, image processing, and industrial applications (Belazzoug et al., Citation2020).

1.1. Problem statement

Text classification is the method of categorising a set of text files into various classes from a predefined set. It is an essential method for processing and managing huge numbers of files in digital form. Commonly, text classification is useful for extracting and summarising information and for text retrieval. Many text classification methods have been implemented in previous studies. The problems identified in those studies are discussed below:

  • The computation time was increased owing to the high dimensionality of the data.

  • The high-dimensional feature space contains noisy, redundant, and unrelated features, which affect the correctness of the model.

  • Some feature selection methods did not effectively select the important features, so the error rate of the classifier increased.

Hence, in this research, IWTSO-based HAN is developed for the efficient classification of text documents.

In this research, a text classification framework is designed with the HAN to classify text documents. Here, the input text is passed to the pre-processing phase, and then the text data is subjected to the feature extraction module. The acquired features are passed to the feature selection phase, in which the best features are selected by employing the IWTSO. Then, in the classification module, the HAN is used to classify the text data, with the HAN trained by the IWTSO algorithm. The IWTSO is formed by integrating the improved Invasive Weed Optimization (IWO) (Misaghi & Yaghoobi, Citation2019) and the Tunicate Swarm Algorithm (TSA) (Kaur et al., Citation2020).

Major contributions of the research:

  • IWTSO-based HAN: An effective text classification framework is designed with the HAN to classify text documents. The IWTSO-based HAN effectively achieves better results using the fitness determination.

  • The IWTSO algorithm is formed by integrating the IWO and TSA, and it is used for the effective training of the HAN classifier.

The remainder of the manuscript is organised as follows: Section 2 presents a discussion of various text classification approaches, and Section 3 explains the IWTSO-based HAN. Section 4 discusses the results of the IWTSO-based HAN, and the research is concluded in Section 5.

2. Literature survey

For the effective selection of features, Belazzoug et al. (Citation2020) established an improved sine cosine algorithm (ISCA). This algorithm identified new regions of the search space and generated the best solution, considering both the location of the better solutions and the location in the search space. It improved efficiency and removed premature convergence, and it solved feature selection problems, but it did not apply to some complicated problems. Lim and Kim (Citation2020) developed a quadratic programming-based model to compute the optimal balance between two dependencies for selecting the features. It considered a similarity measure for eliminating redundant terms, computed using mutual information and a ranking method, and it is not limited to semantically related terms. It increased the categorisation accuracy, but it required a large processing time. Thirumoorthy and Muneeswaran (Citation2020) developed a normalised difference measure-binary Jaya optimization algorithm (NDM-BJO) for acquiring the subset of best features. Here, the error rate was considered for measuring fitness. It reduced the dimension of the feature space and increased the classification accuracy. Wang and Hong (Citation2019) developed a Hebb rule-based feature selection model (HRFS). It depends on the neural synapse scheme and finds the discriminative terms with the Hebb rules. It reduced the time and programming complexity, but the optimisation procedure consumed more time to converge.

Goudjil et al. (Citation2018) developed a support vector machine (SVM) model to solve high-dimensional classification problems. It used an active learning approach for text classification and minimised the labelling effort by selecting the samples to be labelled, using the posterior probability to select the informative samples. It increased the classification accuracy, but it needed more labelled samples for training the classifier. Ranjan and Prasad (Citation2018) introduced a lion fuzzy neural network (LFNN) for accomplishing text categorisation. It considered a dynamic database to achieve the classification process. Here, the features were mined from the words to reduce the dimension of the search space, but the error rate was not reduced. Liu et al. (Citation2020) developed a relative document term frequency difference (RDTFD) method for selecting the features. This method partitions the features of all text files into small feature sets based on the capability of the features to discriminate the negative and the positive samples. It reduced the redundancy of features and improved the categorisation performance, but it did not increase the running speed. Borhani (Citation2020) developed an artificial neural network for text classification. It used a fast text classifier with an updating formula to tune the neural network, and discriminative features were used for text mining. It discovered knowledge and achieved better performance in mining information, but it resulted in high computational complexity. Gasmi (Citation2022) implemented a Bidirectional Encoder Representations from Transformers (BERT) model based on an optimal deep learning scheme. Here, the parameter selection of the deep learning model was done by Particle Swarm Optimization (PSO), and the prediction of the matching response was done by the k-Nearest Neighbours algorithm (KNN). The accuracy of this model was high, but it had complexity issues. Maragheh et al. (Citation2022) implemented the Spotted Hyena Optimizer (SHO)-Long Short-Term Memory (SHO-LSTM) approach for text classification. Here, the Skip-gram approach was used for word embedding, and the weights of the LSTM were optimised by the SHO to improve the correctness of the approach. It had better convergence capability, but the parameter optimisation was not very effective.

2.1 Review on meta-heuristics algorithms

Recent meta-heuristic optimisation algorithms are reviewed in terms of their advantages and disadvantages, and the details are tabulated in Table 1.

Table 1. Review on recent meta-heuristics algorithms.

2.2 Research gaps

  • The quadratic programming model designed by Lim and Kim (Citation2020) minimises the number of redundant terms chosen and increases the classification accuracy. However, this approach considers a large number of dependencies between the terms, which increases the processing time and is a major limitation.

  • In the classification process, eliminating noise such as redundant and irrelevant features from a large volume of text documents is a complex task. Hence, dimension-reduction schemes such as feature selection and feature extraction are required to solve such issues (Liu et al., Citation2020).

  • Machine learning and deep learning methods obtain higher accuracy in sentiment analysis and topic classification, but this performance often depends on the quality and quantity of training samples, which are often difficult to collect (Wei & Zou, Citation2019).

  • Much traditional text classification research concentrates only on a certain type of sentence or phrase. These techniques rely on target words or target sentences for solving text classification issues without considering the relationships between words (Liu & Guo, Citation2019).

  • Unlike documents composed of paragraphs, short texts are ambiguous because they do not contain sufficient context information, which is a major issue for classification.

3. Proposed IWTSO-based HAN

The IWTSO-based HAN is devised in this research for classifying text. Here, the best features are acquired by employing the IWTSO. Finally, text classification is carried out with the HAN, which is trained by the IWTSO formed by merging the improved IWO (Misaghi & Yaghoobi, Citation2019) with the TSA (Kaur et al., Citation2020). Figure 1 portrays the schematic view of the IWTSO-based HAN.

Figure 1. Schematic view of IWTSO-based HAN.

3.1 Acquisition of text data

Assume the dataset \(D\) with \(n\) text documents, represented as,
(1) \(D = \{D_1, D_2, \ldots, D_i, \ldots, D_n\}; \quad 1 \le i \le n\)
where \(D\) denotes the dataset, \(D_i\) indicates the text document at the \(i\)th dataset index, and \(n\) signifies the total number of text files. For the Reuter dataset \(n = 19043\), for the 20-Newsgroup dataset \(n = 19997\), and for the real-time data \(n = 5000\). The input text document \(D_i\) is used to execute the classification process.

3.2 Pre-processing

\(D_i\) is subjected to pre-processing, which is done by employing stop word removal and stemming. Pre-processing is an important step in information retrieval and text mining because the data is unstructured; it aims to remove noise from the text data and to clean the text before the classification task. In this phase, the dimension of the data can be reduced by handling missing values, removing duplicates, or minimising the overall number of features. Here, the data in unstructured format is changed to a structured text representation. It is the procedure of cleaning and preparing the text data for classification.

Stop word removal: Stop words are parts of natural language that do not contribute any meaning in a text processing system. The most common such words in text documents are articles, pronouns, and prepositions. Words that do not provide meaning to the documents are treated as stop words; they are not considered keywords in text applications, so they must be eliminated from the documents. This step removes the frequent and common words that have no important influence on the sentence. Examples of the stop words considered are but, also, to, have, and can.

Stemming: Stemming is the process of acquiring the root or base of a word by eliminating prefixes and suffixes, thereby reducing word variants to their root form. The Porter Stemmer library is used for stemming. The outcome of this phase is signified as \(A\) with size \([U \times V]\).
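A minimal sketch of this pre-processing step is given below, assuming the NLTK library (with its stopwords list and punkt tokeniser resources downloaded) supplies the stop words and the Porter Stemmer; the function name is illustrative.

```python
# Minimal pre-processing sketch: stop word removal and Porter stemming with NLTK
# (assumes the "stopwords" and "punkt" NLTK resources have been downloaded).
import re

from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

STOP_WORDS = set(stopwords.words("english"))  # standard English stop word list
stemmer = PorterStemmer()

def preprocess(document: str) -> list:
    """Lower-case, tokenise, remove stop words, and stem a raw text document."""
    tokens = word_tokenize(document.lower())
    tokens = [t for t in tokens if re.fullmatch(r"[a-z]+", t)]   # keep alphabetic tokens only
    tokens = [t for t in tokens if t not in STOP_WORDS]          # stop word removal
    return [stemmer.stem(t) for t in tokens]                     # Porter stemming

print(preprocess("The classifiers are also trained to have better accuracy."))
```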

3.3 Extraction of features from text data

The extracted features are explained below:

Wordnet-based feature: WordNet is a commonly used lexical resource for NLP tasks. It is a network of concepts in the form of word nodes, organised by the semantic associations among words according to their senses. A semantic relation is a pointer between synsets. WordNet is specifically utilised here to find synsets (Liu et al., Citation2015). The synsets identified from the text data are specified as \(f_1\).
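As an illustration, synsets can be looked up through NLTK's WordNet interface. The sketch below is an assumption about how such lookups might be turned into a numeric feature (counting distinct synsets), not the authors' exact definition of \(f_1\).

```python
# Illustrative sketch of a WordNet-based feature, assuming NLTK with the
# "wordnet" corpus downloaded.
from nltk.corpus import wordnet as wn

def wordnet_feature(tokens):
    """Count the distinct WordNet synsets reachable from the document's tokens."""
    synsets = set()
    for token in tokens:
        for syn in wn.synsets(token):   # all senses of the word
            synsets.add(syn.name())
    return len(synsets)

print(wordnet_feature(["bank", "river", "money"]))
```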

Co-occurrence-based feature: The use of term sets or item sets in documents is termed co-occurrence of terms, i.e. the frequent joint occurrence of terms in the text corpus. It is specified as,
(2) \(f_2 = \dfrac{R_{it}}{Z_t}\)
Here, \(R_{it}\) denotes the co-occurrence frequency of the words \(i\) and \(t\), and \(Z_t\) indicates the frequency of the word \(t\). The co-occurrence-based features acquired from the text data are specified as \(f_2\).
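A small sketch of Eq. (2) follows; counting a shared document as one co-occurrence is an assumption made for illustration, since the exact co-occurrence window is not specified here.

```python
# Sketch of Eq. (2): f2 = R_it / Z_t, where R_it is the number of documents in
# which words i and t appear together and Z_t is the total frequency of word t.
def cooccurrence_feature(docs, word_i, word_t):
    pair_count = 0    # R_it: documents containing both words
    term_count = 0    # Z_t: total occurrences of word_t
    for tokens in docs:
        term_count += tokens.count(word_t)
        if word_i in tokens and word_t in tokens:
            pair_count += 1
    return pair_count / term_count if term_count else 0.0

docs = [["text", "classif", "model"], ["text", "mine"], ["classif", "model"]]
print(cooccurrence_feature(docs, "text", "classif"))   # 1 / 2 = 0.5
```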

TF-IDF: It contains two parts, namely TF and IDF. TF measures the frequency of individual words within a text, and IDF weights a word by the number of texts in which it appears (Wu et al., Citation2020).
(3) \(TF = \dfrac{n(a)}{n}\)
where \(n(a)\) indicates the number of occurrences of entry \(a\) in the class, and \(n\) denotes the total number of entries.

The inverse document frequency (IDF) is specified as,
(4) \(IDF(a) = \log\dfrac{N+1}{N(a)+1} + 1\)
where \(N\) denotes the total number of texts in the corpus, and \(N(a)\) indicates the number of texts in the corpus that contain the word \(a\). The TF-IDF is represented as,
(5) \(f_3 = TF\text{-}IDF(a) = TF \times IDF(a)\)
The extracted feature vector is specified as \(f = \{f_1, f_2, f_3\}\).
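The TF-IDF feature of Eqs. (3)–(5) can be computed directly; the sketch below follows those equations, with illustrative function names.

```python
# Sketch of the TF-IDF feature (f3) following Eqs. (3)-(5).
import math

def tf(term, doc):
    """Eq. (3): occurrences of the term divided by the total number of entries."""
    return doc.count(term) / len(doc)

def idf(term, corpus):
    """Eq. (4): log((N + 1)/(N(a) + 1)) + 1 over the N texts of the corpus."""
    n_docs_with_term = sum(1 for d in corpus if term in d)   # N(a)
    return math.log((len(corpus) + 1) / (n_docs_with_term + 1)) + 1

def tf_idf(term, doc, corpus):
    """Eq. (5): f3 = TF x IDF."""
    return tf(term, doc) * idf(term, corpus)

corpus = [["text", "classif", "model"], ["text", "mine"], ["classif", "model"]]
print(tf_idf("classif", corpus[0], corpus))
```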

3.4 Feature selection by IWTSO algorithm

After the extraction of features from the text data, the unique and important features are selected using the IWTSO. Selecting the important features from the text data increases the categorisation correctness, as redundant and duplicate data can be removed. IWO is a population-based algorithm in which each individual is represented as a weed that grows unintentionally in the environment. It is very effective in converging to optimum solutions through essential features such as competition, growth, and seeding in the weed colony. The basic characteristics of reproduction, spatial dispersal, and competitive exclusion are employed to simulate the colonising behaviour of the weeds. TSA is a bio-inspired algorithm. The swarm behaviour and jet propulsion of TSA are integrated with the weed behaviour of the improved IWO, which increases the convergence rate of the optimisation and enables the generation of the global best solution by escaping local optima. The size of the selected feature set \(X\) is \([U \times V]\). The Reuter dataset contains 19043 text documents in total, so the size of the selected features is \([19043 \times 5]\); the 20-Newsgroup dataset has 19997 documents, with selected feature dimensions \([19997 \times 5]\); similarly, the real-time data has 5000 files, with feature dimensions \([5000 \times 5]\).

Solution encoding: This is the representation of the solution vector, in which the selected optimal feature subset is indicated as \(X\), where \(X < f\). Figure 2 portrays the solution encoding.

Figure 2. Solution encoding.

Fitness measure: It is the process of computing the optimal features among a set of features by the consideration of the accuracy measure. The equation for this is shown in Eq. (37).
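As a hedged illustration of this accuracy-based fitness, the sketch below scores a binary feature-selection mask by the cross-validated accuracy of a simple classifier; the scoring classifier (scikit-learn logistic regression with 3-fold cross-validation) and the function name are assumptions made only to keep the example self-contained.

```python
# Hedged sketch of an accuracy-based feature-selection fitness: a candidate
# solution is a binary mask over the extracted feature columns.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def selection_fitness(mask, features, labels):
    """Accuracy of a classifier trained on the features chosen by the binary mask."""
    selected = features[:, np.asarray(mask).astype(bool)]
    if selected.shape[1] == 0:
        return 0.0                                   # empty subsets get the worst fitness
    scores = cross_val_score(LogisticRegression(max_iter=1000), selected, labels,
                             cv=3, scoring="accuracy")
    return float(scores.mean())
```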

Algorithmic procedure of IWTSO

  1. Initialisation: Let us define \(H\) as the population of weeds in the solution space, and \(H_{best}\) as the best position of the weeds.

  2. Fitness computation: The fitness is employed to discover the best solution in the feature selection process, and it is determined using the difference between the actual and the target values.

  3. Update solution: The location update is specified as,
(6) \(H_l^{s+1} = \eta(s)H_l^s + H_{best} - H_l^s\)
(7) \(H_l^{s+1} = H_l^s(\eta(s) - 1) + H_{best}\)

The standard position update equation of TSA is,
(8) \(H_l^{s+1} = J + \vec{B} \cdot G\)
(9) \(H_l^{s+1} = J + \vec{B}\,|J - rand \cdot H_l^s|\)
Let us assume \(J > H_l^s\), then
(10) \(H_l^{s+1} = J + \vec{B}(J - rand \cdot H_l^s)\)
(11) \(H_l^{s+1} = J(1 + \vec{B}) - \vec{B}\,rand\,H_l^s\)
(12) \(J = \dfrac{H_l^{s+1} + \vec{B}\,rand\,H_l^s}{1 + \vec{B}}\)

As \(J\) is the best search agent of TSA, it is substituted for \(H_{best}\) of the improved IWO:
(13) \(H_l^{s+1} = H_l^s(\eta(s)-1) + \dfrac{H_l^{s+1} + \vec{B}\,rand\,H_l^s}{1+\vec{B}}\)
(14) \(H_l^{s+1} = H_l^s(\eta(s)-1) + \dfrac{H_l^{s+1}}{1+\vec{B}} + \dfrac{\vec{B}\,rand\,H_l^s}{1+\vec{B}}\)
(15) \(H_l^{s+1} - \dfrac{H_l^{s+1}}{1+\vec{B}} = H_l^s(\eta(s)-1) + \dfrac{\vec{B}\,rand\,H_l^s}{1+\vec{B}}\)
(16) \(H_l^{s+1}\left(1 - \dfrac{1}{1+\vec{B}}\right) = H_l^s(\eta(s)-1) + \dfrac{\vec{B}\,rand\,H_l^s}{1+\vec{B}}\)
(17) \(H_l^{s+1}\left(\dfrac{1+\vec{B}-1}{1+\vec{B}}\right) = H_l^s(\eta(s)-1) + \dfrac{\vec{B}\,rand\,H_l^s}{1+\vec{B}}\)
(18) \(H_l^{s+1} = \dfrac{1+\vec{B}}{\vec{B}}\left[H_l^s(\eta(s)-1) + \dfrac{\vec{B}\,rand\,H_l^s}{1+\vec{B}}\right]\)
(19) \(H_l^{s+1} = \dfrac{1+\vec{B}}{\vec{B}}H_l^s(\eta(s)-1) + rand\,H_l^s\)
where,
(20) \(\eta(s) = \left(\dfrac{S-s}{S}\right)^m(\eta_{initial} - \eta_{final}) + \eta_{final}\,x(s)\)
(21) \(\vec{B} = \dfrac{\vec{C}}{\vec{I}}\)
(22) \(\vec{C} = b_2 + b_3 - \vec{K}\)
(23) \(\vec{K} = 2 \cdot b_1\)
(24) \(\vec{I} = \left\lfloor Q_{min} + b_1 \cdot (Q_{max} - Q_{min}) \right\rfloor\)
Here, \(\vec{B}\) denotes the vector, \(\vec{C}\) implies the gravity force, \(\vec{I}\) signifies the social force between the search agents, \(\vec{K}\) indicates the water flow advection, \(x(s)\) represents the chaotic mapping, and \(rand\), \(b_1\), \(b_2\), and \(b_3\) are random numbers in the interval [0,1]. Here, \(Q_{min}\) is set to 1 and \(Q_{max}\) is set to 4. An illustrative code sketch of this update step is given after Algorithm 1 below.

  4. Feasibility evaluation: The fitness is computed for every result, and the result with the best fitness value is declared as the best result.

  5. Termination: The aforementioned steps are repeated until the best result is attained. Algorithm 1 represents the pseudo-code of the IWTSO.

Algorithm 1. Pseudo code of IWTSO
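The position update of Eqs. (19)–(24) can be illustrated with a short NumPy sketch. This is a minimal reading of the procedure described above, not the authors' Algorithm 1 verbatim; the schedule parameters (eta_initial, eta_final, m) and the value supplied for the chaotic map x(s) are assumptions.

```python
# Minimal NumPy sketch of the IWTSO position update (Eqs. (19)-(24));
# parameter values and the chaotic-map input x_s are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
Q_MIN, Q_MAX = 1, 4                          # as stated in Section 3.4
ETA_INIT, ETA_FINAL, M_EXP = 1.0, 0.01, 3    # assumed schedule parameters

def eta(s, S, x_s):
    """Eq. (20): ((S - s)/S)^m * (eta_init - eta_final) + eta_final * x(s)."""
    return ((S - s) / S) ** M_EXP * (ETA_INIT - ETA_FINAL) + ETA_FINAL * x_s

def vector_B():
    """Eqs. (21)-(24): B = C/I, C = b2 + b3 - K, K = 2*b1, I = floor(Qmin + b1*(Qmax - Qmin))."""
    b1, b2, b3 = rng.random(3)
    K = 2.0 * b1
    C = b2 + b3 - K
    I = np.floor(Q_MIN + b1 * (Q_MAX - Q_MIN))
    return C / I                              # a guard against B == 0 is advisable in practice

def iwtso_update(H, s, S, x_s):
    """Eq. (19): H^(s+1) = ((1 + B)/B) * H^s * (eta(s) - 1) + rand * H^s."""
    B = vector_B()
    return ((1 + B) / B) * H * (eta(s, S, x_s) - 1) + rng.random() * H

positions = rng.random((5, 10))               # 5 candidate solutions in a 10-dimensional space
positions = iwtso_update(positions, s=1, S=100, x_s=0.7)
print(positions.shape)
```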

3.5 Text classification using proposed IWTSO-based HAN

After selecting the features, text categorisation is achieved by employing the HAN. Text classification is a fundamental task in NLP, and its key goal is to allocate labels to text. The advantage of using the HAN for text classification is that it captures two basic insights about document structure. First, as a document has a hierarchical structure, the document representation is constructed by modelling the sentence representations and then aggregating them into a document representation. Second, different words and sentences in a document are differently informative, and the importance of a sentence is generally context-dependent; the same sentence or word can carry different importance in different contexts. To enhance the categorisation performance, the HAN includes two different levels of attention, namely word level and sentence level.

  1. Structure of HAN: The structure of the HAN is composed of different parts (Yang et al., Citation2016). The input taken by the classifier has the dimension \([40 \times 50]\), and the result generated by the bidirectional layer has the size \([40 \times 100]\). The attention layer then processes the data of dimension \([40 \times 100]\) and forms result data of size \([1 \times 100]\).

Word encoder: Here, the input feature \(X\) is embedded into vectors using the embedding matrix \(E\). The bidirectional gated recurrent unit (GRU) consists of a forward GRU, which reads the sequence from the first word to the last, and a backward GRU, which reads it from the last word to the first.
(25) \(L = E \cdot X; \quad X \in [1, \ldots, M]\)
(26) \(\overrightarrow{g} = \overrightarrow{GRU}(L), \quad X \in [1, M]\)
(27) \(\overleftarrow{g} = \overleftarrow{GRU}(L), \quad X \in [M, 1]\)

Here, an annotation is obtained for the given feature \(X\) by concatenating the forward and backward hidden states, such that \(g = [\overrightarrow{g}, \overleftarrow{g}]\).

Word attention: Not all words contribute equally to the representation of the sentence meaning. The attention mechanism extracts the significant words in the sentence, and the representations of the informative words are merged to form the sentence vector.
(28) \(y_X = \tanh(E \cdot g + q)\)
(29) \(\beta = \dfrac{\exp(y_X^T y_v)}{\sum_L \exp(y_X^T y_v)}\)
(30) \(V = \sum \beta \cdot g\)

At first, the word annotation \(g\) is passed through a one-layer MLP to obtain \(y_X\) as the hidden representation of \(g\). The importance of the word is then measured as the similarity of \(y_X\) with the word-level context vector \(y_v\), and the normalised weight \(\beta\) is obtained using the Softmax function. Finally, the sentence vector \(V\) is determined as the weighted sum of the word annotations using these weights. Figure 3 portrays the structure of the HAN.
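A minimal Keras sketch of this word-level encoder and attention (Eqs. (25)–(30)) is given below, assuming TensorFlow 2.x. The 40-word input, 50-dimensional embeddings, and 100-dimensional bidirectional annotations follow the sizes stated for the HAN structure; the vocabulary size and layer/variable names are assumptions.

```python
# Sketch of the word-level encoder and attention of the HAN (Eqs. (25)-(30)).
import tensorflow as tf
from tensorflow.keras import layers

MAX_WORDS, EMBED_DIM, GRU_UNITS, VOCAB = 40, 50, 50, 20000   # VOCAB is an assumption

class SoftAttention(layers.Layer):
    """y = tanh(W.g + b); beta = softmax(y . context); output = sum(beta * g)."""
    def build(self, input_shape):
        dim = int(input_shape[-1])
        self.W = self.add_weight(name="W", shape=(dim, dim), initializer="glorot_uniform")
        self.b = self.add_weight(name="b", shape=(dim,), initializer="zeros")
        self.context = self.add_weight(name="context", shape=(dim,), initializer="glorot_uniform")

    def call(self, g):                                                      # g: (batch, steps, dim)
        y = tf.tanh(tf.tensordot(g, self.W, axes=1) + self.b)               # Eq. (28)/(33)
        beta = tf.nn.softmax(tf.tensordot(y, self.context, axes=1), axis=-1)  # Eq. (29)/(34)
        return tf.reduce_sum(g * tf.expand_dims(beta, -1), axis=1)          # Eq. (30)/(35)

word_input = layers.Input(shape=(MAX_WORDS,), dtype="int32")
embedded = layers.Embedding(VOCAB, EMBED_DIM)(word_input)                                   # Eq. (25)
annotations = layers.Bidirectional(layers.GRU(GRU_UNITS, return_sequences=True))(embedded)  # Eqs. (26)-(27)
sentence_vector = SoftAttention()(annotations)                                               # Eqs. (28)-(30)
word_encoder = tf.keras.Model(word_input, sentence_vector, name="word_encoder")
word_encoder.summary()
```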

Sentence encoder: \(V\) is the sentence vector, which is used to derive the document vector. The sentences are encoded with a bidirectional GRU.
(31) \(\overrightarrow{g_u} = \overrightarrow{GRU}(V)\)
(32) \(\overleftarrow{g_u} = \overleftarrow{GRU}(V)\)
The annotation of the sentence is obtained by concatenating \(\overrightarrow{g_u}\) and \(\overleftarrow{g_u}\), i.e. \(g_u = [\overrightarrow{g_u}, \overleftarrow{g_u}]\), which summarises the neighbouring sentences around sentence \(u\).

Figure 3. Structure of HAN.

Sentence attention: The significance of the sentences is determined using the sentence-level context vector \(W\).
(33) \(y_u = \tanh(E_z g_u + q_z)\)
(34) \(\beta_u = \dfrac{\exp(y_u^T W)}{\sum_u \exp(y_u^T W)}\)
(35) \(Y = \sum_u \beta_u \cdot g_u\)

Here, the document vector is denoted as \(Y\), which summarises the details of the text. The output has the dimension \([1 \times 22]\).
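Continuing the word-level sketch above (reusing word_encoder and SoftAttention), the sentence encoder and sentence attention of Eqs. (31)–(35) can be stacked on top to obtain the document classifier. The number of sentences per document is an assumption; the 22-class output follows the stated output size [1×22].

```python
# Sentence-level encoder, sentence attention, and output layer (Eqs. (31)-(35)).
MAX_SENTENCES, NUM_CLASSES = 15, 22           # MAX_SENTENCES is an assumption

doc_input = layers.Input(shape=(MAX_SENTENCES, MAX_WORDS), dtype="int32")
sentence_vectors = layers.TimeDistributed(word_encoder)(doc_input)             # V for each sentence
sentence_annotations = layers.Bidirectional(
    layers.GRU(GRU_UNITS, return_sequences=True))(sentence_vectors)            # Eqs. (31)-(32)
document_vector = SoftAttention(name="sentence_attention")(sentence_annotations)  # Eqs. (33)-(35)
output = layers.Dense(NUM_CLASSES, activation="softmax")(document_vector)
han = tf.keras.Model(doc_input, output, name="HAN")
han.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```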

  2. Training process of HAN: The HAN is trained with the IWTSO.

Solution encoding: It is useful in identifying the accurate optimal solution. Here, the solution is \(H = [1 \times \kappa]\), where \(\kappa\) indicates the number of weights. In every iteration, the weight factor is updated.

Fitness function: The error between the target and the actual output is computed to determine the fitness measure, which is defined as,
(36) \(F = \dfrac{1}{k}\sum_{\lambda=1}^{k}\left[O_\lambda - Y_\lambda\right]^2\)
where \(O_\lambda\) indicates the target output and \(Y_\lambda\) represents the classified outcome. The other steps of the IWTSO algorithm are discussed in Section 3.4.
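A minimal sketch of the fitness in Eq. (36), computed as the mean squared error between the target outputs and the classified outputs:

```python
# Sketch of Eq. (36): F = (1/k) * sum_lambda (O_lambda - Y_lambda)^2.
import numpy as np

def fitness_mse(targets, outputs):
    """Mean squared error between target outputs O and classified outputs Y."""
    return float(np.mean((np.asarray(targets) - np.asarray(outputs)) ** 2))

print(fitness_mse([1.0, 0.0, 1.0], [0.9, 0.2, 0.8]))   # approximately 0.03
```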

4. Results and discussion

The results and analysis of the IWTSO-based HAN regarding the performance measures are explained in this section.

4.1 Experimental setup

The IWTSO-based HAN is implemented in Python with the TensorFlow and Keras libraries, on a Windows 10 OS with an Intel processor and 2 GB RAM. Table 2 shows the experimental parameters.

Table 2. Experimental parameters.

4.2 Dataset description

The datasets used for the evaluation are the Reuter dataset (https://archive.ics.uci.edu/ml/datasets/reuters-21578+text+categorization+collection), the 20-Newsgroup dataset (https://www.kaggle.com/crawford/20-newsgroups), and real-time data. The dataset details are provided in Table 3.

Table 3. Dataset details.

4.3 Evaluation metrics

The evaluation is done using the accuracy, TPR, TNR, precision, and FNR metrics.

4.4 Performance analysis

This section explains the performance analysis made by the IWTSO-based HAN.

  1. Analysis with Reuter dataset

Figure 4 portrays the analysis of the IWTSO-based HAN with the Reuter dataset. Figure 4(a) shows the accuracy analysis. Considering 80% training data, the accuracy computed by the IWTSO-based HAN with feature size 100 is 0.813, with 200 is 0.854, with 300 is 0.897, with 400 is 0.935, and with 500 is 0.954. The performance analysis for the TPR is portrayed in Figure 4(b). For 90% training data, the TPR measured by the IWTSO-based HAN with feature sizes 100, 200, 300, 400, and 500 is 0.846, 0.874, 0.924, 0.931, and 0.954, respectively. Figure 4(c) depicts the performance analysis for the TNR measure. At 80% training data, the TNR achieved by the IWTSO-based HAN with feature size 100 is 0.799, with 200 is 0.815, with 300 is 0.894, with 400 is 0.914, and with 500 is 0.924.

Figure 4. Performance analysis with Reuter dataset, (a) accuracy, (b) TPR, (c) TNR, (d) FNR, (e) precision.

Figure 4(d) portrays the FNR analysis. When considering 80% training data, the developed method achieved an FNR with feature size 100 of 0.193, with 200 of 0.165, with 300 of 0.096, with 400 of 0.075, and with 500 of 0.059. The precision analysis is illustrated in Figure 4(e). At 80% training data, the precision computed by the IWTSO-based HAN with feature size 100 is 0.865, with 200 is 0.887, with 300 is 0.904, with 400 is 0.924, and with 500 is 0.948.

  2. Analysis with 20-Newsgroup dataset

Figure 5 depicts the performance analysis of the IWTSO-based HAN on the 20-Newsgroup dataset. Figure 5(a) depicts the accuracy analysis. At 60% training data, the accuracy measured by the developed method with feature size 100 is 0.760, with 200 is 0.784, with 300 is 0.824, with 400 is 0.874, and with 500 is 0.924. Figure 5(b) portrays the analysis with the TPR measure. At 60% training data, the TPR of the IWTSO-based HAN with feature size 100 is 0.741, with 200 is 0.774, with 300 is 0.814, with 400 is 0.864, and with 500 is 0.913. The TNR analysis is given in Figure 5(c). With 60% training data, the TNR measured by the IWTSO-based HAN with feature size 100 is 0.724, with 200 is 0.754, with 300 is 0.804, with 400 is 0.877, and with 500 is 0.905.

Figure 5. Performance analysis with 20-Newsgroup dataset, (a) accuracy, (b) TPR, (c) TNR, (d) FNR, (e) precision.

Figure 5(d) portrays the FNR analysis. At 70% training data, the FNR of the IWTSO-based HAN with feature size 100 is 0.226, with 200 is 0.215, with 300 is 0.175, with 400 is 0.126, and with 500 is 0.076. The precision analysis is portrayed in Figure 5(e). At 80% training data, the precision computed by the IWTSO-based HAN with feature size 100 is 0.874, with 200 is 0.897, with 300 is 0.914, with 400 is 0.924, and with 500 is 0.945.

4.5 Comparative methods

The performance improvement of the IWTSO-based HAN is analyzed by considering the conventional approaches, like Improved Sine Cosine Algorithm (ISCA) (Belazzoug et al., Citation2020), Lion Fuzzy neural network (LFNN) (Ranjan & Prasad, Citation2018), NDM-BJO (Thirumoorthy & Muneeswaran, Citation2020), Hebb rule-based feature selection (HRFS) (Wang & Hong, Citation2019), Improved IWO-HAN (Misaghi & Yaghoobi, Citation2019), TSA-HAN (Kaur et al., Citation2020), recurrent neural network (RNN), Long short-term memory (LSTM), and support vector machine (SVM).

4.6 Comparative analysis

This section explains the comparative analysis made by the IWTSO-based HAN with three kinds of datasets.

  1. Analysis with Reuter dataset

Figure 6 portrays the comparative analysis on the Reuter dataset. Figure 6(a) depicts the accuracy analysis. When the training data is 60%, the accuracy of the existing ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, and SVM is 0.701, 0.732, 0.745, 0.754, 0.764, 0.801, 0.788, 0.836, and 0.847, whereas the IWTSO-based HAN has an accuracy of 0.854; the percentage improvement over ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, and SVM is 18%, 14%, 13%, 12%, 11%, 6%, 8%, 2%, and 0.8%, respectively. The TPR analysis is depicted in Figure 6(b). Considering 90% training data, the TPR determined using ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, SVM, and the IWTSO-based HAN is 0.781, 0.795, 0.8012, 0.8098, 0.814, 0.865, 0.837, 0.888, 0.898, and 0.903, which corresponds to a percentage improvement over ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, and SVM of 14%, 12%, 11%, 10%, 10%, 4%, 7.3%, 1.7%, and 0.6%.

Figure 6. Analysis with Reuter dataset, (a) accuracy, (b) TPR, (c) TNR, (d) FNR, (e) precision.

Figure 6(c) depicts the analysis of the TNR metric. When increasing the training data to 80%, the TNR measured by ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, SVM, and the IWTSO-based HAN is 0.764, 0.784, 0.799, 0.804, 0.813, 0.831, 0.887, 0.916, 0.926, and 0.897, which corresponds to a performance improvement over ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, and SVM of 15%, 13%, 11%, 10%, 9%, 7%, 4.8%, 1.7%, and 0.6%. The FNR analysis is portrayed in Figure 6(d). At 60% training data, the FNR observed for the traditional ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, and SVM is 0.2998, 0.2679, 0.2531, 0.2419, 0.2258, 0.2005, 0.215, 0.198, and 0.188, whereas the IWTSO-based HAN achieved a lower FNR of 0.1587. Figure 6(e) portrays the precision analysis. With 70% training data, the precision determined by ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, and SVM is 0.735, 0.760, 0.798, 0.813, 0.866, 0.890, 0.875, 0.908, and 0.915, whereas the IWTSO-based HAN has a precision of 0.935, which corresponds to a performance improvement over ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, and SVM of 21%, 19%, 15%, 13%, 7%, 5%, 6.4%, 2.9%, and 2.1%.

  2. Analysis with 20-Newsgroup dataset

Figure 7 portrays the comparative analysis of the developed method on the 20-Newsgroup dataset. Figure 7(a) illustrates the accuracy analysis. At 60% training data, the accuracy measured by ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, and SVM is 0.712, 0.745, 0.7541, 0.7654, 0.780, 0.813, 0.798, 0.836, and 0.847, whereas the IWTSO-based HAN has an accuracy of 0.865. The TPR analysis is depicted in Figure 7(b). Considering 90% training data, the TPR using ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, SVM, and the IWTSO-based HAN is 0.821, 0.851, 0.860, 0.864, 0.875, 0.884, 0.879, 0.908, 0.916, and 0.924.

Figure 7. Analysis with 20-Newsgroup dataset, (a) accuracy, (b) TPR, (c) TNR, (d) FNR, (e) precision.

Figure 7(c) depicts the TNR analysis. For 60% training data, the TNR measured by the existing ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, and SVM is 0.732, 0.763, 0.775, 0.785, 0.799, 0.824, 0.809, 0.847, and 0.858, whereas the proposed IWTSO-based HAN achieved a TNR of 0.887, a performance improvement over ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, and SVM of 17%, 14%, 13%, 11%, 10%, 7%, 8.8%, 4.5%, and 3.3%. The FNR analysis is depicted in Figure 7(d). At 60% training data, the FNR observed for the traditional ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, and SVM is 0.2786, 0.2459, 0.2346, 0.2215, 0.2159, 0.1864, 0.198, 0.158, and 0.136, whereas the IWTSO-based HAN has a lower FNR of 0.1135. Figure 7(e) denotes the precision analysis. Considering 90% training data, the precision measured using ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, SVM, and the IWTSO-based HAN is 0.769, 0.785, 0.835, 0.898, 0.914, 0.935, 0.927, 0.945, 0.949, and 0.954.

  3. Analysis with real-time data

Figure 8 portrays the comparative analysis with the real-time data. Figure 8(a) depicts the accuracy analysis. When the training data is 60%, the accuracy measured by the conventional ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, and SVM is 0.722, 0.752, 0.765, 0.775, 0.784, 0.823, 0.798, 0.847, and 0.865, whereas the IWTSO-based HAN has an accuracy of 0.874, a percentage improvement over the existing ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, and SVM of 17%, 14%, 12%, 11%, 10%, 6%, 8.7%, 3.09%, and 1%. The TPR analysis is shown in Figure 8(b). Considering 60% training data, the TPR computed by the existing ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, and SVM is 0.732, 0.763, 0.775, 0.781, 0.791, 0.832, 0.816, 0.858, and 0.865, whereas the IWTSO-based HAN achieved a TPR of 0.887, a performance improvement over the traditional ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, and SVM of 17%, 14%, 13%, 12%, 11%, 6%, 8%, 3.3%, and 2.5%.

Figure 8. Analysis with real-time data, (a) accuracy, (b) TPR, (c) TNR, (d) FNR, (e) precision.

Figure 8(c) portrays the TNR analysis. Considering 60% training data, the TNR measured by the existing ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, SVM, and the IWTSO-based HAN is 0.741, 0.774, 0.784, 0.799, 0.801, 0.849, 0.816, 0.866, 0.877, and 0.894. The FNR analysis is depicted in Figure 8(d). At 60% training data, the FNR observed for the traditional ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, and SVM is 0.268, 0.237, 0.225, 0.219, 0.209, 0.168, 0.188, 0.147, and 0.136, whereas the IWTSO-based HAN achieved a lower FNR of 0.113. Figure 8(e) depicts the precision analysis. For 90% training data, the precision measured using ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, SVM, and the IWTSO-based HAN is 0.7765, 0.7941, 0.8413, 0.8941, 0.9248, 0.9471, 0.936, 0.953, 0.958, and 0.9641.

4.7 Comparative discussion

Table 4 portrays the comparative discussion of the IWTSO-based HAN. Considering the Reuter dataset, the accuracy measured by the IWTSO-based HAN is 0.913, whereas the TPR and TNR achieved by the IWTSO-based HAN are 90.3% and 93.2%. With the 20-Newsgroup dataset, the accuracy achieved by ISCA, LFNN, NDM-BJO, HRFS, Improved IWO-HAN, TSA-HAN, LSTM, RNN, SVM, and the IWTSO-based HAN is 81.3%, 84.1%, 85.6%, 86.5%, 87.1%, 91.3%, 88.7%, 91.7%, 91.9%, and 92.4%. The proposed method achieved a better accuracy, TPR, and TNR of 92.4%, 92.4%, and 94.1% for the 20-Newsgroup dataset. With the real-time data, the accuracy evaluated by the IWTSO-based HAN is 93.4%, whereas the TPR and TNR values are 92.1% and 93.2%, respectively.

Table 4. Comparative discussion.

The reasons for the better performance of the IWTSO-based HAN are discussed as follows:

In the IWTSO-based HAN, the pre-processing is done by the stop word removal and stemming processes, in which the unstructured format is changed to a structured text representation. Also, the swarm behaviour and jet propulsion of TSA are integrated with the weed behaviour of the improved IWO, which increases the convergence rate of the optimisation and enables the generation of the global best solution by escaping local optima. Moreover, to enhance the classification performance, two different attention levels of the HAN are used, namely the word level and the sentence level. Thus, the performance of the IWTSO-based HAN is improved compared to the other existing methods.

5. Conclusion

An effective classifier named the IWTSO-based HAN is developed for performing the text classification process. The IWTSO-based HAN involves different phases to classify the text documents. At first, the input text data is pre-processed; then, feature extraction acquires features associated with the text data. The feature selection process is employed to select the optimal features of the data to enhance the classification performance. The HAN is employed to classify the text documents, and it is trained with the IWTSO algorithm. The IWTSO-based HAN obtained higher performance in terms of accuracy, TPR, TNR, and precision, and a lower FNR, of 0.924, 0.924, 0.941, 0.954, and 0.0758, respectively. Text classification is used in various fields, such as data mining, artificial intelligence, information retrieval, and NLP. The major applications of text classification are spam detection in emails, language detection, sentiment analysis, speech recognition, topic labelling, and intent detection. However, the efficiency of the feature selection method is not evaluated, which may affect the accuracy of the model. A future direction of this research is the consideration of larger publicly available datasets. Also, the performance of the implemented feature selection method will be evaluated against other filter-based feature selection approaches.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • 20 Newsgroup dataset. Retrieved November, 2020, from https://www.kaggle.com/datasets/crawford/20-newsgroups.
  • Abdollahzadeh, B., Gharehchopogh, F. S., Khodadadi, N., & Mirjalili, S. (2022). Mountain gazelle optimizer: A new nature-inspired metaheuristic algorithm for global optimization problems. Advances in Engineering Software, 174, 103282. doi:10.1016/j.advengsoft.2022.103282
  • Aldjanabi, W., Dahou, A., Al-qaness, M. A. A., Elaziz, M. A., Helmi, A. M., & Damaševičius, R. (2021). Arabic offensive and hate speech detection using a cross-corpora multi-task learning model. Informatics, 8(4), 69. doi:10.3390/informatics8040069
  • Anagnostopoulos, I., Anagnostopoulos, C., Loumos, V., & Kayafas, E. (2004). Classifying web pages employing a probabilistic neural network. IEEE Proceedings-Software, 151(3), 139–150. doi:10.1049/ip-sen:20040121
  • Azizi, M., Talatahari, S., & Gandomi, A. H. (2023). Fire Hawk optimizer: A novel metaheuristic algorithm. Artificial Intelligence Review, 56(1), 287–363. doi:10.1007/s10462-022-10173-w
  • Belazzoug, M., Touahria, M., Nouioua, F., & Brahimi, M. (2020). An improved sine cosine algorithm to select features for text categorization. Journal of King Saud University-Computer and Information Sciences, 32(4), 454–464. doi:10.1016/j.jksuci.2019.07.003
  • Borhani, M. (2020). Multi-label log-loss function using L-BFGS for document categorization. Engineering Applications of Artificial Intelligence, 91, 103623. doi:10.1016/j.engappai.2020.103623
  • Che, Y., & He, D. (2022). An enhanced seagull optimization algorithm for solving engineering optimization problems. Applied Intelligence, 52(11), 13043–13081. doi:10.1007/s10489-021-03155-y
  • Chen, R. C., & Hsieh, C. H. (2006). Web page classification based on a support vector machine using a weighted vote schema. Expert Systems with Applications, 31(2), 427–435. doi:10.1016/j.eswa.2005.09.079
  • Coban, O. (2022). A new modification and application of item response theory-based feature selection for different machine learning tasks. Concurrency and Computation: Practice and Experience, 34(26), doi:10.1002/cpe.7282
  • Drucker, H., Wu, D., & Vapnik, V. N. (1999). Support vector machines for spam categorization. IEEE Transactions on Neural Networks, 10(5), 1048–1054. doi:10.1109/72.788645
  • ElAmine Chennafi, M., Bedlaoui, H., Dahou, A., & Al-qaness, M. A. A. (2022). Arabic Aspect-Based Sentiment Classification Using Seq2Seq Dialect Normalization and Transformers. Knowledge, 2(3), 388–401. doi:10.3390/knowledge2030022
  • Fan, Y., Zhang, W., Bai, J., Lei, X., & Li, K. (2023). Privacy-preserving deep learning on big data in cloud. China Communications.
  • Feng, F., Li, K. C., Yang, E., Zhou, Q., Han, L., Hussain, A., & Cai, M. (2023). A novel oversampling and feature selection hybrid algorithm for imbalanced data classification. Multimedia Tools and Applications, 82(3), 3231–3267. doi:10.1007/s11042-022-13240-0
  • Gasmi, K. (2022). Improving bert-based model for medical text classification with an optimization algorithm. In Proceedings of the International Conference on Computational Collective Intelligence, 1653, 101–111.
  • Gharehchopogh, F. S., Maleki, I., & Dizaji, Z. A. (2021). Chaotic vortex search algorithm: metaheuristic algorithm for feature selection. Evolutionary Intelligence.
  • Goudjil, M., Koudil, M., Bedda, M., & Ghoggali, N. (2018). A novel active learning method using SVM for text classification. International Journal of Automation and Computing, 15(3), 290–298. doi:10.1007/s11633-015-0912-z
  • Günal, S., Ergin, S., Gülmezoğlu, M. B., & Gerek, ÖN. (2006). On feature extraction for spam e-mail detection. In International Workshop on Multimedia Content Representation, Classification and Security, Springer, Berlin, Heidelberg, 635–642.
  • Guzella, T. S., & Caminhas, W. M. (2009). A review of machine learning approaches to spam filtering. Expert Systems with Applications, 36(7), 10206–10222. doi:10.1016/j.eswa.2009.02.037
  • Han, D., Pan, N., & Li, K. C. (2022). A traceable and revocable ciphertext-policy attribute-based encryption scheme based on privacy protection. IEEE Transactions on Dependable and Secure Computing, 19(1), 316–327. doi:10.1109/TDSC.2020.2977646
  • Jin, L., Zhang, L., & Zhao, L. (2023). Feature selection based on absolute deviation factor for text classification. Information Processing and Management, 60(3).
  • Kaur, S., Awasthi, L. K., Sangal, A. L., & Dhiman, G. (2020). Tunicate Swarm algorithm: A new bio-inspired based metaheuristic paradigm for global optimization. Engineering Applications of Artificial Intelligence, 90, 103541. doi:10.1016/j.engappai.2020.103541
  • Koller, D., & Sahami, M. (1997). Hierarchically classifying documents using very few words. StanfordInfoLab.
  • Kou, G., Yang, P., Peng, Y., Xiao, F., Chen, Y., & Alsaadi, F. E. (2020). Evaluation of feature selection methods for text classification with small datasets using multiple criteria decision-making methods. Applied Soft Computing, 86.
  • Lim, H., & Kim, D. W. (2020). Generalized term similarity for feature selection in text classification using quadratic programming. Entropy, 22(4), 395. doi:10.3390/e22040395
  • Liu, G., & Guo, J. (2019). Bidirectional LSTM with attention mechanism and convolutional layer for text classification. Neurocomputing, 337, 325–338. doi:10.1016/j.neucom.2019.01.078
  • Liu, Y., Ju, S., Wang, J., & Su, C. (2020). A new feature selection method for text classification based on independent feature space search. Mathematical Problems in Engineering.
  • Liu, Y., Sun, C. J., Lin, L., Wang, X., & Zhao, Y. (2015). Computing semantic text similarity using rich features. In Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation, 44–52.
  • Maragheh, H. K., Gharehchopogh, F. S., Majidzadeh, K., & Sangar, A. B. (2022). A new hybrid based on long short-term memory network with spotted hyena optimization algorithm for multi-label text classification. Mathematics, 10, 1–24.
  • Minaee, S., Kalchbrenner, N., Cambria, E., Nikzad, N., Chenaghlu, M., & Gao, J. (2020). Deep learning based text classification: A comprehensive review. ACM Computing, 1(1), 1–43.
  • Misaghi, M., & Yaghoobi, M. (2019). Improved invasive weed optimization algorithm (IWO) based on chaos theory for optimal design of PID controller. Journal of Computational Design and Engineering, 6(3), 284–295. doi:10.1016/j.jcde.2019.01.001
  • Mohammadzadeh, H., & Gharehchopogh, F. S. (2021). Feature selection with binary symbiotic organisms search algorithm for Email spam detection. International Journal of Information Technology and Decision Making, 20(1), 469–515. doi:10.1142/S0219622020500546
  • Naruei, I., & Keynia, F. (2022). Wild horse optimizer: A new meta-heuristic algorithm for solving engineering optimization problems. Engineering with Computers, 38(S4), 3025–3056. doi:10.1007/s00366-021-01438-z
  • Parlak, B., & Uysal, A. K. (2023). A novel filter feature selection method for text classification: Extensive feature selector. Journal of Information Science, 49(1), 59–78. doi:10.1177/0165551521991037
  • Pedersen, B. P., Ifrim, G., Liboriussen, P., Axelsen, K. B., Palmgren, M. G., Nissen, P., Wiuf, C., & Pedersen, C. N. S. (2014). Large scale identification and categorization of protein sequences using structured logistic regression. PLOS ONE, 9(1), e85139. doi:10.1371/journal.pone.0085139
  • Ranjan, N. M., & Prasad, R. S. (2018). LFNN: Lion fuzzy neural network-based evolutionary model for text classification using context and sense based features. Applied Soft Computing, 71, 994–1008. doi:10.1016/j.asoc.2018.07.016
  • Reuters-21578 Text Categorization Collection Data Set. Retrieved November, 2020, from https://archive.ics.uci.edu/ml/datasets/reuters-21578+text+categorization+collection.
  • Roy, P. K., Tripathy, A. K., Weng, T. H., & Li, K. C. (2023). Securing social platform from misinformation using deep learning. Computer Standards & Interfaces, 84, 103674. doi:10.1016/j.csi.2022.103674
  • Saeed, M. M., & Al Aghbari, Z. (2022). ARTC: Feature selection using association rules for text classification. Neural Computing and Applications, 34(24), 22519–22529. doi:10.1007/s00521-022-07669-5
  • Şahin, DÖ, & Kılıç, E. (2019). Two new feature selection metrics for text classification. Automatika, 60(2), 162–171. doi:10.1080/00051144.2019.1602293
  • Stein, R. A., Jaques, P. A., & Valiati, J. F. (2019). An analysis of hierarchical text classification using word embeddings. Information Sciences, 471, 216–232. doi:10.1016/j.ins.2018.09.001
  • Tanhaeean, M., Moghaddam, R. T., & Akbari, A. H. (2022). Boxing Match algorithm: a new meta-heuristic algorithm. Soft Computing, 26(24), 13277–13299. doi:10.1007/s00500-022-07518-6
  • Thirumoorthy, K., & Muneeswaran, K. (2020). Optimal feature subset selection using hybrid binary Jaya optimization algorithm for text classification. Sadhana, 45(1), 1–13. doi:10.1007/s12046-020-01443-w
  • Vidyadhari, C. H., Sandhya, N., & Premchand, P. (2019). A semantic word processing using enhanced cat swarm optimization algorithm for automatic text clustering. Multimedia Research, 2(4), 23–32.
  • Wang, H., & Hong, M. (2019). Supervised Hebb rule based feature selection for text classification. Information Processing & Management, 56(1), 167–191. doi:10.1016/j.ipm.2018.09.004
  • Wang, L., Cao, Q., Zhang, Z., Mirjalili, S., & Zhao, W. (2022). Artificial rabbits optimization: A new bio-inspired meta-heuristic algorithm for solving engineering optimization problems. Engineering Applications of Artificial Intelligence, 114.
  • Wei, J., & Zou, K. (2019). Eda: Easy data augmentation techniques for boosting performance on text classification tasks. In the Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, Association for Computational Linguistics, 6382–6388.
  • Wu, D., Yang, R., & Shen, C. (2020). Sentiment word co-occurrence and knowledge pair feature extraction based LDA short text clustering algorithm. Journal of Intelligent Information Systems, 1–23.
  • Xue, J., & Shen, B. (2023). Dung beetle optimizer: A new meta-heuristic algorithm for global optimization. The Journal of Supercomputing, 79(7), 7305–7336. doi:10.1007/s11227-022-04959-6
  • Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., & Hovy, E. (2016). Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 1480–1489.
  • Yu, B., & Zhu, D. H. (2009). Combining neural networks and semantic feature space for email classification. Knowledge-Based Systems, 22(5), 376–381. doi:10.1016/j.knosys.2009.02.009
  • Zhou, H., Li, X., Wang, C., & Ma, Y. (2022). A feature selection method based on term frequency difference and positive weighting factor. Data and Knowledge Engineering, 141, 102060. doi:10.1016/j.datak.2022.102060