Research Article

PS-GCN: psycholinguistic graph and sentiment semantic fused graph convolutional networks for personality detection

Article: 2295820 | Received 11 Aug 2023, Accepted 12 Dec 2023, Published online: 30 Dec 2023

Abstract

Personality detection identifies personality traits from text. Current approaches often rely on deep learning networks for text representation, but they overlook the role of psycholinguistic knowledge in connecting a user's language expression to psychological characteristics, which compromises detection accuracy. To address this issue, this paper presents PS-GCN, a model that fuses Psychological knowledge and Sentiment semantic features through Graph Convolutional Networks. First, a Bi-LSTM network captures local features of the preprocessed sentences to represent sentence-level sentiment features accurately. Second, GCNs map psycholinguistic knowledge into semantic networks of entities and relationships: P-GCN captures the dependency information between psycholinguistic features, while S-GCN uses syntactic structure analysis to gather richer information features and enhance semantic understanding. Finally, attention is applied to reinforce key features and suppress irrelevant information, and a sentence-group model captures the combined features of related sentences, exploiting the text structure to mine sentiment features. Experimental results on multiple datasets demonstrate that the proposed method significantly improves classification accuracy on personality detection tasks.

1. Introduction

Personality characterises how an individual faces the outside world; it combines behaviour, sentiment, motivation, and thinking style. Personality traits are malleable and reflect individual preferences. In recent years, scholars have paid increasing attention to personality because differences in personality affect life choices, happiness, health, and many other decisions (Xu et al., Citation2021; Zhang et al., Citation2022). Personality detection has been shown to play an essential role in many practical applications such as personalised recommendation (Wu et al., Citation2015), economics, and psychology research (Yang et al., Citation2021). As an information-processing task, it is highly valued by researchers in computational psycholinguistics and natural language processing. Personality detection is also valuable in business environments: it can improve management and effectiveness in human resource management, marketing and advertising, recommendation systems, and precision healthcare.

Existing personality detection studies (Mehta et al., Citation2020) are mainly divided into traditional machine learning and deep learning methods. Personality is modelled along multiple classification dimensions; the best-known scheme is the Big Five model (Digman, Citation1990): Openness (OPN), Conscientiousness (CON), Extroversion (EXT), Agreeableness (AGR), and Neuroticism (NEU). Traditional machine learning methods mainly rely on dictionaries and naive Bayes for personality analysis (Al-Samarraie et al., Citation2017; Cui & Qi, Citation2017); Linguistic Inquiry and Word Count (LIWC) is a commonly used tool for extracting psycholinguistic features. However, with the explosive growth of data, personality traits have proven to be closely related to a person's data trajectory (posts, social media, etc.).

Recent studies are more inclined to use deep learning methods for text representation and to build detection models in a big-data-driven manner (Zhang et al., Citation2021). Despite significant improvements, some challenges remain. 1) Psycholinguistic knowledge is not fully utilised. Commonsense knowledge plays a vital role when machines understand natural language (Speer et al., Citation2016): it encodes relationships among entities, concepts, and attributes. Personality clues are challenging to extract accurately, but they can be found in the knowledge base. Psycholinguistic knowledge, as a form of commonsense knowledge, covers aspects such as word sentiment, emotional syntax, and psychological traits. It helps uncover the hidden information and emotional nuances in language expressions, allowing more precise inference of users' psychological features and personalities, yet it has not been fully exploited. 2) Tree structures can no longer meet the needs of modelling syntactic information in complex text. As shown in Figure 1, syntactic information has a crucial impact on sentiment polarity, because syntactic connections influence sentiment-polarity judgments, which in turn affect personality detection. Each non-leaf node in the tree may contain two or more leaf nodes, so the structure is more like a graph. Tree structures such as dependency trees suit text representable as binary trees and struggle to fully represent the syntactic structure of complex text. Unlike previous work, this paper maps commonsense psycholinguistic knowledge completely through a knowledge graph and, at the same time, expresses the syntactic information of the text as a graph structure.

Figure 1. An example of a Dependency Tree.


Given the above problems, and although both traditional and deep learning methods achieve reasonable recognition results, an effective personality detection method should consider two questions. 1) How to mine the latent personality clues in psycholinguistic knowledge, which plays a vital role in personality detection tasks. 2) How to model the semantic information of complex text: the sentiment polarity of the text strongly influences personality detection, and accurate recognition of syntactic information contributes positively to it.

Based on the above discussion, this paper proposes a new personality detection method built on graph convolution. In existing research, the knowledge graph formed by multiple nodes is often assumed to treat all nodes as equally important: only the average of the node vectors is computed, without tightly integrating knowledge with the computational structure. This oversimplification ignores the dependencies between knowledge nodes, which are crucial to representing the whole sentence. What is needed is a neural model that encodes the pre-selected knowledge so that the available knowledge can enrich the context representation. The graph convolutional network (GCN) is a good fit: it is a recent multi-layer neural network for processing graph-structured data and has been widely used in graph applications thanks to its strong performance and high interpretability. For each node (a word or a node in the knowledge graph), the primary function of a GCN is to transform neighbourhood information into a low-dimensional real-valued feature vector, so it is natural to address the two problems above with GCNs for personality detection. This paper proposes a graph convolutional personality detection method, PS-GCN. Specifically, the method uses the constructed P-GCN and S-GCN to model the text with psycholinguistic knowledge and sentiment semantic features respectively, enriching the information representation of sentences and improving the accuracy of personality detection.

The motivation of this paper is to model the text from two aspects: psycholinguistic knowledge and sentiment semantic features. Dependency relationships between words and richer features are obtained to improve the accuracy of personality detection. The model framework is shown in Figure 2. A Bi-LSTM bidirectional memory network with 50th-percentile pooling effectively captures local sentence features and accurately measures the sentiment-feature output of sentences for feature extraction and classification. Two GCNs learn psycholinguistic knowledge and sentiment semantic features respectively, and richer features are obtained by integrating the two: (1) P-GCN obtains dependency information between words; (2) S-GCN analyses the grammatical features of complex sentences and enhances the model's sentiment semantic understanding. The advantage of P-GCN is that it obtains word dependency information through the complete mapping of the psycholinguistic knowledge graph; S-GCN is more efficient at syntactic structure analysis, making the obtained features more representative and further strengthening sentiment semantic understanding.

Figure 2. The framework of PS-GCN.


The main contributions of this paper are as follows:

  1. Effectively obtaining the dependency relationships of sentiment feature words. Part of these dependencies is hidden in background knowledge, and a knowledge graph maps the information of different entities into a relational network, making it possible to analyse problems from the perspective of "relationship". This paper introduces psycholinguistic knowledge into personality detection via a knowledge graph, explores the latent personality clues between different entities in psycholinguistic knowledge from the "relationship" perspective, and provides a new angle on personality detection tasks.

  2. Proposing a personality detection method that combines psycholinguistic knowledge and sentiment semantic features. The method first combines BERT pre-training with Bi-LSTM to accurately represent sentence sentiment features. Then, exploiting the multi-layer GCN's complete mapping expression and syntactic analysis of graph-structured data, the text is jointly modelled from psycholinguistic knowledge and sentiment semantic features to obtain a richer trait feature representation and improve the accuracy of personality detection.

The organisation of this paper is as follows. Section 2 summarises related work. Section 3 introduces the PS-GCN model in detail. Section 4 reports experiments on the datasets and analyses the results. Finally, Section 5 summarises the paper and puts forward ideas for future work.

2. Related works

In this section, we introduce the related work of this paper from the following two aspects: personality detection research and graph convolution neural network research.

2.1. Personality detection

Existing personality detection studies are mainly divided into traditional machine learning methods and deep learning methods. Traditional machine learning methods mainly use the linguistic features of keywords in sentences to build a personality detection model, linking language use to psychological factors. On this basis, a text-based automatic personality detection model using syntactic structure showed that language-based syntactic structure is helpful for personality detection (Štajner & Yenikent, Citation2020). Pennebaker et al. (Citation2001) found a reliable connection between writing style (word frequency, part of speech, etc.) and personality, and proposed the classic Linguistic Inquiry and Word Count (LIWC) method. In this method, words are sorted into psychological categories, the frequency of words in each category is counted, and the personality features of a given text are identified with traditional machine learning methods.

Inspired by the above, some researchers focus on the semantic similarity between combined sentences. Kazemeini et al. (Citation2021) proposed an interpretable representation-learning method for personality detection to reduce the computational cost of the model. Based on a convolutional neural network, Rahman et al. (Citation2019) compared different loss functions to identify the best personality detection performance. With the explosive growth of data, effective personality detection from massive text collections has become the latest research trend. To deal with data redundancy and structural confusion in massive data, Hernández et al. (Citation2018) established a DISC personality detection model based on linguistic statistical analysis, constructing two corpora with individual annotations and related semantics. Rudra et al. (Citation2020) constructed a benchmark dataset with advanced methods for comparative performance analysis, which greatly facilitates research on personality detection in massive datasets (Indira & Maharani, Citation2021; Mehta et al., Citation2020). However, because the data come from social networks, the comment text is dynamic and abstract, and its analysis must combine the topic background with psycholinguistic background knowledge. Constructing an effective yet concise personality detection model is therefore challenging.

As research deepened, researchers found that combining psycholinguistic knowledge with sentiment information effectively improves the accuracy of existing deep-neural-network models (Sun et al., Citation2018; Vásquez & Ochoa-Luna, Citation2021), and personality detection has achieved great success. In applying psycholinguistic knowledge, several personality models have been proposed, such as the Big Five (Myers et al., Citation1998) and MBTI (Mehta et al., Citation2020) models. Among them, the Big Five is the more authoritative model and is widely used in psychology and artificial intelligence (Sun et al., Citation2022; Štajner & Yenikent, Citation2021); it describes personality in five dimensions: openness, agreeableness, extroversion, conscientiousness, and neuroticism. Štajner et al. (Citation2021) proposed an explanatory opinion study of the theoretical reasons for the low results obtained with MBTI. Yang et al. (Citation2021) introduced psycholinguistic knowledge into the deep learning model and proposed a graph attention network (GAT) model (Cambria et al., Citation2018) that aggregates posts from a psychological point of view to reduce the time and space complexity of graph computing, effectively improving the accuracy of personality classification. SenticNet (Sun et al., Citation2020) has become a popular tool for personality detection because of its substantial advantages in identifying text sentiment polarity and sentiment labels; it also provides sentiment values along four affective dimensions to quantify personality in detection tasks. For group-level personality detection, Sun et al. (Citation2018) used an unsupervised feature-learning method, predicting with text-generating adversarial networks and effectively reducing dependence on labels. Tu et al. (Citation2022) constructed a multitask detection model based on the correlation between trait features and sentimental behaviour, achieving better performance (Majumder et al., Citation2017). On the application side, Zhang et al. (Citation2022) confirmed that populations with similar traits show similar language use and sentiment expression, proving the reliable connection between trait features and sentiment. Ren et al. (Citation2021) fed psycholinguistic dictionaries and linguistic information into the personality detection model and achieved good results. Elmitwally et al. (Citation2022) proposed a conceptual personality model to enhance adaptability, which uses the contextual subjectivity of textual input and the emotions obtained from specific situations to determine personality and behaviour. Kerz et al. (Citation2022) proposed the most comprehensive theory-based psycholinguistic feature set and combined the pre-trained Transformer language model BERT with a within-text distribution-based BLSTM network, improving performance on both the Essays dataset and the Kaggle MBTI dataset. Yang et al. (Citation2023) proposed a dynamic deep graph convolutional network (D-DGCN) comprising a post encoder, a learn-to-connect module, and a DGCN, which lets the model automatically learn the connections between posts and train them jointly; its effectiveness was validated primarily on the Kaggle and Pandora datasets. Zhu et al. (Citation2022) used a personality dictionary as a bridge to inject relevant external knowledge, encoding psycholinguistic information into continuous word representations to enrich document semantics.
The model in this paper not only takes external knowledge as input but also extracts linguistic features, converting a series of psycholinguistic features into hidden psychological representation vectors across three main groups: morpho-syntactic complexity, lexical richness and diversity, and sentiment/emotion lexical features. Stanford CoreNLP is used for sentence splitting, part-of-speech tagging, and syntactic PCFG parsing. The sentiment semantic features and psycholinguistic knowledge of the text therefore play vital roles in the personality detection task.

2.2. Graph convolutional networks

A graph is a data structure that represents many-to-many relationships between objects (Onan, Citation2022; Onan, Citation2023), covering set, one-to-one, and one-to-many relationship structures, which highlights the strong representational power of graph structures. Graph convolutional networks aggregate information from adjacent nodes and can fully mine interdependent information from rich relational data.

The requirements of personality detection keep evolving, and the data in many real-world applications lie not in Euclidean spaces but in non-Euclidean spaces. Traditional learning methods have been very successful at extracting features from Euclidean data, but their performance on non-Euclidean data remains unsatisfactory. GCNs are more competent in this respect: they can model related syntactic constraints and long-distance word dependencies and better identify context words unrelated to the grammar. Recently, some studies have applied GCNs to personality detection. A graph convolutional network is simple and efficient: it can obtain the information of adjacent nodes and also learn the neighbouring information of those nodes, fully mining interdependence from rich relational data. Zhang et al. (Citation2022) used GCN features to process the dependency trees of sentences, applying them to syntactic information and word dependency. On this basis, Zhao et al. (Citation2020) encoded positions by capturing different aspects of sentences and judged sentiment with a bidirectional attention mechanism over GCNs. Hou et al. (Citation2021) added a multi-level attention mechanism to measure the importance of different words in the full text. Inspired by this, Yang et al. (Citation2021) proposed a graph attention network for data transmission between entities, mining users' personality traits while effectively reducing the cost of graph computation.

Although the above work has achieved good results, psycholinguistic knowledge is still under-used and semantic text modelling is not effective enough. In particular, existing methods rely mainly on data-driven mining of dependency information for personality detection, which introduces many interference factors and lacks effective modelling of dependency signals, making it hard to improve detection accuracy. Therefore, we use graph convolutional networks to model the text from psycholinguistic knowledge and semantic features respectively, enriching the information representation of sentences and improving the accuracy of personality detection.

3. The proposed method: PS-GCN

In this section, we introduce in detail the graph convolution method based on psycholinguistic knowledge and sentiment semantic features, PS-GCN, as applied to personality detection. The method constructs P-GCN and S-GCN to model the text from the two aspects of psycholinguistic knowledge and semantic features, so that a richer trait feature representation can be obtained to improve the accuracy of personality detection. As shown in Figure 2, the model can be divided into four parts, introduced in the following sections: sentence embedding, graph construction, multi-head attention, and personality classification.

3.1. Sentence embedding

3.1.1. Embedding layer

In this section, we introduce the embedding layer of the model. Its purpose is to pre-train each word and map it into a high-dimensional vector space.

We use BERT, an important contextual representation-learning model, as the embedding model. We choose BERT-base for the experiments and fine-tune it on the individual datasets; each word is then represented by a vector. For a given sentence S = {w1, w2, w3, … , wn}, the input is out = {[CLS], w1, w2, w3, … , wn, [SEP]}, where [CLS] and [SEP] are special markers for the beginning and end of the input sentence. Z = BERT(out) ∈ R^(d_bert×N), where d_bert is the size of the hidden dimension. Words are thereby mapped into a high-dimensional vector space. The formula is as follows: (1) z_i = w_i × senti(w_i)
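To make the token-to-vector mapping concrete, here is a minimal sketch of the embedding step. It is an assumption-laden stand-in: a real system would call pre-trained BERT-base (hidden size 768); here a small random lookup table (dimension 8, names `embed`, `D_BERT` are illustrative only) shows how the [CLS]/[SEP]-wrapped sentence becomes a d × N matrix.

```python
import random

random.seed(0)
D_BERT = 8  # toy hidden size (768 in real BERT-base; assumption for the sketch)

def embed(tokens):
    """Map [CLS], w1, ..., wn, [SEP] to a list of D_BERT-dim vectors (one per token)."""
    vocab = {}
    def vec(tok):
        # random vectors stand in for contextual BERT embeddings
        if tok not in vocab:
            vocab[tok] = [random.uniform(-1, 1) for _ in range(D_BERT)]
        return vocab[tok]
    wrapped = ["[CLS]"] + list(tokens) + ["[SEP]"]
    return [vec(t) for t in wrapped]

Z = embed(["delicious", "food"])  # 4 tokens after wrapping, each D_BERT-dimensional
```

The output plays the role of Z = BERT(out) in the text; only its shape, not its values, is meaningful here.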

3.1.2. Bi-LSTM layer

Compared with a traditional RNN, an LSTM allows the unit to forget some contextual knowledge to a certain extent, which makes Bi-LSTM more suitable for processing long sequences and provides context in two directions (a forward LSTM and a backward LSTM). Bi-LSTM clusters better on specific vocabulary, and accurate context information supports feature extraction. The proportion of retained information is set between 0 and 1 by the sigmoid function, and the specific calculation is as follows: (2) h_i^(t) = LSTM(b_i^f + Σ_{j=1}^{n} U_{i,j}^f z_i^(t) + Σ_{j=1}^{n} W_{i,j}^f d_j^(t−1)) (3) h_i = [lstm_fwd(z_i, l); lstm_bwd(z_i, l)], i ∈ [1, n], j ∈ [n, 1], where z_i^(t) is the BERT input and l denotes the LSTM parameters. We concatenate the hidden states to obtain the representation. In the pooling step, max pooling is usually chosen over average pooling, but max pooling has low robustness because it is sensitive to outliers. To address this, percentile pooling is used to measure the sentence output accurately. Given the word embeddings Z = [z1, z2, z3, … , zi], we use h_i to represent word w_i and obtain H = [h1, h2, h3, … , hn] ∈ R^(d_h×N), where d_h is the dimension of the hidden state. The word representations input into the GCN form the node feature matrix.
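The percentile pooling above can be sketched as follows: a per-dimension 50th percentile (median) over the Bi-LSTM hidden states, which, unlike max pooling, is insensitive to a single outlier. This is a minimal sketch under the assumption that pooling is applied independently per hidden dimension; `percentile_pool` is an illustrative name, not from the paper.

```python
def percentile_pool(H, q=50):
    """H: list of hidden-state vectors (one per time step). Returns one pooled vector."""
    n_dim = len(H[0])
    pooled = []
    for d in range(n_dim):
        col = sorted(h[d] for h in H)          # values of dimension d across time steps
        k = (len(col) - 1) * q / 100.0         # fractional rank of the q-th percentile
        lo, hi = int(k), min(int(k) + 1, len(col) - 1)
        pooled.append(col[lo] + (col[hi] - col[lo]) * (k - lo))  # linear interpolation
    return pooled

H = [[1.0, 4.0], [3.0, 0.0], [2.0, 100.0]]     # 100.0 is an outlier
pooled = percentile_pool(H)                     # -> [2.0, 4.0]
```

Max pooling would return `[3.0, 100.0]` here, letting the outlier dominate the second dimension; the median keeps it at 4.0.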

3.2. Graph construction

Personality detection can be described as a multi-document, multi-label classification task (Ren et al., Citation2021). Formally, the sentence representations H = {h1, h2, h3, … , hn} are obtained from the sentence embedding of Section 3.1. To capture the connections between different texts, we use the Latent Sentence Group (LSG) (Sun et al., Citation2018) to express the abstract feature combination of closely connected sentences. We then add text-structure features to clarify the relationships between them, and let Ui = {u1, u2, u3, … , un}, i ∈ {1, 2, 3, … , n}, represent each user's comment texts. In this section, we focus on the construction of the syntactic-semantic graph and the psycholinguistic knowledge graph.

3.2.1. Syntactic semantic graph

This section introduces the construction principle and method of the syntactic-semantic graph in detail. We construct a heterogeneous syntax graph G_S = {V_S, E_S, A_S}, where E_S is the edge set containing all adjacent node pairs (i.e. word relation pairs) in the syntactic-semantic dependency tree, and |E_S| is the number of such edges. V_S is the vertex set, containing the nodes (words) of the dependency tree, and |V_S| equals the number of words n in sentence U. The syntactic-semantic dependency tree of a sentence is obtained with Stanford dependencies (Xu et al., Citation2021). As shown in Figure 3, the adjacency matrix of the sentence "Delicious food that has many good effects" is A_S. The weight between nodes is defined as: (4) A_ij^S = 1 if i and j are words and e_ij ∈ E_S; 1 if i = j; 0 otherwise. Word node features: as mentioned above, each word is pre-trained and mapped into a high-dimensional vector space. After BERT pre-training, the output is fed into the Bi-LSTM for long-term dependency learning, and 50th-percentile pooling is used to measure the sentence vector representation accurately. Combined with the syntactic-semantic dependency tree, the contextualised word representation P_L is obtained (L denotes LSTM) and serves as the word node features. The feature matrix P ∈ R^(n×d_n) of graph G_S is P_L, where each row i of P represents the feature of word node u_i.
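Equation (4) can be sketched as a small adjacency-building routine: dependency-tree edges become symmetric 1-entries and every node gets a self-loop. This is a minimal illustration under the assumption that edges are undirected; `build_adjacency` and the toy edge list are not from the paper.

```python
def build_adjacency(n, edges):
    """Build the n x n syntactic adjacency matrix A^S from dependency-tree edges."""
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 1                   # A_ii = 1 (self-loop, the i = j case)
    for i, j in edges:
        A[i][j] = A[j][i] = 1         # word pair connected in the dependency tree
    return A

# Toy 3-word chain: word 0 modifies word 1, word 1 relates to word 2
A = build_adjacency(3, [(0, 1), (1, 2)])
```

Words 0 and 2 are not directly connected, so `A[0][2]` stays 0; the GCN layers propagate information between them through word 1.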

Figure 3. The adjacency matrix of Gp.


Word node features: as described in Section 3.1.2, we pass the word embeddings into a Bi-LSTM to obtain the contextualised word representations Q_LSTM, which serve as the features for the word nodes. The feature matrix H_S of the graph G_S is therefore the Q_LSTM output, with each row Q_i^S representing the feature of word node u_i.

3.2.2. Psychological language knowledge graph

LIWC (Zhang et al., Citation2022) is a major psycholinguistic analysis tool in the field of personality detection (Lynn et al., Citation2020). Its main strength is that it classifies words into psychologically related categories, realising psycholinguistic classification. In this paper, we introduce LIWC into the knowledge graph and use SenticNet (Wu et al., Citation2015) as a commonsense knowledge base embedded into the GCN encoder (Tausczik et al., Citation2010). SenticNet contains 100,000 concepts with rich affective attribute information, represented at the concept level. We use the LIWC dictionary to classify them: it divides words into 73 categories, comprising nine major categories and 64 subcategories. After comprehensive consideration, pronouns, prepositions, and other factors irrelevant to the personality detection task are filtered out, and all the major categories and some subcategories are studied.

We construct a knowledge graph GP = {VP, EP, AP}, where AP is the adjacency matrix of GP and VP contains both word nodes and knowledge nodes. Therefore, |VP| is the total of the n words plus the number of knowledge nodes in the knowledge subgraph. Likewise, EP contains not only the relationships between word nodes but also the relationships (node pairs) between adjacent knowledge nodes in the knowledge subgraph.

The knowledge subgraph contains the knowledge nodes most relevant and important to the words at each node. Specifically, as shown in Figure 4, the adjacency matrix of the sentence "Delicious food that has many good effects" is A_P. First, each word in the knowledge graph is used as a seed node; then the nodes within five hops are taken as the most relevant nodes and embedded into the GCN encoder. The weight between nodes u_i and u_j is calculated as: (5) A_ij^P = 1 if i is a word and j is a knowledge node with e_ij ∈ E_K; 1 if i and j are knowledge nodes with e_ij ∈ E_K; 1 if i = j; 0 otherwise. Knowledge node features: commonsense and psycholinguistic knowledge concepts are mapped to a continuous low-dimensional embedding E_Aff. To ensure that the semantic and sentiment relevance of the original space is not lost during this embedding, we use AffectiveSpace (Cambria et al., Citation2018) to construct the embedding matrix and compute the vector representation H_Aff of the knowledge nodes from E_Aff. The feature matrix of graph G_P, Q_P ∈ R^((n+n_P)×d_h), is [Q_L; H_Aff], where each row i represents the feature vector of word node or knowledge node u_i. On this basis, we construct a psycholinguistic knowledge graph and embed it into the GCN model to better perform personality detection.
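Equation (5) extends the word adjacency with knowledge nodes. The hedged sketch below assumes the n word nodes are indexed first and the n_P knowledge nodes after them, with E_K edges linking words to knowledge nodes and knowledge nodes to each other; the function name and index convention are illustrative, not from the paper.

```python
def build_knowledge_adjacency(n, n_p, word_knowledge_edges, knowledge_edges):
    """(n + n_p) x (n + n_p) matrix A^P: word nodes 0..n-1, knowledge nodes n..n+n_p-1."""
    size = n + n_p
    A = [[0] * size for _ in range(size)]
    for i in range(size):
        A[i][i] = 1                              # self-loops (the i = j case)
    for w, k in word_knowledge_edges:            # word w linked to knowledge node k in E_K
        A[w][n + k] = A[n + k][w] = 1
    for k1, k2 in knowledge_edges:               # knowledge-knowledge edge in E_K
        A[n + k1][n + k2] = A[n + k2][n + k1] = 1
    return A

# 2 words, 2 knowledge nodes: word 0 links to knowledge node 0,
# and the two knowledge nodes are linked to each other
AP = build_knowledge_adjacency(2, 2, [(0, 0)], [(0, 1)])
```

Word 1 reaches the knowledge subgraph only indirectly, which is exactly the kind of multi-hop path the P-GCN layers propagate.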

Figure 4. The framework of Gs.


In the personality detection task, GCN (graph convolutional network) and GAT (graph attention network) are two commonly used graph neural network models for modelling and analysing graph data. A GCN updates the representation of a central node through message passing from its neighbour nodes: based on local neighbourhood aggregation, it convolves node features with neighbour features to learn a representation for each node. In personality detection, a GCN can capture the relationships and context between nodes from the graph topology and node features, enabling sentiment classification. A GCN uses a fixed weight matrix for convolution, making it relatively simple and scalable. Given the above graph construction, we build P-GCN and S-GCN and model the text from the psycholinguistic knowledge graph and syntactic semantics respectively. The word-node update formulas are: (6) Q_P^(j+1) = σ(Â_P Q_P^(j) W_P^(j)) (7) Q_S^(j+1) = σ(Â_S Q_S^(j) W_S^(j)), where Â_P = D_P^(−1/2) A_P D_P^(−1/2) and Â_S = D_S^(−1/2) A_S D_S^(−1/2) are the normalised symmetric adjacency matrices of P-GCN and S-GCN, and W_P^(j) and W_S^(j) are the weight matrices of the jth layer of P-GCN and S-GCN, respectively. The degree matrix of A_P is D_P, where D_ii^P = Σ_j A_ij^P; the degree matrix of A_S is D_S, where D_ii^S = Σ_j A_ij^S. The node-level output of P-GCN is N_P = Q_P^(L), an (n+n_P)×d_gP feature matrix, and that of S-GCN is N_S = Q_S^(L), an n×d_gS feature matrix, where d_gP and d_gS are hidden dimensions.
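One GCN layer from Equations (6)-(7) can be sketched in plain Python: symmetrically normalise the adjacency with D^(−1/2) A D^(−1/2), multiply by the node features and layer weights, and apply a nonlinearity (ReLU assumed here for σ). The adjacency is assumed to already contain self-loops, as in Equations (4)-(5); the tiny matrices are illustrative only.

```python
import math

def matmul(X, Y):
    """Dense matrix product for small lists-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def gcn_layer(A, Q, W):
    """One layer: Q' = ReLU(D^-1/2 A D^-1/2 Q W), A assumed to include self-loops."""
    d_inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in A]      # diagonal of D^{-1/2}
    A_hat = [[d_inv_sqrt[i] * A[i][j] * d_inv_sqrt[j]
              for j in range(len(A))] for i in range(len(A))]
    Z = matmul(matmul(A_hat, Q), W)
    return [[max(0.0, v) for v in row] for row in Z]           # sigma = ReLU

A = [[1, 1], [1, 1]]           # two connected nodes, self-loops included
Q = [[1.0, 0.0], [0.0, 1.0]]   # one-hot node features
W = [[1.0, 0.0], [0.0, 1.0]]   # identity weights, so only the propagation is visible
out = gcn_layer(A, Q, W)       # each node becomes an equal mix of itself and its neighbour
```

With identity features and weights the output is pure neighbourhood mixing: both rows become (0.5, 0.5), showing how each node absorbs its neighbour's information in one layer.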

3.3. Multi-head attention

Multi-head attention performs several attention functions in parallel. In this paper, we use multi-head attention (MHA) (Vaswani et al., Citation2017) to capture the critical parts of a sentence, in a way that differs from the Transformer (Erik et al., Citation2015). The motivation is to measure the importance of words for personality detection, strengthening key features and weakening irrelevant information. (8) MHA(Z) = [Sa^(1); Sa^(2); … ; Sa^(n_head)] × W_head (9) Sa^(t) = Σ_{i=1}^{n} (exp(f^(t)(Z_i, x_i^pos)) / Σ_{j=1}^{n} exp(f^(t)(Z_j, x_j^pos))) Z_i (10) f^(t)(Z_j, x_j^pos) = u_A^(t)ᵀ tanh(W_A^(t) [Z_j; x_j^pos] + b_A^(t))

Where W_A^(t) ∈ R^((d_g+d_pos)×(d_g+d_pos)), b_A^(t) ∈ R^(d_g+d_pos), u_A^(t) ∈ R^(d_g+d_pos), and W_head ∈ R^((n_head+d_pos)×d_g); [;] denotes concatenation, and these parameters are adjusted continuously during training. Z_i denotes the representation of node i in the graph.
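The attention pooling of Equations (8)-(10) can be sketched as follows. This is a simplified, assumption-heavy toy: the scoring function f is collapsed to a dot product per head (omitting the tanh projection and the position embeddings x_i^pos, and the final W_head projection), so only the softmax weighting and the head concatenation of Equation (8) are shown.

```python
import math

def softmax(scores):
    m = max(scores)                             # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_head(Z, w):
    """One head Sa^(t): softmax-weighted sum of node vectors (Eq. 9, simplified f)."""
    weights = softmax([sum(wi * zi for wi, zi in zip(w, z)) for z in Z])
    dim = len(Z[0])
    return [sum(a * z[d] for a, z in zip(weights, Z)) for d in range(dim)]

def multi_head(Z, head_weights):
    """Concatenate the per-head outputs, as in [Sa^(1); ...; Sa^(n_head)] (Eq. 8)."""
    out = []
    for w in head_weights:
        out.extend(attention_head(Z, w))
    return out

Z = [[1.0, 0.0], [0.0, 1.0]]                    # two node representations
pooled = multi_head(Z, [[1.0, 0.0], [0.0, 1.0]])  # 2 heads -> 4-dim pooled vector
```

Each head attends to a different node more strongly, so the two halves of the pooled vector emphasise different parts of the graph, which is exactly the "reinforce key features" behaviour described above.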

3.4. Multitask training

We use two different loss functions and combine them into a joint objective (Li et al., Citation2021). Optimising this joint loss lets both terms guide model optimisation and training, which helps address the complex task. The joint loss is given by formula (11): (11) L_Multi = L_Personality + L_Emotion. L_Emotion and L_Personality are both loss functions, but they are defined differently. Predicting the personality traits is a multi-label task, so we use the multi-label soft margin loss, L_Personality, as the objective; it quantifies the discrepancy between the model's predictions and the ground-truth labels, and optimising it adjusts the model's parameters to improve its accuracy and consistency across traits. The specific formula is (12): (12) L_Personality = −(1/C) Σ_{i=1}^{C} [ y_i log(1/(1+exp(−ŷ_i))) + (1−y_i) log(exp(−ŷ_i)/(1+exp(−ŷ_i))) ], where C is the number of classes. Emotion detection, however, is a multi-class prediction task, so we apply cross entropy as its loss, formula (13): (13) L_Emotion = −Σ_{i∈C} y_i log(ŷ_i). The loss function is optimised with gradient descent: forward propagation, loss calculation, backpropagation, and parameter updates are performed iteratively on the training data until the stopping condition is reached. The specific parameters are shown in Table .
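The joint objective of Equations (11)-(13) can be sketched numerically: a sigmoid-based multi-label soft margin term over the C trait logits plus a cross-entropy term over the emotion class probabilities. This is a minimal sketch under the assumption that the personality term receives raw logits and the emotion term receives already-normalised probabilities; the function names are illustrative.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def personality_loss(y, y_hat):
    """Multilabel soft margin over C trait logits (Eq. 12)."""
    C = len(y)
    total = 0.0
    for yi, yh in zip(y, y_hat):
        p = sigmoid(yh)                                   # per-trait probability
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return -total / C

def emotion_loss(y, p_hat):
    """Cross entropy over predicted emotion probabilities (Eq. 13)."""
    return -sum(yi * math.log(pi) for yi, pi in zip(y, p_hat))

# Eq. (11): L_Multi = L_Personality + L_Emotion
loss = personality_loss([1, 0, 1, 0, 1], [2.0, -1.5, 0.5, -2.0, 1.0]) \
       + emotion_loss([0, 1, 0], [0.1, 0.8, 0.1])
```

Both terms are non-negative and shrink as the predictions approach the labels, so gradient descent on `loss` optimises the two tasks jointly.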

3.5. Classification

In this section, we introduce the user personality detection task. After L-layer iterative training, the final node representation $u$ of the user is obtained and fed into the softmax layer. We adopt multi-label classification to perform the personality detection task, calculated as formula (14):

(14) $p(y^t) = \mathrm{softmax}(W_u^t u + b_u^t)$

where $W_u^t$ is a trainable weight matrix and $b_u^t$ is a bias term. The objective function is defined as:

(15) $J(\theta) = -\frac{1}{V}\sum_{v=1}^{V}\sum_{t=1}^{T} y_v^t \log p(y_v^t \mid \theta)$

where $V$ is the size of the training set and $T$ is the number of personality trait types. $y_v^t$ is the ground-truth label of trait $t$ for user $v$, and $p(y_v^t \mid \theta)$ is the predicted probability of that trait under parameters $\theta$.
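A hedged sketch of the per-trait classifier in formula (14), with illustrative weights (the function name and shapes are our own):

```python
import math

def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def predict_trait(u, W, b):
    """Per-trait prediction (eq. 14): logits = W u + b, then softmax.
    u is the final user representation; W and b are the trait-specific
    weight matrix and bias."""
    logits = [sum(w * x for w, x in zip(row, u)) + b_i
              for row, b_i in zip(W, b)]
    return softmax(logits)
```

Running this for each of the T trait dimensions yields the multi-label prediction; the objective in formula (15) is then the negative log-likelihood of the true labels averaged over users.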

4. Experiments

In this section, we present the experimental details and conduct extensive experiments on text personality datasets, covering four aspects: experimental datasets, baselines and settings, experimental results and analysis, and an ablation study.

4.1. Experimental datasets

Since datasets for personality detection involve personal privacy, they are difficult to collect. In addition, manual annotators need a background in psycholinguistic research, which makes labelling costly. However, recent studies (Zhou et al., Citation2020) have confirmed that small high-quality datasets outperform large low-quality datasets when training models for personality detection tasks. Therefore, we chose the MBTI and Big Five datasets, which are currently common and representative. For the MBTI dataset, we use the Kaggle dataset (Yang et al., Citation2021)Footnote4,Footnote5, which is collected from PersonalityCafé (Mehta et al., Citation2020). It contains 8675 users, each with personality-labelled texts (Keh & Cheng, Citation2019) discussing health, happiness, life, and personality types; the personality labels are I/E, S/N, T/F, and P/J. Big Five is a popular dataset with 2468 users and 50 texts per user; it divides personality traits into five categories, namely EXT, AGR, NEU, CON, and OPN. The details of the two datasets are shown in Table 1.

Table 1. Datasets Introduction.

4.2. Experimental setup and evaluation measurement

We conducted a statistical analysis of the two datasets; the specific information is shown in Tables 2 and 3. We delete irrelevant information from the data, such as URL links and meaningless special symbols, and use NLTK (Yang et al., Citation2021) to remove stop words. We also find that the label distribution of the MBTI dataset is unbalanced, so we use under-sampling to extract data such that the data are balanced across the multiple dimensions. In contrast, the Big Five dataset is more balanced; we take it from the research of Majumder et al. (Majumder et al., Citation2017). After data preprocessing, the BERT model used is BERT-BASE (Mehta et al., Citation2020), with the specific parameters shown in Table 4.

Table 2. MBTI personality types.

Table 3. Big Five personality types.

Table 4. Parameter setting.
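The cleaning steps described above can be sketched as follows; this is a minimal illustration assuming a tiny hard-coded stop-word list in place of NLTK's full English list:

```python
import re

# Tiny stand-in stop-word list; the paper uses NLTK's full English list.
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "or", "to", "of"}

def preprocess(post):
    """Remove URL links and meaningless special symbols, lower-case,
    and drop stop words, mirroring the preprocessing described above."""
    post = re.sub(r"https?://\S+|www\.\S+", " ", post)  # strip URL links
    post = re.sub(r"[^a-zA-Z0-9\s]", " ", post)         # strip special symbols
    return [t for t in post.lower().split() if t not in STOP_WORDS]
```

For example, `preprocess("Check https://example.com !! The INTJ type is rare.")` keeps only the content tokens `check`, `intj`, `type`, `rare`.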

During the experiments, we shuffle each dataset and split it into training, validation, and test sets in a 60-20-20 proportion. During model training, different optimisers are selected so that training is more effective and experimental comparison more convenient. The model parameters and optimiser choices are shown in Table 4.
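The shuffle-and-split step can be sketched as follows (the fixed seed is our assumption; the paper does not state one):

```python
import random

def split_dataset(samples, seed=42):
    """Shuffle and split into train/validation/test at a 60-20-20 ratio."""
    data = list(samples)
    random.Random(seed).shuffle(data)
    n = len(data)
    n_train, n_val = int(0.6 * n), int(0.2 * n)
    return (data[:n_train],
            data[n_train:n_train + n_val],
            data[n_train + n_val:])
```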

The personality detection task in this paper is a combination of multiple extraction methods of graph convolution based on psycholinguistic knowledge and sentiment semantic features. To verify the personality detection performance of the PS-GCN model, we conduct experiments on real datasets to compare the advantages and disadvantages of different variants and models. In the experiment, the baselines we compared include:

SVM (Diederik & Jimmy, Citation2014): Firstly, TF-IDF and LIWC are used to obtain features from sentences, and a support vector machine (SVM) is used as a classifier.

2CBiLSTM (Sun et al., Citation2018): It is the combination of bidirectional LSTM and CNN. Firstly, bidirectional LSTM is used to process sentences, and the maximum pooling operation is used to obtain the vector features of sentences. Then, LSG is used to represent multiple sentence vectors of the same user, and CNN and text structure are combined to detect personality.

BERT (Cui & Qi, Citation2017): Firstly, the BERT model is pre-trained with processed data. Then, each sentence is encoded by fine-tuning BERT, and user representation is generated by averaging pooling operations.

BERT + Sentic + CNN (Hou et al., Citation2021): Multi-label personality detection from the sentimental and semantic points of view. Firstly, a bidirectional encoder generates sentence-level text embeddings for semantic extraction, and sentiment information is incorporated through a sentiment dictionary. Then, the semantic and sentiment information are input into a neural network for personality detection.

SN + Attn (Lynn et al., Citation2020): SN + Attn uses a hierarchical attention network composed of two GRUs: a word-level attention GRU encodes each post, and a post-level attention GRU generates the user representation.

TrigNet (Yang et al., Citation2021): TrigNet constructs a heterogeneous tripartite graph (post nodes, word nodes, category nodes) and performs personality detection from a psychological point of view.

D-DGCN (Yang et al., Citation2023): As the latest model, D-DGCN designs a learn-to-connect approach that adopts a dynamic multi-hop structure instead of a deterministic one, combined with a DGCN module to automatically learn the connections between posts.

In practice, because both datasets are imbalanced along some label dimensions, Macro-F1 is used to evaluate performance on each personality trait, and the average Macro-F1 across all traits is used to measure the overall performance of personality detection.
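For a single binary trait dimension (e.g. I vs. E), Macro-F1 averages the per-class F1 scores, which keeps the metric robust to label imbalance. A minimal sketch (our own helper, not from the paper):

```python
def macro_f1(y_true, y_pred):
    """Macro-F1 for a binary trait dimension: compute F1 for each of the
    two classes from its TP/FP/FN counts, then average the two F1 scores."""
    f1s = []
    for cls in (0, 1):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)
```

The average Macro-F1 reported in the tables is then the mean of this score over the four MBTI (or five Big Five) trait dimensions.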

4.3. Experimental results and analysis

The Kaggle and Big Five datasets are used to compare the experimental results of the different models. The specific experimental results are shown in Table 5.

Table 5. PS-GCN and baselines in Macro-F1 score results.

As shown in Figures 5 and 6, we can draw the following conclusions:

Figure 5. The result of Kaggle Macro-average.

Figure 6. The result of the Big Five Macro-average.

1) The classification accuracy of our model is higher than that of the other models. The average Macro-F1 score of PS-GCN is in a leading position, which demonstrates the superiority of PS-GCN in the text personality detection task. Specifically, compared with the TrigNet model, the average Macro-F1 scores of PS-GCN on Kaggle and Big Five increase by 0.45 and 1.35, respectively. Compared with the advanced D-DGCN model, the Macro-F1 scores of PS-GCN on the I/E and S/N dimensions of Kaggle increase by 1.11 and 0.07, which demonstrates the importance of sentence sentiment semantics in personality detection tasks. Secondly, compared with the BERT model and its variant BERT + Sentic + CNN, the F1 scores of PS-GCN on the two datasets increase by 4.27 and 1.74, and by 3.67 and 2.19, respectively, which effectively demonstrates the advantage of GCNs in handling text semantics.

2) Psycholinguistic knowledge plays a vital role in personality detection. Compared with the SN + Attn model, the improvements of PS-GCN are 3.32 and 1.71, respectively. PS-GCN also improves on the 2CBiLSTM model, which verifies that introducing psycholinguistic knowledge is an effective approach. In addition, compared with the SVM and Bi-LSTM models, PS-GCN builds on LIWC and deep learning while introducing the psycholinguistic knowledge and syntactic-semantic modules, which further demonstrates the importance of these modules in the personality detection task.

Figure 7. The result of Ablation in Kaggle and Big Five.

4.4. Ablation study

To examine the contribution of each module to the overall PS-GCN model, this section presents an ablation study, as shown in Table 6 and Figure 7.

Table 6. P-GCN and S-GCN in Macro-F1 score results.

We conducted an ablation study of the PS-GCN model on the Kaggle and Big Five datasets. PS-GCN contains two modules, P-GCN and S-GCN; we removed each module in turn to study its contribution. The specific results are shown in Table 6. We find that the average Macro-F1 scores of S-GCN alone on the Kaggle and Big Five datasets are 63.17 and 60.21, respectively, while those of P-GCN alone are 66.16 and 61.08, i.e. P-GCN is 2.99 and 0.87 higher than S-GCN. This indicates that both psycholinguistic knowledge and syntactic semantics are effective for personality detection, and that psycholinguistic knowledge is more conducive to capturing personality clues in personality detection tasks.

5. Conclusions

Personality detection tasks have important value in practical applications and business environments. In human resource management, they help organisations better understand the individual characteristics of employees, and thereby better assign tasks, conduct employee evaluations and promotions, and match employees with the organisation's values and culture. In marketing and advertising, they help in understanding the individual characteristics of consumers and can inform marketing and advertising strategies. In recommendation systems, an accurate understanding of user personality can improve recommendation effectiveness. In precision medicine, recognising personality traits can help doctors design more personalised and targeted psychological treatment plans. To improve the accuracy of personality detection, we propose a classification method from the perspective of psycholinguistics and sentiment semantics, namely PS-GCN. Through graph convolution, text personality detection and classification are carried out from two aspects: psycholinguistic knowledge and sentiment semantic features. The contributions of this paper are as follows:

  1. Effective acquisition of the dependency relationships among sentiment words. The psycholinguistic knowledge graph introduced in this paper maps the information of different entities into a relational network, thus explicitly introducing psycholinguistic knowledge into personality detection. Exploring the potential personality clues between entities in psycholinguistic knowledge from the perspective of "relationship" increases the interpretability of personality and provides a new view for personality detection tasks.

  2. A personality detection method combining psycholinguistic knowledge and sentiment semantic features is proposed. Firstly, through BERT pre-training and Bi-LSTM, sentence sentiment features are accurately represented. Then, when processing graph-structured data, the advantages of the multi-layer GCN in complete mapping expression and syntactic analysis are exploited: the text is jointly modelled from psycholinguistic knowledge and sentiment semantic features to obtain richer trait features. A multi-label classifier structure is used to improve the accuracy of personality detection.

The experimental results show that the proposed model can effectively identify the personality traits expressed in text and is superior in the personality detection task. In future work, we will further optimise the combination of psycholinguistic knowledge with syntax and semantics to learn more distinctive textual personality traits. We also consider extending the model to personality analysis and personalised inference to obtain better results.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was supported by the Opening Foundation of State Key Laboratory of Cognitive Intelligence, iFLYTEK: [Grant Number COGOS-2023HE02]; the National Natural Science Foundation of China: [Grant Number 62076006]; the University Synergy Innovation Program of Anhui Province: [Grant Number GXXT-2021-008].

Notes

1 https://nlp.stanford.edu/software/stanford-dependencies.html

2 http://liwc.wpengine.com/

3 http://alt.qcri.org/semeval2016/task5/

4 http://www.nltk.org

5 https://storage.googleapis.com/bert_models/2020_02_20/uncased_L-12_H-768_A-12.zip

References

  • Al-Samarraie, H., Eldenfria, A., & Dawoud, H. (2017). The impact of personality traits on users' information-seeking behavior. Information Processing & Management, 53(1), 237–247. https://doi.org/10.1016/j.ipm.2016.08.004
  • Cambria, E., Poria, S., Hazarika, D., & Kwok, K. (2018). SenticNet 5: Discovering conceptual primitives for sentiment analysis by means of context embeddings. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1). https://doi.org/10.1609/aaai.v32i1.11559
  • Cui, B., & Qi, C. (2017). Survey analysis of machine learning methods for natural language processing for MBTI personality type prediction. https://arxiv.org/abs/1707.07012.
  • Diederik, P., & Jimmy, B. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980. https://doi.org/10.48550/arXiv.1412.6980.
  • Digman, J. M. (1990). Personality structure: Emergence of the five-factor model. Annual Review of Psychology, 41(1), 417–440. https://doi.org/10.1146/annurev.ps.41.020190.002221
  • Elmitwally, N. S., Kanwal, A., Abbas, S., Khan, M. A., Khan, M. A., Ahmad, M., & Alanazi, S. (2022). Personality detection using context based emotions in cognitive agents. Computers, Materials & Continua, 70(3), 4947–4964. https://doi.org/10.32604/cmc.2022.021104
  • Erik, C., Jie, F., Federica, B., & Soujanya, P. (2015). Affective space 2: Enabling affective intuition for concept-level sentiment analysis. Proceedings of AAAI, pp. 508–514. https://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/view/10179/10357.
  • Hernández, Y., Peña, C. A., & Martínez, A. (2018, October). Model for personality detection based on text analysis. Mexican International Conference on Artificial Intelligence (pp. 207-217), Springer, Cham.
  • Hou, X. C., Huang, J., Wang, G. T., Qi, P., He, X. D., & Zhou, B. W. (2021). Selective attention based graph convolutional networks for aspect-level sentiment classification. Proceedings of the Fifteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-15) (pp. 83-93).
  • Indira, R., & Maharani, W. (2021, July). Personality detection on social media twitter using long short-term memory with Word2Vec. 2021 IEEE International Conference on Communication, Networks and Satellite (COMNETSAT) (pp. 64-69), IEEE.
  • Kazemeini, A., Roy, S. S., Mercer, R. E., & Cambria, E. (2021, December). Interpretable representation learning for personality detection. 2021 international conference on data mining workshops (ICDMW) (pp. 158-165), IEEE.
  • Keh, S. S., & Cheng, I. (2019). Myers-Briggs personality classification and personality-specific language generation using pre-trained language models. https://arxiv.org/abs/1907.06333.
  • Kerz, E., Qiao, Y., Zanwar, S., & Wiechmann, D. (2022). Pushing on personality detection from verbal behavior: A transformer meets text contours of psycholinguistic features. Proceedings of the 12th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (pp. 182–194).
  • Li, Y., Kazameini, A., Mehta, Y., & Cambria, E. (2021). Multitask learning for emotion and personality detection. arXiv preprint arXiv:2101.02346, https://doi.org/10.48550/arXiv.2101.02346
  • Lynn, V., Balasubramanian, N., & Schwartz, H. A. (2020). Hierarchical modeling for user personality prediction: The role of message-level attention. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 5306-5316). https://www.aclweb.org/anthology/2020.acl-main.476/.
  • Majumder, N., Poria, S., Gelbukh, A., & Cambria, E. (2017). Deep learning-based document modeling for personality detection from text. IEEE Intelligent Systems, 32(2), 74–79. https://doi.org/10.1109/MIS.2017.23
  • Mehta, Y., Fatehi, S., Kazameini, A., Stachl, C., Cambria, E., & Eetemadi, S. (2020). Bottom-up and top-down: Predicting personality with psycholinguistic and language model features. 2020 IEEE International Conference on Data Mining (ICDM) (pp. 1184-1189), IEEE.
  • Mehta, Y., Majumder, N., Gelbukh, A., & Cambria, E. (2020). Recent trends in deep learning based personality detection. Artificial Intelligence Review, 53(4), 2313–2339. https://doi.org/10.1007/s10462-019-09770-z
  • Myers, I. B., McCaulley, M. H., Quenk, N. L., & Hammer, A. L. (1998). MBTI manual: A guide to the development and use of the Myers-Briggs Type Indicator. Consulting Psychologists Press.
  • Onan, A. (2022). Bidirectional convolutional recurrent neural network architecture with group-wise enhancement mechanism for text sentiment classification. Journal of King Saud University – Computer and Information Sciences, 34(5), 2098–2117. https://doi.org/10.1016/j.jksuci.2022.02.025
  • Onan, A. (2023). GTR-GA: Harnessing the power of graph-based neural networks and genetic algorithms for text augmentation. Expert Systems with Applications, 232, https://doi.org/10.1016/j.eswa.2023.120908
  • Pennebaker, J. W., Francis, M. E., & Booth, R. J. (2001). Linguistic inquiry and word count: LIWC 2001. Mahway: Lawrence Erlbaum Associates, 71(2001), https://doi.org/10.1037/0022-0663.51.2.335
  • Rahman, M. A., Al Faisal, A., Khanam, T., Amjad, M., & Siddik, M. S. (2019, May). Personality detection from text using convolutional neural network. 2019 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT) (pp. 1-6), IEEE.
  • Ren, Z., Shen, Q., Diao, X., & Xu, H. (2021). A sentiment-aware deep learning approach for personality detection from text. Information Processing & Management, 58(3), 102532. https://doi.org/10.1016/j.ipm.2021.102532
  • Rudra, U., Chy, A. N., & Seddiqui, M. H. (2020, December). Personality traits detection in bangla: A benchmark dataset with comparative performance analysis of state-of-the-Art methods. 2020 23rd International Conference on Computer and Information Technology (ICCIT), 1–6, IEEE.
  • Speer, R., Chin, J., & Havasi, C. (2016, December). ConceptNet 5.5: An open multilingual graph of general knowledge. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (pp. 4444–4451).
  • Štajner, S., & Yenikent, S. (2020). Mining crowdsourcing problems from discussion forums of workers. Proceedings of the 28th International Conference on Computational Linguistics, 6264–6276.
  • Štajner, S., & Yenikent, S. (2021). Why is MBTI personality detection from texts a difficult task? Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume (pp. 3580–3589).
  • Štajner, S., Yenikent, S., & Franco-Salvador, M. (2021, October). Five psycholinguistic characteristics for better interaction with users. 2021 8th international conference on behavioral and social computing (BESC) (pp. 1-7), IEEE.
  • Sun, X., Huang, J., Zheng, S., Rao, X., & Wang, M. (2022). Personality assessment based on multimodal attention network learning With category-based mean square error. IEEE Transactions on Image Processing, 31, 2162–2174. https://doi.org/10.1109/TIP.2022.3152049
  • Sun, X., Liu, B., Cao, J., Luo, J., & Shen, X. (2018). Who am I? Personality detection based on deep learning for texts. 2018 IEEE International Conference on Communications (ICC) (pp. 1-7), IEEE.
  • Sun, X., Liu, B., Meng, Q., Cao, J., Luo, J., & Yin, H. (2020). Group-level personality detection based on text generated networks. World Wide Web, 23(3), 1887–1906. https://doi.org/10.1007/s11280-019-00729-2
  • Tausczik, Y. R., & Pennebaker, J. W. (2010). The psychological meaning of words: Liwc and computerized text analysis methods. Journal of Language and Social Psychology, 29(1), 24–54. https://doi.org/10.1177/0261927X09351676.
  • Tu, G., Wen, J., Liu, H., Chen, S., Zheng, L., & Jiang, D. (2022). Exploration meets exploitation: Multitask learning for emotion recognition based on discrete and dimensional models. Knowledge-Based Systems, 235, 107598. https://doi.org/10.1016/j.knosys.2021.107598
  • Vásquez, R. L., & Ochoa-Luna, J. (2021, October). Transformer-based approaches for personality detection using the MBTI model. 2021 XLVII Latin American computing conference (CLEI) (pp. 1-7), IEEE.
  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł, & Polosukhin, I. (2017). Attention is all you need. Proceedings of NIPS, pp. 5998–6008. https://papers.nips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html.
  • Wu, Y. Y., Michal, K., & David, S. (2015). Computer-based personality judgments are more accurate than those made by humans. Proceedings of the National Academy of Sciences, 112(4), 1036. https://doi.org/10.1073/pnas.1418680112
  • Xu, H. Q., Zhang, S. X., Zhu, G. L., & Zhu, H. Y. (2021). ALSEE: A framework for attribute-level sentiment element extraction towards product reviews. Connection Science, 1–19. https://doi.org/10.1080/09540091.2021.1914247
  • Yang, F., Quan, X., Yang, Y., & Yu, J. (2021). Multi-document transformer for personality detection. Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 35, No. 16, pp. 14221-14229).
  • Yang, T., Deng, J., Quan, X., & Wang, Q. (2023). Orders are unwanted: Dynamic deep graph convolutional network for personality detection. Proceedings of the AAAI Conference on Artificial Intelligence, 37(11), 13896-13904.
  • Yang, T., Yang, F., Haolan, O., Ouyang, H., & Quan, X. (2021). Psycholinguistic tripartite graph network for personality detection. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Vol. 1, pp. 4229–4239). https://doi.org/10.48550/arXiv.2106.04963.
  • Zhang, S. X., Hu, Z. Y., Zhu, G. L., Jin, M., & Li, K. C. (2021). Sentiment classification model for Chinese micro-blog comments based on key sentences extraction. Soft Computing, 25(1), 463–476. https://doi.org/10.1007/s00500-020-05160-8
  • Zhang, S. X., Xu, H. Q., Zhu, G. L., Chen, X., & Li, K. C. (2022). A data processing method based on sequence labeling and syntactic analysis for extracting new sentiment words from product reviews. Soft Computing, 26(2), 853–866. https://doi.org/10.1007/s00500-021-06228-9
  • Zhang, S. X., Yu, H. B., & Zhu, G. L. (2022). An emotional classification method of Chinese short comment text based on ELECTRA. Connection Science, 34(1), 254–273. https://doi.org/10.1080/09540091.2021.1985968
  • Zhao, P. N., Hou, L. I., & Wu, O. (2020). Modeling sentiment dependencies with graph convolutional networks for aspect-level sentiment classification. Knowledge-Based Systems, 193, 105443. https://doi.org/10.1016/j.knosys.2019.105443
  • Zhou, J., Huang, J. X., Hu, Q. V., & He, L. (2020). Sk-gcn: Modeling syntax and knowledge via graph convolutional network for aspect-level sentiment classification. Knowledge-Based Systems, 205, 106292. https://www.sciencedirect.com/science/article/pii/S0950705120304524.
  • Zhu, Y., Hu, L., Ning, N., Zhang, W., & Wu, B. (2022). A lexical psycholinguistic knowledge-guided graph neural network for interpretable personality detection. Knowledge-Based Systems, 249, https://doi.org/10.1016/j.knosys.2022.108952